Nex-N2-mini-Turbo-Phase-Twin By Frosty40 Runs Local AI Fast

Nex-N2-mini-Turbo-Phase-Twin is a new release that brings a large language model to Intel Arc graphics cards. It packages the Qwen3.5-35B-A3B model into a single file that holds two different precision levels. Users can choose which precision to load without downloading anything extra.
Developer Frosty40 created this project to support local AI on Intel hardware. The model is tuned to run efficiently on Intel Arc GPUs through a custom build of llama.cpp, which lets the software detect the hardware and pick settings for the best speed.
Features and hardware compatibility
- Tuned specifically for Intel Arc GPUs.
- Carries two expert precision levels in one file.
- Runs multimodal reasoning and vision tasks.
- Supports single or dual graphics card setups.
- Includes automatic hardware detection and calibration.
This tool is for people who run local AI workloads on Intel graphics hardware and want fast performance. It suits those who need multimodal reasoning and vision capabilities on their own machines. Users with one or two 16 GB graphics cards can expect high token speeds without relying on cloud services.
Important setup details
The software requires a custom llama.cpp build because standard builds cannot load this file type. Users must disable the prompt cache and context checkpoints; otherwise the model becomes incoherent after the first turn. An included launcher script detects graphics memory automatically and adjusts the setup to match.
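The kind of decision such a launcher makes can be sketched in Python. This is only an illustration and not Frosty40's actual script: the names `LaunchPlan` and `plan_launch`, the "low"/"high" precision labels, the 16 GB cutoff, and the rule that two cards select the higher precision are all assumptions made for the example. The only parts taken from the article are the one-or-two-card setups and the requirement that the prompt cache and context checkpoints stay off.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchPlan:
    """Hypothetical launch settings; field names are illustrative only."""
    gpu_count: int             # number of cards the plan uses (1 or 2)
    precision: str             # assumed labels: "low" or "high"
    prompt_cache: bool         # kept False, as the article requires
    context_checkpoints: bool  # kept False, as the article requires


def plan_launch(vram_gb: list[float], min_card_gb: float = 16.0) -> LaunchPlan:
    """Pick a plan from the detected per-card memory sizes (in GB).

    Assumption: cards below `min_card_gb` are ignored, and a second
    usable card switches to the higher precision level.
    """
    usable = [v for v in vram_gb if v >= min_card_gb]
    if not usable:
        raise ValueError("no graphics card with enough memory was found")

    gpu_count = min(len(usable), 2)  # the article covers one or two cards
    precision = "high" if gpu_count == 2 else "low"

    # Both caches stay disabled so later turns remain coherent.
    return LaunchPlan(
        gpu_count=gpu_count,
        precision=precision,
        prompt_cache=False,
        context_checkpoints=False,
    )


if __name__ == "__main__":
    print(plan_launch([16.0]))
    print(plan_launch([16.0, 16.0]))
```

A real launcher would query the driver for memory sizes and pass the result to the custom llama.cpp build; the sketch stops at the decision step because the article does not describe those details.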
"Made this for the A770 crowd... reports are good, its setup to work well with one card, and really come into its own with 2." - Source: Reddit