Dealignai unleashes Nemotron Cascade 2 30B A3B UNCENSORED JANG 2L

Nemotron Cascade 2 30B A3B UNCENSORED JANG 2L OR is a new large language model release designed for efficient local inference. The model uses a unique Cascade architecture with 30 billion total parameters but keeps only about 3 billion active during processing, making it significantly faster than traditional models of similar size.
Developed by Dealignai, this release targets users who need powerful AI capabilities without strict content filters. The 'UNCENSORED' designation means the model has fewer restrictions on generating responses, which appeals to researchers and developers working on diverse projects.
Model Size: 10GB & VRAM GPU: requirements vary
Key features and general benchmark performance
- Mixed quantization (8/6/2-bit) with 2.3-bit average for compact storage.
- Thinking mode toggle supported via ChatML format.
- HarmBench score of 99.7% indicating strong safety metrics.
- Speeds around 121 tokens per second on M3 Ultra hardware.
- Compatible with Mac systems having 16GB or more memory.
Users with consumer-grade hardware can run this model locally without needing enterprise equipment. The compact 10GB size makes it accessible for hobbyists and small agencies who want to experiment with uncensored models on standard computers.
Developer notes and limitations
The developer noted some unexpected results during testing. MMLU benchmark scores came in at 66.8%, which was lower than anticipated after the ablation process.
'Usually the MMLU scores go a little higher after ablation but I need to look into what went differently cuz the scores went down for both quants,'
the developer wrote in the project documentation.
Future plans include revisiting this model after completing work on Mistral 4. A larger 25-30GB equivalent version is also planned, which may offer different performance characteristics for users needing more capacity.
Download the Nemotron Cascade 2 30B A3B UNCENSORED JANG 2L OR model on Hugging Face.