JetBrains' Mellum2-12B-A2.5B-Thinking Thinks Out Loud On Your Machine

    
        By vramkickedin    
     | 
    
            June 15, 2026 at 1:44 pm        
    
     | 
    
        2 min read

JetBrains has released a new open-source AI model called Mellum2-12B-A2.5B-Thinking, a specialized reasoning tool that shows its work before giving you an answer. Unlike standard chatbots, this model wraps its logical steps inside special tags so you can follow its chain of thought, which helps with tough coding puzzles, multi-step planning, or complicated math problems. The model operates under an Apache 2.0 license, putting it fully in the hands of the community.

The team at JetBrains built this model to run efficiently on consumer-grade hardware, despite packing serious horsepower. They used a design where only 2.5 billion of the total 12 billion parameters are active at any given moment, dramatically reducing the memory and speed requirements. This approach makes advanced reasoning accessible for serious hobbyists and small agencies who want to run powerful AI locally without renting cloud instances.

Built for reasoning on local machines

Key Features

Explains reasoning inside think blocks.
Uses 2.5B active parameters per task.
Handles 128K tokens of context.
Runs efficiently on pro consumer GPUs.
Trains on RL with verifiable rewards.
Supports tool calling and function use.

Privacy-conscious professionals will find a natural fit here. You can handle sensitive debugging sessions without sending proprietary code to a third-party API, since everything stays on your own machine. Performance benchmarks show it trades blows with models nearly twice its active size, especially on coding tasks where LiveCodeBench results reach 75.1 percent. Adopters can get started quickly with vLLM or a simple Python client, making integration feel as straightforward as a few lines of code.

What went into the build

The model climbed a steep training ladder, digesting roughly 10.6 trillion tokens through a curriculum that shifted from broad web content toward highly technical code and math data. JetBrains applied supervised fine-tuning followed by reinforcement learning with verifiable rewards, a method that checks the steps rather than just grading the final answer. One compromise to keep in mind is that explicit reasoning traces create longer inference times, so for quick direct answers the Instruct variant remains the better choice.

"Mellum 2 is competitive with open-weight baselines in the 4B-14B range while running at the per-token compute of a 2.5B dense model." — Source: ArXiv Paper

Project Links