Nanbeige4.1-3B Bridges Reasoning and Agents in a Compact Model

Nanbeige4.1-3B is a compact 3-billion-parameter language model designed to handle reasoning, code generation, and agentic tasks in a single model. It performs multi-step problem solving while staying aligned with human preferences.

Built by the Nanbeige team as an evolution of their earlier Nanbeige4-3B-Thinking-2511, this release addresses a common limitation in small models. Most compact AI systems excel at either general reasoning or agent-style tasks, but rarely manage both well.

Model size: 3B parameters. GPU VRAM: requirements vary.
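
Since VRAM requirements vary, a back-of-the-envelope estimate can help with planning. The sketch below assumes a nominal count of exactly 3 billion parameters and counts the weights only. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name and the precision table are illustrative, not taken from the model card.

```python
# Rough weight-only memory estimate (assumption: nominal 3e9 parameters).
# Ignores KV cache, activations, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(3e9, dtype):.2f} GiB")
```

Under these assumptions, 16-bit weights come to a little under 6 GiB, and 4-bit quantization to roughly a quarter of that.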

What this model brings to the table

  • Solves complex multi-step reasoning problems, including competition-level math.
  • Handles over 500 consecutive tool calls for deep search and problem-solving workflows.
  • Outperforms larger models like Qwen3-30B-A3B on alignment benchmarks.
  • Generates code with high accuracy on LiveCodeBench-Pro benchmarks.
  • Supports native deep-search tasks previously only available in specialized agent models.

Developers working with limited GPU memory may find this model practical for building AI agents that need sustained tool interactions. Small teams building research assistants or coding helpers can run it locally without enterprise hardware, while privacy-conscious users get a capable reasoning model that stays on their own machine.
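
To make "sustained tool interactions" concrete, here is a minimal sketch of an agent loop with a hard cap on tool calls. Everything in it is hypothetical: the `policy` callable stands in for the model, the action tuple format is invented for illustration, and the default budget of 500 simply echoes the figure quoted above. It is not the Nanbeige tool-calling interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action format: ("call", tool_name, argument) or ("final", answer, "").
Action = Tuple[str, str, str]


def run_agent(
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_tool_calls: int = 500,
) -> Tuple[Optional[str], int]:
    """Run the policy until it returns a final answer or the tool budget runs out.

    Returns (answer, number_of_tool_calls); answer is None if the budget was exhausted.
    """
    history = [f"task: {task}"]
    calls = 0
    while calls < max_tool_calls:
        kind, name, arg = policy(history)
        if kind == "final":
            return name, calls
        tool = tools.get(name)
        # Feed unknown-tool errors back to the policy instead of crashing the loop.
        result = tool(arg) if tool else f"error: unknown tool {name!r}"
        history.append(f"{name}({arg}) -> {result}")
        calls += 1
    return None, calls


def toy_policy(history: List[str]) -> Action:
    """Stand-in for the model: calls 'echo' three times, then answers."""
    if len(history) <= 3:
        return ("call", "echo", str(len(history)))
    return ("final", history[-1], "")


if __name__ == "__main__":
    print(run_agent(toy_policy, {"echo": lambda s: s}, "demo"))
```

Capping the loop matters for long-running agents: without a budget, a policy that never emits a final answer would run indefinitely.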

How the developers approached Nanbeige4.1-3B

The team combined point-wise and pair-wise reward modeling during training to improve both reasoning quality and preference alignment. For code generation, they implemented complexity-aware rewards in reinforcement learning to optimize correctness and efficiency simultaneously. The researchers note that this model:

'fills a long-standing gap in the small-model ecosystem where models are typically optimized for either general reasoning or agentic scenarios, but rarely excel at both.'

They also acknowledge standard limitations: despite safety efforts during training, the model's probabilistic nature means it may occasionally produce unexpected or biased outputs.
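
The article does not say how the point-wise and pair-wise reward signals were combined, so the following is only a generic illustration of the two ideas, not the team's method. It assumes a squared-error point-wise term, a Bradley-Terry style pair-wise term, and a simple weighted sum. All function names and the `weight` parameter are made up for this sketch.

```python
import math


def pointwise_loss(predicted: float, target: float) -> float:
    """Squared error between a predicted scalar reward and a target score."""
    return (predicted - target) ** 2


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def combined_loss(
    reward_chosen: float,
    reward_rejected: float,
    target_chosen: float,
    weight: float = 0.5,  # assumed mixing weight, purely illustrative
) -> float:
    """Weighted sum of a point-wise term and a pair-wise term."""
    point = pointwise_loss(reward_chosen, target_chosen)
    pair = pairwise_loss(reward_chosen, reward_rejected)
    return weight * point + (1.0 - weight) * pair
```

The point-wise term anchors rewards to an absolute scale, while the pair-wise term only cares that the preferred response scores higher than the rejected one.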

Recommended inference settings include a temperature of 0.6 and top-p of 0.95 for balanced responses.
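
For readers unfamiliar with these two settings, the sketch below shows what temperature and top-p (nucleus) sampling do to a list of raw logits. It is a plain-Python illustration of the general technique with the recommended values as defaults, not the code path of any particular inference library. The function names are chosen for this example.

```python
import math
import random
from typing import List, Optional


def top_p_probs(
    logits: List[float], temperature: float = 0.6, top_p: float = 0.95
) -> List[float]:
    """Return renormalized probabilities after temperature scaling and nucleus filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample(
    logits: List[float],
    temperature: float = 0.6,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    probs = top_p_probs(logits, temperature, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a temperature of 0.6 the distribution is sharper than the raw softmax, and a top-p of 0.95 drops only the long tail of unlikely tokens, which is consistent with the "balanced" description.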

Download Nanbeige4.1-3B from Hugging Face or read the full research paper on arXiv.