Zhayr1 strikes with BitMamba-2-1B for consumer CPUs

Independent researcher Zhayr1 released BitMamba-2-1B on January 27, 2026, a 1-billion-parameter language model designed for high efficiency. Trained on 150 billion tokens using Google Cloud TPU v6e hardware, the model achieves 63.3% accuracy on the ARC-Easy benchmark.
Its hybrid architecture combines Mamba-2 with 1.58-bit quantization. As a result, the model runs at roughly 53 tokens per second on a standard Intel i3 processor while using only 621 MB of RAM.
Core Features & Technical Capabilities
- Hybrid architecture combining Mamba-2 and BitNet b1.58.
- Ternary weights for reduced memory usage.
- Training dataset of 150 billion tokens including FineWeb-Edu and Cosmopedia.
- Consumer CPU optimization achieving 53 tokens per second on Intel i3.
- Low RAM usage of 621 MB for the 1B model.
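
To put the ternary-weight bullet in perspective, the following back-of-the-envelope sketch estimates how much storage the weights alone would need at different precisions. The parameter count of exactly one billion and the packing schemes listed are assumptions for illustration; the source does not say how BitMamba-2-1B packs its weights, and these estimates are not meant to reproduce the 621 MB figure, which the article reports as RAM usage rather than weight storage.

```python
import math

# Assumption: a nominal count of exactly 1 billion parameters.
NUM_PARAMS = 1_000_000_000


def weight_storage_mb(num_params: int, bits_per_weight: float) -> float:
    """Return weight storage in megabytes (10**6 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e6


# Hypothetical storage formats, chosen only for comparison.
ESTIMATES = {
    "fp16": weight_storage_mb(NUM_PARAMS, 16),
    "int8": weight_storage_mb(NUM_PARAMS, 8),
    "ternary, packed at 2 bits": weight_storage_mb(NUM_PARAMS, 2),
    "ternary, ideal log2(3) bits": weight_storage_mb(NUM_PARAMS, math.log2(3)),
}

for name, megabytes in ESTIMATES.items():
    print(f"{name:>28}: {megabytes:8.1f} MB")
```

Under these assumptions, ternary weights need roughly an eighth of the storage of 16-bit weights, which is the main reason a model of this size can fit comfortably in consumer RAM.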
Benchmark Results & Performance Metrics
The model scales well from the smaller 255M baseline. Accuracy on HellaSwag improved by 10.4 percentage points to reach 45.59%, while ARC-Easy accuracy increased by 7.8 percentage points.
The WikiText-2 perplexity score dropped from 51.69 in the smaller model to 29.62 in this 1B version. Independent benchmarks confirm these results were achieved zero-shot.
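
For readers unfamiliar with perplexity, the sketch below shows how the metric is conventionally computed (the exponential of the average negative log-likelihood per token) and what the reported drop amounts to in relative terms. The two perplexity values come from the article; the function name and the toy log-probabilities are illustrative assumptions, not part of the project's evaluation code.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy example (made-up values): a model that gives every token probability 0.5.
toy_ppl = perplexity([math.log(0.5)] * 4)

# Figures reported in the article for WikiText-2.
PPL_255M = 51.69
PPL_1B = 29.62

# Relative reduction in perplexity when moving from 255M to 1B.
relative_reduction = (PPL_255M - PPL_1B) / PPL_255M

# Equivalent average loss per token, in nats, for each model.
nll_255m = math.log(PPL_255M)
nll_1b = math.log(PPL_1B)

print(f"toy perplexity: {toy_ppl:.2f}")
print(f"relative perplexity reduction: {relative_reduction:.1%}")
print(f"loss per token: {nll_255m:.3f} -> {nll_1b:.3f} nats")
```

Lower perplexity means the model assigns higher probability to the reference text, so the drop from 51.69 to 29.62 corresponds to a reduction of a little over 40% in relative terms.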
Expert Analysis & Developer Insights
The project targets two bottlenecks in current AI development: memory and computation. The research paper states,
'The scaling of Large Language Models (LLMs) is traditionally constrained by the quadratic complexity of Transformers and the memory bandwidth bottleneck associated with high-precision weights.'
To address this, the developer restricted each weight to one of three values. The Hugging Face page notes that the model uses
'1.58-bit (weights {-1, 0, 1}).'
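
The "1.58-bit" label follows from the three allowed values: a weight that can take one of three states carries log2(3), or about 1.58, bits of information. The sketch below illustrates the absmean quantization scheme described for BitNet b1.58, in which weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, 1}. Whether BitMamba-2-1B applies exactly this procedure is an assumption; the function names and sample weights are hypothetical.

```python
import math

# Information content of one three-valued weight.
BITS_PER_WEIGHT = math.log2(3)


def quantize_ternary(weights, eps=1e-8):
    """Absmean quantization sketch: scale by mean |w|, round, clip to [-1, 1]."""
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate the original weights from ternary values and one scale."""
    return [q * scale for q in quantized]


# Made-up weights for illustration only.
sample_weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.6]
ternary, scale = quantize_ternary(sample_weights)
approx = dequantize(ternary, scale)

print(f"bits per weight: {BITS_PER_WEIGHT:.3f}")
print(f"ternary weights: {ternary}")
print(f"shared scale:    {scale:.3f}")
```

Because every quantized weight is -1, 0, or 1, multiplying by a weight reduces to adding, subtracting, or skipping an activation, with a single shared scale applied afterwards.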
Learn more about BitMamba-2-1B
- Read the full project paper on Zenodo.
- Access the BitMamba-2-1B model on Hugging Face.
- Browse the source code on GitHub.