SupraLabs Stretches Supra-1.5-50M-Base-exp Context Window Fivefold With New Pretraining Mix

SupraLabs has published Supra-1.5-50M-Base-exp, a continued pretraining update for their 50-million-parameter language model that stretches the usable context window fivefold. The release takes the original Supra-50M architecture and uses RoPE scaling with full-weight training to jump from 1,024 tokens to 5,120 tokens. This experimental base model is designed specifically as a foundation for future supervised fine-tuning and reinforcement learning projects.
SupraLabs, the team that also released Supra-50m-Reasoning, continued training the model on a fresh 3-billion-token mix rather than starting from scratch. That mix blends 30% tool-calling data, 30% ChatML conversations, 25% factual text from articles and essays, and 15% math and logic problems. The project ships alongside an Instruct fine-tune and GGUF quantized versions, so the whole family can be run on consumer hardware right away.
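To make the mix concrete, the stated percentages can be turned into rough token budgets. This is a minimal sketch that assumes the shares are measured in tokens (the announcement does not say whether they are by tokens or by documents); the category names are illustrative labels, not identifiers from the release.

```python
# Token budget per data category, using the proportions reported for the mix.
# Assumption: percentages are shares of the 3-billion-token total.
TOTAL_TOKENS = 3_000_000_000

# Shares in whole percent; the keys are hypothetical labels for illustration.
MIX_PERCENT = {
    "tool_calling": 30,
    "chatml_conversations": 30,
    "factual_text": 25,
    "math_and_logic": 15,
}


def token_budget(total, mix_percent):
    """Split a total token count according to whole-percent shares."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding in the budgets.
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


budget = token_budget(TOTAL_TOKENS, MIX_PERCENT)
for name, tokens in budget.items():
    print(f"{name:>22}: {tokens / 1e6:,.0f}M tokens")
```

Under that assumption, tool calling and ChatML conversations each account for about 900 million tokens, factual text for 750 million, and math and logic for 450 million.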
Context expansion and data mix
- Context length expanded from 1,024 to 5,120 tokens.
- Continued pretraining on 3 billion packed tokens.
- Data mix includes tool calling and ChatML.
- Same 50M parameter architecture and tokenizer.
- GGUF quantizations range from 1-bit to 32-bit.
- Instruct version uses Alpaca chat format.
- Raw and normalized inference favor different task types (science and factual versus math and logic).
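The announcement says only that the jump from 1,024 to 5,120 tokens uses "RoPE scaling" and does not name the variant. As a rough illustration, the sketch below shows one common approach, linear position interpolation, in which positions are divided by the expansion factor so that the extended range maps back onto the angles seen during the original training. The head dimension, the base of 10000, and the function names are assumptions, not details from the release.

```python
import math

ORIGINAL_CONTEXT = 1024
EXTENDED_CONTEXT = 5120
SCALE = EXTENDED_CONTEXT / ORIGINAL_CONTEXT  # 5.0, the fivefold expansion


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one position.

    scale > 1 applies linear position interpolation (an assumed variant;
    the release does not specify which RoPE scaling method was used).
    """
    scaled_position = position / scale
    return [scaled_position * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]


def apply_rope(vector, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by the position's angles."""
    angles = rope_angles(position, len(vector), base, scale)
    rotated = []
    for i, theta in enumerate(angles):
        x, y = vector[2 * i], vector[2 * i + 1]
        rotated.append(x * math.cos(theta) - y * math.sin(theta))
        rotated.append(x * math.sin(theta) + y * math.cos(theta))
    return rotated


# With interpolation, position 5120 gets the angles position 1024 had before.
print(rope_angles(EXTENDED_CONTEXT, 8, scale=SCALE) == rope_angles(ORIGINAL_CONTEXT, 8))
```

Interpolation alone compresses the spacing between neighboring positions, which is one reason continued training on long sequences, as described here, typically accompanies this kind of scaling.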
Small-scale AI tinkerers and hobbyists can grab the GGUF files and run them locally through llama.cpp with a single command. Developers interested in supervised fine-tuning experiments now have a base model that has already seen tool-calling and conversational formats during pretraining, which may reduce the need for custom data preprocessing. The footprint is tiny: even the full 32-bit version weighs just 208 megabytes, making it viable for embedded projects, rapid prototyping, and resource-constrained environments.
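The reported file size is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes a nominal 50,000,000 parameters (the exact count is not given) and counts only raw weight storage, ignoring file metadata and the per-block scale factors that real GGUF quantization types add.

```python
# Rough weight-storage estimate: parameters * bits per weight / 8 bytes.
# Assumption: exactly 50M parameters; the article gives only the nominal size.
N_PARAMS = 50_000_000


def estimated_weight_megabytes(n_params, bits_per_weight):
    """Raw weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return n_params * bits_per_weight / 8 / 1_000_000


for bits in (32, 16, 8, 4, 1):
    size_mb = estimated_weight_megabytes(N_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size_mb:g} MB")
```

The 32-bit estimate of about 200 MB is in the same range as the 208 MB quoted for the full-precision file. Actual quantized files will usually be somewhat larger than the nominal bit width suggests.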
What the developers are noting
The team labels this release experimental and frames it as part of a larger initiative called Project Chimera. The Instruct variant scored a consistent 67.4 on BLiMP evaluations, with an unusual pattern emerging: science and factual questions performed better under raw inference, while math and logic tasks improved with normalized inference. Looking ahead, SupraLabs plans to release Supra-124M and Supra-350M families covering base, chat, reasoning, and coding capabilities, all under the Apache 2.0 license.
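The article does not define "raw" and "normalized" inference. One plausible reading, assumed here, is the distinction common in multiple-choice evaluation harnesses: raw scoring picks the answer with the highest total log-probability, while normalized scoring divides that total by the answer's length so longer answers are not penalized. The sketch below uses made-up log-probabilities purely to show how the two rules can disagree.

```python
def pick_raw(choices):
    """Pick the answer with the highest summed log-probability."""
    return max(choices, key=lambda c: c[1])[0]


def pick_normalized(choices):
    """Pick the answer with the highest log-probability per UTF-8 byte."""
    return max(choices, key=lambda c: c[1] / len(c[0].encode("utf-8")))[0]


# (answer text, total log-probability). Invented numbers for illustration,
# not outputs from any Supra model.
choices = [
    ("4", -2.0),
    ("the answer is four", -6.0),
]

print("raw:       ", pick_raw(choices))
print("normalized:", pick_normalized(choices))
```

Here raw scoring prefers the short answer, while length normalization prefers the longer one, so which rule looks better can depend on how answer lengths vary within a task.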
"The biggest upgrade is context. Supra-1.5 expands from 1,024 to 5,120 tokens using RoPE scaling, with continued pretraining on a 3B token mix of tool calling data, ChatML conversations, factual text, and math." — Source: Reddit