Trending Model:#1Qwythos-9B-Claude-Mythos-5-1M-GGUFempero-ai⬇1366kTrending Model:#2GLM-5.2zai-org⬇191kTrending Model:#3Unlimited-OCRbaidu⬇885kTrending Model:#4Ornith-1.0-35B-GGUFdeepreinforce-ai⬇323kTrending Model:#5DeepSeek-V4-Pro-DSparkdeepseek-ai⬇9kTrending Model:#6gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUFyuxinlu1⬇329kTrending Model:#7Ornith-1.0-9Bdeepreinforce-ai⬇64kTrending Model:#8Ornith-1.0-9B-GGUFdeepreinforce-ai⬇288kTrending Model:#9Qwen3.6-27B-NVFP4nvidia⬇94kTrending Model:#10Agents-A1InternScience⬇4kTrending Model:#1Qwythos-9B-Claude-Mythos-5-1M-GGUFempero-ai⬇1366kTrending Model:#2GLM-5.2zai-org⬇191kTrending Model:#3Unlimited-OCRbaidu⬇885kTrending Model:#4Ornith-1.0-35B-GGUFdeepreinforce-ai⬇323kTrending Model:#5DeepSeek-V4-Pro-DSparkdeepseek-ai⬇9kTrending Model:#6gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUFyuxinlu1⬇329kTrending Model:#7Ornith-1.0-9Bdeepreinforce-ai⬇64kTrending Model:#8Ornith-1.0-9B-GGUFdeepreinforce-ai⬇288kTrending Model:#9Qwen3.6-27B-NVFP4nvidia⬇94kTrending Model:#10Agents-A1InternScience⬇4k

SupraLabs Stretches Supra-1.5-50M-Base-exp Context Window Fivefold With New Pretraining Mix

Small processor chip morphing into an expanding window frame visible light beams.

SupraLabs has published Supra-1.5-50M-Base-exp, a continued pretraining update for their 50-million-parameter language model that stretches the usable context window fivefold. The release takes the original Supra-50M architecture and uses RoPE scaling with full-weight training to jump from 1,024 tokens to 5,120 tokens. This experimental base model is designed specifically as a foundation for future supervised fine-tuning and reinforcement learning projects.

The team SupraLabs who also released Supra-50m-Reasoning, continued training the model on a fresh 3-billion-token mix rather than starting from scratch. That mix deliberately blends 30% tool calling data, 30% ChatML conversations, 25% factual text from articles and essays, and 15% math and logic problems. The project ships alongside an Instruct fine-tune and GGUF quantized versions, making the entire family immediately usable on consumer hardware.

Context expansion and data mix

Key changes from the original
  • Context length expanded from 1,024 to 5,120 tokens.
  • Continued pretraining on 3 billion packed tokens.
  • Data mix includes tool calling and ChatML.
  • Same 50M parameter architecture and tokenizer.
  • GGUF quantizations range from 1-bit to 32-bit.
  • Instruct version uses Alpaca chat format.
  • Raw and normalized inference show task-based differences.

Small-scale AI tinkerers and hobbyists can grab the GGUF files and run them locally through llama.cpp with a single command. Developers interested in supervised fine-tuning experiments now have a base model that understands tool calling and conversational formats natively, reducing the need for custom data preprocessing. The tiny footprint means even the full 32-bit version weighs just 208 megabytes, making it viable for embedded projects, rapid prototyping, and resource-constrained environments.

What the developers are noting

The team labels this release experimental and frames it as part of a larger initiative called Project Chimera. The Instruct variant scored a consistent 67.4 on BLiMP evaluations, with an unusual pattern emerging: science and factual questions performed better under raw inference, while math and logic tasks improved with normalized inference. Looking ahead, SupraLabs plans to release Supra-124M and Supra-350M families covering base, chat, reasoning, and coding capabilities, all under the Apache 2.0 license.

"The biggest upgrade is context. Supra-1.5 expands from 1,024 to 5,120 tokens using RoPE scaling, with continued pretraining on a 3B token mix of tool calling data, ChatML conversations, factual text, and math." — Source: Reddit