Z.ai Launches GLM 4.7 Flash

GLM 4.7 Flash: Z.ai's Next-Generation AI Model Delivers Significant Performance Improvements
Z.ai has unveiled the lightweight GLM 4.7 Flash, a very powerful 30B-A3B MoE (Mixture of Experts) model that demonstrates substantial advancements across multiple benchmark categories. The new release showcases significant performance gains, particularly in coding, reasoning, and agent-based tasks, with large improvements over its predecessor GLM 4.6.
Key Performance Benchmarks
GLM 4.7 Flash delivers impressive results across critical evaluation metrics:
- Core Coding: 73.8% (+5.8%) on SWE-bench
- Multilingual Coding: 66.7% (+12.9%) on SWE-bench Multilingual
- Terminal-based Tasks: 41% (+16.5%) improvement on Terminal Bench 2.0
- Complex Reasoning: 42.8% (+12.4%) gain on the HLE benchmark
The model introduces advanced features like Interleaved Thinking, Preserved Thinking, and Turn-level Thinking, which enhance its ability to handle complex, multi-step tasks more effectively.
Deployment and Accessibility
GLM 4.7 Flash offers multiple deployment options for developers and researchers:
- Available FREE on HuggingFace
- Supports local inference via vLLM and SGLang frameworks
- Accessible through Z.ai API platform
- Worldwide access via OpenRouter
Developers can easily integrate the model using provided code snippets for transformers, vLLM, and SGLang, with comprehensive documentation available on the Z.ai platform.
Learn More About GLM 4.7 Flash
Explore more details about GLM 4.7 Flash through these resources:
- Z.ai Project Page: Here
- HuggingFace Model: GLM-4.7-Flash
- Unsloth's GGUFs: GGUFs Here