Z.ai Launches GLM 4.7 Flash

Graphical wallpaper of the Z.ai logo for GLM 4.7 Flash

GLM 4.7 Flash: Z.ai's Next-Generation AI Model Delivers Significant Performance Improvements

Z.ai has unveiled the lightweight GLM 4.7 Flash, a very powerful 30B-A3B MoE (Mixture of Experts) model that demonstrates substantial advancements across multiple benchmark categories. The new release showcases significant performance gains, particularly in coding, reasoning, and agent-based tasks, with large improvements over its predecessor GLM 4.6.

Key Performance Benchmarks

GLM 4.7 Flash delivers impressive results across critical evaluation metrics:

  • Core Coding: 73.8% (+5.8%) on SWE-bench
  • Multilingual Coding: 66.7% (+12.9%) on SWE-bench Multilingual
  • Terminal-based Tasks: 41% (+16.5%) improvement on Terminal Bench 2.0
  • Complex Reasoning: 42.8% (+12.4%) gain on the HLE benchmark

The model introduces advanced features like Interleaved Thinking, Preserved Thinking, and Turn-level Thinking, which enhance its ability to handle complex, multi-step tasks more effectively.

Deployment and Accessibility

GLM 4.7 Flash offers multiple deployment options for developers and researchers:

  • Available FREE on HuggingFace
  • Supports local inference via vLLM and SGLang frameworks
  • Accessible through Z.ai API platform
  • Worldwide access via OpenRouter

Developers can easily integrate the model using provided code snippets for transformers, vLLM, and SGLang, with comprehensive documentation available on the Z.ai platform.

Learn More About GLM 4.7 Flash

Explore more details about GLM 4.7 Flash through these resources: