Z.ai Launches GLM 4.7 Flash

    
        By vramkickedin    
     | 
    
            January 27, 2026 at 3:27 pm        
    
     | 
    
        2 min read

GLM 4.7 Flash: Z.ai's Next-Generation AI Model Delivers Significant Performance Improvements

Z.ai has unveiled the lightweight GLM 4.7 Flash, a very powerful 30B-A3B MoE (Mixture of Experts) model that demonstrates substantial advancements across multiple benchmark categories. The new release showcases significant performance gains, particularly in coding, reasoning, and agent-based tasks, with large improvements over its predecessor GLM 4.6.

Key Performance Benchmarks

GLM 4.7 Flash delivers impressive results across critical evaluation metrics:

Core Coding: 73.8% (+5.8%) on SWE-bench
Multilingual Coding: 66.7% (+12.9%) on SWE-bench Multilingual
Terminal-based Tasks: 41% (+16.5%) improvement on Terminal Bench 2.0
Complex Reasoning: 42.8% (+12.4%) gain on the HLE benchmark

The model introduces advanced features like Interleaved Thinking, Preserved Thinking, and Turn-level Thinking, which enhance its ability to handle complex, multi-step tasks more effectively.

Deployment and Accessibility

GLM 4.7 Flash offers multiple deployment options for developers and researchers:

Available FREE on HuggingFace
Supports local inference via vLLM and SGLang frameworks
Accessible through Z.ai API platform
Worldwide access via OpenRouter

Developers can easily integrate the model using provided code snippets for transformers, vLLM, and SGLang, with comprehensive documentation available on the Z.ai platform.