Meituan Debuts LongCat-Flash-Prover for Formal Math Proofs

LongCat-Flash-Prover is a 560-billion-parameter open-source model designed to handle formal mathematical reasoning. It uses a Mixture-of-Experts architecture to perform tasks like writing formal proofs, translating informal math problems into formal language, and generating proof sketches in Lean4.
Developed by Meituan, this tool addresses the challenge of automated theorem proving by breaking the process into three distinct capabilities: auto-formalization, sketching, and proving. The model interacts directly with Lean4 tools to verify its work, which helps ensure accuracy throughout the reasoning process.
Model Size: 560B parameters & VRAM GPU: requirements vary
Core capabilities for formal reasoning
- Auto-formalization converts natural language math problems into verified formal statements.
- Agentic sketching generates lemma-style outlines to structure complex proofs.
- Agentic proving creates complete proofs or helper lemmas for target theorems.
- Tool-integrated reasoning allows direct interaction with Lean4 for real-time verification.
- Hierarchical Importance Sampling Policy Optimization stabilizes training on long tasks.
Researchers and students working in formal mathematics can use this model to automate tedious proof-writing tasks. Small teams exploring automated reasoning may also find it helpful for testing hypotheses without building custom infrastructure from scratch.
Development approach and training methods
The team created a Hybrid-Experts Iteration Framework to generate high-quality training data. This approach uses specialized expert models for different tasks, each refined through trial and verification cycles similar to how humans learn from mistakes.
The developers note that the model is
'custom-optimized for mathematical and formal theorem proofs'
and recommend against using it as a general-purpose conversational AI. They have also implemented safeguards against reward hacking, where models might try to game the system rather than produce valid proofs. The model retains general reasoning abilities while specializing in formal tasks.
Get LongCat-Flash-Prover on Hugging Face. Read the full details in the research paper.