
Beamivalice Debuts PonyExl3 To Run Big AI Models On Macs


PonyExl3 is a new tool for running compressed large language models on Apple Silicon computers. It takes models stored in EXL3, a high-quality compressed model format, and decodes them natively on Mac chips. This lets large AI models run efficiently without a massive discrete graphics card.

Developer beamivalice created the project after trying turboderp's exllamav3 on a machine with an Nvidia RTX 4090. They wanted to see whether the same format could run on their Apple Silicon machines. The result is a working port that brings this efficient model format to Mac hardware.

Project features and capabilities

Key Features
  • Exact EXL3 decode path with fused Metal
  • Full model loader for various architectures
  • One-command Hugging Face (HF) to EXL3 converter
  • Verify-gated speculative decoding support

This software is built for anyone running large language models on Mac computers. Users can fit larger models into their system memory while keeping fast generation speeds. It provides a way to run advanced AI locally without relying on external servers.
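The claim about fitting larger models into memory comes down to simple arithmetic: weight size is roughly the parameter count times the bits stored per weight. The sketch below is an illustrative estimate only, not part of PonyExl3. The function names, the 25% headroom figure, and the example model size and bit rates are all assumptions, and the estimate ignores the KV cache and other runtime overhead.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores the KV cache, activations, and per-tensor metadata, so the
    real footprint will be somewhat larger.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_memory(
    num_params: float,
    bits_per_weight: float,
    memory_gib: float,
    headroom_fraction: float = 0.25,  # assumed reserve for the OS and cache
) -> bool:
    """Return True if the weights fit in memory after reserving headroom."""
    budget_gib = memory_gib * (1 - headroom_fraction)
    return weight_footprint_gib(num_params, bits_per_weight) <= budget_gib


if __name__ == "__main__":
    # Hypothetical 35-billion-parameter model on a 32 GiB machine.
    for bits in (16.0, 4.0):
        size = weight_footprint_gib(35e9, bits)
        ok = fits_in_memory(35e9, bits, memory_gib=32)
        print(f"{bits:>4} bits/weight: {size:6.1f} GiB, fits: {ok}")
```

Under these assumptions, the same model that overflows a 32 GiB machine at 16 bits per weight fits comfortably at 4 bits per weight, which is the kind of saving a compressed format is meant to deliver.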

Developer notes and status

The project is currently in beta and requires macOS on Apple Silicon along with Python 3.14. According to the developer's testing, it can even surpass an RTX 4090 in decode speed for certain large models. The tool also includes a one-command process for converting standard Hugging Face model files into the EXL3 format.
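The stated requirements (macOS, Apple Silicon, Python 3.14) can be checked before installing anything. The following is a minimal sketch using only the Python standard library. It is not taken from PonyExl3, and the function names are hypothetical. It treats 3.14 as a minimum version, which is an assumption, since the project may require exactly that release.

```python
import platform
import sys


def check_environment(system: str, machine: str, version: tuple,
                      min_python: tuple = (3, 14)) -> list[str]:
    """Return a list of unmet requirements (empty if all are met)."""
    problems = []
    if system != "Darwin":  # platform.system() reports "Darwin" on macOS
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":  # Apple Silicon reports "arm64"
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if tuple(version[:2]) < min_python:
        found = ".".join(str(part) for part in version[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    return problems


def current_problems() -> list[str]:
    """Run the check against the interpreter executing this script."""
    return check_environment(
        platform.system(), platform.machine(), sys.version_info
    )


if __name__ == "__main__":
    issues = current_problems()
    if issues:
        print("Environment does not meet the stated requirements:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Environment meets the stated requirements.")
```

Note that a Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon hardware, so a failed architecture check can point to the interpreter build rather than the machine.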

"I was playing with turboderp's exllamav3 in my RTX 4090 machine and I wonder why can't I run this on my M5/M1 Max - so I built one."
Source: Reddit