Beamivalice Debuts PonyExl3 To Run Big AI Models On Macs

    
By vramkickedin | June 30, 2026 at 3:16 pm | 2 min read

PonyExl3 is a new tool for running compressed language models on Apple Silicon computers. It ports Exl3, a high-quality compressed model format, so that it runs natively on Mac chips. This lets large AI models run efficiently without a massive graphics card.

Developer beamivalice created this project after testing a similar tool on an Nvidia graphics card. They wanted to see if they could get the same performance out of their Apple Silicon laptops. The result is a working port that brings this efficient model format to Mac hardware.

Project features and capabilities

Key Features

Exact Exl3 decode path with fused Metal
Full model loader for various architectures
One-command HF-to-EXL3 converter
Verify-gated speculative decoding support
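
For readers unfamiliar with the last item, here is a toy sketch of speculative decoding in general, not PonyExl3's own code. It assumes "verify gated" means a drafted token is kept only when the main model agrees with it, which the project summary does not spell out. The function names and both toy "models" are hypothetical.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Draft up to k tokens cheaply, then keep only what the target confirms."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1) The small draft model proposes a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2) The target model verifies each guess in order. A real engine
        #    would score all k positions in one batched pass; this toy
        #    version calls the target once per position for clarity.
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)   # always emit the target's own token
            if guess != expected:     # first mismatch discards the rest
                break
    return tokens


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] * 3 + 1) % 17


def toy_draft(tokens: List[int]) -> int:
    if tokens[-1] % 5 == 0:           # deliberately wrong on some inputs
        return 0
    return toy_target(tokens)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [2], 12))
```

Because every emitted token comes from the target model, the output matches ordinary greedy decoding. Any speedup comes from checking several drafted tokens per target pass.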

This software is built for anyone running large language models on Mac computers. Users can fit larger models into their system memory while keeping fast generation speeds. It provides a way to run advanced AI locally without relying on external servers.
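
To see why a compressed format matters for memory, a rough back-of-the-envelope calculation helps. The sketch below estimates weight storage only, ignoring the KV cache and runtime overhead. The 70-billion-parameter model and the bit widths are illustrative assumptions, not figures from the project.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes.

    Counts weights only; context cache and runtime overhead are ignored.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 70e9  # hypothetical 70B-parameter model
    for bits in (16, 8, 4, 3):  # illustrative bit widths
        gb = weight_footprint_gb(n_params, bits)
        print(f"{bits:>2} bits/weight: about {gb:.1f} GB")
```

Under these assumptions the same model drops from about 140 GB at 16 bits per weight to about 35 GB at 4 bits, which is the difference between not fitting and fitting in a high-memory laptop.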

Developer notes and status

The project is currently in beta and requires macOS on Apple Silicon along with Python 3.14. Testing shows it can surpass an RTX 4090 in decode speed for certain large models. The tool also includes a one-command process for converting standard model files into the Exl3 format.
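
Readers who want to check whether their machine meets the stated requirements can do so with Python's standard library alone. This is a minimal sketch, not part of PonyExl3. The function name is hypothetical, and it assumes "Python version 3.14" means 3.14 or newer.

```python
import platform
import sys
from typing import List, Optional, Tuple


def check_requirements(system: Optional[str] = None,
                       machine: Optional[str] = None,
                       version: Optional[Tuple[int, ...]] = None) -> List[str]:
    """Return a list of unmet requirements (an empty list means all good).

    Arguments default to the running interpreter's values; they can be
    overridden to test other configurations.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    version = version if version is not None else tuple(sys.version_info[:3])

    problems: List[str] = []
    if system != "Darwin":             # platform.system() is "Darwin" on macOS
        problems.append("macOS is required")
    if machine != "arm64":             # native Apple Silicon reports "arm64"
        problems.append("Apple Silicon (arm64) is required")
    if tuple(version[:2]) < (3, 14):   # assumption: 3.14 or newer is acceptable
        problems.append("Python 3.14 is required")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```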