Tokenspeed Streams Fake Tokens To Let You Feel LLM Speed

    
By vramkickedin | May 23, 2026 at 10:06 pm | 2 min read

Tokens-per-second benchmarks are easy to find, but it is much harder to know what "47 tok/s" actually feels like while you work. A new open-source tool called Tokenspeed addresses this by streaming fake tokens at any speed you choose, so you can experience different throughput rates firsthand.

Created by developer MikeVeerman, this Python script turns abstract LLM performance numbers into something you can finally see and judge for yourself. Veerman built the tool with zero dependencies beyond Python 3 and a real terminal emulator. The entire project is minimal, straightforward, and ready to run in seconds.

Three modes for different use cases

What it can simulate

Code mode with syntax-highlighted pseudo-code.
Text mode streaming Wikipedia-style prose.
Think mode mimicking reasoning model output.
Nine speed presets from 5 to 800 tok/s.
Instant speed nudging with plus/minus keys.
Approximated BPE tokenization for realistic chunking.
Simple controls including pause and quit.
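The core idea, emitting token-sized chunks at a fixed rate, can be sketched in a few lines of Python. This is an illustration only, not Tokenspeed's actual code: the function names chunk_text and stream_tokens, the 4-character chunk cap, and the fixed per-token delay are all assumptions made for this example.

```python
import re
import sys
import time
from typing import Callable, Iterable, List


def chunk_text(text: str, max_len: int = 4) -> List[str]:
    """Split text into short pieces as a crude stand-in for BPE tokens.

    Assumption: real tokenizers use learned merges; this sketch only caps
    the piece length so output arrives in roughly token-sized chunks.
    """
    pieces: List[str] = []
    # Each match is one word together with the whitespace in front of it.
    # Trailing whitespace at the very end of the text is not matched.
    for word in re.findall(r"\s*\S+", text):
        for start in range(0, len(word), max_len):
            pieces.append(word[start:start + max_len])
    return pieces


def _write_stdout(piece: str) -> None:
    """Write one piece and flush so it shows up immediately."""
    sys.stdout.write(piece)
    sys.stdout.flush()


def stream_tokens(
    tokens: Iterable[str],
    tok_per_s: float,
    write: Callable[[str], object] = _write_stdout,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Emit tokens one at a time, pausing 1 / tok_per_s seconds after each.

    Returns the number of tokens emitted. The write and sleep hooks are
    injectable so the pacing can be checked without real delays.
    """
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    delay = 1.0 / tok_per_s
    count = 0
    for tok in tokens:
        write(tok)
        sleep(delay)
        count += 1
    return count
```

Calling stream_tokens(chunk_text(sample), 10) and then repeating the call with 60 on the same sample text gives a rough feel for the difference between those two rates. A fixed sleep ignores the time spent writing, so the real rate drifts slightly below the target at very high speeds; a more careful version would schedule against a monotonic clock.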

Privacy-conscious professionals and serious hobbyists running local models will find immediate value here. Someone evaluating whether 10 tok/s feels usable or whether upgrading hardware for 60 tok/s matters can now directly compare these experiences. Small agencies choosing between API providers can also demonstrate streaming quality differences to clients without needing actual model access.

Why perception matters more than numbers

The tool intentionally highlights something benchmarks hide: code and prose feel dramatically different at identical token speeds. Code is more token-dense than English, so 30 tok/s of Python delivers far less visible content than 30 tok/s of chat responses. The developer notes that English averages roughly 1.3 tokens per word, meaning 30 tok/s produces about 23 words per second, but the same rate in code mode looks noticeably slower because identifiers frequently split across multiple tokens.
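The arithmetic behind that comparison is simple enough to check directly. The sketch below uses the 1.3 tokens-per-word figure quoted above for English; the value for code is a hypothetical placeholder chosen only to show the direction of the effect, since the source gives no number for it, and both function names are made up for this example.

```python
ENGLISH_TOKENS_PER_WORD = 1.3  # figure quoted in the article
CODE_TOKENS_PER_WORD = 2.5     # hypothetical value, for illustration only


def words_per_second(tok_per_s: float, tokens_per_word: float) -> float:
    """Convert a token rate into an approximate visible word rate."""
    return tok_per_s / tokens_per_word


def seconds_for_words(
    words: int, tok_per_s: float, tokens_per_word: float
) -> float:
    """Estimate how long a response of the given word count takes to stream."""
    return words * tokens_per_word / tok_per_s


# Speeds mentioned in the article; print both content types side by side.
for rate in (10, 30, 47, 60):
    english = words_per_second(rate, ENGLISH_TOKENS_PER_WORD)
    code = words_per_second(rate, CODE_TOKENS_PER_WORD)
    print(f"{rate:>3} tok/s: ~{english:.0f} words/s prose, ~{code:.0f} words/s code")
```

Under these assumptions, the same token rate yields fewer visible words per second for code than for prose, which is the perceptual gap the article describes.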

The repository mentions no future plans or known limitations, though it does state that the tokenization is approximate and does not match any specific vendor's encoder.

"The benchmark number is honest; the perceptual effect just varies a lot by content type, which is exactly the gap this tool exists to expose." — Source: GitHub

Project Links