UniInfer Checks Hardware Fit Before You Download AI Models

60s art style of a large eye gazing into an open computer case

UniInfer is an open-source inference runtime that checks if an AI model fits your hardware before you download it. It calculates VRAM requirements and overhead to prevent out-of-memory errors before they happen.

Created by a solo developer, the tool aims to stop users from wasting bandwidth on models that their computers cannot run. It supports NVIDIA, AMD, Vulkan, and CPU hardware, making it versatile for many different setups.

Hardware management and compatibility

  • Calculates VRAM usage including model size, cache, and overhead.
  • Shows quantization options that fit your specific GPU.
  • Downloads the correct format automatically, such as GGUF or ONNX.
  • Features a built-in web dashboard with live metrics and a chat interface.
  • Offers an automatic hardware fallback if the primary device fails.

Pro consumers and privacy-conscious users can run models locally with confidence using this tool. The software acts as a drop-in replacement for OpenAI clients, allowing for easy integration into existing workflows without manual configuration.

Early development insights

The developer built this project to address a specific frustration with existing tools. Unlike Ollama, which may download a model before checking if it fits, UniInfer performs a validation check first. This prevents the annoyance of waiting for a download only to have the program crash immediately.

'I got tired of downloading 8GB models only to get a cryptic OOM crash,'

the developer said. The project is currently in an early stage and welcomes community input. Future updates will likely focus on the features that users find most valuable during this initial release period.

This tool provides a practical way to manage local AI models without technical guesswork. It streamlines the process of finding and running the right model for your specific machine.

You can try uniinfer on GitHub.