Shisa-Ai Hacks AMD GPUs With hipEngine To Run Massive AI Locally

hipEngine is a new local inference engine release for AMD RDNA3 GPUs that runs large language models without PyTorch. The v0.2.1 alpha from shisa-ai delivers fast, ROCm-native performance for Qwen 3.6 models using custom-tuned HIP kernels. It currently supports packed PARO and GGUF formats, and includes an OpenAI-compatible server.
shisa-ai built hipEngine from scratch, not by porting CUDA code, to unlock the full potential of consumer AMD hardware. The engine targets users who want high-speed, privacy-focused local AI without the memory bloat of PyTorch. By switching to an INT8 key‑value cache, hipEngine can squeeze a 256K context window for the Qwen 3.6 MoE model into under 24 GB of VRAM.
A torch-free runtime tuned for RDNA3
- No PyTorch needed for supported GPUs.
- Custom HIP kernels optimized for gfx1100/1151.
- Qwen 3.6 PARO and GGUF model support.
- INT8 KV cache to expand context length.
- OpenAI-compatible server with streaming replies.
- Four-axis plugin system for clean extension.
- CPU reference kernels for numerical checks.
Owners of AMD Radeon RX 7900 XTX, Radeon Pro W7900, or Strix Halo laptops can run large models fast and entirely offline. Small teams and agencies benefit from the OpenAI-compatible API that keeps data private and drops into existing toolchains. Users who push context limits will appreciate how a single 24 GB card can now hold a full 256K-token conversation.
Developer notes and alpha status
As v0.2.1 alpha, hipEngine only supports specific Qwen 3.5 and 3.6 models, and unsupported combinations fail immediately rather than falling back to a slow path. Loading GGUF models is slower (roughly 60 seconds) because of an on-load repacking step; the team is considering on-disk caching to improve startup. Future releases aim to add a CUDA SM86 backend and close the speed gap between the GGUF and packed PARO formats.
"It is built and distributed for anyone who has an AMD card that hasn't been living up to its compute potential." — Source: GitHub