Tencent HY-Embodied-0.5 Grants Robots Spatial Intelligence

Tencent has released HY-Embodied-0.5, an open-source toolkit that improves spatial awareness and planning for physical robots. The system combines visual recognition with step-by-step reasoning so machines can safely interact with real-world objects.
Engineers at Tencent Robotics X created this framework to connect standard vision processors with actual hardware control. Operators can now run the software locally to avoid external servers and paid cloud services.
Model size: 8 GB. GPU VRAM required: 16 GB.
HY-Embodied-0.5 technical capabilities
- Separates image and text processing to lower power consumption.
- Recognizes detailed 3D layouts from millions of training samples.
- Shrinks advanced reasoning from a larger model into a faster compact version.
- Connects directly to robot pipelines to coordinate physical movements.
- Handles multiple prompts at once for faster data analysis.
Teams testing automation scripts or building independent navigation systems will benefit from this lean architecture. Independent developers can run hardware simulations locally while keeping response times steady during active testing phases.
Installation and hardware notes
The software requires Linux, Python 3.12, and CUDA 12.6.
"HY-Embodied serves as a robust 'brain' for Vision-Language-Action (VLA) pipelines," the creators noted on the official Hugging Face page.
Users must install a specific Transformers fork until updates reach the main branch. Developers also suggest disabling complex reasoning modes during initial checks to maintain steady processing speeds.
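Before installing anything, it can help to confirm that a machine meets the stated requirements (Linux, Python 3.12, CUDA 12.6, 16 GB of GPU memory). The following is a minimal sketch, not part of the official release: the function names `check_environment` and `probe_local` are hypothetical, and it assumes PyTorch is the installed backend and that rounding the reported GPU memory to the nearest gigabyte is an acceptable way to compare against the 16 GB figure.

```python
import platform
import sys

# Requirements as stated in the article; the checking logic is illustrative only.
REQUIRED_PYTHON = (3, 12)
REQUIRED_CUDA = ["12", "6"]
REQUIRED_VRAM_GB = 16


def check_environment(system, python_version, cuda_version, vram_gb):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"expected Python 3.12, found {found}")
    if cuda_version is None:
        problems.append("no CUDA runtime detected")
    elif cuda_version.split(".")[:2] != REQUIRED_CUDA:
        problems.append(f"expected CUDA 12.6, found {cuda_version}")
    if vram_gb is None or vram_gb < REQUIRED_VRAM_GB:
        problems.append(f"expected at least {REQUIRED_VRAM_GB} GB of VRAM, found {vram_gb}")
    return problems


def probe_local():
    """Collect values from the current machine (assumes PyTorch, if installed)."""
    try:
        import torch
    except ImportError:
        return platform.system(), sys.version_info, None, None
    cuda_version = torch.version.cuda  # None on CPU-only builds
    vram_gb = None
    if torch.cuda.is_available():
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        # Cards sold as "16 GB" report slightly less, so round to the nearest GB.
        vram_gb = round(total_bytes / 1024**3)
    return platform.system(), sys.version_info, cuda_version, vram_gb


if __name__ == "__main__":
    issues = check_environment(*probe_local())
    print("Environment looks compatible." if not issues else "\n".join(issues))
```

Keeping the comparison logic separate from the hardware probing makes the check easy to run against hypothetical configurations, for example when sizing a machine before purchase.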
Review the research details in the technical paper or grab the official weights from Hugging Face.