Meta Unlocks sapiens2 For Private Human Figure Mapping


Meta researchers have released sapiens2, a vision model collection designed to track human figures in standard photos. The software identifies precise joint positions, maps anatomical sections, and estimates surface angles from flat pictures.

Trained on one billion images, the project targets accurate analysis of human figures in digital images. Engineers designed it to run locally, which keeps private images on the user's own machine and avoids external server fees.

Model size: 20.5GB. GPU VRAM: requirements vary.

Precise human mapping across multiple tasks

  • Tracks 308 distinct body landmarks including facial features and finger joints.
  • Generates clean separation masks for individual limbs.
  • Calculates directional angles needed for basic three-dimensional lighting effects.
  • Supports both standard and ultra-high resolution source files.
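To make the lighting point concrete, here is a minimal sketch of how per-pixel surface normals can drive a basic relighting effect. It uses plain Lambertian (cosine) shading, a standard graphics technique and not something the sapiens2 release is confirmed to ship. The function names and the tiny normal map are hypothetical, and the normals are assumed to arrive as (x, y, z) triples.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / length for c in v)


def lambert_shade(normal_map, light_dir, albedo=1.0):
    """Return per-pixel brightness for a grid of surface normals.

    normal_map: list of rows, each a list of (x, y, z) normals.
    light_dir:  vector pointing from the surface toward the light.
    """
    light = normalize(light_dir)
    shaded = []
    for row in normal_map:
        out_row = []
        for n in row:
            nx, ny, nz = normalize(n)
            # Lambert's cosine law: brightness follows the dot product n . l,
            # clamped at zero for surfaces that face away from the light.
            intensity = max(0.0, nx * light[0] + ny * light[1] + nz * light[2])
            out_row.append(albedo * intensity)
        shaded.append(out_row)
    return shaded


# Hypothetical 1x3 normal map: facing the camera, tilted 45 degrees, facing away.
normals = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]]
print(lambert_shade(normals, light_dir=(0.0, 0.0, 1.0)))
```

The pixel facing the light comes out fully lit, the tilted one is dimmer, and the one facing away goes black. A real pipeline would run the same dot product over an entire predicted normal map, usually as a vectorized array operation.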

Independent studios can run these tools on personal workstations without uploading client photos to external servers. Creative teams also gain reliable tracking that previously required costly cloud subscriptions.

Updated training strategies and setup notes

The creators combined image reconstruction techniques with contrastive learning to capture broad shapes and fine details simultaneously. This approach improved joint tracking and reduced lighting errors compared to earlier software.
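For readers curious what pairing the two objectives can look like, the sketch below adds a masked reconstruction error to a contrastive (InfoNCE) term. It is a generic illustration, not the sapiens2 training code: the specific losses, the `weight` parameter, the temperature, and all function names are assumptions chosen for clarity.

```python
import math


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over hidden patches only (mask[i] is True where hidden)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no patches")
    return sum(errors) / len(errors)


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE: embedding i in view_a should match embedding i in view_b."""
    total = 0.0
    for i, a in enumerate(view_a):
        logits = [cosine(a, b) / temperature for b in view_b]
        # Numerically stable log-sum-exp over all candidates in view_b.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


def combined_loss(pred, target, mask, view_a, view_b, weight=1.0):
    """Reconstruction term plus a weighted contrastive term (weight is an assumption)."""
    reconstruction = masked_reconstruction_loss(pred, target, mask)
    contrastive = info_nce_loss(view_a, view_b)
    return reconstruction + weight * contrastive


# Toy inputs: three patch values with the middle one hidden, two image embeddings per view.
loss = combined_loss(
    pred=[1.0, 2.0, 3.0], target=[1.0, 0.0, 0.0], mask=[False, True, False],
    view_a=[(1.0, 0.0), (0.0, 1.0)], view_b=[(1.0, 0.0), (0.0, 1.0)],
)
print(loss)
```

The reconstruction term rewards getting hidden pixel detail right, while the contrastive term rewards embeddings that agree across two views of the same image. That split is one plausible reading of the "fine details" and "broad shapes" described above.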

Installation requires recent Python and PyTorch builds before running any scripts. The largest variant demands powerful graphics cards and updated driver software, while smaller builds operate on modest hardware setups.
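Before downloading multi-gigabyte weights, it can help to confirm the basics are in place. The sketch below checks the interpreter version and whether PyTorch is installed, without importing it. The minimum Python version is an assumption, since only "recent" is specified here, and `check_environment` is a hypothetical helper rather than part of the sapiens2 repository.

```python
import importlib.util
import sys

# Assumed minimum; adjust to whatever the project's own setup notes require.
ASSUMED_MIN_PYTHON = (3, 10)


def check_environment(min_python=ASSUMED_MIN_PYTHON):
    """Report whether the interpreter and PyTorch look usable, without importing torch."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec only locates the package; it does not load it or touch the GPU.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA-capable GPU and driver are visible, which is a quick next check before trying the largest variant.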

Read the technical paper on arXiv, download the pose weights via Hugging Face, or clone the source code on GitHub.