LiquidAI LFM2.5-350M Brings Speed to Small Devices

LFM2.5-350M is a compact AI model from Liquid AI, built for on-device deployment across a range of hardware platforms. The 350-million-parameter, text-only model delivers competitive performance on data extraction and structured-output tasks while keeping a small memory footprint.

Liquid AI developed this model to address a specific need: running capable AI on devices with limited compute resources. With training scaled up to 28 trillion tokens, it aims to compete with larger models while remaining efficient enough for edge devices.

Model size: under 1GB | GPU VRAM: requirements vary
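The sub-1GB figure follows from simple arithmetic: raw weight memory is roughly parameter count × bits per weight ÷ 8. The sketch below applies that formula. It assumes exactly 350 million parameters, which is the nominal size in the name. It also counts only the weights, so it ignores the KV cache, activations and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for a ~350M-parameter model.
# Assumption: exactly 350 million parameters (the nominal size in the name).
# Only raw weights are counted; KV cache, activations, and framework
# overhead are ignored.

PARAMS = 350_000_000

# Bits stored per weight for a few common precisions (illustrative).
BITS_PER_WEIGHT = {
    "bf16": 16,
    "int8": 8,
    "int4": 4,
}


def weight_megabytes(params: int, bits: int) -> float:
    """Return the size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return params * bits / 8 / 1_000_000


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_megabytes(PARAMS, bits):.0f} MB")
```

At 16 bits per weight this gives about 700 MB, and 8-bit or 4-bit storage brings it to roughly 350 MB or 175 MB. Real 4-bit formats also store per-block scale data, so actual quantized files come out somewhat larger than the int4 line suggests.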

Built for constrained environments

  • Runs at 313 tokens per second on an AMD CPU and 188 tokens per second on Snapdragon Gen4.
  • Uses under 1GB of memory, with quantized versions under 500MB.
  • Supports a context length of 32,768 tokens for longer conversations.
  • Compatible with llama.cpp, MLX, and vLLM frameworks out of the box.
  • Available in ONNX, OpenVINO, MLX, and native Transformers formats.
  • Supports function calling for agentic workflows and tool use.
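The throughput figures can be converted into the latency a user would notice. The sketch below assumes the reported rates are steady decode speeds, which the article does not state. It also assumes a hypothetical 256-token response. Real speed varies with prompt length, quantization level, thread count and thermals.

```python
# Rough generation-time estimate from the reported throughputs.
# Assumption: each figure is a constant decode rate for the whole response;
# in practice it depends on prompt length, quantization, threads, and thermals.

REPORTED_TOKENS_PER_SECOND = {
    "AMD CPU": 313,
    "Snapdragon Gen4": 188,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a fixed decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 256  # hypothetical length of a structured-output reply
    for device, tps in REPORTED_TOKENS_PER_SECOND.items():
        per_token_ms = 1000 / tps
        total = generation_seconds(response_tokens, tps)
        print(f"{device}: {per_token_ms:.1f} ms/token, "
              f"{total:.2f} s for {response_tokens} tokens")
```

Under these assumptions, a 256-token reply takes about 0.8 s on the AMD CPU and about 1.4 s on the Snapdragon part.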

Teams building mobile applications or embedded systems can benefit from the model's small footprint and cross-platform compatibility. Its tool-use capabilities also make it suitable for automating data extraction and generating structured outputs without relying on cloud services.
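An app that uses function calling must parse the tool call the model emits and route it to local code. The sketch below shows that dispatch step.

It assumes a hypothetical JSON call format, `{"name": ..., "arguments": {...}}`, which is not necessarily LFM2.5's actual tool-call syntax. The real format is defined by the model's chat template, documented on its Hugging Face model card. The `lookup_order_status` tool is also hypothetical.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local tool; a real app might query a database here.
def lookup_order_status(order_id: str) -> str:
    orders = {"A100": "shipped", "A101": "processing"}
    return orders.get(order_id, "unknown")


# Registry mapping tool names the model may request to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {
    "lookup_order_status": lookup_order_status,
}


def dispatch_tool_call(raw: str) -> Any:
    """Parse a JSON tool call and run the matching local function.

    Assumes the hypothetical format {"name": ..., "arguments": {...}};
    this is an illustration, not LFM2.5's documented tool-call syntax.
    """
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything outside the registry instead of guessing.
        raise KeyError(f"model requested unknown tool: {name}")
    args = call.get("arguments", {})
    return TOOLS[name](**args)


if __name__ == "__main__":
    # Stand-in for text produced by the model.
    model_output = '{"name": "lookup_order_status", "arguments": {"order_id": "A100"}}'
    print(dispatch_tool_call(model_output))  # prints: shipped
```

Keeping an explicit registry means the model can only trigger functions the app has deliberately exposed.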

Intended use cases and limitations

Liquid AI has been clear about what this model should and should not be used for. The company recommends it for data extraction, structured outputs, and tool use. It explicitly advises against using it for knowledge-intensive tasks or programming, where larger models perform better.
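For the recommended extraction workloads, an application would typically validate a small model's output before using it. The sketch below does this with only the standard library.

The invoice schema and its field names are hypothetical. A production app might use a JSON Schema validator or constrained decoding instead.

```python
import json
from typing import Any, Dict

# Hypothetical extraction schema: field name -> accepted Python type(s).
INVOICE_SCHEMA: Dict[str, tuple] = {
    "vendor": (str,),
    "total": (int, float),
    "currency": (str,),
}


def parse_extraction(raw: str, schema: Dict[str, tuple]) -> Dict[str, Any]:
    """Parse model output as JSON and check required fields and types.

    Raises ValueError so the caller can retry or fall back when the
    model returns malformed or incomplete output.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, types in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[field], bool) or not isinstance(data[field], types):
            raise ValueError(f"wrong type for field: {field}")
    return data


if __name__ == "__main__":
    model_output = '{"vendor": "Acme Ltd", "total": 129.5, "currency": "EUR"}'
    record = parse_extraction(model_output, INVOICE_SCHEMA)
    print(record["total"])  # prints: 129.5
```

Raising an error, rather than silently filling in defaults, lets the caller re-prompt the model when it drops or mistypes a field.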

The model supports multiple inference frameworks and hardware configurations, making it flexible for different deployment scenarios.

As one community post explained: 'At <500MB when quantized, it is built for environments where compute, memory, and latency are particularly constrained.'

This compact model offers a practical option for projects that need local AI on limited hardware. The model checkpoint is available to download on Hugging Face.