Microsoft Launches harrier-oss-v1 Multilingual Language Models

Microsoft has released harrier-oss-v1, a new family of multilingual text embedding models. These models convert text into dense numeric vectors (embeddings) so that software can compare the meaning of passages across different languages.
This release includes three different model sizes to suit various hardware setups. Microsoft designed these tools to handle complex language tasks while maintaining high accuracy.
Model size: 27B. GPU VRAM: requirements vary.
Multilingual text processing capabilities
- Supports a wide range of global languages including English, Chinese, Arabic, and Spanish.
- Utilizes a decoder-only architecture for efficient processing.
- Handles very long sequences with a maximum token limit of 32,768.
- Works across diverse tasks like document retrieval, clustering, and classification.
- Features pre-configured prompts for specific tasks like web searches.
- Uses last-token pooling and L2 normalization to create embeddings.
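The last-token pooling and L2 normalization mentioned in the list can be illustrated with a minimal sketch. It uses plain Python lists in place of real model outputs, and the function names (last_token_pool, l2_normalize, embed) are hypothetical, not part of any Microsoft API. It assumes the model returns one hidden-state vector per token plus an attention mask marking real tokens with 1 and padding with 0.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token per sequence.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of sequences of 1 (real token) or 0 (padding).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Index of the last position flagged as a real token.
        last_index = max(i for i, flag in enumerate(mask) if flag == 1)
        pooled.append(list(states[last_index]))
    return pooled


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def embed(hidden_states, attention_mask):
    """Pool each sequence to its last real token, then L2-normalize the result."""
    return [l2_normalize(v) for v in last_token_pool(hidden_states, attention_mask)]


# Toy batch: two sequences of two tokens each, two dimensions per token.
hidden = [
    [[3.0, 4.0], [0.0, 0.0]],  # second position is padding
    [[1.0, 0.0], [0.0, 2.0]],  # no padding
]
mask = [[1, 0], [1, 1]]
embeddings = embed(hidden, mask)
```

Because every embedding has unit length after normalization, the dot product of two embeddings equals their cosine similarity, which is convenient for retrieval and clustering.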
Small agencies running local AI setups may find these models useful for organizing large private datasets or building custom search engines. Because the models support many languages, they are also helpful for businesses managing international documentation.
Technical implementation details
The models need task-specific prompting to work as intended. They are trained to expect a one-sentence instruction, attached to every query, that defines the task. If the instruction is omitted, performance may drop.
When using the models, the developers suggest a custom instruction such as 'Retrieve semantically similar text' to guide the output.
'The task definition should be a one-sentence instruction that describes the task,' noted the Microsoft team in their model documentation.
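
As a rough illustration of attaching a one-sentence instruction to a query, the sketch below builds the prompt string before it is sent to the model. The "Instruct: ... / Query: ..." template is an assumption borrowed from other instruction-tuned embedding models and is not confirmed here for harrier-oss-v1, so the model documentation should be checked for the exact format. The helper name build_instructed_query is hypothetical.

```python
def build_instructed_query(task_description, query):
    """Prefix a query with a one-sentence task instruction.

    The 'Instruct:/Query:' layout is an assumed template, not a confirmed
    harrier-oss-v1 format.
    """
    task = task_description.strip()
    if not task:
        raise ValueError("task_description must be a non-empty one-sentence instruction")
    return f"Instruct: {task}\nQuery: {query}"


prompt = build_instructed_query(
    "Retrieve semantically similar text",
    "How do embedding models handle multiple languages?",
)
```

Keeping the instruction in one helper makes it harder to send a bare query by accident, which is the case the documentation warns can reduce performance.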