Ettin-Reranker-1b-V1 Delivers Speedy Relevancy Checks Locally

The new cross-encoder, Ettin-Reranker-1b-V1, scores query-passage pairs so that search results can be reordered by relevance, which improves retrieval quality. It is a 1-billion-parameter transformer that handles sequences up to 7,999 tokens long. The model uses a ModernBERT backbone and was fine-tuned on 143 million query-passage pairs for pointwise relevance scoring.

Tom Aarsen developed the Ettin Reranker family, with this 1B version striking a balance between accuracy and hardware demands. The model was trained in bfloat16 with Flash Attention 2 for one epoch on the large-scale `ettin-reranker-v1-data` dataset. The release targets anyone who wants to replace large cloud-hosted rerankers with a local checkpoint that still reaches a mean NDCG@10 of 0.6114 on the MTEB retrieval benchmark.
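
For readers unfamiliar with the metric, NDCG@10 rewards rankings that place relevant passages near the top of the first ten results. The sketch below uses one common formulation (linear gain with a log2 position discount); that it matches the exact variant used by MTEB is an assumption, and the relevance labels are invented purely for illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    # rank is 0-based, so the top result is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels (2 = highly relevant, 1 = partly, 0 = not),
# listed in the order each system returned the passages.
before_rerank = [0, 1, 0, 2, 0]
after_rerank = [2, 1, 0, 0, 0]

print(f"before: {ndcg_at_k(before_rerank):.4f}")
print(f"after:  {ndcg_at_k(after_rerank):.4f}")
```

A ranking that already matches the ideal ordering scores 1.0, so the reported 0.6114 is best read as an average over many queries and datasets.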

Fast, local-friendly reranking with ModernBERT

Key highlights
  • 1B-parameter cross-encoder built on ModernBERT architecture.
  • Supports inputs up to 7,999 tokens in length.
  • Delivers 189 text pairs per second on an RTX 3090.
  • MTEB retrieval mean NDCG@10 of 0.6114.
  • Trained on 143 million high-quality (query, document) pairs.
  • Requires Flash Attention 2 for best throughput.
  • Licensed under Apache 2.0 for commercial use.
  • Runs easily via the Sentence Transformers library.
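
As a rough illustration of the Sentence Transformers route, the sketch below wraps a cross-encoder in a small reranking helper. `load_scorer`, `rerank`, and `overlap_scorer` are hypothetical names introduced here, the model identifier is left as a parameter because the exact Hub ID is not given in this article, and the `max_length` default simply mirrors the 7,999-token limit quoted above.

```python
def load_scorer(model_name, max_length=7999):
    """Return a function that maps (query, document) pairs to relevance scores.

    Sentence Transformers is imported lazily so the helpers below remain usable
    without it. model_name is whatever Hub ID or local path holds the
    checkpoint; this sketch does not assume a specific one.
    """
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name, max_length=max_length)
    return lambda pairs: [float(score) for score in model.predict(pairs)]


def rerank(query, documents, score_pairs, top_k=None):
    """Score every (query, document) pair and sort documents by score, best first."""
    pairs = [(query, doc) for doc in documents]
    scores = score_pairs(pairs)
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked


def overlap_scorer(pairs):
    """Toy stand-in scorer (shared-word count) for trying rerank() without a model."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


docs = [
    "the cat sat on the mat",
    "flash attention speeds up transformers",
    "rerankers score query document pairs",
]
for doc, score in rerank("how do rerankers score pairs", docs, overlap_scorer, top_k=2):
    print(f"{score:.1f}  {doc}")
```

With the real checkpoint, `overlap_scorer` would be swapped for `load_scorer(...)` pointed at the downloaded model. Sentence Transformers also provides a built-in `CrossEncoder.rank` method that covers the same ground; the explicit helper is shown only to make the scoring and sorting steps visible.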

Professionals who keep all data on-premises will benefit the most, as the model processes documents without a network call. Small agencies can blend it into local retrieval pipelines and skip recurring cloud expenses. Hobbyists with a single RTX 3090 get a reranker that is fast enough for interactive use while still catching subtle relevance signals.

What developers should know

Training took 20 hours on a single high-end GPU and stopped after exactly one epoch, which leaves room for further task-specific fine-tuning. Throughput depends heavily on configuration: with bfloat16 and Flash Attention 2, an RTX 3090 reaches 189 pairs per second, while CPU-only inference falls to about two pairs per second. The model was evaluated against 13 other public rerankers and landed ahead of similarly sized alternatives, though the 4B Qwen3-based teacher still holds the top spot.
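
To put those throughput figures in perspective, the back-of-the-envelope calculation below estimates how long reranking one query's candidate list would take. It assumes the quoted rates hold regardless of passage length, which is a simplification; the candidate counts and the `rerank_seconds` helper are illustrative.

```python
GPU_PAIRS_PER_SEC = 189.0  # best RTX 3090 configuration quoted above
CPU_PAIRS_PER_SEC = 2.0    # approximate CPU-only rate quoted above


def rerank_seconds(num_candidates, pairs_per_sec):
    """Estimated time to score num_candidates pairs at a fixed throughput."""
    return num_candidates / pairs_per_sec


for n in (20, 100, 500):
    gpu = rerank_seconds(n, GPU_PAIRS_PER_SEC)
    cpu = rerank_seconds(n, CPU_PAIRS_PER_SEC)
    print(f"{n:>4} candidates: ~{gpu:.2f} s on GPU, ~{cpu:.0f} s on CPU")
```

At 100 candidates per query this works out to roughly half a second on the GPU versus about 50 seconds on CPU, which is why the GPU configuration matters for interactive use.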

"It computes scores for pairs of texts, which can be used for text reranking and semantic search." — Source: Hugging Face