TL;DR – We’re excited to introduce the Voyage 4 series, a new generation of text embedding models featuring industry-first shared embedding spaces. The series includes voyage-4-large, voyage-4, voyage-4-lite, and the open-weighted voyage-4-nano. All models produce compatible embeddings, allowing customers to mix and match models for query and document embedding based on their specific accuracy, latency, and cost requirements. Furthermore, voyage-4-large leverages a mixture-of-experts (MoE) model architecture to deliver state-of-the-art retrieval accuracy while maintaining serving costs 40% lower than comparable dense models.
Today, we’re excited to announce the Voyage 4 model family. These models serve two key audiences: existing customers seeking more accurate retrieval, and developers building context-engineered agents that require high retrieval accuracy with low latency and cost for high-volume reads (e.g., from shared memory):
- **voyage-4-large.** Our new flagship embedding model, leveraging a mixture-of-experts (MoE) architecture to establish a new state of the art while maintaining serving costs 40% lower than comparable dense models. This is the first production-grade embedding model to use an MoE architecture.
- **voyage-4.** Approaches the retrieval quality of voyage-3-large while maintaining the efficiency of a mid-sized model.
- **voyage-4-lite.** Approaches the retrieval accuracy of voyage-3.5 while requiring far fewer parameters, enabling high-quality embeddings at a much lower computational cost.
- **voyage-4-nano.** Our first open-weight model, freely available on Hugging Face under the Apache 2.0 license. voyage-4-nano is ideal for local development and prototyping, with an easy path to production.
**A single shared embedding space.** The Voyage 4 series introduces an industry-first capability: shared embedding spaces. All four models produce compatible embeddings, meaning embeddings generated by different models can be used interchangeably. For example, query embeddings generated with voyage-4-lite can be used to search over document embeddings generated with voyage-4-large; we refer to the practice of vectorizing queries and documents with different models as asymmetric retrieval.
Asymmetric retrieval is most effective when the upfront cost of vectorizing your document corpus is small relative to the cumulative cost of vectorizing queries over time. This is typically the case in production systems: documents are embedded once (or updated infrequently), while queries are embedded continuously at serving time. By vectorizing documents with voyage-4-large and queries with a smaller model such as voyage-4-nano, voyage-4-lite, or voyage-4, you get the retrieval accuracy benefits of the larger model’s document representations while keeping per-query latency and cost low.
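The asymmetric pattern above can be sketched in a few lines. This is a minimal, self-contained illustration: the toy vectors stand in for real embeddings (in practice you would produce the document vectors with voyage-4-large and the query vector with a smaller model via the embeddings API), and because the models share one embedding space, the query vector can be scored directly against document vectors from a different model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "document" embeddings, produced once (offline) by the larger model.
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.0, 0.2, 0.9],
}

# Toy "query" embedding, produced at serving time by a smaller model.
# The shared space is what makes this cross-model comparison valid.
query_embedding = [0.85, 0.15, 0.05]

# Rank documents by cosine similarity to the query.
ranked = sorted(doc_embeddings.items(),
                key=lambda kv: cosine(query_embedding, kv[1]),
                reverse=True)
print(ranked[0][0])  # prints "doc_a", the nearest document
```

The document vectors are computed and indexed once, so the larger model's cost is amortized across every future query, while each incoming query pays only the smaller model's embedding latency.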