TL;DR
We introduce RexRerankers, a family of state-of-the-art rerankers that estimate how relevant an e-commerce product is for a given query. We open-source Amazebay, a large-scale dataset collection for training and evaluating product relevance models:
- Amazebay-Catalog: product metadata for 37M items across categories
- Amazebay-Relevance: 6M query–product pairs with graded relevance scores, covering ~364k unique queries and ~3M products
For holistic evaluation of product discovery rerankers, we also release ERESS (E-commerce Relevance Evaluation Scoring Suite): 4.7k unique queries and 72k labeled query–product pairs designed to reflect real shopping search behavior.
Finally, we open-source a training recipe for efficient, high-performing rankers. It uses a Distributional-Pointwise Loss that treats annotation noise as signal rather than purely as error, which improves robustness and calibration in real-world relevance modeling.
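The exact form of the Distributional-Pointwise Loss is not spelled out here. One plausible reading of "treating annotation noise as signal" is a soft-label objective. Instead of collapsing disagreeing annotator grades into a single hard label, each query–product pair keeps the empirical distribution of grades, and the model is trained with cross-entropy against that distribution. The sketch below assumes a 0–3 relevance scale and uses hypothetical function names. It is an illustration of soft-label cross-entropy, not the RexRerankers implementation.

```python
import math
from collections import Counter
from typing import Sequence

# Assumed graded relevance scale (hypothetical; the real label scale may differ).
GRADES = (0, 1, 2, 3)


def label_distribution(annotations: Sequence[int], smoothing: float = 0.0) -> list[float]:
    """Turn several annotator grades for one query-product pair into a target distribution.

    Disagreement between annotators is preserved as probability mass rather than discarded.
    """
    counts = Counter(annotations)
    total = len(annotations) + smoothing * len(GRADES)
    return [(counts[g] + smoothing) / total for g in GRADES]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over per-grade logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def distributional_pointwise_loss(logits: Sequence[float], annotations: Sequence[int],
                                  smoothing: float = 0.0) -> float:
    """Cross-entropy between the annotator label distribution and the predicted grade distribution."""
    target = label_distribution(annotations, smoothing)
    probs = softmax(logits)
    eps = 1e-12  # guards against log(0)
    return -sum(p * math.log(q + eps) for p, q in zip(target, probs))


def expected_relevance(logits: Sequence[float]) -> float:
    """Collapse the predicted grade distribution into a single scalar used for ranking."""
    return sum(g * q for g, q in zip(GRADES, softmax(logits)))


if __name__ == "__main__":
    # Three annotators: one says "2", two say "3".
    votes = [2, 3, 3]
    print(label_distribution(votes))  # target mass split between grades 2 and 3
    print(distributional_pointwise_loss([-5.0, -5.0, 0.0, 0.7], votes))
```

Because the target is a distribution, a pair where annotators split between "2" and "3" pushes the model toward a hedged prediction instead of an overconfident one. That is one way such an objective could help calibration. Ranking then uses a scalar such as the expected grade.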
Introduction
Search in modern systems is a multi-stage decision pipeline optimized for speed, relevance, and user satisfaction. Whether you’re building web search, enterprise search, or product search, the dominant architecture is:
- Candidate generation (retrieval): quickly fetch a few hundred to a few thousand potentially relevant items from millions
- Reranking: apply a stronger model to reorder those candidates by relevance
- Post-processing & business logic: enforce constraints (availability, compliance, diversity), personalize, and format results
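The three stages above can be sketched as a single function. This is a toy illustration under stated assumptions. `Product`, `search`, and the scorer callables are hypothetical names. Stage 1 scans the whole catalog for simplicity, whereas a production system would query an inverted or approximate-nearest-neighbor index.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Product:
    pid: str
    title: str
    in_stock: bool


# A scorer maps (query, product) to a relevance score; higher is better.
Scorer = Callable[[str, Product], float]


def search(query: str, catalog: Iterable[Product], retrieve_score: Scorer,
           rerank_score: Scorer, n_candidates: int = 500, n_results: int = 10) -> list[Product]:
    # Stage 1 (candidate generation): a cheap scorer keeps only the top n_candidates.
    candidates = heapq.nlargest(n_candidates, catalog, key=lambda p: retrieve_score(query, p))
    # Stage 2 (reranking): the expensive model only scores the shortlist.
    reranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
    # Stage 3 (post-processing): business rules, here just an availability filter.
    return [p for p in reranked if p.in_stock][:n_results]


if __name__ == "__main__":
    catalog = [
        Product("a", "running shoes", True),
        Product("b", "shoe laces", True),
        Product("c", "trail running shoes", False),
    ]
    overlap = lambda q, p: len(set(q.split()) & set(p.title.split()))
    toy_reranker = lambda q, p: 1.0 if p.title.endswith("shoes") else 0.0
    print([p.pid for p in search("running shoes", catalog, overlap, toy_reranker, 3, 2)])
```

The split matters because the reranker's cost is paid only `n_candidates` times per query rather than once per catalog item. This is what makes a stronger, slower model affordable at stage 2.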
E-commerce search looks like ordinary "search," but its definition of relevance is richer and more constrained. A product can match the query text and still be a bad result because of:
- Variant and attribute mismatch: size, color, material, compatibility, fit
- Category intent: "running shoes" vs "shoe laces," "sofa" vs "sofa cover"
- Brand sensitivity: explicit ("Nike"), implicit ("Apple charger"), or excluded ("no ads," "non-branded")
- Messy query language: shorthand, typos, multi-intent queries, and colloquial attributes ("work bag that fits 16 inch laptop")
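A small example makes the category-intent problem concrete. The sketch below uses plain token-overlap (Jaccard) similarity, a naive lexical baseline chosen only for illustration and not something RexRerankers is claimed to use. With this score, an accessory outranks the product the shopper actually wants.

```python
def jaccard(query: str, title: str) -> float:
    """Token-set Jaccard similarity between a query and a product title."""
    q, t = set(query.lower().split()), set(title.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


query = "sofa"
titles = ["Grey 3-seat sofa", "Sofa cover"]
ranked = sorted(titles, key=lambda t: jaccard(query, t), reverse=True)
print(ranked)  # the shorter accessory title wins on lexical overlap
```

Both titles contain every query token, but the shorter accessory title has fewer extra tokens, so it scores higher (1/2 versus 1/3). A relevance model has to learn that "sofa cover" is a different product category from "sofa" even though the text matches.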
RexRerankers were built for this modern product discovery setting: high-recall retrieval followed by strong reranking, optimized for e-commerce semantics. The goal is to build reranking models that are:
- Accurate on fine-grained product relevance
- Robust to noisy or ambiguous supervision
- Practical to deploy with latency and cost constraints
- Capable of handling indirect utility queries