Show HN: PyNIFE. 400-900× speedup for embedding-based retrieval pipelines

pyNIFE

NIFE compresses large embedding models into static, drop-in replacements with up to 1000x faster query embedding (see benchmarks).

Features

  • 400-900x faster CPU query embedding
  • Each model is fully aligned with its teacher model
  • Re-use your existing vector index

Table of contents

  1. Quickstart
  2. Installation
  3. Usage
  4. Rationale

Introduction

Nearly Inference Free Embedding (NIFE) models are static embedding models that are fully aligned with a much larger model. Because static models are so small and fast, NIFE allows you to:

  1. Speed up query embedding immensely: a 200x speed-up in embedding time on CPU.
  2. Get away with using a much smaller memory/com…
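
To make the idea concrete, here is a minimal sketch of why a static model is so cheap at query time and why the existing index can be reused. It is not the pyNIFE API: it assumes, as is common for static embedding models, that a query is embedded by looking up one stored vector per token and averaging them. The names `TOKEN_VECTORS`, `embed_static`, `search`, and `DOC_INDEX` are hypothetical, and the toy vectors are made up for illustration.

```python
import math

# Hypothetical toy token table. A real static model stores one learned
# vector per vocabulary token, trained to match the teacher's space.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "query": [0.0, 1.0, 0.0],
    "embedding": [0.0, 0.0, 1.0],
}
DIM = 3


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed_static(text):
    """Embed text by averaging per-token vectors: lookups only, no forward pass."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0] * DIM
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]
    return normalize(mean)


def search(query_vec, index):
    """Return document ids ranked by dot product with the query vector."""
    scores = {
        doc_id: sum(q * d for q, d in zip(query_vec, doc_vec))
        for doc_id, doc_vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Stand-in for an existing index that was built offline with the large
# teacher model. Because the static model is aligned with the teacher,
# its query vectors can be compared against these document vectors directly.
DOC_INDEX = {
    "doc_a": normalize([0.9, 0.1, 0.0]),
    "doc_b": normalize([0.0, 0.2, 0.9]),
}

ranking = search(embed_static("fast query"), DOC_INDEX)
```

In this sketch only the query side changes: documents stay embedded by the teacher, and the query path is reduced to table lookups and an average.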
