aashirpersonal/semantic-coverage: Automated detection of knowledge gaps and blind spots in RAG vector stores.

🏳️ semantic-coverage

The "Code Coverage" tool for RAG Knowledge Bases. Automated detection of knowledge gaps, hallucination spots, and representation bias in Vector Databases.

🛑 The Problem

In software engineering, we track Code Coverage to catch untested code before it causes bugs. In AI engineering, we ship RAG (Retrieval-Augmented Generation) systems without any equivalent measure of Semantic Coverage.

Engineers often don’t know:

Blind Spots: What are users asking that our Vector DB has zero context for?
Data Drift: How is user intent shifting away from our indexed documentation over time?
Hallucination Triggers: Which clusters of queries systematically yield low-confidence retrieval?

⚡ The Solution: `semantic-coverage`

This tool provides semantic observability by embedding both Documents (Knowledge) and User Queries (Intent) with the same model and projecting them into a shared low-dimensional space with UMAP. It then uses density-based clustering (HDBSCAN) to identify "Red Zones": areas of high user-query density but low document density.
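To make the "Red Zone" idea concrete, here is a minimal sketch that compares query density and document density on already-projected 2D points. It deliberately swaps HDBSCAN for a simple grid count so it runs with the standard library only; the function name `find_red_zones` and the parameters `cell_size`, `min_queries`, and `max_docs` are hypothetical and not part of this project.

```python
from collections import Counter


def find_red_zones(query_points, doc_points, cell_size=1.0, min_queries=3, max_docs=0):
    """Return grid cells that hold many queries but few documents.

    A grid-based stand-in for density clustering: each 2D point is
    assigned to a square cell, and a cell is a "red zone" when it has
    at least `min_queries` queries and at most `max_docs` documents.
    """

    def cell(point):
        x, y = point
        return (int(x // cell_size), int(y // cell_size))

    query_counts = Counter(cell(p) for p in query_points)
    doc_counts = Counter(cell(p) for p in doc_points)

    return sorted(
        c
        for c, n in query_counts.items()
        if n >= min_queries and doc_counts[c] <= max_docs
    )


# Toy 2D projections: one region is covered by documents, one is not.
queries = [(0.1, 0.2), (0.4, 0.3), (0.8, 0.9), (5.1, 5.2), (5.3, 5.8), (5.6, 5.4)]
docs = [(0.2, 0.5), (0.7, 0.1)]

red_zones = find_red_zones(queries, docs)
print(red_zones)
```

Here the queries near the origin share a cell with two documents, so only the cell around (5, 5) is reported. A real density-based method such as HDBSCAN finds irregularly shaped clusters instead of fixed squares, but the comparison it feeds is the same.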

🛠️ Tech Stack

Math Engine: Sentence-Transformers (SBERT), UMAP, HDBSCAN, Scikit-Learn
Backend: FastAPI (Async inference)
Frontend: React + Vite, Plotly.js (Interactive Scatter Plots)
Extensibility: Plugin architecture for Vector DBs

🚀 Quick Start

1. Installation

git clone https://github.com/aashirpersonal/semantic-coverage.git
cd semantic-coverage

# Backend Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Frontend Setup
cd frontend
npm install

2. Run the Stack

# Terminal 1: Backend (from the repository root, with the virtual environment activated)
uvicorn app.main:app --reload

# Terminal 2: Frontend
cd frontend
npm run dev

3. Usage

Navigate to http://localhost:5173. Paste your JSON export of queries and documents. The system will auto-generate a "Gap Report" identifying missing topics.

🔌 Enterprise Connectors

semantic-coverage is designed to be database-agnostic. A plugin architecture provides connectors for major vector stores, such as Pinecone and ChromaDB:

from app.core.connectors import get_connector

# Connect to Pinecone
db = get_connector("pinecone", api_key="...", index_name="knowledge-base-v1")
docs = db.fetch_documents(limit=5000)

# Connect to ChromaDB
db = get_connector("chroma", collection_name="support_tickets")
docs = db.fetch_documents()
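One common way to build this kind of plugin layer is a small registry that maps a connector name to a class. The sketch below is an assumption about how such a layer could look, not the code in `app.core.connectors`: it mirrors only the `get_connector(...)` / `fetch_documents(...)` call shape shown above, and `VectorStoreConnector`, `register_connector`, and `InMemoryConnector` are hypothetical names.

```python
from abc import ABC, abstractmethod


class VectorStoreConnector(ABC):
    """Hypothetical base class: every connector returns a list of documents."""

    @abstractmethod
    def fetch_documents(self, limit=None):
        ...


# Maps a connector name (e.g. "in_memory") to its class.
_REGISTRY = {}


def register_connector(name):
    """Class decorator that adds a connector to the registry under `name`."""

    def decorator(cls):
        _REGISTRY[name] = cls
        return cls

    return decorator


@register_connector("in_memory")
class InMemoryConnector(VectorStoreConnector):
    """Toy connector backed by a Python list, useful for local testing."""

    def __init__(self, documents):
        self._documents = list(documents)

    def fetch_documents(self, limit=None):
        if limit is None:
            return list(self._documents)
        return self._documents[:limit]


def get_connector(name, **kwargs):
    """Look up a registered connector by name and instantiate it."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown connector: {name!r}") from None
    return cls(**kwargs)


db = get_connector(
    "in_memory",
    documents=["Reset your password", "Billing FAQ", "API rate limits"],
)
docs = db.fetch_documents(limit=2)
print(docs)
```

With this pattern, supporting another vector store means writing one new class and decorating it; the calling code does not change.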

🏗️ Architecture

Ingestion: Text is converted to 384-dim embeddings (all-MiniLM-L6-v2).
Projection: High-dimensional vectors are reduced to 2D via UMAP.
Clustering: User queries are clustered to find distinct "Topics."
Gap Analysis: For each query cluster, we compute the distance from the cluster centroid to its nearest document.
Scoring: Clusters whose distance exceeds the threshold (0.7) are flagged as `blind_spot`.
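The Gap Analysis and Scoring steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes cosine distance on non-zero vectors (the metric is not specified above), uses tiny 3-dimensional vectors in place of real embeddings, and the `covered` label and the helper names are hypothetical. Only the 0.7 threshold and the `blind_spot` flag come from the steps above.

```python
import math

DISTANCE_THRESHOLD = 0.7  # threshold from the Scoring step


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def score_clusters(query_clusters, doc_vectors, threshold=DISTANCE_THRESHOLD):
    """Flag each query cluster whose centroid is far from every document."""
    report = {}
    for label, vectors in query_clusters.items():
        center = centroid(vectors)
        nearest = min(cosine_distance(center, doc) for doc in doc_vectors)
        report[label] = {
            "distance": nearest,
            # "covered" is an assumed label for clusters under the threshold.
            "status": "blind_spot" if nearest > threshold else "covered",
        }
    return report


# Toy vectors standing in for embeddings.
doc_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
query_clusters = {
    "billing": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],  # close to the documents
    "refunds": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],  # far from every document
}

report = score_clusters(query_clusters, doc_vectors)
for label, result in report.items():
    print(label, result["status"])
```

Whether the distance is measured in the original embedding space or in the 2D projection changes what a threshold of 0.7 means, so the threshold should be tuned against whichever space and metric are actually used.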

📜 License

MIT
