OpenAI Knowledge Retrieval
Config-first RAG starter kit that pairs OpenAI File Search, ChatKit, and Evals with a pluggable ingestion, retrieval, and evaluation toolkit.
Quickstart
Launch a working knowledge retrieval app in a few commands:
# 1. Backend environment
python3 -m venv .venv
source .venv/bin/activate
make dev # or: make install
# 2. Credentials
cp .env.example .env
echo "OPENAI_API_KEY=sk-..." >> .env
# 3. Update configuration
# Edit the configuration in configs/default.openai.yaml to point to your documents.
# 4. Ingest corpus
make ingest
# 5. Run backend
make run-app-backend
# 6. Run frontend (in another terminal)
make run-app-frontend
# 7. Open http://localhost:5172 to chat with the assistant. All answers are grounded with citations that link back to the ingested documents.
# 8. Evals: Generate synthetic data for evaluation and upload to OpenAI Evals
make eval
Features
- YAML-first configuration – define ingestion, retrieval, and synthesis behavior without editing code.
- Multiple vector stores – ship with OpenAI File Search or go custom with a local Qdrant DB.
- Typed pipelines – ingestion, retrieval, and synthesis flows built with Pydantic models and Typer/FastAPI services.
- ChatKit UI – interact with the knowledge base through ChatKit components.
- Evaluation harness – generate synthetic data and run evals with OpenAI Evals or local grading.
Repository Structure
- cli/ – Typer CLI (rag) entrypoint and config parsing.
- ingestion/ – loaders, chunkers, preprocessors, and orchestrated pipelines.
- retrieval/ – query expansion, filtering, reranking, and response assembly.
- stores/ – adapters for OpenAI File Search and custom vector stores (Qdrant example included).
- app/backend/ – FastAPI application that powers the chat API.
- app/frontend/ – Vite + React UI using Tailwind CSS and ChatKit components.
- configs/ – starter default YAMLs you can copy and customize.
- evals/ – evaluation harness, dataset generation, and reporters.
- templates/ – additional templates for different vector stores, chunking strategies, and retrieval methods.
- prompts/ – prompts for the different aspects of the knowledge retrieval workflow (query expansion, reranking, local judging, etc.).
Pipeline Setup
For a more customized setup beyond the quickstart, follow the steps below.
1. Prerequisites
- Python 3.10 or later
- Node.js 18.18 or later (for the optional web UI)
- An OpenAI API key (OPENAI_API_KEY) with access to File Search and Evals
2. Install backend dependencies
python3 -m venv .venv
source .venv/bin/activate
make dev # or: make install
3. Configure credentials
Copy .env.example to .env and fill in the values that apply to your environment. At minimum set OPENAI_API_KEY. Optional variables:
OPENAI_ORG_ID=...
OPENAI_PROJECT_ID=...
VECTOR_STORE_ID=vs_...
4. Create a configuration
Use the CLI to copy a YAML config template into configs/ (replace my with a project name):
rag init --template openai > configs/my.openai.yaml
Templates include openai and custom-qdrant, along with different options for chunking data. Adjust data.paths to point at the documents you want to ingest. Example datasets live in data/example_data/.
Set the RAG_CONFIG environment variable to the path of your config.
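For example, to ingest the bundled example dataset, the data block might look like this (a minimal sketch; sibling keys and whether paths takes a list vary by the template you scaffolded):

data:
  paths:
    - data/example_data/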
5. (Optional) Spin up your Qdrant vector store in Docker
make qdrant-up
This will start a local Qdrant instance on port 6333.
6. Ingest your knowledge base
rag ingest --config configs/my.openai.yaml
The CLI will create or reuse a vector store, upload documents, and print the resulting vector_store_id. Persist that value in your .env.
7. Launch the experience
Start the FastAPI backend (reads RAG_CONFIG, defaulting to configs/default.openai.yaml):
make run-app-backend
Start the web UI in another terminal:
make run-app-frontend
The UI runs at http://localhost:5172 by default and communicates with the backend at http://localhost:8000.
8. Evaluate retrieval quality
rag eval --config configs/my.openai.yaml
The harness can either use an existing dataset (evals.mode: user) or synthesize one (evals.mode: auto). When evals.openai_evals.enabled: true, results are mirrored to the OpenAI Evals dashboard.
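For reference, a hedged sketch of the evals block (key names are those mentioned in this README; the path and values are illustrative):

evals:
  mode: auto                              # or "user" with dataset_path set
  dataset_path: evals/datasets/my.jsonl   # hypothetical path; only used when mode is "user"
  openai_evals:
    enabled: true                         # mirror results to the OpenAI Evals dashboard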
Configuration & Customization
Every aspect of the pipeline is driven by YAML. Use rag init --template <name> to scaffold a config, then customize the sections described below.
Templates
- openai – default template backed by OpenAI File Search. Works out of the box with the supplied example dataset.
- custom-qdrant – connects to a locally hosted Qdrant instance. Pair with make qdrant-up to start Docker services automatically.
Add --chunking <strategy> when scaffolding to pick a predefined chunking approach (see next section).
Chunking strategies
Chunking is controlled under chunking in the config:
- recursive – recursively backs off from headings to paragraphs, sentences, then tokens to hit the target window.
- heading – uses heading hierarchy (#, ALL CAPS, numbered headings) to keep coherent sections.
- hybrid – falls back to recursive when headings are missing (default).
- xml_aware – purpose-built for XML manuals; splits on semantic tags and converts tables to Markdown.
- custom – load a custom chunker via custom_chunker.{module_path,class_name,init_args}.
Common parameters:
- target_token_range: [min, max] – token window to aim for before chunking stops.
- overlap_tokens – overlap between adjacent chunks to maintain context.
- rules – strategy-specific options (heading detection hints, XML tag boundaries, etc.).
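Putting those together, a hedged sketch of a chunking block (the strategy selector key name is an assumption; the other keys come from the lists above, and values are illustrative):

chunking:
  strategy: hybrid                # assumed key name; recursive | heading | hybrid | xml_aware | custom
  target_token_range: [200, 400]  # illustrative token window
  overlap_tokens: 50              # illustrative overlap
  rules: {}                       # strategy-specific options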
Vector store configuration
Select your store with vector_store.backend:
- openai_file_search – managed vector store that handles uploads, dedupe, and metadata. Configure vector_store_name, vector_store_id (reuse an existing store instead of creating a new one), expiry_days, chunking.max_chunk_size_tokens, and chunking.chunk_overlap_tokens.
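A minimal sketch of this backend's block (exact nesting may differ from your scaffolded template; values are illustrative):

vector_store:
  backend: openai_file_search
  vector_store_name: my-knowledge-base
  vector_store_id: vs_...          # set to reuse an existing store
  expiry_days: 30
  chunking:
    max_chunk_size_tokens: 800
    chunk_overlap_tokens: 200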
- custom – bring your own implementation. The included Qdrant adapter is enabled with:
vector_store:
  backend: custom
  custom:
    kind: qdrant
    qdrant:
      url: http://localhost:6333
      collection: my_collection
      distance: cosine
      ef: 64
      m: 32
You can reference your own module by setting custom.kind: plugin and providing plugin.{module_path,class_name,init_args}, which lets you plug in any vector database of your choice.
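A hedged sketch of the plugin wiring (plugin.{module_path,class_name,init_args} comes from this README; the module and class names here are hypothetical):

vector_store:
  backend: custom
  custom:
    kind: plugin
    plugin:
      module_path: my_package.my_store   # hypothetical module
      class_name: MyVectorStore          # hypothetical class
      init_args: {}                      # constructor kwargs for your adapter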
Retrieval pipeline
Four optional stages live under query:
- query.expansion.enabled – uses an LLM to generate alternate phrasings. Configure model, prompt_path, variants, and style.
- query.hyde.enabled – adds HyDE (Hypothetical Document Embeddings); an LLM writes a synthetic passage that is embedded alongside the query.
- query.similarity_filter.enabled – drops results below a similarity threshold before reranking.
- query.rerank.enabled – applies an LLM cross-encoder reranker; configure model, prompt_path, max_candidates, and score_threshold.
Prompts live under prompts/expansion/, prompts/hyde/, and prompts/rerank/. Edit those files to change the guidance without touching code.
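Putting the four stages together, a hedged sketch (model names, thresholds, and prompt filenames are illustrative, not repo defaults):

query:
  expansion:
    enabled: true
    model: gpt-4o-mini
    prompt_path: prompts/expansion/default.md   # hypothetical filename
    variants: 3
    style: paraphrase                           # illustrative value
  hyde:
    enabled: false
  similarity_filter:
    enabled: true
    threshold: 0.4
  rerank:
    enabled: true
    model: gpt-4o-mini
    prompt_path: prompts/rerank/default.md      # hypothetical filename
    max_candidates: 20
    score_threshold: 0.5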
Response synthesis
Under synthesis you can set:
- model – response model (e.g., gpt-4o-mini, o4-mini-high).
- system_prompt – inline string or a path to a prompt file (default lives at prompts/system/assistant.md).
- structured_outputs – enable schema-based responses.
- reasoning_effort – hint the model on effort (low, medium, high).
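For example (a sketch; the model choice and values are illustrative):

synthesis:
  model: gpt-4o-mini
  system_prompt: prompts/system/assistant.md   # the default noted above
  structured_outputs: false
  reasoning_effort: medium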
Environment fallbacks
Use the env block to specify default values for credentials such as OPENAI_API_KEY, OPENAI_PROJECT_ID, VECTOR_STORE_ID, etc. Actual environment variables override these when set.
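A minimal sketch, mirroring the .env placeholders above (real environment variables take precedence when set):

env:
  OPENAI_API_KEY: sk-...
  OPENAI_PROJECT_ID: ...
  VECTOR_STORE_ID: vs_...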
Frontend UI
The UI is built with Vite, React 19, Tailwind CSS, and @openai/chatkit-react. Helpful commands:
npm run dev # start dev server with hot reload
npm run build # create production bundle
npm run preview # preview the production build locally
npm run lint # eslint + typescript checks
Evaluations & OpenAI Evals
The evaluation harness supports both local grading and hosted OpenAI Evals.
Auto-generated datasets
Set evals.mode: auto to synthesize evaluation data from your corpus. The harness:
- Samples ingested chunks.
- Asks an LLM to create realistic user questions.
- Stores the dataset under evals/datasets/ (JSONL format).
Re-run the command to expand coverage or refresh stale questions.
Curated datasets
Switch to evals.mode: user and provide evals.dataset_path. Each JSONL record should include:
- id – unique identifier.
- question – the user query.
- citation_text – supporting passage used for groundedness.
- correct_answer – ideal answer used for EM/F1 scoring (required).
- Metadata (e.g., source_id, page, char_start, char_end, difficulty, tags).
See evals/datasets/schema.py for the full schema; the CLI validates these fields before each run and canonicalizes answers internally for consistent scoring.
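For reference, an illustrative record (every value below is a placeholder, not real data):

{"id": "q-001", "question": "...", "citation_text": "...", "correct_answer": "...", "source_id": "...", "page": 3, "char_start": 0, "char_end": 180, "difficulty": "easy", "tags": ["example"]}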
Mirroring to OpenAI Evals
Keep evals.openai_evals.enabled: true to sync local runs to the hosted platform. Configure:
- graders – choose groundedness, relevance, or custom rubrics.
- run_name – friendly label for dashboards.
- organization_id / project_id – override defaults when needed.
Results are printed locally and reports are written to evals/reports/ (Markdown, HTML, and any OpenAI Evals link stubs).
Contributing
We welcome community contributions! Please read CONTRIBUTING.md for guidance on how to file issues, propose changes, and follow our coding standards.
Security
If you believe you have discovered a vulnerability, please refer to SECURITY.md for responsible disclosure instructions. Do not open a public issue for security reports.
License
This project is licensed under the MIT License.
Acknowledgements
- FastAPI for the backend framework.
- ChatKit components for the chat UI.
- Lucide icons (MIT licensed).
- Tailwind CSS utilities for styling.