You’ve spent weeks polishing your prompts. You’ve set up a robust retrieval system. You validate every piece of data going into your context window.
And yet, your RAG (Retrieval-Augmented Generation) bot still confidently tells users things that are completely wrong.
It doesn’t happen often, but when it does, it destroys user trust. The problem with LLMs in production isn’t just getting them to answer; it’s knowing when they are lying (hallucinating).
Standard software engineering practices, like regex-based unit tests, don’t work on non-deterministic natural language output. We need a new layer in our stack.
Here is how I approached building a "Bullshit Detector" middleware using TypeScript, Node.js, and PostgreSQL with pgvector.
The Architecture Problem
A typical RAG flow looks like this:
- User asks question.
- App retrieves relevant context documents.
- LLM generates an answer based on context.
- User sees the answer (even if it’s wrong).
The issue is step 4: nothing stands between the model’s output and the user. We are trusting the model implicitly.
To catch hallucinations, we need to introduce an adversarial step after generation but before the user sees it. We need a middleware that acts as a relentless fact-checker.
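As a rough sketch, the wiring looks like this. Note that retrieveContext and generateAnswer are hypothetical placeholders for your existing RAG calls, and validateResponse is the middleware function shown in the next section:

import { validateResponse } from './validateResponse'; // defined below

// Hypothetical placeholders for your existing RAG pipeline calls
declare function retrieveContext(question: string): Promise<string[]>;
declare function generateAnswer(question: string, context: string[]): Promise<string>;

export async function answerWithGuardrail(question: string): Promise<string> {
  const context = await retrieveContext(question);       // step 2
  const draft = await generateAnswer(question, context); // step 3
  // The new adversarial step: audit before the user sees anything
  const audit = await validateResponse({
    llmAnswer: draft,
    retrievedContext: context,
    threshold: 0.75,
  });
  if (audit.action === 'REJECT') {
    // Fail closed: admitting uncertainty beats shipping a hallucination
    return "I couldn't verify an answer against the available sources.";
  }
  return draft;
}

What happens on a REJECT (retry, re-retrieve, escalate to a human) is a product decision; the key is that a rejected answer never reaches the user unannounced.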
The Solution: Semantic Proximity Check
Since we already have the "source truth" (the documents we retrieved in step 2) and the generated "claim" (the LLM’s answer), we can mathematically measure how closely they align.
If the LLM’s answer is semantically distant from the source documents it was supposed to use, it’s likely bullshitting.
My stack for this middleware:
- Runtime: Node.js (lightweight, fast for I/O).
- Language: TypeScript (for type safety on the data structures).
- Vector DB: PostgreSQL with the pgvector extension.
I chose pgvector because keeping the operational data and vectors in the same database simplifies the architecture immensely compared to managing a separate Pinecone or Weaviate instance for just this validation step.
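To make that concrete, here is a rough sketch of the storage side using the node-postgres (pg) client. The table and column names are invented for this example; pgvector's <=> operator returns cosine distance, so 1 minus the distance gives cosine similarity:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the standard PG* env vars

// One-time setup (run as a migration): enable the extension and store chunks
// alongside their embeddings. 1536 matches OpenAI's text-embedding-3-small.
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE context_chunks (id serial PRIMARY KEY, content text, embedding vector(1536));

// Find the stored chunks most similar to an answer's embedding.
async function nearestChunks(answerVector: number[], limit = 5) {
  const { rows } = await pool.query(
    `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
     FROM context_chunks
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(answerVector), limit]
  );
  return rows as { content: string; similarity: number }[];
}

Running the similarity search inside Postgres also keeps the audit data one JOIN away from the rest of your operational tables.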
The Core Logic
The goal isn’t to re-run the entire RAG process. The goal is to take the final output and verify its "grounding."
Here is a simplified TypeScript view of the evaluation logic. We use an embeddings model to convert both the generated answer and the source text into vectors, and then calculate the cosine similarity.
import { embedText, cosineSimilarity } from './vectorUtils';

interface AuditRequest {
  llmAnswer: string;
  retrievedContext: string[]; // The raw text chunks passed to the LLM
  threshold: number;          // e.g., 0.75
}

export async function validateResponse(req: AuditRequest) {
  // Guard: with no context there is nothing to ground against, so fail closed
  if (req.retrievedContext.length === 0) {
    return { action: "REJECT", score: 0, reason: "No source context to verify against." };
  }

  // 1. Vectorize the "Claim" (the LLM's answer)
  const answerVector = await embedText(req.llmAnswer);

  let totalSimilarityScore = 0;

  // 2. Compare the claim against every piece of context used
  for (const sourceText of req.retrievedContext) {
    // Vectorize the source truth
    const sourceVector = await embedText(sourceText);

    // Calculate semantic overlap (1.0 = identical meaning, 0.0 = unrelated)
    const similarity = cosineSimilarity(answerVector, sourceVector);
    totalSimilarityScore += similarity;
  }

  // 3. Calculate an average "Trust Score"
  // (In production, we use weighted averages based on relevance)
  const averageTrustScore = totalSimilarityScore / req.retrievedContext.length;

  // 4. Make a Pass/Fail decision
  if (averageTrustScore < req.threshold) {
    return {
      action: "REJECT",
      score: averageTrustScore,
      reason: "The generated response does not align semantically with the provided source context."
    };
  }

  return {
    action: "PASS",
    score: averageTrustScore
  };
}
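The helpers imported above are deliberately thin. Here is a minimal sketch of what vectorUtils.ts could look like, assuming the official openai npm package for embeddings (the model name is my choice; any embeddings model works, since the math is identical):

// vectorUtils.ts: minimal sketch, assuming the official openai npm package
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function embedText(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small', // assumption; swap in your embeddings model
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), approaching 1.0 for aligned meaning
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}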
The Resulting Data Structure
For this to be useful in a real application, the middleware can’t just return true/false. The frontend needs to know why something was flagged.
If the system detects a hallucination, it generates a detailed JSON object that can be logged for engineers or used to show a warning in the UI.
{
  "id": "audit_123xyz",
  "timestamp": "2023-10-27T10:00:00Z",
  "trust_score": 0.42,
  "action": "REJECT",
  "audit_details": {
    "reason": "Critical hallucination detected. Answer claims X, but source documents contain Y.",
    "contradictions": [
      {
        "claim": "Product supports XML export",
        "source_truth": "Export formats supported: JSON, CSV only."
      }
    ]
  }
}
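In TypeScript terms, a matching result type might look like the following. The field names mirror the JSON above, but the interface itself is my sketch rather than a published schema:

interface Contradiction {
  claim: string;        // What the LLM asserted
  source_truth: string; // What the retrieved documents actually say
}

interface AuditResult {
  id: string;
  timestamp: string;   // ISO 8601
  trust_score: number; // 0.0 to 1.0
  action: 'PASS' | 'REJECT';
  audit_details?: {
    reason: string;
    contradictions: Contradiction[];
  };
}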
Conclusion
Input validation is crucial, but for production-grade AI agents, output verification is mandatory. You cannot rely solely on prompt engineering to prevent hallucinations.
By treating the LLM as an untrusted component and wrapping it with a semantic validation layer using tools like Node.js and pgvector, we can build guardrails that actually work.
I packaged this exact logic into a standalone middleware tool called AgentAudit. It’s designed to drop into existing Node/TS backends to start catching lies immediately.
I’d love to hear how you handle this problem. Are you manually reviewing logs, or do you have automated checks in place?
You can check out the interactive demo here: https://agentaudit-dashboard.vercel.app/