# HyperMind

🧠 **Intelligent Memory Layer for Large Language Models**

*Transform stateless LLMs into context-aware AI agents with persistent, optimized memory*
## ✨ What is HyperMind?
HyperMind is a production-grade memory proxy that sits between your application and any LLM provider (OpenAI, Anthropic, Groq, Google). It automatically manages conversation context and long-term memory using cognitive science principles and advanced optimization techniques, preventing vector database bloat while maintaining intelligent memory retention. Try the demo at HyperMind Chat (experimental).
## 🎯 The Problem
- LLMs are stateless - they forget everything after each conversation
- Vector databases grow indefinitely, causing performance degradation
- Building persistent memory is complex and expensive
- No intelligent filtering - everything gets stored, even irrelevant content
- Context windows are limited and expensive to extend
## 💡 The Solution
HyperMind provides a universal memory layer with comprehensive optimization:
```bash
# Instead of calling providers directly:
curl https://api.openai.com/v1/chat/completions
curl https://api.anthropic.com/v1/messages
curl https://api.groq.com/openai/v1/chat/completions
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
# Call HyperMind (same API, but with intelligent memory):
curl https://your-hypermind.workers.dev/router/v1/chat/completions
```
Your AI now remembers everything - while staying fast and cost-efficient.
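Because the endpoint mirrors the OpenAI chat-completions API, you can also point an existing SDK at it. A minimal TypeScript sketch, assuming the official `openai` npm package and the `x-hypermind-*` headers shown in the Quick Start below (the deployment URL is a placeholder):

```typescript
import OpenAI from "openai";

// Point the standard OpenAI SDK at HyperMind instead of api.openai.com.
const client = new OpenAI({
  baseURL: "https://your-hypermind.workers.dev/router/v1",
  apiKey: process.env.OPENAI_API_KEY!,
  defaultHeaders: {
    "x-hypermind-user-id": "user123",
    "x-hypermind-provider": "openai",
  },
});

const reply = await client.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "What do you remember about me?" }],
});
console.log(reply.choices[0].message.content);
```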
## 🚀 Key Features
### 🧠 Memory Router
- 🌐 **Universal Proxy**: Works with any LLM provider (OpenAI, Anthropic, Groq, Google)
- 🔄 **Multi-Provider Support**: Seamlessly switch between providers while maintaining memory
- ⚡ **Low Latency**: Transparent proxy adds <700ms overhead
- 💰 **Cost Transparent**: Uses your API keys, zero markup
### 🔍 Hybrid Search Engine
Combines three search strategies for comprehensive memory retrieval (fused as sketched below the list):
- 🎯 **Vector Search** - Semantic similarity using embeddings
- 🕸️ **Graph Traversal** - Entity relationships and knowledge graphs
- ⏰ **Chronological** - Recent context and temporal relevance
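How the three result streams are combined is internal to HyperMind; purely as an illustration, a weighted reciprocal-rank fusion over the three ranked lists could look like this in TypeScript (the top-15 cutoff comes from the architecture diagram; the types, weights, and `k` are assumptions):

```typescript
interface Memory { id: string; content: string }

// Hypothetical fusion of the three ranked result lists using weighted
// Reciprocal Rank Fusion (RRF). Weights and k are illustrative.
function fuseResults(
  vector: Memory[],
  graph: Memory[],
  recent: Memory[],
  weights = { vector: 0.5, graph: 0.3, recent: 0.2 },
  k = 60,
): Memory[] {
  const scores = new Map<string, { memory: Memory; score: number }>();
  const add = (list: Memory[], weight: number) =>
    list.forEach((m, rank) => {
      const entry = scores.get(m.id) ?? { memory: m, score: 0 };
      entry.score += weight / (k + rank + 1); // earlier rank -> larger share
      scores.set(m.id, entry);
    });
  add(vector, weights.vector);
  add(graph, weights.graph);
  add(recent, weights.recent);
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, 15) // the router injects the top 15 memories
    .map((e) => e.memory);
}
```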
### 🗜️ Intelligent Memory Optimization
Prevents vector database bloat with advanced techniques:
- 🔄 **Smart Deduplication** - Detects and merges similar memories (90% similarity threshold)
- 📊 **Significance Filtering** - Skips low-value content (greetings, filler, acknowledgments)
- 📦 **Tiered Archival** - Moves old memories through Hot→Warm→Cold→Archived tiers
- 🔗 **Memory Consolidation** - Clusters and summarizes related memories
- ⚡ **Batch Processing** - Queues embeddings for efficient API usage
Result: 40-60% storage reduction, 2-3x faster search, 50-70% fewer API calls
### 🕸️ Knowledge Graph
- 🔗 **Temporal Triplets**: Subject-Predicate-Object with time validity (see the sketch below)
- 🏷️ **Entity Extraction**: Automatic extraction of people, places, concepts
- 📚 **Episodic Classification**: Categorizes memories by type (comparison, question, definition, list, factual)
- 📉 **Smart Decay**: Different forgetting rates for different memory types
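As a rough TypeScript picture of the triplet shape described above (field names are assumptions, loosely mirroring the `temporal_triplets` table in the Database Schema section):

```typescript
// Hypothetical shape of a temporal triplet; field names are assumed.
type EpisodicType = "comparison" | "question" | "definition" | "list" | "factual";

interface TemporalTriplet {
  subject: string;     // e.g. "user123"
  predicate: string;   // e.g. "prefers"
  object: string;      // e.g. "TypeScript"
  episodicType: EpisodicType;
  validFrom: string;   // ISO date the fact became true
  validTo?: string;    // open-ended if the fact still holds
}

const fact: TemporalTriplet = {
  subject: "user123",
  predicate: "prefers",
  object: "TypeScript",
  episodicType: "factual",
  validFrom: "2024-01-01",
};
```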
### ⏱️ Cognitive Science Integration
Based on Ebbinghaus' forgetting curve:
| Tier | Age | Vector Search | Status |
|---|---|---|---|
| 🔥 Hot | 0-7 days | Active | Full access |
| 🌡️ Warm | 7-30 days | Active | Full access |
| ❄️ Cold | 30-90 days | Active | Lower priority |
| 📦 Archived | 90+ days | Removed | D1 only |
| 🗄️ Ancient | 180+ days | Compressed | R2 storage (optional) |
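The tier boundaries above translate directly into code. A small TypeScript sketch, with the standard Ebbinghaus retention formula `R = e^(-t/S)` included for the per-type decay idea (the stability values are illustrative, not HyperMind's actual parameters):

```typescript
type Tier = "hot" | "warm" | "cold" | "archived" | "ancient";

// Map a memory's age in days onto the tiers from the table above.
function tierFor(ageDays: number): Tier {
  if (ageDays < 7) return "hot";
  if (ageDays < 30) return "warm";
  if (ageDays < 90) return "cold";
  if (ageDays < 180) return "archived";
  return "ancient";
}

// Ebbinghaus retention R = e^(-t/S), where t is age in days and S is a
// stability constant. The per-type stabilities below are illustrative only.
function retention(ageDays: number, episodicType: string): number {
  const stability: Record<string, number> = {
    factual: 90, definition: 60, comparison: 30, list: 30, question: 14,
  };
  return Math.exp(-ageDays / (stability[episodicType] ?? 30));
}
```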
## 🚀 Quick Start
### 1. Deploy HyperMind (1-click)

### 2. Get Your API Key
Sign up for any LLM provider:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3.5)
- Groq (Llama 3.3) - Free tier available
- Google (Gemini 2.0)
### 3. Make Your First Request
```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "I am building a quantum computing system with 127 qubits"}
    ]
  }'
```
### 4. Test Memory Recall
```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "What quantum computing project am I working on?"}
    ]
  }'
```
Response: "You're building a quantum computing system with 127 qubits..." ✨
## 🏗️ Architecture

### Request Flow
```mermaid
sequenceDiagram
participant App as Your Application
participant Router as Memory Router
participant Search as Hybrid Search
participant Storage as Storage Layer
participant LLM as LLM Provider
participant Optim as Optimization
App->>Router: Chat Request<br/>(user message)
Note over Router: Step 1: Memory Retrieval
Router->>Search: Find relevant memories
par Parallel Search
Search->>Storage: Vector Search (semantic)
Search->>Storage: Graph Traversal (entities)
Search->>Storage: Chronological (recent)
end
Storage-->>Search: Combined Results
Search-->>Router: Top 15 relevant memories
Note over Router: Step 2: Context Injection
Router->>Router: Inject memories into prompt
Note over Router: Step 3: LLM Request
Router->>LLM: Enhanced request<br/>(with context)
LLM-->>Router: Response
Router-->>App: Final Response<br/>(with memory)
Note over Router: Step 4: Background Storage
Router->>Optim: Store conversation async
Optim->>Optim: Analyze Significance<br/>(score: 0.0-1.0)
alt Low Significance (< 0.6)
Optim->>Optim: Discard ❌
else High Significance (>= 0.6)
Optim->>Optim: Check for duplicates<br/>(hash + similarity)
alt Similar Memory Found (> 0.9)
Optim->>Storage: Merge with existing 🔄
else New Memory
Optim->>Optim: Add to batch queue
Optim->>Storage: Store when batch full
end
end
Note over Storage: Tiered Storage
Storage->>Storage: Hot (0-7d): Active<br/>Warm (7-30d): Active<br/>Cold (30-90d): Active<br/>Archived (90d+): D1 only<br/>Ancient (180d+): R2
```
### Storage Infrastructure
| Layer | Technology | Purpose | Data Retention |
|---|---|---|---|
| Active Index | Cloudflare Vectorize | Semantic search on hot/warm/cold memories | 0-90 days |
| Primary DB | Cloudflare D1 (SQLite) | All memories, entities, triplets | Forever |
| Query Cache | Cloudflare KV | LLM analysis results | 1 hour TTL |
| Cold Archive | Cloudflare R2 (optional) | Compressed ancient memories | 180+ days |
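In a Cloudflare Worker, these four layers would appear as bindings. A hypothetical `Env` interface (binding names are placeholders; the real ones are set in `wrangler.toml`), assuming `@cloudflare/workers-types`:

```typescript
// Hypothetical Worker bindings for the four storage layers above.
// Binding names are placeholders; the actual ones come from wrangler.toml.
interface Env {
  VECTORIZE: VectorizeIndex; // active semantic index (0-90 days)
  DB: D1Database;            // primary store, retained forever
  CACHE: KVNamespace;        // cached LLM analysis results (1 hour TTL)
  ARCHIVE?: R2Bucket;        // optional compressed ancient memories
}
```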
### Optimization Pipeline

```
Incoming Memory
      ↓
[Significance Analysis]
      ↓
Score < 0.6? → Discard ❌
      ↓
[Hash Check]
      ↓
Duplicate? → Skip ❌
      ↓
[Similarity Check]
      ↓
Similar (>0.9)? → Merge 🔄
      ↓
[Batch Queue]
      ↓
Queue Full (50)? → Process Batch
      ↓
[Vector Storage]
      ↓
Stored ✅
```
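The same gates, condensed into a runnable TypeScript sketch (the significance score and nearest-neighbor similarity are taken as inputs here; fuller sketches of both appear under Advanced Features):

```typescript
// Runnable sketch of the pipeline gates above. The significance score and
// the nearest-neighbor similarity are supplied by earlier analysis steps.
const seenHashes = new Set<string>();
const queue: string[] = [];

function contentHash(s: string): string {
  return s.trim().toLowerCase(); // stand-in for a real content hash
}

function admit(
  content: string,
  significance: number,      // 0.0-1.0 from the significance analysis
  nearestSimilarity: number, // cosine similarity to the closest memory
): "discarded" | "skipped" | "merged" | "queued" | "flushed" {
  if (significance < 0.6) return "discarded";   // low-value content
  const hash = contentHash(content);
  if (seenHashes.has(hash)) return "skipped";   // exact duplicate
  seenHashes.add(hash);
  if (nearestSimilarity > 0.9) return "merged"; // near-duplicate
  queue.push(content);
  if (queue.length >= 50) {                     // batch is full
    queue.length = 0;                           // hand off for embedding
    return "flushed";
  }
  return "queued";
}
```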
## 📚 Usage Examples

### Memory Router API
```bash
# Chat with memory (works with any provider)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "My favorite programming language is Python"}
    ]
  }'

# Switch to a different provider (memory persists)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: openai" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What programming language do I prefer?"}
    ]
  }'
```
### Direct Memory API
```bash
# Store a memory manually
curl -X POST "https://your-hypermind.workers.dev/api/memories?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I prefer TypeScript over JavaScript",
    "metadata": {"source": "manual", "tags": ["programming"]}
  }'

# Search memories
curl -X POST "https://your-hypermind.workers.dev/api/search?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "programming preferences",
    "limit": 5
  }'
```
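The same two calls from TypeScript using `fetch` (the base URL is a placeholder and the JSON response shape is an assumption):

```typescript
// Store and search memories via the Direct Memory API.
const base = "https://your-hypermind.workers.dev/api";

await fetch(`${base}/memories?userId=user123`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    content: "I prefer TypeScript over JavaScript",
    metadata: { source: "manual", tags: ["programming"] },
  }),
});

const res = await fetch(`${base}/search?userId=user123`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "programming preferences", limit: 5 }),
});
console.log(await res.json()); // response shape is assumed, not documented
```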
## ⚡ Performance & Optimization

### Optimization Features
| Feature | Impact | Description |
|---|---|---|
| Smart Deduplication | 20-30% reduction | Merges similar memories (cosine similarity > 0.90) |
| Significance Filtering | 30-40% reduction | Skips greetings, filler, low-value content |
| Tiered Archival | 2-3x faster search | Removes old memories from active vector index |
| Memory Consolidation | 30-40% reduction | Clusters related memories into summaries |
| Batch Processing | 50-70% fewer API calls | Queues embeddings for batch processing |
### Performance Benchmarks
Before Optimization:
- Storage: Linear growth, indefinite
- Search: 5-10s for 10k+ memories
- API Calls: Every conversation = 1+ embedding calls
After Optimization:
- Storage: 40-60% reduction
- Search: 2-3s for 10k+ memories (2-3x faster)
- API Calls: 50-70% reduction via batching
### Configuration

Customize optimization thresholds in `wrangler.toml`:
```toml
[vars]
DEDUP_SIMILARITY_THRESHOLD = "0.90"  # 0.85-0.95 recommended
MIN_SIGNIFICANCE_SCORE = "0.60"      # 0.5-0.7 recommended
CONSOLIDATION_ENABLED = "true"       # Enable memory consolidation
BATCH_EMBEDDING_SIZE = "50"          # Batch size: 10-100
ARCHIVE_COLD_AFTER_DAYS = "90"       # Days before archival: 60-180
```
### Automated Maintenance
HyperMind runs automated tasks via cron triggers:
| Task | Schedule | Purpose |
|---|---|---|
| Forgetting Cycle | Daily, 2 AM | Update relevance scores, archive old memories |
| Consolidation | Daily, 3 AM | Cluster and summarize related memories |
| Batch Processing | Every 30 min | Process queued embeddings |
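Inside the Worker, a `scheduled` handler would dispatch on the cron expression. A sketch assuming the schedules above map to `0 2 * * *`, `0 3 * * *`, and `*/30 * * * *` in `wrangler.toml`, with hypothetical task functions:

```typescript
// Hypothetical task functions; Env as sketched in the Architecture section.
declare function runForgettingCycle(env: unknown): Promise<void>;
declare function runConsolidation(env: unknown): Promise<void>;
declare function processEmbeddingQueue(env: unknown): Promise<void>;

export default {
  async scheduled(
    event: { cron: string },
    env: unknown,
    ctx: { waitUntil(p: Promise<unknown>): void },
  ) {
    switch (event.cron) {
      case "0 2 * * *":    ctx.waitUntil(runForgettingCycle(env)); break;    // forgetting cycle
      case "0 3 * * *":    ctx.waitUntil(runConsolidation(env)); break;      // consolidation
      case "*/30 * * * *": ctx.waitUntil(processEmbeddingQueue(env)); break; // batch embeddings
    }
  },
};
```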
## 🛠️ Development

### Local Setup
```bash
git clone https://github.com/yourusername/hypermind.git
cd hypermind
npm install
npm run dev
```
### Environment Setup
```bash
# Create Cloudflare resources
wrangler d1 create hypermind-prod
wrangler vectorize create hypermind-embeddings --dimensions=768 --metric=cosine
wrangler kv:namespace create CACHE

# Optional: Create R2 bucket for ancient memory archival
wrangler r2 bucket create hypermind-archive

# Update wrangler.toml with your resource IDs
```
### Database Migration

```bash
# Apply migrations to production
wrangler d1 migrations apply hypermind-prod --remote
```
### Testing

```bash
npm test              # Run tests
npm run test:coverage # With coverage
npm run lint          # Code quality
```
## 📊 Database Schema

### Core Tables

- `memories`: Conversation storage with optimization metadata
- `memory_consolidations`: Tracks consolidated memory summaries
- `entities`: Extracted entities (people, places, concepts)
- `temporal_triplets`: Subject-Predicate-Object relationships
- `forgetting_config`: Per-user decay settings
### Optimization Fields

```sql
-- New fields in memories table
significance_score REAL DEFAULT 1.0  -- 0.0-1.0 importance score
consolidated INTEGER DEFAULT 0       -- Is this memory consolidated?
consolidated_into TEXT               -- Reference to summary memory
vector_archived INTEGER DEFAULT 0    -- Removed from vector index?
r2_archived INTEGER DEFAULT 0        -- Stored in R2?
dedup_hash TEXT                      -- Hash for duplicate detection
```
### Knowledge Graph

```sql
-- Example temporal triplet
INSERT INTO temporal_triplets (subject, predicate, object, episodic_type, valid_from)
VALUES ('user123', 'prefers', 'TypeScript', 'factual', '2024-01-01');
```
## 🎯 Use Cases

### 🤖 AI Chatbots
Build chatbots that remember user preferences, conversation history, and context across sessions - without bloating your database.
### 📚 RAG Applications
Use HyperMind as your vector store with automatic optimization for document-based AI applications.
### 🎓 Educational AI
Create AI tutors that remember student progress, learning patterns, and knowledge gaps - with intelligent consolidation.
### 💼 Business AI
Build AI assistants that remember customer interactions while archiving old, irrelevant data automatically.
### 🎮 Gaming AI
Create NPCs with persistent memory that evolves and consolidates over time.
## 🔧 Advanced Features

### Smart Deduplication
Prevents storing duplicate or near-duplicate memories:
```javascript
// Automatic similarity detection
const similarity = cosineSimilarity(newEmbedding, existingEmbedding);
if (similarity > 0.90) {
  // Merge with the existing memory instead of creating a new one
  await mergeMemories(existing, newContent);
}
```
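For completeness, `cosineSimilarity` here is just the standard formula over two equal-length embedding vectors (a generic implementation, not HyperMind-specific code):

```typescript
// Standard cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```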
### Significance Filtering

Filters out low-value content automatically (a heuristic sketch follows the list):
- ❌ Generic greetings: "hi", "hello", "thanks"
- ❌ Acknowledgments: "ok", "got it", "understood"
- ❌ Emoji-only messages
- ❌ Very short content (< 20 characters)
- ✅ Technical discussions (high significance score)
- ✅ Personal information (high significance score)
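A heuristic scorer consistent with these rules might look like the following (the patterns and weights are illustrative; the architecture diagram suggests the real analysis may involve an LLM):

```typescript
// Illustrative significance heuristic matching the rules above.
// Patterns and weights are assumptions, not HyperMind's actual scorer.
function scoreSignificance(content: string): number {
  const text = content.trim();
  if (text.length < 20) return 0.1;                           // very short
  if (/^\p{Extended_Pictographic}+$/u.test(text)) return 0.0; // emoji-only
  if (/^(hi|hello|thanks|ok|got it|understood)[.!]?$/i.test(text)) return 0.1;
  let score = 0.5;
  if (/\b(function|api|database|algorithm|config)\b/i.test(text)) score += 0.3;
  if (/\b(my|i am|i'm|i prefer|i work)\b/i.test(text)) score += 0.2; // personal
  return Math.min(score, 1.0);
}
```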
### Memory Consolidation

Automatically clusters and summarizes related memories:

```
// Daily consolidation process
1. Find related memories (cosine similarity > 0.70)
2. Group into clusters (3+ memories per cluster)
3. Generate summary memory
4. Mark originals as consolidated
5. Update vector index with summary
```
Result: 30-40% reduction in active corpus size
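A greedy single-pass clustering over steps 1-2 could be sketched like this (the 0.70 threshold and the 3+ cluster size come from the steps above; everything else is illustrative):

```typescript
interface Mem { id: string; embedding: number[] }

// Reuse the cosine similarity from the deduplication sketch above.
declare function cosineSimilarity(a: number[], b: number[]): number;

// Greedy single-pass clustering: each memory joins the first cluster
// whose seed it resembles (cosine similarity > 0.70).
function clusterForConsolidation(memories: Mem[]): Mem[][] {
  const clusters: Mem[][] = [];
  for (const m of memories) {
    const home = clusters.find(
      (c) => cosineSimilarity(c[0].embedding, m.embedding) > 0.7,
    );
    if (home) home.push(m);
    else clusters.push([m]);
  }
  // Only clusters with 3+ members get summarized (step 2).
  return clusters.filter((c) => c.length >= 3);
}
```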
### Tiered Archival

Automatically moves memories through storage tiers:

```
// Archival process
Hot (0-7d)    → Full vector search, all features active
Warm (7-30d)  → Full vector search, lower priority
Cold (30-90d) → Vector search only if needed
Archived      → Removed from vector index, D1 only
Ancient       → Compressed, stored in R2 (optional)
```
Result: 2-3x faster search on large datasets
## 🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
### Development Workflow

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Areas for Contribution

- 🐛 Bug fixes
- ✨ New features (LLM-powered summarization, multi-language support)
- 📚 Documentation improvements
- 🧪 Test coverage
- 🎨 UI/UX enhancements
- ⚡ Performance optimizations
## 📄 License

This project is licensed under the MIT License - see the [LICENSE](https://opensource.org/license/MIT) for details.
## 🙏 Acknowledgments
- Ebbinghaus for the forgetting curve research
- Cloudflare for the amazing Workers platform