# HyperMind

🧠 **Intelligent Memory Layer for Large Language Models**

*Transform stateless LLMs into context-aware AI agents with persistent, optimized memory*
## ✨ What is HyperMind?
HyperMind is a production-grade memory proxy that sits between your application and any LLM provider (OpenAI, Anthropic, Groq, Google). It automatically manages conversation context and long-term memory using cognitive science principles and advanced optimization techniques, preventing vector database bloat while maintaining intelligent memory retention. Try the demo at HyperMind Chat (experimental).
## 🎯 The Problem
- LLMs are stateless - they forget everything after each conversation
- Vector databases grow indefinitely, causing performance degradation
- Building persistent memory is complex and expensive
- No intelligent filtering - everything gets stored, even irrelevant content
- Context windows are limited and expensive to extend
## 💡 The Solution
HyperMind provides a universal memory layer with comprehensive optimization:
```bash
# Instead of calling providers directly:
curl https://api.openai.com/v1/chat/completions
curl https://api.anthropic.com/v1/messages
curl https://api.groq.com/openai/v1/chat/completions
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
# Call HyperMind (same API, but with intelligent memory):
curl https://your-hypermind.workers.dev/router/v1/chat/completions
```
Your AI now remembers everything - while staying fast and cost-efficient.
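Because the endpoint mirrors the OpenAI chat-completions API, you can also point an existing SDK at it. A minimal TypeScript sketch, assuming the official `openai` npm package and the `x-hypermind-*` headers shown in the Quick Start below (the deployment URL is a placeholder):

```typescript
import OpenAI from "openai";

// Point the standard OpenAI SDK at HyperMind instead of api.openai.com.
const client = new OpenAI({
  baseURL: "https://your-hypermind.workers.dev/router/v1",
  apiKey: process.env.OPENAI_API_KEY!,
  defaultHeaders: {
    "x-hypermind-user-id": "user123",
    "x-hypermind-provider": "openai",
  },
});

const reply = await client.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "What do you remember about me?" }],
});
console.log(reply.choices[0].message.content);
```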
## 🚀 Key Features
### 🧠 Memory Router
- 🌐 **Universal Proxy**: Works with any LLM provider (OpenAI, Anthropic, Groq, Google)
- 🔄 **Multi-Provider Support**: Seamlessly switch between providers while maintaining memory
- ⚡ **Low Latency**: Transparent proxy adds <700ms overhead
- 💰 **Cost Transparent**: Uses your API keys, zero markup
### 🔍 Hybrid Search Engine
Combines three search strategies for comprehensive memory retrieval (fused as sketched below the list):
- 🎯 **Vector Search** - Semantic similarity using embeddings
- 🕸️ **Graph Traversal** - Entity relationships and knowledge graphs
- ⏰ **Chronological** - Recent context and temporal relevance
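How the three result streams are combined is internal to HyperMind; purely as an illustration, a weighted reciprocal-rank fusion over the three ranked lists could look like this in TypeScript (the top-15 cutoff comes from the architecture diagram; the types, weights, and `k` are assumptions):

```typescript
interface Memory { id: string; content: string }

// Hypothetical fusion of the three ranked result lists using weighted
// Reciprocal Rank Fusion (RRF). Weights and k are illustrative.
function fuseResults(
  vector: Memory[],
  graph: Memory[],
  recent: Memory[],
  weights = { vector: 0.5, graph: 0.3, recent: 0.2 },
  k = 60,
): Memory[] {
  const scores = new Map<string, { memory: Memory; score: number }>();
  const add = (list: Memory[], weight: number) =>
    list.forEach((m, rank) => {
      const entry = scores.get(m.id) ?? { memory: m, score: 0 };
      entry.score += weight / (k + rank + 1); // earlier rank -> larger share
      scores.set(m.id, entry);
    });
  add(vector, weights.vector);
  add(graph, weights.graph);
  add(recent, weights.recent);
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, 15) // the router injects the top 15 memories
    .map((e) => e.memory);
}
```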
### 🗜️ Intelligent Memory Optimization
Prevents vector database bloat with advanced techniques:
- 🔄 **Smart Deduplication** - Detects and merges similar memories (90% similarity threshold)
- 📊 **Significance Filtering** - Skips low-value content (greetings, filler, acknowledgments)
- 📦 **Tiered Archival** - Moves old memories through Hot→Warm→Cold→Archived tiers
- 🔗 **Memory Consolidation** - Clusters and summarizes related memories
- ⚡ **Batch Processing** - Queues embeddings for efficient API usage
Result: 40-60% storage reduction, 2-3x faster search, 50-70% fewer API calls
### 🕸️ Knowledge Graph
- 🔗 **Temporal Triplets**: Subject-Predicate-Object with time validity (see the sketch below)
- 🏷️ **Entity Extraction**: Automatic extraction of people, places, concepts
- 📚 **Episodic Classification**: Categorizes memories by type (comparison, question, definition, list, factual)
- 📉 **Smart Decay**: Different forgetting rates for different memory types
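As a rough TypeScript picture of the triplet shape described above (field names are assumptions, loosely mirroring the `temporal_triplets` table in the Database Schema section):

```typescript
// Hypothetical shape of a temporal triplet; field names are assumed.
type EpisodicType = "comparison" | "question" | "definition" | "list" | "factual";

interface TemporalTriplet {
  subject: string;     // e.g. "user123"
  predicate: string;   // e.g. "prefers"
  object: string;      // e.g. "TypeScript"
  episodicType: EpisodicType;
  validFrom: string;   // ISO date the fact became true
  validTo?: string;    // open-ended if the fact still holds
}

const fact: TemporalTriplet = {
  subject: "user123",
  predicate: "prefers",
  object: "TypeScript",
  episodicType: "factual",
  validFrom: "2024-01-01",
};
```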
### ⏱️ Cognitive Science Integration
Based on Ebbinghaus' forgetting curve:
| Tier | Age | Vector Search | Status |
|---|---|---|---|
| 🔥 Hot | 0-7 days | Active | Full access |
| 🌡️ Warm | 7-30 days | Active | Full access |
| ❄️ Cold | 30-90 days | Active | Lower priority |
| 📦 Archived | 90+ days | Removed | D1 only |
| 🗄️ Ancient | 180+ days | Compressed | R2 storage (optional) |
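The tier boundaries above translate directly into code. A small TypeScript sketch, with the standard Ebbinghaus retention formula `R = e^(-t/S)` included for the per-type decay idea (the stability values are illustrative, not HyperMind's actual parameters):

```typescript
type Tier = "hot" | "warm" | "cold" | "archived" | "ancient";

// Map a memory's age in days onto the tiers from the table above.
function tierFor(ageDays: number): Tier {
  if (ageDays < 7) return "hot";
  if (ageDays < 30) return "warm";
  if (ageDays < 90) return "cold";
  if (ageDays < 180) return "archived";
  return "ancient";
}

// Ebbinghaus retention R = e^(-t/S), where t is age in days and S is a
// stability constant. The per-type stabilities below are illustrative only.
function retention(ageDays: number, episodicType: string): number {
  const stability: Record<string, number> = {
    factual: 90, definition: 60, comparison: 30, list: 30, question: 14,
  };
  return Math.exp(-ageDays / (stability[episodicType] ?? 30));
}
```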
## 🚀 Quick Start
### 1. Deploy HyperMind (1-click)

### 2. Get Your API Key
Sign up for any LLM provider:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3.5)
- Groq (Llama 3.3) - Free tier available
- Google (Gemini 2.0)
### 3. Make Your First Request
```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "I am building a quantum computing system with 127 qubits"}
    ]
  }'
```
### 4. Test Memory Recall
```bash
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "What quantum computing project am I working on?"}
    ]
  }'
```
Response: "You're building a quantum computing system with 127 qubits..." ✨
## 🏗️ Architecture

### Request Flow
```mermaid
sequenceDiagram
participant App as Your Application
participant Router as Memory Router
participant Search as Hybrid Search
participant Storage as Storage Layer
participant LLM as LLM Provider
participant Optim as Optimization
App->>Router: Chat Request<br/>(user message)
Note over Router: Step 1: Memory Retrieval
Router->>Search: Find relevant memories
par Parallel Search
Search->>Storage: Vector Search (semantic)
Search->>Storage: Graph Traversal (entities)
Search->>Storage: Chronological (recent)
end
Storage-->>Search: Combined Results
Search-->>Router: Top 15 relevant memories
Note over Router: Step 2: Context Injection
Router->>Router: Inject memories into prompt
Note over Router: Step 3: LLM Request
Router->>LLM: Enhanced request<br/>(with context)
LLM-->>Router: Response
Router-->>App: Final Response<br/>(with memory)
Note over Router: Step 4: Background Storage
Router->>Optim: Store conversation async
Optim->>Optim: Analyze Significance<br/>(score: 0.0-1.0)
alt Low Significance (< 0.6)
Optim->>Optim: Discard ❌
else High Significance (>= 0.6)
Optim->>Optim: Check for duplicates<br/>(hash + similarity)
alt Similar Memory Found (> 0.9)
Optim->>Storage: Merge with existing 🔄
else New Memory
Optim->>Optim: Add to batch queue
Optim->>Storage: Store when batch full
end
end
Note over Storage: Tiered Storage
Storage->>Storage: Hot (0-7d): Active<br/>Warm (7-30d): Active<br/>Cold (30-90d): Active<br/>Archived (90d+): D1 only<br/>Ancient (180d+): R2
```
### Storage Infrastructure
| Layer | Technology | Purpose | Data Retention |
|---|---|---|---|
| Active Index | Cloudflare Vectorize | Semantic search on hot/warm/cold memories | 0-90 days |
| Primary DB | Cloudflare D1 (SQLite) | All memories, entities, triplets | Forever |
| Query Cache | Cloudflare KV | LLM analysis results | 1 hour TTL |
| Cold Archive | Cloudflare R2 (optional) | Compressed ancient memories | 180+ days |
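In a Cloudflare Worker, these four layers would appear as bindings. A hypothetical `Env` interface (binding names are placeholders; the real ones are set in `wrangler.toml`), assuming `@cloudflare/workers-types`:

```typescript
// Hypothetical Worker bindings for the four storage layers above.
// Binding names are placeholders; the actual ones come from wrangler.toml.
interface Env {
  VECTORIZE: VectorizeIndex; // active semantic index (0-90 days)
  DB: D1Database;            // primary store, retained forever
  CACHE: KVNamespace;        // cached LLM analysis results (1 hour TTL)
  ARCHIVE?: R2Bucket;        // optional compressed ancient memories
}
```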
### Optimization Pipeline

```
Incoming Memory
      ↓
[Significance Analysis]
      ↓
Score < 0.6? → Discard ❌
      ↓
[Hash Check]
      ↓
Duplicate? → Skip ❌
      ↓
[Similarity Check]
      ↓
Similar (>0.9)? → Merge 🔄
      ↓
[Batch Queue]
      ↓
Queue Full (50)? → Process Batch
      ↓
[Vector Storage]
      ↓
Stored ✅
```
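The same gates, condensed into a runnable TypeScript sketch (the significance score and nearest-neighbor similarity are taken as inputs here; fuller sketches of both appear under Advanced Features):

```typescript
// Runnable sketch of the pipeline gates above. The significance score and
// the nearest-neighbor similarity are supplied by earlier analysis steps.
const seenHashes = new Set<string>();
const queue: string[] = [];

function contentHash(s: string): string {
  return s.trim().toLowerCase(); // stand-in for a real content hash
}

function admit(
  content: string,
  significance: number,      // 0.0-1.0 from the significance analysis
  nearestSimilarity: number, // cosine similarity to the closest memory
): "discarded" | "skipped" | "merged" | "queued" | "flushed" {
  if (significance < 0.6) return "discarded";   // low-value content
  const hash = contentHash(content);
  if (seenHashes.has(hash)) return "skipped";   // exact duplicate
  seenHashes.add(hash);
  if (nearestSimilarity > 0.9) return "merged"; // near-duplicate
  queue.push(content);
  if (queue.length >= 50) {                     // batch is full
    queue.length = 0;                           // hand off for embedding
    return "flushed";
  }
  return "queued";
}
```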
## 📚 Usage Examples

### Memory Router API
```bash
# Chat with memory (works with any provider)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: groq" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "My favorite programming language is Python"}
    ]
  }'

# Switch to a different provider (memory persists)
curl -X POST "https://your-hypermind.workers.dev/router/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "x-hypermind-user-id: user123" \
  -H "x-hypermind-provider: openai" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What programming language do I prefer?"}
    ]
  }'
```
### Direct Memory API
```bash
# Store a memory manually
curl -X POST "https://your-hypermind.workers.dev/api/memories?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "I prefer TypeScript over JavaScript",
    "metadata": {"source": "manual", "tags": ["programming"]}
  }'

# Search memories
curl -X POST "https://your-hypermind.workers.dev/api/search?userId=user123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "programming preferences",
    "limit": 5
  }'
```
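The same two calls from TypeScript using `fetch` (the base URL is a placeholder and the JSON response shape is an assumption):

```typescript
// Store and search memories via the Direct Memory API.
const base = "https://your-hypermind.workers.dev/api";

await fetch(`${base}/memories?userId=user123`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    content: "I prefer TypeScript over JavaScript",
    metadata: { source: "manual", tags: ["programming"] },
  }),
});

const res = await fetch(`${base}/search?userId=user123`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "programming preferences", limit: 5 }),
});
console.log(await res.json()); // response shape is assumed, not documented
```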
## ⚡ Performance & Optimization

### Optimization Features
| Feature | Impact | Description |
|---|---|---|
| Smart Deduplication | 20-30% reduction | Merges similar memories (cosine similarity > 0.90) |
| Significance Filtering | 30-40% reduction | Skips greetings, filler, low-value content |
| Tiered Archival | 2-3x faster search | Removes old memories from active vector index |
| Memory Consolidation | 30-40% reduction | Clusters related memories into summaries |
| Batch Processing | 50-70% fewer API calls | Queues embeddings for batch processing |
### Performance Benchmarks
Before Optimization:
- Storage: Linear growth, indefinite
- Search: 5-10s for 10k+ memories
- API Calls: Every conversation = 1+ embedding calls
After Optimization:
- Storage: 40-60% reduction
- Search: 2-3s for 10k+ memories (2-3x faster)
- API Calls: 50-70% reduction via batching
### Configuration

Customize optimization thresholds in `wrangler.toml`:
```toml
[vars]
DEDUP_SIMILARITY_THRESHOLD = "0.90"  # 0.85-0.95 recommended
MIN_SIGNIFICANCE_SCORE = "0.60"      # 0.5-0.7 recommended
CONSOLIDATION_ENABLED = "true"       # Enable memory consolidation
BATCH_EMBEDDING_SIZE = "50"          # Batch size: 10-100
ARCHIVE_COLD_AFTER_DAYS = "90"       # Days before archival: 60-180
```
### Automated Maintenance
HyperMind runs automated tasks via cron triggers:
| Task | Schedule | Purpose |
|---|---|---|
| Forgetting Cycle | Daily, 2 AM | Update relevance scores, archive old memories |
| Consolidation | Daily, 3 AM | Cluster and summarize related memories |
| Batch Processing | Every 30 min | Process queued embeddings |
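Inside the Worker, a `scheduled` handler would dispatch on the cron expression. A sketch assuming the schedules above map to `0 2 * * *`, `0 3 * * *`, and `*/30 * * * *` in `wrangler.toml`, with hypothetical task functions:

```typescript
// Hypothetical task functions; Env as sketched in the Architecture section.
declare function runForgettingCycle(env: unknown): Promise<void>;
declare function runConsolidation(env: unknown): Promise<void>;
declare function processEmbeddingQueue(env: unknown): Promise<void>;

export default {
  async scheduled(
    event: { cron: string },
    env: unknown,
    ctx: { waitUntil(p: Promise<unknown>): void },
  ) {
    switch (event.cron) {
      case "0 2 * * *":    ctx.waitUntil(runForgettingCycle(env)); break;    // forgetting cycle
      case "0 3 * * *":    ctx.waitUntil(runConsolidation(env)); break;      // consolidation
      case "*/30 * * * *": ctx.waitUntil(processEmbeddingQueue(env)); break; // batch embeddings
    }
  },
};
```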
## 🛠️ Development

### Local Setup
```bash
git clone https://github.com/yourusername/hypermind.git
cd hypermind
npm install
npm run dev
```
### Environment Setup
```bash
# Create Cloudflare resources
wrangler d1 create hypermind-prod
wrangler vectorize create hypermind-embeddings --dimensions=768 --metric=cosine
wrangler kv:namespace create CACHE

# Optional: Create R2 bucket for ancient memory archival
wrangler r2 bucket create hypermind-archive

# Update wrangler.toml with your resource IDs
```
### Database Migration

```bash
# Apply migrations to production
wrangler d1 migrations apply hypermind-prod --remote
```
### Testing

```bash
npm test              # Run tests
npm run test:coverage # With coverage
npm run lint          # Code quality
```
## 📊 Database Schema

### Core Tables

- `memories`: Conversation storage with optimization metadata
- `memory_consolidations`: Tracks consolidated memory summaries
- `entities`: Extracted entities (people, places, concepts)
- `temporal_triplets`: Subject-Predicate-Object relationships
- `forgetting_config`: Per-user decay settings
### Optimization Fields

```sql
-- New fields in memories table
significance_score REAL DEFAULT 1.0  -- 0.0-1.0 importance score
consolidated INTEGER DEFAULT 0       -- Is this memory consolidated?
consolidated_into TEXT               -- Reference to summary memory
vector_archived INTEGER DEFAULT 0    -- Removed from vector index?
r2_archived INTEGER DEFAULT 0        -- Stored in R2?
dedup_hash TEXT                      -- Hash for duplicate detection
```
### Knowledge Graph

```sql
-- Example temporal triplet
INSERT INTO temporal_triplets (subject, predicate, object, episodic_type, valid_from)
VALUES ('user123', 'prefers', 'TypeScript', 'factual', '2024-01-01');
```
## 🎯 Use Cases

### 🤖 AI Chatbots
Build chatbots that remember user preferences, conversation history, and context across sessions - without bloating your database.
### 📚 RAG Applications
Use HyperMind as your vector store with automatic optimization for document-based AI applications.
### 🎓 Educational AI
Create AI tutors that remember student progress, learning patterns, and knowledge gaps - with intelligent consolidation.
### 💼 Business AI
Build AI assistants that remember customer interactions while archiving old, irrelevant data automatically.
### 🎮 Gaming AI
Create NPCs with persistent memory that evolves and consolidates over time.
## 🔧 Advanced Features

### Smart Deduplication
Prevents storing duplicate or near-duplicate memories:
```javascript
// Automatic similarity detection
const similarity = cosineSimilarity(newEmbedding, existingEmbedding);
if (similarity > 0.90) {
  // Merge with the existing memory instead of creating a new one
  await mergeMemories(existing, newContent);
}
```
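For completeness, `cosineSimilarity` here is just the standard formula over two equal-length embedding vectors (a generic implementation, not HyperMind-specific code):

```typescript
// Standard cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```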
### Significance Filtering

Filters out low-value content automatically (a heuristic sketch follows the list):
- ❌ Generic greetings: "hi", "hello", "thanks"
- ❌ Acknowledgments: "ok", "got it", "understood"
- ❌ Emoji-only messages
- ❌ Very short content (< 20 characters)
- ✅ Technical discussions (high significance score)
- ✅ Personal information (high significance score)
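A heuristic scorer consistent with these rules might look like the following (the patterns and weights are illustrative; the architecture diagram suggests the real analysis may involve an LLM):

```typescript
// Illustrative significance heuristic matching the rules above.
// Patterns and weights are assumptions, not HyperMind's actual scorer.
function scoreSignificance(content: string): number {
  const text = content.trim();
  if (text.length < 20) return 0.1;                           // very short
  if (/^\p{Extended_Pictographic}+$/u.test(text)) return 0.0; // emoji-only
  if (/^(hi|hello|thanks|ok|got it|understood)[.!]?$/i.test(text)) return 0.1;
  let score = 0.5;
  if (/\b(function|api|database|algorithm|config)\b/i.test(text)) score += 0.3;
  if (/\b(my|i am|i'm|i prefer|i work)\b/i.test(text)) score += 0.2; // personal
  return Math.min(score, 1.0);
}
```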
### Memory Consolidation

Automatically clusters and summarizes related memories:

```
// Daily consolidation process
1. Find related memories (cosine similarity > 0.70)
2. Group into clusters (3+ memories per cluster)
3. Generate summary memory
4. Mark originals as consolidated
5. Update vector index with summary
```
Result: 30-40% reduction in active corpus size
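A greedy single-pass clustering over steps 1-2 could be sketched like this (the 0.70 threshold and the 3+ cluster size come from the steps above; everything else is illustrative):

```typescript
interface Mem { id: string; embedding: number[] }

// Reuse the cosine similarity from the deduplication sketch above.
declare function cosineSimilarity(a: number[], b: number[]): number;

// Greedy single-pass clustering: each memory joins the first cluster
// whose seed it resembles (cosine similarity > 0.70).
function clusterForConsolidation(memories: Mem[]): Mem[][] {
  const clusters: Mem[][] = [];
  for (const m of memories) {
    const home = clusters.find(
      (c) => cosineSimilarity(c[0].embedding, m.embedding) > 0.7,
    );
    if (home) home.push(m);
    else clusters.push([m]);
  }
  // Only clusters with 3+ members get summarized (step 2).
  return clusters.filter((c) => c.length >= 3);
}
```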
### Tiered Archival

Automatically moves memories through storage tiers:

```
// Archival process
Hot (0-7d)    → Full vector search, all features active
Warm (7-30d)  → Full vector search, lower priority
Cold (30-90d) → Vector search only if needed
Archived      → Removed from vector index, D1 only
Ancient       → Compressed, stored in R2 (optional)
```
Result: 2-3x faster search on large datasets
## 🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
### Development Workflow

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Areas for Contribution

- 🐛 Bug fixes
- ✨ New features (LLM-powered summarization, multi-language support)
- 📚 Documentation improvements
- 🧪 Test coverage
- 🎨 UI/UX enhancements
- ⚡ Performance optimizations
## 📄 License

This project is licensed under the MIT License - see the [LICENSE](https://opensource.org/license/MIT) for details.
## 🙏 Acknowledgments
- Ebbinghaus for the forgetting curve research
- Cloudflare for the amazing Workers platform