I'm home alone after New Year's. What do I decide to do? Force my two favorite AI "friends" to go head-to-head.
I expected to find a winner. Instead, I found something more interesting: using both models together was more effective than using either individually.
I've seen a lot of discourse about Claude vs Codex - so as a preface, this post will not be another "WOW Codex is soooooo much better than Claude ??!?!!!?" or "Opus 4.5 is crazy!! are we gonna have jobs in 1 year????".
It's about what happens when you stop looking for the best tool and start treating them as collaborators instead.
These models went head-to-head not on benchmarks or a toy task, but on "real" non-trivial work: adding vector search to my AI dev tooling.
The rules: SOTA (state of the art) models, same starting prompt, parallel terminals.
The tools: Anthropic $100/m subscription, ChatGPT Plus ($20/m, $0 for this month - thanks Sam!)
Disclaimer: I am a MongoDB employee and decided to use MongoDB products (Atlas, vector search) as the Vector DB in this experiment for a few reasons:
- It's free! (for me :))
- We mentioned AI 82 times in our last earnings call and the stock went up 20% afterwards (not joking btw)
- We spent $220M acquiring Voyage AI and I still haven't used it
- The company's previous acquisition, Realm for $39M, was directly related to the last team I worked on. Our product is now deprecated... (I'm assuming so we can throw more money into the AI pit)
What is "The Messy Middle"?
I have a few problems with both benchmarks and user-driven evals:
- Benchmarks DO NOT reflect real usage of vibe coding - they are so sandboxed that I view them as more a test of raw model intelligence than of how effective a model is in practice
- Vibe coders usually have AT LEAST a plan -> execute workflow, knowing that planning the change out dramatically increases performance on complex tasks
- They also work in large codebases and deal with changing requirements / undefined problems, none of which is captured in a benchmark test
- Most people testing these tools don't know WTF they're doing
  - "Make Minecraft from scratch!" - which, again, is not reflective of real engineering using these tools
  - Tests done by people with little experience vibe coding, or unwilling to pay for SOTA models
  - "Codex is so much better than Claude (but I used Haiku 4.5 and Codex 5.2 with max thinking)"
I also don't know WTF I'm doing (and I suspect very few in this space actually do), but at least I've been experimenting for ~1y with a bunch of AI coding tools.
"Real" vibe coding, the type done by most software engineers, is existing code, existing decisions, imperfect context, and changing goalposts.
I'm also convinced that every vibe coder has a different workflow adapted to their needs. Testing how models adapt to unique workflows is extremely valuable; benchmarks only capture the vanilla case.
These agents are meant to be tools and should behave as such, maintaining high performance across a variety of environments and tasks.
What This Is (and Isn't)
What I can conclude: Running a second model on your plan catches errors you'd otherwise miss. The cognitive diversity matters more than which model is "better."
What I can't conclude: That Codex is objectively superior. I've used Claude daily since April; I've used Codex for three days. The novelty effect is real. My prompts are probably Claude-shaped. My workflow (APEX) was built around Claude's patterns. (Given the simplicity of Codex, I think I have the hang of it, but there's still certainly a Claude bias.)
This is also just one night of usage. Models & harnesses are changing by the day - so view this as a snapshot in time of model performance rather than a definitive answer.
This is n=1. But itβs a documented n=1 with receipts, which is more than most AI comparisons offer.
The Task
Some background on my workflow:
- APEX is my personal AI dev workflow that exposes
  - research -> plan -> implement -> ship phases
  - an experiment with MCP for longer-term memory storage
This post explores the research + plan phases, along with a review of the plan. Why not implement? Because IMO having a good plan is 80% of the work in vibe coding.
The task is to add high-quality semantic search over my .apex/tasks markdown files, allowing the agent to search across task memory.
Research Phase
This phase is meant to gather background research on the task from the codebase, git history, web, etc.
The high level goal of this step is to understand what weβre doing, why, and find all relevant code.
Both models got the same prompt. Watch for:
- The Voyage moment - Claude recommended Voyage because they're an "Anthropic partner". I laughed out loud. Claude citing its creator's business partnerships as a technical justification is either endearing or concerning; not sure which at this point.
  - This is especially funny given the flak OpenAI has been getting for planning to introduce ads - turns out Anthropic may have beaten them to it!
- Questions - Claude clarifies architecture decisions with me in this phase.
- Cross-pollination - What happens when I share Codex's take with Claude.
user
do research into what it would take to make apex vector search over .md files. assume 1. during apex workflow, everything is written to .md file 2. i have a mongodb atlas cluster with vector search i can easily use
assistant
## Research Complete
I've completed the research phase for implementing vector search over .md files with MongoDB Atlas. Here's the summary:
### Key Findings
**1. APEX Architecture is Ready**
- Task files are already written to `.apex/tasks/[ID].md` during workflow
- YAML frontmatter + markdown structure is ideal for chunking
- Current SQLite FTS can coexist with vector search
**2. MongoDB Atlas Vector Search**
- `$vectorSearch` aggregation is straightforward to implement
- Supports up to 8192 dimensions (all embedding models fit)
- Free tier gives you 512MB and 3 indexes to start
- Pre-filtering by phase, task type, tags is built-in
**3. Embedding Recommendations**

| Use Case | Provider | Model | Cost |
|---|---|---|---|
| **Simple start** | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| **Best retrieval** | Voyage AI | voyage-3.5-lite | $0.02/1M tokens |
| **Zero cost** | Ollama | nomic-embed-text | Free |
**4. Chunking Strategy**
- Use `MarkdownHeaderTextSplitter` to preserve heading structure
- 400-512 tokens per chunk with 10% overlap
- Store heading hierarchy as metadata for context
### Proposed Components
1. **Indexing Pipeline** - Watches `.apex/tasks/*.md`, chunks, embeds, upserts to MongoDB
2. **MCP Tool**: `apex_semantic_search` - Natural language queries over all APEX documents
3. **Embedding Abstraction** - Pluggable provider (OpenAI/Voyage/Ollama)
### Effort Estimate
~25-35 hours total implementation work
### Open Questions for You
1. Which embedding provider do you want to start with?
2. Should we index only task files, or also patterns and skills?
3. Environment variable for MongoDB connection string, or APEX config?
The full research brief is saved at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`. Ready for `/apex:plan` when you want to proceed.
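(Aside: the `MarkdownHeaderTextSplitter` Claude mentions is LangChain's Python splitter. A rough TypeScript sketch of the same ~400-512 token / 10% overlap idea, using LangChain JS - the package, sizes, and helper name here are my assumptions, not part of Claude's output:)
```typescript
// Sketch: approximate the "400-512 tokens, 10% overlap" recommendation with
// LangChain JS. Sizes are in characters (~4 chars/token) - an assumption.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const splitter = RecursiveCharacterTextSplitter.fromLanguage("markdown", {
  chunkSize: 2000,   // roughly 500 tokens
  chunkOverlap: 200, // roughly 10% overlap
});

// Splits on markdown structure (headings, paragraphs) before falling back
// to raw character boundaries, so heading context mostly survives.
export async function chunkMarkdown(doc: string): Promise<string[]> {
  return splitter.splitText(doc);
}
```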
user
use voyage ai index only task files use apex config
also, i want to embed the WHOLE md - no XML parsing
assistant
## Research Complete
Updated the task brief with your decisions:
| Decision | Choice |
|---|---|
| **Embeddings** | Voyage AI `voyage-3.5-lite` (1024 dims, 32K context) |
| **Scope** | Task files only (`.apex/tasks/*.md`) |
| **Config** | APEX config (not env vars) |
| **Approach** | Embed whole markdown as plain text - no XML parsing, no chunking |
**Simplified architecture**:
- One embedding per task file
- `voyage-3.5-lite` handles up to 32K tokens (plenty for task files)
- ~15-20 hours estimated effort (down from 25-35 with chunking)
**Task brief saved at**: `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`
Ready to run `/apex:plan` when you want to proceed to implementation planning.
user
actually let's use .env for the creds to keep it simple
assistant
Updated. Credentials now via `.env`:
```bash
MONGODB_URI=mongodb+srv://...
VOYAGE_API_KEY=pa-...
```
Ready for `/apex:plan` when you want to proceed.
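(One gotcha worth flagging here: as Claude's plan-phase research later notes, APEX doesn't auto-load `.env`, so the standard dotenv import has to run before anything reads `process.env`. A minimal sketch; the `requireEnv` helper is hypothetical:)
```typescript
// Load .env before anything reads process.env - APEX has no auto-loading.
import "dotenv/config";

// Hypothetical helper: fail fast with a clear error if creds are missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

const MONGODB_URI = requireEnv("MONGODB_URI");
const VOYAGE_API_KEY = requireEnv("VOYAGE_API_KEY");
```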
user
do you think we should chunk by something? what's the current strategy
assistant
Good question. Let me think through the trade-offs:
**Whole file (current):**
- Simpler, fewer API calls
- Works well for "find similar tasks"
- Risk: Long files become diluted embeddings - specific content gets lost in the average
**Chunk by XML section (`<research>`, `<plan>`, `<implementation>`, `<ship>`):**
- Better retrieval for "what was the architecture for X" → hits `<plan>` directly
- Natural boundaries already exist in task files
- But you said no XML parsing...
**Chunk by heading (## Section):**
- Also natural boundaries
- More granular than XML sections
- But maybe too granular?
**Hybrid - whole file + sections:**
- Store both: one embedding for whole doc, one per major section
- Best retrieval but 2-5x the embeddings/cost
**My take**: For task files that are typically <5K tokens, whole file is probably fine. The 32K context of voyage-3.5-lite won't struggle.
But if you anticipate:
- Very long task files (10K+ tokens)
- Queries like "find the implementation details for X" (section-specific)
Then chunking by the 4 workflow phases (`<research>`, `<plan>`, `<implementation>`, `<ship>`) would give better precision.
What's your expected use case - finding similar tasks overall, or drilling into specific phases?
user
it will either be XML OR markdown with frontmatter. let's use hybrid to detect which and chunk by that
assistant
Got it - detect format and chunk accordingly:
1. **XML format** → chunk by `<research>`, `<plan>`, `<implementation>`, `<ship>` sections
2. **Markdown format** → chunk by `##` headings
assistant
Updated. Chunking strategy:
| Format Detected | Chunk By |
|---|---|
| XML (`<research>`, `<plan>`, `<implementation>`, `<ship>`) | Each XML section → 1 chunk |
| Markdown | Each `##` heading → 1 chunk |
**Document schema**:
```javascript
{
  taskId: string,
  section: "research" | "plan" | "## Overview" | ...,
  content: string,
  embedding: Binary,
  ...
}
```
Task brief updated at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`. Ready for `/apex:plan`.
user
compare and contrast this against chunking the whole document
assistant
| Aspect | Whole Document | Hybrid Chunking (current) |
|---|---|---|
| **Retrieval precision** | Lower - query matches diluted across entire file | Higher - query matches specific section directly |
| **Storage** | 1 embedding per file | 2-6 embeddings per file (more storage, cost) |
| **API cost** | 1 Voyage call per file | 2-6 calls per file |
| **Query: "find similar tasks"** | ✅ Better - holistic comparison | ⚠️ Worse - comparing chunks, not whole tasks |
| **Query: "architecture decisions for X"** | ⚠️ Worse - must return whole file, hope it's relevant | ✅ Better - returns `<plan>` or `## Architecture` chunk |
| **Complexity** | Simpler - no parsing logic | More complex - format detection, chunking |
| **Result granularity** | Returns whole file | Returns specific section (more focused) |
| **Re-indexing on edit** | 1 embedding to regenerate | Multiple embeddings to regenerate |
**Key question**: Whatβs the primary use case?
- **"Find tasks similar to this one"** β Whole document wins
- **"What did we decide about X"** β Chunking wins
- **Both** β Could do hybrid: whole doc embedding + chunk embeddings (but 2-7x cost)
My take: If task files are typically <5K tokens, **whole document** is probably fine and simpler. Chunking adds complexity for marginal retrieval gain on small docs.
Want to stick with hybrid chunking, or simplify back to whole document?
user
codex disagrees
Hybrid (XML/Markdown-aware) vs whole-document embedding:
- Hybrid chunking: better recall/precision (sections are semantically focused), returns pinpointed results with section metadata, supports partial re-indexing on small changes, but costs more (more embeddings), and adds chunking complexity.
- Whole-document embedding: simplest and cheapest to build (one vector per file), but recall is weaker (mixed topics dilute similarity), results are less specific, any small change forces a full re-embed, and long task files can exceed model limits or truncate important content.
Recommendation: stick with hybrid chunking. If you want a compromise, we can also store a single file-level embedding for coarse routing and still return chunk hits for precision.
assistant
Codex makes good points I understated:
1. **Partial re-indexing** - Only re-embed changed sections (though our current impl deletes all chunks anyway)
2. **Truncation risk** - Long files could exceed limits; chunks avoid this
3. **Semantic focus** - Sections are topically coherent; whole doc is mixed signals
**The hybrid + whole-file idea is interesting:**
```javascript
// Per task file, store:

// Coarse: "find similar tasks"
{ taskId, section: "_whole", embedding: [...] }

// Fine: "find specific info about X"
{ taskId, section: "research", embedding: [...] }
{ taskId, section: "plan", embedding: [...] }
...
```
Query routing:
- `apex_semantic_search` → search chunks (precise)
- `apex_find_similar_tasks` → search `_whole` embeddings (holistic)
Cost: ~5-7 embeddings per file instead of 4-6. Marginal increase for best of both.
Want to add the whole-file embedding alongside chunks?
user
y
assistant
Done. Final strategy:
| Embedding | Purpose | MCP Tool |
|---|---|---|
| `section: "_whole"` | Task similarity | `apex_find_similar_tasks` |
| `section: "research"` etc. | Precise search | `apex_semantic_search` |
~5-7 embeddings per task file. Ready for `/apex:plan`.
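Before moving on, here's what that routing could look like in practice. A minimal sketch with the Node.js driver - the index name and the `$ne`/`$eq` pre-filters are my assumptions, and `section` would need to be an indexed filter field:
```typescript
import { Collection, Document } from "mongodb";

// Both tools share one collection; the `section` field routes the query.
async function vectorSearch(
  col: Collection<Document>,
  queryVector: number[],
  sectionFilter: Document,
  limit: number
) {
  return col
    .aggregate([
      {
        $vectorSearch: {
          index: "task_vector_index",
          path: "embedding",
          queryVector,
          numCandidates: limit * 20,
          limit,
          filter: sectionFilter, // `section` must be an indexed filter field
        },
      },
      {
        $project: {
          taskId: 1,
          section: 1,
          content: 1,
          score: { $meta: "vectorSearchScore" },
        },
      },
    ])
    .toArray();
}

// apex_semantic_search: precise, chunk-level hits only.
const searchChunks = (col: Collection<Document>, qv: number[]) =>
  vectorSearch(col, qv, { section: { $ne: "_whole" } }, 10);

// apex_find_similar_tasks: holistic, whole-file embeddings only.
const findSimilarTasks = (col: Collection<Document>, qv: number[]) =>
  vectorSearch(col, qv, { section: { $eq: "_whole" } }, 5);
```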
Planning Phase
This phase is meant to use the research to create a concrete plan. Consider alternatives, and propose the specific pieces of code to be changed.
Same task, same constraints. Watch how differently they approach it:
- Question style - Claude asks binary A/B/C questions; Codex asks open-ended questions and offers to explain tradeoffs when I said "I don't have enough information".
- Artifact format - Claude produces markdown tables and structured sections. Codex produces XML-based architecture documents. The markdown is more readable, but the XML is technically what I asked for in my skill instructions.
user
.apex/tasks/_d3NcldtrLAotoiCziqQE.md
assistant
Now I understand the APEX plan phase. I need to:
1. Verify the task is in "research" phase (it shows `phase: research` in frontmatter)
2. Spawn verification agents to extend research
3. Work through the 5 mandatory artifacts interactively
4. Get user confirmation at checkpoints
Let me set up the todo list and begin the architecture process.
assistant
The MCP server is in TypeScript. Let me read the key files to understand the architecture.
assistant
Based on research and analysis, I understand we need to implement **vector search over markdown task files** using MongoDB Atlas and Voyage AI.
## Key Findings
**Current Architecture at Key Files:**
- MCP server: `src/mcp/server.ts:68-147` (initialization pattern)
- Tool registration: `src/mcp/tools/index.ts:126-346` (switch-case pattern)
- Existing search: FTS3 in SQLite (no true vector embeddings currently)
**Discovered Patterns with Trust Scores:**

| Pattern | Trust | Applicability |
|---|---|---|
| Binary Graceful Fallback (`PAT:AUTO:Coh7YFt2`) | ★★★☆☆ (0.67) | Make vector search optional |
| MCP DB Initialization (`PAT:AUTO:a9f5JGcA`) | ★★☆☆☆ (0.50) | Init MongoDB in startup |
| Zod Validation (`PAT:VALIDATION:ZOD`) | ★★☆☆☆ (0.50) | Schema validation |
**Key Complexity Identified:**
1. **Greenfield external APIs** - No existing Voyage AI or MongoDB patterns in codebase
2. **FTS3 vs Vector** - Current "semantic" search is actually FTS + scoring, not embeddings
3. **High churn area** - FTS schema management has 8+ fixes in 90 days (proceed carefully)
4. **Credential management** - No existing API key validation patterns
## Questions Requiring Human Judgment
1. **Graceful Degradation Strategy**: Should vector search be optional (feature-flagged) or required when configured?
- Option A: Required - fail startup if MongoDB/Voyage credentials missing
- Option B: Optional - silently fall back to FTS3 search
2. **Indexing Trigger**: How should we trigger re-indexing of task files?
- Option A: File watcher (real-time, adapting existing `PatternWatcher`)
- Option B: On-demand via MCP tool call
- Option C: Both (watcher for new/changed, manual for full re-index)
3. **Embedding Storage**: Where should we cache embeddings to avoid re-computing on restart?
- Option A: MongoDB only (simple, but requires connection)
- Option B: SQLite + MongoDB (local cache, resilient)
Let's address these before I develop architecture options.
user
A A A
assistant
Got it. Decisions confirmed:
- **Required credentials** - fail startup if MongoDB/Voyage missing
- **File watcher** - real-time indexing via adapted `PatternWatcher`
- **MongoDB only** - embeddings stored in Atlas (no SQLite cache)
## Proposed Architecture Structure
### Core Components:
1. **VoyageClient** (`src/clients/voyage.ts`) - Embedding generation with retry/rate-limiting
2. **MongoVectorStore** (`src/clients/mongodb.ts`) - Singleton connection, $vectorSearch queries
3. **MarkdownChunker** (`src/indexing/markdown-chunker.ts`) - Hybrid XML/heading chunking
4. **TaskIndexer** (`src/indexing/task-indexer.ts`) - File watcher + embedding pipeline
5. **SemanticSearchService** (`src/mcp/tools/semantic-search.ts`) - MCP tool implementation
### Implementation Phases:
1. **Phase 1: Clients** - VoyageClient + MongoVectorStore with connection validation
2. **Phase 2: Chunker** - Hybrid markdown/XML chunking with content hashing
3. **Phase 3: Indexer** - File watcher integration for `.apex/tasks/*.md`
4. **Phase 4: MCP Tools** - `apex_semantic_search` + `apex_find_similar_tasks` tools
5. **Phase 5: Integration** - Wire into MCP server startup, add to tool registry
### Data Flow:
```
Task File Change → Watcher → Chunker → VoyageClient → MongoVectorStore
User Query → MCP Tool → VoyageClient → MongoVectorStore → Results
```
Does this structure align with your vision? Should I adjust before developing the 5 mandatory artifacts?
user
yes
assistant
Now developing the 5 mandatory artifacts.
## Artifact 1: Chain of Thought Analysis
### Current State
| Component | Location | Purpose |
|---|---|---|
| MCP Server | `src/mcp/server.ts:68-147` | Initializes pattern system, runs migrations |
| Tool Registry | `src/mcp/tools/index.ts:126-346` | Switch-case tool dispatch |
| Pattern Discovery | `src/mcp/tools/discover.ts` | FTS3 + semantic scoring (NOT vector) |
| Task Repository | `src/storage/repositories/task-repository.ts` | SQLite task CRUD |
| File Watcher | `src/storage/watcher.ts:22-46` | Chokidar-based, 200ms debounce |
| Config | `src/config/apex-config.ts` | Static config, env var handling |
**How it got here (git archaeology):**
- `c1044e9` (2025-08-02): Semantic discovery added with FTS + scoring
- `b472a68` (2025-09-06): FTS5→FTS3 migration for WASM compatibility
- `a83d685` (2025-11-01): Overview tool added (39 tests, pagination)
- Recent stability: MCP tools layer is mature, low churn
### Problem Decomposition
**Core Problem:** Enable semantic search over `.apex/tasks/*.md` files using true vector embeddings instead of keyword-based FTS.
**Sub-problems:**
1. External API integration (Voyage AI) - no existing pattern
2. External database integration (MongoDB Atlas) - no existing pattern
3. Markdown chunking with mixed XML/heading formats
4. Real-time indexing via file watcher
5. MCP tool registration following existing patterns
6. Startup validation for required credentials
### Hidden Complexity
| Issue | Source | Impact |
|---|---|---|
| No `.env` auto-loading | `apex-config.ts` | Must document env var setup |
| FTS schema churn | 8 fixes in 90 days | Avoid touching SQLite FTS |
| Singleton patterns | `database.ts:86` | Must follow for MongoDB |
| Rate limiting needed | Voyage API limits | Implement like `llm-extractor.ts:8` |
| 37+ markdown files | Varying formats | Robust chunker required |
### Success Criteria
**Automated:**
- `npm test` passes with new semantic search tests
- `npm run type-check` clean
- MCP tool responds to `apex_semantic_search` calls
- Connection validation fails startup if credentials missing
**Manual:**
- Query "authentication patterns" returns relevant task files
- File edit triggers re-indexing within 1 second
- Similarity search returns semantically related tasks
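(For reference, the "singleton patterns" constraint from the Hidden Complexity table implies roughly this shape for the MongoDB client - a sketch of mine, not code from the plan:)
```typescript
import { MongoClient, Collection, Document } from "mongodb";

// Sketch of the getInstance shape the plan says MongoVectorStore must
// follow (mirroring database.ts:86). All details here are assumptions.
export class MongoVectorStore {
  private static instance: MongoVectorStore | null = null;

  private constructor(
    private client: MongoClient,
    readonly chunks: Collection<Document>
  ) {}

  static async getInstance(uri: string): Promise<MongoVectorStore> {
    if (!MongoVectorStore.instance) {
      const client = new MongoClient(uri);
      await client.connect(); // fail fast if the URI or creds are wrong
      const chunks = client.db("apex").collection("task_chunks");
      MongoVectorStore.instance = new MongoVectorStore(client, chunks);
    }
    return MongoVectorStore.instance;
  }
}
```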
assistant
## Artifact 2: Tree of Thought Solutions
### Solution A: Direct Integration
**Approach:** Straightforward client classes with singleton patterns
**Implementation:**
1. `VoyageClient` class with `embed()` method at `src/clients/voyage.ts`
2. `MongoVectorStore` singleton at `src/clients/mongodb.ts`
3. `TaskIndexer` using file watcher at `src/indexing/task-indexer.ts`
4. `SemanticSearchService` at `src/mcp/tools/semantic-search.ts`
5. Initialize in `server.ts:initializePatternSystem()`
**Patterns Used:**
- `PAT:AUTO:Coh7YFt2` (Graceful Fallback) - ★★★☆☆
- `PAT:AUTO:a9f5JGcA` (MCP DB Init) - ★★☆☆☆
**Pros:**
- Simple, follows existing codebase patterns
- Easy to understand and debug
- Matches `discover.ts` tool structure
- Low abstraction overhead
**Cons:**
- Tight coupling to Voyage/MongoDB
- Harder to swap providers later
- No local fallback if MongoDB down
**Complexity:** 4/10 **Risk:** LOW - follows established patterns
### Solution B: Provider Abstraction Layer
**Approach:** Interface-based abstraction for embedding and vector store
**Implementation:**
1. `EmbeddingProvider` interface + `VoyageProvider` implementation
2. `VectorStore` interface + `MongoVectorStore` implementation
3. `VectorSearchService` depends on interfaces, not implementations
4. Factory pattern for provider instantiation
5. Config-driven provider selection
**Patterns Used:**
- `PAT:VALIDATION:ZOD` for provider config schemas
- Dependency injection pattern (new to codebase)
**Pros:**
- Swappable providers (OpenAI, local models later)
- Testable with mock implementations
- Future-proof architecture
**Cons:**
- Over-engineering for current scope (only one provider)
- More files, more complexity
- No existing DI pattern in codebase
- Abstractions may not match real provider differences
**Complexity:** 7/10 **Risk:** MEDIUM - introduces new patterns
### Solution C: Event-Driven Pipeline
**Approach:** Decoupled components communicating via EventEmitter
**Implementation:**
1. `IndexingPipeline` as EventEmitter
2. Events: `file:changed`, `chunk:created`, `embedding:ready`, `indexed`
3. Separate handlers for each stage
4. Queue-based processing with backpressure
5. Persistent queue in SQLite for crash recovery
**Patterns Used:**
- EventEmitter pattern (exists in Node.js core)
- Queue pattern from `batch-processor.ts`
**Pros:**
- Highly decoupled components
- Easy to add monitoring/logging at each stage
- Natural backpressure handling
- Crash recovery with persistent queue
**Cons:**
- Significant over-engineering
- Hard to trace execution flow
- Event ordering complexity
- Not aligned with existing sync patterns
**Complexity:** 8/10 **Risk:** HIGH - major architectural divergence
### Comparative Analysis
| Criterion | Solution A | Solution B | Solution C |
|---|---|---|---|
| Complexity | 4/10 | 7/10 | 8/10 |
| Codebase Fit | Excellent | Moderate | Poor |
| Future Flexibility | Low | High | High |
| Implementation Time | ~17hrs | ~25hrs | ~35hrs |
| Risk | LOW | MEDIUM | HIGH |
**Winner: Solution A (Direct Integration)**
**Reasoning:**
- Aligns with existing patterns (`discover.ts`, `task.ts`)
- Minimal abstraction - YAGNI principle
- Fastest to implement and validate
- If provider swap needed later, refactor then
**Runner-up: Solution B**
**Why not:** Premature abstraction. We have one embedding provider (Voyage) and one vector store (MongoDB). Building interfaces for hypothetical future providers violates YAGNI. If we need to swap providers, the refactor is straightforward.
assistant
## Artifact 3: Chain of Draft Evolution
### Draft 1 (Raw Instinct)
**Core Design:** Add MongoDB and Voyage clients, create MCP tool, done.
```
VoyageClient.embed(text) → MongoClient.insert(embedding)
MongoClient.search(query) → results
```
**Identified Issues:**
- Where does initialization happen?
- How do we handle credentials missing?
- What about file watching?
- No chunking strategy defined
- No error handling for API failures
### Draft 2 (Pattern-Guided Refinement)
**Core Design:** Follow `PAT:AUTO:a9f5JGcA` (MCP DB Init) pattern - initialize in `initializePatternSystem()`. Add singleton pattern for MongoDB. Wire file watcher.
```typescript
// server.ts:initializePatternSystem()
const voyageClient = new VoyageClient(process.env.VOYAGE_API_KEY);
const mongoStore = await MongoVectorStore.getInstance(process.env.MONGODB_URI);
const taskIndexer = new TaskIndexer(voyageClient, mongoStore);
await taskIndexer.startWatcher('.apex/tasks');
await initializeTools(repository, sharedDb, { voyageClient, mongoStore });
```
**Improvements:**
- Clear initialization location
- Singleton for connection reuse
- Watcher integrated at startup
**Remaining Issues:**
- What if credentials missing? Silent fail or crash?
- No validation before starting watcher
- Chunking logic still undefined
- No content hashing to skip unchanged files
### Draft 3 (Production-Ready)
**Core Design:** Validate credentials at startup (fail fast). Initialize clients with retry. Create singleton stores. Start watcher only after successful connection. Chunk using hybrid XML/heading strategy with content hashing.
```typescript
// server.ts:initializePatternSystem()

// 1. Validate required credentials (fail fast)
validateRequiredEnv(['VOYAGE_API_KEY', 'MONGODB_URI']);

// 2. Initialize clients with connection test
const voyageClient = new VoyageClient({
  apiKey: process.env.VOYAGE_API_KEY,
  model: 'voyage-3.5-lite',
  dimensions: 1024,
  rateLimitMs: 100,
});
await voyageClient.testConnection();

const mongoStore = await MongoVectorStore.create({
  uri: process.env.MONGODB_URI,
  database: 'apex',
  collection: 'task_chunks',
  indexName: 'task_vector_index',
});

// 3. Initialize chunker with hybrid strategy
const chunker = new MarkdownChunker({
  strategy: 'hybrid', // XML sections > markdown headings
  maxChunkSize: 2000,
  includeWholeFile: true,
});

// 4. Start indexer with content hashing
const taskIndexer = new TaskIndexer({
  voyageClient,
  mongoStore,
  chunker,
  watchPath: '.apex/tasks',
  debounceMs: 500,
  useContentHash: true,
});
await taskIndexer.startWatcher();

// 5. Pass to tools initialization
await initializeTools(repository, sharedDb, { mongoStore, voyageClient });
```
**Why This Evolved:**
- Draft 1 was too simple - no error handling, no lifecycle
- Draft 2 added structure but lacked validation
- Draft 3 applies Graceful Fallback pattern properly:
- Validate upfront (fail fast with clear error)
- Test connections before proceeding
- Content hashing prevents redundant embeddings
- Debounce prevents API spam
**Patterns Integrated:**
- `PAT:AUTO:Coh7YFt2` → Connection testing, validation
- `PAT:AUTO:a9f5JGcA` → Initialization in `initializePatternSystem()`
- Rate limiting from `llm-extractor.ts:8`
- Debounce from `watcher.ts:13`
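(The `rateLimitMs: 100` knob in Draft 3 implies something like a minimal call-spacing limiter. A sketch of the pattern borrowed from `llm-extractor.ts`, under my own assumptions about how it's wired in:)
```typescript
// Minimal call-spacing limiter: each call waits until at least
// `minIntervalMs` has passed since the previous one (single-caller sketch).
class RateLimiter {
  private last = 0;

  constructor(private minIntervalMs: number) {}

  async wait(): Promise<void> {
    const delay = this.last + this.minIntervalMs - Date.now();
    if (delay > 0) await new Promise((r) => setTimeout(r, delay));
    this.last = Date.now();
  }
}

// Hypothetical usage inside VoyageClient.embed():
//   await this.limiter.wait();
//   return this.callVoyageApi(text);
```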
assistant
## Artifact 4: YAGNI Declaration
### Explicitly Excluding
| Feature | Why Not | Cost If Included | Defer Until |
|---|---|---|---|
| **Provider Abstraction Layer** | Only one embedding provider (Voyage) | +8hrs, 4 extra files | Need to swap providers |
| **SQLite Embedding Cache** | User chose MongoDB-only storage | +6hrs, schema migration | MongoDB connectivity issues |
| **Multiple Vector Indexes** | One index sufficient for task files | +4hrs, complexity | Performance issues at scale |
| **Streaming Chunker** | Task files are small (<100KB) | +5hrs | Files exceed memory |
| **Background Queue Persistence** | File watcher handles recovery | +6hrs, SQLite schema | Crash recovery needed |
| **Admin UI for Vector Status** | CLI/MCP sufficient | +12hrs | User request |
| **Automatic Index Creation** | Manual Atlas setup acceptable | +3hrs, complexity | Deployment automation |
### Preventing Scope Creep
| Temptation | Why Resisting |
|---|---|
| "Add OpenAI embeddings option" | Voyage is sufficient; abstraction is premature |
| "Cache embeddings locally for offline" | Adds complexity; user explicitly chose MongoDB-only |
| "Add vector search to existing discover tool" | Keep tools separate; don't modify working FTS code |
| "Real-time similarity updates in UI" | No UI exists; MCP tools sufficient |
| "Support all markdown files, not just tasks" | Scope is task files; expand later if needed |
### Future Considerations
| Enhancement | When Makes Sense |
|---|---|
| Provider abstraction | When adding second embedding provider |
| Local cache | When MongoDB latency becomes a problem |
| Batch re-indexing tool | When task count exceeds 1000 |
| Similarity-based task suggestions | After basic search validated |
### Complexity Budget
| Aspect | Allocated | Used | Reserved |
|---|---|---|---|
| **Total** | 6/10 | 4/10 | 2/10 |
| New files | 6 | 6 | - |
| External dependencies | 2 | 2 (mongodb, voyageai) | - |
| MCP tools | 2 | 2 | - |
| Config options | 3 | 2 | 1 |
assistant
## Artifact 5: Pattern Selection Rationale
### Applying
| Pattern ID | Trust | Usage | Why This Pattern | Where Applying | Source |
|---|---|---|---|---|---|
| `PAT:AUTO:Coh7YFt2` (Graceful Fallback) | ★★★☆☆ (0.67) | 1 use, 100% success | Make vector search fail gracefully during init; test connections before proceeding | `server.ts:initializePatternSystem()` | ctx.patterns |
| `PAT:AUTO:a9f5JGcA` (MCP DB Init) | ★★☆☆☆ (0.50) | 0 uses | Initialize MongoDB client alongside PatternRepository in existing init flow | `server.ts:68-147` | ctx.patterns |
| `PAT:VALIDATION:ZOD` (Zod schemas) | ★★☆☆☆ (0.50) | 0 uses | Validate MCP tool inputs using Zod; generate JSON Schema | `src/schemas/search/types.ts` | ctx.patterns |
| `PAT:MCP:SERVICE` (Service class) | ★★☆☆☆ (0.50) | 0 uses | Follow TaskService pattern for SemanticSearchService | `src/mcp/tools/semantic-search.ts` | ctx.patterns |
### Codebase Patterns (from ctx.impl)
| Pattern | Location | Application |
|---|---|---|
| Singleton + getInstance | `database.ts:86` | Use for `MongoVectorStore.getInstance()` |
| Rate limiting | `llm-extractor.ts:8-9` | Apply to `VoyageClient.embed()` |
| Debounced watcher | `watcher.ts:13,67-88` | Adapt for `TaskIndexer` |
| Retry with backoff | `database-utils.ts:23-75` | Use for MongoDB/Voyage API calls |
| Zod schema generation | `schemas/generator.ts` | Generate tool input schemas |
### Considering But Not Using
| Pattern | Why Not |
|---|---|
| `ANTI:AUTO:09FZrpo0` (Complex Fallback) | We're doing simple fail-fast, not fallback chains |
| Event-driven patterns | Over-engineering; direct calls simpler |
| Worker threads | Task files too small to warrant parallelism |
### Missing Patterns (Gaps)
| Need | Workaround |
|---|---|
| External HTTP API client | Use native `fetch` with retry wrapper |
| MongoDB integration | Follow standard Node.js driver patterns |
| Markdown/XML chunking | First-principles implementation |
| Vector embedding | Use Voyage SDK directly |
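(The "native `fetch` with retry wrapper" workaround could look like this - a sketch with illustrative retry counts and backoff, not code from the plan:)
```typescript
// Sketch: retry on 429/5xx with exponential backoff. Retry count and
// delays are illustrative assumptions.
export async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  retries = 3,
  baseDelayMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, init);
      // Only retry rate limits and server errors.
      if (res.ok || (res.status !== 429 && res.status < 500)) return res;
      if (attempt >= retries) return res;
    } catch (err) {
      if (attempt >= retries) throw err; // network failure, out of retries
    }
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
}
```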
assistant
## Architecture Review Checkpoint
I've completed the 5 mandatory artifacts. Here's the selected architecture:
**Chosen Solution**: A (Direct Integration)
**Key Patterns**:
1. `PAT:AUTO:Coh7YFt2` - Graceful Fallback (★★★☆☆)
2. `PAT:AUTO:a9f5JGcA` - MCP DB Init (★★☆☆☆)
3. `PAT:MCP:SERVICE` - Service class pattern (★★☆☆☆)
**Excluded Scope**:
1. Provider abstraction layer
2. SQLite embedding cache
3. Multiple vector indexes
**Complexity**: 4/10 **Risk Level**: LOW
**Implementation will**:
1. Add `VoyageClient` + `MongoVectorStore` clients with connection validation
2. Create hybrid `MarkdownChunker` for XML/heading chunking
3. Wire `TaskIndexer` with file watcher for `.apex/tasks/*.md`
4. Register `apex_semantic_search` + `apex_find_similar_tasks` MCP tools
5. Fail startup if `VOYAGE_API_KEY` or `MONGODB_URI` missing
**Implementation will NOT**:
- Add provider abstraction (YAGNI)
- Cache embeddings in SQLite
- Modify existing FTS/discover tools
- Add admin UI
**Files to create** (6):
- `src/clients/voyage.ts`
- `src/clients/mongodb.ts`
- `src/indexing/markdown-chunker.ts`
- `src/indexing/task-indexer.ts`
- `src/mcp/tools/semantic-search.ts`
- `src/schemas/search/types.ts`
**Files to modify** (3):
- `src/mcp/server.ts` (add vector init)
- `src/mcp/tools/index.ts` (register tools)
- `package.json` (add dependencies)
Should I proceed with the detailed architecture decision record and write the plan to the task file?
user
update the doc
assistant
**Plan phase complete.** Task file updated at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`
## Summary
| Aspect | Value |
|---|---|
| **Architecture** | Solution A: Direct Integration |
| **Complexity** | 4/10 |
| **Risk** | LOW |
| **Files to create** | 6 |
| **Files to modify** | 3 |
**Key decisions:**
- Credentials required (fail startup if missing)
- File watcher for real-time indexing
- MongoDB-only storage (no SQLite cache)
- 4 patterns applied (Graceful Fallback, MCP DB Init, Zod, MCP Service)
**Next step:** Run `/apex:implement vector-search-md-mongodb` to begin implementation.
The Plan
Here's what they actually produced:
Plan outputs
---
id: _d3NcldtrLAotoiCziqQE
identifier: vector-search-md-mongodb
title: "Vector Search over .md Files with MongoDB Atlas"
created: 2026-01-02
updated: 2026-01-02
phase: plan
status: active
confidence: 0.75
tags: [vector-search, mongodb-atlas, markdown, semantic-search, embeddings]
decisions:
  - embedding_provider: voyage-ai
  - embedding_model: voyage-3.5-lite
  - scope: task-files-only
  - config: .env (MONGODB_URI, VOYAGE_API_KEY)
  - chunking: hybrid (XML sections OR markdown headings) + whole-file embedding
  - architecture: direct-integration (Solution A)
  - credentials: required (fail startup if missing)
  - indexing: file-watcher (real-time)
  - storage: mongodb-only (no SQLite cache)
---

# Vector Search over .md Files with MongoDB Atlas

## Executive Summary

This research investigates implementing vector search over markdown files generated during APEX workflow, using MongoDB Atlas vector search. The approach is **fully viable** with straightforward integration paths.

**Key finding**: APEX's current architecture writes task files to `.apex/tasks/[ID].md` during workflow phases. These can be embedded as whole documents and indexed in MongoDB Atlas for semantic retrieval.

## Decisions Made

- **Embedding Provider**: Voyage AI (`voyage-3.5-lite`) - best retrieval quality at $0.02/1M tokens
- **Scope**: Task files only (`.apex/tasks/*.md`)
- **Config**: Credentials in `.env` (`MONGODB_URI`, `VOYAGE_API_KEY`)
- **Chunking**: Hybrid + whole-file
  - Chunks: XML sections (`<research>`, `<plan>`, etc.) OR markdown headings (`##`)
  - Plus: whole-file embedding (`section: "_whole"`) for task similarity

## Research Findings

### 1. Current APEX .md File Architecture

**Where .md files are written:**
- **Task files**: `./.apex/tasks/[IDENTIFIER].md` (primary workflow output)
- **Skills**: `/skills/*/SKILL.md` (static, checked into repo)
- **Pattern packs**: `[pack-name]/README.md` (distribution packs)

**Task file structure:**
```yaml
---
id: [nanoid]
identifier: [kebab-case-name]
phase: [research|plan|implement|ship]
status: [active|complete|blocked]
---
# Title

<research>
  <metadata>...</metadata>
  <executive-summary>...</executive-summary>
</research>

<plan>
  <architecture>...</architecture>
</plan>

<implementation>
  <changes>...</changes>
</implementation>

<ship>
  <review>...</review>
</ship>
```

**Current search**: SQLite FTS5 via `pattern_fts` virtual table. No vector/semantic search.

### 2. MongoDB Atlas Vector Search

**Key capabilities:**
- `$vectorSearch` aggregation stage (replaces deprecated `knnBeta`)
- Supports 1-8192 dimensions
- Three similarity functions: `cosine`, `dotProduct`, `euclidean`
- Pre-filtering via indexed filter fields
- HNSW algorithm for approximate nearest neighbor

**Index configuration:**
```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "phase" },
    { "type": "filter", "path": "taskType" }
  ]
}
```

**Node.js query:**
```javascript
const pipeline = [
  {
    $vectorSearch: {
      index: 'apex_docs_vector',
      path: 'embedding',
      queryVector: queryEmbedding,
      numCandidates: 200,
      limit: 10,
      filter: { phase: 'implement' }
    }
  },
  {
    $project: {
      title: 1,
      content: 1,
      taskId: 1,
      score: { $meta: 'vectorSearchScore' }
    }
  }
];
```

**Pricing:**
- Free tier (M0): 512MB storage, 3 indexes
- Flex: $8/month, 5GB storage, 10 indexes
- Dedicated: $57+/month for production

### 3. Embedding: Voyage AI

**Selected model**: `voyage-3.5-lite`
- **Dimensions**: 1024 (configurable: 2048/1024/512/256)
- **Cost**: $0.02/1M tokens
- **Context**: 32K tokens (handles large task files)
- **Quality**: Outperforms OpenAI text-embedding-3-large on retrieval benchmarks

**Why Voyage AI**:
- Best-in-class retrieval quality at competitive price
- 32K context window handles entire task files
- Anthropic's recommended embedding partner
- Flexible dimensions via Matryoshka learning

## Proposed Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        APEX Workflow                         │
│         /research → /plan → /implement → /ship               │
└───────────────────────┬──────────────────────────────────────┘
                        │ writes
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                  .apex/tasks/[ID].md                         │
│      (Source of truth - whole file is the document)          │
└───────────────────────┬──────────────────────────────────────┘
                        │ watch/hook
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                 Indexing Pipeline (new)                      │
│  1. Read file, extract YAML frontmatter                      │
│  2. Detect format: XML sections or markdown headings         │
│  3. Chunk by detected format                                 │
│  4. Generate embedding per chunk via Voyage AI               │
│  5. Upsert chunks to MongoDB Atlas                           │
└───────────────────────┬──────────────────────────────────────┘
                        │ store
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                     MongoDB Atlas                            │
│             Collection: apex_task_chunks                     │
│  {                                                           │
│    _id: ObjectId,                                            │
│    taskId: string,                                           │
│    section: string (e.g. "research", "## Overview"),         │
│    content: string (chunk text),                             │
│    phase: string,                                            │
│    tags: string[],                                           │
│    embedding: Binary (1024 dims),                            │
│    updatedAt: Date                                           │
│  }                                                           │
│             Vector Index: apex_chunks_vector                 │
└───────────────────────┬──────────────────────────────────────┘
                        │ query
                        ▼
┌──────────────────────────────────────────────────────────────┐
│            New MCP Tool: apex_semantic_search                │
│  - Accepts natural language query                            │
│  - Generates query embedding via Voyage AI                   │
│  - Returns relevant task files with similarity scores        │
└──────────────────────────────────────────────────────────────┘
```

## Implementation Approach

### Component 1: Voyage AI Client

```typescript
// src/vector-search/voyage-client.ts
import { VoyageAIClient } from 'voyageai';

export class VoyageEmbeddings {
  private client: VoyageAIClient;

  constructor(apiKey: string) {
    this.client = new VoyageAIClient({ apiKey });
  }

  async embed(text: string): Promise<number[]> {
    const response = await this.client.embed({
      input: text,
      model: 'voyage-3.5-lite',
      outputDimension: 1024
    });
    return response.data[0].embedding;
  }
}
```

### Component 2: Hybrid Chunker

```typescript
// src/vector-search/chunker.ts
interface Chunk {
  section: string; // "research" | "plan" | "## Heading"
  content: string;
}

function chunkDocument(content: string): Chunk[] {
  // Detect XML sections
  const xmlSections = ['research', 'plan', 'implementation', 'ship'];
  const hasXml = xmlSections.some(s => content.includes(`<${s}>`));
  if (hasXml) {
    return chunkByXmlSections(content, xmlSections);
  } else {
    return chunkByMarkdownHeadings(content);
  }
}

function chunkByXmlSections(content: string, sections: string[]): Chunk[] {
  const chunks: Chunk[] = [];
  for (const section of sections) {
    const regex = new RegExp(`<${section}>([\\s\\S]*?)</${section}>`, 'i');
    const match = content.match(regex);
    if (match) {
      chunks.push({ section, content: match[1].trim() });
    }
  }
  return chunks;
}

function chunkByMarkdownHeadings(content: string): Chunk[] {
  const lines = content.split('\n');
  const chunks: Chunk[] = [];
  let currentSection = 'intro';
  let currentContent: string[] = [];
  for (const line of lines) {
    if (line.startsWith('## ')) {
      if (currentContent.length) {
        chunks.push({ section: currentSection, content: currentContent.join('\n') });
      }
      currentSection = line;
      currentContent = [];
    } else {
      currentContent.push(line);
    }
  }
  if (currentContent.length) {
    chunks.push({ section: currentSection, content: currentContent.join('\n') });
  }
  return chunks;
}
```

### Component 3: Task Indexer

```typescript
// src/vector-search/task-indexer.ts
interface ChunkDocument {
  taskId: string;
  section: string;
  content: string;
  phase: string;
  tags: string[];
  embedding: Binary;
  updatedAt: Date;
}

class TaskIndexer {
  async indexTaskFile(filePath: string): Promise<void> {
    const content = await fs.readFile(filePath, 'utf-8');
    const frontmatter = this.extractFrontmatter(content);
    const chunks = chunkDocument(content);

    // Delete old chunks for this task
    await this.collection.deleteMany({ taskId: frontmatter.id });

    // Insert whole-file embedding for task similarity
    const wholeEmbedding = await this.voyage.embed(content);
    await this.collection.insertOne({
      taskId: frontmatter.id,
      section: '_whole',
      content: content,
      phase: frontmatter.phase,
      tags: frontmatter.tags || [],
      embedding: Binary.fromFloat32Array(Float32Array.from(wholeEmbedding)),
      updatedAt: new Date()
    });

    // Insert chunk embeddings for precise search
    for (const chunk of chunks) {
      const embedding = await this.voyage.embed(chunk.content);
      await this.collection.insertOne({
        taskId: frontmatter.id,
        section: chunk.section,
        content: chunk.content,
        phase: frontmatter.phase,
        tags: frontmatter.tags || [],
        embedding: Binary.fromFloat32Array(Float32Array.from(embedding)),
        updatedAt: new Date()
      });
    }
  }
}
```

### Component 4: MCP Tools

```typescript
// src/mcp/tools/semantic-search.ts

// Tool 1: Search chunks for specific info
{
  name: 'apex_semantic_search',
  description: 'Search task file sections for specific information',
  parameters: {
    query: { type: 'string', description: 'Natural language query' },
    filter: {
      phase: { type: 'string', enum: ['research', 'plan', 'implement', 'ship'] },
      section: { type: 'string', description: 'Limit to specific section' },
      tags: { type: 'array', items: { type: 'string' } }
    },
    limit: { type: 'number', default: 10, maximum: 50 }
  }
  // Searches chunks where section != '_whole'
}

// Tool 2: Find similar tasks (holistic)
{
  name: 'apex_find_similar_tasks',
  description: 'Find tasks similar to a query or another task',
  parameters: {
    query: { type: 'string', description: 'Natural language description or task ID' },
    limit: { type: 'number', default: 5, maximum: 20 }
  }
  // Searches only section == '_whole'
}
```

### Component 5: Environment Variables

```bash
# .env
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/apex
VOYAGE_API_KEY=pa-xxxxxxxxxxxxxxxx
```

```typescript
// src/vector-search/config.ts
export const vectorSearchConfig = {
  mongodbUri: process.env.MONGODB_URI,
  voyageApiKey: process.env.VOYAGE_API_KEY,
  database: 'apex',
  collection: 'tasks'
};
```

## Technical Decisions (Finalized)

| Decision | Choice | Rationale |
|---|---|---|
| Embedding provider | Voyage AI | Best retrieval quality, Anthropic partner |
| Embedding model | voyage-3.5-lite | 32K context, $0.02/1M tokens |
| Embedding dimensions | 1024 | Balance of quality/storage |
| Chunking | Hybrid + whole | Chunks + whole-file embedding for similarity |
| Config storage | .env | Simple, standard approach |
| MongoDB driver |