I'm home alone after New Year's. What do I decide to do? Force my two favorite AI "friends" to go head-to-head.
I expected to find a winner. Instead, I found something more interesting: using both models together was more effective than using either individually.
I've seen a lot of discourse about Claude vs Codex - so as a preface, this post will not be another "WOW Codex is soooooo much better than Claude ??!?!!!?" or "Opus 4.5 is crazy!! are we gonna have jobs in 1 year????".
It's about what happens when you stop looking for the best tool and start treating them as collaborators instead.
These models went head-to-head not on benchmarks or a toy task, but on "real" non-trivial work: adding vector search to my AI dev tooling.
The rules: SOTA (state of the art) models, same starting prompt, parallel terminals.
The tools: Anthropic $100/m subscription, ChatGPT Plus ($20/m, $0 for this month - thanks Sam!)
Disclaimer: I am a MongoDB employee and decided to use MongoDB products (Atlas, vector search) as the Vector DB in this experiment for a few reasons:
- It's free! (for me :))
- We mentioned AI 82 times in our last earnings call and the stock went up 20% afterwards (not joking btw)
- We spent $220M acquiring Voyage AI and I still haven't used it
- The company's previous acquisition, Realm for $39M, was directly related to the last team I worked on. Our product is now deprecated... (I'm assuming so we can throw more money into the AI pit)
What is "The Messy Middle"?
I have a few problems with both benchmarks and user-driven evals:
- Benchmarks DO NOT reflect real usage of vibe coding - they are so sandboxed that I view them as more a test of raw model intelligence than of how effective a model is in practice
- Vibe coders usually have AT LEAST a plan -> execute workflow, knowing that planning the change out dramatically increases performance on complex tasks
- They also work in large codebases and deal with changing requirements / undefined problems, none of which is captured in a benchmark test
- Most people testing these tools don't know WTF they're doing
  - "Make Minecraft from scratch!" - which, again, is not reflective of real engineering using these tools
  - Tests done by people with little experience vibe coding, or unwilling to pay for SOTA models
  - "Codex is so much better than Claude (but I used Haiku 4.5 and Codex 5.2 with max thinking)"
I also don't know WTF I'm doing (and I suspect very few in this space actually do), but at least I've been experimenting for ~1y with a bunch of AI coding tools.
"Real" vibe coding, the type done by most software engineers, is existing code, existing decisions, imperfect context, and changing goalposts.
I'm also convinced that every vibe coder has a different workflow adapted to their needs. Testing how models adapt to unique workflows is extremely valuable; benchmarks only capture the vanilla case.
These agents are meant to be tools and should behave as such, maintaining high performance across a variety of environments and tasks.
What This Is (and Isn't)
What I can conclude: Running a second model on your plan catches errors you'd otherwise miss. The cognitive diversity matters more than which model is "better."
What I can't conclude: That Codex is objectively superior. I've used Claude daily since April; I've used Codex for three days. The novelty effect is real. My prompts are probably Claude-shaped. My workflow (APEX) was built around Claude's patterns. (Given the simplicity of Codex, I think I have the hang of it, but there's still certainly a Claude bias.)
This is also just one night of usage. Models & harnesses are changing by the day - so view this as a snapshot in time of model performance rather than a definitive answer.
This is n=1. But itβs a documented n=1 with receipts, which is more than most AI comparisons offer.
The Task
Some background on my workflow:
- APEX is my personal AI dev workflow that exposes
  - research -> plan -> implement -> ship phases
  - an experiment with MCP for longer-term memory storage
This post explores the research + plan phases, along with a review of the plan. Why not implement? Because IMO having a good plan is 80% of the work in vibe coding.
The task is to add high-quality semantic search over my .apex/tasks markdown files, allowing the agent to search across task memory.
Research Phase
This phase is meant to gather background research on the task from the codebase, git history, web, etc.
The high level goal of this step is to understand what weβre doing, why, and find all relevant code.
Both models got the same prompt. Watch for:
- The Voyage moment - Claude recommended Voyage because they're an "Anthropic partner". I laughed out loud. Claude citing its creator's business partnerships as a technical justification is either endearing or concerning; not sure which at this point.
  - This is especially funny given the flak OpenAI has been getting for planning to introduce ads - turns out Anthropic may have beaten them to it!
- Questions - Claude clarifies architecture decisions with me in this phase.
- Cross-pollination - What happens when I share Codex's take with Claude.
user
do research into what it would take to make apex vector search over .md files. assume 1. during apex workflow, everything is written to .md file 2. i have a mongodb atlas cluster with vector search i can easily use
assistant
## Research Complete
I've completed the research phase for implementing vector search over .md files with MongoDB Atlas. Here's the summary:
### Key Findings
**1. APEX Architecture is Ready**
- Task files are already written to `.apex/tasks/[ID].md` during workflow
- YAML frontmatter + markdown structure is ideal for chunking
- Current SQLite FTS can coexist with vector search
**2. MongoDB Atlas Vector Search**
- `$vectorSearch` aggregation is straightforward to implement
- Supports up to 8192 dimensions (all embedding models fit)
- Free tier gives you 512MB and 3 indexes to start
- Pre-filtering by phase, task type, tags is built-in
**3. Embedding Recommendations**

| Use Case | Provider | Model | Cost |
|---|---|---|---|
| **Simple start** | OpenAI | text-embedding-3-small | $0.02/1M tokens |
| **Best retrieval** | Voyage AI | voyage-3.5-lite | $0.02/1M tokens |
| **Zero cost** | Ollama | nomic-embed-text | Free |
**4. Chunking Strategy**
- Use `MarkdownHeaderTextSplitter` to preserve heading structure
- 400-512 tokens per chunk with 10% overlap
- Store heading hierarchy as metadata for context
### Proposed Components
1. **Indexing Pipeline** - Watches `.apex/tasks/*.md`, chunks, embeds, upserts to MongoDB
2. **MCP Tool**: `apex_semantic_search` - Natural language queries over all APEX documents
3. **Embedding Abstraction** - Pluggable provider (OpenAI/Voyage/Ollama)
### Effort Estimate
~25-35 hours total implementation work
### Open Questions for You
1. Which embedding provider do you want to start with?
2. Should we index only task files, or also patterns and skills?
3. Environment variable for MongoDB connection string, or APEX config?
The full research brief is saved at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`. Ready for `/apex:plan` when you want to proceed.
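(Aside: the `MarkdownHeaderTextSplitter` Claude mentions is LangChain's Python splitter. A rough TypeScript sketch of the same ~400-512 token / 10% overlap idea, using LangChain JS - the package, sizes, and helper name here are my assumptions, not part of Claude's output:)
```typescript
// Sketch: approximate the "400-512 tokens, 10% overlap" recommendation with
// LangChain JS. Sizes are in characters (~4 chars/token) - an assumption.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const splitter = RecursiveCharacterTextSplitter.fromLanguage("markdown", {
  chunkSize: 2000,   // roughly 500 tokens
  chunkOverlap: 200, // roughly 10% overlap
});

// Splits on markdown structure (headings, paragraphs) before falling back
// to raw character boundaries, so heading context mostly survives.
export async function chunkMarkdown(doc: string): Promise<string[]> {
  return splitter.splitText(doc);
}
```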
user
use voyage ai index only task files use apex config
also, i want to embed the WHOLE md - no XML parsing
assistant
## Research Complete
Updated the task brief with your decisions:
| Decision | Choice |
|---|---|
| **Embeddings** | Voyage AI `voyage-3.5-lite` (1024 dims, 32K context) |
| **Scope** | Task files only (`.apex/tasks/*.md`) |
| **Config** | APEX config (not env vars) |
| **Approach** | Embed whole markdown as plain text - no XML parsing, no chunking |
**Simplified architecture**:
- One embedding per task file
- `voyage-3.5-lite` handles up to 32K tokens (plenty for task files)
- ~15-20 hours estimated effort (down from 25-35 with chunking)
**Task brief saved at**: `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`
Ready to run `/apex:plan` when you want to proceed to implementation planning.
user
actually let's use .env for the creds to keep it simple
assistant
Updated. Credentials now via `.env`:
```bash
MONGODB_URI=mongodb+srv://...
VOYAGE_API_KEY=pa-...
```
Ready for `/apex:plan` when you want to proceed.
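(One gotcha worth flagging here: as Claude's plan-phase research later notes, APEX doesn't auto-load `.env`, so the standard dotenv import has to run before anything reads `process.env`. A minimal sketch; the `requireEnv` helper is hypothetical:)
```typescript
// Load .env before anything reads process.env - APEX has no auto-loading.
import "dotenv/config";

// Hypothetical helper: fail fast with a clear error if creds are missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

const MONGODB_URI = requireEnv("MONGODB_URI");
const VOYAGE_API_KEY = requireEnv("VOYAGE_API_KEY");
```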
user
do you think we should chunk by something? what's the current strategy
assistant
Good question. Let me think through the trade-offs:
**Whole file (current):**
- Simpler, fewer API calls
- Works well for "find similar tasks"
- Risk: Long files become diluted embeddings - specific content gets lost in the average
**Chunk by XML section (`<research>`, `<plan>`, `<implementation>`, `<ship>`):**
- Better retrieval for "what was the architecture for X" → hits `<plan>` directly
- Natural boundaries already exist in task files
- But you said no XML parsing...
**Chunk by heading (## Section):**
- Also natural boundaries
- More granular than XML sections
- But maybe too granular?
**Hybrid - whole file + sections:**
- Store both: one embedding for whole doc, one per major section
- Best retrieval but 2-5x the embeddings/cost
**My take**: For task files that are typically <5K tokens, whole file is probably fine. The 32K context of voyage-3.5-lite won't struggle.
But if you anticipate:
- Very long task files (10K+ tokens)
- Queries like "find the implementation details for X" (section-specific)
Then chunking by the 4 workflow phases (`<research>`, `<plan>`, `<implementation>`, `<ship>`) would give better precision.
What's your expected use case - finding similar tasks overall, or drilling into specific phases?
user
it will either be XML OR markdown with frontmatter. let's use hybrid to detect which and chunk by that
assistant
Got it - detect format and chunk accordingly:
1. **XML format** → chunk by `<research>`, `<plan>`, `<implementation>`, `<ship>` sections
2. **Markdown format** → chunk by `##` headings
assistant
Updated. Chunking strategy:
| Format Detected | Chunk By |
|---|---|
| XML (`<research>`, `<plan>`, `<implementation>`, `<ship>`) | Each XML section → 1 chunk |
| Markdown | Each `##` heading → 1 chunk |
**Document schema**:
```javascript
{
  taskId: string,
  section: "research" | "plan" | "## Overview" | ...,
  content: string,
  embedding: Binary,
  ...
}
```
Task brief updated at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`. Ready for `/apex:plan`.
user
compare and contrast this against chunking the whole document
assistant
| Aspect | Whole Document | Hybrid Chunking (current) |
|---|---|---|
| **Retrieval precision** | Lower - query matches diluted across entire file | Higher - query matches specific section directly |
| **Storage** | 1 embedding per file | 2-6 embeddings per file (more storage, cost) |
| **API cost** | 1 Voyage call per file | 2-6 calls per file |
| **Query: "find similar tasks"** | ✅ Better - holistic comparison | ⚠️ Worse - comparing chunks, not whole tasks |
| **Query: "architecture decisions for X"** | ⚠️ Worse - must return whole file, hope it's relevant | ✅ Better - returns `<plan>` or `## Architecture` chunk |
| **Complexity** | Simpler - no parsing logic | More complex - format detection, chunking |
| **Result granularity** | Returns whole file | Returns specific section (more focused) |
| **Re-indexing on edit** | 1 embedding to regenerate | Multiple embeddings to regenerate |
**Key question**: Whatβs the primary use case?
- **"Find tasks similar to this one"** β Whole document wins
- **"What did we decide about X"** β Chunking wins
- **Both** β Could do hybrid: whole doc embedding + chunk embeddings (but 2-7x cost)
My take: If task files are typically <5K tokens, **whole document** is probably fine and simpler. Chunking adds complexity for marginal retrieval gain on small docs.
Want to stick with hybrid chunking, or simplify back to whole document?
user
codex disagrees
Hybrid (XML/Markdown-aware) vs whole-document embedding:
- Hybrid chunking: better recall/precision (sections are semantically focused), returns pinpointed results with section metadata, supports partial re-indexing on small changes, but costs more (more embeddings), and adds chunking complexity.
- Whole-document embedding: simplest and cheapest to build (one vector per file), but recall is weaker (mixed topics dilute similarity), results are less specific, any small change forces a full re-embed, and long task files can exceed model limits or truncate important content.
Recommendation: stick with hybrid chunking. If you want a compromise, we can also store a single file-level embedding for coarse routing and still return chunk hits for precision.
assistant
Codex makes good points I understated:
1. **Partial re-indexing** - Only re-embed changed sections (though our current impl deletes all chunks anyway)
2. **Truncation risk** - Long files could exceed limits; chunks avoid this
3. **Semantic focus** - Sections are topically coherent; whole doc is mixed signals
**The hybrid + whole-file idea is interesting:**
```javascript
// Per task file, store:

// Coarse: "find similar tasks"
{ taskId, section: "_whole", embedding: [...] }

// Fine: "find specific info about X"
{ taskId, section: "research", embedding: [...] }
{ taskId, section: "plan", embedding: [...] }
...
```
Query routing:
- `apex_semantic_search` → search chunks (precise)
- `apex_find_similar_tasks` → search `_whole` embeddings (holistic)
Cost: ~5-7 embeddings per file instead of 4-6. Marginal increase for best of both.
Want to add the whole-file embedding alongside chunks?
user
y
assistant
Done. Final strategy:
| Embedding | Purpose | MCP Tool |
|---|---|---|
| `section: "_whole"` | Task similarity | `apex_find_similar_tasks` |
| `section: "research"` etc. | Precise search | `apex_semantic_search` |
~5-7 embeddings per task file. Ready for `/apex:plan`.
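Before moving on, here's what that routing could look like in practice. A minimal sketch with the Node.js driver - the index name and the `$ne`/`$eq` pre-filters are my assumptions, and `section` would need to be an indexed filter field:
```typescript
import { Collection, Document } from "mongodb";

// Both tools share one collection; the `section` field routes the query.
async function vectorSearch(
  col: Collection<Document>,
  queryVector: number[],
  sectionFilter: Document,
  limit: number
) {
  return col
    .aggregate([
      {
        $vectorSearch: {
          index: "task_vector_index",
          path: "embedding",
          queryVector,
          numCandidates: limit * 20,
          limit,
          filter: sectionFilter, // `section` must be an indexed filter field
        },
      },
      {
        $project: {
          taskId: 1,
          section: 1,
          content: 1,
          score: { $meta: "vectorSearchScore" },
        },
      },
    ])
    .toArray();
}

// apex_semantic_search: precise, chunk-level hits only.
const searchChunks = (col: Collection<Document>, qv: number[]) =>
  vectorSearch(col, qv, { section: { $ne: "_whole" } }, 10);

// apex_find_similar_tasks: holistic, whole-file embeddings only.
const findSimilarTasks = (col: Collection<Document>, qv: number[]) =>
  vectorSearch(col, qv, { section: { $eq: "_whole" } }, 5);
```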
Planning Phase
This phase is meant to use the research to create a concrete plan. Consider alternatives, and propose the specific pieces of code to be changed.
Same task, same constraints. Watch how differently they approach it:
- Question style - Claude asks binary A/B/C questions; Codex asks open-ended questions and offers to explain tradeoffs when I said "I don't have enough information".
- Artifact format - Claude produces markdown tables and structured sections. Codex produces XML-based architecture documents. The markdown is more readable, but the XML is technically what I asked for in my skill instructions.
user
.apex/tasks/_d3NcldtrLAotoiCziqQE.md
assistant
Now I understand the APEX plan phase. I need to:
1. Verify the task is in "research" phase (it shows `phase: research` in frontmatter)
2. Spawn verification agents to extend research
3. Work through the 5 mandatory artifacts interactively
4. Get user confirmation at checkpoints
Let me set up the todo list and begin the architecture process.
assistant
The MCP server is in TypeScript. Let me read the key files to understand the architecture.
assistant
Based on research and analysis, I understand we need to implement **vector search over markdown task files** using MongoDB Atlas and Voyage AI.
## Key Findings
**Current Architecture at Key Files:**
- MCP server: `src/mcp/server.ts:68-147` (initialization pattern)
- Tool registration: `src/mcp/tools/index.ts:126-346` (switch-case pattern)
- Existing search: FTS3 in SQLite (no true vector embeddings currently)
**Discovered Patterns with Trust Scores:**

| Pattern | Trust | Applicability |
|---|---|---|
| Binary Graceful Fallback (`PAT:AUTO:Coh7YFt2`) | ★★★☆☆ (0.67) | Make vector search optional |
| MCP DB Initialization (`PAT:AUTO:a9f5JGcA`) | ★★☆☆☆ (0.50) | Init MongoDB in startup |
| Zod Validation (`PAT:VALIDATION:ZOD`) | ★★☆☆☆ (0.50) | Schema validation |
**Key Complexity Identified:**
1. **Greenfield external APIs** - No existing Voyage AI or MongoDB patterns in codebase
2. **FTS3 vs Vector** - Current "semantic" search is actually FTS + scoring, not embeddings
3. **High churn area** - FTS schema management has 8+ fixes in 90 days (proceed carefully)
4. **Credential management** - No existing API key validation patterns
## Questions Requiring Human Judgment
1. **Graceful Degradation Strategy**: Should vector search be optional (feature-flagged) or required when configured?
- Option A: Required - fail startup if MongoDB/Voyage credentials missing
- Option B: Optional - silently fall back to FTS3 search
2. **Indexing Trigger**: How should we trigger re-indexing of task files?
- Option A: File watcher (real-time, adapting existing `PatternWatcher`)
- Option B: On-demand via MCP tool call
- Option C: Both (watcher for new/changed, manual for full re-index)
3. **Embedding Storage**: Where should we cache embeddings to avoid re-computing on restart?
- Option A: MongoDB only (simple, but requires connection)
- Option B: SQLite + MongoDB (local cache, resilient)
Let's address these before I develop architecture options.
user
A A A
assistant
Got it. Decisions confirmed:
- **Required credentials** - fail startup if MongoDB/Voyage missing
- **File watcher** - real-time indexing via adapted `PatternWatcher`
- **MongoDB only** - embeddings stored in Atlas (no SQLite cache)
## Proposed Architecture Structure
### Core Components:
1. **VoyageClient** (`src/clients/voyage.ts`) - Embedding generation with retry/rate-limiting
2. **MongoVectorStore** (`src/clients/mongodb.ts`) - Singleton connection, $vectorSearch queries
3. **MarkdownChunker** (`src/indexing/markdown-chunker.ts`) - Hybrid XML/heading chunking
4. **TaskIndexer** (`src/indexing/task-indexer.ts`) - File watcher + embedding pipeline
5. **SemanticSearchService** (`src/mcp/tools/semantic-search.ts`) - MCP tool implementation
### Implementation Phases:
1. **Phase 1: Clients** - VoyageClient + MongoVectorStore with connection validation
2. **Phase 2: Chunker** - Hybrid markdown/XML chunking with content hashing
3. **Phase 3: Indexer** - File watcher integration for `.apex/tasks/*.md`
4. **Phase 4: MCP Tools** - `apex_semantic_search` + `apex_find_similar_tasks` tools
5. **Phase 5: Integration** - Wire into MCP server startup, add to tool registry
### Data Flow:
```
Task File Change → Watcher → Chunker → VoyageClient → MongoVectorStore
User Query → MCP Tool → VoyageClient → MongoVectorStore → Results
```
Does this structure align with your vision? Should I adjust before developing the 5 mandatory artifacts?
user
yes
assistant
Now developing the 5 mandatory artifacts.
## Artifact 1: Chain of Thought Analysis
### Current State
| Component | Location | Purpose |
|---|---|---|
| MCP Server | `src/mcp/server.ts:68-147` | Initializes pattern system, runs migrations |
| Tool Registry | `src/mcp/tools/index.ts:126-346` | Switch-case tool dispatch |
| Pattern Discovery | `src/mcp/tools/discover.ts` | FTS3 + semantic scoring (NOT vector) |
| Task Repository | `src/storage/repositories/task-repository.ts` | SQLite task CRUD |
| File Watcher | `src/storage/watcher.ts:22-46` | Chokidar-based, 200ms debounce |
| Config | `src/config/apex-config.ts` | Static config, env var handling |
**How it got here (git archaeology):**
- `c1044e9` (2025-08-02): Semantic discovery added with FTS + scoring
- `b472a68` (2025-09-06): FTS5→FTS3 migration for WASM compatibility
- `a83d685` (2025-11-01): Overview tool added (39 tests, pagination)
- Recent stability: MCP tools layer is mature, low churn
### Problem Decomposition
**Core Problem:** Enable semantic search over `.apex/tasks/*.md` files using true vector embeddings instead of keyword-based FTS.
**Sub-problems:**
1. External API integration (Voyage AI) - no existing pattern
2. External database integration (MongoDB Atlas) - no existing pattern
3. Markdown chunking with mixed XML/heading formats
4. Real-time indexing via file watcher
5. MCP tool registration following existing patterns
6. Startup validation for required credentials
### Hidden Complexity
| Issue | Source | Impact |
|---|---|---|
| No `.env` auto-loading | `apex-config.ts` | Must document env var setup |
| FTS schema churn | 8 fixes in 90 days | Avoid touching SQLite FTS |
| Singleton patterns | `database.ts:86` | Must follow for MongoDB |
| Rate limiting needed | Voyage API limits | Implement like `llm-extractor.ts:8` |
| 37+ markdown files | Varying formats | Robust chunker required |
### Success Criteria
**Automated:**
- `npm test` passes with new semantic search tests
- `npm run type-check` clean
- MCP tool responds to `apex_semantic_search` calls
- Connection validation fails startup if credentials missing
**Manual:**
- Query "authentication patterns" returns relevant task files
- File edit triggers re-indexing within 1 second
- Similarity search returns semantically related tasks
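(For reference, the "singleton patterns" constraint from the Hidden Complexity table implies roughly this shape for the MongoDB client - a sketch of mine, not code from the plan:)
```typescript
import { MongoClient, Collection, Document } from "mongodb";

// Sketch of the getInstance shape the plan says MongoVectorStore must
// follow (mirroring database.ts:86). All details here are assumptions.
export class MongoVectorStore {
  private static instance: MongoVectorStore | null = null;

  private constructor(
    private client: MongoClient,
    readonly chunks: Collection<Document>
  ) {}

  static async getInstance(uri: string): Promise<MongoVectorStore> {
    if (!MongoVectorStore.instance) {
      const client = new MongoClient(uri);
      await client.connect(); // fail fast if the URI or creds are wrong
      const chunks = client.db("apex").collection("task_chunks");
      MongoVectorStore.instance = new MongoVectorStore(client, chunks);
    }
    return MongoVectorStore.instance;
  }
}
```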
assistant
## Artifact 2: Tree of Thought Solutions
### Solution A: Direct Integration
**Approach:** Straightforward client classes with singleton patterns
**Implementation:**
1. `VoyageClient` class with `embed()` method at `src/clients/voyage.ts`
2. `MongoVectorStore` singleton at `src/clients/mongodb.ts`
3. `TaskIndexer` using file watcher at `src/indexing/task-indexer.ts`
4. `SemanticSearchService` at `src/mcp/tools/semantic-search.ts`
5. Initialize in `server.ts:initializePatternSystem()`
**Patterns Used:**
- `PAT:AUTO:Coh7YFt2` (Graceful Fallback) - ★★★☆☆
- `PAT:AUTO:a9f5JGcA` (MCP DB Init) - ★★☆☆☆
**Pros:**
- Simple, follows existing codebase patterns
- Easy to understand and debug
- Matches `discover.ts` tool structure
- Low abstraction overhead
**Cons:**
- Tight coupling to Voyage/MongoDB
- Harder to swap providers later
- No local fallback if MongoDB down
**Complexity:** 4/10 **Risk:** LOW - follows established patterns
### Solution B: Provider Abstraction Layer
**Approach:** Interface-based abstraction for embedding and vector store
**Implementation:**
1. `EmbeddingProvider` interface + `VoyageProvider` implementation
2. `VectorStore` interface + `MongoVectorStore` implementation
3. `VectorSearchService` depends on interfaces, not implementations
4. Factory pattern for provider instantiation
5. Config-driven provider selection
**Patterns Used:**
- `PAT:VALIDATION:ZOD` for provider config schemas
- Dependency injection pattern (new to codebase)
**Pros:**
- Swappable providers (OpenAI, local models later)
- Testable with mock implementations
- Future-proof architecture
**Cons:**
- Over-engineering for current scope (only one provider)
- More files, more complexity
- No existing DI pattern in codebase
- Abstractions may not match real provider differences
**Complexity:** 7/10 **Risk:** MEDIUM - introduces new patterns
### Solution C: Event-Driven Pipeline
**Approach:** Decoupled components communicating via EventEmitter
**Implementation:**
1. `IndexingPipeline` as EventEmitter
2. Events: `file:changed`, `chunk:created`, `embedding:ready`, `indexed`
3. Separate handlers for each stage
4. Queue-based processing with backpressure
5. Persistent queue in SQLite for crash recovery
**Patterns Used:**
- EventEmitter pattern (exists in Node.js core)
- Queue pattern from `batch-processor.ts`
**Pros:**
- Highly decoupled components
- Easy to add monitoring/logging at each stage
- Natural backpressure handling
- Crash recovery with persistent queue
**Cons:**
- Significant over-engineering
- Hard to trace execution flow
- Event ordering complexity
- Not aligned with existing sync patterns
**Complexity:** 8/10 **Risk:** HIGH - major architectural divergence
### Comparative Analysis
| Criterion | Solution A | Solution B | Solution C |
|---|---|---|---|
| Complexity | 4/10 | 7/10 | 8/10 |
| Codebase Fit | Excellent | Moderate | Poor |
| Future Flexibility | Low | High | High |
| Implementation Time | ~17hrs | ~25hrs | ~35hrs |
| Risk | LOW | MEDIUM | HIGH |
**Winner: Solution A (Direct Integration)**
**Reasoning:**
- Aligns with existing patterns (`discover.ts`, `task.ts`)
- Minimal abstraction - YAGNI principle
- Fastest to implement and validate
- If provider swap needed later, refactor then
**Runner-up: Solution B**
**Why not:** Premature abstraction. We have one embedding provider (Voyage) and one vector store (MongoDB). Building interfaces for hypothetical future providers violates YAGNI. If we need to swap providers, the refactor is straightforward.
assistant
## Artifact 3: Chain of Draft Evolution
### Draft 1 (Raw Instinct)
**Core Design:** Add MongoDB and Voyage clients, create MCP tool, done.
```
VoyageClient.embed(text) → MongoClient.insert(embedding)
MongoClient.search(query) → results
```
**Identified Issues:**
- Where does initialization happen?
- How do we handle credentials missing?
- What about file watching?
- No chunking strategy defined
- No error handling for API failures
### Draft 2 (Pattern-Guided Refinement)
**Core Design:** Follow `PAT:AUTO:a9f5JGcA` (MCP DB Init) pattern - initialize in `initializePatternSystem()`. Add singleton pattern for MongoDB. Wire file watcher.
```typescript
// server.ts:initializePatternSystem()
const voyageClient = new VoyageClient(process.env.VOYAGE_API_KEY);
const mongoStore = await MongoVectorStore.getInstance(process.env.MONGODB_URI);
const taskIndexer = new TaskIndexer(voyageClient, mongoStore);
await taskIndexer.startWatcher('.apex/tasks');
await initializeTools(repository, sharedDb, { voyageClient, mongoStore });
```
**Improvements:**
- Clear initialization location
- Singleton for connection reuse
- Watcher integrated at startup
**Remaining Issues:**
- What if credentials missing? Silent fail or crash?
- No validation before starting watcher
- Chunking logic still undefined
- No content hashing to skip unchanged files
### Draft 3 (Production-Ready)
**Core Design:** Validate credentials at startup (fail fast). Initialize clients with retry. Create singleton stores. Start watcher only after successful connection. Chunk using hybrid XML/heading strategy with content hashing.
```typescript
// server.ts:initializePatternSystem()

// 1. Validate required credentials (fail fast)
validateRequiredEnv(['VOYAGE_API_KEY', 'MONGODB_URI']);

// 2. Initialize clients with connection test
const voyageClient = new VoyageClient({
  apiKey: process.env.VOYAGE_API_KEY,
  model: 'voyage-3.5-lite',
  dimensions: 1024,
  rateLimitMs: 100,
});
await voyageClient.testConnection();

const mongoStore = await MongoVectorStore.create({
  uri: process.env.MONGODB_URI,
  database: 'apex',
  collection: 'task_chunks',
  indexName: 'task_vector_index',
});

// 3. Initialize chunker with hybrid strategy
const chunker = new MarkdownChunker({
  strategy: 'hybrid', // XML sections > markdown headings
  maxChunkSize: 2000,
  includeWholeFile: true,
});

// 4. Start indexer with content hashing
const taskIndexer = new TaskIndexer({
  voyageClient,
  mongoStore,
  chunker,
  watchPath: '.apex/tasks',
  debounceMs: 500,
  useContentHash: true,
});
await taskIndexer.startWatcher();

// 5. Pass to tools initialization
await initializeTools(repository, sharedDb, { mongoStore, voyageClient });
```
**Why This Evolved:**
- Draft 1 was too simple - no error handling, no lifecycle
- Draft 2 added structure but lacked validation
- Draft 3 applies Graceful Fallback pattern properly:
- Validate upfront (fail fast with clear error)
- Test connections before proceeding
- Content hashing prevents redundant embeddings
- Debounce prevents API spam
**Patterns Integrated:**
- `PAT:AUTO:Coh7YFt2` → Connection testing, validation
- `PAT:AUTO:a9f5JGcA` → Initialization in `initializePatternSystem()`
- Rate limiting from `llm-extractor.ts:8`
- Debounce from `watcher.ts:13`
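(The `rateLimitMs: 100` knob in Draft 3 implies something like a minimal call-spacing limiter. A sketch of the pattern borrowed from `llm-extractor.ts`, under my own assumptions about how it's wired in:)
```typescript
// Minimal call-spacing limiter: each call waits until at least
// `minIntervalMs` has passed since the previous one (single-caller sketch).
class RateLimiter {
  private last = 0;

  constructor(private minIntervalMs: number) {}

  async wait(): Promise<void> {
    const delay = this.last + this.minIntervalMs - Date.now();
    if (delay > 0) await new Promise((r) => setTimeout(r, delay));
    this.last = Date.now();
  }
}

// Hypothetical usage inside VoyageClient.embed():
//   await this.limiter.wait();
//   return this.callVoyageApi(text);
```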
assistant
## Artifact 4: YAGNI Declaration
### Explicitly Excluding
| Feature | Why Not | Cost If Included | Defer Until |
|---|---|---|---|
| **Provider Abstraction Layer** | Only one embedding provider (Voyage) | +8hrs, 4 extra files | Need to swap providers |
| **SQLite Embedding Cache** | User chose MongoDB-only storage | +6hrs, schema migration | MongoDB connectivity issues |
| **Multiple Vector Indexes** | One index sufficient for task files | +4hrs, complexity | Performance issues at scale |
| **Streaming Chunker** | Task files are small (<100KB) | +5hrs | Files exceed memory |
| **Background Queue Persistence** | File watcher handles recovery | +6hrs, SQLite schema | Crash recovery needed |
| **Admin UI for Vector Status** | CLI/MCP sufficient | +12hrs | User request |
| **Automatic Index Creation** | Manual Atlas setup acceptable | +3hrs, complexity | Deployment automation |
### Preventing Scope Creep
| Temptation | Why Resisting |
|---|---|
| "Add OpenAI embeddings option" | Voyage is sufficient; abstraction is premature |
| "Cache embeddings locally for offline" | Adds complexity; user explicitly chose MongoDB-only |
| "Add vector search to existing discover tool" | Keep tools separate; don't modify working FTS code |
| "Real-time similarity updates in UI" | No UI exists; MCP tools sufficient |
| "Support all markdown files, not just tasks" | Scope is task files; expand later if needed |
### Future Considerations
| Enhancement | When Makes Sense |
|---|---|
| Provider abstraction | When adding second embedding provider |
| Local cache | When MongoDB latency becomes a problem |
| Batch re-indexing tool | When task count exceeds 1000 |
| Similarity-based task suggestions | After basic search validated |
### Complexity Budget
| Aspect | Allocated | Used | Reserved |
|---|---|---|---|
| **Total** | 6/10 | 4/10 | 2/10 |
| New files | 6 | 6 | - |
| External dependencies | 2 | 2 (mongodb, voyageai) | - |
| MCP tools | 2 | 2 | - |
| Config options | 3 | 2 | 1 |
assistant
## Artifact 5: Pattern Selection Rationale
### Applying
| Pattern ID | Trust | Usage | Why This Pattern | Where Applying | Source |
|---|---|---|---|---|---|
| `PAT:AUTO:Coh7YFt2` (Graceful Fallback) | ★★★☆☆ (0.67) | 1 use, 100% success | Make vector search fail gracefully during init; test connections before proceeding | `server.ts:initializePatternSystem()` | ctx.patterns |
| `PAT:AUTO:a9f5JGcA` (MCP DB Init) | ★★☆☆☆ (0.50) | 0 uses | Initialize MongoDB client alongside PatternRepository in existing init flow | `server.ts:68-147` | ctx.patterns |
| `PAT:VALIDATION:ZOD` (Zod schemas) | ★★☆☆☆ (0.50) | 0 uses | Validate MCP tool inputs using Zod; generate JSON Schema | `src/schemas/search/types.ts` | ctx.patterns |
| `PAT:MCP:SERVICE` (Service class) | ★★☆☆☆ (0.50) | 0 uses | Follow TaskService pattern for SemanticSearchService | `src/mcp/tools/semantic-search.ts` | ctx.patterns |
### Codebase Patterns (from ctx.impl)
| Pattern | Location | Application |
|---|---|---|
| Singleton + getInstance | `database.ts:86` | Use for `MongoVectorStore.getInstance()` |
| Rate limiting | `llm-extractor.ts:8-9` | Apply to `VoyageClient.embed()` |
| Debounced watcher | `watcher.ts:13,67-88` | Adapt for `TaskIndexer` |
| Retry with backoff | `database-utils.ts:23-75` | Use for MongoDB/Voyage API calls |
| Zod schema generation | `schemas/generator.ts` | Generate tool input schemas |
### Considering But Not Using
| Pattern | Why Not |
|---|---|
| `ANTI:AUTO:09FZrpo0` (Complex Fallback) | We're doing simple fail-fast, not fallback chains |
| Event-driven patterns | Over-engineering; direct calls simpler |
| Worker threads | Task files too small to warrant parallelism |
### Missing Patterns (Gaps)
| Need | Workaround |
|---|---|
| External HTTP API client | Use native `fetch` with retry wrapper |
| MongoDB integration | Follow standard Node.js driver patterns |
| Markdown/XML chunking | First-principles implementation |
| Vector embedding | Use Voyage SDK directly |
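(The "native `fetch` with retry wrapper" workaround could look like this - a sketch with illustrative retry counts and backoff, not code from the plan:)
```typescript
// Sketch: retry on 429/5xx with exponential backoff. Retry count and
// delays are illustrative assumptions.
export async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  retries = 3,
  baseDelayMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, init);
      // Only retry rate limits and server errors.
      if (res.ok || (res.status !== 429 && res.status < 500)) return res;
      if (attempt >= retries) return res;
    } catch (err) {
      if (attempt >= retries) throw err; // network failure, out of retries
    }
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
}
```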
assistant
## Architecture Review Checkpoint
I've completed the 5 mandatory artifacts. Here's the selected architecture:
**Chosen Solution**: A (Direct Integration)
**Key Patterns**:
1. `PAT:AUTO:Coh7YFt2` - Graceful Fallback (★★★☆☆)
2. `PAT:AUTO:a9f5JGcA` - MCP DB Init (★★☆☆☆)
3. `PAT:MCP:SERVICE` - Service class pattern (★★☆☆☆)
**Excluded Scope**:
1. Provider abstraction layer
2. SQLite embedding cache
3. Multiple vector indexes
**Complexity**: 4/10 **Risk Level**: LOW
**Implementation will**:
1. Add `VoyageClient` + `MongoVectorStore` clients with connection validation
2. Create hybrid `MarkdownChunker` for XML/heading chunking
3. Wire `TaskIndexer` with file watcher for `.apex/tasks/*.md`
4. Register `apex_semantic_search` + `apex_find_similar_tasks` MCP tools
5. Fail startup if `VOYAGE_API_KEY` or `MONGODB_URI` missing
**Implementation will NOT**:
- Add provider abstraction (YAGNI)
- Cache embeddings in SQLite
- Modify existing FTS/discover tools
- Add admin UI
**Files to create** (6):
- `src/clients/voyage.ts`
- `src/clients/mongodb.ts`
- `src/indexing/markdown-chunker.ts`
- `src/indexing/task-indexer.ts`
- `src/mcp/tools/semantic-search.ts`
- `src/schemas/search/types.ts`
**Files to modify** (3):
- `src/mcp/server.ts` (add vector init)
- `src/mcp/tools/index.ts` (register tools)
- `package.json` (add dependencies)
Should I proceed with the detailed architecture decision record and write the plan to the task file?
user
update the doc
assistant
**Plan phase complete.** Task file updated at `.apex/tasks/_d3NcldtrLAotoiCziqQE.md`
## Summary
| Aspect | Value |
|---|---|
| **Architecture** | Solution A: Direct Integration |
| **Complexity** | 4/10 |
| **Risk** | LOW |
| **Files to create** | 6 |
| **Files to modify** | 3 |
**Key decisions:**
- Credentials required (fail startup if missing)
- File watcher for real-time indexing
- MongoDB-only storage (no SQLite cache)
- 4 patterns applied (Graceful Fallback, MCP DB Init, Zod, MCP Service)
**Next step:** Run `/apex:implement vector-search-md-mongodb` to begin implementation.
The Plan
Here's what they actually produced:
Plan outputs
---
id: _d3NcldtrLAotoiCziqQE
identifier: vector-search-md-mongodb
title: "Vector Search over .md Files with MongoDB Atlas"
created: 2026-01-02
updated: 2026-01-02
phase: plan
status: active
confidence: 0.75
tags: [vector-search, mongodb-atlas, markdown, semantic-search, embeddings]
decisions:
  - embedding_provider: voyage-ai
  - embedding_model: voyage-3.5-lite
  - scope: task-files-only
  - config: .env (MONGODB_URI, VOYAGE_API_KEY)
  - chunking: hybrid (XML sections OR markdown headings) + whole-file embedding
  - architecture: direct-integration (Solution A)
  - credentials: required (fail startup if missing)
  - indexing: file-watcher (real-time)
  - storage: mongodb-only (no SQLite cache)
---

# Vector Search over .md Files with MongoDB Atlas

## Executive Summary

This research investigates implementing vector search over markdown files generated during APEX workflow, using MongoDB Atlas vector search. The approach is **fully viable** with straightforward integration paths.

**Key finding**: APEX's current architecture writes task files to `.apex/tasks/[ID].md` during workflow phases. These can be embedded as whole documents and indexed in MongoDB Atlas for semantic retrieval.

## Decisions Made

- **Embedding Provider**: Voyage AI (`voyage-3.5-lite`) - best retrieval quality at $0.02/1M tokens
- **Scope**: Task files only (`.apex/tasks/*.md`)
- **Config**: Credentials in `.env` (`MONGODB_URI`, `VOYAGE_API_KEY`)
- **Chunking**: Hybrid + whole-file
  - Chunks: XML sections (`<research>`, `<plan>`, etc.) OR markdown headings (`##`)
  - Plus: whole-file embedding (`section: "_whole"`) for task similarity

## Research Findings

### 1. Current APEX .md File Architecture

**Where .md files are written:**
- **Task files**: `./.apex/tasks/[IDENTIFIER].md` (primary workflow output)
- **Skills**: `/skills/*/SKILL.md` (static, checked into repo)
- **Pattern packs**: `[pack-name]/README.md` (distribution packs)

**Task file structure:**
```yaml
---
id: [nanoid]
identifier: [kebab-case-name]
phase: [research|plan|implement|ship]
status: [active|complete|blocked]
---
# Title

<research>
  <metadata>...</metadata>
  <executive-summary>...</executive-summary>
</research>

<plan>
  <architecture>...</architecture>
</plan>

<implementation>
  <changes>...</changes>
</implementation>

<ship>
  <review>...</review>
</ship>
```

**Current search**: SQLite FTS5 via `pattern_fts` virtual table. No vector/semantic search.

### 2. MongoDB Atlas Vector Search

**Key capabilities:**
- `$vectorSearch` aggregation stage (replaces deprecated `knnBeta`)
- Supports 1-8192 dimensions
- Three similarity functions: `cosine`, `dotProduct`, `euclidean`
- Pre-filtering via indexed filter fields
- HNSW algorithm for approximate nearest neighbor

**Index configuration:**
```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "phase" },
    { "type": "filter", "path": "taskType" }
  ]
}
```

**Node.js query:**
```javascript
const pipeline = [
  {
    $vectorSearch: {
      index: 'apex_docs_vector',
      path: 'embedding',
      queryVector: queryEmbedding,
      numCandidates: 200,
      limit: 10,
      filter: { phase: 'implement' }
    }
  },
  {
    $project: {
      title: 1,
      content: 1,
      taskId: 1,
      score: { $meta: 'vectorSearchScore' }
    }
  }
];
```

**Pricing:**
- Free tier (M0): 512MB storage, 3 indexes
- Flex: $8/month, 5GB storage, 10 indexes
- Dedicated: $57+/month for production

### 3. Embedding: Voyage AI

**Selected model**: `voyage-3.5-lite`
- **Dimensions**: 1024 (configurable: 2048/1024/512/256)
- **Cost**: $0.02/1M tokens
- **Context**: 32K tokens (handles large task files)
- **Quality**: Outperforms OpenAI text-embedding-3-large on retrieval benchmarks

**Why Voyage AI**:
- Best-in-class retrieval quality at competitive price
- 32K context window handles entire task files
- Anthropic's recommended embedding partner
- Flexible dimensions via Matryoshka learning

## Proposed Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        APEX Workflow                         │
│         /research → /plan → /implement → /ship               │
└───────────────────────┬──────────────────────────────────────┘
                        │ writes
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                  .apex/tasks/[ID].md                         │
│      (Source of truth - whole file is the document)          │
└───────────────────────┬──────────────────────────────────────┘
                        │ watch/hook
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                 Indexing Pipeline (new)                      │
│  1. Read file, extract YAML frontmatter                      │
│  2. Detect format: XML sections or markdown headings         │
│  3. Chunk by detected format                                 │
│  4. Generate embedding per chunk via Voyage AI               │
│  5. Upsert chunks to MongoDB Atlas                           │
└───────────────────────┬──────────────────────────────────────┘
                        │ store
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                     MongoDB Atlas                            │
│             Collection: apex_task_chunks                     │
│  {                                                           │
│    _id: ObjectId,                                            │
│    taskId: string,                                           │
│    section: string (e.g. "research", "## Overview"),         │
│    content: string (chunk text),                             │
│    phase: string,                                            │
│    tags: string[],                                           │
│    embedding: Binary (1024 dims),                            │
│    updatedAt: Date                                           │
│  }                                                           │
│             Vector Index: apex_chunks_vector                 │
└───────────────────────┬──────────────────────────────────────┘
                        │ query
                        ▼
┌──────────────────────────────────────────────────────────────┐
│            New MCP Tool: apex_semantic_search                │
│  - Accepts natural language query                            │
│  - Generates query embedding via Voyage AI                   │
│  - Returns relevant task files with similarity scores        │
└──────────────────────────────────────────────────────────────┘
```

## Implementation Approach

### Component 1: Voyage AI Client

```typescript
// src/vector-search/voyage-client.ts
import { VoyageAIClient } from 'voyageai';

export class VoyageEmbeddings {
  private client: VoyageAIClient;

  constructor(apiKey: string) {
    this.client = new VoyageAIClient({ apiKey });
  }

  async embed(text: string): Promise<number[]> {
    const response = await this.client.embed({
      input: text,
      model: 'voyage-3.5-lite',
      outputDimension: 1024
    });
    return response.data[0].embedding;
  }
}
```

### Component 2: Hybrid Chunker

```typescript
// src/vector-search/chunker.ts
interface Chunk {
  section: string; // "research" | "plan" | "## Heading"
  content: string;
}

function chunkDocument(content: string): Chunk[] {
  // Detect XML sections
  const xmlSections = ['research', 'plan', 'implementation', 'ship'];
  const hasXml = xmlSections.some(s => content.includes(`<${s}>`));
  if (hasXml) {
    return chunkByXmlSections(content, xmlSections);
  } else {
    return chunkByMarkdownHeadings(content);
  }
}

function chunkByXmlSections(content: string, sections: string[]): Chunk[] {
  const chunks: Chunk[] = [];
  for (const section of sections) {
    const regex = new RegExp(`<${section}>([\\s\\S]*?)</${section}>`, 'i');
    const match = content.match(regex);
    if (match) {
      chunks.push({ section, content: match[1].trim() });
    }
  }
  return chunks;
}

function chunkByMarkdownHeadings(content: string): Chunk[] {
  const lines = content.split('\n');
  const chunks: Chunk[] = [];
  let currentSection = 'intro';
  let currentContent: string[] = [];
  for (const line of lines) {
    if (line.startsWith('## ')) {
      if (currentContent.length) {
        chunks.push({ section: currentSection, content: currentContent.join('\n') });
      }
      currentSection = line;
      currentContent = [];
    } else {
      currentContent.push(line);
    }
  }
  if (currentContent.length) {
    chunks.push({ section: currentSection, content: currentContent.join('\n') });
  }
  return chunks;
}
```

### Component 3: Task Indexer

```typescript
// src/vector-search/task-indexer.ts
interface ChunkDocument {
  taskId: string;
  section: string;
  content: string;
  phase: string;
  tags: string[];
  embedding: Binary;
  updatedAt: Date;
}

class TaskIndexer {
  async indexTaskFile(filePath: string): Promise<void> {
    const content = await fs.readFile(filePath, 'utf-8');
    const frontmatter = this.extractFrontmatter(content);
    const chunks = chunkDocument(content);

    // Delete old chunks for this task
    await this.collection.deleteMany({ taskId: frontmatter.id });

    // Insert whole-file embedding for task similarity
    const wholeEmbedding = await this.voyage.embed(content);
    await this.collection.insertOne({
      taskId: frontmatter.id,
      section: '_whole',
      content: content,
      phase: frontmatter.phase,
      tags: frontmatter.tags || [],
      embedding: Binary.fromFloat32Array(Float32Array.from(wholeEmbedding)),
      updatedAt: new Date()
    });

    // Insert chunk embeddings for precise search
    for (const chunk of chunks) {
      const embedding = await this.voyage.embed(chunk.content);
      await this.collection.insertOne({
        taskId: frontmatter.id,
        section: chunk.section,
        content: chunk.content,
        phase: frontmatter.phase,
        tags: frontmatter.tags || [],
        embedding: Binary.fromFloat32Array(Float32Array.from(embedding)),
        updatedAt: new Date()
      });
    }
  }
}
```

### Component 4: MCP Tools

```typescript
// src/mcp/tools/semantic-search.ts

// Tool 1: Search chunks for specific info
{
  name: 'apex_semantic_search',
  description: 'Search task file sections for specific information',
  parameters: {
    query: { type: 'string', description: 'Natural language query' },
    filter: {
      phase: { type: 'string', enum: ['research', 'plan', 'implement', 'ship'] },
      section: { type: 'string', description: 'Limit to specific section' },
      tags: { type: 'array', items: { type: 'string' } }
    },
    limit: { type: 'number', default: 10, maximum: 50 }
  }
  // Searches chunks where section != '_whole'
}

// Tool 2: Find similar tasks (holistic)
{
  name: 'apex_find_similar_tasks',
  description: 'Find tasks similar to a query or another task',
  parameters: {
    query: { type: 'string', description: 'Natural language description or task ID' },
    limit: { type: 'number', default: 5, maximum: 20 }
  }
  // Searches only section == '_whole'
}
```

### Component 5: Environment Variables

```bash
# .env
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/apex
VOYAGE_API_KEY=pa-xxxxxxxxxxxxxxxx
```

```typescript
// src/vector-search/config.ts
export const vectorSearchConfig = {
  mongodbUri: process.env.MONGODB_URI,
  voyageApiKey: process.env.VOYAGE_API_KEY,
  database: 'apex',
  collection: 'tasks'
};
```

## Technical Decisions (Finalized)

| Decision | Choice | Rationale |
|---|---|---|
| Embedding provider | Voyage AI | Best retrieval quality, Anthropic partner |
| Embedding model | voyage-3.5-lite | 32K context, $0.02/1M tokens |
| Embedding dimensions | 1024 | Balance of quality/storage |
| Chunking | Hybrid + whole | Chunks + whole-file embedding for similarity |
| Config storage | .env | Simple, standard approach |
| MongoDB driver |