After 25 years in software development, I recently tackled a problem that’s becoming increasingly common: implementing a production-ready RAG (Retrieval-Augmented Generation) system using AWS Bedrock Knowledge Bases. The use case was resume search for a recruiting database, and the results were significant enough that I wanted to share what worked, what didn’t, and the gotchas I hit along the way.
The Problem: Keyword Search Isn’t Enough
Recruiters were drowning in manual work. They’d run keyword searches against the resume database, then spend hours manually sifting through results trying to match candidates to job descriptions. The core issue? Basic keyword search doesn’t understand context or semantic meaning.
A job description asking for “frontend expertise with modern JavaScript frameworks” might miss excellent candidates whose resumes say “React developer” or “Vue.js specialist” because the exact keywords don’t match. Recruiters were compensating by running dozens of search variations, then manually evaluating each result.
The pain point: compiling a quality candidate list that should have taken an hour was taking 2 days or more.
The Solution: AWS Bedrock Knowledge Base with Semantic Search
Instead of making recruiters search harder, I inverted the problem. The system takes a job description, generates an optimized semantic query, and searches the knowledge base for candidates whose experience actually matches what the role needs - not just the keywords used to describe it.
The architecture is straightforward:
- Data source: Bullhorn ATS
- Data transformation: Resumes exported and converted to Markdown files with recruiter notes embedded for additional context
- Storage: Transformed files in S3
- Vector store: AWS Bedrock Knowledge Base with OpenSearch Serverless
- Embeddings: Cohere Embed model
- Query processing: Job descriptions converted to optimized prompts before querying
The Markdown transformation was critical - it provided a clean, consistent format while preserving the recruiter context that often makes the difference between “technically qualified” and “actually a good fit.”
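To make the query-processing step concrete, here’s a minimal sketch of turning a job description into a search query with the Bedrock Converse API. The model choice and prompt wording are illustrative assumptions, not the production values:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def build_search_query(job_description: str) -> str:
    """Hypothetical helper: condense a job description into a short
    semantic search query before hitting the knowledge base."""
    prompt = (
        "Rewrite the following job description as a short search query that "
        "captures the required skills, seniority, and domain. "
        "Return only the query.\n\n" + job_description
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 200, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"].strip()
```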
The Chunking Strategy That Actually Worked
This was the most critical decision and took the most trial and error.
What I Tried First (That Didn’t Work Well)
My initial approach used semantic splitting - breaking documents at natural boundaries based on content meaning. The theory was sound: keep related information together. The results were... okay. Not terrible, but not targeted enough. Search results felt scattered, like the system was giving me pieces of information from across someone’s entire career instead of the relevant parts.
What Actually Worked: Fixed-Size Chunks with Overlap
I switched to 400-character chunks with 20% overlap (80 characters of overlap between adjacent chunks).
Why this worked better:
Consistency matters for vector search. Fixed-size chunks create more uniform embeddings, which means more predictable search results. When chunks vary wildly in size (which happens with semantic splitting on resumes), you get inconsistent matching quality.
Overlap prevents context loss. A candidate’s key qualification might span a chunk boundary. The 20% overlap ensures that critical context doesn’t get split awkwardly. If someone’s resume says “Led migration from Angular to React, reducing bundle size by 40%,” you want that whole thought captured somewhere.
Shorter chunks = more precise matching. At 400 characters, each chunk represents a focused piece of someone’s experience. When a query matches, you’re getting specific relevant context, not an entire job history.
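Bedrock applies the chunking for you as part of the data source configuration, but the logic is simple enough to sketch in a few lines of Python (the function name and defaults here are mine, for illustration):

```python
def chunk_text(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into fixed-size chunks, where each chunk repeats the last
    `overlap` characters of the previous one (80 chars = 20% of 400)."""
    step = size - overlap  # each new chunk starts 320 characters further in
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last chunk already reaches the end
            break
    return chunks
```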
The difference was noticeable immediately. Search results became more focused, and relevant sections of resumes consistently bubbled to the top.
Metadata: What Mattered and What Didn’t
This is where the real power comes in. Raw semantic search is good; semantic search + metadata filtering is game-changing.
Metadata I Attached to Each Chunk:
- Title (job title)
- Department
- Recruiter (who sourced them)
- Skills (extracted skill list)
- Companies (employment history)
- Education
- Location
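With an S3 data source, Bedrock reads these attributes from a sidecar JSON file stored next to each document (e.g. resume_12345.md.metadata.json). The field names and values below are illustrative, not the exact production schema:

```python
import json

# Illustrative sidecar for resumes/resume_12345.md; the field names are my
# choices - Bedrock only requires the top-level "metadataAttributes" key.
metadata = {
    "metadataAttributes": {
        "title": "Senior Accountant",
        "department": "Accounting",
        "recruiter": "j.smith",
        "skills": ["Excel", "NetSuite", "GAAP"],
        "companies": ["Acme Corp", "Globex"],
        "education": "BS, Accounting",
        "location": "Dallas, TX",
    }
}

with open("resume_12345.md.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```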
What Actually Got Used:
Department filtering was huge. Being able to search within “Accounting” vs “Creative” vs “HR” immediately cuts noise and improves relevance. When you’re filling a finance role, you don’t want creative professionals showing up just because they mentioned “budgets” in their portfolio management section.
Skills metadata let recruiters combine semantic search with hard requirements. “Find candidates whose experience matches this job description AND have Python + AWS in their skills list” catches people who might describe their work differently but have the required technical background.
Location filtering became essential for roles that weren’t fully remote. No point showing Dallas candidates for a NYC office-required position.
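Put together, a filtered retrieval call looks roughly like this - the knowledge base ID, attribute keys, and values are placeholders, and the keys have to match whatever you put in the sidecar metadata files:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder query; in practice this is the optimized query generated
# from the job description.
query_text = "backend engineer with Python and AWS experience, financial services"

response = agent_runtime.retrieve(
    knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
    retrievalQuery={"text": query_text},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 25,
            "filter": {
                "andAll": [
                    {"equals": {"key": "department", "value": "Engineering"}},
                    {"listContains": {"key": "skills", "value": "Python"}},
                    {"listContains": {"key": "skills", "value": "AWS"}},
                    {"equals": {"key": "location", "value": "New York, NY"}},
                ]
            },
        }
    },
)

for result in response["retrievalResults"]:
    print(round(result["score"], 3), result["content"]["text"][:120])
```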
The other metadata fields were valuable for the application layer but didn’t significantly improve search quality itself. They’re worth capturing, but they’re not doing heavy lifting in the retrieval step.
Real Results: From 2+ Days to 2 Hours
The numbers tell the story:
Before: Compiling a quality candidate list for a specialized role took 2 days or more of recruiter time - running multiple keyword searches, manually reviewing resumes, cross-referencing requirements.
After: Same task took 2 hours. The recruiter reviews the job description, the system returns ranked candidates with relevant experience highlighted, and they spend their time on actual evaluation instead of archaeological searches.
Quality validation: When we tracked placements, candidates who were actually hired appeared in the top search results the majority of the time. The system wasn’t just faster - it was finding the right people.
The Biggest Gotcha: Indexing Strategy
This cost me the most time. My initial implementation used inline documents (sending resume content directly to Bedrock via API). It worked, but indexing was painfully slow - we’re talking days to process and index resumes from Bullhorn.
One important note: resumes are only indexed if they’ve been updated within the past 2 years. Anything older than that quickly loses relevance in the recruiting world - skills change, candidates move on, contact information goes stale. This scoping decision kept the database manageable and ensured search results stayed current.
The fix: A two-step transformation process. First, I exported resumes from Bullhorn (filtering for the past 2 years) and converted them to Markdown files, embedding recruiter notes as additional context within the documents. Then I stored these transformed files in S3 and pointed the Knowledge Base at the bucket.
Result: Indexing time dropped from days to hours.
Why did this work so much better?
AWS Bedrock Knowledge Bases are optimized for S3-based workflows. The service can parallelize file processing much more efficiently than handling inline content through API calls.
Markdown provided structure and consistency. Instead of dealing with varied resume formats (PDFs, Word docs, text files), everything became clean, parseable Markdown. This consistency improved both indexing speed and search quality.
Recruiter notes added critical context. When a recruiter writes “Strong cultural fit for startup environments” or “Excellent communicator, handled difficult client situations well,” that context gets indexed alongside technical qualifications. The semantic search can now match on soft skills and work style, not just keywords.
If I were starting over, I’d go straight to the Markdown transformation approach and save myself a week of frustration. Plus, updates became simpler - export from Bullhorn, transform to Markdown, drop in S3, trigger a sync, done.
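A rough sketch of that update loop, assuming the Bullhorn export has already produced the resume text and recruiter notes (the bucket name, IDs, and Markdown layout here are placeholders):

```python
import boto3

s3 = boto3.client("s3")
agent = boto3.client("bedrock-agent")

BUCKET = "resume-knowledge-base"   # placeholder bucket name
KB_ID = "KB12345678"               # placeholder knowledge base ID
DATA_SOURCE_ID = "DS12345678"      # placeholder data source ID

def publish_resume(candidate_id: str, resume_text: str, recruiter_notes: str) -> None:
    """Write one candidate's resume as Markdown, with recruiter notes embedded,
    into the bucket the Knowledge Base data source points at."""
    markdown = (
        f"# Candidate {candidate_id}\n\n"
        f"## Resume\n\n{resume_text}\n\n"
        f"## Recruiter Notes\n\n{recruiter_notes}\n"
    )
    s3.put_object(
        Bucket=BUCKET,
        Key=f"resumes/{candidate_id}.md",
        Body=markdown.encode("utf-8"),
        ContentType="text/markdown",
    )

def sync_knowledge_base() -> str:
    """Kick off a Bedrock ingestion job so new and updated files get re-indexed."""
    job = agent.start_ingestion_job(knowledgeBaseId=KB_ID, dataSourceId=DATA_SOURCE_ID)
    return job["ingestionJob"]["ingestionJobId"]
```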
What I’d Still Improve
This system is production-ready, but it’s not perfect, and there are areas I’d continue refining:
Dynamic chunk sizing based on content type. Resumes aren’t uniform - executive resumes vs junior developer resumes have different information density. Adaptive chunking might improve results further.
Better handling of acronyms and industry jargon. The system sometimes misses matches when candidates use acronyms (e.g., “K8s”) vs full terms (e.g., “Kubernetes”). Some preprocessing or synonym expansion could help.
Query optimization based on role type. Different job categories (engineering vs sales vs finance) might benefit from different query formulations. There’s room to tune the prompt generation based on department.
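For the acronym problem specifically, even a small query-side expansion table would likely help. A minimal sketch, with a made-up mapping:

```python
import re

# Illustrative acronym table; a real one would be curated for the domain.
EXPANSIONS = {
    "k8s": "Kubernetes",
    "js": "JavaScript",
    "gcp": "Google Cloud Platform",
}

def expand_query(query: str) -> str:
    """Append the full form of any known acronym so the embedding sees both
    the shorthand and the expanded term."""
    extras = [
        full for abbrev, full in EXPANSIONS.items()
        if re.search(rf"\b{re.escape(abbrev)}\b", query, flags=re.IGNORECASE)
    ]
    return f"{query} ({'; '.join(extras)})" if extras else query
```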
Lessons Learned
1. Start with fixed-size chunking. It’s simpler, more predictable, and easier to tune than semantic splitting. You can always get fancier later.
2. Overlap is worth it. The 20% overlap prevented edge cases where important context got split. The storage cost is negligible compared to the quality improvement.
3. Metadata filtering is not optional. Semantic search alone is good. Semantic search + metadata filtering is what makes the system production-ready.
4. Transform your data into a consistent format. Converting everything to Markdown before indexing improved both speed and quality. If you’re working with an ATS or any heterogeneous data source, the transformation step is worth the effort.
5. Use S3 for document storage from day one. Don’t make my mistake of starting with inline documents.
6. Embed human context wherever possible. Recruiter notes added dimensions that pure resume text couldn’t capture. If you have expert annotations, comments, or contextual notes in your source system, include them.
7. Measure what matters. We tracked time-to-candidate-list and placement rates. Those metrics proved the system’s value and guided optimization priorities.
The Bottom Line
Implementing RAG systems with AWS Bedrock isn’t rocket science, but the details matter. The difference between “okay results” and “recruiters will use this every day” came down to chunking strategy, metadata design, data transformation, and infrastructure choices.
For this use case, the combination of pulling from Bullhorn ATS, transforming to Markdown with recruiter context, using 400-character fixed chunks with 20% overlap, rich metadata filtering, and S3-based storage created a system that genuinely improved recruiter productivity. Searches that took 2 days or more became 2-hour searches, and the candidates who were ultimately placed consistently appeared in the top results.
If you’re building similar systems - whether for recruiting, document search, customer support, or any other knowledge retrieval application - the principles transfer. Start simple, measure results, and optimize based on what your users actually need. And if you’re working with data from an ATS, CRM, or any structured system, invest the time in a clean transformation pipeline. It pays dividends.
Need help implementing AWS Bedrock Knowledge Bases or optimizing your RAG system? I do consulting and implementation work through Concept Cache. Feel free to reach out if you’re hitting similar challenges.