Built the same RAG system in FastAPI and Ruby on Rails. FastAPI took weeks, Rails took 24 hours. Here's what that taught me about choosing frameworks for AI products.
A hands-on comparison from a Rails developer's first AI project
Picture this: You're starting a new RAG project. You open your laptop and immediately hit the question every non-Python developer dreads: "Do I really need to learn Python for this?"

Every tutorial assumes Python. Every example uses FastAPI or Flask. And you're sitting there thinking, "But... I already know Rails."

That was me a few months ago. I'm a Rails developer. I love Rails. But I kept hearing about Python's AI dominance, so I decided to stop wondering and just build something. I spent some time building a RAG system with FastAPI, learning embeddings, vector databases, and how to actually make an LLM answer questions about my documents. It worked. I learned a ton. Case closed, right?

Then our company announced an AI hackathon. I needed to build another RAG system, but this time I had 48 hours, and I decided to build it in Rails. This wasn't a strategic rewrite or a long-term migration decision. It was a practical choice made under time constraints.

Same features. Same vector database. Same LLM. Different framework.

What surprised me was how much smoother the experience felt. This article explains why I rebuilt the RAG system in Rails, what changed, and why this series exists: it's what that experience was actually like.

TL;DR
Same RAG system, same vector database, same LLM. The FastAPI version took weeks to get stable; the Rails version shipped in 24 hours. The AI logic was framework-agnostic. The infrastructure work around it wasn't.

What I built (in both versions)
Both RAG systems had the same core functionality:
- Upload documents and split them into chunks
- Generate embeddings for each chunk in a background job
- Store the vectors in Postgres with pgvector
- Answer questions by retrieving the most similar chunks and passing them to GPT-4 as context

The tech stack:
FastAPI version: FastAPI, Celery with a Redis broker for background jobs, async database sessions, pgvector, and the OpenAI API.
Rails version: Rails, Sidekiq for background jobs, ActiveRecord, the neighbor gem with pgvector, and the ruby-openai gem.

The AI logic was identical. The framework wrapping it was different.

The Architecture (Same for Both)
Both implementations follow this exact flow:
1. A document is uploaded and split into chunks.
2. Each chunk is embedded and stored in pgvector.
3. An incoming question is embedded with the same model.
4. The nearest chunks are retrieved and combined into context.
5. The LLM answers the question using that context.

Next Up: Where the Differences Actually Showed Up

The FastAPI Version: Where I Spent My Time
Building the FastAPI version worked. But I spent more time on infrastructure than on the actual AI features.

In theory, Celery handles async tasks. In practice, I became the one handling Celery. When an embedding job failed (and they did: API timeouts, rate limits, malformed PDFs), here's what debugging looked like:
# FastAPI/Celery - Replay a failed embedding job
# 1. Find the task ID in logs
# 2. Check Celery flower or redis
# 3. Manually construct retry logic
# 4. Hope the async session doesn't break again
import asyncio

@celery_app.task(bind=True, max_retries=3)
def embed_document(self, doc_id):
    try:
        # Celery can't await a coroutine directly, so the async work gets wrapped
        asyncio.run(_embed_document(doc_id))
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)

async def _embed_document(doc_id):
    async with get_db_session() as session:
        # embedding logic
        pass

Background jobs became a maintenance burden: every failure meant hunting for a task ID in the logs, poking at Flower or Redis, and hand-rolling the retry. Compare this to Rails:

# Built-in retry with exponential backoff
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer, attempts: 5

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI.embed(document.content) # shorthand; the full call appears later
    document.update!(embedding: embedding)
  end
end

In Rails, I can see failed jobs in Sidekiq's web UI, click "Retry," and watch it work. In FastAPI, I was writing custom monitoring and retry logic.
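That web UI isn't extra infrastructure; it ships with Sidekiq. A minimal sketch of exposing it, assuming Sidekiq is the Active Job backend (add your own auth in production):

# config/routes.rb
require "sidekiq/web"

Rails.application.routes.draw do
  # Dashboard for queues, retries, and dead jobs at /sidekiq
  mount Sidekiq::Web => "/sidekiq"
end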
Database session management became a daily puzzle: Every async endpoint needed careful session handling. I'd write a feature, run tests, and watch them randomly fail because some session somewhere wasn't properly closed. I spent more time reading asyncio documentation than building features.

# FastAPI - Manual session lifecycle everywhere
@app.post("/documents")
async def create_document(doc: DocumentCreate):
    async with get_db_session() as session:
        async with session.begin():
            # Don't forget to close this!
            # Or rollback on error!
            # Or handle connection pool limits!
            pass

Meanwhile in Rails? ActiveRecord just handles it. I never thought about sessions once during the hackathon.
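For contrast, here's what the equivalent endpoint looks like on the Rails side. This is a hypothetical controller, not the hackathon code verbatim, but it makes the point: there is no session lifecycle to manage.

# app/controllers/documents_controller.rb (hypothetical)
class DocumentsController < ApplicationController
  def create
    # ActiveRecord checks a connection out of the pool and returns it.
    # No explicit session, begin, close, or rollback anywhere.
    document = Document.create!(document_params)
    render json: document, status: :created
  end

  private

  def document_params
    params.require(:document).permit(:title, :content)
  end
end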
Deployment felt fragile: I had to manually configure:
- Celery worker processes
- A Redis broker for the task queue
- Monitoring and retry behavior for failed jobs

Rails gives me all of this out of the box.
The Rails version: exactly what I needed
During the hackathon, I had 48 hours to ship a working demo. Not a prototype. Not a proof-of-concept. A working system that non-technical people could use.

I chose Rails not because it's "better for AI" (it's probably not), but because I knew exactly where my time would go: building features, not configuring infrastructure.

Here's what the system did: users uploaded documents, the app chunked and embedded them in the background, and anyone could ask questions and get answers grounded in those documents.

The entire backend:

# Model
class Document < ApplicationRecord
  has_neighbors :embedding

  after_create_commit :enqueue_embedding_job

  def enqueue_embedding_job
    DocumentEmbeddingJob.perform_later(id)
  end
end
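The embedding column behind has_neighbors comes from pgvector. Here is a hypothetical migration, assuming the neighbor gem and the pgvector extension; 1536 matches the dimensions of text-embedding-3-small:

# db/migrate/20240101000000_add_embeddings.rb (hypothetical)
class AddEmbeddings < ActiveRecord::Migration[7.1]
  def change
    # Enable the Postgres extension that stores and indexes vectors
    enable_extension "vector"

    add_column :documents, :embedding, :vector, limit: 1536
    add_column :document_chunks, :embedding, :vector, limit: 1536
  end
end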
# Background job
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer

  def perform(document_id)
    document = Document.find(document_id)

    chunks = split_into_chunks(document.content)

    chunks.each do |chunk|
      embedding = generate_embedding(chunk)

      DocumentChunk.create!(
        document: document,
        content: chunk,
        embedding: embedding
      )
    end
  end

  private

  def generate_embedding(text)
    client = OpenAI::Client.new
    response = client.embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: text
      }
    )
    response.dig("data", 0, "embedding")
  end
end
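One thing the job above leaves undefined is split_into_chunks. The real implementation isn't shown in the article; a minimal fixed-size version with overlap might look like this:

def split_into_chunks(text, chunk_size: 800, overlap: 100)
  chunks = []
  start = 0

  while start < text.length
    chunks << text[start, chunk_size]
    # Overlap the windows so sentences cut at a boundary still retrieve well
    start += chunk_size - overlap
  end

  chunks
end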
# Query service
class RagQueryService
  def initialize(query)
    @query = query
    @embedding = generate_embedding(query) # Convert question to vector
  end

  def answer
    # Find the 5 most similar document chunks
    relevant_chunks = DocumentChunk
      .nearest_neighbors(:embedding, @embedding, distance: "cosine")
      .limit(5)

    # Combine them into context
    context = relevant_chunks.map(&:content).join("\n\n")

    # Ask GPT-4 with the context
    client = OpenAI::Client.new
    response = client.chat(
      parameters: {
        model: "gpt-4",
        messages: [
          { role: "system", content: "Answer based on this context: #{context}" },
          { role: "user", content: @query }
        ]
      }
    )
    response.dig("choices", 0, "message", "content")
  end

  # generate_embedding is the same OpenAI embeddings call used in DocumentEmbeddingJob
end
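Wiring the service into the app is one more controller action. Again a hypothetical sketch, assuming a JSON endpoint:

# app/controllers/questions_controller.rb (hypothetical)
class QuestionsController < ApplicationController
  def create
    answer = RagQueryService.new(params.require(:query)).answer
    render json: { answer: answer }
  end
end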
No async session management. No custom retry logic. No Celery flower dashboard. Just Rails doing what Rails does best: letting you build features instead of infrastructure.

Reality check: This code is simplified for the article. The real version had error handling, logging, and rate limiting. But the core logic? Pretty much this.

That's it. The entire RAG pipeline in ~60 lines of readable Ruby. Could I have made the FastAPI version this clean? Maybe. But I didn't have time to figure it out. And that's the point.

What made Rails faster for me
I shipped the Rails version in 24 hours. The FastAPI version took weeks to get stable. Here's why:

1. Background jobs are a solved problem
Sidekiq gives me:
- Automatic retries with exponential backoff
- A web UI for inspecting and retrying failed jobs
- Error messages and backtraces for every failure

I didn't write any of this. It was already there.

2. Database access is predictable
No async session managers. No connection pool tuning. No event loop surprises. It just works.

3. Debugging is straightforward
When an embedding job failed, I opened the Sidekiq UI, read the error and backtrace, and replayed it with one click or one line:

# Rails - Replay failed job from console or UI
DocumentEmbeddingJob.perform_later(document.id)

In FastAPI, I was tailing Celery logs and rebuilding context manually.

4. The ecosystem has what I needed
- ruby-openai gem for API calls
- neighbor gem for vector similarity
- pgvector extension for Postgres

No async complications. No compatibility issues.

5. The developer workflow is integrated, not assembled
This is easy to underestimate until you feel it. Rails gives you a tight feedback loop by default. When I needed to inspect embeddings, replay a failed job, or tweak the schema, I did it from the console in seconds. In the FastAPI setup, these same tasks required more manual work: finding task IDs in logs, checking Flower or Redis, and writing one-off scripts.

None of this is impossible in Python, but it is more fragmented. FastAPI optimizes for flexibility. Rails optimizes for flow. When you're iterating on AI features under a deadline, that difference compounds daily.

The key insight: AI primitives are framework-agnostic
After building both versions, here's what became clear: the actual AI logic was identical. Both systems used the same process: chunk the documents, embed the chunks, store the vectors, retrieve the nearest neighbors, and ask the LLM with that context.

The intelligence came from the embedding model, the vector search, and GPT-4. Not from the framework.

Rails didn't make the model smarter. It made the system easier to reason about, operate, and change. And for a product engineer shipping features, that matters more than access to the latest ML libraries.

Where Python still clearly wins
Let me be clear: there are cases where Python is the right choice. Use Python when you need:
- Model training or fine-tuning
- The newest ML libraries and research tooling
- Heavy data processing pipelines

For research and ML-heavy work, Python is unmatched. Examples where I'd choose Python: fine-tuning an open-source model, prototyping a new retrieval technique, or anything that lives in a notebook first.

But for building a production RAG feature in an existing Rails app? You probably don't need to rewrite everything in Python.

My Approach Now: Hybrid Architecture
After building both versions, I use this mental model: Rails handles the application (users, documents, background jobs, the product itself), and Python microservices handle the ML-specific work that genuinely needs Python's ecosystem.

Why this works: Instead of migrating my entire Rails app to FastAPI, I integrate Python only where it adds specific value.
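In practice, that boundary is usually just an HTTP call. A minimal sketch of the Rails side, with a made-up rerank service standing in for whatever genuinely needs Python:

# app/services/ml_service_client.rb (hypothetical)
require "net/http"
require "json"

class MlServiceClient
  # A hypothetical Python microservice doing the ML-specific work
  ENDPOINT = URI("http://ml-service.internal:8000/rerank")

  def self.rerank(query:, chunks:)
    response = Net::HTTP.post(
      ENDPOINT,
      { query: query, chunks: chunks }.to_json,
      "Content-Type" => "application/json"
    )
    JSON.parse(response.body).fetch("chunks")
  end
end

The Rails app keeps owning users, documents, and jobs; the Python service only sees the payload it needs.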
The real takeaway
Strip away the AI layer, and a RAG system is still just a distributed application: background jobs, retries, database access, deployment, and debugging. The framework you choose determines how painful these problems are to live with.

That's why this comparison isn't really about Rails vs FastAPI. It's about choosing tools that let you focus on product behavior instead of infrastructure glue.

Final thoughts
Building the same RAG system in two different frameworks taught me something simple but important: the hard parts of production software aren't the AI API calls. They're the background jobs that fail, the sessions that leak, and the deployments that break.

If you're already working in Rails (or Django, or any mature web framework), you already have solutions for these problems. Adding AI features doesn't change that.

Python has incredible AI tooling. Rails has incredible application tooling. You don't need to choose one or the other. You can use both strategically.

If you're a Rails developer wondering whether you need to learn FastAPI to build AI features: you probably don't. Start with Rails. Add Python services only when you hit a real limitation.

Building AI features in non-Python frameworks? I'd love to hear about your experience. Drop a comment or connect with me on LinkedIn.