Built the same RAG system in FastAPI and Ruby on Rails. FastAPI took weeks, Rails took 24 hours. Here's what that taught me about choosing frameworks for AI products.
A hands-on comparison from a Rails developer's first AI project
Picture this: You're starting a new RAG project. You open your laptop and immediately hit the question every non-Python developer dreads: "Do I really need to learn Python for this?"

Every tutorial assumes Python. Every example uses FastAPI or Flask. And you're sitting there thinking, "But... I already know Rails."

That was me a few months ago. I'm a Rails developer. I love Rails. But I kept hearing about Python's AI dominance, so I decided to stop wondering and just build something. I spent some time building a RAG system with FastAPI, learning embeddings, vector databases, and how to actually make an LLM answer questions about my documents. It worked. I learned a ton. Case closed, right?

Then our company announced an AI hackathon. I needed to build another RAG system, but this time I had 48 hours, and I decided to build it in Rails. This wasn't a strategic rewrite or a long-term migration decision. It was a practical choice made under time constraints.

Same features. Same vector database. Same LLM. Different framework.

What surprised me was how much smoother the experience felt. This article explains why I rebuilt the RAG system in Rails, what changed, and why this series exists: it's what that experience was actually like.

TL;DR
Same RAG system, same vector database, same LLM. The FastAPI version took weeks to get stable; the Rails version shipped in 24 hours. The AI logic was framework-agnostic. The infrastructure work around it wasn't.

What I built (in both versions)
Both RAG systems had the same core functionality:
- Upload documents and split them into chunks
- Generate embeddings for each chunk in a background job
- Store the vectors in Postgres with pgvector
- Answer questions by retrieving the most similar chunks and passing them to GPT-4 as context

The tech stack:
FastAPI version: FastAPI, Celery with a Redis broker for background jobs, async database sessions, pgvector, and the OpenAI API.
Rails version: Rails, Sidekiq for background jobs, ActiveRecord, the neighbor gem with pgvector, and the ruby-openai gem.

The AI logic was identical. The framework wrapping it was different.

The Architecture (Same for Both)
Both implementations follow this exact flow:
1. A document is uploaded and split into chunks.
2. Each chunk is embedded and stored in pgvector.
3. An incoming question is embedded with the same model.
4. The nearest chunks are retrieved and combined into context.
5. The LLM answers the question using that context.

Next Up: Where the Differences Actually Showed Up

The FastAPI Version: Where I Spent My Time
Building the FastAPI version worked. But I spent more time on infrastructure than on the actual AI features.

In theory, Celery handles async tasks. In practice, I became the one handling Celery. When an embedding job failed (and they did: API timeouts, rate limits, malformed PDFs), here's what debugging looked like:
# FastAPI/Celery - Replay a failed embedding job
# 1. Find the task ID in logs
# 2. Check Celery flower or redis
# 3. Manually construct retry logic
# 4. Hope the async session doesn't break again
import asyncio

@celery_app.task(bind=True, max_retries=3)
def embed_document(self, doc_id):
    try:
        # Celery can't await a coroutine directly, so the async work gets wrapped
        asyncio.run(_embed_document(doc_id))
    except Exception as exc:
        raise self.retry(exc=exc, countdown=60)

async def _embed_document(doc_id):
    async with get_db_session() as session:
        # embedding logic
        pass

Background jobs became a maintenance burden: every failure meant hunting for a task ID in the logs, poking at Flower or Redis, and hand-rolling the retry. Compare this to Rails:

# Built-in retry with exponential backoff
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer, attempts: 5

  def perform(document_id)
    document = Document.find(document_id)
    embedding = OpenAI.embed(document.content) # shorthand; the full call appears later
    document.update!(embedding: embedding)
  end
end

In Rails, I can see failed jobs in Sidekiq's web UI, click "Retry," and watch it work. In FastAPI, I was writing custom monitoring and retry logic.
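That web UI isn't extra infrastructure; it ships with Sidekiq. A minimal sketch of exposing it, assuming Sidekiq is the Active Job backend (add your own auth in production):

# config/routes.rb
require "sidekiq/web"

Rails.application.routes.draw do
  # Dashboard for queues, retries, and dead jobs at /sidekiq
  mount Sidekiq::Web => "/sidekiq"
end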
Database session management became a daily puzzle: Every async endpoint needed careful session handling. I'd write a feature, run tests, and watch them randomly fail because some session somewhere wasn't properly closed. I spent more time reading asyncio documentation than building features.

# FastAPI - Manual session lifecycle everywhere
@app.post("/documents")
async def create_document(doc: DocumentCreate):
    async with get_db_session() as session:
        async with session.begin():
            # Don't forget to close this!
            # Or rollback on error!
            # Or handle connection pool limits!
            pass

Meanwhile in Rails? ActiveRecord just handles it. I never thought about sessions once during the hackathon.
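For contrast, here's what the equivalent endpoint looks like on the Rails side. This is a hypothetical controller, not the hackathon code verbatim, but it makes the point: there is no session lifecycle to manage.

# app/controllers/documents_controller.rb (hypothetical)
class DocumentsController < ApplicationController
  def create
    # ActiveRecord checks a connection out of the pool and returns it.
    # No explicit session, begin, close, or rollback anywhere.
    document = Document.create!(document_params)
    render json: document, status: :created
  end

  private

  def document_params
    params.require(:document).permit(:title, :content)
  end
end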
Deployment felt fragile: I had to manually configure:
- Celery worker processes
- A Redis broker for the task queue
- Monitoring and retry behavior for failed jobs

Rails gives me all of this out of the box.
The Rails version: exactly what I needed
During the hackathon, I had 48 hours to ship a working demo. Not a prototype. Not a proof-of-concept. A working system that non-technical people could use.

I chose Rails not because it's "better for AI" (it's probably not), but because I knew exactly where my time would go: building features, not configuring infrastructure.

Here's what the system did: users uploaded documents, the app chunked and embedded them in the background, and anyone could ask questions and get answers grounded in those documents.

The entire backend:

# Model
class Document < ApplicationRecord
  has_neighbors :embedding

  after_create_commit :enqueue_embedding_job

  def enqueue_embedding_job
    DocumentEmbeddingJob.perform_later(id)
  end
end
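The embedding column behind has_neighbors comes from pgvector. Here is a hypothetical migration, assuming the neighbor gem and the pgvector extension; 1536 matches the dimensions of text-embedding-3-small:

# db/migrate/20240101000000_add_embeddings.rb (hypothetical)
class AddEmbeddings < ActiveRecord::Migration[7.1]
  def change
    # Enable the Postgres extension that stores and indexes vectors
    enable_extension "vector"

    add_column :documents, :embedding, :vector, limit: 1536
    add_column :document_chunks, :embedding, :vector, limit: 1536
  end
end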
# Background job
class DocumentEmbeddingJob < ApplicationJob
  retry_on OpenAI::Error, wait: :exponentially_longer

  def perform(document_id)
    document = Document.find(document_id)

    chunks = split_into_chunks(document.content)

    chunks.each do |chunk|
      embedding = generate_embedding(chunk)

      DocumentChunk.create!(
        document: document,
        content: chunk,
        embedding: embedding
      )
    end
  end

  private

  def generate_embedding(text)
    client = OpenAI::Client.new
    response = client.embeddings(
      parameters: {
        model: "text-embedding-3-small",
        input: text
      }
    )
    response.dig("data", 0, "embedding")
  end
end
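One thing the job above leaves undefined is split_into_chunks. The real implementation isn't shown in the article; a minimal fixed-size version with overlap might look like this:

def split_into_chunks(text, chunk_size: 800, overlap: 100)
  chunks = []
  start = 0

  while start < text.length
    chunks << text[start, chunk_size]
    # Overlap the windows so sentences cut at a boundary still retrieve well
    start += chunk_size - overlap
  end

  chunks
end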
# Query service
class RagQueryService
  def initialize(query)
    @query = query
    @embedding = generate_embedding(query) # Convert question to vector
  end

  def answer
    # Find the 5 most similar document chunks
    relevant_chunks = DocumentChunk
      .nearest_neighbors(:embedding, @embedding, distance: "cosine")
      .limit(5)

    # Combine them into context
    context = relevant_chunks.map(&:content).join("\n\n")

    # Ask GPT-4 with the context
    client = OpenAI::Client.new
    response = client.chat(
      parameters: {
        model: "gpt-4",
        messages: [
          { role: "system", content: "Answer based on this context: #{context}" },
          { role: "user", content: @query }
        ]
      }
    )
    response.dig("choices", 0, "message", "content")
  end

  # generate_embedding is the same OpenAI embeddings call used in DocumentEmbeddingJob
end
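Wiring the service into the app is one more controller action. Again a hypothetical sketch, assuming a JSON endpoint:

# app/controllers/questions_controller.rb (hypothetical)
class QuestionsController < ApplicationController
  def create
    answer = RagQueryService.new(params.require(:query)).answer
    render json: { answer: answer }
  end
end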
No async session management. No custom retry logic. No Celery flower dashboard. Just Rails doing what Rails does best: letting you build features instead of infrastructure.

Reality check: This code is simplified for the article. The real version had error handling, logging, and rate limiting. But the core logic? Pretty much this.

That's it. The entire RAG pipeline in ~60 lines of readable Ruby. Could I have made the FastAPI version this clean? Maybe. But I didn't have time to figure it out. And that's the point.

What made Rails faster for me
I shipped the Rails version in 24 hours. The FastAPI version took weeks to get stable. Here's why:

1. Background jobs are a solved problem
Sidekiq gives me:
- Automatic retries with exponential backoff
- A web UI for inspecting and retrying failed jobs
- Error messages and backtraces for every failure

I didn't write any of this. It was already there.

2. Database access is predictable
No async session managers. No connection pool tuning. No event loop surprises. It just works.

3. Debugging is straightforward
When an embedding job failed, I opened the Sidekiq UI, read the error and backtrace, and replayed it with one click or one line:

# Rails - Replay failed job from console or UI
DocumentEmbeddingJob.perform_later(document.id)

In FastAPI, I was tailing Celery logs and rebuilding context manually.

4. The ecosystem has what I needed
- ruby-openai gem for API calls
- neighbor gem for vector similarity
- pgvector extension for Postgres

No async complications. No compatibility issues.

5. The developer workflow is integrated, not assembled
This is easy to underestimate until you feel it. Rails gives you a tight feedback loop by default. When I needed to inspect embeddings, replay a failed job, or tweak the schema, I did it from the console in seconds. In the FastAPI setup, these same tasks required more manual work: finding task IDs in logs, checking Flower or Redis, and writing one-off scripts.

None of this is impossible in Python, but it is more fragmented. FastAPI optimizes for flexibility. Rails optimizes for flow. When you're iterating on AI features under a deadline, that difference compounds daily.

The key insight: AI primitives are framework-agnostic
After building both versions, here's what became clear: the actual AI logic was identical. Both systems used the same process: chunk the documents, embed the chunks, store the vectors, retrieve the nearest neighbors, and ask the LLM with that context.

The intelligence came from the embedding model, the vector search, and GPT-4. Not from the framework.

Rails didn't make the model smarter. It made the system easier to reason about, operate, and change. And for a product engineer shipping features, that matters more than access to the latest ML libraries.

Where Python still clearly wins
Let me be clear: there are cases where Python is the right choice. Use Python when you need:
- Model training or fine-tuning
- The newest ML libraries and research tooling
- Heavy data processing pipelines

For research and ML-heavy work, Python is unmatched. Examples where I'd choose Python: fine-tuning an open-source model, prototyping a new retrieval technique, or anything that lives in a notebook first.

But for building a production RAG feature in an existing Rails app? You probably don't need to rewrite everything in Python.

My Approach Now: Hybrid Architecture
After building both versions, I use this mental model: Rails handles the application (users, documents, background jobs, the product itself), and Python microservices handle the ML-specific work that genuinely needs Python's ecosystem.

Why this works: Instead of migrating my entire Rails app to FastAPI, I integrate Python only where it adds specific value.
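In practice, that boundary is usually just an HTTP call. A minimal sketch of the Rails side, with a made-up rerank service standing in for whatever genuinely needs Python:

# app/services/ml_service_client.rb (hypothetical)
require "net/http"
require "json"

class MlServiceClient
  # A hypothetical Python microservice doing the ML-specific work
  ENDPOINT = URI("http://ml-service.internal:8000/rerank")

  def self.rerank(query:, chunks:)
    response = Net::HTTP.post(
      ENDPOINT,
      { query: query, chunks: chunks }.to_json,
      "Content-Type" => "application/json"
    )
    JSON.parse(response.body).fetch("chunks")
  end
end

The Rails app keeps owning users, documents, and jobs; the Python service only sees the payload it needs.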
The real takeaway
Strip away the AI layer, and a RAG system is still just a distributed application: background jobs, retries, database access, deployment, and debugging. The framework you choose determines how painful these problems are to live with.

That's why this comparison isn't really about Rails vs FastAPI. It's about choosing tools that let you focus on product behavior instead of infrastructure glue.

Final thoughts
Building the same RAG system in two different frameworks taught me something simple but important: the hard parts of production software aren't the AI API calls. They're the background jobs that fail, the sessions that leak, and the deployments that break.

If you're already working in Rails (or Django, or any mature web framework), you already have solutions for these problems. Adding AI features doesn't change that.

Python has incredible AI tooling. Rails has incredible application tooling. You don't need to choose one or the other. You can use both strategically.

If you're a Rails developer wondering whether you need to learn FastAPI to build AI features: you probably don't. Start with Rails. Add Python services only when you hit a real limitation.

Building AI features in non-Python frameworks? I'd love to hear about your experience. Drop a comment or connect with me on LinkedIn.