I built a fully local Retrieval-Augmented Generation (RAG) system that lets a Llama 3 model answer questions about my own PDFs and Markdown files: no cloud APIs, no external servers, everything running on my machine.

It’s powered by:

  • Streamlit for the frontend
  • FastAPI for the backend
  • ChromaDB for vector storage
  • Ollama to run Llama 3 locally
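
To give a feel for how these pieces fit together, here is a minimal sketch of the frontend/backend wiring. It is not the project's actual code: the /ask endpoint name, the request shape, and the port are my assumptions.

```python
# backend.py -- minimal FastAPI sketch (endpoint name and payload shape are assumptions)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    query: str

@app.post("/ask")
def ask(question: Question) -> dict:
    # Placeholder answer; the real backend retrieves chunks from ChromaDB
    # and asks Llama 3 (via Ollama) to answer from them.
    return {"answer": f"echo: {question.query}"}


# app.py -- minimal Streamlit client (assumes the backend runs on localhost:8000)
import requests
import streamlit as st

st.title("Ask my documents")
query = st.text_input("Question")
if st.button("Ask") and query:
    resp = requests.post("http://localhost:8000/ask", json={"query": query}, timeout=60)
    st.write(resp.json()["answer"])
```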

The system ingests documents, chunks and embeds them, retrieves relevant parts for a query, and feeds them into Llama 3 to generate grounded answers.
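
The core loop is small enough to sketch. The snippet below is illustrative rather than the project's exact code: it uses ChromaDB's default embedding function and the ollama Python client, and the collection name, chunk size, and prompt wording are assumptions.

```python
# rag_core.py -- stripped-down ingest/retrieve/generate loop; names and parameters are illustrative
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")  # assumed collection name

def ingest(doc_id: str, text: str, chunk_size: int = 800) -> None:
    """Split a document into fixed-size chunks and store them; Chroma embeds
    them with its default sentence-transformers model when no embeddings are given."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    collection.add(
        ids=[f"{doc_id}-{n}" for n in range(len(chunks))],
        documents=chunks,
        metadatas=[{"source": doc_id}] * len(chunks),
    )

def answer(query: str, k: int = 4) -> str:
    """Retrieve the k most similar chunks and ask Llama 3 to answer from them."""
    hits = collection.query(query_texts=[query], n_results=k)
    context = "\n\n".join(hits["documents"][0])
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
    reply = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```

Fixed-size character chunks keep the sketch short; in practice you would split on paragraph or token boundaries so retrieved passages stay coherent.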


🧠 Introduction – Why Go Local?

It started with a simple frustration: I had a bunch of private PDFs and notes I wanted to query like ChatGPT, but without sending anything to the cloud. LLMs are powerful, but they don’t know your documents. RAG changes that: it…
