Agentic RAG on knowledge graphs, local voice assistants, KL regularization, and learning without forgetting.
Good morning, AI enthusiasts,
This week, we step back from tools and trends to ask a more fundamental question: what kind of system should you actually build? In What’s AI, I share lessons from real-world agent engineering, why most “agents” should really be workflows, when a single agent with strong tools beats a multi-agent setup, and how constraints should shape architecture choices, not hype.
The curated pieces carry this idea forward. You’ll find a deep dive into building Agentic RAG on a Neo4j knowledge graph to handle precise, multi-hop reasoning; a practical guide to entirely local LLM voice assistants and the architectural tradeoffs behind them; and a clear explanation of KL divergence as a way to fine-tune models without pushing them into brittle overconfidence. We also explore catastrophic forgetting, why it happens, why it mirrors human learning more than we admit, and how approaches like Nested Learning aim to manage, not eliminate, it.
Let’s get into it.
What’s AI Weekly
This week, I am sharing a talk I gave at the University of San Diego. It was about AI engineering in the agents era, and more specifically, how we decide what to build here at Towards AI. I primarily covered how to choose between Workflows, Agents, and Multi-Agent Systems. Just like in traditional software engineering, you don’t pick a stack because it’s trendy. You pick it because it fits the constraints. The core message was simple: Most “agents” should be workflows. I also shared two real builds: A client CRM project where a single agent + strong tools + validation loops beat a multi-agent setup, and our internal research + writing system, where research needs flexibility but writing needs tight constraints. Watch the entire talk on YouTube!
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community Section!
Featured Community post from the Discord
Playful_courgette_43578 has created a tool to help you transfer chat history from one LLM to another. ChatGPT2Gemini integrates with your accounts to allow a smooth, easy setup, letting you select which accounts you want to transfer from and to. This seems extremely valuable in the long run when you want to import long-lasting context directly into your new chats and projects. Check it out here and support a fellow community member. If you have any questions about how this works, connect with him in the thread!
AI poll of the week!

The room leans yes on using ChatGPT-style search for the web, with a solid minority still loyal to classic Google. Interestingly, last year the gap between the two was much tighter (53 to 47). LLM search wins on messy, synthesis-heavy queries, where one good summary with citations beats 12 open tabs, while people keep Google for fresh news, exact docs and search operators, and source auditing, where crawl depth and recency still matter.
Share two real queries you run often, one where LLM search clearly wins and one where Google still does, and tell us the deciding factor (time to answer, trust in sources, or result quality). Let’s talk in the thread!
Collaboration Opportunities
The Learn AI Together Discord community is overflowing with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week!
1. Aj_doublea is looking for a study partner to learn together with and collaborate with from time to time. If you want to start learning or building something, connect with him in the thread!
2. Mani066655 is learning AI and planning to develop and deploy real-world AI apps, and is looking for someone who wants to collaborate. If this sounds like you, reach out to him in the thread!
3. Liquid25082 is looking for an AI/ML expert who can build a custom model that classifies text as AI-generated or human-written. If you’ve done something similar before, contact him in the thread!
Meme of the week!

Meme shared by hudsong0
TAI Curated Section
Article of the week
Building Agentic RAG on Neo4j’s Knowledge Graph By Yogender Pal
To overcome the limitations of vector search for precise, multi-hop questions, this article details an Agentic RAG system built on a Neo4j knowledge graph. The system uses an LLM-powered agent to route natural language queries to the most appropriate tool, such as predefined Cypher functions or a dynamic Text2Cypher generator grounded in the graph’s schema. A critique agent assesses the completeness of the answer and initiates follow-up queries to ensure all parts of the question are addressed. This multi-step process yields comprehensive responses strictly derived from the database, thereby reducing hallucinations.
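The routing pattern described above can be sketched in a few lines of Python. Everything here is hypothetical (the tool names, the schema, and the keyword heuristic standing in for the LLM router); it only illustrates the shape: match a question to a predefined Cypher query when possible, otherwise fall back to schema-grounded Text2Cypher.

```python
# Hypothetical sketch of the agent's tool routing. A real system would use an
# LLM to pick the tool and the neo4j driver to execute the resulting query.

PREDEFINED_CYPHER = {
    # Canned, high-precision queries for common question shapes.
    "movies_by_actor": (
        "MATCH (a:Actor {name: $name})-[:ACTED_IN]->(m:Movie) RETURN m.title"
    ),
}

GRAPH_SCHEMA = "(:Actor)-[:ACTED_IN]->(:Movie)"

def text2cypher(question: str, schema: str) -> str:
    """Placeholder for an LLM call that writes Cypher grounded in the schema."""
    return f"// generated from schema {schema} for: {question}"

def route(question: str) -> str:
    """Send the question to a predefined Cypher function when one matches,
    otherwise fall back to dynamic Text2Cypher generation."""
    if "acted in" in question.lower():
        return PREDEFINED_CYPHER["movies_by_actor"]
    return text2cypher(question, GRAPH_SCHEMA)
```

A critique step would sit after query execution, re-invoking the router with follow-up questions until every part of the original question is covered.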
Our must-read articles
1. Kullback-Leibler (KL) Divergence for LLMs By Kuriko Iwai
Kullback-Leibler (KL) Divergence provides a statistical method for balancing learning and stability when fine-tuning LLMs. This piece explains its function as a regularization technique that measures the difference between a fine-tuned model’s output distribution and that of its base model. This approach helps prevent “policy collapse,” in which a model becomes overconfident on a new task. The article presents an experiment demonstrating that KL-regularized models produce more balanced outputs, effectively avoiding the extreme certainty observed in standard Supervised Fine-Tuning.
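For discrete distributions, KL divergence is simple to compute directly. The toy numbers below are invented purely to show why an overconfident fine-tune pays a larger KL penalty against its base model than a regularized one does.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Invented next-token distributions over four candidate tokens.
base          = [0.25, 0.25, 0.25, 0.25]  # base model
overconfident = [0.97, 0.01, 0.01, 0.01]  # collapsed onto one answer
regularized   = [0.55, 0.15, 0.15, 0.15]  # confident but still spread out

# In KL-regularized fine-tuning, the training objective adds a penalty term:
#   loss = task_loss + beta * KL(fine_tuned || base)
# so drifting toward the overconfident distribution costs more.
penalty_overconfident = kl_divergence(overconfident, base)
penalty_regularized   = kl_divergence(regularized, base)
```

Because the penalty grows as the fine-tuned distribution collapses, the regularizer pulls training back toward the base model's calibrated uncertainty.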
2. Building a Fully Local LLM Voice Assistant: A Practical Architecture Guide By Cosmo W. Q
This article presents a practical architectural guide for building an entirely local LLM voice assistant, moving beyond a simple pipeline to a more dynamic, loop-based model. It breaks the system into six independent stages: voice capture, speech-to-text, text cleanup, the assistant core, text-to-speech, and voice output. The core is central, integrating an LLM for reasoning, a context store for memory, and a tool layer for actions. A key recommendation is a distributed architecture that offloads heavy computations to a dedicated local machine while a lightweight device orchestrates tasks, making the system both powerful and flexible.
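As a rough illustration of that six-stage loop, here is a pure-Python skeleton with placeholder stages; a real build would plug in a local speech-to-text model, an LLM runtime, and a TTS engine, and each stage would run as an independent component so it can be swapped or offloaded to another machine.

```python
# All six stages are stubs; only the loop structure is the point here.

def capture_audio():        return b"raw-pcm"            # 1. voice capture
def speech_to_text(audio):  return "what's the weather"  # 2. speech-to-text
def clean_text(text):       return text.strip().lower()  # 3. text cleanup
def assistant_core(text, context):                       # 4. LLM + memory + tools
    context.append(text)                                 #    context store as memory
    return f"you said: {text}"
def text_to_speech(reply):  return b"wav-bytes"          # 5. text-to-speech
def play_audio(wav):        return len(wav) > 0          # 6. voice output

def run_turn(context):
    """One pass through the loop; returns (playback succeeded, reply text)."""
    audio = capture_audio()
    text = clean_text(speech_to_text(audio))
    reply = assistant_core(text, context)
    return play_audio(text_to_speech(reply)), reply
```

The loop-based framing matters because the assistant core can trigger tools or follow-up turns rather than treating each utterance as a one-shot pipeline.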
3. Catastrophic Forgetting: The Architecture of Becoming Human By Akilesh
This article discusses the phenomenon of catastrophic forgetting in AI, in which a neural network overwrites previous knowledge when learning a new task. The author connects this technical challenge to a personal experience of losing deep coding skills after shifting to a new domain. It evaluates standard but insufficient solutions, such as freezing weights and using larger models, which only hide the problem. The piece then explains more effective architectural approaches, including Progressive Neural Networks and Google’s “Nested Learning,” which structures learning into layers with different update speeds. The key conclusion is that both AI and human learning depend on structured resource allocation, suggesting the goal is not to prevent forgetting but to manage it effectively.
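The overwrite dynamic is easy to reproduce with a toy example: a single scalar "weight" trained by gradient descent on task A, then on task B, ends up far from the task-A solution. The targets and loss are invented purely for illustration.

```python
def train(w, target, steps=200, lr=0.1):
    """Gradient descent on the loss (w - target)**2."""
    for _ in range(steps):
        w -= lr * 2 * (w - target)
    return w

w_after_a = train(0.0, target=2.0)         # learn task A
w_after_b = train(w_after_a, target=-1.0)  # then learn task B with no safeguard

task_a_loss = (w_after_b - 2.0) ** 2       # task A performance has collapsed
```

Freezing the weight would protect task A but make task B unlearnable, which is the tension approaches like Progressive Neural Networks address by adding new capacity instead of overwriting old parameters.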
4. Google’s Nested Learning: The Brain-Inspired AI That Never Forgets By Sai Insights
The article discusses Google’s Nested Learning framework, an AI model design inspired by neuroscience to address catastrophic forgetting. The framework treats a neural network as a set of nested optimization problems, each learning at a different speed, much like the human brain’s multi-frequency processing. This allows the model to adapt to new information using its faster-updating components while preserving stable, long-term knowledge in its slower ones. It also highlights the Higher-Order Processing Engine (HOPE) architecture, which has shown improved performance in continual learning, long-context understanding, and language modeling compared to standard transformers.
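The multi-speed idea can be caricatured with two scalar parameters, one updated every step and one updated only every k steps. This is a loose sketch of the timescale separation, not the actual HOPE architecture, and all names and values are invented.

```python
def nested_step(fast, slow, grad_fast, grad_slow, step,
                slow_every=10, lr_fast=0.1, lr_slow=0.01):
    """Update the fast parameter every step; the slow one only every `slow_every` steps."""
    fast -= lr_fast * grad_fast
    if step % slow_every == 0:
        slow -= lr_slow * grad_slow
    return fast, slow

fast, slow = 1.0, 1.0
for step in range(1, 101):
    # A decay-style pull toward 0 stands in for gradients from new data.
    fast, slow = nested_step(fast, slow, grad_fast=fast, grad_slow=slow, step=step)
```

After 100 steps the fast parameter has tracked the new signal almost completely, while the slow one has barely moved, which is the sense in which fast components absorb new information and slow components preserve long-term knowledge.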
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
LAI #106: Choosing the Right Shape for AI Systems was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.