I have been obsessed with the idea of giving my local LLM a “circadian rhythm.”
Humans consolidate memories during sleep. We strengthen procedural skills (how to do things) and replay episodic details (what happened), but crucially, we also prune the noise. I wanted to see if I could replicate this cycle on a Mac Mini M2 using a small Llama model.
My goal was simple: a model that chats during the day, then “sleeps” at night to process those conversations, updating its weights without forgetting how to speak English.
I call the project Circadia. Here is how I built it, and the weird things I found out about dataset size along the way.
The Theory: LoRA vs. RAG
Full fine-tuning is too expensive for a nightly routine. LoRA (Low-Rank Adaptation) offers a lightweight alternative for nightly updates, but if you do it naively, the model starts forgetting its original training (catastrophic forgetting).
On the other hand, RAG (Retrieval-Augmented Generation) is great for remembering specific facts verbatim, but it doesn’t help the model adapt its personality or style.
My approach was to pair them. I use LoRA for the “how” (style, behavior, preferences) and RAG for the “what” (facts, code snippets, logs).
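The split itself can be sketched as a small heuristic router. The regexes below are illustrative stand-ins, not my exact rules: anything that looks like verbatim content (code fences, URLs, JSON payloads) is routed to the RAG store, everything else to the LoRA training set.

```python
import re

# Heuristic router: "facts" (verbatim content) go to the RAG store,
# "vibes" (conversational flow) go to the nightly LoRA training set.
FACT_PATTERNS = [
    re.compile(r"```"),                          # fenced code blocks
    re.compile(r"https?://\S+"),                 # URLs
    re.compile(r"^\s*[\[{].*[\]}]\s*$", re.S),   # JSON-ish payloads
]

def route(message: str) -> str:
    """Return 'rag' for factual/verbatim content, 'lora' for conversational text."""
    if any(p.search(message) for p in FACT_PATTERNS):
        return "rag"
    return "lora"
```

So `route("see https://example.com/docs")` lands in the database, while `route("thanks, that felt much friendlier")` lands in the training set.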
The Setup
- Hardware: Mac mini M2, 16 GB RAM.
- Model: Llama-3.2-3B-Instruct (4-bit quant via MLX).
- Training: QLoRA (batch size 1, max sequence length 1024).
- Storage: ChromaDB for the RAG memory.
The Workflow
1. Daytime (Chat): I use a script to log all my conversations. The model can pull past context via RAG if needed.
2. Nighttime (Sleep): A separate script splits the day’s logs:
- Factual content (code blocks, URLs, JSON) gets sliced off and stored in the RAG database.
- Everything else (conversational flow) gets fed into a nightly LoRA training run.
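The sleep script’s split can be sketched end to end like this (stdlib only; in my setup the facts then go into ChromaDB and the chat lines into an MLX-style JSONL file, but those storage calls are elided here and the chat format is an assumption):

```python
import json
import re

FACT_RE = re.compile(r"```|https?://|^\s*[\[{]", re.M)

def split_day_logs(turns):
    """Split a day's chat turns into RAG documents and LoRA training lines.

    turns: list of {"role": ..., "content": ...} dicts.
    Returns (rag_docs, jsonl_lines): verbatim facts to embed, and
    chat-format JSONL lines for the nightly LoRA run.
    """
    rag_docs, chat = [], []
    for turn in turns:
        if FACT_RE.search(turn["content"]):
            rag_docs.append(turn["content"])   # store verbatim for retrieval
        else:
            chat.append(turn)                  # keep conversational flow
    # One training example per assistant reply, with its preceding context.
    jsonl_lines = [
        json.dumps({"messages": chat[: i + 1]})
        for i, t in enumerate(chat)
        if t["role"] == "assistant"
    ]
    return rag_docs, jsonl_lines
```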
The Experiment: The “Dolly” Paradox
I ran several “sleep cycles” to see if the model would drift or get dumber. I tracked “token loss” (how well it predicted the chat logs) and tested it against an 8-task harness (math, reasoning, summarization, etc.).
This is where I found something counter-intuitive.
I compared a small, synthetic dataset I generated (“Gemini synth”, 287 items) against the massive, professional “Dolly 15k” dataset. You would assume the big, pro dataset would be better for preserving general intelligence.
You would be wrong.
- Run “gem7” (my small synthetic data): The model learned the new style well and dropped only 1 point on the capability harness (4/8 score).
- Run “dolly1” (the big pro data): The model actually got worse, dropping 2 points on the harness (3/8 score).
It turns out that for a nightly “sleep” cycle, less is more. A short, focused dataset interfered less with the model’s brain than a massive generalized instruction set.
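For reference, the “token loss” tracked above is just the mean negative log-likelihood the model assigns to the held-out chat tokens; lower means the logs look less surprising to it. A toy version over per-token probabilities:

```python
import math

def token_loss(token_probs):
    """Mean negative log-likelihood over a sequence of per-token probabilities."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)
```

A model that assigns probability 0.5 to every token scores ln 2 ≈ 0.693; a perfectly confident (and correct) model scores 0.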
Key Takeaways
1. LoRA needs a brake. I found the “sweet spot” was a very gentle learning rate (2e-5) for just 50 iterations. Anything more aggressive and the model started losing IQ points.
2. Routing is everything. You cannot shove everything into the model’s weights. I built a router that sends “facts” to the database and only sends “vibes” to the training script.
3. The “Harness Hit” is real. Even with gentle settings, you will likely lose a tiny bit of reasoning capability (about 1 task on my scale) in exchange for the model becoming more personalized.
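Put together, the nightly run with those gentle settings looks roughly like the command below. The flag names follow mlx-lm’s LoRA script and the model path is the mlx-community 4-bit quant; both are assumptions about my exact setup, so verify against `python -m mlx_lm.lora --help` before copying.

```shell
# Nightly "sleep" run with the gentle settings from takeaway 1.
# ./sleep_data holds the train.jsonl/valid.jsonl produced by the log split.
python -m mlx_lm.lora \
  --model mlx-community/Llama-3.2-3B-Instruct-4bit \
  --train \
  --data ./sleep_data \
  --batch-size 1 \
  --max-seq-length 1024 \
  --learning-rate 2e-5 \
  --iters 50
```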
Future Plans
Right now, the system uses heuristics to decide what gets memorized and what gets trained. I want to replace that with a smarter classifier. I’m also looking at “incremental RAG,” where the model organizes its database during sleep, chunking long conversations into better memories.