I have been obsessed with the idea of giving my local LLM a “circadian rhythm.”
Humans consolidate memories during sleep. We strengthen procedural skills (how to do things) and replay episodic details (what happened), but crucially, we also prune the noise. I wanted to see if I could replicate this cycle on a Mac Mini M2 using a small Llama model.
My goal was simple: a model that chats during the day, then “sleeps” at night to process those conversations, updating its weights without forgetting how to speak English.
I call the project Circadia. Here is how I built it, and the weird things I found out about dataset size along the way.
The Theory: LoRA vs. RAG
Full fine-tuning is too expensive for a nightly routine. LoRA (Low-Rank Adaptation) offers a lightweight alternative for nightly updates, but if you do it naively, the model starts forgetting its original training (catastrophic forgetting).
On the other hand, RAG (Retrieval-Augmented Generation) is great for remembering specific facts verbatim, but it doesn’t help the model adapt its personality or style.
My approach was to pair them. I use LoRA for the “how” (style, behavior, preferences) and RAG for the “what” (facts, code snippets, logs).
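The split itself can be sketched as a small heuristic router. The regexes below are illustrative stand-ins, not my exact rules: anything that looks like verbatim content (code fences, URLs, JSON payloads) is routed to the RAG store, everything else to the LoRA training set.

```python
import re

# Heuristic router: "facts" (verbatim content) go to the RAG store,
# "vibes" (conversational flow) go to the nightly LoRA training set.
FACT_PATTERNS = [
    re.compile(r"```"),                          # fenced code blocks
    re.compile(r"https?://\S+"),                 # URLs
    re.compile(r"^\s*[\[{].*[\]}]\s*$", re.S),   # JSON-ish payloads
]

def route(message: str) -> str:
    """Return 'rag' for factual/verbatim content, 'lora' for conversational text."""
    if any(p.search(message) for p in FACT_PATTERNS):
        return "rag"
    return "lora"
```

So `route("see https://example.com/docs")` lands in the database, while `route("thanks, that felt much friendlier")` lands in the training set.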
The Setup
- Hardware: Mac mini M2, 16 GB RAM.
- Model: Llama-3.2-3B-Instruct (4-bit quant via MLX).
- Training: QLoRA (batch size 1, max sequence length 1024).
- Storage: ChromaDB for the RAG memory.
The Workflow
1. Daytime (Chat): I use a script to log all my conversations. The model can pull past context via RAG if needed.
2. Nighttime (Sleep): A separate script splits the day’s logs:
- Factual content (code blocks, URLs, JSON) gets sliced off and stored in the RAG database.
- Everything else (conversational flow) gets fed into a nightly LoRA training run.
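The sleep script’s split can be sketched end to end like this (stdlib only; in my setup the facts then go into ChromaDB and the chat lines into an MLX-style JSONL file, but those storage calls are elided here and the chat format is an assumption):

```python
import json
import re

FACT_RE = re.compile(r"```|https?://|^\s*[\[{]", re.M)

def split_day_logs(turns):
    """Split a day's chat turns into RAG documents and LoRA training lines.

    turns: list of {"role": ..., "content": ...} dicts.
    Returns (rag_docs, jsonl_lines): verbatim facts to embed, and
    chat-format JSONL lines for the nightly LoRA run.
    """
    rag_docs, chat = [], []
    for turn in turns:
        if FACT_RE.search(turn["content"]):
            rag_docs.append(turn["content"])   # store verbatim for retrieval
        else:
            chat.append(turn)                  # keep conversational flow
    # One training example per assistant reply, with its preceding context.
    jsonl_lines = [
        json.dumps({"messages": chat[: i + 1]})
        for i, t in enumerate(chat)
        if t["role"] == "assistant"
    ]
    return rag_docs, jsonl_lines
```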
The Experiment: The “Dolly” Paradox
I ran several “sleep cycles” to see if the model would drift or get dumber. I tracked “token loss” (how well it predicted the chat logs) and tested it against an 8-task harness (math, reasoning, summarization, etc.).
This is where I found something counter-intuitive.
I compared a small, synthetic dataset I generated (“Gemini synth”, 287 items) against the massive, professional “Dolly 15k” dataset. You would assume the big, pro dataset would be better for preserving general intelligence.
You would be wrong.
- Run “gem7” (my small synthetic data): The model learned the new style well and dropped only 1 point on the capability harness (4/8 score).
- Run “dolly1” (the big pro data): The model actually got worse, dropping 2 points on the harness (3/8 score).
It turns out that for a nightly “sleep” cycle, less is more. A short, focused dataset interfered less with the model’s brain than a massive generalized instruction set.
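For reference, the “token loss” tracked above is just the mean negative log-likelihood the model assigns to the held-out chat tokens; lower means the logs look less surprising to it. A toy version over per-token probabilities:

```python
import math

def token_loss(token_probs):
    """Mean negative log-likelihood over a sequence of per-token probabilities."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)
```

A model that assigns probability 0.5 to every token scores ln 2 ≈ 0.693; a perfectly confident (and correct) model scores 0.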
Key Takeaways
1. LoRA needs a brake. I found the “sweet spot” was a very gentle learning rate (2e-5) for just 50 iterations. Anything more aggressive and the model started losing IQ points.
2. Routing is everything. You cannot shove everything into the model’s weights. I built a router that sends “facts” to the database and only sends “vibes” to the training script.
3. The “Harness Hit” is real. Even with gentle settings, you will likely lose a tiny bit of reasoning capability (about 1 task on my scale) in exchange for the model becoming more personalized.
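Put together, the nightly run with those gentle settings looks roughly like the command below. The flag names follow mlx-lm’s LoRA script and the model path is the mlx-community 4-bit quant; both are assumptions about my exact setup, so verify against `python -m mlx_lm.lora --help` before copying.

```shell
# Nightly "sleep" run with the gentle settings from takeaway 1.
# ./sleep_data holds the train.jsonl/valid.jsonl produced by the log split.
python -m mlx_lm.lora \
  --model mlx-community/Llama-3.2-3B-Instruct-4bit \
  --train \
  --data ./sleep_data \
  --batch-size 1 \
  --max-seq-length 1024 \
  --learning-rate 2e-5 \
  --iters 50
```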
Future Plans
Right now, the system uses heuristics to decide what gets memorized and what gets trained. I want to replace that with a smarter classifier. I’m also looking at “incremental RAG,” where the model organizes its database during sleep, chunking long conversations into better memories.