I think in metaphors. It’s how I understand anything complex – by finding connections to something I already know, preferably from a completely different domain. When I first started learning about AI concepts, I’d read the formal definitions and feel like I was staring at a wall of text. But then someone would say “it’s like…” and everything would click into place.
So I started collecting these. Every time a concept finally made sense, I’d write down what comparison made it work for me. And here’s what I noticed: I usually needed at least two different metaphors before something truly landed. The baking metaphor would get me 60% of the way there, then the developer metaphor would fill in the rest. Or vice versa.
I’ve narrowed this down to three concepts that feel foundational – the ones I keep coming back to when I’m trying to understand how AI systems actually work, or when I’m explaining MCP to someone, or when I’m making decisions about how to build something. These three form a kind of progression: understanding how AI represents meaning, then how you customize it, then how you make customization practical.
The patterns between these concepts interest me as much as the concepts themselves. How embeddings enable RAG, how LoRA makes fine-tuning accessible, how choosing between RAG and fine-tuning depends on whether you’re teaching facts or behavior. These connections make the whole landscape easier to navigate.
Embedding
TL;DR Converting text, images, or audio into dense numerical vectors (arrays of numbers), where similar meanings map to similar numbers. This is the foundation of semantic search and RAG.
What matters most: Embeddings capture meaning, not just keywords – “car” and “automobile” end up with similar embeddings. They’re fixed-size representations, so “hi” and a 500-word essay both become vectors of the same length. They’re used for similarity search – you compare vectors to find related content. Different models produce different dimensions – text-embedding-3-small gives you 1,536. And there are cost-quality trade-offs to consider: smaller embeddings are cheaper but less nuanced.
The colour version: Every colour can be described with exactly three numbers – its RGB values. Red is [255, 0, 0], orange is [255, 165, 0]. Colours that look similar have similar numbers, so you can find “similar colours” by comparing the triplets. Embeddings do the same for text, except instead of three numbers describing colour, you have 1,536 numbers describing meaning. Similar meanings get similar numbers, just as similar colours get similar RGB values.
The developer version: It’s like hashing, but preserving similarity instead of randomising it. A hash function converts “hello” into something like 2cf24dba5fb0a30e. Embeddings convert “hello” into [0.1, 0.3, -0.2, …]. But unlike hashing where similar inputs give totally different hashes, embeddings make similar inputs give similar vectors. It’s a “similarity-preserving hash” – you can compare the results to find related content. Use cosine similarity instead of equality checks.
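To make that concrete, here’s a toy Python sketch. The four-dimensional vectors are invented for illustration – real embedding models return hundreds or thousands of numbers per input – but the comparison logic is exactly what semantic search does:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: ~1.0 means 'pointing the same way'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" – real ones have ~1,536 dimensions.
car = np.array([0.8, 0.1, 0.3, 0.0])
automobile = np.array([0.7, 0.2, 0.3, 0.1])
banana = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(car, automobile))  # ~0.98 – similar meaning, similar numbers
print(cosine_similarity(car, banana))      # ~0.12 – unrelated meaning, distant numbers
```

A hash equality check would call “car” and “automobile” completely different; the vectors say they’re nearly the same. That’s the entire trick behind semantic search.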

Fine-Tuning vs RAG
TL;DR Two different approaches to customisation. RAG retrieves external knowledge and includes it in the prompt; fine-tuning retrains the model’s weights on your data. Use RAG for facts, fine-tuning for behaviour.
What matters most: Use RAG for changing knowledge – prices, inventory, news, data that updates frequently. Use fine-tuning for changing behaviour – style, format, tone, domain terminology. RAG is cheaper and faster because there’s no retraining needed, you just update documents. Fine-tuning has no retrieval overhead because everything’s baked into the weights. Often they work best together – fine-tune for style, RAG for facts.
The restaurant version: RAG is a chef with access to an ingredient database. Every time someone orders, they look up what ingredients to use, then cook. If ingredient prices change, the database updates automatically. Fine-tuning is training the chef in a specific cuisine. They’ve practised Italian cooking so much that they automatically know techniques, flavour combinations, and traditions without looking up recipes. Use RAG when ingredients (facts) change daily. Fine-tune when you need consistent technique (style). The best restaurants do both: trained chefs (fine-tuned) with access to fresh ingredient databases (RAG).
The developer version: RAG is dependency injection – you inject external data at runtime via function parameters. Fine-tuning is monkey-patching the library itself – you’re modifying its internal behaviour. RAG looks like generateResponse(prompt, retrievedContext), where you control the context per request; fine-tuning is recompiling the function with different internal logic. Use RAG when your data changes frequently (like pulling from an API). Fine-tune when you need to change how the function processes things (like changing validation logic). Often you do both: custom business logic (fine-tuned) that queries your database (RAG).
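Here’s a minimal sketch of that shape in Python. Everything in it is a hypothetical stand-in – retrieve() does naive keyword matching where a real system would run an embedding search, and call_llm() is a stub for whatever model API you’re using – but the structure is the point: the facts arrive as a parameter, so updating them never touches the model.

```python
def call_llm(prompt: str) -> str:
    """Stub for a real model API call."""
    return f"(model answer grounded in: {prompt[:80]}...)"

def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Naive keyword retrieval – a real system would embed the query and
    compare vectors, exactly as in the embeddings example above."""
    words = query.lower().split()
    hits = [text for text in documents.values()
            if any(word in text.lower() for word in words)]
    return hits[:top_k]

def generate_response(prompt: str, retrieved_context: list[str]) -> str:
    """Dependency injection: the facts are passed in per request, nothing is baked in."""
    context = "\n".join(retrieved_context)
    return call_llm(f"Context:\n{context}\n\nQuestion: {prompt}")

# Update the documents and the next answer changes – no retraining involved.
docs = {"pricing": "The basic plan costs $10/month as of this week."}
print(generate_response("How much is the basic plan?",
                        retrieve("basic plan cost", docs)))
```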

LoRA (Low-Rank Adaptation)
TL;DR A parameter-efficient fine-tuning method: freeze the base model and train small “adapter” matrices instead. This cuts trainable parameters by more than 99% and makes fine-tuning possible on consumer GPUs.
What matters most: LoRA trains less than 1% of parameters, which dramatically reduces memory requirements. It produces small files – 20MB adapter versus 14GB full model. You can merge or swap adapters – keep the base model, swap adapters for different tasks. Quality is close to full fine-tuning, surprisingly effective despite fewer parameters. And it makes fine-tuning accessible, running on a 24GB GPU instead of needing clusters.
The lens filters version: The base model is a camera lens (14GB). Full fine-tuning is grinding and re-polishing the entire lens – expensive and permanent. LoRA is screwing on a small filter (20MB) that changes how light passes through. The filter is 99% smaller than the lens, but it effectively modifies the output. You can keep one lens and swap filters: sepia filter, polarising filter, UV filter. LoRA adds a small mathematical “filter” that transforms the model’s computations without changing the underlying “lens” (weights).
The developer version: It’s the decorator pattern instead of inheritance. Full fine-tuning is a subclass that overrides every method; LoRA is a wrapper that intercepts some method calls and tweaks their outputs. When the model computes output = input @ weights, LoRA intercepts: output = input @ (weights + A @ B), where A and B are tiny matrices. The original weights stay frozen – you’re training A and B (about 1% of the size) instead of retraining everything. It’s monkey-patching with small patches instead of forking the entire codebase.
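And here’s a toy numpy version of that interception, with invented sizes (a rank-8 adapter on a single 1,024×1,024 weight matrix – real models apply this across many layers):

```python
import numpy as np

d, rank = 1024, 8                            # hidden size vs. adapter rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))              # frozen base weights – the "lens"
A = rng.standard_normal((d, rank)) * 0.01    # trainable adapter half, d x r
B = rng.standard_normal((rank, d)) * 0.01    # trainable adapter half, r x d

x = rng.standard_normal((1, d))              # one input activation

base_output = x @ W                          # what the frozen model computes
lora_output = x @ W + (x @ A) @ B            # same, plus the low-rank "filter"

# Only A and B would get gradient updates; W never changes.
print(f"frozen parameters:    {W.size:,}")           # 1,048,576
print(f"trainable parameters: {A.size + B.size:,}")  # 16,384 (~1.6% in this toy)
```

Swapping adapters just means swapping A and B, which is why a task-specific “filter” can be a 20MB download rather than a 14GB one.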

Here’s what I’m still figuring out: whether these explanations actually work for people who aren’t developers. Some concepts might still feel abstract no matter how many metaphors I throw at them.
I’m also wondering if I’ve oversimplified some of these. The fine-tuning comparison, for instance, glosses over when you might genuinely need full fine-tuning instead of LoRA, or the cases where RAG and fine-tuning aren’t alternatives at all but complementary. But I think that’s okay – this is meant to build intuition, not replace documentation.
What I’ve learned from writing this is that the same concept really does need multiple entry points. My team would understand things best through the developer metaphors. A product manager or designer might prefer the non-technical versions. And I find myself using different metaphors depending on my own mental state – sometimes the abstract database comparisons help, sometimes I need the cake-baking version.
The progression here matters too. You can’t really understand RAG without first understanding embeddings – how else would you search for similar content? And you can’t appreciate LoRA without understanding that fine-tuning exists but is expensive. These three concepts build on each other in a way that mirrors how you’d actually approach customising an AI system: understand how it represents meaning, decide whether you need to change knowledge or behaviour, then learn there’s a practical way to change behaviour without needing a GPU cluster.
If you found this useful, I’d be curious which metaphors actually worked for you. And which concepts still feel fuzzy – those are probably the ones I haven’t fully understood myself yet, despite writing explanations for them.