LLM & AI Agent Applications with LangChain and LangGraph — Part 2: What is a machine learning model and what makes LLMs special?
9 min read · Nov 26, 2025
Welcome to the next chapter in the series on LLM-based application development.
In this part I want to clarify two things that appear constantly in any discussion about AI: what a machine learning model actually is, and what makes the large language models we will be using in our applications special.
My goal is not to give you a textbook definition, but to build an intuition you can carry with you when we start wiring models into LangChain and LangGraph flows.
Imagine the word model in AI as shorthand for “a very specialized expert that has learned to recognize patterns”.
Picture a person who has spent years looking at thousands of dog photos, different breeds, angles, lighting conditions. At some point, you can show them a completely new image and they answer instantly: yes, that’s a dog or no, that’s not a dog. They don’t recite a formal rule. They recognize patterns they’ve absorbed over time.
A machine learning model works in a similar way. We show it a huge number of examples with labels, such as images tagged “dog” and “not dog”. During training it tunes its internal parameters so that its predictions match the labels as often as possible. Over time, it learns what makes something look like a dog, even if it has never seen that exact picture before.
From a technical point of view, a machine learning model is a mathematical or statistical structure trained on data to perform a specific task: classification, prediction, pattern recognition, optimization and so on. It takes input data, processes it through its internal structure (weights, parameters, layers) and produces an output: a class, a probability, a number, a sequence.
The key aspect is that we do not manually program all the rules for this mapping. Instead, we let the model learn from training data. We feed it many input–output examples, compare its predictions with the true answers and adjust the parameters to reduce the error. You can think of it like tuning a musical instrument: at first it sounds terrible, then after many small corrections it starts to produce something that “sounds right”.
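To make this concrete, here is a minimal sketch of that idea using scikit-learn. The features (weight and ear length) and the labels are completely made up for illustration, but the shape of the workflow is exactly the pattern described above: show labelled examples, call fit, then ask for predictions on new data.

```python
# A minimal sketch of "learning from labelled examples" with scikit-learn.
# The features and labels below are invented purely for illustration.
from sklearn.linear_model import LogisticRegression

# Each row is one example: [weight_kg, ear_length_cm]; label 1 = "dog", 0 = "not dog".
X = [[30, 10], [25, 12], [4, 6], [3, 5], [28, 11], [5, 7]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression()
model.fit(X, y)                         # the "training": parameters are tuned to match the labels

print(model.predict([[27, 9]]))         # a new, unseen example -> predicted class
print(model.predict_proba([[27, 9]]))   # probability the model assigns to each class
```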
There are many kinds of models. Some are quite simple, like linear models that fit a straight line or a plane through the data. Others, such as decision trees or gradient boosted trees, split the input space into regions and apply different rules in each region. And then we have neural networks, which can become very large and deep and are the main workhorse in modern deep learning.
In the training process the model keeps adjusting its parameters to minimise the difference between its predictions and the true targets. The more effectively it can reduce this difference on good, representative data, the better it will generalise to new, unseen examples. That is why the quality of the training set and the training procedure matters just as much as the model architecture.
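If you want to see that tuning loop without any library magic, here is a tiny gradient descent example in plain numpy, fitting a one-parameter line to a handful of made-up points. Every iteration is one of those "small corrections" that brings the instrument closer to sounding right.

```python
# A toy illustration of "adjust parameters to reduce the error": gradient descent
# on a one-parameter linear model y = w * x, with made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])    # roughly y = 2x, with a bit of noise

w = 0.0                                # start with a "badly tuned instrument"
lr = 0.01                              # learning rate: size of each small correction

for step in range(200):
    pred = w * x
    error = pred - y                   # difference between predictions and true targets
    grad = 2 * np.mean(error * x)      # gradient of the mean squared error w.r.t. w
    w -= lr * grad                     # one small correction

print(round(w, 3))                     # ends up close to 2.0
```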
For us, the important takeaway is that a trained model is a frozen result of this learning process. Once training is done, we do not see the learning anymore. We just send inputs and get outputs. But under the hood there is a large mathematical object that encodes what it has learned from the data.
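A short sketch of what "frozen" means in practice: once fit has run, the model is just an object we can serialise to disk and load again later, purely for sending inputs and reading outputs. The tiny dataset below is again invented for the example.

```python
# The "frozen result" idea: after training, a model is just an object (or a file)
# that maps inputs to outputs. Uses scikit-learn and joblib.
import joblib
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[30, 10], [4, 6], [25, 12], [3, 5]], [1, 0, 1, 0])
joblib.dump(model, "dog_classifier.joblib")      # training is over; the result is frozen to disk

frozen = joblib.load("dog_classifier.joblib")    # later, possibly in another process...
print(frozen.predict([[27, 9]]))                 # ...we only send inputs and get outputs
```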
Within the family of neural networks there is a special group devoted to language: text models.
A text model is like a specialist who lives entirely in the space of words and sentences. It reads text, represents it internally in a numeric form, and then uses that representation to understand or generate new text. You can think of it as having a translator, an editor, a copywriter and a text analyst merged into a single system.
Such a model can translate between languages, classify emails as spam or not spam, detect sentiment, answer questions about a document, and also generate completely new content. All of this is possible because during training it has seen enormous amounts of natural language and has discovered patterns in how words and phrases tend to appear together.
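If you want to poke at a pre-trained text model yourself, the Hugging Face transformers pipeline is the quickest route. This assumes the transformers library (plus a backend such as PyTorch) is installed; on the first run it downloads a small default sentiment model, so exact labels and scores may vary.

```python
# A quick way to play with a small pre-trained text model: the Hugging Face pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I absolutely loved this book."))
print(classifier("The delivery was late and the package was damaged."))
# -> something like [{'label': 'POSITIVE', 'score': 0.99...}] for the first sentence
```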
In the world of text models there are different architectural approaches depending on the task. It is a bit like having different tools in a workshop: a screwdriver, a drill, a saw. All of them work with “material”, but each one is better suited for a specific operation.
Before transformers took over, a large part of NLP relied on recurrent neural networks (RNNs). To understand transformers and LLMs later, it helps to briefly look at how these recurrent models think about sequences.
many-to-many network
On the slide we see a many-to-many text model drawn as a chain of boxes. Each box represents a copy of the same neural network cell. At each step it receives two things: the current word (encoded as a vector) and the hidden state coming from the previous box, which is essentially compressed information about everything processed so far.
Text is a sequence, so to process a sentence you feed the words one by one into those boxes. Each box does its computation and produces two outputs: one is the actual output at that time step (for example, a prediction of the next word or a tag for the current word), and the second is the new hidden state passed to the next box in the chain.
The first box has no previous box, so as the “previous state” it gets a vector of zeros. After that the chain fills this hidden state with progressively richer information about the sequence.
When every input word produces an output, and the sequence length at the input equals the sequence length at the output, we talk about a many-to-many recurrent network with Tx = Ty. You put in a sequence of length Tx, you get out a sequence of the same length Ty. This structure fits tasks where every input element gets its own label, for example tagging each word in a sentence, or frame-level speech recognition, where each audio frame is mapped to a text unit.
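Here is the chain of boxes written out as a bare-bones numpy loop. The weights are random, so the outputs are meaningless; the point is the data flow: each step combines the current word vector with the previous hidden state and emits one output per input, so Tx = Ty.

```python
# A bare-bones forward pass of the "chain of boxes": one RNN cell applied step by step.
# Sizes and weights are random; this only illustrates the data flow, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 16, 5             # toy sizes

W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1    # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1   # previous hidden -> hidden
W_hy = rng.normal(size=(output_dim, hidden_dim)) * 0.1   # hidden -> output

sequence = [rng.normal(size=input_dim) for _ in range(4)]  # 4 "word vectors"

h = np.zeros(hidden_dim)                  # the first box gets a vector of zeros
for x_t in sequence:
    h = np.tanh(W_xh @ x_t + W_hh @ h)    # new hidden state: current word + previous state
    y_t = W_hy @ h                        # output at this time step (e.g. tag scores)
    print("step output shape:", y_t.shape)  # one output per input word -> Tx = Ty
```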
many-to-many network Tx = Ty
When you write an email or a text message, you sometimes see your phone suggesting the next word. This is a slightly different setup, often described as one-to-many. You give the network some initial input and then ask it to start generating a sequence. The previously generated word is fed back as input for the next step, together with the hidden state that carries context from all earlier steps.
This allows the model to generate an arbitrarily long sequence of text. In theory it could go on forever. In practice, recurrent networks have trouble keeping track of very long-range dependencies. They tend to lose information about the distant past, and they are prone to repeating themselves or getting stuck in loops. That is why pure one-to-many RNNs are usually limited to relatively short texts or used as building blocks inside more complex architectures.
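A toy version of that feedback loop looks like this. The vocabulary and the random weights are invented for the example, so the generated "sentence" is gibberish; what matters is that each sampled word is fed back as the input for the next step.

```python
# A toy "one-to-many" generation loop: start from an initial word, feed each
# generated word back in as the next input. Weights are random, so the output
# is gibberish; the point is the feedback loop, not the quality.
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<start>", "the", "dog", "barks", "loudly", "."]
V, H = len(vocab), 12

E = rng.normal(size=(V, H)) * 0.1           # word embeddings
W_hh = rng.normal(size=(H, H)) * 0.1        # hidden -> hidden
W_hy = rng.normal(size=(V, H)) * 0.1        # hidden -> scores over the vocabulary

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

h = np.zeros(H)
token = 0                                   # index of "<start>"
for _ in range(6):
    h = np.tanh(E[token] + W_hh @ h)        # current word + context carried so far
    probs = softmax(W_hy @ h)               # distribution over the next word
    token = int(rng.choice(V, p=probs))     # sample a word, then feed it back next step
    print(vocab[token], end=" ")
print()
```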
So far we have seen many-to-many, where the number of inputs and outputs match, and one-to-many, where you start from a fixed input and generate a longer sequence. In real applications, such as answering a question, solving a problem or translating text, we often have another scenario: the number of input words is different from the number of output words. The same idea expressed in two languages may use very different sentence lengths.
one-to-many network
To handle such tasks, the field moved towards combining both ideas into an encoder–decoder architecture.
The first part of the network plays the role of an encoder. It processes the entire input sequence in a many-to-many fashion (Tx = Ty if you look at the internal steps), but we do not care about its explicit outputs at each step. What interests us is the final hidden state, a compressed representation of the whole input sequence. This vector, or set of vectors, is treated as a high-level summary of the text: its meaning, intent and key details.
This encoded representation is then handed to the second part of the network, the decoder. The decoder is a one-to-many network that takes this summary and starts generating the output sequence word by word. At each step it uses both the encoded context from the encoder and the words it has already generated.
You can think of it like this: the encoder reads the original text, makes internal notes in its own technical language and passes those notes to the decoder. The decoder then uses these notes to produce the final answer, translation or explanation.
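Schematically, the wiring looks like this in numpy. The encoder loop keeps only its final hidden state as the "notes", and the decoder loop generates from them. A real decoder would also feed back the previously generated word and use trained weights; both are omitted here to keep the sketch short.

```python
# A schematic encoder-decoder: the encoder's final hidden state is the "notes"
# handed to the decoder, which then produces output tokens one by one.
# All weights are random; this shows the wiring, not a trained translator.
import numpy as np

rng = np.random.default_rng(2)
H, V_out = 12, 6                                   # hidden size, output vocabulary size

W_enc = rng.normal(size=(H, H)) * 0.1
W_dec = rng.normal(size=(H, H)) * 0.1
W_out = rng.normal(size=(V_out, H)) * 0.1

# --- encoder: read the whole input, keep only the final hidden state ---
input_vectors = [rng.normal(size=H) for _ in range(5)]   # 5 input "word vectors"
h = np.zeros(H)
for x_t in input_vectors:
    h = np.tanh(x_t + W_enc @ h)
context = h                                        # compressed summary of the input

# --- decoder: one-to-many generation conditioned on that summary ---
h = context
for _ in range(4):
    h = np.tanh(W_dec @ h)
    scores = W_out @ h
    print(int(scores.argmax()), end=" ")           # index of the "word" produced at this step
print()
```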
This encoder–decoder setup became very important in machine translation, summarization and question answering. And it sets the stage for the next big idea: transformers.
many-to-many network Tx ≠ Ty
A large text model is essentially a very advanced version of these language models, trained on huge text corpora and containing billions or even trillions of parameters. The sheer size gives it the capacity to represent very subtle patterns in how language is used. It can track long contexts, understand complex queries and generate responses that feel natural and coherent.
This brings us to the Transformer architecture. You can imagine the transformer as a kind of text-processing factory with two main departments: an encoder side and a decoder side.
The encoder is responsible for analysis. It takes the input text and converts it into an internal representation that is much more suitable for computation. It is as if you had a specialist who reads a Polish document and rewrites it into a precise, technical shorthand that only machines understand.
The decoder is responsible for production. It takes this internal code and turns it into new text in the target language or format. This is the part that actually writes out the translation, the answer or the continuation of a sentence.
The crucial difference between transformers and earlier recurrent architectures is how they handle sequences. Instead of processing words strictly one by one in order, transformers use mechanisms like self-attention to look at many positions in the input at the same time. This allows them to analyse entire sentences, or even paragraphs, more in parallel, which is both more powerful and more efficient on modern hardware.
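The core of that mechanism, scaled dot-product attention, fits in a few lines of numpy. Random matrices stand in for learned projections of real token embeddings; what the example shows is that every position attends to every other position in one matrix multiplication, with no step-by-step recurrence.

```python
# Scaled dot-product self-attention: every position looks at every other position
# at once, instead of stepping through the sequence. Random values stand in for
# learned projections of real token embeddings.
import numpy as np

rng = np.random.default_rng(3)
seq_len, d = 4, 8                       # 4 tokens, 8-dimensional representations

Q = rng.normal(size=(seq_len, d))       # queries
K = rng.normal(size=(seq_len, d))       # keys
V = rng.normal(size=(seq_len, d))       # values

scores = Q @ K.T / np.sqrt(d)           # how much each token attends to each other token
weights = np.exp(scores)
weights = weights / weights.sum(axis=1, keepdims=True)   # softmax over each row

output = weights @ V                    # each position becomes a weighted mix of all positions
print(weights.round(2))                 # the attention pattern, computed in parallel
print(output.shape)                     # (4, 8): one new vector per token
```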
Once you stack many transformer layers, train them on huge text datasets and push the parameter count to the billions, you arrive at what we usually call a Large Language Model, an LLM.
The difference between a “regular” text model and an LLM is a bit like the difference between a single newspaper article and an entire encyclopaedia. Both are written in language, but the encyclopaedia covers many more topics and patterns. An LLM has effectively internalised a massive library of linguistic and factual patterns in its parameters.
Thanks to this scale it can better follow context, pick up on nuances, adapt its answers to different tasks and styles, and reach very high performance on a broad range of NLP tasks.
If you would like to get an even deeper feel for how an LLM processes text step by step, I recommend an interactive visualization at bbycroft.net/llm. It is a great way to see how token probabilities evolve as the model generates text.
Because of the enormous computational and data requirements, LLMs are usually trained by large tech companies or specialized research groups. Training them from scratch is far beyond what we need to do in typical application development.
Our role, and the focus of this series, is different: we will use already trained models as components inside our own systems. We will connect them to our data, wrap them in agents, define guardrails and evaluation, and orchestrate everything with LangChain and LangGraph.
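As a small preview of what that looks like in code, here is the simplest possible call to a hosted LLM through LangChain. It assumes the langchain-openai package is installed and an OPENAI_API_KEY is set in the environment; the model name is just an example and can be swapped for whichever provider you use. We will build on exactly this kind of call in the coming chapters, wrapping it in chains, tools and graphs.

```python
# Using an already trained model as a component: a single call to a hosted LLM
# through LangChain. Assumes langchain-openai is installed and OPENAI_API_KEY is set;
# the model name below is only an example.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

response = llm.invoke("In one sentence, what is a machine learning model?")
print(response.content)
```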
That's all for this chapter. In the next part we will take a closer look at transformers and attention from the practical side, and then move on to writing code that uses these ideas in your own applications.
**see the previous chapter**
**see the next chapter**
see the GitHub repository with the code examples: https://github.com/mzarnecki/course_llm_agent_apps_with_langchain_and_langgraph