
Essential Chunking Techniques for Building Better LLM Applications
Image by Author
Introduction
Every large language model (LLM) application that retrieves information faces a simple problem: how do you break down a 50-page document into pieces that a model can actually use? So when you’re building a retrieval-augmented generation (RAG) app, before your vector database retrieves anything and your LLM generates responses, your documents need to be split into chunks.
The way you split documents into chunks determines what information your system can retrieve and how accurately it can answer queries. This preprocessing step, often treated as a minor implementation detail, actually determines whether your RAG system succeeds or fails.
The reason is simple: retrieval operates at the chunk level, not the document level. Proper chunking improves retrieval accuracy, reduces hallucinations, and ensures the LLM receives focused, relevant context. Poor chunking cascades through your entire system, causing failures that retrieval mechanisms can’t fix.
This article covers essential chunking strategies and explains when to use each method.
Why Chunking Matters
Embedding models and LLMs have finite context windows. Documents typically exceed these limits. Chunking solves this by breaking long documents into smaller segments, but introduces an important trade-off: chunks must be small enough for efficient retrieval while remaining large enough to preserve semantic coherence.
Vector search operates on chunk-level embeddings. When chunks mix multiple topics, their embeddings represent an average of those concepts, making precise retrieval difficult. When chunks are too small, they lack sufficient context for the LLM to generate useful responses.
The challenge is finding the middle ground where chunks are semantically focused yet contextually complete. Now let’s get to the actual chunking techniques you can experiment with.
1. Fixed-Size Chunking
Fixed-size chunking splits text based on a predetermined number of tokens or characters. The implementation is straightforward:
- Select a chunk size (commonly 512 or 1024 tokens)
- Add overlap (typically 10–20%)
- Divide the document
The method ignores document structure entirely. Text splits at arbitrary points regardless of semantic boundaries, often mid-sentence or mid-paragraph. Overlap helps preserve context at boundaries but doesn’t address the core issue of structure-blind splitting.
Despite its limitations, fixed-size chunking provides a solid baseline. It’s fast, deterministic, and works adequately for documents without strong structural elements.
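As a minimal sketch, here is plain-Python fixed-size chunking with overlap; token counts are approximated with a whitespace split (swap in your embedding model's tokenizer for exact counts), and the file name is just a placeholder:

```python
def fixed_size_chunks(text, chunk_size=512, overlap=64):
    """Split text into overlapping chunks of roughly `chunk_size` tokens.

    Tokens are approximated by whitespace-split words; use your embedding
    model's tokenizer for exact counts.
    """
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]

chunks = fixed_size_chunks(open("report.txt").read())  # placeholder file
print(f"{len(chunks)} chunks")
```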
When to use: Baseline implementations, simple documents, rapid prototyping.
2. Recursive Chunking
Recursive chunking improves on fixed-size approaches by respecting natural text boundaries. It attempts to split at progressively finer separators — first at paragraph breaks, then sentences, then words — until chunks fit within the target size.

Recursive Chunking Image by Author
The algorithm tries to keep semantically related content together. If splitting at paragraph boundaries produces chunks within the size limit, it stops there. If paragraphs are too large, it recursively applies sentence-level splitting to oversized chunks only.
This maintains more of the document’s original structure than arbitrary character splitting. Chunks tend to align with natural thought boundaries, improving both retrieval relevance and generation quality.
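If you use LangChain, its RecursiveCharacterTextSplitter implements exactly this fallback behavior. A short sketch, assuming a recent langchain-text-splitters install; the sizes, separator list, and file name are illustrative:

```python
# pip install langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # max characters per chunk
    chunk_overlap=150,     # characters shared between neighboring chunks
    separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarse to fine
)

with open("article.txt") as f:   # placeholder file
    chunks = splitter.split_text(f.read())

print(len(chunks), "chunks")
```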
When to use: General-purpose applications, unstructured text like articles and reports.
3. Semantic Chunking
Rather than relying on characters or structure, semantic chunking uses meaning to determine boundaries. The process embeds individual sentences, compares their semantic similarity, and identifies points where topic shifts occur.

Semantic Chunking Image by Author
Implementation involves computing embeddings for each sentence, measuring distances between consecutive sentence embeddings, and splitting where distance exceeds a threshold. This creates chunks where content coheres around a single topic or concept.
The computational cost is higher. But the result is semantically coherent chunks that often improve retrieval quality for complex documents.
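Here is a minimal sketch of that procedure using sentence-transformers; the model choice, the naive sentence split on ". ", and the 0.3 distance threshold are all illustrative values you would tune:

```python
# pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-embedding model

def semantic_chunks(text, threshold=0.3):
    # Naive sentence split; use a real sentence tokenizer in practice.
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    if not sentences:
        return []
    embeddings = model.encode(sentences, normalize_embeddings=True)

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine distance between consecutive sentences (vectors are normalized).
        distance = 1.0 - float(np.dot(embeddings[i - 1], embeddings[i]))
        if distance > threshold:       # topic shift detected: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```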
When to use: Dense academic papers, technical documentation where topics shift unpredictably.
4. Document-Based Chunking
Documents with explicit structure — Markdown headers, HTML tags, code function definitions — contain natural splitting points. Document-based chunking leverages these structural elements.
For Markdown, split on header levels. For HTML, split on semantic tags like <section> or <article>. For code, split on function or class boundaries. The resulting chunks align with the document’s logical organization, which typically correlates with semantic organization. Here’s an example of document-based chunking:

Document-Based Chunking Image by Author
Libraries like LangChain and LlamaIndex provide specialized splitters for various formats, handling the parsing complexity while letting you focus on chunk size parameters.
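For instance, LangChain's MarkdownHeaderTextSplitter splits on the header levels you specify and attaches those headers as metadata. A sketch with an illustrative header mapping and a toy document:

```python
# pip install langchain-text-splitters
from langchain_text_splitters import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "section"), ("##", "subsection")]
)

markdown_doc = """# Setup
## Installation
Run the installer and verify the binary is on your PATH.
## Configuration
Edit config.yaml to point at your vector database.
"""

for chunk in splitter.split_text(markdown_doc):
    # Each chunk carries the headers it sits under as metadata.
    print(chunk.metadata, "->", chunk.page_content)
```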
When to use: Structured documents with clear hierarchical elements.
5. Late Chunking
Late chunking reverses the usual chunk-then-embed sequence. First, run the entire document through a long-context embedding model to obtain token-level embeddings. Then split the document and derive each chunk's embedding by pooling (for example, averaging) the token-level embeddings that fall within that chunk.
This preserves global context. Each chunk’s embedding reflects not just its own content but its relationship to the broader document. References to earlier concepts, shared terminology, and document-wide themes remain encoded in the embeddings.
The approach requires long-context embedding models capable of processing entire documents, limiting its applicability to reasonably sized documents.
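A rough sketch of the idea with the Hugging Face transformers library: encode the whole document once, keep the per-token embeddings, then mean-pool the tokens that fall inside each chunk's character span. The model name is just one example of a long-context embedding model, the chunk spans are assumed to come from whatever splitter you prefer, and truncation handling is simplified:

```python
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "jinaai/jina-embeddings-v2-base-en"   # illustrative long-context model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, trust_remote_code=True)

def late_chunk_embeddings(document, chunk_spans):
    """chunk_spans: list of (start_char, end_char) boundaries from any splitter."""
    encoded = tokenizer(document, return_tensors="pt",
                        return_offsets_mapping=True, truncation=True)
    offsets = encoded.pop("offset_mapping")[0]          # (num_tokens, 2) char spans
    with torch.no_grad():
        token_embeddings = model(**encoded).last_hidden_state[0]  # (num_tokens, dim)

    chunk_vectors = []
    for start, end in chunk_spans:
        # Pool the tokens whose character span overlaps this chunk.
        mask = (offsets[:, 0] < end) & (offsets[:, 1] > start)
        chunk_vectors.append(token_embeddings[mask].mean(dim=0))
    return chunk_vectors
```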
When to use: Technical documents with significant cross-references, legal texts with internal dependencies.
6. Adaptive Chunking
Adaptive chunking dynamically adjusts chunk parameters based on content characteristics. Dense, information-rich sections receive smaller chunks to maintain granularity. Sparse, contextual sections receive larger chunks to preserve coherence.

Adaptive Chunking Image by Author
The implementation typically uses heuristics or lightweight models to assess content density and adjust chunk size accordingly.
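One way to sketch the heuristic route: score each paragraph's lexical density and shrink the size budget for dense passages. The density proxy, sizes, and threshold below are illustrative, not a standard recipe:

```python
def lexical_density(paragraph):
    # Crude density proxy: share of unique words in the paragraph.
    words = paragraph.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def adaptive_chunks(text, dense_size=300, sparse_size=900, density_threshold=0.6):
    """Group paragraphs into chunks; dense paragraphs get a smaller character budget."""
    chunks, current, budget = [], [], sparse_size
    for paragraph in text.split("\n\n"):
        limit = dense_size if lexical_density(paragraph) > density_threshold else sparse_size
        budget = min(budget, limit)
        if current and sum(len(p) for p in current) + len(paragraph) > budget:
            chunks.append("\n\n".join(current))
            current, budget = [], limit
        current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```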
When to use: Documents with highly variable information density.
7. Hierarchical Chunking
Hierarchical chunking creates multiple granularity levels. Large parent chunks capture broad themes, while smaller child chunks contain specific details. At query time, retrieve coarse chunks first, then drill into fine-grained chunks within relevant parents.
This enables both high-level queries (“What does this document cover?”) and specific queries (“What’s the exact configuration syntax?”) using the same chunked corpus. Implementation requires maintaining relationships between chunk levels and traversing them during retrieval.
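A compact sketch of the data-structure side: cut large parent chunks, cut smaller children inside each parent, and keep a child-to-parent mapping so retrieval can drill down. The sizes are character counts chosen purely for illustration, and the file name is a placeholder:

```python
def window(text, size, overlap=0):
    """Simple character-window splitter used for both levels."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def hierarchical_chunks(text, parent_size=4000, child_size=800):
    """Return parent chunks plus child chunks that remember their parent's index."""
    parents = window(text, parent_size)
    children = [
        {"text": child, "parent_id": pid}
        for pid, parent in enumerate(parents)
        for child in window(parent, child_size, overlap=100)
    ]
    return parents, children

# Retrieval flow: search over child chunks for precision, then hand the matching
# child's parent chunk to the LLM for broader context.
parents, children = hierarchical_chunks(open("manual.txt").read())
hit = children[0]                      # stand-in for the top vector-search result
context = parents[hit["parent_id"]]
```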
When to use: Large technical manuals, textbooks, comprehensive documentation.
8. LLM-Based Chunking
In LLM-based chunking, we use an LLM to determine chunk boundaries and push chunking into intelligent territory. Instead of rules or embeddings, the LLM analyzes the document and decides how to split it based on semantic understanding.

LLM-Based Chunking Image by Author
Approaches include breaking text into atomic propositions, generating summaries for sections, or identifying logical breakpoints. The LLM can also enrich chunks with metadata or contextual descriptions that improve retrieval.
This approach is expensive — requiring LLM calls for every document — but produces highly coherent chunks. For high-stakes applications where retrieval quality justifies the cost, LLM-based chunking often outperforms simpler methods.
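A hedged sketch using the OpenAI Python client: number the sentences, ask the model for the indices where new chunks should begin, and split there. The model name and prompt are illustrative, and the code assumes the reply is valid JSON; production code needs validation and retries:

```python
# pip install openai
import json
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

def llm_chunks(text, model="gpt-4o-mini"):
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    prompt = (
        "The document below is given as numbered sentences. Return only a JSON "
        "list of the sentence indices where a new, self-contained chunk should "
        "begin.\n\n" + numbered
    )
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    indices = json.loads(reply.choices[0].message.content)   # assumes valid JSON
    breakpoints = sorted({i for i in indices if 0 < i < len(sentences)} | {0})
    boundaries = breakpoints + [len(sentences)]
    return [" ".join(sentences[a:b]) for a, b in zip(boundaries, boundaries[1:])]
```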
When to use: Applications where retrieval quality matters more than processing cost.
9. Agentic Chunking
Agentic chunking extends LLM-based approaches by having an agent analyze each document and select the appropriate chunking strategy dynamically. The agent considers document structure, content density, and format to choose between fixed-size, recursive, semantic, or other approaches on a per-document basis.

Agentic Chunking Image by Author
This handles heterogeneous document collections where a single strategy performs poorly. The agent might use document-based chunking for structured reports and semantic chunking for narrative content within the same corpus.
The trade-off is complexity and cost. Each document requires agent analysis before chunking can begin.
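A minimal sketch of the routing idea: a lightweight "agent" inspects each document and dispatches to one of the chunkers you already have. The rules below are cheap heuristics for illustration; in practice the decision is often delegated to an LLM, as in the previous section:

```python
def choose_strategy(document):
    """Pick a chunking strategy from cheap document features (illustrative rules)."""
    if document.lstrip().startswith("#") or "\n## " in document:
        return "document_based"    # Markdown-style headers detected
    if len(document.split()) < 300:
        return "none"              # short enough to index as a single chunk
    return "recursive"             # general-purpose default

def agentic_chunk(document, chunkers):
    """`chunkers` maps strategy names to chunking functions, e.g. the sketches above."""
    strategy = choose_strategy(document)
    return chunkers.get(strategy, chunkers["recursive"])(document)
```

Swapping the heuristic for an LLM call turns this simple router into a true agent that reasons about each document before chunking it.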
When to use: Diverse document collections where optimal strategy varies significantly.
Conclusion
Chunking determines what information your retrieval system can find and what context your LLM receives for generation. Now that you understand the different chunking techniques, how do you select a chunking strategy for your application? You can do so based on your document characteristics:
- Short, standalone documents (FAQs, product descriptions): No chunking needed
- Structured documents (Markdown, HTML, code): Document-based chunking
- Unstructured text (articles, reports): Try recursive or hierarchical chunking if fixed-size chunking doesn’t give good results
- Complex, high-value documents: Semantic, adaptive, or LLM-based chunking
- Heterogeneous collections: Agentic chunking
Also consider your embedding model’s context window and typical query patterns. If users ask specific factual questions, favor smaller chunks for precision. If queries require understanding broader context, use larger chunks.
More importantly, establish metrics and test. Track retrieval precision, answer accuracy, and user satisfaction across different chunking strategies. Use representative queries with known correct answers. Measure whether the correct chunks are retrieved and whether the LLM generates accurate responses from those chunks.
Frameworks like LangChain and LlamaIndex provide pre-built splitters for most strategies. For custom approaches, implement the logic directly to maintain control and minimize dependencies. Happy chunking!
References & Further Learning
- Chunking Techniques with Langchain and LlamaIndex
- Chunking Strategies to Improve Your RAG Performance | Weaviate
- 5 Chunking Strategies For RAG by Avi Chawla
- Finding the Best Chunking Strategy for Accurate AI Responses | NVIDIA Technical Blog
- Semantic Chunking – 3 Methods for Better RAG
- Chunking Strategies for LLM Applications | Pinecone