A better way for coding agents to read files
January 30, 2026
We decreased token usage for ultra-large files by 90%.
When coding agents need to understand a codebase, they typically use a combination of searching and reading files to find the relevant parts. This works fine for small files, but becomes prohibitively expensive for large ones.
Consider a scenario where an agent needs to understand a 5,000-line file to add a new endpoint. Reading the entire file consumes 40,000+ tokens - that’s $0.20 with Claude Opus. But the cost isn’t just financial: you’re also waiting for the model to process all those tokens, adding latency to every response. Worse, as the context window fills with irrelevant code, you get context rot - the model’s attention degrades, and it becomes more likely to miss important details or make mistakes. Over an entire agent trajectory, these problems compound significantly.
The Problem with Line-Based File Reading
Typical agents use start and end line numbers to read files. The tool interface looks like this:
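Something like the following sketch (the parameter names are illustrative; exact schemas vary between agents):

```python
# Illustrative line-based read tool definition (names are representative, not exact).
read_file_tool = {
    "name": "read_file",
    "description": "Read a file, optionally restricted to a range of lines.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read."},
            "start_line": {"type": "integer", "description": "First line to return (1-indexed)."},
            "end_line": {"type": "integer", "description": "Last line to return, inclusive."},
        },
        "required": ["path"],
    },
}
```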
This approach has a fundamental flaw: it forces the LLM to make blind decisions about which parts of a file are relevant before seeing what those sections contain.
Let’s say you have a 3,000-line file and the agent needs to understand the calculate_shipping_cost function. The agent has two bad options:
- Read the entire file - Costs 25,000+ tokens, most of which are irrelevant
- Guess line numbers - Searching for “calculate_shipping_cost” finds the start line, but the end line is unknown.
Here’s what this looks like in practice:
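A representative (made-up) trajectory for the 3,000-line file above, with invented file names and line numbers:

```python
# Hypothetical call sequence; file name and line numbers are invented for illustration.
#
# 1. The agent greps for the symbol and learns only where it starts:
#      grep -n "def calculate_shipping_cost" shipping.py   ->   shipping.py:412
#
# 2. The end line is unknown, so it guesses a range:
#      read_file(path="shipping.py", start_line=412, end_line=460)
#
# 3. The function actually ends around line 540, so a second read is needed:
#      read_file(path="shipping.py", start_line=460, end_line=550)
```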
We found that coding agents either:
- Over-read, which wastes tokens
- Under-read and miss critical context, requiring a second read (which wastes time and makes you pay for two requests to the underlying model)
The initial solution we tried was giving the agent a separate outline tool that returns a file’s structure without the full content. In theory, the agent could peek at the outline first, then read specific sections as needed. The sequence of tool calls would look something like this:
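Roughly the following, where the tool names are illustrative:

```python
# Hypothetical two-step sequence with a separate outline tool.
#
# 1. Peek at the structure first:
#      get_file_outline(path="shipping.py")
#      ->  "class ShippingService [1:600]
#             def calculate_shipping_cost [412:540]
#             ..."
#
# 2. Read only the relevant section:
#      read_file(path="shipping.py", start_line=412, end_line=540)
```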
In practice, agents struggle to use tools like this because they’re trained to read files directly. The agent hasn’t been trained to internally make a “should I peek first?” decision, so it just calls the read tool directly without ever consulting the outline tool.
We tried to fix this with elaborate prompting (instructing the agent to always check the outline first), but this doesn’t work well: it fights the model’s instincts and adds latency to every file read, even for small files that don’t need it. The decision itself, though, is simple: if the file is over a certain token threshold, show the overview; otherwise let the read-file call go through normally.
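That check is only a few lines. A sketch, counting tokens with tiktoken and using an arbitrary threshold:

```python
import tiktoken

TOKEN_THRESHOLD = 10_000  # illustrative cutoff; tune to your context budget

def should_outline(file_text: str) -> bool:
    """Return True when the file is too large to send back in full."""
    encoding = tiktoken.get_encoding("o200k_base")  # OpenAI tokenizer
    return len(encoding.encode(file_text)) > TOKEN_THRESHOLD
```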
Our Solution
The solution that actually works is invisible to the agent: wrap the existing tool so that it directly returns an outline when the file is too large.
From the agent’s perspective, it calls the same read tool it always has, and it gets back something useful. The wrapper handles the complexity:
- If the file is small enough, return the full contents as normal
- If it’s too large, return a structural outline with line numbers. The agent can then request specific line ranges to dig deeper.
This works with the model’s training rather than against it. The agent just reads files, and large files come back as navigable outlines.
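Here’s a minimal sketch of the wrapper, reusing the should_outline check above. generate_outline stands in for whatever symbol extraction you use (e.g. tree-sitter); it isn’t a real library call:

```python
from pathlib import Path

def read_file(path: str, start_line: int | None = None, end_line: int | None = None) -> str:
    """Wrapped read tool: small files return verbatim, large files return an outline."""
    lines = Path(path).read_text().splitlines()

    # Explicit ranges always pass through, so follow-up reads behave exactly as before.
    if start_line is not None:
        return "\n".join(lines[start_line - 1 : end_line or len(lines)])

    text = "\n".join(lines)
    if not should_outline(text):
        return text  # small file: full contents, as the agent expects

    # Large file: return a structural outline with line ranges instead of the body.
    return generate_outline(path)  # stand-in for your symbol extraction
```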
Finding the right preview format
We experimented with several formats to show the LLM. First we collapsed a symbol’s nested children like so:
[...5 nested items, read lines 298-558 for details]
This is a good start, but we found a more token-efficient way to show it. The format below is 9 tokens while the original was 15 (saving approximately 30% over the entire outline):
(5 children) [298:558]
This symbol outline contains classes, functions, and properties with their visibility modifiers and line ranges. This gives the agent exactly what it needs to navigate: the shape of the code and where to look for details.
The full format looks like this:
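For illustration, here’s a made-up outline for a small shipping module (the names and line ranges are invented; the notation is the compact format shown above):

```
def load_rate_tables [1:38]
def calculate_shipping_cost (4 children) [40:95]
def apply_discounts [97:118]
class InvoiceFormatter (2 children) [120:182]
  def format [122:150]
  def _totals [152:180]
```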
The agent can immediately see which function is the meatiest and request lines 40-95 if that’s where it needs to look.
Here’s another example using a very large source file from Zed. The entire file is about 28,500 lines and ~207,000 tokens using the OpenAI tokenizer, while our outline is only 3,694 tokens, a 98.2% saving.
Adaptive depth for very large files
We can optimize the outline generation even further. Instead of always showing the full outline, we cap the depth adaptively based on file size. We start with a greedy approach and only reduce depth when necessary (a code sketch follows the list):
- Generate the full outline with unlimited depth
- Check the size. If it’s under 10,000 tokens, we’re done. If not, regenerate with depth capped at 10
- Keep reducing depth by 1 until it fits, stopping at depth 1 (top-level symbols only)
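In code, the loop looks roughly like this. build_outline(path, max_depth) is a stand-in for the outline generator, and token counts use tiktoken as before:

```python
import tiktoken

OUTLINE_BUDGET = 10_000  # target: keep the outline under ~10,000 tokens

def adaptive_outline(path: str) -> str:
    """Greedy: try the full-depth outline, then cap depth until the result fits."""
    encoding = tiktoken.get_encoding("o200k_base")

    outline = build_outline(path, max_depth=None)  # full depth first
    if len(encoding.encode(outline)) <= OUTLINE_BUDGET:
        return outline

    # Too big: cap at depth 10, then reduce one level at a time down to depth 1.
    for depth in range(10, 0, -1):
        outline = build_outline(path, max_depth=depth)
        if len(encoding.encode(outline)) <= OUTLINE_BUDGET or depth == 1:
            return outline
    return outline
```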
The collapsed-children markers are important: they tell the agent there’s structure it isn’t seeing. Combined with line ranges, the agent can decide whether to request, say, lines 50-150 to drill into a collapsed symbol, or whether the top-level view is enough for its current task.
This approach means small-to-medium files get full structural detail, while massive files still return something useful instead of truncating or erroring out.
The results
After implementing this approach in Sweep Agent, we saw:
- 90% reduction in token usage for files larger than 2,000 lines
- Faster response times - Fewer tokens mean faster generation and lower latency
- Reduced context rot - Smaller, more focused context windows keep the agent’s attention sharp
- Less compaction and truncation needed for large files
Our agent now navigates large files efficiently without any special training. By using the outline, it’s able to be much more surgical in its file reads, identifying the relevant sections and requesting only what it needs. Small files still return in full, while large files are now much more manageable.
If you’re using JetBrains IDEs, you can download our plugin to try this out.