In the rapidly evolving landscape of large language models (LLMs), token efficiency is becoming a serious concern. As developers and researchers keep pushing more structured data into models, the cost and latency tied to token count only grow. That’s where Token-Oriented Object Notation (TOON) comes in (see the project’s GitHub repository). It’s a serialization format built specifically for LLM prompts, aiming to cut down token usage while keeping the data structured and machine-readable. The authors describe TOON as “a compact, deterministic JSON format for LLM prompts,” and their benchmarks show 30–60 percent fewer tokens on large, uniform arrays of objects compared to formatted JSON.

In this post, we introduce TOON, walk through a simple token-count comparison against JSON, look at where the savings come from, and end with a realistic note: TOON isn’t a one-size-fits-all solution.
## What is TOON?
TOON is a serialization format for structured data designed with LLM inputs in mind. It is human-readable, uses minimal syntax (leaning on indentation and compact arrays), and aims to remove the repeated overhead of typical JSON when dealing with large uniform arrays of objects.
Key features include:
- Declaring the length of arrays and the field names once (for tabular data) instead of repeating keys for each object
- Using indentation rather than braces and brackets in many places, which trims syntactic overhead
- Supporting alternate delimiters (tab `\t`, pipe `|`) to further reduce token count when arrays are very large
- Giving clear “when to use / when not to use” guidance in the spec: TOON excels with uniform arrays of primitive-valued objects; for deeply nested, irregular, or non-uniform data, it may not perform as well
In short: if you regularly pass large chunks of tabular or array-structured data to an LLM, TOON offers a compelling alternative to JSON (or YAML) by saving tokens and retaining structure.
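To make the “declare the keys once, then emit compact rows” idea concrete, here is a minimal hand-rolled sketch in Python that produces TOON-style output for a uniform array of flat objects. It is not the official library (the project ships its own encoders); the function name `encode_uniform_array` is illustrative, and the output follows the tabular form described above.

```python
def encode_uniform_array(name, rows, delimiter=","):
    """Illustrative sketch: emit a TOON-style tabular block for a list of
    flat dicts that all share the same keys (not the official encoder)."""
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0].keys())
    # Header declares the array length and the field names exactly once
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each row carries only the values, joined by the delimiter
    lines = [
        "  " + delimiter.join(str(row[field]) for field in fields)
        for row in rows
    ]
    return "\n".join([header] + lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_uniform_array("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user
```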
## A simple token count comparison
Let’s walk through a minimal example to illustrate how token count savings can occur with TOON.
### The JSON baseline
Suppose you have the following JSON data:
```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```
In most LLM systems, every token adds cost and consumes context. Because JSON repeats the property names (“id”, “name”, “role”) for each object, there’s avoidable overhead.
### The TOON equivalent
Using TOON, the same data might be encoded as:
```toon
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```
Notice:
- The array length (`2`) is declared once after `users`
- The field names `id`, `name`, `role` are declared once rather than repeated
- Each row then contains only the values, separated by commas (or another delimiter)
The intuition behind TOON is simple: skip the repeated field names, braces, and quotes, and pass only the data itself to the LLM.
- **JSON** – repeating keys → higher token count
- **TOON** – keys once + compact rows → fewer tokens
The Format Tokenization Exploration tool lets you compare token usage side by side across serialization formats, including CSV, pretty-printed JSON, compressed JSON, YAML, and TOON. As the playground shows, for the same dataset TOON often uses significantly fewer tokens than JSON (sometimes ~30–60% fewer) under the right conditions. Being able to toggle dataset size and complexity makes it immediately clear how format choice directly impacts token cost – a powerful visual for anyone working with LLM-prompt budgets.
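You can get a rough sense of the difference yourself with a tokenizer. The sketch below, which assumes the tiktoken package is installed, counts tokens for the small users example in pretty-printed JSON, minified JSON, and the TOON form shown earlier; the exact numbers depend on the encoding you pick and on your real data.

```python
import json
import tiktoken

users = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
    ]
}

# TOON form of the same data, as shown above
toon_text = (
    "users[2]{id,name,role}:\n"
    "  1,Alice,admin\n"
    "  2,Bob,user"
)

enc = tiktoken.get_encoding("o200k_base")  # encoding used by recent OpenAI models

variants = {
    "pretty JSON": json.dumps(users, indent=2),
    "minified JSON": json.dumps(users, separators=(",", ":")),
    "TOON": toon_text,
}

for label, text in variants.items():
    print(f"{label}: {len(enc.encode(text))} tokens")
```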
Here’s a quick comparison showing how TOON stacks up against other data formats in terms of structure and token usage:
| Format | Example Structure | Approx. Token Count | Relative Size vs. JSON |
|---|---|---|---|
| Pretty-Printed JSON | Human-readable JSON with indentation and repeated keys | 6,360 | 100 % (baseline) |
| Minified JSON | Compact JSON, no spaces or line breaks | 5,420 | ≈ 85 % |
| YAML | Whitespace-based structure still repeats keys | 6,050 | ≈ 95 % |
| CSV | Flat table, minimal structure | 2,360 | ≈ 37 % |
| TOON | Declares fields once, tabular rows below | 2,518 | ≈ 40 % |
The team behind TOON offers not only the specification but also libraries for a range of languages that handle TOON encoding and decoding. The Python library also provides a function to estimate the savings of using TOON instead of JSON. The following code fragment shows an example of its use and the result (we are using the Python version here):
```python
from toon_format import estimate_savings

# Your typical prompt data
prompt_data = {
    "context": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Analyze this data"}
    ],
    "data": [
        {"id": i, "value": f"Item {i}", "score": i * 10}
        for i in range(1, 101)  # 100 items
    ]
}

result = estimate_savings(prompt_data["data"])

# Compare formats
# GPT-5 pricing (example: $0.01 per 1K tokens)
cost_per_1k = 0.01
json_cost = (result['json_tokens'] / 1000) * cost_per_1k
toon_cost = (result['toon_tokens'] / 1000) * cost_per_1k

print(f"JSON: {result['json_tokens']} tokens")
print(f"TOON: {result['toon_tokens']} tokens")
print(f"JSON cost per request: ${json_cost:.4f}")
print(f"TOON cost per request: ${toon_cost:.4f}")
print(f"Savings: {result['savings_percent']:.1f}%")
print(f"Savings per request: ${json_cost - toon_cost:.4f}")
print(f"Savings per 10,000 requests: ${(json_cost - toon_cost) * 10000:.2f}")
```
The results, with the data in the example, are:
```
JSON: 2703 tokens
TOON: 1009 tokens
JSON cost per request: $0.0270
TOON cost per request: $0.0101
Savings: 62.7%
Savings per request: $0.0169
Savings per 10,000 requests: $169.40
```
The code uses an illustrative cost of $0.01 per 1,000 tokens, not a real price. The token figures are estimates too: there is no perfect visibility into, or guarantee about, the final token count before the payload reaches the model. The estimate_savings function relies on the tiktoken library, the tokenizer used by OpenAI’s models; other LLMs may use different tokenizers, so actual savings can vary.
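To see how much the tokenizer itself matters, a quick manual comparison like the sketch below counts the same payloads with two real tiktoken encodings (`cl100k_base` and `o200k_base`). This is not the library’s estimator, just a rough illustration; savings with other vendors’ tokenizers may differ again.

```python
import json
import tiktoken

# Small dataset with the same shape as the example above
data = [{"id": i, "value": f"Item {i}", "score": i * 10} for i in range(1, 11)]

json_text = json.dumps(data, indent=2)
# Hand-built TOON-style tabular block for the same rows
toon_text = "data[10]{id,value,score}:\n" + "\n".join(
    f"  {d['id']},{d['value']},{d['score']}" for d in data
)

for name in ("cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    j, t = len(enc.encode(json_text)), len(enc.encode(toon_text))
    print(f"{name}: JSON={j} tokens, TOON={t} tokens, savings={100 * (1 - t / j):.1f}%")
```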
## Caveats & limitations
While TOON offers meaningful benefits in many scenarios, it’s not a silver bullet. Here are some important caveats to keep in mind:
- Best for uniform arrays of objects – TOON shines when you have many objects with identical fields and primitive values. If your objects vary in keys, or you have deep nesting or mixed types, the tabular assumptions of TOON break down. The intuition is straightforward: with long arrays of similar objects, not repeating the field names for every element is where the savings come from (a quick uniformity check is sketched after this list)
- Flat CSV may be more efficient – For purely tabular data, CSV can slightly outperform TOON. If your data is a flat table and you don’t need the extra structural semantics, CSV might suffice
- Tooling and ecosystem considerations – While TOON has a TypeScript SDK and CLI, many existing pipelines and systems assume JSON. Introducing a new format may add conversion steps, reduce available tooling, and increase complexity
- Readability & familiarity – For humans, JSON remains extremely familiar. While TOON is reasonably readable, developers will need to learn its syntax and conventions (length markers, delimiters, indentation rules), and some may find it less immediately intuitive
- Token savings but not magic – Even with TOON, the overhead of the data may still dominate. The actual savings depend on dataset size, structure uniformity, and the specifics of the LLM’s tokenizer. Always benchmark with your own data
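A quick way to decide whether the tabular form even applies to your payload is to check that every object in the array shares the same keys and carries only primitive values. A minimal sketch (the function name is illustrative, not part of any TOON library):

```python
def is_uniform_tabular(rows):
    """Return True if every dict shares the same keys and all values are
    primitives - the case where TOON's tabular form pays off most."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = set(rows[0].keys())
    primitive = (str, int, float, bool, type(None))
    return all(
        set(r.keys()) == keys and all(isinstance(v, primitive) for v in r.values())
        for r in rows
    )


print(is_uniform_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))  # True
print(is_uniform_tabular([{"id": 1}, {"id": 2, "extra": {"nested": True}}]))       # False
```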
## Real-world usage
Because TOON is a relatively new format, today’s LLMs are far more familiar with conventional structures like JSON, which appear heavily in their training data. As a result, you can’t assume a model will recognize or produce TOON without guidance.
In practice, you need to teach the format by example: show a small TOON snippet, name the format explicitly, and clearly state that the model should use it in its response. The authors emphasize that demonstration works better than explanation – LLMs quickly infer the pattern once they see the header, the field list, and a few aligned rows. After generation, it’s good practice to decode or parse the output to confirm it matches the expected structure before using it downstream.
This simple loop – show, request, verify – tends to be the most reliable way to incorporate TOON into real workflows. For example, before passing real data in the prompt, you can show the LLM a small TOON fragment and explain how to work with it. A fragment like the one below introduces the format and also teaches the model how to present TOON data (notice the toon language tag at the beginning of the fenced block):
```toon
users[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
```
Or you can use a brief TOON explanation before using it:
```
Respond using TOON format (Token-Oriented Object Notation):
- Use key: value for objects
- Use indentation for nesting
- Use [N] to indicate array lengths
- Use the tabular format [N,]{fields}: for uniform arrays

Example:
users[2,]{id,name}:
  1,Alice
  2,Bob
```
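For the “verify” step, you can use the decoders shipped with the TOON libraries, or, for the simple tabular case, a small hand-rolled check like the sketch below. It is illustrative only (the function name and regex are not part of any TOON library), and a real pipeline should prefer the official decoder, which also handles quoting, nesting, and alternate delimiters.

```python
import re

def parse_tabular_toon(text):
    """Parse a single TOON tabular block (header plus comma-separated rows)
    and verify the declared length matches the number of rows. Sketch only."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = re.match(r"^(\w+)\[(\d+),?\]\{([^}]*)\}:$", lines[0].strip())
    if not header:
        raise ValueError(f"Unrecognized TOON header: {lines[0]!r}")
    name, length = header.group(1), int(header.group(2))
    fields = header.group(3).split(",")
    rows = [dict(zip(fields, line.strip().split(","))) for line in lines[1:]]
    if len(rows) != length:
        raise ValueError(f"Declared {length} rows but found {len(rows)}")
    return {name: rows}


reply = """users[2,]{id,name}:
  1,Alice
  2,Bob"""
print(parse_tabular_toon(reply))
# {'users': [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]}
```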
## Conclusion
As LLM usage grows and context windows expand, every token counts – both in terms of cost and performance. TOON offers a compelling format for those feeding structured, repetitive data into models: by declaring field names once, flattening rows, and cutting syntactic overhead, it achieves meaningful reductions in token usage while preserving structure.
That said, the format has a clear sweet spot: uniform, tabular, high-volume data. If your dataset is deeply nested or irregular, or you’re already using simple CSV, the gains may be minimal or even reversed.
If you’re working with LLM prompts that include large arrays of objects and you’re hitting token-budget constraints, it’s worth exploring TOON: try converting a sample dataset, measure token counts with your target tokenizer, compare accuracy and bandwidth, and decide whether the switch is worth it in your context.