Slash Your LLM API Costs by 40% with TOON
If you're building applications with Large Language Models (LLMs), you know the pain of token costs. Whether you're using OpenAI's GPT-4, Anthropic's Claude, or Google's Gemini, every token sent and received adds up.
For data-heavy applications—like RAG systems, analytics dashboards, or e-commerce agents—JSON is often the biggest culprit. It's verbose, repetitive, and token-heavy.
The Cost of JSON
Let's look at a real-world example. Imagine you're feeding a list of e-commerce orders to an LLM for analysis.
JSON Input:

```json
[
  { "id": "ORD-001", "customer": "Alice", "total": 120.50, "status": "shipped" },
  { "id": "ORD-002", "customer": "Bob", "total": 85.00, "status": "pending" },
  { "id": "ORD-003", "customer": "Charlie", "total": 210.25, "status": "delivered" }
]
```
The keys "id", "customer", "total", and "status" are repeated for every single order. Plus, you're paying for all those curly braces {}, brackets [], and quotes "".
TOON Input:

```toon
[3]{id,customer,total,status}:
  ORD-001,Alice,120.5,shipped
  ORD-002,Bob,85,pending
  ORD-003,Charlie,210.25,delivered
```

In TOON, the row count and field names are declared once in the header line. The data follows as compact, CSV-like rows, with no braces, brackets, or repeated keys.
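Before reaching for a library, you can get a feel for the format by emitting the tabular case yourself. Below is a minimal, illustrative Python sketch (the helper name `to_toon_table` is ours, not part of any official TOON library) that converts a uniform list of objects into the tabular layout shown above:

```python
import json

def to_toon_table(rows: list[dict]) -> str:
    # Illustrative sketch: assumes every row has the same keys and that
    # values contain no commas or newlines (cases the real TOON spec
    # handles with quoting rules this sketch glosses over).
    if not rows:
        return "[0]:"
    fields = list(rows[0].keys())
    # Declare the row count and the field names once, in the header.
    header = f"[{len(rows)}]{{{','.join(fields)}}}:"
    # Emit each record as one indented, comma-separated row.
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])

orders = json.loads("""[
  {"id": "ORD-001", "customer": "Alice", "total": 120.50, "status": "shipped"},
  {"id": "ORD-002", "customer": "Bob", "total": 85.00, "status": "pending"},
  {"id": "ORD-003", "customer": "Charlie", "total": 210.25, "status": "delivered"}
]""")

print(to_toon_table(orders))
# [3]{id,customer,total,status}:
#   ORD-001,Alice,120.5,shipped
#   ORD-002,Bob,85.0,pending
#   ORD-003,Charlie,210.25,delivered
```

Note that Python's float formatting prints `85.00` as `85.0`; a real TOON encoder also applies number canonicalization and quoting rules that this sketch omits.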
Benchmark Results
We ran benchmarks comparing TOON against standard JSON, YAML, and XML across various data structures. Here are the results using the GPT-4 tokenizer:
1. Mixed-Structure Track
Datasets with nested or semi-uniform structures.
| Format | Token Count | Comparison to JSON |
|---|---|---|
| TOON | 226,613 | -21.8% |
| JSON | 289,901 | Baseline |
| YAML | 239,958 | -17.2% |
| XML | 328,191 | +13.2% |
2. E-commerce Orders (Nested)
Real-world order data with line items.
| Format | Token Count | Savings vs JSON |
|---|---|---|
| TOON | 72,771 | 33.1% |
| JSON | 108,806 | - |
3. Deeply Nested Configuration
Complex config files with low uniformity.
| Format | Token Count | Savings vs JSON |
|---|---|---|
| TOON | 631 | 31.3% |
| JSON | 919 | - |
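You can sanity-check numbers like these on your own payloads without calling any API. Here is a small sketch using OpenAI's tiktoken package to count tokens for the two order snippets from earlier:

```python
import tiktoken

# Tokenizer used for GPT-4-family models.
enc = tiktoken.encoding_for_model("gpt-4")

json_text = """[
  {"id": "ORD-001", "customer": "Alice", "total": 120.50, "status": "shipped"},
  {"id": "ORD-002", "customer": "Bob", "total": 85.00, "status": "pending"},
  {"id": "ORD-003", "customer": "Charlie", "total": 210.25, "status": "delivered"}
]"""

toon_text = """[3]{id,customer,total,status}:
  ORD-001,Alice,120.5,shipped
  ORD-002,Bob,85,pending
  ORD-003,Charlie,210.25,delivered"""

json_tokens = len(enc.encode(json_text))
toon_tokens = len(enc.encode(toon_text))

print(f"JSON: {json_tokens} tokens")
print(f"TOON: {toon_tokens} tokens")
print(f"Savings: {1 - toon_tokens / json_tokens:.1%}")
```

Run this against a representative sample of your real payloads before committing; savings vary with how uniform and how deeply nested your data is.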
Why It Matters
1. Lower API Bills
Input tokens are billed linearly, so a 30-40% reduction in input tokens translates directly into a 30-40% reduction in your input costs. For high-volume applications, that can mean thousands of dollars a month, as the rough example below shows.
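A back-of-the-envelope calculation, where every figure is an assumption for illustration rather than any provider's actual pricing:

```python
# Hypothetical workload: all figures below are illustrative assumptions.
requests_per_day = 100_000
input_tokens_per_request = 2_000
price_per_million_input_tokens = 2.50  # assumed USD price, not a real quote

monthly_tokens = requests_per_day * input_tokens_per_request * 30
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_input_tokens

savings_rate = 0.33  # roughly the e-commerce benchmark above
print(f"Baseline input cost: ${monthly_cost:,.0f}/month")   # $15,000/month
print(f"TOON saves: ${monthly_cost * savings_rate:,.0f}/month")  # $4,950/month
```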
2. Faster Latency
Fewer tokens mean less processing time for the LLM. This results in lower Time To First Token (TTFT) and faster overall generation, improving the user experience.
3. Larger Context Window
By compressing your data, you can fit more information into the model's context window. This is crucial for RAG applications where you want to retrieve as many relevant chunks as possible.
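The same arithmetic applies to context budgets. A quick sketch, assuming a 100K-token budget for retrieved chunks, ~500-token JSON chunks, and that the chunks are structured records that see the full ~33% savings:

```python
# Assumed figures, for illustration only.
context_budget = 100_000   # tokens available for retrieved chunks
json_chunk = 500           # average chunk size serialized as JSON
toon_chunk = int(json_chunk * (1 - 0.33))  # ~33% smaller in TOON

print(context_budget // json_chunk)  # 200 chunks fit as JSON
print(context_budget // toon_chunk)  # 298 chunks fit as TOON
```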
Conclusion
Switching to TOON is one of the easiest optimizations you can make for your LLM pipeline. It requires no model fine-tuning, no complex compression algorithms, and maintains human readability.
Start converting your data today and see how much you can save.