Slash Your LLM API Costs by 40% with TOON

If you're building applications with Large Language Models (LLMs), you know the pain of token costs. Whether you're using OpenAI's GPT-4, Anthropic's Claude, or Google's Gemini, every token sent and received adds up.

For data-heavy applications—like RAG systems, analytics dashboards, or e-commerce agents—JSON is often the biggest culprit. It's verbose, repetitive, and token-heavy.

The Cost of JSON

Let's look at a real-world example. Imagine you're feeding a list of e-commerce orders to an LLM for analysis.

JSON Input:

[
  { "id": "ORD-001", "customer": "Alice", "total": 120.50, "status": "shipped" },
  { "id": "ORD-002", "customer": "Bob", "total": 85.00, "status": "pending" },
  { "id": "ORD-003", "customer": "Charlie", "total": 210.25, "status": "delivered" }
]

The keys "id", "customer", "total", and "status" are repeated for every single order. Plus, you're paying for all those curly braces {}, brackets [], and quotes "".

TOON Input:

[3]{id,customer,total,status}:
  ORD-001,Alice,120.50,shipped
  ORD-002,Bob,85.00,pending
  ORD-003,Charlie,210.25,delivered

In TOON, the schema is defined once in the header. The data follows in a compact, CSV-like format.
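To make this concrete, here is a minimal sketch of a TOON-style encoder for a uniform list of flat records. The function name `to_toon_rows` is hypothetical, and the sketch covers only the tabular form (a `[N]{fields}:` header line followed by comma-separated rows); a real TOON library also handles nesting, quoting, and escaping:

```python
import json

def to_toon_rows(records):
    """Encode a uniform list of flat dicts in TOON's tabular form:
    a `[N]{field,...}:` header, then one comma-separated row per record.
    (Minimal sketch -- no quoting, escaping, or nested structures.)"""
    fields = list(records[0].keys())
    header = f"[{len(records)}]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([header] + rows)

orders = [
    {"id": "ORD-001", "customer": "Alice", "total": "120.50", "status": "shipped"},
    {"id": "ORD-002", "customer": "Bob", "total": "85.00", "status": "pending"},
    {"id": "ORD-003", "customer": "Charlie", "total": "210.25", "status": "delivered"},
]

toon = to_toon_rows(orders)
print(toon)
# Character count is a rough proxy for token count:
print(f"JSON: {len(json.dumps(orders))} chars, TOON: {len(toon)} chars")
```

Even on three rows, dropping the repeated keys and punctuation shrinks the payload noticeably; the gap widens as the row count grows, since the header cost is paid only once.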

Benchmark Results

We ran benchmarks comparing TOON against standard JSON, YAML, and XML across various data structures. Here are the results using the GPT-4 tokenizer:

1. Mixed-Structure Track

Datasets with nested or semi-uniform structures.

Format   Token Count   Comparison to JSON
TOON     226,613       -21.8%
JSON     289,901       Baseline
YAML     239,958       -17.2%
XML      328,191       +13.2%

2. E-commerce Orders (Nested)

Real-world order data with line items.

Format   Token Count   Savings vs JSON
TOON     72,771        33.1%
JSON     108,806       -

3. Deeply Nested Configuration

Complex config files with low uniformity.

Format   Token Count   Savings vs JSON
TOON     631           31.3%
JSON     919           -

Why It Matters

1. Lower API Bills

A 20-40% reduction in input tokens (in line with the benchmarks above) translates directly into the same reduction in the input portion of your monthly API bill. For high-volume applications, this can save thousands of dollars.
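A quick back-of-the-envelope check makes the scale clear. Every figure below is an assumption, not a quote; substitute your own provider pricing and monthly volume:

```python
# All numbers here are hypothetical -- plug in your own pricing and volume.
PRICE_PER_M_INPUT = 2.50          # USD per 1M input tokens (assumed rate)
TOKENS_PER_MONTH = 2_000_000_000  # input tokens sent per month (assumed volume)
SAVINGS_RATE = 0.33               # ~33%, per the e-commerce benchmark

baseline_cost = TOKENS_PER_MONTH / 1_000_000 * PRICE_PER_M_INPUT
monthly_savings = baseline_cost * SAVINGS_RATE
print(f"Baseline input cost: ${baseline_cost:,.2f}/month")
print(f"Saved with TOON:     ${monthly_savings:,.2f}/month")
```

At these assumed numbers, a $5,000/month input bill drops by roughly $1,650/month, with no change to the model or prompts.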

2. Faster Latency

Fewer tokens mean less processing time for the LLM. This results in lower Time To First Token (TTFT) and faster overall generation, improving the user experience.

3. Larger Context Window

By compressing your data, you can fit more information into the model's context window. This is crucial for RAG applications where you want to retrieve as many relevant chunks as possible.
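The same arithmetic applies to context budgeting. A rough sketch, assuming (purely for illustration) a 100K-token context budget, ~500 tokens per retrieved chunk as JSON, and a ~30% reduction from TOON:

```python
CONTEXT_BUDGET = 100_000      # tokens reserved for retrieved context (assumed)
JSON_TOKENS_PER_CHUNK = 500   # average chunk size serialized as JSON (assumed)
TOON_TOKENS_PER_CHUNK = 350   # same chunk ~30% smaller as TOON (assumed)

chunks_as_json = CONTEXT_BUDGET // JSON_TOKENS_PER_CHUNK
chunks_as_toon = CONTEXT_BUDGET // TOON_TOKENS_PER_CHUNK
print(chunks_as_json)  # 200 chunks fit as JSON
print(chunks_as_toon)  # 285 chunks fit as TOON
```

Under these assumptions, the same context budget holds roughly 40% more retrieved chunks, which can directly improve recall in a RAG pipeline.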

Conclusion

Switching to TOON is one of the easiest optimizations you can make for your LLM pipeline. It requires no model fine-tuning, no complex compression algorithms, and maintains human readability.

Start converting your data today and see how much you can save.