Skip to content

Benchmarks

Real-world performance data demonstrating AGON's adaptive format selection and token savings.


Overview

These benchmarks measure token counts across 7 real-world datasets using tiktoken's o200k_base encoding (GPT-4, GPT-4 Turbo, GPT-4o). All results are reproducible—run make benchmarks to verify.


Benchmark Datasets

Dataset Size Description Characteristics
toon.json 0.6 KB Hiking records with nested context Uniform array (3 records, 6 fields), mixed nesting
scars.json 9.8 KB Error tracking data Mixed structure, heterogeneous fields
128KB.json 249 KB Large structured data (788 employee records) Many nested arrays, wide tables
historical.json 127 KB Historical OHLCV data Repeated {time, value} pattern (struct candidate)
chart.json 196 KB 1,256 candles Deep nesting, array-heavy, metadata objects
quote.json 283 KB Single quote (nested) Complex nested structure with 20+ fields
gainers.json 257 KB 100 complex quotes Complex irregular nested objects (20+ fields each)

Results Summary

Dataset Pretty JSON Compact JSON AGONRows AGONColumns AGONStruct Auto Selected Savings
toon.json 229 139 96 108 144 rows (96) +58.1%
scars.json 2,600 2,144 2,225 2,230 2,448 json (2,144) +17.5%
128KB.json 77,346 62,378 54,622 54,292 59,926 rows (54,622) +29.4%
historical.json 84,094 55,228 70,286 70,286 48,969 struct (48,969) +41.8%
chart.json 101,767 71,623 51,541 51,558 65,364 rows (51,541) +49.4%
quote.json 128,981 85,956 67,251 65,586 69,053 columns (65,586) +49.2%
gainers.json 142,791 91,634 113,132 113,132 89,012 struct (89,012) +37.7%

Safety Net Demonstrated

scars.json shows auto mode's safety guarantee in action:

  • All AGON formats produce worse or marginal results compared to compact JSON
  • Auto mode correctly fell back to JSON, avoiding regression
  • Auto selection uses the compact-JSON baseline for min_savings gating (see AGON.encode)

gainers.json demonstrates adaptive format selection:

  • Rows/Columns formats made token counts worse than compact JSON (113K vs 91K)
  • Auto mode selected Struct format (89,012 tokens), achieving 37.7% savings vs pretty JSON

Performance

AGON's core encoding engine is built in Rust and exposed to Python via PyO3, delivering exceptional performance even on large datasets.

Encode Times

Time to encode data to each format (in milliseconds):

Dataset Size Records JSON Rows Columns Struct Auto (selected)
toon.json 0.6 KB 1 0.00 ms 0.10 ms 0.09 ms 0.14 ms 0.40 ms (rows)
scars.json 9.8 KB 1 0.01 ms 0.56 ms 0.51 ms 0.64 ms 1.65 ms (json)
128KB.json 249 KB 788 0.16 ms 16.82 ms 14.10 ms 19.49 ms 27.94 ms (rows)
historical.json 127 KB 1 1.05 ms 20.72 ms 21.09 ms 31.90 ms 36.22 ms (struct)
chart.json 196 KB 1,256 0.50 ms 26.46 ms 25.27 ms 35.97 ms 36.55 ms (rows)
quote.json 283 KB 1 0.62 ms 47.15 ms 42.86 ms 67.44 ms 63.21 ms (columns)
gainers.json 257 KB 100 0.72 ms 47.46 ms 42.46 ms 62.38 ms 71.10 ms (struct)

Decode Times

Time to decode data from each format back to Python objects (in milliseconds):

Dataset Size Records JSON Rows Columns Struct Auto (selected)
toon.json 0.6 KB 1 0.01 ms 0.30 ms 0.12 ms 0.29 ms 0.48 ms (rows)
scars.json 9.8 KB 1 0.05 ms 3.26 ms 0.76 ms 3.20 ms 0.11 ms (json)
128KB.json 249 KB 788 0.91 ms 22.68 ms 17.28 ms 60.26 ms 19.91 ms (rows)
historical.json 127 KB 1 2.50 ms 131.49 ms 30.78 ms 68.84 ms 68.35 ms (struct)
chart.json 196 KB 1,256 1.30 ms 33.20 ms 31.50 ms 57.79 ms 33.39 ms (rows)
quote.json 283 KB 1 1.91 ms 92.92 ms 52.45 ms 102.22 ms 45.21 ms (columns)
gainers.json 257 KB 100 2.06 ms 241.39 ms 68.67 ms 139.56 ms 141.88 ms (struct)

Rust + PyO3 Architecture

AGON's performance comes from its Rust core with zero-copy PyO3 bindings:

  • Parallel encoding: Uses rayon for concurrent format evaluation in auto mode
  • Fast tokenization: Rust implementation of tiktoken for accurate token counting
  • Memory efficient: Minimal allocations, string operations optimized
  • Native speed: Compiled Rust code with Python convenience
# Behind the scenes, this Rust code runs:
# - Parallel format encoding with rayon
# - Fast JSON parsing with serde_json
# - Efficient string building with zero allocations
result = AGON.encode(large_dataset, format="auto")

Savings


Running Benchmarks

Reproduce these results locally:

# Run all benchmarks with verbose output
uv run pytest tests/test_benchmarks.py -v

# Run benchmarks for specific dataset
uv run pytest tests/test_benchmarks.py::test_benchmark_toon -v

Methodology

Token Counting

All token counts use tiktoken library with o200k_base encoding:

import tiktoken

encoding = tiktoken.get_encoding("o200k_base")
tokens = len(encoding.encode(text))

This encoding is used by:

  • GPT-4 (all variants)
  • GPT-4 Turbo
  • GPT-4o

Baseline Comparison

Pretty JSON: json.dumps(data, indent=2)

  • Standard 2-space indentation
  • Newlines after each field
  • Human-readable, not optimized

Compact JSON: json.dumps(data, separators=(',', ':'))

  • No whitespace
  • Minimal formatting
  • Primary baseline for AGON min_savings comparison

Format Testing

Each dataset tested with all formats:

  1. AGONRows: Row-based tabular encoding
  2. AGONColumns: Columnar transpose encoding
  3. AGONStruct: Template-based encoding
  4. Auto mode: Selects best of above or falls back to JSON

Savings Calculation

savings_percent = ((baseline - agon) / baseline) * 100
  • Positive %: AGON saved tokens (better)
  • Negative %: AGON used more tokens (worse—triggers JSON fallback)

Next Steps

JSON Fallback

View how JSON is used as a safety net

AGONRows Format

Learn about the most common format

API Reference

Complete API documentation

Core Concepts

Design principles and adaptive approach