Benchmarks¶

Real-world performance data demonstrating AGON's adaptive format selection and token savings.

Overview¶

These benchmarks measure token counts across 7 real-world datasets using tiktoken's o200k_base encoding (the encoding used by GPT-4o). All results are reproducible: run make benchmarks to verify.

Benchmark Datasets¶

Dataset	Size	Description	Characteristics
toon.json	0.6 KB	Hiking records with nested context	Uniform array (3 records, 6 fields), mixed nesting
scars.json	9.8 KB	Error tracking data	Mixed structure, heterogeneous fields
128KB.json	249 KB	Large structured data (788 employee records)	Many nested arrays, wide tables
historical.json	127 KB	Historical OHLCV data	Repeated `{time, value}` pattern (struct candidate)
chart.json	196 KB	1,256 candles	Deep nesting, array-heavy, metadata objects
quote.json	283 KB	Single quote (nested)	Complex nested structure with 20+ fields
gainers.json	257 KB	100 complex quotes	Complex irregular nested objects (20+ fields each)

Results Summary¶

Dataset	Pretty JSON	Compact JSON	AGONRows	AGONColumns	AGONStruct	Auto Selected	Savings (vs. Pretty JSON)
toon.json	229	139	96	108	144	rows (96)	+58.1%
scars.json	2,600	2,144	2,225	2,230	2,448	json (2,144)	+17.5%
128KB.json	77,346	62,378	54,622	54,292	59,926	rows (54,622)	+29.4%
historical.json	84,094	55,228	70,286	70,286	48,969	struct (48,969)	+41.8%
chart.json	101,767	71,623	51,541	51,558	65,364	rows (51,541)	+49.4%
quote.json	128,981	85,956	67,251	65,586	69,053	columns (65,586)	+49.2%
gainers.json	142,791	91,634	113,132	113,132	89,012	struct (89,012)	+37.7%

Safety Net Demonstrated

scars.json shows auto mode's safety guarantee in action:

All three AGON formats use more tokens than compact JSON (2,225, 2,230, and 2,448 vs 2,144)
Auto mode correctly fell back to JSON, avoiding regression
Auto selection uses the compact-JSON baseline for min_savings gating (see AGON.encode)

gainers.json demonstrates adaptive format selection:

Rows and Columns formats both used more tokens than compact JSON (113,132 vs 91,634)
Auto mode selected Struct format (89,012 tokens), achieving 37.7% savings vs pretty JSON and 2.9% vs compact JSON
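
The two cases above can be sketched as a small selection routine. This is a minimal illustration, not AGON's actual selector: the function name `select_format`, the dictionary layout, and the default `min_savings=0.0` are all assumptions made for the example. It picks the AGON format with the fewest tokens and falls back to JSON when that format does not beat the compact-JSON baseline by the required margin. The real selector may apply additional rules or tie-breaks not captured here.

```python
def select_format(token_counts, min_savings=0.0):
    """Pick the cheapest AGON format, or fall back to compact JSON.

    token_counts maps a format name to its token count and must include
    "json" (the compact-JSON baseline). min_savings is a fraction, so
    0.05 means 5%. Hypothetical helper, not part of the AGON API.
    """
    baseline = token_counts["json"]
    candidates = {name: n for name, n in token_counts.items() if name != "json"}
    best = min(candidates, key=candidates.get)
    savings = (baseline - candidates[best]) / baseline
    # Fall back when the best candidate is no better than the baseline,
    # or when it misses the required savings margin.
    if savings <= 0 or savings < min_savings:
        return "json"
    return best


# Token counts taken from the Results Summary table.
scars = {"json": 2144, "rows": 2225, "columns": 2230, "struct": 2448}
gainers = {"json": 91634, "rows": 113132, "columns": 113132, "struct": 89012}

print(select_format(scars))    # json
print(select_format(gainers))  # struct
```

With a stricter threshold such as `min_savings=0.05`, the same sketch would return JSON for gainers.json, since Struct saves only about 2.9% against compact JSON.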

Performance¶

AGON's core encoding engine is built in Rust and exposed to Python via PyO3. The tables below report encode and decode times for each format alongside the standard JSON baseline.

Encode Times¶

Time to encode data to each format (in milliseconds):

Dataset	Size	Records	JSON	Rows	Columns	Struct	Auto (selected)
toon.json	0.6 KB	1	0.00 ms	0.10 ms	0.09 ms	0.14 ms	0.40 ms (rows)
scars.json	9.8 KB	1	0.01 ms	0.56 ms	0.51 ms	0.64 ms	1.65 ms (json)
128KB.json	249 KB	788	0.16 ms	16.82 ms	14.10 ms	19.49 ms	27.94 ms (rows)
historical.json	127 KB	1	1.05 ms	20.72 ms	21.09 ms	31.90 ms	36.22 ms (struct)
chart.json	196 KB	1,256	0.50 ms	26.46 ms	25.27 ms	35.97 ms	36.55 ms (rows)
quote.json	283 KB	1	0.62 ms	47.15 ms	42.86 ms	67.44 ms	63.21 ms (columns)
gainers.json	257 KB	100	0.72 ms	47.46 ms	42.46 ms	62.38 ms	71.10 ms (struct)
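
For readers who want to gather comparable timings on their own data, the sketch below shows one common approach: take the fastest of several `time.perf_counter` runs. The benchmark suite's actual harness is not shown on this page, so the helper name `best_time_ms`, the repeat count, and the sample payload are assumptions for illustration. `json.dumps` stands in for the encoder being measured.

```python
import json
import time


def best_time_ms(fn, *args, repeats=5):
    """Return the fastest of `repeats` wall-clock runs of fn(*args), in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


# Hypothetical payload shaped like the {time, value} records in historical.json.
payload = [{"time": i, "value": i * 0.5} for i in range(1000)]
elapsed = best_time_ms(json.dumps, payload)
print(f"json.dumps: {elapsed:.2f} ms")
```

Taking the minimum rather than the mean reduces noise from other processes, at the cost of hiding variance.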

Decode Times¶

Time to decode data from each format back to Python objects (in milliseconds):

Dataset	Size	Records	JSON	Rows	Columns	Struct	Auto (selected)
toon.json	0.6 KB	1	0.01 ms	0.30 ms	0.12 ms	0.29 ms	0.48 ms (rows)
scars.json	9.8 KB	1	0.05 ms	3.26 ms	0.76 ms	3.20 ms	0.11 ms (json)
128KB.json	249 KB	788	0.91 ms	22.68 ms	17.28 ms	60.26 ms	19.91 ms (rows)
historical.json	127 KB	1	2.50 ms	131.49 ms	30.78 ms	68.84 ms	68.35 ms (struct)
chart.json	196 KB	1,256	1.30 ms	33.20 ms	31.50 ms	57.79 ms	33.39 ms (rows)
quote.json	283 KB	1	1.91 ms	92.92 ms	52.45 ms	102.22 ms	45.21 ms (columns)
gainers.json	257 KB	100	2.06 ms	241.39 ms	68.67 ms	139.56 ms	141.88 ms (struct)

Rust + PyO3 Architecture¶

AGON's performance comes from its Rust core with zero-copy PyO3 bindings:

Parallel encoding: Uses rayon for concurrent format evaluation in auto mode
Fast tokenization: Rust implementation of tiktoken for accurate token counting
Memory efficient: Minimal allocations, string operations optimized
Native speed: Compiled Rust code with Python convenience

# Behind the scenes, this Rust code runs:
# - Parallel format encoding with rayon
# - Fast JSON parsing with serde_json
# - Efficient string building with minimal allocations
result = AGON.encode(large_dataset, format="auto")

Running Benchmarks¶

Reproduce these results locally:

# Run all benchmarks with verbose output
uv run pytest tests/test_benchmarks.py -v

# Run benchmarks for specific dataset
uv run pytest tests/test_benchmarks.py::test_benchmark_toon -v

Methodology¶

Token Counting¶

All token counts use the tiktoken library with the o200k_base encoding:

import tiktoken

text = '{"id":1,"name":"example"}'  # any encoded payload (JSON or AGON output)
encoding = tiktoken.get_encoding("o200k_base")
tokens = len(encoding.encode(text))

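This encoding is used by:

GPT-4o
GPT-4o mini

GPT-4 and GPT-4 Turbo use the older cl100k_base encoding, so token counts for those models will differ from the figures reported here.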

Baseline Comparison¶

Pretty JSON: json.dumps(data, indent=2)

Standard 2-space indentation
Newlines after each field
Human-readable, not optimized

Compact JSON: json.dumps(data, separators=(',', ':'))

No whitespace
Minimal formatting
Primary baseline for AGON min_savings comparison
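
A short sketch of the two baselines side by side. The sample data is made up for illustration, and character length is printed only as a rough proxy; the benchmarks themselves compare token counts, not characters.

```python
import json

# Hypothetical sample data, not one of the benchmark datasets.
data = {"records": [{"id": 1, "name": "Ridge Trail"}, {"id": 2, "name": "Lake Loop"}]}

pretty = json.dumps(data, indent=2)
compact = json.dumps(data, separators=(",", ":"))

# Both strings decode to the same object; only the whitespace differs.
assert json.loads(pretty) == json.loads(compact) == data
print(len(pretty), len(compact))
```

Because the two forms carry identical data, any token difference between them comes purely from formatting, which is why compact JSON is the stricter baseline.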

Format Testing¶

Each dataset is tested with all formats:

AGONRows: Row-based tabular encoding
AGONColumns: Columnar transpose encoding
AGONStruct: Template-based encoding
Auto mode: Selects best of above or falls back to JSON

Savings Calculation¶

savings_percent = ((baseline - agon) / baseline) * 100

Positive %: AGON saved tokens (better)
Negative %: AGON used more tokens (worse; auto mode falls back to JSON)
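
As a worked example, the formula can be applied to figures from the Results Summary table. The function name `savings_percent` mirrors the variable above and is used here only for illustration.

```python
def savings_percent(baseline, agon):
    """Percentage of tokens saved relative to the baseline (negative means AGON used more)."""
    return (baseline - agon) / baseline * 100


# toon.json: pretty JSON (229 tokens) vs. auto-selected rows (96 tokens)
print(round(savings_percent(229, 96), 1))     # 58.1
# scars.json: compact JSON (2,144 tokens) vs. AGONRows (2,225 tokens)
print(round(savings_percent(2144, 2225), 1))  # -3.8
```

The first result matches the Savings column for toon.json, which is measured against pretty JSON. The second is negative, which is the situation in which auto mode keeps compact JSON.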

Next Steps¶

JSON Fallback ¶

View how JSON is used as a safety net

AGONRows Format ¶

Learn about the most common format

API Reference ¶

Complete API documentation

Core Concepts ¶

Design principles and adaptive approach