AGONColumns Format¶

Columnar transpose encoding for wide tables—optimized for 10+ fields per record.

Overview¶

AGONColumns is a columnar encoding format that transposes data to group values by field (column) instead of by record (row). This type clustering improves token efficiency for wide tables and numeric-heavy datasets.

Best for:

Wide tables (10+ fields per record)
Employee records, financial data, analytics tables
Numeric-heavy or homogeneous types per column
Sparse data with many missing values

Token efficiency: 50-70% savings vs pretty JSON, 20-40% vs compact JSON

Basic Example¶

Let's encode a simple employee table with 12 fields:

Input (JSON):

[
  {
    "id": 1, "name": "Alice", "email": "alice@example.com",
    "age": 28, "city": "NYC", "state": "NY", "zip": "10001",
    "phone": "555-0001", "dept": "Eng", "title": "SWE",
    "salary": 120000, "start_date": "2020-01-15"
  },
  {
    "id": 2, "name": "Bob", "email": "bob@example.com",
    "age": 32, "city": "SF", "state": "CA", "zip": "94105",
    "phone": "555-0002", "dept": "Sales", "title": "Manager",
    "salary": 135000, "start_date": "2019-03-20"
  },
  {
    "id": 3, "name": "Charlie", "email": "charlie@example.com",
    "age": 25, "city": "Austin", "state": "TX", "zip": "78701",
    "phone": "555-0003", "dept": "Eng", "title": "SWE",
    "salary": 115000, "start_date": "2021-07-10"
  }
]

Output (AGONColumns):

[3]
├ id: 1 2   3
├ name: Alice   Bob Charlie
├ email: alice@example.com  bob@example.com charlie@example.com
├ age: 28   32  25
├ city: NYC SF  Austin
├ state: NY CA  TX
├ zip: "10001"  "94105" "78701"
├ phone: 555-0001   555-0002    555-0003
├ dept: Eng Sales   Eng
├ title: SWE    Manager SWE
├ salary: 120000    135000  115000
└ start_date: 2020-01-15    2019-03-20  2021-07-10

Format elements:

[3] - Array length declaration
├ / └ - Tree characters (branch / last branch)
Field names followed by tab-delimited values
Last field uses └ to indicate end of structure

Token breakdown:

Format	Tokens	Savings vs pretty JSON
Pretty JSON	309	baseline
Compact JSON	190	+38.5%
AGONRows	137	+55.7%
AGONColumns	158	+48.9%

Why columns help: grouping by field (all IDs together, all names together) clusters values of the same type. At 12 fields this example is borderline, and AGONRows is still smaller here (137 vs 158 tokens) because of the tree overhead. For wider tables (20+ fields), the columnar advantage increases.
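The Savings column is measured against the pretty-JSON baseline. As a quick check of the arithmetic, the sketch below recomputes the percentages from the token counts in the table above; the helper name `savings_pct` is illustrative and not part of the library.

```python
def savings_pct(tokens: int, baseline: int) -> float:
    """Percentage of tokens saved relative to a baseline count."""
    return round(100 * (baseline - tokens) / baseline, 1)


# Token counts copied from the table above
PRETTY_JSON = 309
counts = {"Compact JSON": 190, "AGONRows": 137, "AGONColumns": 158}

for name, tokens in counts.items():
    print(f"{name}: +{savings_pct(tokens, PRETTY_JSON)}%")
# Compact JSON: +38.5%
# AGONRows: +55.7%
# AGONColumns: +48.9%
```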

Format Specification¶

Syntax¶

[count]
├ field1: val1  val2    val3
├ field2: val1  val2    val3
└ fieldN: val1  val2    val3

Components:

Array length: [N] where N is the number of records
Tree structure: ├ for fields with siblings, └ for last field
Field lines: field: value1<delimiter>value2<delimiter>...
Delimiters: Tab character (\t) by default
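To make the syntax concrete, here is a minimal pure-Python sketch of the transpose step. It is not the library's implementation: the function name `encode_columns` is hypothetical, it assumes flat records whose values are plain strings or numbers, and it skips quoting and null handling entirely.

```python
def encode_columns(records, delimiter="\t", use_ascii=False):
    """Sketch: transpose a list of flat dicts into one line per field."""
    branch, last = ("|", "`") if use_ascii else ("├", "└")

    # Collect field names in first-seen order across all records
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)

    lines = [f"[{len(records)}]"]
    for i, field in enumerate(fields):
        marker = last if i == len(fields) - 1 else branch
        # A record without this field contributes an empty cell
        cells = [str(record.get(field, "")) for record in records]
        lines.append(f"{marker} {field}: {delimiter.join(cells)}")
    return "\n".join(lines)


records = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"},
]
print(encode_columns(records, delimiter=", "))
# [3]
# ├ id: 1, 2, 3
# └ name: Alice, Bob, Charlie
```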

Tree Characters¶

Unicode (default):

├ (U+251C) - Branch: has more siblings below
└ (U+2514) - Last branch: final field

ASCII mode:

from agon import AGONColumns

# Use ASCII tree characters for compatibility
encoded = AGONColumns.encode(data, use_ascii=True)
# Output (values stay tab-delimited unless a delimiter is passed):
# [3]
# | id: 1 2   3
# | name: Alice   Bob Charlie
# ` email: ...

Delimiters¶

Default: Tab character (\t)

Custom delimiter:

from agon import AGONColumns

# Use comma-space delimiter
encoded = AGONColumns.encode(data, delimiter=", ")

# Output:
# [3]
# ├ id: 1, 2, 3
# ├ name: Alice, Bob, Charlie
# └ email: alice@example.com, bob@example.com, charlie@example.com

Encoding Rules¶

Primitives¶

AGONColumns infers types from content—no type markers needed:

Type	Example Input	Encoded Output
String	`"Alice"`	`Alice`
Integer	`42`	`42`
Float	`3.14`	`3.14`
Boolean	`true`	`true`
Null	`null`	`null`
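The table above can be read as a small value-to-text mapping. The sketch below shows one way to express it in Python; `format_primitive` is a hypothetical helper, and rendering `False` as `false` is an assumption by analogy with `true`, since the table lists only `true`.

```python
def format_primitive(value):
    """Sketch: render one Python value as an unquoted cell."""
    if value is None:
        return "null"
    # Check bool before numbers: bool is a subclass of int in Python
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


for value in ["Alice", 42, 3.14, True, None]:
    print(repr(value), "->", format_primitive(value))
```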

Missing Values¶

Missing values in columns → Empty cell (consecutive delimiters):

[3]
├ id: 1 2   3
├ name: Alice   Bob Charlie
└ email: alice@example.com      charlie@example.com

Row 2 (Bob) has missing email field—shown by consecutive tabs.

Important distinction:

Empty cell (nothing between two delimiters): field is absent from the object
Literal null: field is present with value None

data = [
    {"id": 1, "name": "Alice"},           # no email field
    {"id": 2, "name": "Bob", "email": None}  # email field = null
]

# Encodes as:
# [2]
# ├ id: 1   2
# ├ name: Alice Bob
# └ email:  null
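The absent-versus-null distinction is easy to lose when using `dict.get`, because its default is also `None`. A sentinel object avoids that. The following is an illustrative sketch only; `MISSING` and `column_cells` are hypothetical names, not library API.

```python
MISSING = object()  # sentinel meaning "field absent from this record"


def column_cells(records, field):
    """Sketch: build the cell strings for one column."""
    cells = []
    for record in records:
        value = record.get(field, MISSING)
        if value is MISSING:
            cells.append("")      # absent field -> empty cell
        elif value is None:
            cells.append("null")  # present with value None -> literal null
        else:
            cells.append(str(value))
    return cells


data = [
    {"id": 1, "name": "Alice"},               # no email field
    {"id": 2, "name": "Bob", "email": None},  # email field = null
]
print(repr("\t".join(column_cells(data, "email"))))
# '\tnull'
```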

Quoting¶

Simple values (no special characters) → Unquoted:

├ name: Alice   Bob Charlie

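Values with spaces, delimiters, or special chars → Quoted. Numeric-looking strings such as ZIP codes are quoted as well, presumably so that they decode as strings rather than numbers: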

├ address: "123 Main St"    "456 Oak Ave"   "789 Pine Rd"
├ zip: "10001"  "94105" "78701"

Quotes in values → Escaped with backslash:

├ bio: "Alice \"The Great\""    "Bob \"Builder\""
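The quoting rules above can be approximated with a small predicate. This sketch is inferred from the examples on this page rather than taken from the library: the name `quote_cell`, the numeric pattern, the quoting of `true`/`false`/`null` strings, and the escaping of backslashes and tabs are all assumptions.

```python
import re

_NUMERIC = re.compile(r"-?\d+(\.\d+)?")   # assumed shape of "looks like a number"
_RESERVED = {"true", "false", "null"}     # assumed: would otherwise change type


def quote_cell(text, delimiter="\t"):
    """Sketch: quote a string cell only when it would be ambiguous unquoted."""
    needs_quotes = (
        " " in text
        or delimiter in text
        or '"' in text
        or "\\" in text
        or _NUMERIC.fullmatch(text) is not None
        or text in _RESERVED
    )
    if not needs_quotes:
        return text
    # Escape backslashes first so later escapes are not doubled
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\t", "\\t")
    return f'"{escaped}"'


for value in ["Alice", "123 Main St", "10001", 'Alice "The Great"', "555-0001"]:
    print(quote_cell(value))
```

Note that `555-0001` and `2020-01-15` do not match the numeric pattern, which is consistent with the phone and date columns appearing unquoted in the examples.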

Type Clustering Advantage¶

Columnar format groups same-type values together, improving LLM tokenization:

Example: Numeric clustering

├ id: 1 2   3   4   5   6   7   8   9   10
├ age: 28   32  25  45  38  29  41  33  27  36
├ salary: 120000    135000  115000  150000  128000  122000  145000  132000  118000  140000

All numeric values are adjacent, creating better token compression than alternating between numbers and strings in row-based format.

Nested Objects¶

Nested objects → Indented key-value pairs:

company:
  name: Acme Corp
  founded: 2010
  headquarters:
    city: San Francisco
    state: CA
employees[3]
├ id: 1 2   3
├ name: Alice   Bob Charlie
└ dept: Eng Sales   Eng
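The indented key-value layout for nested objects can be produced with a short recursive walk. This is a sketch under stated assumptions, not the library's code: `encode_object` is a hypothetical name, two-space indentation is read off the example above, and arrays and quoting are left out.

```python
def encode_object(obj, indent=0):
    """Sketch: render a nested dict as indented key-value lines."""
    pad = "  " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(encode_object(value, indent + 1))  # recurse one level deeper
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


doc = {
    "company": {
        "name": "Acme Corp",
        "founded": 2010,
        "headquarters": {"city": "San Francisco", "state": "CA"},
    }
}
print("\n".join(encode_object(doc)))
# company:
#   name: Acme Corp
#   founded: 2010
#   headquarters:
#     city: San Francisco
#     state: CA
```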

Arrays¶

Primitive arrays → Inline with delimiter:

tags[5]: python javascript  rust    go  typescript
scores[4]: 95   87  92  88

Object arrays → Columnar format (primary use case):

products[3]
├ sku: A123 B456    C789
├ name: Widget  Gadget  Gizmo
└ price: 9.99   19.99   29.99
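The two array forms differ only in what follows the `name[N]` header. A rough sketch of that dispatch is shown below; `encode_array` is a hypothetical helper that assumes every object in an object array has the same keys and that values need no quoting.

```python
def encode_array(name, values, delimiter="\t"):
    """Sketch: inline form for primitives, columnar form for objects."""
    header = f"{name}[{len(values)}]"
    if values and all(isinstance(v, dict) for v in values):
        fields = list(values[0])  # assumes all records share these keys
        lines = [header]
        for i, field in enumerate(fields):
            marker = "└" if i == len(fields) - 1 else "├"
            cells = delimiter.join(str(v[field]) for v in values)
            lines.append(f"{marker} {field}: {cells}")
        return "\n".join(lines)
    # Primitive array: header and values share one line
    return f"{header}: " + delimiter.join(str(v) for v in values)


print(encode_array("scores", [95, 87, 92, 88], delimiter=", "))
# scores[4]: 95, 87, 92, 88
```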

Complete Example¶

Real-world employee data:

Input (JSON):

{
  "department": "Engineering",
  "headcount": 3,
  "employees": [
    {
      "id": 1, "name": "Alice", "email": "alice@example.com",
      "age": 28, "city": "NYC", "state": "NY", "zip": "10001",
      "phone": "555-0001", "dept": "Eng", "title": "SWE",
      "salary": 120000, "start_date": "2020-01-15"
    },
    {
      "id": 2, "name": "Bob", "email": "bob@example.com",
      "age": 32, "city": "SF", "state": "CA", "zip": "94105",
      "phone": "555-0002", "dept": "Sales", "title": "Manager",
      "salary": 135000, "start_date": "2019-03-20"
    },
    {
      "id": 3, "name": "Charlie", "email": "charlie@example.com",
      "age": 25, "city": "Austin", "state": "TX", "zip": "78701",
      "phone": "555-0003", "dept": "Eng", "title": "SWE",
      "salary": 115000, "start_date": "2021-07-10"
    }
  ]
}

Output (AGONColumns):

department: Engineering
headcount: 3
employees[3]
├ id: 1 2   3
├ name: Alice   Bob Charlie
├ email: alice@example.com  bob@example.com charlie@example.com
├ age: 28   32  25
├ city: NYC SF  Austin
├ state: NY CA  TX
├ zip: "10001"  "94105" "78701"
├ phone: 555-0001   555-0002    555-0003
├ dept: Eng Sales   Eng
├ title: SWE    Manager SWE
├ salary: 120000    135000  115000
└ start_date: 2020-01-15    2019-03-20  2021-07-10

Token comparison:

| Format | Tokens | Savings |
|--------|--------|---------|
| Pretty JSON | 381 | baseline |
| Compact JSON | 231 | +39.4% |
| AGONRows | 171 | +55.1% |
| AGONColumns | 186 | +51.2% |

Trade-off: AGONRows wins for this example (12 fields is borderline), but as the field count grows toward 15 and beyond, AGONColumns is expected to pull ahead due to type clustering.

When AGONColumns Wins¶

Wide tables with 10+ fields per record (sweet spot: 15-50 fields)
Financial data (many numeric columns: price, volume, market_cap, P/E, etc.)
Analytics tables (metrics, dimensions, timestamps, aggregations)
Employee databases (ID, name, email, age, city, state, zip, phone, dept, title, salary, etc.)
Sparse data with many missing values (columnar handles better than row-based)
Homogeneous column types (all numbers, all strings, etc.)
Scientific datasets with measurement arrays (time, temp, pressure, velocity, etc.)

When AGONColumns Loses¶

Few fields (2-5 fields) → AGONRows wins with simpler row-based format
Highly irregular structure (fields vary between records) → JSON fallback
Deeply nested objects with no arrays → AGONStruct or JSON
Heterogeneous data per column (mixed types) → Row-based better
Very small arrays (<3 records) → Overhead not worth it

Example where the row-based format wins (auto mode reports it as "text"):

from agon import AGON  # assumed to be importable from the same package as AGONColumns

# Only 3 fields - too narrow for columnar advantage
user_data = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"}
]

result = AGON.encode(user_data, format="auto")
# → Selects "text" (simpler for narrow tables)

Direct Usage¶

For advanced use cases, use the AGONColumns encoder directly:

from agon import AGONColumns

# Encode with default options
encoded = AGONColumns.encode(data)

# Custom delimiter (comma-space)
encoded = AGONColumns.encode(data, delimiter=", ")

# ASCII tree characters (for compatibility)
encoded = AGONColumns.encode(data, use_ascii=True)
# Output uses | and ` instead of ├ and └

# Without header (for LLM prompts)
encoded = AGONColumns.encode(data, include_header=False)

# With header (for decoding)
encoded_with_header = AGONColumns.encode(data, include_header=True)
# → @AGON columns\n\n[3]...

# Decode (using the payload that was encoded with a header)
decoded = AGONColumns.decode(encoded_with_header)
assert decoded == data  # Lossless
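To illustrate why the round trip can be lossless, here is a simplified decoder sketch in pure Python. It is not the library's decoder: `decode_columns` and `parse_cell` are hypothetical names, and the sketch assumes unquoted cells, a one-character tree marker followed by a space, and no `@AGON columns` header line.

```python
def parse_cell(cell):
    """Infer a Python value from an unquoted cell."""
    if cell == "null":
        return None
    if cell in ("true", "false"):
        return cell == "true"
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell  # fall back to a plain string


def decode_columns(text, delimiter="\t"):
    """Sketch: parse unquoted columnar text back into a list of dicts."""
    lines = text.splitlines()
    count = int(lines[0].strip("[]"))
    records = [{} for _ in range(count)]
    for line in lines[1:]:
        # Drop the tree character and the space after it, then split "field: cells"
        field, _, cells = line[2:].partition(": ")
        for record, cell in zip(records, cells.split(delimiter)):
            if cell != "":  # empty cell -> field stays absent
                record[field] = parse_cell(cell)
    return records


text = (
    "[3]\n"
    "├ id: 1\t2\t3\n"
    "├ name: Alice\tBob\tCharlie\n"
    "└ email: alice@example.com\t\tcharlie@example.com"
)
for record in decode_columns(text):
    print(record)
```

Bob's record comes back without an `email` key, which matches the empty-cell rule for missing fields.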

Edge Cases¶

Empty array

from agon import AGON  # assumed to be importable from the same package as AGONColumns

data = []

result = AGON.encode(data, format="columns")
# → [0]

Single item array

data = [{"id": 1, "name": "Alice", "email": "alice@example.com"}]

result = AGON.encode(data, format="columns")
# → [1]
#   ├ id: 1
#   ├ name: Alice
#   └ email: alice@example.com

All null values

data = [{"a": None, "b": None}, {"a": None, "b": None}]

result = AGON.encode(data, format="columns")
# → [2]
#   ├ a: null   null
#   └ b: null   null

Missing fields (sparse data)

data = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob"},  # no email
    {"id": 3, "name": "Charlie", "email": "charlie@example.com"}
]

result = AGON.encode(data, format="columns")
# → [3]
#   ├ id: 1 2   3
#   ├ name: Alice   Bob Charlie
#   └ email: alice@example.com      charlie@example.com

(Empty cell for Bob's missing email—two consecutive tabs)

Special characters in column values

data = [
    {"name": "Alice\tSmith", "bio": "She said \"hi\""},
    {"name": "Bob", "bio": "Normal text"}
]

result = AGON.encode(data, format="columns")
# → [2]
#   ├ name: "Alice\tSmith"  Bob
#   └ bio: "She said \"hi\""    "Normal text"

(Automatic quoting and escaping)

Comparison: Columns vs Text¶

For the same employee dataset with 12 fields:

AGONRows (Row-Based):

[3]{id  name    email   age city    state   zip phone   dept    title   salary  start_date}
1   Alice   alice@example.com   28  NYC NY  "10001" 555-0001    Eng SWE 120000  2020-01-15
2   Bob bob@example.com 32  SF  CA  "94105" 555-0002    Sales   Manager 135000  2019-03-20
3   Charlie charlie@example.com 25  Austin  TX  "78701" 555-0003    Eng SWE 115000  2021-07-10

Tokens: 137 (better for this case; 12 fields is borderline)

AGONColumns (Columnar):

[3]
├ id: 1 2   3
├ name: Alice   Bob Charlie
├ email: alice@example.com  bob@example.com charlie@example.com
├ age: 28   32  25
├ city: NYC SF  Austin
├ state: NY CA  TX
├ zip: "10001"  "94105" "78701"
├ phone: 555-0001   555-0002    555-0003
├ dept: Eng Sales   Eng
├ title: SWE    Manager SWE
├ salary: 120000    135000  115000
└ start_date: 2020-01-15    2019-03-20  2021-07-10

Tokens: 158 (slightly worse due to tree overhead, but scales better with more fields)

Decision factors:

2-10 fields: Use AGONRows (simpler, less overhead)
10-15 fields: Borderline—auto mode chooses based on data
15+ fields: Use AGONColumns (type clustering advantage wins)
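The decision factors above can be written as a simple rule of thumb. The sketch below is only an illustration of the field-count guidance, not the logic auto mode uses; `suggest_format` is a hypothetical helper, and treating exactly 10 fields as borderline and exactly 15 as columnar is an assumption made to resolve the overlapping ranges.

```python
def suggest_format(field_count: int) -> str:
    """Sketch: map a record's field count to the guidance above."""
    if field_count < 10:
        return "rows"        # simpler, less overhead
    if field_count < 15:
        return "borderline"  # measure both and compare token counts
    return "columns"         # type clustering advantage


for n in (3, 12, 20):
    print(n, "fields ->", suggest_format(n))
# 3 fields -> rows
# 12 fields -> borderline
# 20 fields -> columns
```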

FAQ¶

When should I use Columns vs Text (AGONRows)?

Use Columns when:

10+ fields per record (sweet spot: 15-50 fields)
Numeric-heavy data (financial, scientific)
Sparse data with many nulls

Use Text when:

2-10 fields (simpler row-based format)
Mixed data types per field
Narrow tables

Can I customize tree characters?

Yes! Use ASCII mode for compatibility:

from agon import AGONColumns
encoded = AGONColumns.encode(data, use_ascii=True)
# Uses | and ` instead of ├ and └

How does AGONColumns handle missing fields?

Empty cells (consecutive delimiters) indicate missing fields:

data = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob"},  # no email
]

# Encodes as:
# [2]
# ├ id: 1   2
# ├ name: Alice Bob
# └ email: alice@example.com

Why doesn't auto mode always choose Columns for wide tables?

Auto mode considers multiple factors:

Token count: Columns must save ≥ 10% vs compact JSON
Field count: 10+ fields favors Columns
Type homogeneity: Mixed types reduce clustering benefit
Data regularity: Irregular structure may favor JSON

Use force=True to guarantee specialized format selection.
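The token-count factor is a plain ratio test. As a sketch of the arithmetic only (the function name and the `min_savings` parameter are hypothetical, and the other factors are ignored), the check could look like this:

```python
def meets_savings_threshold(candidate_tokens, compact_json_tokens, min_savings=0.10):
    """Sketch: does the candidate save at least min_savings vs compact JSON?"""
    savings = (compact_json_tokens - candidate_tokens) / compact_json_tokens
    return savings >= min_savings


# Token counts from the Basic Example: AGONColumns 158 vs compact JSON 190
print(meets_savings_threshold(158, 190))  # True: about 16.8% saved
print(meets_savings_threshold(180, 190))  # False: about 5.3% saved
```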

Next Steps¶

AGONRows Format ¶

AGONStruct Format ¶

Benchmarks ¶

API Reference ¶
