Response Types and Formats

Chimeric normalises all provider responses into consistent types so your code works identically regardless of which provider is behind the model.

CompletionResponse

Non-streaming calls return a CompletionResponse:

from chimeric import Chimeric

client = Chimeric()
response = client.generate(model="gpt-4o", messages="Explain quantum physics")

print(response.content)   # str | list — generated text
print(response.model)     # str | None — model that responded
print(response.metadata)  # dict | None — provider-specific extras

if response.usage:
    print(response.usage.prompt_tokens)      # int — input tokens
    print(response.usage.completion_tokens)  # int — output tokens
    print(response.usage.total_tokens)       # int — total tokens
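
Those usage fields make lightweight accounting straightforward. A minimal sketch that totals tokens across a few calls:

total_tokens = 0
for prompt in ["Define entropy", "Define enthalpy"]:
    reply = client.generate(model="gpt-4o", messages=prompt)
    if reply.usage:  # usage can be None, so guard before adding
        total_tokens += reply.usage.total_tokens
print(f"Tokens spent so far: {total_tokens}")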

str(response) returns the content text directly, so you can use the response object wherever a string is expected.
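
A quick sketch of that in practice (reusing the client and model from above):

response = client.generate(model="gpt-4o", messages="Name a prime number")
print(f"Model said: {response}")  # f-string calls str(response) for you
text = str(response)              # explicit conversion when a plain str is required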

StreamChunk

Streaming calls yield StreamChunk objects:

stream = client.generate(model="gpt-4o", messages="Write a story", stream=True)

for chunk in stream:
    print(chunk.delta or "", end="", flush=True)  # incremental text

Field          Type                   Description
content        str | list[Any]        Accumulated text up to this chunk
delta          str | None             New text added in this chunk
finish_reason  str | None             Present only on the final chunk
metadata       dict[str, Any] | None  Provider-specific extras

str(chunk) returns delta (or "" for metadata-only chunks).
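
Putting the chunk fields together: a sketch that joins deltas via str() and watches finish_reason for the final chunk:

stream = client.generate(model="gpt-4o", messages="Write a haiku", stream=True)

parts = []
for chunk in stream:
    parts.append(str(chunk))  # "" for metadata-only chunks, so joining is safe
    if chunk.finish_reason:   # set only on the final chunk
        print(f"\nFinished: {chunk.finish_reason}")

full_text = "".join(parts)    # the joined deltas reconstruct the full text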

Cross-Provider Consistency

The unified types work the same across all providers:

def summarize(model: str, text: str) -> str:
    response = client.generate(model=model, messages=f"Summarize: {text}")
    if response.usage:  # usage may be None, so guard before reading it
        print(f"Used {response.usage.total_tokens} tokens")
    return str(response)  # str() yields the content text, matching the -> str annotation

# All three use the same interface
summarize("gpt-4o", "...")
summarize("claude-3-5-sonnet-20241022", "...")
summarize("gemini-1.5-pro", "...")

Provider-specific details that don't map to the standard fields are available in response.metadata.
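
Because those extras vary by provider, treat metadata as an opaque dict and probe it defensively; the key used below is purely illustrative:

response = client.generate(model="gpt-4o", messages="Hello")

if response.metadata:
    print(sorted(response.metadata))          # see what this provider actually sent
    request_id = response.metadata.get("id")  # hypothetical key, not guaranteed to exist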

EmbeddingResponse

embed() and aembed() return an EmbeddingResponse:

result = client.embed(model="text-embedding-3-small", input="Hello")

print(result.embedding)  # list[float] — single vector
print(result.model)      # str | None — model used

if result.usage:  # usage may be None, so guard before reading it
    print(result.usage.prompt_tokens)  # int
    print(result.usage.total_tokens)   # int

For batch input, the vectors arrive in embeddings and embedding is None:

result = client.embed(
    model="text-embedding-3-small",
    input=["first text", "second text"],
)

print(result.embedding)   # None
print(result.embeddings)  # list[list[float]] — one vector per input

Field       Type                   Description
embedding   list[float] | None     Single vector; None for batch
embeddings  list[list[float]]      All vectors for batch; empty for single
model       str | None             Model that produced the response
usage       EmbeddingUsage | None  prompt_tokens and total_tokens
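
Since embeddings holds one vector per input, in input order, downstream math is plain list work. A pure-Python sketch computing cosine similarity between the two batch vectors above:

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

first, second = result.embeddings  # one vector per input, in input order
print(f"Similarity: {cosine(first, second):.3f}")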

Async Support

agenerate() and aembed() are the async counterparts with the same return types:

import asyncio


async def main():
    # Non-streaming completion
    response = await client.agenerate(model="gpt-4o", messages="Hello")
    print(response.content)

    # Streaming completion
    stream = await client.agenerate(model="gpt-4o", messages="Tell a story", stream=True)
    async for chunk in stream:
        print(chunk.delta or "", end="", flush=True)

    # Embedding
    result = await client.aembed(model="text-embedding-3-small", input="Hello")
    print(len(result.embedding))


asyncio.run(main())
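
Because the async variants return awaitables, independent calls can run concurrently. A sketch using asyncio.gather (reusing the asyncio import and client from the example above):

async def fan_out():
    # Issue two completions and one embedding concurrently.
    story, fact, vectors = await asyncio.gather(
        client.agenerate(model="gpt-4o", messages="Tell a short story"),
        client.agenerate(model="gpt-4o", messages="State a fun fact"),
        client.aembed(model="text-embedding-3-small", input="Hello"),
    )
    print(story.content)
    print(fact.content)
    print(len(vectors.embedding))


asyncio.run(fan_out())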