Streaming Responses¶

Chimeric provides comprehensive streaming support that allows you to receive AI model responses in real-time as they are generated, rather than waiting for the complete response. This is particularly useful for interactive applications, chatbots, and scenarios where you want to display responses progressively.

Overview¶

Streaming enables token-by-token delivery of responses, providing immediate feedback to users and creating more responsive applications. Chimeric's streaming system:

Unified Interface: Same streaming API across all providers
Dual Format Support: Access both unified and native streaming formats
Advanced Features: Tool call streaming and multi-turn conversations
State Management: Automatic content accumulation and metadata handling

Basic Streaming¶

Simple Text Streaming¶

Enable streaming by setting stream=True:

from chimeric import Chimeric

client = Chimeric()

# Basic streaming
stream = client.generate(
    model="gpt-4o",
    messages="Tell me a story about space exploration",
    stream=True
)

# Process chunks in real-time
for chunk in stream:
    if chunk.delta:  # New content in this chunk
        print(chunk.delta, end="", flush=True)

    if chunk.finish_reason:
        print(f"\nStreaming finished: {chunk.finish_reason}")

Understanding Stream Chunks¶

Each stream chunk contains several fields:

stream = client.generate(
    model="gpt-4o",
    messages="Explain quantum physics briefly",
    stream=True
)

for chunk in stream:
    print(f"Content: {chunk.content}")      # Accumulated text so far
    print(f"Delta: {chunk.delta}")          # New text in this chunk
    print(f"Finish: {chunk.finish_reason}") # Why streaming stopped (if finished)
    print(f"Meta: {chunk.metadata}")        # Additional chunk info
    print("---")

Stream Chunk Fields¶

content¶

The accumulated text content from the start of the response up to the current chunk:

accumulated_text = ""
for chunk in stream:
    # chunk.content contains all text so far
    accumulated_text = chunk.content
    print(f"Total so far: {accumulated_text}")

delta¶

The incremental text added in this specific chunk:

full_response = ""
for chunk in stream:
    if chunk.delta:
        full_response += chunk.delta  # Build response incrementally
        print(chunk.delta, end="")    # Display new text immediately

finish_reason¶

Indicates why the stream ended (typically in the final chunk):

for chunk in stream:
    if chunk.finish_reason:
        print(f"Stream ended: {chunk.finish_reason}")
        # Common values: "stop", "length", "tool_calls", "content_filter"

metadata¶

Contains additional information about the chunk or final response:

for chunk in stream:
    if chunk.metadata:
        print(f"Chunk metadata: {chunk.metadata}")
        # May include: token counts, model info, request IDs, etc.

Async Streaming¶

Use async streaming for high-performance applications:

import asyncio

async def stream_example():
    client = Chimeric()

    stream = await client.agenerate(
        model="gpt-4o",
        messages="Write a poem about artificial intelligence",
        stream=True
    )

    async for chunk in stream:
        if chunk.delta:
            print(chunk.delta, end="", flush=True)

        if chunk.finish_reason:
            print(f"\nFinished: {chunk.finish_reason}")

asyncio.run(stream_example())