
Streaming Responses

Chimeric provides streaming support that lets you receive AI model responses in real time as they are generated, rather than waiting for the complete response. This is particularly useful for interactive applications, chatbots, and any scenario where you want to display a response progressively.

Overview

Streaming delivers a response token by token, giving users immediate feedback and making applications feel more responsive. Chimeric's streaming system provides:

  • Unified Interface: Same streaming API across all providers
  • Dual Format Support: Access both unified and native streaming formats
  • Advanced Features: Tool call streaming and multi-turn conversations
  • State Management: Automatic content accumulation and metadata handling

Basic Streaming

Simple Text Streaming

Enable streaming by setting stream=True:

from chimeric import Chimeric

client = Chimeric()

# Basic streaming
stream = client.generate(
    model="gpt-4o",
    messages="Tell me a story about space exploration",
    stream=True
)

# Process chunks in real-time
for chunk in stream:
    if chunk.delta:  # New content in this chunk
        print(chunk.delta, end="", flush=True)

    if chunk.finish_reason:
        print(f"\nStreaming finished: {chunk.finish_reason}")
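The loop above writes to the terminal, but the same pattern works with any text sink (a file, a socket wrapper, an in-memory buffer). The sketch below factors it into a small helper; `stream_to` is a hypothetical name, and the chunks are simulated with `SimpleNamespace` objects carrying only the `delta` and `finish_reason` fields shown above. In real code you would pass the stream returned by `client.generate(..., stream=True)`.

```python
import io
from types import SimpleNamespace
from typing import Iterable, Optional, TextIO


def stream_to(sink: TextIO, stream: Iterable) -> Optional[str]:
    """Write each chunk's delta to `sink` as it arrives; return the finish reason."""
    finish_reason = None
    for chunk in stream:
        if chunk.delta:
            sink.write(chunk.delta)
            sink.flush()  # push text out immediately, like flush=True in print()
        if chunk.finish_reason:
            finish_reason = chunk.finish_reason
    return finish_reason


# Simulated chunks (hypothetical); real chunks come from client.generate(..., stream=True).
fake_stream = [
    SimpleNamespace(delta="Once ", finish_reason=None),
    SimpleNamespace(delta="upon a time.", finish_reason=None),
    SimpleNamespace(delta=None, finish_reason="stop"),
]

buffer = io.StringIO()
reason = stream_to(buffer, fake_stream)
print(buffer.getvalue())  # Once upon a time.
print(reason)             # stop
```

Passing `sys.stdout` as the sink reproduces the terminal behavior of the original loop.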

Understanding Stream Chunks

Each stream chunk contains several fields:

stream = client.generate(
    model="gpt-4o",
    messages="Explain quantum physics briefly",
    stream=True
)

for chunk in stream:
    print(f"Content: {chunk.content}")      # Accumulated text so far
    print(f"Delta: {chunk.delta}")          # New text in this chunk
    print(f"Finish: {chunk.finish_reason}") # Why streaming stopped (if finished)
    print(f"Meta: {chunk.metadata}")        # Additional chunk info
    print("---")
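When unit-testing code that consumes streams, it can help to have a local test double that mimics these four fields without making network calls. The sketch below is exactly that: `StubChunk` and `stub_stream` are hypothetical names, not part of Chimeric, and the stub assumes `finish_reason` arrives on the last content chunk (a real provider may instead send a separate final chunk with no new text).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class StubChunk:
    """Hypothetical test double with the four chunk fields described above."""
    content: str
    delta: Optional[str]
    finish_reason: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def stub_stream(pieces: List[str], finish_reason: str = "stop") -> Iterator[StubChunk]:
    """Yield chunks whose `content` accumulates the `delta` values, like a real stream."""
    content = ""
    for i, piece in enumerate(pieces):
        content += piece
        is_last = i == len(pieces) - 1
        yield StubChunk(
            content=content,
            delta=piece,
            finish_reason=finish_reason if is_last else None,
        )


for chunk in stub_stream(["Quantum ", "physics ", "is weird."]):
    print(f"Content: {chunk.content!r} | Delta: {chunk.delta!r} | Finish: {chunk.finish_reason}")
```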

Stream Chunk Fields

content

The accumulated text content from the start of the response up to the current chunk:

accumulated_text = ""
for chunk in stream:
    # chunk.content contains all text so far
    accumulated_text = chunk.content
    print(f"Total so far: {accumulated_text}")
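Because `content` is cumulative, you do not need to concatenate anything if you only want the final text: the last chunk already holds it. A minimal sketch relying on that documented behavior, with simulated chunks and a hypothetical helper name `final_content`:

```python
from types import SimpleNamespace


def final_content(stream) -> str:
    """Return the full response text by keeping only the last chunk's `content`."""
    last_chunk = None
    for last_chunk in stream:
        pass  # each chunk already contains all text so far
    return last_chunk.content if last_chunk is not None else ""


# Simulated chunks (hypothetical); real ones come from client.generate(..., stream=True).
fake_stream = [
    SimpleNamespace(content="The "),
    SimpleNamespace(content="The answer "),
    SimpleNamespace(content="The answer is 42."),
]
print(final_content(fake_stream))  # The answer is 42.
```

This gives up progressive display, so it suits background jobs more than interactive UIs; combine it with printing `delta` if you need both.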

delta

The incremental text added in this specific chunk:

full_response = ""
for chunk in stream:
    if chunk.delta:
        full_response += chunk.delta  # Build response incrementally
        print(chunk.delta, end="")    # Display new text immediately
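For long responses, collecting deltas in a list and joining them once is the idiomatic Python way to build a string from many pieces, since repeated `+=` can degrade to quadratic time. The sketch below also checks that the joined deltas match the final `content`, which should hold if both fields behave as described above. The chunks are simulated with `SimpleNamespace`.

```python
from types import SimpleNamespace

# Simulated chunks (hypothetical); a real stream comes from client.generate(..., stream=True).
fake_stream = [
    SimpleNamespace(delta="Hello", content="Hello"),
    SimpleNamespace(delta=", ", content="Hello, "),
    SimpleNamespace(delta="world", content="Hello, world"),
    SimpleNamespace(delta=None, content="Hello, world"),  # e.g. a final chunk with no new text
]

parts = []
last_content = ""
for chunk in fake_stream:
    if chunk.delta:  # skip chunks that carry no new text
        parts.append(chunk.delta)
    last_content = chunk.content

full_response = "".join(parts)
assert full_response == last_content  # deltas and accumulated content should agree
print(full_response)  # Hello, world
```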

finish_reason

Indicates why the stream ended (typically set only on the final chunk):

for chunk in stream:
    if chunk.finish_reason:
        print(f"Stream ended: {chunk.finish_reason}")
        # Common values: "stop", "length", "tool_calls", "content_filter"
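Different finish reasons usually call for different handling; for example, `"length"` means the response was cut off. The sketch below maps the common values listed above to short notes, with a fallback for anything else, since the exact set of values may vary by provider. `describe_finish` is a hypothetical helper, and `"timeout"` in the usage line is a made-up value used only to exercise the fallback.

```python
from typing import Optional


def describe_finish(reason: Optional[str]) -> str:
    """Map a finish_reason value to a human-readable note."""
    notes = {
        "stop": "completed normally",
        "length": "truncated: the token limit was reached",
        "tool_calls": "the model requested one or more tool calls",
        "content_filter": "stopped by the provider's content filter",
    }
    if reason is None:
        return "stream ended without a finish reason"
    return notes.get(reason, f"unrecognized finish reason: {reason!r}")


print(describe_finish("length"))   # truncated: the token limit was reached
print(describe_finish("timeout"))  # unrecognized finish reason: 'timeout'
```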

metadata

Contains additional information about the chunk or final response:

for chunk in stream:
    if chunk.metadata:
        print(f"Chunk metadata: {chunk.metadata}")
        # May include: token counts, model info, request IDs, etc.
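If metadata is spread across several chunks (for instance, a request ID early and token counts at the end), you may want to merge it into one record. The sketch below assumes `chunk.metadata` is either empty/`None` or a dictionary; that type, and the keys in the simulated chunks, are assumptions for illustration rather than Chimeric's documented schema.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def collect_metadata(stream: Iterable) -> Dict[str, Any]:
    """Merge metadata from all chunks; later chunks override earlier keys.

    Assumes chunk.metadata is either falsy (None or empty) or a dict.
    """
    merged: Dict[str, Any] = {}
    for chunk in stream:
        if chunk.metadata:
            merged.update(chunk.metadata)
    return merged


# Simulated chunks; the keys below are hypothetical examples.
fake_stream = [
    SimpleNamespace(metadata={"request_id": "req-123"}),
    SimpleNamespace(metadata=None),
    SimpleNamespace(metadata={"completion_tokens": 57}),
]
print(collect_metadata(fake_stream))  # {'request_id': 'req-123', 'completion_tokens': 57}
```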

Async Streaming

Use async streaming in asyncio-based applications, such as async web servers, so the event loop can run other tasks while waiting for chunks:

import asyncio

async def stream_example():
    client = Chimeric()

    stream = await client.agenerate(
        model="gpt-4o",
        messages="Write a poem about artificial intelligence",
        stream=True
    )

    async for chunk in stream:
        if chunk.delta:
            print(chunk.delta, end="", flush=True)

        if chunk.finish_reason:
            print(f"\nFinished: {chunk.finish_reason}")

asyncio.run(stream_example())
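The main benefit of async streaming is concurrency: several streams can be consumed on one event loop at the same time. The sketch below shows this with `asyncio.gather`. To keep it runnable offline, `fake_astream` is a hypothetical async generator standing in for the stream you would get from `await client.agenerate(..., stream=True)`; in real code you would pass that stream to `consume` instead.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator, List


async def fake_astream(words: List[str], delay: float = 0.01) -> AsyncIterator[SimpleNamespace]:
    """Hypothetical stand-in for an async Chimeric stream."""
    for i, word in enumerate(words):
        await asyncio.sleep(delay)  # simulate waiting on the network
        is_last = i == len(words) - 1
        yield SimpleNamespace(delta=word, finish_reason="stop" if is_last else None)


async def consume(stream: AsyncIterator) -> str:
    """Collect one stream's deltas; other tasks run while this one awaits chunks."""
    parts = []
    async for chunk in stream:
        if chunk.delta:
            parts.append(chunk.delta)
    return "".join(parts)


async def main() -> List[str]:
    # Both streams are consumed concurrently; gather returns results in argument order.
    return await asyncio.gather(
        consume(fake_astream(["Roses ", "are ", "red."])),
        consume(fake_astream(["Violets ", "are ", "blue."])),
    )


results = asyncio.run(main())
print(results)  # ['Roses are red.', 'Violets are blue.']
```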