Streaming Pipeline
Source: src/query.ts — streaming event handlers and src/services/claude.ts
Overview
Claude Code processes API responses as a real-time stream rather than waiting for complete responses. This enables immediate text rendering, progressive tool call detection, and responsive cancellation — critical for an interactive terminal agent.
Stream Event Flow
Event Types
The Claude API sends Server-Sent Events (SSE) with these key event types:
| Event | Purpose | Handler Action |
|---|---|---|
| message_start | Begin new response | Initialize response buffer |
| content_block_start | New text or tool block | Create block accumulator |
| content_block_delta | Incremental content | Append to current block |
| content_block_stop | Block complete | Finalize and dispatch |
| message_delta | Stop reason + usage | Record stop reason |
| message_stop | Response complete | Trigger post-processing |
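The handler actions in the table can be read as a small state machine. The following is a minimal sketch, not the code in src/query.ts: the `StreamEvent` shapes are deliberately simplified stand-ins for the real API payloads, and `ResponseState`, `newResponseState`, and `handleEvent` are hypothetical names.

```typescript
// Simplified, hypothetical event shapes: only the fields this sketch needs.
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number; blockType: "text" | "tool_use" }
  | { type: "content_block_delta"; index: number; fragment: string }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; stopReason: string | null }
  | { type: "message_stop" };

interface Block {
  kind: "text" | "tool_use";
  content: string; // accumulated text or JSON fragments
  done: boolean;
}

interface ResponseState {
  blocks: Block[];
  stopReason: string | null;
  complete: boolean;
}

function newResponseState(): ResponseState {
  return { blocks: [], stopReason: null, complete: false };
}

// Applies one event to the state, following the "Handler Action" column.
function handleEvent(state: ResponseState, event: StreamEvent): ResponseState {
  switch (event.type) {
    case "message_start":
      return newResponseState(); // initialize response buffer
    case "content_block_start":
      state.blocks[event.index] = { kind: event.blockType, content: "", done: false };
      return state;
    case "content_block_delta": {
      const block = state.blocks[event.index];
      if (block) block.content += event.fragment; // append to current block
      return state;
    }
    case "content_block_stop": {
      const block = state.blocks[event.index];
      if (block) block.done = true; // finalize; dispatch would happen here
      return state;
    }
    case "message_delta":
      state.stopReason = event.stopReason;
      return state;
    case "message_stop":
      state.complete = true; // post-processing would be triggered here
      return state;
  }
}
```

Keeping the switch exhaustive over the event union means the compiler flags any event type the handler forgets to cover.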
Text Rendering Pipeline
Text deltas flow through a rendering pipeline before reaching the terminal.
Key behaviors:
- Text is buffered briefly to avoid excessive re-renders on rapid deltas
- Markdown is parsed incrementally — partial bold/code blocks are handled gracefully
- The Ink rendering engine batches updates to minimize terminal flicker
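The brief buffering described above could look roughly like the sketch below. It is an illustration under assumptions, not the actual renderer: `DeltaBuffer`, the 16 ms default interval, and the injectable clock are all hypothetical, and the real pipeline hands text to Ink rather than to a plain callback.

```typescript
// Coalesces rapid text deltas so the render callback runs at most once
// per interval. All names and the default interval are illustrative.
class DeltaBuffer {
  private pending = "";
  private lastFlush: number;
  private readonly render: (text: string) => void;
  private readonly minIntervalMs: number;
  private readonly now: () => number;

  constructor(
    render: (text: string) => void,
    minIntervalMs: number = 16,
    now: () => number = () => Date.now(), // injectable clock, useful in tests
  ) {
    this.render = render;
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.lastFlush = now();
  }

  // Queue a delta; render only if enough time has passed since the last flush.
  push(delta: string): void {
    this.pending += delta;
    if (this.now() - this.lastFlush >= this.minIntervalMs) {
      this.flush();
    }
  }

  // Render whatever is queued. Call this when a block ends so no text is stranded.
  flush(): void {
    if (this.pending === "") return;
    this.render(this.pending);
    this.pending = "";
    this.lastFlush = this.now();
  }
}
```

A caller would invoke `push` for each text delta and `flush` once the block stops, so the tail of the text is never left waiting for another delta.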
Tool Call Detection
Tool calls arrive as incremental JSON fragments within content_block_delta events:
- Accumulation: JSON fragments are concatenated into a buffer
- Type Detection: content_block_start identifies the block as tool_use type
- Parameter Parsing: When content_block_stop fires, the full JSON is parsed
- Validation: Parameters are validated against the tool’s JSON Schema
- Dispatch: Valid tool calls enter the Tool Call Loop
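The accumulate-then-parse steps above can be sketched as follows. This is a simplified illustration: it reuses the three fields of the ToolCallAccumulator interface shown just below, the helper names (`startToolCall`, `appendFragment`, `finalizeToolCall`) are hypothetical, and the required-key check is only a stand-in for real JSON Schema validation.

```typescript
interface ToolCallAccumulator {
  id: string;
  name: string;
  inputJson: string; // accumulated JSON fragments
}

type ParseResult =
  | { ok: true; input: Record<string, unknown> }
  | { ok: false; error: string };

// Called when content_block_start reports a tool_use block.
function startToolCall(id: string, name: string): ToolCallAccumulator {
  return { id, name, inputJson: "" };
}

// Called for each JSON fragment carried by a content_block_delta.
function appendFragment(call: ToolCallAccumulator, fragment: string): void {
  call.inputJson += fragment;
}

// Called on content_block_stop: parse the full buffer, then validate.
// requiredKeys is a simplified substitute for a JSON Schema check.
function finalizeToolCall(call: ToolCallAccumulator, requiredKeys: string[]): ParseResult {
  let parsed: unknown;
  try {
    // An empty buffer is treated as "no arguments" in this sketch.
    parsed = JSON.parse(call.inputJson === "" ? "{}" : call.inputJson);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `invalid JSON: ${message}` };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "tool input must be a JSON object" };
  }
  const input = parsed as Record<string, unknown>;
  const missing = requiredKeys.filter((key) => !(key in input));
  if (missing.length > 0) {
    return { ok: false, error: `missing required parameters: ${missing.join(", ")}` };
  }
  return { ok: true, input };
}
```

Parsing only at content_block_stop keeps the logic simple: an individual fragment is rarely valid JSON on its own, so there is nothing useful to parse earlier.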
// Simplified tool call accumulation
interface ToolCallAccumulator {
  id: string;
  name: string;
  inputJson: string; // accumulated JSON fragments
}
Stop Reasons
The message_delta event carries a stop_reason that determines what happens next:
- end_turn: Claude finished its response naturally
- tool_use: Claude wants to execute one or more tools
- max_tokens: Response hit the token limit; may need continuation
- stop_sequence: A configured stop sequence was hit
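One way to picture how the stop reason drives the next step is a small mapping function. The sketch below is illustrative only: the `NextStep` labels are hypothetical, and treating stop_sequence like end_turn is an assumption, since the text above does not say what follows a stop sequence.

```typescript
type StopReason = "end_turn" | "tool_use" | "max_tokens" | "stop_sequence";

// Hypothetical labels for what the agent loop does next.
type NextStep = "await_user_input" | "run_tools" | "request_continuation";

const STOP_REASONS: readonly string[] = ["end_turn", "tool_use", "max_tokens", "stop_sequence"];

// Narrows an arbitrary string from the wire to a known stop reason.
function isStopReason(value: string): value is StopReason {
  return STOP_REASONS.includes(value);
}

function nextStep(reason: StopReason): NextStep {
  switch (reason) {
    case "end_turn":
    case "stop_sequence": // assumption: handled the same way as a natural end of turn
      return "await_user_input";
    case "tool_use":
      return "run_tools";
    case "max_tokens":
      return "request_continuation";
  }
}
```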
Token Usage Tracking
Every response includes token usage data tracked for:
- Cost estimation — Display running cost to the user
- Context window management — Determine when to compress history
- Cache hit tracking — Monitor prompt caching effectiveness
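The three uses listed above reduce to simple arithmetic over the usage counters. The sketch below repeats the TokenUsage shape shown just below so that it stands alone; `Pricing`, `addUsage`, `estimateCostUsd`, and `cacheHitRate` are hypothetical helpers, the rates in any `Pricing` value are placeholders rather than real prices, and the hit-rate formula assumes input_tokens does not already include cached tokens.

```typescript
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Hypothetical per-million-token rates in USD; real prices vary by model.
interface Pricing {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

// Running total across responses in a session.
function addUsage(total: TokenUsage, turn: TokenUsage): TokenUsage {
  return {
    input_tokens: total.input_tokens + turn.input_tokens,
    output_tokens: total.output_tokens + turn.output_tokens,
    cache_creation_input_tokens:
      total.cache_creation_input_tokens + turn.cache_creation_input_tokens,
    cache_read_input_tokens: total.cache_read_input_tokens + turn.cache_read_input_tokens,
  };
}

// Cost estimate: each counter times its rate, scaled from per-million to per-token.
function estimateCostUsd(usage: TokenUsage, pricing: Pricing): number {
  const weighted =
    usage.input_tokens * pricing.input +
    usage.output_tokens * pricing.output +
    usage.cache_creation_input_tokens * pricing.cacheWrite +
    usage.cache_read_input_tokens * pricing.cacheRead;
  return weighted / 1_000_000;
}

// Fraction of prompt tokens served from cache (assumes the three input
// counters are disjoint, which is an assumption of this sketch).
function cacheHitRate(usage: TokenUsage): number {
  const promptTokens =
    usage.input_tokens + usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  return promptTokens === 0 ? 0 : usage.cache_read_input_tokens / promptTokens;
}
```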
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}
Cancellation
Users can cancel a streaming response with Ctrl+C:
- The SSE connection is aborted
- Partial text is preserved in conversation history
- Any in-progress tool calls are discarded
- The session returns to input mode
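The cancellation behavior above can be illustrated with the standard `AbortController`. This is a sketch under assumptions rather than the actual handler: a real stream is asynchronous and the signal would also be passed to the HTTP request, whereas this version iterates synchronously to stay small. `Delta`, `TurnResult`, and `consumeStream` are hypothetical names, and wiring Ctrl+C to `controller.abort()` is assumed to happen elsewhere.

```typescript
// Hypothetical delta shape: either visible text or a tool-call JSON fragment.
type Delta =
  | { kind: "text"; value: string }
  | { kind: "tool_json"; value: string };

interface TurnResult {
  text: string;            // partial text is kept for conversation history
  toolJson: string | null; // discarded (null) when the turn was cancelled
  cancelled: boolean;
}

function consumeStream(
  deltas: Iterable<Delta>,
  signal: AbortSignal,
  onDelta: (delta: Delta) => void,
): TurnResult {
  let text = "";
  let toolJson: string | null = null;
  for (const delta of deltas) {
    if (signal.aborted) {
      // Keep the text received so far, drop the in-progress tool call.
      return { text, toolJson: null, cancelled: true };
    }
    if (delta.kind === "text") {
      text += delta.value;
    } else {
      toolJson = (toolJson ?? "") + delta.value;
    }
    onDelta(delta);
  }
  // The stream finished before any cancellation took effect.
  return { text, toolJson, cancelled: false };
}
```

Checking the signal before each delta means a cancellation takes effect at the next delta boundary, which is what keeps the partial text consistent.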
Design Patterns
- Observer Pattern — Stream events are dispatched to multiple handlers (UI, token tracker, tool detector)
- Accumulator Pattern — Partial JSON fragments are accumulated until complete
- Backpressure — Text rendering buffers deltas to prevent overwhelming the terminal
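As a rough illustration of the Observer Pattern entry, the sketch below fans a single event out to several independent handlers. `EventFanout` is a hypothetical class, not a name taken from the source files.

```typescript
type Handler<E> = (event: E) => void;

// Minimal observer: every subscribed handler sees every emitted event.
class EventFanout<E> {
  private handlers: Handler<E>[] = [];

  // Registers a handler and returns a function that removes it again.
  subscribe(handler: Handler<E>): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  emit(event: E): void {
    // Iterate over a copy so handlers may unsubscribe while being notified.
    for (const handler of [...this.handlers]) {
      handler(event);
    }
  }
}
```

With this shape, the UI, the token tracker, and the tool detector could each subscribe separately and remain unaware of one another.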
Related
- Overview — Query Engine overview
- Context Assembly — What happens before the stream begins
- Tool Call Loop — What happens when a tool call is detected