Middleware: Summarization

Overview

The Summarization middleware automatically compresses conversation history when the token count exceeds a configured threshold. This helps maintain context continuity in long conversations while staying within the model’s token limits.

💡 This middleware was introduced in v0.8.0.Beta.

Quick Start

```go
import (
    "context"

    "github.com/cloudwego/eino/adk"
    "github.com/cloudwego/eino/adk/middlewares/summarization"
)

// Create the middleware with minimal configuration
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel, // Required: model used for generating summaries
})
if err != nil {
    // Handle error
}

// Use it with a ChatModelAgent
agent, err := adk.NewChatModelAgent(ctx, &adk.ChatModelAgentConfig{
    Model:       yourChatModel,
    Middlewares: []adk.ChatModelAgentMiddleware{mw},
})
```

Configuration Options

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Model | model.BaseChatModel | Yes | - | Chat model used for generating summaries |
| ModelOptions | []model.Option | No | - | Options passed to the model when generating summaries |
| TokenCounter | TokenCounterFunc | No | ~4 chars/token | Custom token counting function |
| Trigger | *TriggerCondition | No | 190,000 tokens | Condition that triggers summarization |
| Instruction | string | No | Built-in prompt | Custom summarization instruction |
| TranscriptFilePath | string | No | - | Path of the full conversation transcript file |
| Prepare | PrepareFunc | No | - | Custom preprocessing function run before summary generation |
| Finalize | FinalizeFunc | No | - | Custom post-processing function for the final messages |
| Callback | CallbackFunc | No | - | Called after Finalize to observe state changes (read-only) |
| EmitInternalEvents | bool | No | false | Whether to emit internal events |
| PreserveUserMessages | *PreserveUserMessages | No | Enabled: true | Whether to preserve original user messages when summarizing |
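The ~4 chars/token default is a rough heuristic, not a real tokenizer. A minimal sketch of a comparable counter, using a simplified local `message` type as a stand-in for the real `adk.Message`:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// message is a simplified stand-in for adk.Message; only the fields
// needed for counting are modeled here.
type message struct {
	Role    string
	Content string
}

// approxTokens estimates token count at ~4 characters per token,
// mirroring the middleware's documented default heuristic.
// It rounds up so short non-empty messages still count as one token.
func approxTokens(msgs []message) int {
	chars := 0
	for _, m := range msgs {
		chars += utf8.RuneCountInString(m.Content)
	}
	return (chars + 3) / 4
}

func main() {
	msgs := []message{
		{Role: "user", Content: "Summarize our discussion so far."},
		{Role: "assistant", Content: "Sure, here is a recap."},
	}
	fmt.Println(approxTokens(msgs)) // prints 14
}
```

For production accuracy, swap the character heuristic for your model's actual tokenizer, as recommended under Best Practices.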

TriggerCondition Structure

```go
type TriggerCondition struct {
    // ContextTokens triggers summarization when the total token count exceeds this threshold
    ContextTokens int
}
```
    

PreserveUserMessages Structure

```go
type PreserveUserMessages struct {
    // Enabled controls whether user message preservation is enabled
    Enabled bool

    // MaxTokens is the maximum token budget for preserved user messages.
    // Only the most recent user messages are preserved until this limit is reached.
    // Defaults to 1/3 of TriggerCondition.ContextTokens.
    MaxTokens int
}
```
    
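A small illustration (not the library's code) of the documented fallback: when `MaxTokens` is unset, it defaults to one third of `Trigger.ContextTokens`:

```go
package main

import "fmt"

// defaultMaxTokens models the documented default: an unset (zero)
// MaxTokens falls back to contextTokens / 3.
func defaultMaxTokens(contextTokens, maxTokens int) int {
	if maxTokens > 0 {
		return maxTokens
	}
	return contextTokens / 3
}

func main() {
	fmt.Println(defaultMaxTokens(190000, 0))     // default trigger of 190k → 63333
	fmt.Println(defaultMaxTokens(190000, 50000)) // explicit value wins → 50000
}
```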

Configuration Examples

Custom Token Threshold

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel,
    Trigger: &summarization.TriggerCondition{
        ContextTokens: 100000, // Trigger at 100k tokens
    },
})
```
    

Custom Token Counter

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel,
    TokenCounter: func(ctx context.Context, input *summarization.TokenCounterInput) (int, error) {
        // Use your own tokenizer
        return yourTokenizer.Count(input.Messages)
    },
})
```
    

Set Transcript File Path

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model:              yourChatModel,
    TranscriptFilePath: "/path/to/transcript.txt",
})
```
    

Custom Finalize Function

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel,
    Finalize: func(ctx context.Context, originalMessages []adk.Message, summary adk.Message) ([]adk.Message, error) {
        // Custom logic to build the final messages
        return []adk.Message{
            schema.SystemMessage("Your system prompt"),
            summary,
        }, nil
    },
})
```
    

Using Callback to Observe State Changes

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel,
    Callback: func(ctx context.Context, before, after adk.ChatModelAgentState) error {
        log.Printf("Summarization completed: %d messages -> %d messages",
            len(before.Messages), len(after.Messages))
        return nil
    },
})
```
    

Control User Message Preservation

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model: yourChatModel,
    PreserveUserMessages: &summarization.PreserveUserMessages{
        Enabled:   true,
        MaxTokens: 50000, // Preserve up to 50k tokens of user messages
    },
})
```
    

How It Works

```mermaid
flowchart TD
    A[BeforeModelRewriteState] --> B{Token count exceeds threshold?}
    B -->|No| C[Return original state]
    B -->|Yes| D[Emit BeforeSummary event]
    D --> E{Has custom Prepare?}
    E -->|Yes| F[Call Prepare]
    E -->|No| G[Call model to generate summary]
    F --> G
    G --> H{Has custom Finalize?}
    H -->|Yes| I[Call Finalize]
    H -->|No| L{Has custom Callback?}
    I --> L
    L -->|Yes| M[Call Callback]
    L -->|No| J[Emit AfterSummary event]
    M --> J
    J --> K[Return new state]

    style A fill:#e3f2fd
    style G fill:#fff3e0
    style D fill:#e8f5e9
    style J fill:#e8f5e9
    style K fill:#c8e6c9
    style C fill:#f5f5f5
    style M fill:#fce4ec
    style F fill:#fff3e0
    style I fill:#fff3e0
```
    

Internal Events

When EmitInternalEvents is set to true, the middleware emits events at key points:

| Event Type | Trigger Timing | Carried Data |
| --- | --- | --- |
| ActionTypeBeforeSummary | Before generating the summary | Original message list |
| ActionTypeAfterSummary | After completing the summary | Final message list |

Usage Example

```go
mw, err := summarization.New(ctx, &summarization.Config{
    Model:              yourChatModel,
    EmitInternalEvents: true,
})

// Listen for the events in your event handler
```
    

Best Practices

1. Set TranscriptFilePath: Always provide a conversation transcript file path so the model can reference the original conversation when needed.
2. Adjust the token threshold: Tune Trigger.ContextTokens to the model's context window size; 80-90% of the model's limit is a good starting point.
3. Use a custom token counter: In production, implement a TokenCounter that matches the model's tokenizer for accurate counting.
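The rule of thumb in point 2 works out as follows; the window sizes below are illustrative examples, not tied to any particular model:

```go
package main

import "fmt"

// triggerTokens computes a summarization threshold as a fraction of
// the model's context window, per the 80-90% rule of thumb above.
func triggerTokens(contextWindow int, ratio float64) int {
	return int(float64(contextWindow) * ratio)
}

func main() {
	fmt.Println(triggerTokens(128000, 0.85)) // prints 108800
	fmt.Println(triggerTokens(200000, 0.9))  // prints 180000
}
```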

Last modified March 2, 2026: feat: sync eino docs (#1512) (96139d41)