Context Window Management¶
This docs was updated at: 2026-02-23
Long-running conversations can exceed your model's context window limit. Agentle4j provides pluggable strategies to manage context length automatically.
Quick Start¶
import com.paragon.agents.context.*;
Agent agent = Agent.builder()
.name("Assistant")
.model("openai/gpt-4o")
.instructions("You are a helpful assistant.")
.responder(responder)
.contextManagement(ContextManagementConfig.builder()
.strategy(new SlidingWindowStrategy())
.maxTokens(4000)
.build())
.build();
Configuration¶
Use ContextManagementConfig.builder() to configure context management:
ContextManagementConfig config = ContextManagementConfig.builder()
.strategy(new SlidingWindowStrategy()) // Required: the management strategy
.maxTokens(4000) // Required: maximum token limit
.tokenCounter(new SimpleTokenCounter()) // Optional: custom token counter
.build();
Available Strategies¶
SlidingWindowStrategy¶
Removes oldest messages when context exceeds the limit:
// Basic sliding window
ContextWindowStrategy strategy = new SlidingWindowStrategy();
// Preserve developer message at the start of conversation
ContextWindowStrategy preserving = SlidingWindowStrategy.preservingDeveloperMessage();
Best for: - Simple use cases where older context is less important - Low latency (no API calls required) - Cost-sensitive applications
SummarizationStrategy¶
Summarizes older messages using an LLM:
ContextWindowStrategy strategy = SummarizationStrategy.builder()
.responder(responder)
.model("openai/gpt-4o-mini") // Use a fast, cheap model
.keepRecentMessages(5) // Keep last 5 messages intact
.build();
// Or use the factory method
ContextWindowStrategy quick = SummarizationStrategy.withResponder(
responder, "openai/gpt-4o-mini");
Best for: - Conversations where older context contains important information - Use cases requiring high context retention - When you can afford the latency of an additional LLM call
Custom Token Counter¶
By default, SimpleTokenCounter estimates tokens as text.length() / 4. You can provide a custom implementation:
public class MyTokenCounter implements TokenCounter {
@Override
public int countTokens(ResponseInputItem item) {
// Your custom implementation
}
@Override
public int countText(String text) {
// Use a proper tokenizer like jtokkit
}
@Override
public int countImage(Image image) {
// Calculate based on image dimensions
}
}
ContextManagementConfig config = ContextManagementConfig.builder()
.strategy(new SlidingWindowStrategy())
.maxTokens(4000)
.tokenCounter(new MyTokenCounter())
.build();
Token Estimation¶
SimpleTokenCounter uses these estimates:
| Content Type | Token Estimate |
|---|---|
| Text | text.length() / 4 |
| High-detail image | 765 tokens |
| Low-detail image | 85 tokens |
| Auto-detail image | 170 tokens |
When to Use Context Management¶
Enable context management when:
- Your agent handles long conversations
- Users may send many messages in a session
- You're seeing context length errors from the API
- You want to optimize costs by limiting context size
Best Practices¶
- Choose the right limit: Set
maxTokensbased on your model's context window minus space for the response - Use summarization for important context: If older messages contain decisions or preferences, use
SummarizationStrategy - Preserve system prompts: Use
SlidingWindowStrategy.preservingDeveloperMessage()to keep system instructions - Monitor token usage: Check response usage metrics to tune your limit