Responder Guide

This document was last updated: 2026-03-21

The Responder is the core HTTP client for the OpenAI Responses API. It handles request execution, streaming, telemetry, and provider configuration.


Overview

flowchart LR
    subgraph Your Application
        A[Your Code]
    end

    subgraph Agentle4j
        B[Responder]
        C[Payload Builder]
        D[OkHttp Client]
    end

    subgraph Providers
        E[OpenRouter]
        F[OpenAI]
        G[Groq]
        H[Custom API]
    end

    A --> B --> C --> D
    D <--> E
    D <--> F
    D <--> G
    D <--> H

The Responder is thread-safe and reusable. Create one instance and share it across your application.


Creating a Responder

Basic Setup

// Minimal configuration
Responder responder = Responder.builder()
    .openRouter()
    .apiKey("your-api-key")
    .build();

Full Configuration

Responder responder = Responder.builder()
    .openRouter()
    .apiKey(System.getenv("OPENROUTER_API_KEY"))

    // Retries (optional)
    .maxRetries(5)

    // Telemetry (optional)
    .addTelemetryProcessor(LangfuseProcessor.fromEnv())

    .build();

Configuration Options

| Option | Default | Description |
|---|---|---|
| .apiKey(String) | Required | Your API key |
| .addTelemetryProcessor(TelemetryProcessor) | None | Observability integration |
| .maxRetries(int) | 3 | Max retry attempts for transient failures |
| .retryPolicy(RetryPolicy) | defaults() | Full retry configuration |

Retry Configuration

Responder automatically retries on transient failures with exponential backoff:

Default Behavior

By default, Responder retries 3 times on:

- 429 - Rate limiting
- 500, 502, 503, 504 - Server errors
- Network failures (connection timeout, etc.)

Simple Configuration

// Set max retries (uses default backoff settings)
Responder responder = Responder.builder()
    .openRouter()
    .apiKey(apiKey)
    .maxRetries(5)  // Retry up to 5 times
    .build();

// Disable retries
Responder responder = Responder.builder()
    .openRouter()
    .apiKey(apiKey)
    .maxRetries(0)  // No retries
    .build();

Advanced Configuration

import com.paragon.http.RetryPolicy;
import java.time.Duration;
import java.util.Set;

RetryPolicy policy = RetryPolicy.builder()
    .maxRetries(5)
    .initialDelay(Duration.ofMillis(500))  // Start with 500ms delay
    .maxDelay(Duration.ofSeconds(30))      // Cap at 30 seconds
    .multiplier(2.0)                       // Double delay each retry
    .retryableStatusCodes(Set.of(429, 503)) // Only retry these codes
    .build();

Responder responder = Responder.builder()
    .openRouter()
    .apiKey(apiKey)
    .retryPolicy(policy)
    .build();

Backoff Calculation

Retry delays follow exponential backoff:

| Attempt | Delay (default) |
|---|---|
| 1 | 1 second |
| 2 | 2 seconds |
| 3 | 4 seconds |
| 4+ | Up to 30 seconds (max) |
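The delay for attempt n is min(initialDelay × multiplier^(n-1), maxDelay). A minimal sketch of that formula, using the default values from the table above (the class and method names here are illustrative, not part of the library):

```java
import java.time.Duration;

public class BackoffDemo {
    // Illustrative helper: exponential backoff delay for a given retry attempt.
    static Duration delayForAttempt(int attempt, Duration initial, double multiplier, Duration max) {
        double millis = initial.toMillis() * Math.pow(multiplier, attempt - 1);
        return Duration.ofMillis(Math.min((long) millis, max.toMillis()));
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(1);  // default initial delay
        Duration max = Duration.ofSeconds(30);     // default cap
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                + delayForAttempt(attempt, initial, 2.0, max).toSeconds() + "s");
        }
    }
}
```

Attempts 1-3 match the table (1 s, 2 s, 4 s); later attempts grow until the 30-second cap kicks in.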

Supported Providers

OpenRouter

Access 300+ models through a single API:

Responder responder = Responder.builder()
    .openRouter()
    .apiKey(System.getenv("OPENROUTER_API_KEY"))
    .build();

Available Models (examples):

- openai/gpt-4o, openai/gpt-4o-mini
- anthropic/claude-3.5-sonnet, anthropic/claude-3-opus
- google/gemini-pro, google/gemini-1.5-pro
- meta-llama/llama-3.1-70b-instruct

OpenAI

Responder responder = Responder.builder()
    .openAi()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .build();

Available Models:

- gpt-4o, gpt-4o-mini
- gpt-4-turbo
- o1-preview, o1-mini

Groq

Ultra-fast inference:

Responder responder = Responder.builder()
    .baseUrl(HttpUrl.parse("https://api.groq.com/openai/v1"))
    .apiKey(System.getenv("GROQ_API_KEY"))
    .build();

Available Models:

- llama-3.1-70b-versatile
- llama-3.1-8b-instant
- mixtral-8x7b-32768

Custom API

Any OpenAI-compatible API:

Responder responder = Responder.builder()
    .baseUrl(HttpUrl.parse("https://api.your-company.com/v1"))
    .apiKey("your-key")
    .build();

Building Payloads

The CreateResponsePayload.builder() provides a fluent API:

Basic Payload

var payload = CreateResponsePayload.builder()
    .model("openai/gpt-4o")
    .addDeveloperMessage("You are a helpful assistant.")
    .addUserMessage("Hello!")
    .build();

All Options

var payload = CreateResponsePayload.builder()
    // Required
    .model("openai/gpt-4o")

    // Messages
    .addDeveloperMessage("System prompt here")  // First message (optional)
    .addUserMessage("User's question")           // User input
    .addAssistantMessage("Previous response")    // For multi-turn

    // Generation parameters
    .temperature(0.7)          // Creativity (0.0-2.0)
    .topP(0.9)                 // Nucleus sampling
    .maxOutputTokens(1000)     // Response length limit

    // Advanced
    .safetyIdentifier("user-123")  // Stable user identifier for abuse detection

    .build();

Parameter Reference

| Parameter | Range | Description |
|---|---|---|
| temperature | 0.0-2.0 | Higher = more creative, lower = more focused |
| topP | 0.0-1.0 | Nucleus sampling threshold |
| maxOutputTokens | 1+ | Maximum response tokens |
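These ranges can be sanity-checked before a payload is built; a small sketch with a hypothetical requireRange helper (not part of the library, which may validate differently):

```java
public class ParamCheck {
    // Hypothetical pre-flight check mirroring the parameter ranges above.
    static void requireRange(String name, double value, double min, double max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                name + " must be in [" + min + ", " + max + "], got " + value);
        }
    }

    public static void main(String[] args) {
        requireRange("temperature", 0.7, 0.0, 2.0); // ok
        requireRange("topP", 0.9, 0.0, 1.0);        // ok
        try {
            requireRange("temperature", 2.5, 0.0, 2.0); // out of range
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```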

Temperature Examples

// Factual/deterministic (code generation, Q&A)
.temperature(0.0)

// Balanced (general chat)
.temperature(0.7)

// Creative (stories, brainstorming)
.temperature(1.2)

Making Requests

Synchronous (Default)

// Simple blocking call - efficient with Virtual Threads
Response response = responder.respond(payload);
System.out.println(response.outputText());

Parallel Requests

With Java 25 Virtual Threads, you can efficiently run parallel requests:

import java.util.concurrent.*;

// Run requests in parallel using virtual threads
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    var future1 = executor.submit(() -> responder.respond(payload1));
    var future2 = executor.submit(() -> responder.respond(payload2));
    var future3 = executor.submit(() -> responder.respond(payload3));

    Response response1 = future1.get();
    Response response2 = future2.get();
    Response response3 = future3.get();
}
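For a variable number of payloads, invokeAll fans the work out and waits for every result; sketched below with simulated Callable tasks standing in for responder.respond(...) calls (the FanOut class and fanOut method are illustrative, not part of the library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOut {
    // Run all tasks on virtual threads and collect results in submission order.
    static List<String> fanOut(List<Callable<String>> tasks) throws Exception {
        List<String> results = new ArrayList<>();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (Future<String> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Simulated work standing in for responder.respond(payload) calls.
        List<Callable<String>> tasks = List.of(
            () -> "response-1",
            () -> "response-2",
            () -> "response-3"
        );
        System.out.println(fanOut(tasks));
    }
}
```

invokeAll blocks until every task finishes, so the results list is complete and ordered when it returns.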

Streaming

Enable streaming for real-time responses:

var payload = CreateResponsePayload.builder()
    .model("openai/gpt-4o")
    .addUserMessage("Write a poem about Java")
    .streaming()  // Enable streaming
    .build();

responder.respond(payload)
    .onTextDelta(delta -> {
        System.out.print(delta);
        System.out.flush();
    })
    .onComplete(response -> {
        System.out.println("\n\nDone!");
    })
    .onError(Throwable::printStackTrace)
    .start();

Streaming Callbacks

| Callback | When Called |
|---|---|
| .onTextDelta(String) | Each text chunk arrives |
| .onComplete(Response) | Stream finished successfully |
| .onError(Throwable) | Error occurred |
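A common pattern is to accumulate deltas into the full text while also echoing them to the console. A minimal sketch with a plain StringBuilder; the deltas are simulated here, whereas in real use each string arrives via .onTextDelta(...):

```java
import java.util.List;

public class DeltaAccumulator {
    // Append each delta to a StringBuilder while echoing it as it "arrives".
    static String accumulate(List<String> deltas) {
        StringBuilder full = new StringBuilder();
        for (String delta : deltas) {
            System.out.print(delta); // stream chunks to the console
            full.append(delta);      // keep the complete text for later use
        }
        return full.toString();
    }

    public static void main(String[] args) {
        // Simulated deltas standing in for values passed to .onTextDelta(...).
        List<String> deltas = List.of("Java ", "is ", "verbose ", "but ", "fast.");
        String full = accumulate(deltas);
        System.out.println();
        System.out.println("Full text: " + full);
    }
}
```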

Structured Output

Get type-safe JSON responses:

public record Person(String name, int age, String occupation) {}

var payload = CreateResponsePayload.builder()
    .model("openai/gpt-4o")
    .addUserMessage("Create a fictional software engineer")
    .withStructuredOutput(Person.class)
    .build();

ParsedResponse<Person> response = responder.respond(payload);
Person person = response.outputParsed();

Response Object

The Response contains all available information:

Response response = responder.respond(payload);

// Text output
String text = response.outputText();

// Metadata
String id = response.id();
String model = response.model();
Number createdAt = response.createdAt();

// Output items (for complex responses)
List<ResponseOutput> items = response.output();

Error Handling

Responder is synchronous and currently surfaces request failures as RuntimeException after the configured retries are exhausted. Streaming failures are delivered to .onError(...), typically as StreamingException.

Exception Types

| Error surface | When it happens |
|---|---|
| RuntimeException | Request failed after retries or structured parsing failed |
| IllegalArgumentException | Structured payload is misconfigured before the request is sent |
| StreamingException | Streaming connection dropped or timed out |

Request Error Handling

try {
    Response response = responder.respond(payload);
} catch (RuntimeException e) {
    System.err.println("Request failed: " + e.getMessage());
}

[!TIP] The built-in retry policy automatically retries retryable HTTP failures before the exception reaches your code. Handle the final failure path only.

Streaming Error Recovery

responder.respond(streamingPayload)
    .onError(error -> {
        if (error instanceof StreamingException se) {
            // Recover partial output
            String partial = se.partialOutput();
            if (partial != null) {
                savePartialOutput(partial);
            }
            System.err.println("Streaming failed after " + se.bytesReceived() + " bytes");
        }
    })
    .start();

Best Practices

✅ Do

// Reuse the Responder instance
private final Responder responder;

public MyService(String apiKey) {
    this.responder = Responder.builder()
        .openRouter()
        .apiKey(apiKey)
        .build();
}

// Load API keys from environment
String apiKey = System.getenv("OPENROUTER_API_KEY");

// Handle errors appropriately
try {
    Response response = responder.respond(payload);
} catch (RuntimeException e) {
    // handle
}

❌ Don't

// Don't create new Responder for each request
public String chat(String message) {
    Responder r = Responder.builder()...build();  // Bad!
    return r.respond(payload).outputText();
}

// Don't hardcode API keys
Responder responder = Responder.builder()
    .apiKey("sk-xxxxxxxxxxxxx")  // Bad!
    .build();

// Don't ignore errors
Response response = responder.respond(payload);  // Add error handling!

Next Steps