
Metrics and Observability

AgentEnsemble provides two layers of observability: execution metrics (token counts, timing, costs) available on every run result, and tool metrics (pluggable per-tool counters and timers via the ToolMetrics interface).


Execution Metrics

Every EnsembleOutput carries an ExecutionMetrics object and every TaskOutput carries a TaskMetrics object. These are populated automatically -- no configuration required.

EnsembleOutput output = ensemble.run();

// Per-run totals
ExecutionMetrics metrics = output.getMetrics();
System.out.println("Total tokens:     " + metrics.getTotalTokens());
System.out.println("Input tokens:     " + metrics.getTotalInputTokens());
System.out.println("Output tokens:    " + metrics.getTotalOutputTokens());
System.out.println("LLM latency:      " + metrics.getTotalLlmLatency());
System.out.println("Tool exec time:   " + metrics.getTotalToolExecutionTime());
System.out.println("LLM calls:        " + metrics.getTotalLlmCallCount());

// Per-task breakdown
for (TaskOutput task : output.getTaskOutputs()) {
    TaskMetrics tm = task.getMetrics();
    System.out.printf("[%s] tokens=%d (in=%d out=%d) llm=%s tools=%s%n",
        task.getAgentRole(),
        tm.getTotalTokens(),
        tm.getInputTokens(),
        tm.getOutputTokens(),
        tm.getLlmLatency(),
        tm.getToolExecutionTime());
}

Token counts

Token counts are sourced from ChatResponse.tokenUsage(). When the LLM provider does not return usage metadata, token fields are -1 (unknown) rather than 0. A value of 0 means zero tokens were used, not that the count is unavailable.

long inputTokens = task.getMetrics().getInputTokens();
if (inputTokens < 0) {
    System.out.println("Token usage not available for this provider");
} else {
    System.out.println("Input tokens: " + inputTokens);
}

When any task in the run has unknown token counts, the aggregate ExecutionMetrics.getTotalTokens() is also -1.
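The propagation rule can be sketched as a small standalone helper. This is only an illustration of the behavior described above, not the framework's own code; the class and method names (TokenTotals, sum) are hypothetical.

```java
import java.util.List;

// Hypothetical helper illustrating the "-1 means unknown" aggregation rule.
final class TokenTotals {
    static final long UNKNOWN = -1L;

    // Returns the sum of all counts, or UNKNOWN if any single count is unknown (negative).
    static long sum(List<Long> perTaskCounts) {
        long total = 0;
        for (long count : perTaskCounts) {
            if (count < 0) {
                return UNKNOWN; // one unknown task makes the whole aggregate unknown
            }
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(120L, 0L, 80L)));   // 200 (0 is a real count)
        System.out.println(sum(List.of(120L, -1L, 80L)));  // -1
    }
}
```

Always check an aggregate for a negative value before using it in arithmetic or dashboards.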

Timing breakdown

TaskMetrics tracks four distinct timings:

Field                Description
llmLatency           Cumulative time waiting for LLM responses across all ReAct iterations
toolExecutionTime    Cumulative time executing tools (excluding wait for LLM)
promptBuildTime      Time building system + user prompts before the first LLM call
memoryRetrievalTime  Time querying long-term and entity memory stores

All durations use java.time.Duration. Use .toMillis(), .toSeconds(), or .toString() to format them.
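A short sketch of those formatting options, using only the JDK. The 12,345 ms value is made up for illustration, and humanReadable is a hypothetical helper rather than part of AgentEnsemble.

```java
import java.time.Duration;
import java.util.Locale;

final class DurationFormatting {
    // Hypothetical helper: renders a duration as seconds with millisecond precision.
    static String humanReadable(Duration d) {
        return String.format(Locale.ROOT, "%d.%03ds", d.toSeconds(), d.toMillisPart());
    }

    public static void main(String[] args) {
        Duration llmLatency = Duration.ofMillis(12_345); // example value, not real output

        System.out.println(llmLatency.toMillis());          // 12345
        System.out.println(llmLatency.toSeconds());         // 12 (truncated to whole seconds)
        System.out.println(llmLatency);                     // PT12.345S
        System.out.println(humanReadable(llmLatency));      // 12.345s
    }
}
```

Note that toSeconds() truncates, so prefer toMillis() when sub-second differences matter.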


Cost Estimation

Provide per-token rates through a CostConfiguration, and the framework multiplies them by the actual input and output token counts of each task and of the run as a whole.

Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    .costConfiguration(CostConfiguration.builder()
        .inputTokenRate(new BigDecimal("0.0000025"))   // $2.50 / 1M input tokens
        .outputTokenRate(new BigDecimal("0.0000100"))  // $10.00 / 1M output tokens
        .currency("USD")
        .build())
    .build();

EnsembleOutput output = ensemble.run();

// Per-run cost
CostEstimate total = output.getMetrics().getTotalCostEstimate();
if (total != null) {
    System.out.printf("Run cost: $%.6f (in=%.6f out=%.6f)%n",
        total.getTotalCost(),
        total.getInputCost(),
        total.getOutputCost());
}

// Per-task cost
for (TaskOutput task : output.getTaskOutputs()) {
    CostEstimate cost = task.getMetrics().getCostEstimate();
    if (cost != null) {
        System.out.printf("[%s] $%.6f%n", task.getAgentRole(), cost.getTotalCost());
    }
}

Cost estimation requires that the LLM provider returns token usage. When token counts are -1, getCostEstimate() returns null rather than an incorrect zero.
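The arithmetic behind an estimate can be sketched with plain BigDecimal. This is a standalone illustration under the assumption that cost is simply rate times token count per direction; CostSketch and its cost method are hypothetical names, and the token counts are invented.

```java
import java.math.BigDecimal;

final class CostSketch {
    // Returns tokens * ratePerToken, or null when the token count is unknown (negative).
    static BigDecimal cost(long tokens, BigDecimal ratePerToken) {
        if (tokens < 0) {
            return null; // unknown usage: no estimate rather than a misleading zero
        }
        return ratePerToken.multiply(BigDecimal.valueOf(tokens));
    }

    public static void main(String[] args) {
        BigDecimal inputRate = new BigDecimal("0.0000025");   // $2.50 / 1M input tokens
        BigDecimal outputRate = new BigDecimal("0.0000100");  // $10.00 / 1M output tokens

        BigDecimal inputCost = cost(1_200, inputRate);   // 0.0030000
        BigDecimal outputCost = cost(350, outputRate);   // 0.0035000
        System.out.println(inputCost.add(outputCost));   // 0.0065000
        System.out.println(cost(-1, inputRate));         // null
    }
}
```

Constructing rates from strings, as in the configuration example above, avoids the rounding error that new BigDecimal(double) would introduce.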


Execution Trace

Every run produces a complete ExecutionTrace -- a hierarchical record of every LLM interaction, every tool call with its input and output, all prompts sent, and delegation chains. This is the primary resource for post-mortem debugging and analysis.

EnsembleOutput output = ensemble.run();

ExecutionTrace trace = output.getTrace();
System.out.println("Run ID:    " + trace.getEnsembleId());
System.out.println("Workflow:  " + trace.getWorkflow());
System.out.println("Duration:  " + trace.getTotalDuration());

// Inspect each task's LLM interactions
for (TaskTrace task : trace.getTaskTraces()) {
    System.out.printf("Task [%s]: %d LLM call(s)%n",
        task.getAgentRole(), task.getLlmInteractions().size());
    for (LlmInteraction interaction : task.getLlmInteractions()) {
        System.out.printf("  Iteration %d: %s, %dms, %d tool call(s)%n",
            interaction.getIterationIndex(),
            interaction.getResponseType(),
            interaction.getLatency().toMillis(),
            interaction.getToolCalls().size());
    }
}

Export to JSON

The trace serializes to pretty-printed JSON with a single method call. All Instant fields are ISO-8601 strings and Duration fields are ISO-8601 duration strings (PT12.345S).

// Get as JSON string
String json = output.getTrace().toJson();

// Write to a file
output.getTrace().toJson(Path.of("run-trace.json"));
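On the consuming side, the ISO-8601 strings in the exported JSON can be read back with the JDK's own parsers. The sketch below assumes you have already extracted the string values from the JSON; the timestamps are invented, and TraceTimeParsing is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;

final class TraceTimeParsing {
    // Computes elapsed milliseconds between two ISO-8601 instant strings.
    static long elapsedMillis(String startedAt, String completedAt) {
        return Duration.between(Instant.parse(startedAt), Instant.parse(completedAt)).toMillis();
    }

    public static void main(String[] args) {
        // Duration fields use the ISO-8601 duration form, e.g. PT12.345S.
        Duration total = Duration.parse("PT12.345S");
        System.out.println(total.toMillis()); // 12345

        // Instant fields use the ISO-8601 instant form.
        System.out.println(elapsedMillis("2025-01-01T10:00:00Z", "2025-01-01T10:00:12.345Z")); // 12345
    }
}
```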

Automatic export

Register a traceExporter on the ensemble to automatically export after every run:

Ensemble ensemble = Ensemble.builder()
    .agent(researcher)
    .task(researchTask)
    // Write each run to traces/<ensembleId>.json
    .traceExporter(new JsonTraceExporter(Path.of("traces/")))
    .build();

JsonTraceExporter supports two modes:

  • Directory mode (default): each run writes {ensembleId}.json inside the directory
  • File mode: always overwrites the same file -- useful for single-run pipelines

// Directory mode (each run = new file)
new JsonTraceExporter(Path.of("traces/"))

// File mode (always overwrites)
new JsonTraceExporter(Path.of("run-trace.json"), false)
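The difference between the two modes comes down to how the output path is chosen. The sketch below shows one way that choice could look; it is an assumption about the behavior described above, not JsonTraceExporter's actual implementation, and resolveTarget and its directoryMode parameter are hypothetical.

```java
import java.nio.file.Path;

final class TraceTargetSketch {
    // Hypothetical: picks the file a run's trace would be written to.
    static Path resolveTarget(Path configured, boolean directoryMode, String ensembleId) {
        // Directory mode: one file per run, named after the ensemble ID.
        // File mode: the configured path itself, overwritten on every run.
        return directoryMode ? configured.resolve(ensembleId + ".json") : configured;
    }

    public static void main(String[] args) {
        // "run-42" is an invented ensemble ID.
        System.out.println(resolveTarget(Path.of("traces/"), true, "run-42"));
        System.out.println(resolveTarget(Path.of("run-trace.json"), false, "run-42")); // run-trace.json
    }
}
```

In directory mode, traces accumulate across runs, so consider a retention policy for long-lived services.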

Implement ExecutionTraceExporter to send traces to any destination:

Ensemble.builder()
    .traceExporter(trace -> {
        myObservabilityApi.ingest(trace.toJson());
    })
    .build();

Trace structure

The trace is organized as a hierarchy:

ExecutionTrace
  schemaVersion, ensembleId, workflow
  startedAt, completedAt, totalDuration
  inputs (template variables)
  agents[] (role, goal, toolNames, allowDelegation)
  taskTraces[]
    agentRole, taskDescription, duration
    prompts (systemPrompt, userPrompt)
    llmInteractions[]
      iterationIndex, latency, inputTokens, outputTokens
      responseType (TOOL_CALLS or FINAL_ANSWER)
      responseText (on FINAL_ANSWER)
      toolCalls[]
        toolName, arguments, result, duration, outcome
    delegations[] (for peer delegation)
    finalOutput, parsedOutput
    metrics (TaskMetrics)
  metrics (ExecutionMetrics)
  totalCostEstimate
  errors[]

Accessing prompt content

The exact prompts sent to the LLM are captured on each TaskTrace:

for (TaskTrace task : trace.getTaskTraces()) {
    TaskPrompts prompts = task.getPrompts();
    System.out.println("=== System prompt ===");
    System.out.println(prompts.getSystemPrompt());
    System.out.println("=== User prompt ===");
    System.out.println(prompts.getUserPrompt());
}

Inspecting tool calls

Every tool invocation is recorded with its arguments, result, timing, and outcome:

for (TaskTrace task : trace.getTaskTraces()) {
    for (LlmInteraction iter : task.getLlmInteractions()) {
        for (ToolCallTrace tool : iter.getToolCalls()) {
            System.out.printf("[%s] %s(%s) -> %s [%dms, %s]%n",
                task.getAgentRole(),
                tool.getToolName(),
                tool.getArguments(),
                tool.getResult(),
                tool.getDuration().toMillis(),
                tool.getOutcome());
        }
    }
}

Tool call outcomes:

  • SUCCESS -- tool returned a successful ToolResult
  • FAILURE -- tool returned a failed ToolResult (error message begins with "Error: ")
  • ERROR -- tool threw an uncaught exception
  • SKIPPED_MAX_ITERATIONS -- tool was not executed because the iteration limit was reached
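A common post-mortem step is to tally tool calls by outcome. The sketch below uses a local stand-in enum that mirrors the outcome names listed above, so it compiles on its own; in real code you would feed it the values returned by getOutcome(), whose exact type is not shown here.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class OutcomeTally {
    // Local stand-in mirroring the documented outcome names (not the framework's type).
    enum Outcome { SUCCESS, FAILURE, ERROR, SKIPPED_MAX_ITERATIONS }

    // Counts how many times each outcome occurs; outcomes that never occur are absent.
    static Map<Outcome, Integer> tally(List<Outcome> outcomes) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (Outcome outcome : outcomes) {
            counts.merge(outcome, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Outcome> observed = List.of(Outcome.SUCCESS, Outcome.ERROR, Outcome.SUCCESS);
        System.out.println(tally(observed)); // {SUCCESS=2, ERROR=1}
    }
}
```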


Tool Metrics

In addition to execution metrics, individual tool executions can be instrumented with the pluggable ToolMetrics interface. Every tool that extends AbstractAgentTool is automatically instrumented.

How tool metrics work

When a tool is executed, AbstractAgentTool.execute() automatically records:

  • Success counter -- incremented when doExecute() returns a successful ToolResult
  • Failure counter -- incremented when doExecute() returns a failed ToolResult
  • Error counter -- incremented when doExecute() throws an uncaught exception
  • Duration timer -- recorded on every execution regardless of outcome

All measurements are tagged with the tool name and the agent role that invoked the tool.
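The recording behavior described above follows the template-method pattern: a final execute() wraps the subclass's doExecute(). The sketch below shows that pattern with stand-in types only; Result, Recorder, and InstrumentedTool are hypothetical and are not AgentEnsemble classes. Converting an uncaught exception into a failed result is also an assumption of this sketch.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; none of these are AgentEnsemble classes.
final class InstrumentationSketch {

    // Minimal result type: a success flag plus a message.
    static final class Result {
        final boolean success;
        final String message;
        Result(boolean success, String message) {
            this.success = success;
            this.message = message;
        }
    }

    // Collects measurements as "kind:tool:role" strings so they are easy to inspect.
    static final class Recorder {
        final List<String> events = new ArrayList<>();
        void count(String kind, String tool, String role) {
            events.add(kind + ":" + tool + ":" + role);
        }
        void time(String tool, String role, Duration elapsed) {
            events.add("duration:" + tool + ":" + role); // elapsed value ignored in this sketch
        }
    }

    abstract static class InstrumentedTool {
        private final String name;
        private final String role;
        private final Recorder recorder;

        InstrumentedTool(String name, String role, Recorder recorder) {
            this.name = name;
            this.role = role;
            this.recorder = recorder;
        }

        // Template method: wraps doExecute() with counters and a timer.
        final Result execute(String input) {
            long start = System.nanoTime();
            try {
                Result result = doExecute(input);
                recorder.count(result.success ? "success" : "failure", name, role);
                return result;
            } catch (RuntimeException e) {
                recorder.count("error", name, role);
                // Assumption: turn the exception into a failed result.
                return new Result(false, "Error: " + e.getMessage());
            } finally {
                // Runs on every path, so a duration is recorded regardless of outcome.
                recorder.time(name, role, Duration.ofNanos(System.nanoTime() - start));
            }
        }

        protected abstract Result doExecute(String input);
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        InstrumentedTool failing = new InstrumentedTool("inventory", "Researcher", recorder) {
            @Override
            protected Result doExecute(String input) {
                throw new IllegalStateException("backend unavailable");
            }
        };
        Result result = failing.execute("sku-1");
        System.out.println(result.message);   // Error: backend unavailable
        System.out.println(recorder.events);  // [error:inventory:Researcher, duration:inventory:Researcher]
    }
}
```

Placing the timer in a finally block is what guarantees a duration sample for successes, failures, and errors alike.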

Tools can also record custom measurements using the metrics() accessor:

public class InventoryTool extends AbstractAgentTool {
    @Override
    protected ToolResult doExecute(String input) {
        metrics().incrementCounter("inventory.cache.hit", agentRole());
        // ... execute the tool and assign its output to a local variable named result
        return ToolResult.success(result);
    }
}

Micrometer integration

Use the agentensemble-metrics-micrometer module to export tool metrics to any Micrometer-compatible registry (Prometheus, Datadog, CloudWatch, etc.):

MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

Ensemble ensemble = Ensemble.builder()
    .toolMetrics(new MicrometerToolMetrics(registry))
    .build();

Custom tool metrics implementation

Implement ToolMetrics directly for custom backends:

public class MyToolMetrics implements ToolMetrics {
    @Override
    public void incrementSuccess(String toolName, String agentRole) {
        // record success
    }
    @Override
    public void recordDuration(String toolName, String agentRole, Duration duration) {
        // record duration
    }
    // ... other methods
}

Ensemble.builder()
    .toolMetrics(new MyToolMetrics())
    .build();
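For tests or local debugging, a thread-safe in-memory backend is often enough. The sketch below implements only the two methods whose signatures appear above and deliberately does not declare implements ToolMetrics, so that it compiles on its own; to plug it into an ensemble you would add that clause and the interface's remaining methods. The class name and the query methods (successCount, totalDuration) are hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Standalone sketch; add "implements ToolMetrics" plus the remaining methods to use it for real.
final class InMemoryToolMetrics {
    private final Map<String, LongAdder> successCounts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    public void incrementSuccess(String toolName, String agentRole) {
        successCounts.computeIfAbsent(key(toolName, agentRole), k -> new LongAdder()).increment();
    }

    public void recordDuration(String toolName, String agentRole, Duration duration) {
        totalNanos.computeIfAbsent(key(toolName, agentRole), k -> new LongAdder()).add(duration.toNanos());
    }

    // Query helpers for assertions and debugging output.
    long successCount(String toolName, String agentRole) {
        LongAdder adder = successCounts.get(key(toolName, agentRole));
        return adder == null ? 0L : adder.sum();
    }

    Duration totalDuration(String toolName, String agentRole) {
        LongAdder adder = totalNanos.get(key(toolName, agentRole));
        return adder == null ? Duration.ZERO : Duration.ofNanos(adder.sum());
    }

    // Measurements are keyed by tool name and agent role, matching the tagging described earlier.
    private static String key(String toolName, String agentRole) {
        return toolName + "|" + agentRole;
    }

    public static void main(String[] args) {
        InMemoryToolMetrics metrics = new InMemoryToolMetrics();
        metrics.incrementSuccess("search", "Researcher");
        metrics.recordDuration("search", "Researcher", Duration.ofMillis(40));
        metrics.recordDuration("search", "Researcher", Duration.ofMillis(60));
        System.out.println(metrics.successCount("search", "Researcher"));   // 1
        System.out.println(metrics.totalDuration("search", "Researcher"));  // PT0.1S
    }
}
```

ConcurrentHashMap with LongAdder keeps updates cheap when several tools run concurrently, which matters if tasks execute in parallel.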