Metrics and Observability¶
AgentEnsemble provides two layers of observability: execution metrics (token counts,
timing, costs) available on every run result, and tool metrics (pluggable per-tool
counters and timers via the ToolMetrics interface).
Execution Metrics¶
Every EnsembleOutput carries an ExecutionMetrics object and every TaskOutput carries
a TaskMetrics object. These are populated automatically -- no configuration required.
EnsembleOutput output = ensemble.run();
// Per-run totals
ExecutionMetrics metrics = output.getMetrics();
System.out.println("Total tokens: " + metrics.getTotalTokens());
System.out.println("Input tokens: " + metrics.getTotalInputTokens());
System.out.println("Output tokens: " + metrics.getTotalOutputTokens());
System.out.println("LLM latency: " + metrics.getTotalLlmLatency());
System.out.println("Tool exec time: " + metrics.getTotalToolExecutionTime());
System.out.println("LLM calls: " + metrics.getTotalLlmCallCount());
// Per-task breakdown
for (TaskOutput task : output.getTaskOutputs()) {
TaskMetrics tm = task.getMetrics();
System.out.printf("[%s] tokens=%d (in=%d out=%d) llm=%s tools=%s%n",
task.getAgentRole(),
tm.getTotalTokens(),
tm.getInputTokens(),
tm.getOutputTokens(),
tm.getLlmLatency(),
tm.getToolExecutionTime());
}
Token counts¶
Token counts are sourced from ChatResponse.tokenUsage(). When the LLM provider
does not return usage metadata, token fields are -1 (unknown) rather than 0.
A value of 0 means zero tokens were used, not that the count is unavailable.
long inputTokens = task.getMetrics().getInputTokens();
if (inputTokens < 0) {
System.out.println("Token usage not available for this provider");
} else {
System.out.println("Input tokens: " + inputTokens);
}
When any task in the run has unknown token counts, the aggregate
ExecutionMetrics.getTotalTokens() is also -1.
Timing breakdown¶
TaskMetrics tracks four distinct timings:
| Field | Description |
|---|---|
llmLatency |
Cumulative time waiting for LLM responses across all ReAct iterations |
toolExecutionTime |
Cumulative time executing tools (excluding wait for LLM) |
promptBuildTime |
Time building system + user prompts before the first LLM call |
memoryRetrievalTime |
Time querying long-term and entity memory stores |
All durations use java.time.Duration. Use .toMillis(), .toSeconds(), or
.toString() to format them.
Cost Estimation¶
Provide per-token rates and the framework multiplies them by the actual token counts.
Ensemble ensemble = Ensemble.builder()
.agent(researcher)
.task(researchTask)
.costConfiguration(CostConfiguration.builder()
.inputTokenRate(new BigDecimal("0.0000025")) // $2.50 / 1M input tokens
.outputTokenRate(new BigDecimal("0.0000100")) // $10.00 / 1M output tokens
.currency("USD")
.build())
.build();
EnsembleOutput output = ensemble.run();
// Per-run cost
CostEstimate total = output.getMetrics().getTotalCostEstimate();
if (total != null) {
System.out.printf("Run cost: $%.6f (in=%.6f out=%.6f)%n",
total.getTotalCost(),
total.getInputCost(),
total.getOutputCost());
}
// Per-task cost
for (TaskOutput task : output.getTaskOutputs()) {
CostEstimate cost = task.getMetrics().getCostEstimate();
if (cost != null) {
System.out.printf("[%s] $%.6f%n", task.getAgentRole(), cost.getTotalCost());
}
}
Cost estimation requires that the LLM provider returns token usage. When token counts
are -1, getCostEstimate() returns null rather than an incorrect zero.
Execution Trace¶
Every run produces a complete ExecutionTrace -- a hierarchical record of every LLM
interaction, every tool call with its input and output, all prompts sent, and delegation
chains. This is the primary resource for post-mortem debugging and analysis.
EnsembleOutput output = ensemble.run();
ExecutionTrace trace = output.getTrace();
System.out.println("Run ID: " + trace.getEnsembleId());
System.out.println("Workflow: " + trace.getWorkflow());
System.out.println("Duration: " + trace.getTotalDuration());
// Inspect each task's LLM interactions
for (TaskTrace task : trace.getTaskTraces()) {
System.out.printf("Task [%s]: %d LLM call(s)%n",
task.getAgentRole(), task.getLlmInteractions().size());
for (LlmInteraction interaction : task.getLlmInteractions()) {
System.out.printf(" Iteration %d: %s, %dms, %d tool call(s)%n",
interaction.getIterationIndex(),
interaction.getResponseType(),
interaction.getLatency().toMillis(),
interaction.getToolCalls().size());
}
}
Export to JSON¶
The trace serializes to pretty-printed JSON with a single method call. All
Instant fields are ISO-8601 strings and Duration fields are ISO-8601 duration
strings (PT12.345S).
// Get as JSON string
String json = output.getTrace().toJson();
// Write to a file
output.getTrace().toJson(Path.of("run-trace.json"));
Automatic export¶
Register a traceExporter on the ensemble to automatically export after every run:
Ensemble ensemble = Ensemble.builder()
.agent(researcher)
.task(researchTask)
// Write each run to traces/<ensembleId>.json
.traceExporter(new JsonTraceExporter(Path.of("traces/")))
.build();
JsonTraceExporter supports two modes:
- Directory mode (default): each run writes {ensembleId}.json inside the directory
- File mode: always overwrites the same file -- useful for single-run pipelines
// Directory mode (each run = new file)
new JsonTraceExporter(Path.of("traces/"))
// File mode (always overwrites)
new JsonTraceExporter(Path.of("run-trace.json"), false)
Implement ExecutionTraceExporter to send traces to any destination:
Ensemble.builder()
.traceExporter(trace -> {
myObservabilityApi.ingest(trace.toJson());
})
.build();
Trace structure¶
The trace is organized as a hierarchy:
ExecutionTrace
schemaVersion, ensembleId, workflow
startedAt, completedAt, totalDuration
inputs (template variables)
agents[] (role, goal, toolNames, allowDelegation)
taskTraces[]
agentRole, taskDescription, duration
prompts (systemPrompt, userPrompt)
llmInteractions[]
iterationIndex, latency, inputTokens, outputTokens
responseType (TOOL_CALLS or FINAL_ANSWER)
responseText (on FINAL_ANSWER)
toolCalls[]
toolName, arguments, result, duration, outcome
delegations[] (for peer delegation)
finalOutput, parsedOutput
metrics (TaskMetrics)
metrics (ExecutionMetrics)
totalCostEstimate
errors[]
Accessing prompt content¶
The exact prompts sent to the LLM are captured on each TaskTrace:
for (TaskTrace task : trace.getTaskTraces()) {
TaskPrompts prompts = task.getPrompts();
System.out.println("=== System prompt ===");
System.out.println(prompts.getSystemPrompt());
System.out.println("=== User prompt ===");
System.out.println(prompts.getUserPrompt());
}
Inspecting tool calls¶
Every tool invocation is recorded with its arguments, result, timing, and outcome:
for (TaskTrace task : trace.getTaskTraces()) {
for (LlmInteraction iter : task.getLlmInteractions()) {
for (ToolCallTrace tool : iter.getToolCalls()) {
System.out.printf("[%s] %s(%s) -> %s [%dms, %s]%n",
task.getAgentRole(),
tool.getToolName(),
tool.getArguments(),
tool.getResult(),
tool.getDuration().toMillis(),
tool.getOutcome());
}
}
}
Tool call outcomes:
- SUCCESS -- tool returned a successful ToolResult
- FAILURE -- tool returned a failed ToolResult (error message begins with "Error: ")
- ERROR -- tool threw an uncaught exception
- SKIPPED_MAX_ITERATIONS -- tool was not executed because the iteration limit was reached
Tool Metrics¶
In addition to execution metrics, individual tool executions can be instrumented with the
pluggable ToolMetrics interface. Every tool that extends AbstractAgentTool is
automatically instrumented.
How tool metrics work¶
When a tool is executed, AbstractAgentTool.execute() automatically records:
- Success counter -- incremented when
doExecute()returns a successfulToolResult - Failure counter -- incremented when
doExecute()returns a failedToolResult - Error counter -- incremented when
doExecute()throws an uncaught exception - Duration timer -- recorded on every execution regardless of outcome
All measurements are tagged with the tool name and the agent role that invoked the tool.
Tools can also record custom measurements using the metrics() accessor:
public class InventoryTool extends AbstractAgentTool {
@Override
protected ToolResult doExecute(String input) {
metrics().incrementCounter("inventory.cache.hit", agentRole());
// ... execute tool
return ToolResult.success(result);
}
}
Micrometer integration¶
Use the agentensemble-metrics-micrometer module to export tool metrics to any
Micrometer-compatible registry (Prometheus, Datadog, CloudWatch, etc.):
MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
Ensemble ensemble = Ensemble.builder()
.toolMetrics(new MicrometerToolMetrics(registry))
.build();
Custom tool metrics implementation¶
Implement ToolMetrics directly for custom backends:
public class MyToolMetrics implements ToolMetrics {
@Override
public void incrementSuccess(String toolName, String agentRole) {
// record success
}
@Override
public void recordDuration(String toolName, String agentRole, Duration duration) {
// record duration
}
// ... other methods
}
Ensemble.builder()
.toolMetrics(new MyToolMetrics())
.build();