27 - Viz Observability: Tool & Agent I/O Visibility
Status: Complete (all five issues implemented)
Version: 1.1.0
Date: 2026-03-31
1. Motivation
The agentensemble-viz live dashboard shows the flow of execution -- which tasks
are running, which agents are active, when tools are called, and when tasks complete.
However, it lacks insight into the content of those interactions:
- What arguments were passed to a tool? The viz shows "calculator called" but not that {"expression": "42 * 17"} was the input.
- What did the tool return? The viz shows duration and name but not the result text.
- What was the agent asked to do? The task description is visible, but the full assembled context (upstream outputs, expected output, agent goal/background, available tools) is not surfaced as a first-class concept.
- What did the LLM actually say? The iteration events carry the full message buffer and response text, but the viz does not render them in an inspectable conversation view.
- Late-joining clients miss conversation history. LLM iteration messages are broadcast as ephemeral events, so a user who opens the dashboard mid-execution sees no prior reasoning chain.
This design addresses all five gaps with three Java-side changes and two viz-side changes, organized as five issues (IO-001 through IO-005):
- #285 IO-001: Enrich ToolCallEvent with task context and outcome
- #286 IO-002: Add TaskInputEvent for first-class agent input capture
- #287 IO-003: Persist LLM iteration data in late-join snapshots
- #288 IO-004: Viz tool call detail panel with formatted I/O
- #289 IO-005: Viz agent conversation thread view
2. Current State
2.1 Data Already Flowing
The Java event system and WebSocket protocol already carry rich I/O data:
| Data | Java Event Field | Wire Message Field |
|---|---|---|
| Tool arguments | ToolCallEvent.toolArguments() | ToolCalledMessage.toolArguments |
| Tool result text | ToolCallEvent.toolResult() | ToolCalledMessage.toolResult |
| Tool structured result | ToolCallEvent.structuredResult() | ToolCalledMessage.structuredResult |
| LLM input (message buffer) | LlmIterationStartedEvent.messages() | LlmIterationStartedMessage.messages |
| LLM response text | LlmIterationCompletedEvent.responseText() | LlmIterationCompletedMessage.responseText |
| LLM tool requests | LlmIterationCompletedEvent.toolRequests() | LlmIterationCompletedMessage.toolRequests |
| Task output | TaskCompleteEvent.taskOutput() | TaskCompletedMessage.output |
2.2 Gaps
| Gap | Root Cause |
|---|---|
| Tool calls not associated with tasks | ToolCallEvent has no taskIndex; listener hardcodes 0 |
| Tool success/failure not distinguished | ToolCallEvent has no outcome; listener sends null |
| No first-class "agent input" concept | No event captures the assembled task context before the first LLM call |
| Late-join clients miss conversations | LlmIterationStarted/CompletedMessage sent via broadcastEphemeral() |
| Viz does not render tool I/O details | Data arrives but no UI component displays it |
| Viz does not render conversation threads | Data arrives but no dedicated conversation view exists |
3. Design
3.1 IO-001: Enrich ToolCallEvent
Module: agentensemble-core, agentensemble-web
Add two fields to ToolCallEvent:
import java.time.Duration;

public record ToolCallEvent(
    String toolName,
    String toolArguments,
    String toolResult,
    Object structuredResult,
    String agentRole,
    Duration duration,
    int taskIndex,  // NEW: index of the task that issued the call
    String outcome  // NEW: "SUCCESS" or "FAILURE"
) {}
Emission changes in AgentExecutor:
- executeSingleTool(): wrap tool execution in try/catch; set outcome based on whether the tool threw. Pass the task index from the execution context.
- executeParallelTools(): same pattern for parallel execution.
Wire message update:
WebSocketStreamingListener.onToolCall() uses event.taskIndex() and event.outcome()
instead of the current hardcoded 0 and null.
ToolCalledMessage already has taskIndex and outcome fields -- no wire format change
needed.
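Backward compatibility: The record gains two trailing components. Existing
EnsembleListener.onToolCall() implementations receive the enriched event without code
changes since they already accept ToolCallEvent. Code that constructs ToolCallEvent
directly (for example, test fixtures) must pass the two new arguments, because a
record's canonical constructor changes with its component list.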
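The try/catch pattern described for executeSingleTool() could look roughly like the sketch below. It is illustrative only: the method signature, the Function-based tool stand-in, and the "Error: ..." result text are assumptions, since the real AgentExecutor internals are not shown in this document. Only the ToolCallEvent shape is taken from the record above.

```java
import java.time.Duration;
import java.util.function.Function;

final class ToolOutcomeSketch {
    // Same component list as the enriched record in this section.
    record ToolCallEvent(String toolName, String toolArguments, String toolResult,
                         Object structuredResult, String agentRole, Duration duration,
                         int taskIndex, String outcome) {}

    // Hypothetical stand-in for AgentExecutor.executeSingleTool(); the tool is
    // modelled as a plain Function so the sketch stays self-contained.
    static ToolCallEvent executeSingleTool(String toolName, String args, String agentRole,
                                           int taskIndex, Function<String, String> tool) {
        long start = System.nanoTime();
        String result;
        String outcome;
        try {
            result = tool.apply(args);
            outcome = "SUCCESS";
        } catch (RuntimeException e) {
            // Assumption: the failure message is surfaced as the result text.
            result = "Error: " + e.getMessage();
            outcome = "FAILURE";
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        return new ToolCallEvent(toolName, args, result, null, agentRole,
                                 elapsed, taskIndex, outcome);
    }

    public static void main(String[] args) {
        ToolCallEvent ok = executeSingleTool(
            "calculator", "{\"expression\": \"42 * 17\"}", "Analyst", 2, a -> "714");
        ToolCallEvent failed = executeSingleTool(
            "calculator", "{}", "Analyst", 2,
            a -> { throw new IllegalArgumentException("missing expression"); });
        System.out.println(ok.outcome() + " task=" + ok.taskIndex());
        System.out.println(failed.outcome() + " " + failed.toolResult());
    }
}
```

Because the outcome is decided at the single point where the tool is invoked, the event always carries a value from the closed set, and the listener no longer needs to send null.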
3.2 IO-002: TaskInputEvent
Module: agentensemble-core, agentensemble-web, agentensemble-viz
New event capturing the fully assembled agent input at task start:
import java.util.List;

public record TaskInputEvent(
    int taskIndex,
    String taskDescription,
    String expectedOutput,
    String agentRole,
    String agentGoal,
    String agentBackground,
    List<String> toolNames,
    String assembledContext
) {}
assembledContext is the complete prompt context string built from upstream task outputs and any additional context configured on the task.
Listener extension:
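One way to extend the listener without breaking existing implementations is a no-op default method. The sketch below assumes the hook is named onTaskInput and that EnsembleListener uses default methods for its callbacks; both are assumptions made for illustration, and the interface here is a simplified stand-in rather than the real one.

```java
import java.util.ArrayList;
import java.util.List;

final class TaskInputListenerSketch {
    // Trimmed-down event used only to show a pre-existing callback.
    record ToolCallEvent(String toolName, int taskIndex, String outcome) {}

    // Same component list as TaskInputEvent in this section.
    record TaskInputEvent(int taskIndex, String taskDescription, String expectedOutput,
                          String agentRole, String agentGoal, String agentBackground,
                          List<String> toolNames, String assembledContext) {}

    // Simplified stand-in for EnsembleListener.
    interface EnsembleListener {
        default void onToolCall(ToolCallEvent event) {}

        // Assumed hook name; the no-op default keeps older listeners compiling.
        default void onTaskInput(TaskInputEvent event) {}
    }

    // A listener written before the new hook existed: it overrides nothing new.
    static final class ToolOnlyListener implements EnsembleListener {
        final List<String> seen = new ArrayList<>();

        @Override
        public void onToolCall(ToolCallEvent event) {
            seen.add("tool:" + event.toolName());
        }
    }

    // A listener that opts in to the new event.
    static final class InputListener implements EnsembleListener {
        final List<String> seen = new ArrayList<>();

        @Override
        public void onTaskInput(TaskInputEvent event) {
            seen.add("input:" + event.taskIndex() + ":" + event.agentRole());
        }
    }

    // Hypothetical dispatch helper: notifies every registered listener.
    static void fireTaskInput(List<EnsembleListener> listeners, TaskInputEvent event) {
        for (EnsembleListener listener : listeners) {
            listener.onTaskInput(event);
        }
    }

    public static void main(String[] args) {
        ToolOnlyListener legacy = new ToolOnlyListener();
        InputListener aware = new InputListener();
        TaskInputEvent event = new TaskInputEvent(
            1, "Summarize the report", "A short summary", "Writer",
            "Write clearly", null, List.of("calculator"), "upstream output text");
        fireTaskInput(List.of(legacy, aware), event);
        System.out.println(legacy.seen + " " + aware.seen);
    }
}
```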
Firing point: In AgentExecutor, after assembling the context (merging upstream
outputs into the task context) but before the first LLM call. This is distinct from
TaskStartEvent which fires earlier at the workflow level.
Wire protocol:
New TaskInputMessage implementing ServerMessage:
import java.time.Instant;
import java.util.List;

public record TaskInputMessage(
    int taskIndex,
    String taskDescription,
    String expectedOutput,
    String agentRole,
    String agentGoal,
    String agentBackground,
    List<String> toolNames,
    String assembledContext,
    Instant sentAt
) implements ServerMessage {}
JSON type name: task_input.
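The event-to-wire mapping is a field-for-field copy plus a send timestamp. The sketch below assumes a static toMessage helper and an injected Clock, neither of which is named in this document; how the task_input type name is attached during JSON serialization is also left out, so the TYPE constant is purely illustrative.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;

final class TaskInputWireSketch {
    // Same component list as TaskInputEvent in this section.
    record TaskInputEvent(int taskIndex, String taskDescription, String expectedOutput,
                          String agentRole, String agentGoal, String agentBackground,
                          List<String> toolNames, String assembledContext) {}

    // Marker interface standing in for the real ServerMessage.
    interface ServerMessage {}

    record TaskInputMessage(int taskIndex, String taskDescription, String expectedOutput,
                            String agentRole, String agentGoal, String agentBackground,
                            List<String> toolNames, String assembledContext,
                            Instant sentAt) implements ServerMessage {
        // Illustrative constant; the real type discriminator mechanism is not shown here.
        static final String TYPE = "task_input";
    }

    // Hypothetical helper: copies every event field and stamps the send time.
    static TaskInputMessage toMessage(TaskInputEvent event, Clock clock) {
        return new TaskInputMessage(
            event.taskIndex(),
            event.taskDescription(),
            event.expectedOutput(),
            event.agentRole(),
            event.agentGoal(),
            event.agentBackground(),            // may be null, matching the viz type
            List.copyOf(event.toolNames()),     // defensive copy of the tool list
            event.assembledContext(),
            clock.instant());
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2026-03-31T00:00:00Z"), ZoneOffset.UTC);
        TaskInputEvent event = new TaskInputEvent(
            0, "Research the topic", "Bullet list of findings", "Researcher",
            "Find accurate sources", null, List.of("calculator"), "");
        TaskInputMessage message = toMessage(event, fixed);
        System.out.println(TaskInputMessage.TYPE + " " + message);
    }
}
```

Passing the clock in rather than calling Instant.now() directly keeps sentAt deterministic in tests such as the WebSocketStreamingListenerTest mentioned in the testing strategy.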
Viz types:
export interface TaskInputMessage {
  type: 'task_input';
  taskIndex: number;
  taskDescription: string;
  expectedOutput: string;
  agentRole: string;
  agentGoal: string;
  agentBackground: string | null;
  toolNames: string[];
  assembledContext: string;
  sentAt: string;
}
Reducer: Store on LiveTask as taskInput?: TaskInputMessage.
3.3 IO-003: Persist LLM Iterations in Late-Join Snapshots
Module: agentensemble-web
Problem: broadcastEphemeral() means late-joining clients never see prior LLM
iterations. The conversation history is lost.
Solution:
- WebSocketStreamingListener maintains a per-task ring buffer of the last N iteration pairs (LlmIterationStarted + LlmIterationCompleted), where N defaults to 5 and is configurable via WebDashboard.builder().maxSnapshotIterations(int).
- The hello snapshot message gains an optional recentIterations field:
public record HelloMessage(
    // ... existing fields ...
    List<IterationSnapshot> recentIterations  // NEW
) implements ServerMessage {}
Where IterationSnapshot pairs a started + completed message for one iteration.
- When a new client connects and receives the hello message, the viz liveReducer hydrates the conversation state from recentIterations.
- The ring buffer is cleared when a new ensemble run starts.
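The per-task ring buffer can be sketched with one bounded deque per task index. This is a minimal sketch under stated assumptions: the class name, the add/snapshot/clear method names, and the simplified IterationSnapshot fields are hypothetical, and the real snapshot pairs the full started and completed messages rather than the two fields used here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IterationSnapshotBufferSketch {
    // Simplified stand-in: the real IterationSnapshot pairs started + completed messages.
    record IterationSnapshot(int taskIndex, int iteration, String responseText) {}

    private final int maxPerTask;
    // TreeMap keeps tasks in index order so snapshots are deterministic.
    private final Map<Integer, Deque<IterationSnapshot>> byTask = new TreeMap<>();

    IterationSnapshotBufferSketch(int maxPerTask) {
        if (maxPerTask < 1) {
            throw new IllegalArgumentException("maxPerTask must be >= 1");
        }
        this.maxPerTask = maxPerTask;
    }

    // Appends an iteration, evicting the oldest entry for that task when full.
    synchronized void add(IterationSnapshot snapshot) {
        Deque<IterationSnapshot> queue =
            byTask.computeIfAbsent(snapshot.taskIndex(), k -> new ArrayDeque<>());
        if (queue.size() == maxPerTask) {
            queue.removeFirst();
        }
        queue.addLast(snapshot);
    }

    // Returns a copy suitable for the recentIterations field of the hello message.
    synchronized List<IterationSnapshot> snapshot() {
        List<IterationSnapshot> out = new ArrayList<>();
        for (Deque<IterationSnapshot> queue : byTask.values()) {
            out.addAll(queue);
        }
        return out;
    }

    // Called when a new ensemble run starts.
    synchronized void clear() {
        byTask.clear();
    }

    public static void main(String[] args) {
        IterationSnapshotBufferSketch buffer = new IterationSnapshotBufferSketch(5);
        for (int i = 1; i <= 7; i++) {
            buffer.add(new IterationSnapshot(0, i, "response " + i));
        }
        buffer.add(new IterationSnapshot(1, 1, "other task"));
        System.out.println(buffer.snapshot());
        buffer.clear();
        System.out.println(buffer.snapshot().size());
    }
}
```

The cap applies per task, so the worst-case memory held for late joiners grows with the number of tasks times N rather than with the total number of iterations in the run.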
3.4 IO-004: Viz Tool Call Detail Panel -- COMPLETE
Module: agentensemble-viz
Expand tool call entries in the live conversation/timeline view:
- Expandable card: Click a tool call to expand and see:
  - toolArguments rendered as syntax-highlighted JSON
  - toolResult rendered as formatted text (auto-detected JSON gets highlighting)
  - structuredResult rendered as a collapsible JSON tree if present
  - outcome badge: green "SUCCESS" or red "FAILURE"
  - durationMs shown as a timing badge
- Task association: Group tool calls under their parent task using taskIndex
- Collapsed state: Shows tool name, outcome badge, and duration as a compact row
3.5 IO-005: Viz Agent Conversation Thread View -- COMPLETE
Module: agentensemble-viz
New AgentConversationPanel component showing the full ReAct reasoning chain per task:
- Header: Task input summary from TaskInputEvent -- agent role, goal, available tools, assembled context (collapsible)
- Iteration cards: Each LLM iteration rendered as:
  - Input section: Messages sent to the LLM (system prompt, user message, prior assistant/tool messages) -- rendered as chat bubbles with role labels
  - Output section: LLM response -- either final answer text or tool call requests
  - Tool results: If the iteration produced tool calls, show the tool results inline before the next iteration
  - Metadata: Token usage (input/output) and latency per iteration
- Navigation: Click an iteration to jump to it; current iteration highlighted
4. Issue Dependency Graph
IO-001 (enrich ToolCallEvent) --+
                                +--> IO-004 (viz tool panel)
IO-002 (TaskInputEvent)       --+
                                +--> IO-005 (viz conversation view)
IO-003 (late-join snapshots)  --+
IO-001, IO-002, and IO-003 are independent Java-side changes. IO-004 and IO-005 depend on the Java work being complete.
5. Testing Strategy
IO-001
- Unit: ToolCallEventTest verifying new fields; AgentExecutorTest verifying taskIndex and outcome are populated correctly for both success and failure cases
- Integration: WebSocketStreamingListenerTest verifying the wire message contains real taskIndex and outcome values
IO-002
- Unit: TaskInputEventTest; verify event fired from AgentExecutor with correct assembled context
- Integration: WebSocketStreamingListenerTest verifying TaskInputMessage broadcast; end-to-end test that the event fires during a real ensemble run
IO-003
- Unit: Ring buffer capacity and eviction; HelloMessage serialization with recentIterations
- Integration: Connect a second WebSocket client mid-execution and verify it receives iteration history in the hello snapshot
IO-004 / IO-005
- Unit: Component tests with mock data verifying rendering of tool arguments, tool results, conversation messages, token badges
- E2E: Playwright test verifying tool detail expansion and conversation thread rendering during a live ensemble run
6. Design Decisions
| Decision | Rationale |
|---|---|
| Add fields to existing ToolCallEvent rather than a new event | The information is intrinsically part of the tool call; a separate event would fragment the lifecycle |
| TaskInputEvent is separate from TaskStartEvent | TaskStartEvent fires at the workflow level before context assembly; TaskInputEvent fires at the executor level after context is assembled -- they serve different purposes |
| Ring buffer for iteration snapshots rather than full history | Unbounded storage of all iterations could consume significant memory for long-running ensembles; a configurable cap keeps it bounded |
| Ephemeral flag retained for real-time iteration messages | Late-join snapshot handles history; real-time messages still skip the snapshot buffer for performance |
| outcome as String rather than enum in the event record | Keeps the callback package free of additional enum types; the values are a closed set ("SUCCESS", "FAILURE") documented in Javadoc |