AgentEnsemble v2.0.0 Architecture Design¶
Overview¶
AgentEnsemble v2.0.0 is a major version that shifts the framework from an agent-first to a task-first paradigm. The central insight is that users care about what work needs to be done, not about constructing agent personas. Agent composition becomes an implementation detail managed invisibly by the framework.
Three new architectural pillars accompany this shift:
- Task-scoped cross-execution memory -- tasks declare named memory scopes that persist across separate ensemble.run() invocations
- Human-in-the-loop review gates -- tasks can pause before, during, or after execution to collect external input (human approval, correction, or clarification)
- Graceful partial results -- EnsembleOutput is redesigned to treat incomplete pipelines as a first-class outcome
Because this is a major release under semantic versioning, all of the above ship as breaking changes, accompanied by a migration guide. No compatibility shim is provided.
1. Paradigm: Task-First, Agent-Invisible¶
Motivation¶
The v1.x API requires users to think in terms of three tightly coupled concepts:
// v1.x: define agent, define task, wire them together, define ensemble
Agent researcher = Agent.builder()
.role("Researcher").goal("...").backstory("...")
.chatLanguageModel(model).tools(webSearchTool).build();
Task task = Task.builder()
.description("Research AI trends").expectedOutput("Summary")
.agent(researcher)
.build();
Ensemble.builder().agents(researcher).tasks(task)
.workflow(Workflow.SEQUENTIAL).build().run();
For the majority of use cases the Agent object is accidental complexity. Users think
"I need research done, then a report written" -- not "I need a Researcher persona with a
specific backstory, then I wire it to a task."
v2.0 API: Zero-Ceremony Path¶
// Simplest form: no agents declared
EnsembleOutput output = Ensemble.run(model,
Task.of("Research the latest AI trends in healthcare"),
Task.of("Write a 1000-word blog post from the research")
);
v2.0 API: Configured Path (still no agents)¶
EnsembleOutput output = Ensemble.builder()
.chatLanguageModel(model)
.memoryStore(MemoryStore.embeddings(embeddingModel, embeddingStore))
.reviewHandler(ReviewHandler.console())
.task(Task.builder()
.description("Research AI trends in healthcare")
.expectedOutput("Detailed research summary with citations")
.tools(webSearchTool, webScraperTool)
.outputType(ResearchReport.class)
.memory("market-intel")
.build())
.task(Task.builder()
.description("Write blog post from the research")
.expectedOutput("1000-word blog post in Markdown")
.review(Review.required("Approve the blog post draft"))
.build())
.run();
v2.0 API: Explicit Agent (Power-User Escape Hatch)¶
Agent remains available for users who need explicit persona control, shared agent identity
across multiple tasks, or per-agent LLM selection:
Agent researcher = Agent.builder()
.role("Senior Healthcare Analyst").goal("...").backstory("...")
.chatLanguageModel(anthropicModel)
.build();
Task.builder()
.description("Research AI trends")
.agent(researcher) // opt-in: binds an explicit agent
.build();
Agent Auto-Synthesis¶
When no agent is declared on a task, the framework synthesizes one from the task description. The synthesis strategy is pluggable:
public interface AgentSynthesizer {
Agent synthesize(Task task, SynthesisContext context);
static AgentSynthesizer template() { ... } // default: template-based
static AgentSynthesizer llmBased() { ... } // LLM-generated persona (opt-in)
}
The template-based synthesizer (default) derives role/goal/backstory from the task description using configurable templates. No extra LLM call is required.
The LLM-based synthesizer invokes the LLM to generate an optimal persona for the task. It produces higher-quality personas at the cost of an additional LLM call per task.
Configured at the ensemble level:
Ensemble.builder()
.agentSynthesizer(AgentSynthesizer.template()) // default; explicit for clarity
// or: .agentSynthesizer(AgentSynthesizer.llmBased())
// or: .agentSynthesizer(myCustomSynthesizer)
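To make the template-based strategy concrete, here is a minimal sketch of how role, goal, and backstory could be derived from a task description without an LLM call. The types SketchTask and SketchAgent, the template strings, and the four-word role label are all assumptions made for illustration; they are not the framework's actual classes or templates.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-ins for the framework's Task and Agent types.
record SketchTask(String description) {}
record SketchAgent(String role, String goal, String backstory) {}

final class TemplateSynthesizerSketch {
    // Assumed templates; the real defaults are configurable and may differ.
    private static final String ROLE_TEMPLATE = "Specialist for: %s";
    private static final String GOAL_TEMPLATE = "Complete the task: %s";
    private static final String BACKSTORY_TEMPLATE =
            "You are an experienced professional asked to %s.";

    // Derives a persona purely from the description text (no LLM call).
    static SketchAgent synthesize(SketchTask task) {
        String description = task.description().strip();
        String[] words = description.split("\\s+");
        // Use at most the first four words as a short role label.
        String label = String.join(" ", Arrays.copyOf(words, Math.min(words.length, 4)));
        return new SketchAgent(
                String.format(ROLE_TEMPLATE, label),
                String.format(GOAL_TEMPLATE, description),
                String.format(BACKSTORY_TEMPLATE, lowerFirst(description)));
    }

    private static String lowerFirst(String s) {
        return s.isEmpty() ? s : s.substring(0, 1).toLowerCase(Locale.ROOT) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(synthesize(new SketchTask("Research the latest AI trends in healthcare")));
    }
}
```

A custom synthesizer passed to .agentSynthesizer(...) would presumably follow the same pattern: a pure function from task to persona, which keeps the default path deterministic and cheap.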
What Moves from Agent to Task¶
| Capability | v1.x Location | v2.0 Location |
|---|---|---|
| tools | Agent (required for tool use) | Task (preferred) or Agent (when explicit) |
| chatLanguageModel | Agent (required) | Task (per-task override) or Ensemble (default) |
| guardrails | Task | Task (unchanged) |
| memory | Ensemble / EnsembleMemory | Task (scoped, see section 2) |
| maxIterations | Agent | Task or Ensemble default |
| role, goal, backstory | Agent (required) | Agent (optional; auto-synthesized otherwise) |
Breaking Changes to Ensemble.builder()¶
| v1.x method | v2.0 |
|---|---|
| .agents(agent, ...) | Removed; agents are auto-synthesized or set per-task via Task.agent() |
| .workflow(Workflow.X) | Retained; now optional (inferred by default -- see section 5) |
| .memory(EnsembleMemory) | Replaced by .memoryStore(MemoryStore) + task-level .memory(scope) |
2. Task-Scoped Cross-Execution Memory¶
Problem with v1.x Memory¶
The v1.x memory system (EnsembleMemory, MemoryContext) is configured at the ensemble
level and scoped to the lifetime of a single ensemble.run() invocation. There is no
first-class mechanism for tasks to share knowledge across separate runs.
Named Memory Scopes¶
In v2.0, tasks declare one or more named memory scopes. A scope is a logical namespace within the backing store. Memory is written at the end of task execution and read at the start of the next execution of any task with the same scope.
Task.builder()
.description("Research competitor pricing for Q1")
.memory("competitor-research") // single scope
.build();
Task.builder()
.description("Analyze pricing trends over time")
.memory("competitor-research", "market-trends") // reads from both scopes
.build();
Execution semantics:
- Write: at task completion, the task output (raw text and optional structured output) is stored in each declared scope
- Read: at task startup, relevant entries from each declared scope are retrieved and injected into the agent prompt as context
- Edited output: if a review gate modifies the task output, the edited version is what gets stored in memory
- Isolation: a task can only read from scopes it explicitly declares
MemoryStore SPI¶
The backing store is configured once at the ensemble level. Scopes are logical partitions:
public interface MemoryStore {
void store(String scope, MemoryEntry entry);
List<MemoryEntry> retrieve(String scope, String query, int maxResults);
void evict(String scope, EvictionPolicy policy);
static MemoryStore embeddings(EmbeddingModel model, EmbeddingStore store) { ... }
static MemoryStore inMemory() { ... } // dev / testing
}
Ensemble.builder()
.memoryStore(MemoryStore.embeddings(embeddingModel, store))
.task(task1)
.task(task2)
.run();
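As an illustration of scopes acting as logical partitions, the following sketch shows what a dev/testing store along the lines of MemoryStore.inMemory() might look like. SketchEntry is a simplified stand-in for MemoryEntry, and the keyword-overlap ranking is an assumption chosen to keep the example self-contained; an embeddings-backed store would rank by vector similarity instead.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified, hypothetical stand-in for the MemoryEntry record.
record SketchEntry(String content, Instant storedAt) {}

final class InMemoryStoreSketch {
    // One independent entry list per scope name.
    private final Map<String, List<SketchEntry>> scopes = new ConcurrentHashMap<>();

    void store(String scope, SketchEntry entry) {
        scopes.computeIfAbsent(scope, k -> new CopyOnWriteArrayList<>()).add(entry);
    }

    // Naive relevance: keep entries that contain at least one query word, newest first.
    List<SketchEntry> retrieve(String scope, String query, int maxResults) {
        String[] terms = query.toLowerCase(Locale.ROOT).split("\\s+");
        List<SketchEntry> matches = new ArrayList<>();
        for (SketchEntry entry : scopes.getOrDefault(scope, List.of())) {
            String text = entry.content().toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (!term.isEmpty() && text.contains(term)) {
                    matches.add(entry);
                    break;
                }
            }
        }
        matches.sort((a, b) -> b.storedAt().compareTo(a.storedAt()));
        int limit = Math.max(0, Math.min(maxResults, matches.size()));
        return List.copyOf(matches.subList(0, limit));
    }

    public static void main(String[] args) {
        InMemoryStoreSketch store = new InMemoryStoreSketch();
        store.store("competitor-research", new SketchEntry("Acme cut pricing", Instant.now()));
        System.out.println(store.retrieve("competitor-research", "pricing", 5));
        System.out.println(store.retrieve("market-trends", "pricing", 5)); // other scope: no entries
    }
}
```

Because every read and write is keyed by scope name, a task that does not declare a scope has no path to its entries, which is the isolation property described above.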
Memory Configuration per Scope¶
Tasks can configure scope-specific eviction:
Task.builder()
.description("Research competitor pricing")
.memory(MemoryScope.builder()
.name("competitor-research")
.keepLastEntries(10) // keep last 10 entries
// or: .keepEntriesWithin(Duration.ofDays(30))
.build())
.build();
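The two eviction options shown above (count-based and age-based) can be sketched as pure functions over a scope's entries. This is only an illustration under assumed names: TimedEntry, keepLast, and keepWithin are hypothetical, and the real EvictionPolicy type may be shaped differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a stored memory entry.
record TimedEntry(String content, Instant storedAt) {}

final class EvictionSketch {
    // Count-based: keep only the n most recently stored entries, oldest first.
    static List<TimedEntry> keepLast(List<TimedEntry> entries, int n) {
        List<TimedEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing(TimedEntry::storedAt));
        int from = Math.max(0, sorted.size() - Math.max(0, n));
        return List.copyOf(sorted.subList(from, sorted.size()));
    }

    // Age-based: keep only entries stored within the window that ends at "now".
    static List<TimedEntry> keepWithin(List<TimedEntry> entries, Duration window, Instant now) {
        Instant cutoff = now.minus(window);
        return entries.stream()
                .filter(entry -> !entry.storedAt().isBefore(cutoff))
                .toList();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<TimedEntry> entries = List.of(
                new TimedEntry("old finding", now.minus(Duration.ofDays(45))),
                new TimedEntry("recent finding", now.minus(Duration.ofDays(2))));
        System.out.println(keepLast(entries, 1));
        System.out.println(keepWithin(entries, Duration.ofDays(30), now));
    }
}
```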
MemoryTool¶
Agents can explicitly query or write to memory during execution via a built-in MemoryTool.
This is in addition to the automatic prompt injection at task startup:
Task.builder()
.description("Find relevant past findings and extend them")
.memory("research-history")
.tools(MemoryTool.of("research-history")) // explicit mid-task access
.build();
3. Human-in-the-Loop Review System¶
Concept: Review Gates¶
A review gate is a point in task execution where external input is requested. Execution pauses until input is received or a timeout expires.
Review gates support three timing points:
| Timing | API | Use Case |
|---|---|---|
| Before execution | Task.builder().beforeReview(Review...) | Gate expensive or sensitive tasks |
| During execution | HumanInputTool in the agent's tool list | Agent asks clarifying questions mid-task |
| After execution | Task.builder().review(Review...) | Review output before it passes to the next task |
ReviewHandler SPI¶
public interface ReviewHandler {
/**
* Present a review request to an external source.
* Blocks until a decision is made or the timeout expires.
*/
ReviewDecision review(ReviewRequest request);
}
public sealed interface ReviewDecision permits ReviewDecision.Continue,
ReviewDecision.Edit, ReviewDecision.ExitEarly {
record Continue() implements ReviewDecision {}
record Edit(String revisedOutput) implements ReviewDecision {}
record ExitEarly() implements ReviewDecision {}
}
Built-in implementations:
ReviewHandler.console() // CLI with stdin, countdown timer display
ReviewHandler.web(URI callbackUrl) // Webhook-based (for production deployments)
ReviewHandler.autoApprove() // Testing / CI: always continues immediately
ReviewHandler.autoApproveWithDelay(Duration) // Simulates human review timing in tests
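The three decision variants map naturally onto what the pipeline does next. The sketch below mirrors the sealed-interface shape above with simplified, hypothetical names (Decision, Handler, applyGate) and shows one plausible way an after-execution gate could apply a decision: pass the output through, substitute the edited text, or stop.

```java
import java.util.Optional;

// Mirrors the ReviewDecision shape above under a hypothetical name.
sealed interface Decision permits Decision.Continue, Decision.Edit, Decision.ExitEarly {
    record Continue() implements Decision {}
    record Edit(String revisedOutput) implements Decision {}
    record ExitEarly() implements Decision {}
}

// Simplified stand-in for ReviewHandler; the real SPI takes a ReviewRequest.
interface Handler {
    Decision review(String taskDescription, String taskOutput);
}

final class ReviewGateSketch {
    // Returns the output to hand to the next task, or empty when the reviewer exits early.
    static Optional<String> applyGate(Handler handler, String description, String output) {
        Decision decision = handler.review(description, output);
        if (decision instanceof Decision.Edit edit) {
            return Optional.of(edit.revisedOutput()); // edited text replaces the original
        }
        if (decision instanceof Decision.ExitEarly) {
            return Optional.empty(); // caller stops the pipeline
        }
        return Optional.of(output); // Continue: pass through unchanged
    }

    public static void main(String[] args) {
        Handler autoApprove = (description, output) -> new Decision.Continue();
        System.out.println(applyGate(autoApprove, "Write blog post", "Draft text"));
    }
}
```

An auto-approving handler, as in the main method, is the kind of implementation the testing/CI built-in presumably provides.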
Timeout and Default Actions¶
Every review gate has a configurable timeout with a default action when time expires:
Review.builder()
.prompt("Review the pricing recommendation memo")
.timeout(Duration.ofMinutes(5))
.onTimeout(Review.CONTINUE) // auto-continue when timeout expires
// or: .onTimeout(Review.EXIT_EARLY)
// or: .onTimeout(Review.FAIL)
.build()
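One way to picture the timeout behaviour is a blocking wait with a fallback value. The sketch below uses the JDK's CompletableFuture; Outcome and awaitDecision are hypothetical names, and mapping reviewer errors to FAIL is an assumption, not something the design specifies.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical outcome type covering the three onTimeout options above.
enum Outcome { CONTINUE, EXIT_EARLY, FAIL }

final class ReviewTimeoutSketch {
    // Waits for the reviewer up to the timeout, then falls back to the configured default.
    static Outcome awaitDecision(Supplier<Outcome> reviewer, Duration timeout, Outcome onTimeout) {
        CompletableFuture<Outcome> pending = CompletableFuture.supplyAsync(reviewer);
        try {
            return pending.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future cancelled; it does not interrupt the reviewer thread.
            pending.cancel(true);
            return onTimeout;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAIL; // assumption: treat interruption as failure
        } catch (ExecutionException e) {
            return Outcome.FAIL; // assumption: treat reviewer errors as failure
        }
    }

    public static void main(String[] args) {
        Outcome result = awaitDecision(() -> Outcome.CONTINUE, Duration.ofSeconds(1), Outcome.EXIT_EARLY);
        System.out.println(result);
    }
}
```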
CLI Interaction (ConsoleReviewHandler)¶
=== Task Complete: Draft pricing recommendation memo ===
Output:
Based on competitor analysis, I recommend a 12% price reduction
on the Standard tier and a 5% increase on Enterprise...
[Truncated -- full output in trace]
[c] Continue [e] Edit [x] Exit early (auto-continue in 4:58)
> _
During-Execution Review (HumanInputTool)¶
An agent can request clarification mid-task via a HumanInputTool. The ReviewHandler
surfaces the request to the same CLI/web interface:
Task.builder()
.description("Draft a marketing email")
.tools(HumanInputTool.of()) // agent can pause and ask questions
.build()
During execution, the agent invokes the tool:
=== Input Requested: Draft a marketing email ===
Agent asks: "Should the email focus on the enterprise or SMB segment?"
[type response and press Enter] (auto-skip in 2:00)
> Enterprise segment, especially CISO persona
The timeout for during-execution requests is configurable separately from post-execution review.
Ensemble-Level Review Policy¶
A global ReviewHandler and ReviewPolicy can be set at the ensemble level. Individual
tasks can override:
Ensemble.builder()
.reviewHandler(ReviewHandler.console())
.reviewPolicy(Review.afterEveryTask()) // review each task output
// or: .reviewPolicy(Review.afterLastTask()) // only final output
// or: .reviewPolicy(Review.never()) // fully autonomous (default)
.task(task1)
.task(task2)
.run();
// Individual task overrides
Task.builder()
.description("Execute financial transaction")
.review(Review.required()) // always review, even if ensemble default is never
.build();
Task.builder()
.description("Format the report header")
.review(Review.skip()) // skip, even if the ensemble policy is afterEveryTask
.build();
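The precedence between the ensemble policy and task-level overrides can be summarised as a small decision function. The enum and method names below are hypothetical, and the rule that a task override always wins is inferred from the comments in the example above rather than from a stated specification.

```java
// Hypothetical mirror of Review.afterEveryTask() / afterLastTask() / never().
enum EnsemblePolicy { AFTER_EVERY_TASK, AFTER_LAST_TASK, NEVER }

// Hypothetical mirror of Review.required() / Review.skip() / no task-level setting.
enum TaskOverride { REQUIRED, SKIP, INHERIT }

final class ReviewPolicySketch {
    // Task-level overrides win; otherwise the ensemble policy decides.
    static boolean shouldReview(EnsemblePolicy policy, TaskOverride override, boolean isLastTask) {
        if (override == TaskOverride.REQUIRED) {
            return true;
        }
        if (override == TaskOverride.SKIP) {
            return false;
        }
        return switch (policy) {
            case AFTER_EVERY_TASK -> true;
            case AFTER_LAST_TASK -> isLastTask;
            case NEVER -> false;
        };
    }

    public static void main(String[] args) {
        // A required review fires even under a fully autonomous ensemble policy.
        System.out.println(shouldReview(EnsemblePolicy.NEVER, TaskOverride.REQUIRED, false));
    }
}
```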
4. Partial Results and Graceful Exit-Early¶
Problem with v1.x EnsembleOutput¶
The v1.x EnsembleOutput assumes every task ran to completion. When an error or a
user-triggered exit interrupts the pipeline, users get either a thrown exception or no results at all.
v2.0 EnsembleOutput¶
EnsembleOutput is redesigned to treat partial completion as a first-class, non-error outcome:
EnsembleOutput output = ensemble.run();
output.isComplete(); // true only when all tasks completed
output.completedTasks(); // list of outputs for completed tasks (always safe to call)
output.exitReason(); // COMPLETED | USER_EXIT_EARLY | TIMEOUT | ERROR
output.lastCompletedOutput(); // convenience: last task that finished
output.taskOutputs(); // all completed outputs (unchanged method name)
// Accessing by task identity
output.getOutput(task1); // returns Optional<TaskOutput>
Exit-Early Contract¶
The framework guarantees that any task that completed before an exit-early decision:
- Has its output included in EnsembleOutput.completedTasks()
- Has its output persisted to any declared memory scopes
- Can be inspected via output.getOutput(task)
This enables resumable workflows in future iterations: a subsequent run can read from memory scopes populated by completed tasks in a prior run.
ExitReason¶
public enum ExitReason {
COMPLETED, // all tasks finished normally
USER_EXIT_EARLY, // user selected "exit early" at a review gate
TIMEOUT, // a review gate timeout expired with onTimeout(EXIT_EARLY)
ERROR // an unrecoverable error terminated the pipeline
}
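To show how the exit-early contract and ExitReason fit together, here is a minimal sequential runner that always returns whatever finished. Everything in it is a simplified assumption: tasks are modelled as functions from the previous output to the next, the reviewer as a predicate, and the TIMEOUT reason is omitted for brevity. Null task outputs are not supported in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical, reduced version of ExitReason (TIMEOUT omitted).
enum Reason { COMPLETED, USER_EXIT_EARLY, ERROR }

// Hypothetical, reduced version of EnsembleOutput.
record PartialOutput(List<String> completedTasks, Reason exitReason) {
    boolean isComplete() {
        return exitReason == Reason.COMPLETED;
    }
}

final class PartialResultsSketch {
    // Runs tasks in order; outputs of tasks that finished are kept whatever the exit reason.
    static PartialOutput run(List<UnaryOperator<String>> tasks, Predicate<String> exitEarlyAfter) {
        List<String> completed = new ArrayList<>();
        String previous = "";
        for (UnaryOperator<String> task : tasks) {
            try {
                previous = task.apply(previous);
            } catch (RuntimeException e) {
                // The failing task contributes nothing; earlier outputs survive.
                return new PartialOutput(List.copyOf(completed), Reason.ERROR);
            }
            completed.add(previous);
            // Assumption: the review gate runs after each task, including the last one.
            if (exitEarlyAfter.test(previous)) {
                return new PartialOutput(List.copyOf(completed), Reason.USER_EXIT_EARLY);
            }
        }
        return new PartialOutput(List.copyOf(completed), Reason.COMPLETED);
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> tasks = List.of(p -> "research", p -> p + " -> draft");
        System.out.println(run(tasks, output -> false));
    }
}
```

In the real framework the same point in the loop is presumably where completed outputs are also written to their declared memory scopes, so that a later run can pick them up.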
5. Workflow Inference¶
Default: Sequential by Declaration Order¶
When no workflow is specified, tasks execute sequentially in the order they are declared. Context flows forward automatically.
Inferred Parallelism from Context Declarations¶
When tasks declare explicit context(...) dependencies, the framework constructs a DAG
and executes tasks in parallel where the dependency graph allows:
// Framework infers: tasks 1 and 2 run in parallel, task 3 waits for both
Ensemble.builder()
.chatLanguageModel(model)
.task(Task.builder().description("Analyze market A").build())
.task(Task.builder().description("Analyze market B").build())
.task(Task.builder()
.description("Combine market analyses")
.context("Analyze market A", "Analyze market B")
.build())
.run();
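The inference step can be pictured as grouping tasks into "waves", where each wave contains tasks whose dependencies are all satisfied by earlier waves. The sketch below is one simple way to do that with a repeated ready-set scan; the class and method names are hypothetical, and tasks are identified by description only because the example above does so.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DagInferenceSketch {
    // Maps each task to the tasks it depends on and returns groups that can run in parallel.
    static List<List<String>> waves(Map<String, List<String>> dependsOn) {
        Map<String, List<String>> remaining = new LinkedHashMap<>(dependsOn);
        Set<String> done = new HashSet<>();
        List<List<String>> waves = new ArrayList<>();
        while (!remaining.isEmpty()) {
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : remaining.entrySet()) {
                if (done.containsAll(entry.getValue())) {
                    ready.add(entry.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException(
                        "Cycle or unknown dependency among: " + remaining.keySet());
            }
            ready.forEach(remaining::remove);
            done.addAll(ready);
            waves.add(List.copyOf(ready));
        }
        return waves;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Analyze market A", List.of());
        deps.put("Analyze market B", List.of());
        deps.put("Combine market analyses", List.of("Analyze market A", "Analyze market B"));
        System.out.println(waves(deps));
    }
}
```

A task list with no context(...) declarations would, under the sequential default described earlier, be treated as a chain rather than as a single wave; that fallback is outside this sketch.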
Explicit Override¶
Users who want to force a specific execution strategy can still declare it explicitly:
Ensemble.builder()
.workflow(Workflow.SEQUENTIAL) // force sequential regardless of DAG structure
// or: .workflow(Workflow.PARALLEL) // force all tasks into DAG executor
// or: .workflow(Workflow.HIERARCHICAL) // manager-worker delegation (unchanged semantics)
6. Module Structure¶
The monolithic agentensemble-core is split into focused modules with clear SPI boundaries:
| Module | Contents | Required |
|---|---|---|
| agentensemble-core | Task, Ensemble, Agent, AgentSynthesizer, EnsembleOutput, workflow engine, template resolver, validation, exception hierarchy | Yes |
| agentensemble-memory | MemoryStore SPI, MemoryScope, MemoryEntry, InMemoryStore, EmbeddingStoreLongTermMemory, MemoryTool | Optional |
| agentensemble-review | ReviewHandler SPI, Review, ReviewDecision, ReviewRequest, ConsoleReviewHandler, HumanInputTool | Optional |
| agentensemble-metrics-micrometer | MicrometerToolMetrics (unchanged) | Optional |
| agentensemble-devtools | DagExporter, EnsembleDevTools (unchanged) | Optional |
| agentensemble-tools/* | Built-in tool modules (unchanged) | Optional |
| agentensemble-viz | Visualization tooling (unchanged) | Optional |
| agentensemble-examples | Updated examples (task-first API) | No |
| agentensemble-bom | Bill of Materials for all modules | No (convenience) |
Minimal dependency (core framework only):
implementation("net.agentensemble:agentensemble-core:2.0.0")
Full dependency via BOM:
implementation(platform("net.agentensemble:agentensemble-bom:2.0.0"))
implementation("net.agentensemble:agentensemble-core")
implementation("net.agentensemble:agentensemble-memory")
implementation("net.agentensemble:agentensemble-review")
7. SPI Boundary Definitions¶
These interfaces define the contracts between modules. They must be stable before parallel development begins on each workstream.
AgentSynthesizer (core)¶
package net.agentensemble;
public interface AgentSynthesizer {
Agent synthesize(Task task, SynthesisContext context);
}
public record SynthesisContext(ChatLanguageModel model, Locale locale) {}
MemoryStore (agentensemble-memory)¶
package net.agentensemble.memory;
public interface MemoryStore {
void store(String scope, MemoryEntry entry);
List<MemoryEntry> retrieve(String scope, String query, int maxResults);
void evict(String scope, EvictionPolicy policy);
}
public record MemoryEntry(String content, Object structuredContent,
Instant storedAt, Map<String, String> metadata) {}
ReviewHandler (agentensemble-review)¶
package net.agentensemble.review;
public interface ReviewHandler {
ReviewDecision review(ReviewRequest request);
}
public record ReviewRequest(
String taskDescription,
String taskOutput,
ReviewTiming timing,
Duration timeout
) {}
public enum ReviewTiming { BEFORE_EXECUTION, DURING_EXECUTION, AFTER_EXECUTION }
public sealed interface ReviewDecision permits ReviewDecision.Continue,
ReviewDecision.Edit, ReviewDecision.ExitEarly {
record Continue() implements ReviewDecision {}
record Edit(String revisedOutput) implements ReviewDecision {}
record ExitEarly() implements ReviewDecision {}
}
8. Breaking Changes Summary¶
| Area | v1.x | v2.0 |
|---|---|---|
| Agent declaration | Required for every task | Optional; auto-synthesized when absent |
| Ensemble.builder().agents(...) | Required | Removed |
| Task.agent(...) | Required | Optional |
| Tools | Declared on Agent | Declared on Task (primary); Agent retains when explicit |
| chatLanguageModel | Declared on Agent | Declared on Task (per-task) or Ensemble (default) |
| Memory | Ensemble.memory(EnsembleMemory) -- run-scoped | Task.memory(scope) + Ensemble.memoryStore(...) -- cross-run |
| EnsembleOutput | Assumes full completion | isComplete(), exitReason(), completedTasks() |
| Workflow | Must be explicitly declared | Inferred; explicit override available |
| Module structure | Everything in agentensemble-core | Split into core + memory + review |
9. Implementation Sequencing¶
MapReduceEnsemble (issues #98, #99, #100) ships in the current v1.x API before this work begins. The MapReduce implementation will be refactored as part of the v2.0 migration.
Parallel Workstreams¶
The following groups can be developed in parallel once the SPI contracts (section 7) are agreed and documented:
Group A -- Core refactor (foundation; other groups depend on this)
- Task-first core: Task absorbs tools, LLM, maxIterations from Agent; Ensemble.builder().agents() removed
- Agent auto-synthesis: AgentSynthesizer SPI, template-based default, LLM-based opt-in
Group B -- Memory (parallel with Group A; depends on SPI contract)
- Module split: extract agentensemble-memory from core
- Task-scoped cross-execution memory: named scopes, MemoryStore SPI, TTL/eviction config
Group C -- Review (parallel with Groups A and B; depends on SPI contract)
- Module split: extract agentensemble-review from core
- ReviewHandler SPI + ConsoleReviewHandler: CLI interaction, timeout countdown
- Review gates: before/during/after timing, ensemble-level policy, HumanInputTool
Group D -- Output and Workflow (depends on Group A)
- Partial results: EnsembleOutput redesign, ExitReason, exit-early memory persistence guarantee
- Workflow inference: DAG inference from context() declarations, explicit override preserved
Group E -- MapReduce refactor (depends on Group A + #98-100 merging)
- Rework MapReduceEnsemble to task-first paradigm
Group F -- Finalization (depends on Groups A + B + C)
- BOM module: agentensemble-bom covering all v2.0 modules
- Migration guide: docs/migration/v1-to-v2.md, updated examples, updated docs
10. Migration from v1.x¶
A migration guide will be provided at docs/migration/v1-to-v2.md covering:
- Translating Agent.builder() declarations into Task.builder() configuration
- Moving tools from agents to tasks
- Moving chatLanguageModel from agents to the ensemble or individual tasks
- Replacing Ensemble.memory(EnsembleMemory) with task-scoped memory
- Updating module dependencies for the new module structure
- Handling the new EnsembleOutput shape
A mechanical translation is possible for all common v1.x patterns. The only case requiring
architectural thought is when a single agent is deliberately reused across multiple tasks
for shared state or persona continuity -- users should continue to use explicit Agent
declarations via Task.agent(...) in that case.