Whitepaper 09: Orchestrator Design Patterns for AI Security Agents

Author: Khushal Suthar
Date: June 2026
Series: Autonomous Penetration Testing with AI Agents
Category: Systems Design — Coordination Patterns

Executive Summary

The orchestrator is the brain of an autonomous pentesting system. It does not find vulnerabilities or run exploits; it decides what to work on next, which agent should do it, how to combine results, and when to stop. A well-designed orchestrator multiplies the effectiveness of its sub-agents; a poorly designed one creates contention, wasted work, and missed correlations. This paper catalogs the design patterns that have proven effective for orchestrating AI security agents, drawn from production autonomous pentesting systems and adjacent multi-agent literature. Each pattern is presented with its structure, strengths, failure modes, code structure, example scenario, and guidance on when to apply it.

We catalog eight orchestration patterns — Sequential Pipeline, Parallel Fan-Out, Event-Driven, Hierarchical, Hybrid, Planner-Executor, Blackboard, and Competitive — along with four anti-patterns that consistently fail in production. A selection matrix maps engagement characteristics to recommended patterns. Cross-cutting concerns — error handling, state management, observability, budget enforcement, and conflict resolution — are examined in detail. Finally, we show how the Custom Orchestrator introduced in Whitepaper 03 (wp03) implements a Hybrid pattern, combining pipeline flow, parallel dispatch, and event-driven triggers into a single adaptive coordinator.

1. The Orchestrator's Job

Before cataloging patterns, we must define the orchestrator's responsibilities precisely. The orchestrator is the executive function of the system. Its job is:

Goal decomposition: Translate engagement objectives ("assess the security of this /24 and the web app on 10.0.0.5") into phase-level and task-level sub-goals.

Resource allocation: Decide which sub-agent works on which target, in what order, with what model tier and context budget.

Flow control: Determine when to move from one phase to the next, when to parallelize, when to wait, when to escalate.

Synthesis: Combine findings from multiple sub-agents into a coherent world model and attack chains.

Termination: Decide when the engagement is complete — when enough of the attack surface has been explored, enough chains have been attempted, and the marginal value of continued work is low.

The orchestrator does not interpret tool output, select specific exploits, or write findings. That is the sub-agents' job. The orchestrator operates at a higher level of abstraction — phases, targets, agents — and its context is correspondingly small (engagement state, not tool output).

This separation is deliberate: an orchestrator that also does detailed reasoning becomes a single-agent system with extra steps, inheriting all the context-window and cost problems of single-agent design (Whitepapers 06, 07). The orchestrator's context budget should be measured in hundreds of findings, not thousands of tool outputs. It reasons over summaries and structured state, not raw packets and command output.

Design Principles

All patterns in this catalog adhere to four design principles:

Context discipline: Every agent — including the orchestrator — has a bounded context. The orchestrator never ingests raw tool output; it consumes structured findings and phase summaries.
Budget discipline: Every agent has a token and call-count budget. Budget exhaustion is a normal termination, not a failure.
Provenance discipline: Every finding carries provenance (producing agent, tool, input reference, timestamp). Without provenance, debugging and reporting are impossible.
Adaptivity: The orchestrator should adapt its strategy to engagement conditions, not rigidly follow a single pattern from start to finish.
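The budget and provenance principles can be made concrete with a small data model. The following is a minimal sketch, not the series' actual schema: the names Provenance, Finding, Budget, BudgetExhausted, and run_agent are assumptions chosen for illustration. It shows a finding that cannot be constructed without provenance, and an agent loop that treats budget exhaustion as an ordinary stop condition.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    agent: str       # producing agent
    tool: str        # tool that generated the evidence
    input_ref: str   # reference to the input (target, prior finding id, ...)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class Finding:
    id: str
    summary: str
    provenance: Provenance  # required field: no finding without provenance


class BudgetExhausted(Exception):
    """Signals a normal stop condition, not an error."""


@dataclass
class Budget:
    max_tokens: int
    max_calls: int
    tokens_used: int = 0
    calls_used: int = 0

    def charge(self, tokens: int) -> None:
        # Refuse the charge before recording it, so usage never exceeds the caps.
        over_calls = self.calls_used + 1 > self.max_calls
        over_tokens = self.tokens_used + tokens > self.max_tokens
        if over_calls or over_tokens:
            raise BudgetExhausted()
        self.calls_used += 1
        self.tokens_used += tokens


def run_agent(budget: Budget, call_costs: list[int]) -> str:
    """Run until the work or the budget runs out; both are normal terminations."""
    try:
        for cost in call_costs:
            budget.charge(cost)
    except BudgetExhausted:
        return "budget_exhausted"
    return "completed"
```

Returning a status string rather than propagating the exception keeps budget exhaustion out of the error-handling path, which matches the principle that it is a normal termination.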

2. Pattern Catalog

The patterns are organized by the problem they solve. A production orchestrator typically combines several patterns; they are not mutually exclusive. For each pattern we provide: a description, when to use it, advantages, disadvantages, a code structure sketch, and an example scenario.

Pattern 1: Sequential Pipeline

Description: Phases execute in order: Recon → Enumeration → Vulnerability Analysis → Exploitation → Lateral Movement → Reporting. Each phase's output is the next phase's input. The orchestrator advances the phase when the current phase's exit criteria are met. This is the simplest orchestration pattern and the natural starting point for any system.

[Recon] → [Enumeration] → [Vuln Analysis] → [Exploitation] → [Reporting]

When to use: Small engagements, early-stage systems, single-host assessments, or as a baseline that more complex patterns extend. Also suitable for engagements where the operator wants maximum predictability and minimal autonomous decision-making.

Advantages:

Simple to implement and debug. The state machine has one variable: current phase.
Predictable: the operator always knows what phase is running and what comes next.
Clean context boundaries: each sub-agent's context is scoped to its phase. No context contamination across phases.
Easy exit criteria: "enumeration is done when all hosts have been port-scanned and services identified."
Straightforward budgeting: allocate budget per phase, spend it, advance.

Disadvantages:

Slow: no parallelism across phases. Each phase waits for the previous to complete fully.
Rigid: a finding in exploitation that warrants re-enumeration requires going "back," which the linear pipeline does not naturally support. The orchestrator must either ignore the finding (bad) or implement a phase-revisit mechanism (which starts to look like the Hybrid pattern).
Phase boundaries are artificial: in practice, enumeration and vulnerability analysis overlap. A human pentester does not finish enumeration before starting to think about vulnerabilities.
Underutilizes resources: while one sub-agent works on exploitation, the enumeration and recon agents sit idle.

Code structure:

class SequentialPipelineOrchestrator:
    PHASES = ["recon", "enumeration", "vuln_analysis", "exploitation", "reporting"]

    def __init__(self, world_model, agent_registry, budget_ledger):
        self.world_model = world_model
        self.agents = agent_registry
        self.budget = budget_ledger
        self.current_phase = 0

    def run(self, engagement):
        while self.current_phase < len(self.PHASES):
            phase = self.PHASES[self.current_phase]
            agent = self.agents.get(phase)
            budget = self.budget.allocate(phase)

            result = agent.execute(
                targets=engagement.targets,
                world_model=self.world_model,
                budget=budget,
            )

            if self.exit_criteria_met(phase, result):
                self.current_phase += 1
            else:
                # Re-run phase with adjusted scope, or escalate
                self.handle_incomplete_phase(phase, result)

    def exit_criteria_met(self, phase, result):
        criteria = EXIT_CRITERIA[phase]  # e.g., "all hosts port-scanned"
        return criteria.evaluate(self.world_model)

Example scenario: A single-host assessment of a web server at 10.0.0.5. The orchestrator runs recon (port scan, DNS), then enumeration (service identification, web directory brute-forcing), then vulnerability analysis (matching services to known CVEs), then exploitation (attempting matched exploits), then reporting. Each phase completes fully before the next begins. Total engagement takes 45 minutes and produces a clean report. The pipeline works well here because the scope is small enough that phase boundaries are natural, and there is no parallelism to exploit.
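The code structure above looks up EXIT_CRITERIA[phase] and calls evaluate on it without showing what a criterion is. One plausible shape is a named predicate over the world model. The sketch below assumes the world model is a plain dict keyed by host, with hypothetical port_scanned and services fields; a real world model would expose a richer query interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExitCriterion:
    description: str
    predicate: Callable[[dict], bool]

    def evaluate(self, world_model: dict) -> bool:
        return self.predicate(world_model)


def all_hosts_port_scanned(world_model: dict) -> bool:
    hosts = world_model.get("hosts", {})
    # An empty host list means recon has not produced anything yet.
    return bool(hosts) and all(h.get("port_scanned") for h in hosts.values())


def all_services_identified(world_model: dict) -> bool:
    hosts = world_model.get("hosts", {})
    return bool(hosts) and all(
        svc.get("name") for h in hosts.values() for svc in h.get("services", [])
    )


# Hypothetical registry keyed by phase name, mirroring EXIT_CRITERIA[phase].
EXIT_CRITERIA = {
    "enumeration": ExitCriterion(
        "all hosts port-scanned and services identified",
        lambda wm: all_hosts_port_scanned(wm) and all_services_identified(wm),
    ),
}
```

Keeping the human-readable description next to the predicate lets the orchestrator report why a phase has not advanced.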

Pattern 2: Parallel Fan-Out

Description: The orchestrator identifies independent work units (e.g., one subnet, one web app, one AD domain) and dispatches a sub-agent per unit. Sub-agents run concurrently and report findings back to a shared world model. After all agents complete, a synthesis step combines findings into cross-unit attack chains.

                   ┌→ [Sub-agent A: subnet 1] ──┐
[Orchestrator] ────┼→ [Sub-agent B: subnet 2] ──┼→ [Synthesis] → [Orchestrator]
                   └→ [Sub-agent C: web app] ───┘

When to use: When the engagement scope naturally decomposes into independent targets — multiple subnets, multiple applications, segmented networks, or multi-domain Active Directory environments.

Advantages:

Fast: with N work units running concurrently, wall-clock time approaches 1/N of the sequential time, with the slowest unit setting the floor.
Each sub-agent's context is bounded to its work unit — no context contamination from other targets.
Scales horizontally: more work units → more sub-agents (up to API rate limits and budget).
Fault isolation: one sub-agent's failure does not block others.

Disadvantages:

Requires identifying independent work units. If units are not truly independent, findings in one affect another (e.g., a credential from subnet A is valid on subnet B), and the orchestrator must propagate — which the pure fan-out pattern does not handle.
Synthesis is non-trivial: combining findings from N agents into coherent chains requires a synthesis step that itself needs context management and may miss chains that require knowledge of multiple units.
Cost multiplies: N concurrent agents each make calls. Budget must be enforced per-agent and globally.
Coordination overhead: the orchestrator must track N concurrent agents, their status, budget consumption, and findings streams.
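The requirement that budget be enforced both per agent and globally can be illustrated with a small ledger. This is a sketch under stated assumptions: BudgetLedger here is a hypothetical stand-in for the budget_ledger object used throughout this paper, and the unit of account (tokens) is chosen for illustration.

```python
class BudgetLedger:
    """Per-agent caps carved out of a single global cap."""

    def __init__(self, global_cap: int):
        self.global_cap = global_cap
        self.allocated = {}  # agent_id -> granted cap
        self.spent = {}      # agent_id -> tokens spent so far

    def allocate(self, agent_id: str, cap: int) -> int:
        """Grant at most `cap`, never more than the unallocated global remainder."""
        remaining = self.global_cap - sum(self.allocated.values())
        granted = max(0, min(cap, remaining))
        self.allocated[agent_id] = granted
        self.spent[agent_id] = 0
        return granted

    def charge(self, agent_id: str, tokens: int) -> bool:
        """Record the spend, or return False if it would exceed the agent's cap."""
        if self.spent[agent_id] + tokens > self.allocated[agent_id]:
            return False
        self.spent[agent_id] += tokens
        return True
```

Because the sum of per-agent grants never exceeds the global cap and no agent can spend past its grant, the global limit holds without a separate global check on every call.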

Code structure:

import asyncio

class ParallelFanOutOrchestrator:
    def __init__(self, world_model, agent_factory, budget_ledger, max_concurrency=5):
        self.world_model = world_model
        self.agent_factory = agent_factory
        self.budget = budget_ledger
        self.semaphore = asyncio.Semaphore(max_concurrency)

    async def run(self, work_units):
        tasks = [self.dispatch_unit(unit) for unit in work_units]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        await self.synthesize(results)
        return self.world_model

    async def dispatch_unit(self, unit):
        # The semaphore caps how many sub-agents run at once
        async with self.semaphore:
            agent = self.agent_factory.create(unit.type)
            budget = self.budget.allocate(unit.id, cap=unit.estimated_cost)
            try:
                result = await agent.execute(
                    target=unit,
                    world_model=self.world_model,
                    budget=budget,
                )
                self.world_model.merge_findings(result.findings)
                return result
            except BudgetExhausted:
                return PartialResult(unit, reason="budget_exhausted")
            except Exception as e:
                return FailedResult(unit, error=str(e))

    async def synthesize(self, results):
        # Cross-unit chain discovery: credentials from unit A valid on unit B
        synth_agent = self.agent_factory.create("synthesis")
        await synth_agent.execute(world_model=self.world_model, budget=...)

Example scenario: An engagement covers three subnets (10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24) and one external web app. The orchestrator dispatches four sub-agents concurrently. The subnet agents each perform recon, enumeration, and vuln analysis on their subnet. The web app agent does the same for the external app. All four run in parallel. After they complete, a synthesis agent examines the combined world model and discovers that credentials found on subnet A's SMB share are valid for the external web app's admin panel — a cross-unit chain that no individual agent would have found alone.
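The fault-isolation property of fan-out can be demonstrated in a few lines of runnable asyncio. In this sketch, run_unit is a hypothetical stand-in for a sub-agent, and one unit is made to fail deliberately; the point is that asyncio.gather with return_exceptions=True returns the failure as a value in the result list instead of cancelling the other units.

```python
import asyncio


async def run_unit(unit: str, semaphore: asyncio.Semaphore) -> dict:
    async with semaphore:
        await asyncio.sleep(0)  # stand-in for real agent work
        if unit == "10.0.1.0/24":
            raise RuntimeError("agent crashed")  # simulated failure
        return {"unit": unit, "findings": [f"finding from {unit}"]}


async def fan_out(units: list[str], max_concurrency: int = 2):
    semaphore = asyncio.Semaphore(max_concurrency)
    # gather preserves input order, so results can be zipped back to units.
    results = await asyncio.gather(
        *(run_unit(u, semaphore) for u in units), return_exceptions=True
    )
    ok, failed = [], []
    for unit, result in zip(units, results):
        if isinstance(result, Exception):
            failed.append(unit)
        else:
            ok.append(result)
    return ok, failed
```

Calling asyncio.run(fan_out([...])) with the four units from the scenario yields three successful results and one failed unit, which the orchestrator can retry or report as incomplete coverage.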

Pattern 3: Event-Driven Reaction

Description: The orchestrator does not pre-plan all work. Instead, it registers triggers — conditions on findings that spawn new sub-agent tasks. When a finding matching a trigger is written to the world model, a new sub-agent is dispatched. The system is reactive: it responds to opportunities as they arise.

[Finding: open port 445]    → trigger: "SMB open"        → dispatch [SMB enum agent]
[Finding: SMB guest access] → trigger: "guest access"    → dispatch [Share enum agent]
[Finding: sensitive share]  → trigger: "sensitive data"  → dispatch [Exfil assessment agent]

When to use: For high-value finding types where rapid follow-up is critical — a credential discovery should immediately trigger lateral-movement assessment, not wait for a phase transition. Also useful in continuous monitoring contexts where new findings arrive over time.

Advantages:

Responsive: the system reacts to opportunities as they arise, not on a fixed schedule.
Efficient: work is done only when triggered by relevant findings. No idle scanning of irrelevant targets.
Mirrors human pentester behavior: "I found something interesting, let me dig deeper right now."
Naturally extensible: add a new trigger rule to handle a new finding type without changing the core orchestration loop.

Disadvantages:

Can cascade: one finding triggers three agents, each producing findings that trigger three more. Without limits, the system expands exponentially.
Hard to predict: the operator cannot easily see "what will the agent do next" because it depends on findings not yet discovered.
Termination is harder: there is no fixed "end of phase." The system must decide when to stop reacting, which requires a progress metric or an idle timeout.
Trigger design is critical: overly broad triggers fire too often (noise); overly narrow triggers miss opportunities.

Code structure:

import asyncio

class EventDrivenOrchestrator:
    def __init__(self, world_model, agent_factory, budget_ledger):
        self.world_model = world_model
        self.agent_factory = agent_factory
        self.budget = budget_ledger
        self.triggers = []  # List of Trigger objects
        self.active_agents = {}  # finding_id -> agent task

    def register_trigger(self, trigger):
        self.triggers.append(trigger)

    async def run(self, engagement):
        self.world_model.subscribe(self.on_finding)
        # Seed initial recon agents
        await self.dispatch_recon(engagement)
        # Main loop: wait for triggers, dispatch agents, check termination
        while not self.termination_condition():
            await asyncio.sleep(POLL_INTERVAL)
            self.check_progress()

    async def on_finding(self, finding):
        for trigger in self.triggers:
            if trigger.matches(finding) and trigger.can_fire(finding):
                await self.fire_trigger(trigger, finding)

    async def fire_trigger(self, trigger, finding):
        if self.concurrency_limit_reached():
            trigger.queue(finding)
            return
        agent = self.agent_factory.create(trigger.agent_type)
        budget = self.budget.allocate(
            f"trigger:{trigger.id}:{finding.id}",
            cap=trigger.budget_cap,
        )
        task = asyncio.create_task(agent.execute(
            target=finding, world_model=self.world_model, budget=budget
        ))
        self.active_agents[finding.id] = task
        trigger.record_fire(finding)  # cooldown tracking

Example scenario: During enumeration, a sub-agent discovers port 445 open on host 10.0.0.15. This finding triggers an SMB enumeration agent. The SMB agent discovers guest access is enabled, which triggers a share enumeration agent. The share agent finds a sensitive share containing credentials, which triggers a lateral-movement assessment agent. The entire chain from "port open" to "lateral movement attempted" unfolds in under three minutes, without any phase transition or orchestrator decision. The orchestrator's only role is maintaining the trigger registry and enforcing concurrency limits.
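The cascade risk noted under Disadvantages is usually contained inside the trigger itself. The sketch below shows one way the matches, can_fire, and record_fire methods used in the code structure might be implemented. The depth field on Finding, the per-host de-duplication key, and the max_depth and max_fires limits are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    id: str
    type: str
    host: str
    depth: int = 0  # hypothetical: number of trigger hops that led to this finding


@dataclass
class Trigger:
    id: str
    finding_type: str
    agent_type: str
    max_depth: int = 3   # bound on cascade depth
    max_fires: int = 10  # bound on total fan-out for this trigger
    fired_for: set = field(default_factory=set)

    def matches(self, finding: Finding) -> bool:
        return finding.type == self.finding_type

    def can_fire(self, finding: Finding) -> bool:
        key = (finding.host, finding.type)
        return (
            key not in self.fired_for                 # fire once per host
            and finding.depth < self.max_depth        # stop deep cascades
            and len(self.fired_for) < self.max_fires  # stop wide cascades
        )

    def record_fire(self, finding: Finding) -> None:
        self.fired_for.add((finding.host, finding.type))
```

With both a depth bound and a fire-count bound, the worst-case number of triggered agents is finite and known before the engagement starts.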

Pattern 4: Hierarchical Delegation

Description: A lead orchestrator decomposes the engagement into phase-level objectives and delegates each to a phase orchestrator. Phase orchestrators further decompose into task-level agents. Each layer has a clear abstraction boundary: the lead thinks in phases, phase orchestrators think in tasks, task agents think in tools.

[Lead Orchestrator]
    ├→ [Recon Phase Orchestrator]
    │       ├→ [Subnet scan agent]
    │       └→ [DNS enum agent]
    ├→ [Exploitation Phase Orchestrator]
    │       ├→ [Web exploit agent]
    │       └→ [Service exploit agent]
    └→ [Reporting Orchestrator]
            ├→ [Findings compiler]
            └→ [Narrative generator]

When to use: Large engagements — multiple domains, many targets, complex scope — where a single orchestrator cannot maintain coherent context over the full scope.

Advantages:

Scales to large engagements: the lead orchestrator's context stays small because it only sees phase-level summaries.
Each level has a coherent abstraction: lead thinks in phases, phase orchestrators think in tasks, task agents think in tools. No level is overwhelmed.
Failure isolation: a phase orchestrator failure does not crash the lead orchestrator. The lead can retry the phase with a different delegate or skip it.
Parallel phases: the lead can run independent phases concurrently (e.g., recon on subnet A while exploitation proceeds on an already-exploited host in subnet B).

Disadvantages:

Latency: each delegation layer adds a round-trip. For small engagements, the overhead is not worth it.
Context loss: summarization at each layer can drop details needed at the next layer up. The lead orchestrator may miss a critical detail because the phase orchestrator summarized it away.
Complexity: more moving parts, more failure modes, harder to debug. The call stack through three layers of orchestrators can be difficult to trace.
Contract enforcement: each layer must define and enforce an interface contract. Violations (a phase orchestrator returning 5,000 tokens of raw output instead of a 500-token summary) must be caught and re-prompted, adding overhead.

Code structure:

class LeadOrchestrator:
    def __init__(self, world_model, phase_orchestrators, budget_ledger):
        self.world_model = world_model
        self.phase_orchestrators = phase_orchestrators  # dict: phase -> PhaseOrchestrator
        self.budget = budget_ledger

    async def run(self, engagement):
        phase_plan = self.decompose(engagement)
        for phase in phase_plan:
            phase_orchestrator = self.phase_orchestrators[phase]
            budget = self.budget.allocate(phase, cap=self.phase_budget(phase_plan))
            summary = await phase_orchestrator.execute(
                objective=phase_plan[phase],
                world_model=self.world_model,
                budget=budget,
                contract=PhaseContract(max_summary_tokens=500),
            )
            self.validate_contract(summary, phase)
            self.world_model.merge_phase_summary(summary)

class PhaseOrchestrator:
    def __init__(self, task_agents, agent_factory):
        self.task_agents = task_agents
        self.agent_factory = agent_factory

    async def execute(self, objective, world_model, budget, contract):
        task_plan = self.decompose(objective)
        for task in task_plan:
            agent = self.task_agents[task.type]
            result = await agent.execute(task, world_model, budget.sub(task.id))
            world_model.merge_findings(result.findings)
        return contract.summarize(world_model, objective)

class PhaseContract:
    def __init__(self, max_summary_tokens=500, min_confidence=0.7):
        self.max_summary_tokens = max_summary_tokens
        self.min_confidence = min_confidence

    def summarize(self, world_model, objective):
        findings = world_model.query(objective.scope, confidence=self.min_confidence)
        summary = self.compress(findings, max_tokens=self.max_summary_tokens)
        return PhaseSummary(objective=objective, findings=findings, narrative=summary)

    def validate(self, summary):
        # NOTE: len() counts characters, not tokens; substitute a
        # tokenizer-based count to enforce a true token limit.
        if len(summary.narrative) > self.max_summary_tokens:
            raise ContractViolation("summary exceeds token limit")

Example scenario: A large enterprise engagement covers the DMZ (50 hosts), the internal network (200 hosts), and the AD domain (5 domain controllers, 50 member servers). The lead orchestrator decomposes this into three phase-level objectives: DMZ assessment, internal assessment, and AD assessment. It delegates DMZ to a phase orchestrator that manages subnet scanning, web enumeration, and exploitation for the 50 DMZ hosts. Meanwhile, it delegates AD assessment to a different phase orchestrator that manages Kerberos enumeration, AS-REP roasting, and BloodHound analysis. Each phase orchestrator runs 5–10 task agents in parallel. The lead orchestrator's context contains only three phase summaries, not the thousands of findings produced by the task agents.
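Contract enforcement between layers, including the re-prompt on violation mentioned under Disadvantages, might look like the following. This is a sketch under loud assumptions: estimate_tokens uses a crude one-token-per-word approximation rather than a real tokenizer, and produce is a hypothetical callable that wraps the phase orchestrator and accepts the token limit.

```python
class ContractViolation(Exception):
    pass


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A production system would use the model's own tokenizer.
    return len(text.split())


def validate_summary(narrative: str, max_summary_tokens: int = 500) -> str:
    used = estimate_tokens(narrative)
    if used > max_summary_tokens:
        raise ContractViolation(
            f"summary uses ~{used} tokens, limit is {max_summary_tokens}"
        )
    return narrative


def enforce_with_retry(produce, max_summary_tokens: int = 500, max_attempts: int = 2):
    """Ask for a summary until it fits the contract or attempts run out.

    `produce` is a hypothetical callable: produce(limit) -> narrative string.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate_summary(produce(max_summary_tokens), max_summary_tokens)
        except ContractViolation as err:
            last_error = err  # re-prompt on the next loop iteration
    raise last_error
```

Capping the number of attempts matters: each re-prompt costs budget, so a delegate that repeatedly violates the contract should be failed rather than retried indefinitely.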

Pattern 5: Hybrid (Pipeline + Parallel + Event-Driven)

Description: The pipeline defines the overall flow (phases advance in order), parallelism is used within phases (multiple targets processed concurrently), and event-driven triggers handle high-priority findings that cut across phases. This combines the predictability of the pipeline, the speed of parallelism, and the responsiveness of event-driven reaction.

Phase: Recon (parallel across subnets)
    ↓ [exit criteria met]
Phase: Enumeration (parallel across hosts)
    ↓ trigger: credential found → [Lateral movement agent] (cross-phase)
    ↓ [exit criteria met]
Phase: Vulnerability Analysis (parallel across services)
    ↓ [exit criteria met]
Phase: Exploitation (serial per chain, parallel across chains)

When to use: Production systems. This is the recommended default pattern for any engagement beyond a single host. Most real autonomous pentesting systems converge on some variant of this pattern.

Advantages:

Combines the predictability of the pipeline, the speed of parallelism, and the responsiveness of event-driven triggers.
Matches how human pentest teams actually work — phases, but with parallel team members and ad-hoc "drop everything and look at this" moments.
Adaptive: the orchestrator can shift the balance between pipeline work and triggered work as the engagement evolves.
Handles the common case where a credential or access finding discovered during enumeration should immediately trigger lateral-movement work, not wait for the exploitation phase.

Disadvantages:

Most complex to implement. The orchestrator must manage a priority queue, phase state, trigger registry, and concurrent agents simultaneously.
Requires careful policy for when event-driven triggers preempt pipeline work. Without a clear policy, triggers can starve pipeline tasks.
Debugging is harder: the system's behavior is a product of both the planned pipeline and the reactive triggers, which can produce surprising execution traces.
Phase exit criteria become fuzzy: if triggered agents are still running when the pipeline wants to advance, should it wait or proceed?

Code structure:

import asyncio

class HybridOrchestrator:
    PHASES = ["recon", "enumeration", "vuln_analysis", "exploitation", "reporting"]

    def __init__(self, world_model, agent_factory, budget_ledger, triggers):
        self.world_model = world_model
        self.agent_factory = agent_factory
        self.budget = budget_ledger
        self.triggers = triggers
        self.priority_queue = asyncio.PriorityQueue()
        self.current_phase = 0
        self.active_agents = set()
        self.engagement = None  # set in run(); needed to seed later phases

    async def run(self, engagement):
        self.engagement = engagement
        self.world_model.subscribe(self.on_finding)
        self.seed_pipeline_tasks(engagement)

        while not self.termination_condition():
            # Dispatch highest-priority task to next free agent slot.
            # get_nowait() avoids stalling the loop when the queue is empty.
            if len(self.active_agents) < MAX_CONCURRENCY and not self.priority_queue.empty():
                task = self.priority_queue.get_nowait()
                agent = self.agent_factory.create(task.agent_type)
                budget = self.budget.allocate(task.id, cap=task.budget_cap)
                asyncio_task = asyncio.create_task(
                    self.run_agent(agent, task, budget)
                )
                self.active_agents.add(asyncio_task)
                # Free the slot when the agent finishes
                asyncio_task.add_done_callback(self.active_agents.discard)

            # Check phase advancement
            if self.exit_criteria_met(self.current_phase):
                self.advance_phase()

            await asyncio.sleep(POLL_INTERVAL)

    async def on_finding(self, finding):
        for trigger in self.triggers:
            if trigger.matches(finding) and trigger.can_fire(finding):
                # Triggered tasks are HIGH priority; pipeline tasks are NORMAL
                self.priority_queue.put_nowait(
                    WorkItem(priority=Priority.HIGH, trigger=trigger, finding=finding)
                )
                trigger.record_fire(finding)

    def seed_pipeline_tasks(self, engagement):
        phase = self.PHASES[self.current_phase]
        for target in engagement.targets:
            self.priority_queue.put_nowait(
                WorkItem(priority=Priority.NORMAL, phase=phase, target=target)
            )

    def advance_phase(self):
        self.current_phase += 1
        self.seed_pipeline_tasks(self.engagement)  # seed next phase

Example scenario: During the enumeration phase, a sub-agent discovers valid credentials for the Jenkins admin panel on host 10.0.0.20. This finding triggers a high-priority lateral-movement task that jumps the priority queue. The orchestrator dispatches a lateral-movement agent while enumeration continues on other hosts. The lateral-movement agent discovers that Jenkins has a script console, which triggers an exploitation agent. Meanwhile, the pipeline continues enumerating hosts 10.0.0.21–50. The orchestrator is simultaneously running pipeline enumeration and triggered exploitation — the hallmark of the Hybrid pattern. When enumeration's exit criteria are met (all hosts scanned), the pipeline advances to vulnerability analysis, but the triggered exploitation agent continues running until it completes or its budget is exhausted.
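The "jumps the priority queue" behavior depends on work items being orderable, since asyncio.PriorityQueue retrieves the lowest-valued entry first. The sketch below is one possible definition of the Priority and WorkItem names used in the code structure; the seq tie-breaker and the description field are assumptions added for illustration, and the real WorkItem would carry trigger, finding, phase, and target references instead.

```python
import asyncio
import itertools
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    HIGH = 0    # triggered tasks; lower value is dequeued first
    NORMAL = 1  # pipeline tasks


_sequence = itertools.count()


@dataclass(order=True)
class WorkItem:
    priority: Priority
    # Monotonic counter keeps FIFO order among items of equal priority.
    seq: int = field(default_factory=lambda: next(_sequence))
    # Payload is excluded from ordering so it never needs to be comparable.
    description: str = field(default="", compare=False)


async def drain_order() -> list[str]:
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    queue.put_nowait(WorkItem(Priority.NORMAL, description="enumerate 10.0.0.21"))
    queue.put_nowait(WorkItem(Priority.NORMAL, description="enumerate 10.0.0.22"))
    queue.put_nowait(
        WorkItem(Priority.HIGH, description="lateral movement via Jenkins credentials")
    )
    order = []
    while not queue.empty():
        order.append((await queue.get()).description)
    return order
```

Running asyncio.run(drain_order()) returns the lateral-movement item first even though it was enqueued last, followed by the two enumeration items in their original order. Note that this gives strict priority: a steady stream of HIGH items would starve NORMAL work, which is the preemption-policy concern raised under Disadvantages.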

Pattern 6: Planner-Executor

Description: A planner agent produces a multi-step plan — a DAG of tasks with dependencies — and executor agents run the tasks. The orchestrator manages plan execution, handling success, failure, and re-planning. The plan is an explicit artifact that can be inspected, modified, and replayed.

[Planner] → Plan DAG: [Task A] → [Task B (depends on A)] → [Task C (depends on B)]
    ↓
[Orchestrator] → dispatches executors per task, respecting dependencies

When to use: When the engagement has a clear objective structure (e.g., "pivot from DMZ to internal network via the Jenkins host") and the steps are somewhat predictable. Also useful for compliance-driven engagements where the plan must be documented and approved before execution.

Advantages:

Explicit plan enables reasoning about why each task is being done. The plan is auditable.
Dependencies are explicit: Task B does not start until Task A's outputs are available. No wasted work on tasks whose preconditions are not met.
Re-planning on failure: if Task A fails, the planner can produce an alternative path to the same goal.
Plan visibility: the operator can inspect the plan before execution, modify it, or approve it. This is valuable for engagements requiring human oversight.
Replayability: the plan is a record of what was attempted and in what order.

Disadvantages:

Planning is expensive — a large LLM call — and may be wrong if the planner lacks information. The plan is only as good as the planner's knowledge of the target.
Plans are brittle: a plan built on assumptions that turn out to be wrong (e.g., "host X will be exploitable via CVE-Y") requires re-planning, which is costly and may produce a plan that is equally wrong.
Over-planning: the planner may produce a plan that is too detailed, not adapting to findings that emerge during execution. A 20-step plan where step 3 changes everything is wasted planning.
Under-planning: the planner may produce a plan that is too vague, leaving executors without enough guidance.

Code structure:

class PlannerExecutorOrchestrator:
    def __init__(self, planner_agent, executor_factory, world_model, budget_ledger):
        self.planner = planner_agent
        self.executor_factory = executor_factory
        self.world_model = world_model
        self.budget = budget_ledger
        self.plan = None

    async def run(self, objective):
        self.plan = await self.planner.generate_plan(objective, self.world_model)
        await self.execute_plan(self.plan)

    async def execute_plan(self, plan):
        while not plan.is_complete():
            ready_tasks = plan.get_ready_tasks()  # tasks whose deps are satisfied
            if not ready_tasks:
                if plan.has_failed_tasks():
                    # replan() executes the new plan itself, so stop driving this one
                    await self.replan(plan)
                    return
                else:
                    break  # deadlock: no ready tasks and no failures
            for task in ready_tasks:
                executor = self.executor_factory.create(task.type)
                budget = self.budget.allocate(task.id)
                result = await executor.execute(task, self.world_model, budget)
                if result.success:
                    plan.mark_complete(task, result)
                else:
                    plan.mark_failed(task, result)
                    if self.assumption_invalidated(task, result):
                        await self.replan(plan)
                        return

    async def replan(self, plan):
        failed = plan.get_failed_tasks()
        completed = plan.get_completed_tasks()
        new_plan = await self.planner.replan(
            objective=plan.objective,
            completed=completed,
            failed=failed,
            world_model=self.world_model,
        )
        self.plan = new_plan
        await self.execute_plan(new_plan)

class PlanDAG:
    def __init__(self, objective, tasks, dependencies):
        self.objective = objective
        self.tasks = {t.id: t for t in tasks}
        self.dependencies = dependencies  # task_id -> [dep_task_ids]
        self.status = {t.id: "pending" for t in tasks}

    def get_ready_tasks(self):
        # Iterate over all tasks so that tasks with no dependency entry
        # (root tasks) are also considered ready.
        return [
            task for tid, task in self.tasks.items()
            if self.status[tid] == "pending"
            and all(self.status[d] == "complete" for d in self.dependencies.get(tid, []))
        ]

Example scenario: The objective is "pivot from the DMZ to the internal network." The planner produces a DAG: (1) enumerate DMZ hosts → (2) identify exploitable DMZ service → (3) exploit DMZ host → (4) enumerate internal network from DMZ host → (5) identify internal targets. Task 1 completes and reveals a Jenkins instance. Task 2 identifies Jenkins script console as the exploit path. Task 3 exploits it successfully. Task 4 runs from the compromised DMZ host and enumerates the internal network. But task 5 fails — the internal network is segmented and the DMZ host cannot reach the target. The orchestrator invokes the planner, which re-plans: instead of direct access from DMZ, try pivoting through the Jenkins host's outbound connections. A new task 5b is added: "enumerate Jenkins outbound network access." The plan adapts to reality.
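The dependency bookkeeping in a plan DAG does not have to be hand-written. As an alternative to the PlanDAG class above (a swapped-in technique, not the approach the code structure uses), Python's standard library graphlib.TopologicalSorter tracks which tasks are ready as their dependencies complete. The sketch below encodes the five-task plan from the scenario; the task names and the run_task callable are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (mirrors the scenario's five-step plan)
PLAN = {
    "enumerate_dmz": set(),
    "identify_service": {"enumerate_dmz"},
    "exploit_dmz_host": {"identify_service"},
    "enumerate_internal": {"exploit_dmz_host"},
    "identify_internal_targets": {"enumerate_internal"},
}


def execute_plan(plan, run_task):
    """Run tasks in dependency order; stop at the first failure.

    `run_task` is a hypothetical callable: run_task(task_name) -> bool.
    Returns (completed_tasks, failed_task_or_None) for the planner to use.
    """
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    completed = []
    while sorter.is_active():
        for task in sorter.get_ready():
            if not run_task(task):
                return completed, task  # hand off to the planner for re-planning
            completed.append(task)
            sorter.done(task)  # unlocks tasks that depended on this one
    return completed, None
```

Returning on the first failure is deliberate: a failed task is never marked done, so its dependents would never become ready and the loop would otherwise spin without progress.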

Pattern 7: Blackboard

Description: All agents read from and write to a shared "blackboard" — the world model. There is no direct agent-to-agent communication. Agents monitor the blackboard for relevant changes and act on what they find. The blackboard is the sole coordination mechanism.

            [World Model / Blackboard]
              ↗        ↑         ↖
       [Agent A]    [Agent B]    [Agent C]
        writes      reads &      writes
        findings    writes       chains

When to use: As a substrate that other patterns build on, not as a standalone orchestration pattern. The world model (Whitepaper 08) is a blackboard; the orchestrator provides the coordination layer on top. The pure blackboard pattern (agents self-organizing around the blackboard with no coordinator) is rarely sufficient alone but is a powerful building block.

Advantages:

Decoupled: agents do not need to know about each other, only about the blackboard schema. New agents can be added without modifying existing ones.
Flexible: the system can be extended at runtime by registering new agents that monitor the blackboard.
Natural synthesis: the blackboard is the synthesis — all findings converge there. No separate synthesis step is needed.
Schema-driven: well-structured schemas enable agents to query precisely what they need.

Disadvantages:

Coordination is implicit: no agent is responsible for deciding what to work on next. This can lead to duplicated work (two agents enumerate the same host) or gaps (no agent checks the interesting host because each assumed another would).
Requires a well-designed schema: if the blackboard is too unstructured, agents cannot find relevant information. If too rigid, agents cannot express novel findings.
Termination is unclear: who decides when the engagement is done? Without a coordinator, the system may run indefinitely.
Polling overhead: without efficient subscriptions, agents must poll the blackboard, wasting calls.

Code structure:

import asyncio
from collections import defaultdict

class Blackboard:
    def __init__(self):
        self.findings = {}  # finding_id -> Finding
        self.subscriptions = defaultdict(list)  # topic -> [callbacks]

    def write(self, finding):
        self.findings[finding.id] = finding
        for topic in finding.topics:
            for callback in self.subscriptions[topic]:
                outcome = callback(finding)
                # Agent callbacks are coroutines; schedule them on the
                # running event loop instead of dropping them un-awaited.
                if asyncio.iscoroutine(outcome):
                    asyncio.create_task(outcome)

    def query(self, filter_spec):
        return [f for f in self.findings.values() if filter_spec.matches(f)]

    def subscribe(self, topic, callback):
        self.subscriptions[topic].append(callback)

class BlackboardAgent:
    def __init__(self, name, interests, action_fn, blackboard, action_type=None):
        self.name = name
        self.interests = interests  # topics this agent cares about
        self.action_fn = action_fn
        self.blackboard = blackboard
        self.action_type = action_type or name  # finding type this agent produces

    def activate(self):
        for topic in self.interests:
            self.blackboard.subscribe(topic, self.on_finding)

    async def on_finding(self, finding):
        # Agent decides autonomously whether to act
        if self.should_act(finding):
            result = await self.action_fn(finding, self.blackboard)
            for new_finding in result.findings:
                self.blackboard.write(new_finding)

    def should_act(self, finding):
        # Agent self-decides; no external coordinator
        return not self.blackboard.query(
            Filter(host=finding.host, type=self.action_type, status="complete")
        )

Example scenario: The blackboard contains findings from a recon phase: host 10.0.0.10 has ports 22, 80, 443 open. An SSH-enumeration agent is subscribed to the "port:22" topic and sees this finding. It decides to act (no existing SSH findings for this host) and writes new findings: "SSH version: OpenSSH 8.2, allows password auth." A web-enumeration agent subscribed to "port:80" sees the same host finding and writes: "Web server: Apache 2.4.41, directory listing enabled on /backup/." A credential-testing agent subscribed to "auth:password" sees the SSH finding and attempts password spraying. All of this happens without any central coordinator — each agent monitors the blackboard and acts autonomously. The weakness is that no agent is responsible for checking whether the engagement is complete, and two agents might both decide to enumerate the same web app.
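The duplicated-work weakness arises because should_act only checks for completed findings, so two agents can both start on the same host before either finishes. One common mitigation is an atomic claim recorded before work begins. The sketch below is a hypothetical add-on, not part of the blackboard described above: ClaimRegistry, try_claim, and unclaimed are illustrative names.

```python
import threading


class ClaimRegistry:
    """First claimant wins; later claimants for the same work are refused."""

    def __init__(self):
        self._claims = {}  # (host, action_type) -> agent name
        self._lock = threading.Lock()

    def try_claim(self, host: str, action_type: str, agent: str) -> bool:
        key = (host, action_type)
        with self._lock:  # check-and-set must be atomic
            if key in self._claims:
                return False
            self._claims[key] = agent
            return True

    def owner(self, host: str, action_type: str):
        with self._lock:
            return self._claims.get((host, action_type))

    def unclaimed(self, hosts, action_type: str):
        """Hosts nobody has picked up: the 'gap' case from the Disadvantages list."""
        with self._lock:
            return [h for h in hosts if (h, action_type) not in self._claims]
```

An agent would call try_claim before acting and skip the work if it returns False. The unclaimed query gives a coordinator, if one exists, a cheap way to detect hosts that every agent assumed someone else would handle.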

Pattern 8: Competitive Multi-Agent

Description: Multiple agents work on the same objective independently, and the orchestrator selects the best result. The pattern is inspired by competitive programming and by ensemble methods in machine learning. Diversity of approach is the key design principle: each agent should try a genuinely different strategy.

[Orchestrator] → "Find an exploit for service X on host Y"
    ├→ [Agent A: tries CVE-based approach]    → result A
    ├→ [Agent B: tries config-based approach] → result B
    └→ [Agent C: tries logic-based approach]  → result C
[Orchestrator] → selects best result (or combines)

When to use: High-value objectives where finding any working path is worth the extra cost — e.g., "gain domain admin" in the final hours of an engagement, or "find any exploit for this critical service."

Advantages:

Diverse approaches increase the chance of finding a working path. Different agents try different angles (CVE-based, misconfiguration-based, logic-based).
Parallel: all agents run concurrently, so latency is that of the fastest successful agent.
Robust: if one agent's approach fails, others may succeed. The system does not depend on a single strategy.
Quality selection: the orchestrator can select the best result or combine insights from multiple results.

Disadvantages:

Expensive: N agents work on the same problem, but only one result is used. (N-1)× cost is "wasted" — though the diversity may be worth it for high-value objectives.
Selection is hard: how does the orchestrator decide which result is "best" without doing the work itself? It needs a verification step or a scoring rubric.
Can cause target disruption: multiple agents attempting exploits simultaneously may be more disruptive than a single agent, potentially triggering IDS/IPS or crashing services.
Diminishing returns: beyond 2–3 approaches, additional agents rarely find new paths but still consume budget.

Code structure:

import asyncio

class CompetitiveOrchestrator:
    def __init__(self, agent_factory, budget_ledger, selector, world_model):
        self.agent_factory = agent_factory
        self.budget = budget_ledger
        self.selector = selector        # result selection strategy
        self.world_model = world_model  # shared with every competing agent

    async def run(self, objective):
        strategies = self.diversify(objective)  # 2-3 distinct approaches
        budget = self.budget.allocate(objective.id)
        per_agent_budget = budget.split(len(strategies))

        pending = set()
        for strategy, agent_budget in zip(strategies, per_agent_budget):
            agent = self.agent_factory.create(strategy.agent_type)
            pending.add(asyncio.create_task(
                agent.execute(
                    objective=objective,
                    strategy=strategy,
                    world_model=self.world_model,
                    budget=agent_budget,
                )
            ))

        # First-success-wins: keep waiting until some agent reports a verified
        # success. An agent that finishes first but fails must not cancel the rest.
        results = []
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            results.extend(t.result() for t in done if t.exception() is None)
            if any(r.verified_success for r in results):
                break
        for task in pending:
            task.cancel()
        return self.selector.select_best(results, objective)

class ResultSelector:
    def select_best(self, results, objective):
        if not results:
            return None  # every agent crashed before producing a result
        scored = [(r, self.score(r, objective)) for r in results]
        scored.sort(key=lambda x: x[1], reverse=True)  # highest score first
        return scored[0][0]

    def score(self, result, objective):
        # Prefer verified successes, then theoretical, then partial
        if result.verified_success:
            return 1.0
        elif result.theoretical_success:
            return 0.5
        else:
            return 0.1 * result.partial_progress

Example scenario: The objective is "gain domain admin in the corp.local domain." The orchestrator dispatches three agents with different strategies: Agent A tries the Kerberoasting path (find SPN accounts, request TGS, crack offline, use credentials). Agent B tries the AS-REP roasting path (find accounts with "Do not require Kerberos preauthentication," crack offline). Agent C tries the credential spraying path (spray common passwords against domain accounts). Agent A succeeds first: it finds an SPN account with a weak password, cracks it, and obtains domain admin via the account's privileges. The orchestrator cancels Agents B and C and uses Agent A's result. Total cost: up to 3× a single-agent approach (less when the losing agents are cancelled early), but the chance of success is higher because three independent strategies were tried.
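
The first-success-wins behaviour in this scenario can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: `fake_agent`, its delays, and the result dictionaries are hypothetical stand-ins for real sub-agents.

```python
import asyncio

async def fake_agent(name, delay, succeeds):
    """Hypothetical stand-in for a sub-agent: sleeps, then reports an outcome."""
    await asyncio.sleep(delay)
    return {"agent": name, "verified_success": succeeds}

async def first_success(coros):
    """Return the first verified success and cancel the rest; None if all fail."""
    pending = {asyncio.ensure_future(c) for c in coros}
    winner = None
    while pending and winner is None:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            if task.exception() is None and task.result()["verified_success"]:
                winner = task.result()
                break
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding before returning
    await asyncio.gather(*pending, return_exceptions=True)
    return winner

async def demo():
    return await first_success([
        fake_agent("spray", 0.01, succeeds=False),      # finishes first but fails
        fake_agent("kerberoast", 0.05, succeeds=True),  # first real success
        fake_agent("asrep", 0.50, succeeds=True),       # cancelled before finishing
    ])

winner = asyncio.run(demo())
```

Note that the loop waits for a success rather than for the first completion, so an agent that fails quickly does not end the race for the others.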

3. Selection Guidance Matrix

No single pattern is universally best. The choice depends on engagement characteristics. The following matrix maps engagement properties to recommended patterns:

Pattern Selection by Phase

Within a single engagement, different phases may warrant different patterns:

Budget Considerations in Pattern Selection

In practice, the orchestrator should be pattern-aware: it selects the appropriate pattern based on the current phase and engagement state, and can switch patterns as the engagement evolves. A /24 assessment might start as Parallel Fan-Out (scan all subnets), transition to Hybrid (enumerate in parallel with event-driven exploitation), and end with Competitive Multi-Agent (multiple approaches to gain domain admin).

This pattern-switching is the hallmark of a mature orchestrator. A naive orchestrator picks one pattern and sticks with it; a mature one adapts.

4. Cross-Pattern Concerns

Regardless of which pattern is used, every orchestrator must address several cross-cutting concerns. These are not pattern-specific — they apply to all patterns and are often the difference between a working system and a failing one.

4.1 Error Handling

Multi-agent systems fail in more ways than single-agent systems. The orchestrator must handle:

Agent crashes: A sub-agent process dies or times out. The orchestrator must detect the failure (timeout, heartbeat loss), record partial findings if any were written to the world model, and decide whether to retry, re-dispatch with a different approach, or skip.
Hallucinated findings: An LLM sub-agent claims to have found a vulnerability that does not exist. The orchestrator cannot prevent this, but it can require verification for high-severity findings before promoting them to the confirmed world model.
Tool execution failures: A sub-agent's tool call fails (network timeout, service crash, permission denied). The sub-agent should handle this internally, but if it propagates, the orchestrator treats it as a task failure.
Budget exhaustion: An agent runs out of budget mid-task. This is a normal termination, not a failure. The orchestrator collects partial findings and moves on.

Error handling strategy:

async def run_agent_with_handling(self, agent, task, budget):
    try:
        result = await asyncio.wait_for(
            agent.execute(task, self.world_model, budget),
            timeout=task.timeout,
        )
        return result
    except BudgetExhausted:
        return PartialResult(task, reason="budget_exhausted")
    except asyncio.TimeoutError:
        return PartialResult(task, reason="timeout")
    except AgentCrash as e:
        self.log_error(task, e)
        if task.retry_count < MAX_RETRIES:
            return await self.run_agent_with_handling(
                agent, task.retry(), budget.remaining()
            )
        return FailedResult(task, error="max_retries_exceeded")
    except Exception as e:
        self.log_error(task, e)
        return FailedResult(task, error=str(e))

Key principle: Failures are expected and handled gracefully. A single agent failure should never crash the orchestrator or halt the engagement. The orchestrator logs the failure, collects any partial results, and continues with other work.
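
One way to see this principle in isolation is the sketch below, which runs several agent coroutines concurrently and turns any exception into a result record instead of letting it propagate. The agent functions and the record layout are hypothetical, chosen only for illustration.

```python
import asyncio

async def run_isolated(tasks):
    """Run agent coroutines concurrently; convert any exception into a record."""
    outcomes = await asyncio.gather(*tasks.values(), return_exceptions=True)
    report = {}
    for name, outcome in zip(tasks.keys(), outcomes):
        if isinstance(outcome, Exception):
            report[name] = {"status": "failed", "error": repr(outcome)}
        else:
            report[name] = {"status": "ok", "findings": outcome}
    return report

async def healthy_agent():
    # Hypothetical agent that completes normally
    return ["port 22 open"]

async def crashing_agent():
    # Hypothetical agent whose tool call blows up
    raise RuntimeError("tool crashed")

report = asyncio.run(run_isolated({
    "recon": healthy_agent(),
    "web": crashing_agent(),
}))
```

The crash in the `web` agent is recorded, while the `recon` agent's findings are still collected.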

4.2 State Management

The orchestrator maintains several categories of state:

Engagement state: Current phase, targets, objectives, engagement budget. This is the orchestrator's primary context.
Agent state: Which agents are active, their current task, budget consumed, findings produced. This is operational state for dispatch decisions.
World model state: All findings, chains, and evidence accumulated by sub-agents. This is the system's shared knowledge base (Whitepaper 08).
Plan state: If using Planner-Executor, the current plan DAG with task statuses.

State persistence: All state should be persistable to durable storage. If the orchestrator crashes, it should be able to resume from a checkpoint — re-loading engagement state, world model, and agent status. Without persistence, a crash means restarting the engagement from scratch, which is unacceptable for multi-hour engagements.
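
A minimal checkpointing sketch, assuming the state can be serialized as JSON and that a single local file is an acceptable store (both are assumptions; a production system may use a database instead). The function names are hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the target in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # make sure the bytes reach disk before the rename
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint

def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

# Usage: save, then restore as a restarted orchestrator would
with tempfile.TemporaryDirectory() as d:
    checkpoint_path = os.path.join(d, "engagement.json")
    missing = load_checkpoint(checkpoint_path)
    save_checkpoint(checkpoint_path, {"phase": "enumeration", "hosts_done": 45})
    restored = load_checkpoint(checkpoint_path)
```

Writing to a temporary file first means a crash during the write leaves the previous checkpoint intact.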

State consistency: The world model must be consistent. If two agents write findings concurrently, the writes must be serialized (via a lock or a transactional store). Conflicting findings (e.g., "host X is exploitable" vs. "host X is not exploitable") must be detected and resolved (see Conflict Resolution below).
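
The sketch below illustrates both requirements with a single `asyncio.Lock`: writes are serialized, and a write that contradicts an existing claim is recorded as a conflict instead of silently overwriting it. `WorldModelStore` and its fields are hypothetical and much simpler than a real world model.

```python
import asyncio

class WorldModelStore:
    """Hypothetical minimal store: one lock serializes writes, conflicts are kept."""
    def __init__(self):
        self._lock = asyncio.Lock()
        self.claims = {}     # (host, claim) -> (value, agent)
        self.conflicts = []  # (key, existing write, rejected write)

    async def write(self, host, claim, value, agent):
        async with self._lock:
            key = (host, claim)
            existing = self.claims.get(key)
            if existing is not None and existing[0] != value:
                self.conflicts.append((key, existing, (value, agent)))
                return False  # keep the earlier value until the conflict is resolved
            self.claims[key] = (value, agent)
            return True

async def demo():
    store = WorldModelStore()
    await asyncio.gather(
        store.write("10.0.0.5", "exploitable", True, "agent_a"),
        store.write("10.0.0.5", "exploitable", False, "agent_b"),
    )
    return store

store = asyncio.run(demo())
```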

State size management: The orchestrator's own context (engagement state + agent status) must be bounded. If the orchestrator tracks every finding individually, its context grows unboundedly. Instead, it should track summaries: "enumeration phase: 45/50 hosts complete, 120 findings, 3 high-severity."

4.3 Observability

The orchestrator is the most important component to observe. Its telemetry should include:

Decision log: Every dispatch, phase transition, trigger fire, and escalation, with rationale. This is the orchestrator's "flight recorder."
Agent status board: Which agents are active, their current task, budget consumed, findings produced, and how long they have been running.
Queue depth: How many pending work items are waiting for an available agent. A growing queue indicates a bottleneck.
Progress metrics: Findings per hour, chains advanced per hour, phase completion percentage. A healthy system shows steady progress; a stuck system shows zero.
Pattern log: Which orchestration pattern is active and when it switches. This helps debug unexpected behavior.
Budget burn rate: Token spend per hour, projected time to budget exhaustion, per-agent spend distribution.
Error log: All agent failures, retries, and escalations with enough context to diagnose the root cause.

Observability implementation:

import time

class OrchestratorTelemetry:
    def __init__(self):
        self.decision_log = []
        self.agent_status = {}
        self.metrics = ProgressMetrics()
        self.pattern_log = []

    def log_decision(self, decision_type, rationale, **context):
        entry = {
            "timestamp": time.time(),
            "type": decision_type,  # "dispatch", "phase_transition", "trigger", "escalation"
            "rationale": rationale,
            "context": context,
        }
        self.decision_log.append(entry)
        self.emit(entry)  # stream to external observability system

    def update_agent_status(self, agent_id, status):
        # Assumes the status object exposes its state label as `.state`
        self.agent_status[agent_id] = {
            "status": status.state,  # "running", "complete", "failed", "budget_exhausted"
            "task": status.task,
            "budget_consumed": status.budget_consumed,
            "findings_produced": status.findings_count,
            "duration": status.elapsed,
        }

    def log_pattern_switch(self, from_pattern, to_pattern, reason):
        self.pattern_log.append({
            "timestamp": time.time(),
            "from": from_pattern,
            "to": to_pattern,
            "reason": reason,
        })

This telemetry is the operator's window into the autonomous system. Without it, the operator cannot tell whether the system is working well, working poorly, or not working at all. In production, the telemetry should be streamed to an external dashboard (Grafana, custom UI) so the operator can monitor in real time.
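
The budget burn rate metric listed above is simple arithmetic, sketched here with made-up numbers. The function name and the linear projection are assumptions; a real dashboard might smooth the rate over a sliding window.

```python
def project_exhaustion(total_budget, spent, elapsed_hours):
    """Return (tokens_per_hour, hours_until_exhaustion) from spend so far."""
    if elapsed_hours <= 0 or spent <= 0:
        return 0.0, float("inf")  # nothing spent yet, so no projection
    rate = spent / elapsed_hours
    return rate, (total_budget - spent) / rate

# Illustrative figures: 2.5M of 10M tokens spent in the first 2 hours
rate, hours_left = project_exhaustion(
    total_budget=10_000_000, spent=2_500_000, elapsed_hours=2.0
)
# rate is 1,250,000 tokens/hour; the remaining 7.5M tokens last 6 more hours
```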

4.4 Budget Enforcement

Every pattern must enforce a budget: per-agent, per-phase, and per-engagement limits on token spend and call count. Without budget enforcement, a single runaway agent — stuck in a retry loop, or exploring an endless enumeration cascade — can consume the entire engagement budget.

Implementation: The orchestrator maintains a budget ledger. Before dispatching a sub-agent, it allocates a budget. The sub-agent checks its remaining budget before each LLM call and terminates gracefully (returning partial findings) when the budget is exhausted. The orchestrator treats budget-exhaustion as a normal termination, not a failure.

class BudgetLedger:
    def __init__(self, total_budget):
        self.total = total_budget
        self.spent = 0
        self.allocations = {}

    def allocate(self, task_id, cap):
        # Never hand out more than what is left of the engagement budget
        available = self.total - self.spent
        allocation = min(cap, available)
        self.allocations[task_id] = BudgetAllocation(
            cap=allocation, spent=0, task_id=task_id
        )
        return self.allocations[task_id]

    def charge(self, task_id, tokens, calls):
        # `calls` is accepted for call-count limits but only tokens are
        # enforced in this sketch.
        alloc = self.allocations[task_id]
        alloc.spent += tokens
        self.spent += tokens
        if alloc.spent >= alloc.cap:
            raise BudgetExhausted(task_id)

    def remaining(self):
        return self.total - self.spent
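
From the sub-agent's side, graceful termination might look like the following sketch. It is self-contained, so it defines its own small `TaskBudget` rather than reusing the ledger above; `TaskBudget`, `run_agent`, and the step costs are all hypothetical.

```python
class BudgetExhausted(Exception):
    pass

class TaskBudget:
    """Hypothetical per-task budget that refuses a charge it cannot cover."""
    def __init__(self, cap):
        self.cap = cap
        self.spent = 0

    def remaining(self):
        return self.cap - self.spent

    def charge(self, tokens):
        if tokens > self.remaining():
            raise BudgetExhausted(f"need {tokens}, have {self.remaining()}")
        self.spent += tokens

def run_agent(steps, budget):
    """Process steps until done or out of budget; always return what was found."""
    findings = []
    for name, cost in steps:
        try:
            budget.charge(cost)  # check before the (simulated) LLM call
        except BudgetExhausted:
            return {"findings": findings, "reason": "budget_exhausted"}
        findings.append(f"finding from {name}")
    return {"findings": findings, "reason": "complete"}

result = run_agent(
    [("scan", 400), ("enumerate", 400), ("exploit", 400)],
    TaskBudget(cap=1000),
)
```

The third step cannot be covered, so the agent stops and returns the two findings it already has, which the orchestrator treats as a normal partial result.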

4.5 Deadlock and Livelock Detection

Multi-agent systems can deadlock or livelock. The orchestrator must detect both:

Deadlock: No agent has made a state change in N seconds. All agents are waiting for dependencies that will never complete. Resolution: cancel one waiting agent and re-dispatch with a different approach, or force a phase transition.
Livelock: Agents are making calls but no new findings are being produced. Agents repeatedly restart work without making progress. Resolution: escalate to human, force a phase transition, or increase the diversity of approaches (switch to Competitive).

Detection requires a progress metric: number of new findings per unit time, number of state changes, number of successful tool executions. A healthy system shows steady progress; a stuck system shows zero.

import time

class ProgressMonitor:
    def __init__(self, world_model, window_seconds=300):
        self.world_model = world_model
        self.window = window_seconds
        self.finding_history = []

    def check(self):
        cutoff = time.time() - self.window  # start of the observation window
        new_findings = len(self.world_model.findings_since(cutoff))
        state_changes = self.world_model.state_changes_since(cutoff)

        if new_findings == 0 and state_changes == 0:
            return ProgressStatus.DEADLOCKED
        elif new_findings == 0 and state_changes > 0:
            return ProgressStatus.LIVELOCKED
        else:
            return ProgressStatus.HEALTHY

4.6 Causality and Provenance

When multiple agents produce findings, the orchestrator must track which agent produced which finding and why. This is essential for:

Debugging: Why did the agent claim host X is vulnerable? Because agent B found service Y, agent C matched it to CVE Z, and agent A attempted the exploit.
Reporting: The report must cite the evidence chain for each finding. "Host X is vulnerable to CVE-Z because service Y was identified on port 443 (evidence: nmap output, agent B, timestamp T1), CVE-Z matches service Y (evidence: CVE database, agent C, timestamp T2), and exploitation was confirmed (evidence: exploit output, agent A, timestamp T3)."
Quality improvement: Analyzing which agents produce high-quality findings vs. false positives lets the system improve over time.

Every finding in the world model carries provenance: producing agent, producing tool, input context reference, and timestamp. Finding chains link related findings (e.g., "this credential finding enabled this lateral-movement finding").
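
A sketch of how provenance and chain links could be represented and walked to produce the evidence chain for a report. The `Finding` fields and the `derived_from` link name are assumptions made for illustration, not the world model's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    id: str
    summary: str
    agent: str       # producing agent
    tool: str        # producing tool
    timestamp: float
    derived_from: list = field(default_factory=list)  # ids of enabling findings

def evidence_chain(findings, finding_id):
    """Return the finding and everything it depends on, earliest cause first."""
    chain, seen = [], set()

    def visit(fid):
        if fid in seen:
            return
        seen.add(fid)
        for parent in findings[fid].derived_from:
            visit(parent)
        chain.append(findings[fid])

    visit(finding_id)
    return chain

findings = {f.id: f for f in [
    Finding("f1", "service Y on port 443", "agent_b", "nmap", 1.0),
    Finding("f2", "CVE-Z matches service Y", "agent_c", "cve-db", 2.0, ["f1"]),
    Finding("f3", "exploit of CVE-Z confirmed", "agent_a", "exploit", 3.0, ["f2"]),
]}
chain_ids = [f.id for f in evidence_chain(findings, "f3")]
```

Walking from the confirmed exploit back through its links yields the same three-step chain cited in the reporting example above.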

4.7 Conflict Resolution

When two agents produce contradictory findings (e.g., Agent A says "host X is exploitable," Agent B says "host X is not exploitable"), the orchestrator must resolve the conflict:

Check provenance: Which finding has stronger evidence? A successful exploit > a failed exploit attempt > a theoretical vulnerability match.
Re-verify: Dispatch a verification agent to reproduce the finding independently.
Escalate: If the conflict cannot be resolved automatically, flag it for human review.

Conflicts should be rare in a well-designed system (agents work on independent targets), but they do occur when agents' scopes overlap (e.g., two agents both assess the same web app from different angles). The Hybrid pattern's priority queue can help: if Agent A's "exploitable" finding has higher priority (triggered a successful exploit), it preempts Agent B's "not exploitable" finding.
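
The provenance check can be sketched as a ranking over evidence types, falling back to re-verification when the two sides are equally strong. The evidence labels and the dictionary layout are hypothetical; the ordering follows the one given above.

```python
# Stronger evidence gets a higher rank (ordering taken from the text above)
EVIDENCE_RANK = {
    "successful_exploit": 3,
    "failed_exploit": 2,
    "theoretical_match": 1,
}

def resolve(a, b):
    """Pick the claim with stronger evidence; ask for re-verification on a tie."""
    rank_a = EVIDENCE_RANK[a["evidence"]]
    rank_b = EVIDENCE_RANK[b["evidence"]]
    if rank_a == rank_b:
        return {"action": "reverify", "winner": None}
    return {"action": "accept", "winner": a if rank_a > rank_b else b}

decision = resolve(
    {"agent": "A", "claim": "exploitable", "evidence": "successful_exploit"},
    {"agent": "B", "claim": "not exploitable", "evidence": "failed_exploit"},
)
tie = resolve(
    {"agent": "A", "claim": "exploitable", "evidence": "theoretical_match"},
    {"agent": "B", "claim": "not exploitable", "evidence": "theoretical_match"},
)
```

If re-verification also fails to settle the matter, the conflict would be escalated to a human as described above.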

5. The Custom Orchestrator (wp03) as a Hybrid Pattern

The Custom Orchestrator introduced in Whitepaper 03 (wp03) is a concrete implementation of the Hybrid pattern. This section maps its architecture to the pattern catalog and explains why Hybrid was chosen.

Why Hybrid

The Custom Orchestrator was designed for general-purpose autonomous pentesting engagements. These engagements exhibit all the characteristics that make Hybrid the right choice:

Multiple targets → requires parallelism (Parallel Fan-Out within phases).
Phase structure → requires predictable flow (Sequential Pipeline backbone).
High-value findings → requires responsive follow-up (Event-Driven triggers).
Budget constraints → requires adaptive strategy (pattern-switching based on budget).

A pure Sequential Pipeline would be too slow. A pure Parallel Fan-Out would miss cross-target chains. A pure Event-Driven system would be unpredictable and hard to terminate. The Hybrid pattern combines the strengths of each while mitigating their weaknesses.

Architecture Mapping

The Custom Orchestrator's architecture maps to the Hybrid pattern as follows:

Pipeline backbone: The orchestrator defines phases (recon, enumeration, vulnerability analysis, exploitation, lateral movement, reporting) with explicit exit criteria. The pipeline advances when criteria are met, providing the predictable overall structure.

Parallel dispatch within phases: Within each phase, the orchestrator identifies work units (hosts, services, web apps) and dispatches sub-agents concurrently, up to a configurable concurrency limit. This is the Parallel Fan-Out pattern applied within a phase.

Event-driven triggers across phases: The orchestrator registers triggers on high-value finding types:

credential_found → dispatch lateral-movement agent (cross-phase, high priority)
admin_access_found → dispatch privilege-escalation agent (cross-phase, high priority)
sensitive_data_found → dispatch exfiltration-assessment agent (cross-phase, medium priority)
new_host_discovered → dispatch enumeration agent (within-phase, normal priority)

Triggered tasks are enqueued at the priority shown for their trigger, and HIGH-priority tasks preempt normal pipeline tasks. This is the Event-Driven pattern layered on top of the pipeline.

Priority queue dispatch: The orchestrator maintains a priority queue of work items. Pipeline phase tasks are NORMAL priority; triggered tasks are HIGH priority. The orchestrator dispatches the highest-priority available task to the next free sub-agent slot. This naturally handles preemption (triggered tasks jump the queue) and parallelism (multiple sub-agents pull from the queue).
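
A minimal sketch of such a queue using the standard-library `heapq` module. `WorkQueue` and the task strings are hypothetical; a counter breaks ties so tasks of equal priority keep their arrival order.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower number is dispatched first

class WorkQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

queue = WorkQueue()
queue.push(NORMAL, "enumerate 10.0.0.11")
queue.push(NORMAL, "enumerate 10.0.0.12")
queue.push(HIGH, "lateral movement with found credential")  # triggered task

order = [queue.pop() for _ in range(len(queue))]
```

The triggered task is dispatched first even though it was enqueued last, which is the preemption behaviour described above.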

Blackboard substrate: The world model (Whitepaper 08) serves as the blackboard. All agents read from and write to it. The orchestrator does not relay findings between agents — it lets the shared world model do that. Triggers fire on world-model changes, not on direct agent-to-orchestrator reports.

Adaptive Pattern Switching

The Custom Orchestrator can switch its primary strategy based on engagement state:

Early engagement (budget ample): Hybrid with aggressive parallelism. Many concurrent agents, triggers enabled, pipeline advancing steadily.
Mid engagement (budget moderate): Hybrid with reduced concurrency. Fewer concurrent agents, triggers limited to high-severity findings only, pipeline advancing more deliberately.
Late engagement (budget low): Switch to Sequential Pipeline for remaining work. No new parallel dispatch, no new triggers. Complete remaining phases serially with the remaining budget.
Critical objective identified: Switch to Competitive for that objective. Dispatch 2–3 agents with different strategies to maximize success probability.

This adaptive switching is controlled by a strategy policy function that evaluates engagement state (budget remaining, findings produced, phase progress, time elapsed) and selects the appropriate pattern configuration.
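
A strategy policy of this kind could be as simple as the function below. The thresholds (60% and 20% of budget remaining) and every strategy name other than `hybrid_aggressive` are assumptions chosen for illustration; a real policy would also weigh findings, phase progress, and elapsed time.

```python
def select_strategy(budget_fraction_left, critical_objective=False):
    """Map engagement state to a strategy name. Thresholds are illustrative."""
    if critical_objective:
        return "competitive"        # 2-3 agents on the critical objective
    if budget_fraction_left > 0.6:
        return "hybrid_aggressive"  # early engagement: high concurrency
    if budget_fraction_left > 0.2:
        return "hybrid_reduced"     # mid engagement: fewer agents, fewer triggers
    return "sequential"             # late engagement: finish serially

early = select_strategy(0.9)
mid = select_strategy(0.4)
late = select_strategy(0.1)
critical = select_strategy(0.4, critical_objective=True)
```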

Concrete Implementation Sketch

import asyncio

class CustomOrchestrator(HybridOrchestrator):
    """The wp03 Custom Orchestrator — a Hybrid pattern implementation."""

    def __init__(self, config):
        super().__init__(
            world_model=WorldModel(config.world_model_config),
            agent_factory=AgentFactory(config.agent_configs),
            budget_ledger=BudgetLedger(config.total_budget),
            triggers=self.build_triggers(config.triggers),
        )
        self.strategy_policy = StrategyPolicy(config.strategy_config)
        self.current_strategy = "hybrid_aggressive"

    def build_triggers(self, trigger_configs):
        triggers = []
        for tc in trigger_configs:
            triggers.append(Trigger(
                condition=tc.condition,
                agent_type=tc.agent_type,
                priority=tc.priority,
                budget_cap=tc.budget_cap,
                cooldown=tc.cooldown,
                concurrency_limit=tc.concurrency_limit,
            ))
        return triggers

    async def run(self, engagement):
        while not self.termination_condition():
            # Adaptive strategy selection
            new_strategy = self.strategy_policy.evaluate(
                budget=self.budget.remaining(),
                phase=self.current_phase,
                findings=self.world_model.finding_count(),
                time_elapsed=engagement.elapsed(),
            )
            if new_strategy != self.current_strategy:
                self.switch_strategy(new_strategy)

            # Dispatch from priority queue
            await self.dispatch_available_tasks()

            # Check phase advancement
            if self.exit_criteria_met(self.current_phase):
                self.advance_phase()

            # Check progress (deadlock/livelock detection)
            status = self.progress_monitor.check()
            if status == ProgressStatus.DEADLOCKED:
                self.handle_deadlock()
            elif status == ProgressStatus.LIVELOCKED:
                self.handle_livelock()

            await asyncio.sleep(POLL_INTERVAL)

    def switch_strategy(self, new_strategy):
        self.telemetry.log_pattern_switch(
            from_pattern=self.current_strategy,
            to_pattern=new_strategy,
            reason=self.strategy_policy.last_reason,
        )
        self.current_strategy = new_strategy
        self.apply_strategy_config(new_strategy)

    def apply_strategy_config(self, strategy):
        config = STRATEGY_CONFIGS[strategy]
        self.max_concurrency = config.max_concurrency
        self.trigger_filter = config.trigger_filter  # which triggers are active
        self.budget.per_agent_cap = config.per_agent_cap

This implementation shows how the Hybrid pattern is not a single rigid design but a configurable framework. The Custom Orchestrator uses the same priority-queue, phase-based, trigger-driven core throughout the engagement, but tunes its parameters (concurrency, trigger filter, budget caps) based on the current strategy. This is pattern-aware orchestration in practice.

6. Anti-Patterns

Several orchestration designs are temptingly simple but consistently fail in practice. Each anti-pattern is presented with its failure mode and a fix.

Anti-Pattern 1: Mega-Orchestrator

Description: The orchestrator does all reasoning itself, using sub-agents only as "tool runners" with no autonomy. The orchestrator reads tool output, decides what tool to run next, and sends commands to sub-agents who merely execute.

Failure mode: The orchestrator's context grows unboundedly as it ingests every tool output from every sub-agent. It becomes the bottleneck — all work serializes through it. The system collapses to a single-agent system with the context and cost problems of single-agent design (Whitepapers 06, 07), but with additional coordination overhead. Budget is consumed by the orchestrator's reasoning, not by sub-agent work.

Fix: Sub-agents should make decisions, not just execute. The orchestrator delegates objectives ("enumerate this host and identify vulnerabilities"), not instructions ("run nmap with these flags, then run nikto, then check for CVE-2023-XXXX"). The orchestrator's context contains summaries and findings, not tool output.

Anti-Pattern 2: Free-For-All Blackboard

Description: All agents read and write to a shared blackboard with no coordination layer. Each agent monitors the blackboard and acts autonomously on what it finds. There is no orchestrator deciding what to work on next.

Failure mode: Agents duplicate work (two agents both enumerate the same host because both saw it on the blackboard and neither knew the other was working on it). Agents miss obvious attack chains because each agent sees only its piece of the puzzle — no agent has the global view needed to connect a credential from agent A with a service from agent B. The system never terminates because no agent is responsible for deciding when the engagement is done.

Fix: The blackboard is a substrate, not an orchestrator. Add an explicit coordination layer: any of the coordinator-based patterns (Patterns 1–6, which include Hybrid). The coordinator assigns work units, prevents duplication, runs synthesis, and decides termination. Agents write to the blackboard but do not self-dispatch; the coordinator dispatches based on blackboard state and engagement objectives.

Anti-Pattern 3: Perpetual Re-Planning

Description: The planner re-plans after every task completion, producing a new plan each time. The system spends more time planning than executing, and the plan is never stable enough to execute fully.

Failure mode: Each re-plan is a large LLM call that consumes budget and time. By the time the new plan is ready, several tasks have completed and the plan is already stale, triggering another re-plan. The system enters a planning loop where execution never gains momentum. In the worst case, the planner produces a different plan each time even when the engagement state hasn't meaningfully changed, leading to inconsistent and unpredictable behavior.

Fix: Re-plan only on failure or on significant assumption violation — not on every task completion. Define explicit re-plan triggers: (1) a task failed in a way that invalidates the plan's assumptions, (2) a finding was discovered that opens a significantly better path, (3) the budget is running low and the plan needs to be prioritized. Between re-plans, execute the plan as-is. If the plan becomes stale, let executors adapt within their delegated scope rather than triggering a global re-plan.
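
The three re-plan triggers can be expressed as a small gate that every event passes through. This is a sketch under assumptions: the event fields, the 20% low-budget threshold, and the `reprioritized` flag (which stops the budget trigger from firing on every subsequent event) are all hypothetical.

```python
def should_replan(event, budget_fraction_left, reprioritized=False, low_budget=0.2):
    """Return (replan?, reason). Ordinary task completions never trigger a re-plan."""
    if event.get("type") == "task_failed" and event.get("invalidates_plan"):
        return True, "failure invalidated plan assumptions"
    if event.get("type") == "finding" and event.get("opens_better_path"):
        return True, "finding opens a significantly better path"
    if budget_fraction_left < low_budget and not reprioritized:
        return True, "budget low, plan must be prioritized"
    return False, "continue current plan"

routine = should_replan({"type": "task_complete"}, 0.7)
failure = should_replan({"type": "task_failed", "invalidates_plan": True}, 0.7)
low_once = should_replan({"type": "task_complete"}, 0.1)
low_again = should_replan({"type": "task_complete"}, 0.1, reprioritized=True)
```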

Anti-Pattern 4: Budgetless Parallelism

Description: The orchestrator dispatches unlimited parallel agents without budget enforcement. Each agent is free to make as many calls as it wants, for as long as it wants.

Failure mode: One cascade — a subnet that expands into thousands of findings, or an event-driven trigger that fires repeatedly — spawns thousands of agent calls, exhausting the engagement budget in the first hour. The system has no mechanism to stop runaway agents. In the worst case, the API rate limit is hit and all agents fail simultaneously, producing a burst of errors and no useful findings. The operator discovers that 90% of the budget was spent on a single subnet that turned out to be uninteresting, while the critical target was never assessed.

Fix: Per-agent budget caps and a global concurrency limit. No agent is dispatched without an allocated budget. The budget is checked before every LLM call, and the agent terminates gracefully when exhausted. The global concurrency limit prevents resource exhaustion (API rate limits, memory, CPU). The orchestrator monitors aggregate budget burn rate and reduces concurrency or triggers strategy-switching if burn rate is too high.
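
The global concurrency limit can be sketched with `asyncio.Semaphore`. The `fake_agent` coroutine and the `peak` counter are hypothetical additions used only to show that the limit holds.

```python
import asyncio

async def run_with_limit(jobs, max_concurrency):
    """Run all jobs, but never more than max_concurrency at the same time."""
    sem = asyncio.Semaphore(max_concurrency)
    active = 0
    peak = 0

    async def guarded(job):
        nonlocal active, peak
        async with sem:  # waits here while max_concurrency jobs are running
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(j) for j in jobs))
    return results, peak

async def fake_agent():
    # Hypothetical agent: stands in for real work with a short sleep
    await asyncio.sleep(0.01)
    return "done"

results, peak = asyncio.run(run_with_limit([fake_agent] * 10, max_concurrency=3))
```

All ten jobs complete, but the number running at any moment never exceeds the configured limit. Combined with a per-agent budget cap, this bounds both the rate and the total of spend.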

7. Conclusion

Orchestration is the art of coordinating multiple cognitive agents toward a shared objective. In autonomous pentesting, the objective is complex (find and exploit vulnerabilities across a network), the agents are imperfect (LLMs that hallucinate, misjudge, and miss), and the environment is adversarial (the target does not cooperate). The orchestrator's job is to produce a coherent, thorough assessment despite these imperfections.

The eight patterns in this paper are not theoretical constructs — they are the patterns that work. They have been validated in systems that run real engagements against real infrastructure. The choice among them is not aesthetic; it is driven by the engagement's structure, the system's budget, and the acceptable level of risk.

The four anti-patterns are the patterns that do not work, despite their apparent simplicity. They are worth naming explicitly because every team that builds an autonomous pentesting system is tempted by at least one of them. The Mega-Orchestrator seems like the "obvious" design — let the smartest model do the thinking. The Free-For-All Blackboard seems elegant — let agents self-organize. Perpetual Re-Planning seems thorough — always have the best plan. Budgetless Parallelism seems powerful — throw everything at the problem. Each fails for specific, predictable reasons, and each has a known fix.

The most important design principle is this: the orchestrator should be adaptive, not fixed. Engagements vary. A fixed orchestration strategy will be optimal for some and disastrous for others. The mature system recognizes this and adapts its strategy as the engagement unfolds — faster on simple targets, more deliberate on complex ones, aggressive when a critical finding demands immediate follow-up, and conservative when the budget is running low.

The Hybrid pattern, as implemented by the Custom Orchestrator (wp03), is the recommended default for production systems. It combines the predictability of the pipeline, the speed of parallel dispatch, and the responsiveness of event-driven triggers in a single adaptive framework. Its priority-queue dispatch, configurable concurrency, and strategy-switching policy provide the flexibility to handle the full range of engagement conditions — from a single-host assessment to a multi-domain enterprise engagement — without changing the core architecture.

In the final paper of this series, we look forward: where is AI pentesting headed, and what does the trajectory mean for defenders, attackers, and the practice of security assessment itself?

This whitepaper is part of a series on autonomous penetration testing with AI agents. For the full series index and related work, see the accompanying documentation.