Whitepaper 04 — Phase Map Architecture: Declarative Methodology Configs for Autonomous Penetration Testing
Author: Khushal Suthar, Associate Principal Security Analyst
Date: June 2026
Category: Workflow Modelling & Engagement Planning
1. Executive Summary
Penetration testing is not a linear pipeline. Real engagements involve parallel service enumeration, conditional exploitation paths, pivot chains, phase repetitions triggered by new findings, and scope-driven methodology selection. Yet every LLM-based penetration-testing tool to date — PentestGPT, HackingBuddyGPT, CAI, PentAG — models the engagement as either a flat loop or a fixed sequence of phases. This forces the agent into a one-size-fits-all workflow that cannot adapt to the topology of the specific target or the methodology dictated by the engagement type.
The Phase Map Architecture introduces a **declarative, graph-based engagement topology** that the orchestrator (WP03) interprets at runtime. A Phase Map is a directed graph of phases, agents, skills, transitions, and conditional gates — defined in YAML — that describes how to approach a specific engagement type. Each phase map defines its phases, the specialist agents assigned to each phase, the skills available within each phase, the ordering of phases, and the conditional transitions that govern movement between phases. The orchestrator loads the phase map at engagement start, builds an execution plan from it, and walks the graph dynamically — branching, merging, backtracking, and nesting based on findings rather than following a fixed sequence. Phases can be skipped, added, or reordered mid-engagement through a runtime modification protocol.
This whitepaper presents the Phase Map schema, the graph-walking algorithm, conditional gate evaluation, three complete example phase maps (OWASP Web in 7 phases, IoT Assessment in 5 phases, PTES Network in 6 phases), state transition diagrams, the runtime modification protocol, a comparative analysis against PentestGPT and Pentera, and performance benchmarks drawn from a 50-engagement evaluation corpus.
2. Problem Statement
2.1 The Linear-Pipeline Fallacy
Standard penetration-testing methodologies and frameworks (PTES, the OWASP Web Security Testing Guide (WSTG), MITRE ATT&CK) are usually taught as a linear sequence: Recon → Enum → Vuln → Exploit → Post-Exp → Report. In practice, this sequence is a simplification that breaks down on contact with real targets:
- Parallelism: enumerating SSH, HTTP, and SMB should happen concurrently, not sequentially. A network with 40 open services should not require 40 sequential enumeration passes.
- Conditionality: exploitation of a web application depends on the specific vulnerability found. SQL injection leads to a database extraction sub-phase, SSRF leads to internal port-scanning, and XSS leads to an entirely separate session/credential theft path. The next phase is determined by the finding, not by a fixed sequence.
- Pivoting: discovering credentials in one service may open a new attack surface on another host, creating a nested engagement that re-enters enumeration from within exploitation.
- Phase repetition: post-exploitation often reveals that initial enumeration missed a service or a virtual host, requiring re-enumeration from a privileged position.
- Methodology diversity: a web application test follows OWASP WSTG, an IoT assessment follows a hardware/firmware-first methodology, and a network pentest follows PTES. These are not the same workflow with different tool labels; they have fundamentally different phase structures, different agent specialisations, and different transition logic.
A linear pipeline cannot express any of this. An LLM agent constrained to a linear pipeline will either (a) miss parallel opportunities, producing slow engagements, or (b) force-fit conditional paths into a fixed sequence, producing incorrect or incomplete results. Worse, a single hardcoded workflow means the agent applies web-app logic to an IoT target or network logic to an API, with no mechanism to select or adapt the methodology.
2.2 Existing Tool Workflow Models
None of the existing tools supports declarative, graph-based engagement topologies with per-phase agent assignment, per-phase skill sets, and runtime modification. PentestGPT and Pentera both hardcode their workflows (PentestGPT in its prompt structure, Pentera in its exploit chain engine), with no mechanism for the user (or the agent itself) to define, select, or modify the methodology. Phase Maps solve this by externalising the workflow as a declarative configuration that the orchestrator interprets at runtime.
2.3 The Requirements
A methodology model for autonomous pentesting must satisfy:
- Declarative definition: The workflow is described, not programmed. A security analyst can author or modify a methodology without touching orchestrator code.
- Per-phase agent assignment: Different phases need different specialist agents, such as a recon specialist, a web exploitation specialist, or a privilege escalation specialist. The phase map binds agents to phases.
- Per-phase skill assignment: Each phase exposes a curated set of skills (tools/techniques). The recon phase exposes scanning skills; the exploit phase exposes payload skills. This prevents the agent from running exploit tools during enumeration.
- Conditional transitions: Movement between phases is gated on findings, not on completion of a fixed number of turns.
- Parallel branches: Service-specific enumeration runs concurrently, not sequentially.
- Runtime modification: Phases can be skipped, added, or reordered during execution based on emerging findings, scope changes, or operator intervention, without restarting the engagement.
- Methodology presets: Pre-built maps for common engagement types (OWASP Web, IoT, PTES Network, AD, API, red team) ship with the platform and are selectable at engagement start.
3. Phase Map Schema
3.1 Conceptual Model
A Phase Map is a directed graph where:
- Nodes are phases (recon, enumeration, vulnerability, exploit, post-exploit, report) or sub-phases (enum_smb, enum_http, enum_ssh). Each node carries an agents list and a skills list.
- Edges are transitions with optional conditions (gates).
- Branches are parallel edges from a single node (parallel_fanout).
- Joins are nodes with multiple incoming edges (parallel_join), acting as merge points.
- Nesting is supported via sub_map references, enabling pivot engagements that re-enter the graph from a new target.
- Entry and terminal nodes mark where the graph begins and ends.
The graph is interpreted at runtime by the orchestrator's graph walker, which maintains a worklist of active nodes, executes each node's assigned agents with the node's assigned skills, evaluates outgoing gates against the finding graph, and enqueues successor nodes whose gates pass.
3.2 Top-Level YAML Schema
```yaml
# ──────────────────────────────────────────────
# Phase Map: top-level structure
# ──────────────────────────────────────────────
api_version: phase_map/v1
name: <string>                        # unique map identifier
description: <string>                 # human-readable purpose
engagement_type: <string>             # web | network | iot | ad | api | red_team | custom
scope_constraints:                    # optional scope guardrails
  max_phases: <int>                   # hard cap on active phases
  max_parallel_branches: <int>        # concurrency limit
  allow_runtime_modify: <bool>        # permit skip/add/reorder mid-engagement
  require_human_approval_for_add: <bool>

metadata:
  author: <string>
  version: <string>
  compatible_orchestrator: ">=1.0"
  tags: [<string>, ...]

agents:                               # agent pool available to this map
  - id: <string>                      # e.g. "recon_agent"
    model: <string>                   # LLM model override (optional)
    max_turns: <int>                  # turn budget for this agent
  - id: <string>
    # ...

skills:                               # skill library available to this map
  - id: <string>                      # e.g. "nmap_scan"
    category: <string>                # recon | enum | vuln | exploit | post | report
    tools: [<string>, ...]            # underlying CLI tools
  - id: <string>
    # ...

nodes:
  <node_id>:
    phase: <string>                   # logical phase label
    type: <string>                    # standard | parallel_fanout | parallel_join | terminal
    agents: [<agent_id>, ...]         # agents assigned to this node
    skills: [<skill_id>, ...]         # skills available in this node
    service: <string>                 # service filter (optional, for enum sub-phases)
    stealth: <bool>                   # stealth mode flag (optional)
    branches: [<node_id>, ...]        # parallel_fanout only: branch nodes (as used in the example maps)
    join: <node_id>                   # parallel_fanout only: node where branches merge
    conditions:                       # node-level preconditions (must be true to execute)
      - <predicate>
    exit_conditions:                  # conditions that signal phase completion
      - <predicate>
    sub_maps:                         # nested maps triggered on conditions (pivoting)
      - condition: <predicate>
        map: <map_name>
    terminal: <bool>                  # true for the final node (report)

edges:
  - from: <node_id>
    to: <node_id>
    gate: <gate_name>                 # transition condition
    label: <string>                   # optional label (e.g. "backtrack")

gates:                                # gate predicate definitions
  <gate_name>: <predicate_expr>       # evaluated against the finding graph
```
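Because the schema cross-references identifiers (nodes name agents and skills, edges name nodes and gates), a loader would plausibly check those references before planning. The following is a minimal sketch of such a check, not the orchestrator's actual validator. It assumes the YAML has already been parsed into a plain dictionary (for example with PyYAML's `yaml.safe_load`); the function name `validate_phase_map` and the demo map are hypothetical.

```python
def validate_phase_map(pm: dict) -> list:
    """Return a list of problems found in a parsed phase map (empty list = valid)."""
    errors = []
    nodes = pm.get("nodes", {})
    agent_ids = {a["id"] for a in pm.get("agents", [])}
    skill_ids = {s["id"] for s in pm.get("skills", [])}
    gates = pm.get("gates", {})

    # Every agent and skill bound to a node must exist in the map-level pools.
    for node_id, node in nodes.items():
        for agent_id in node.get("agents", []):
            if agent_id not in agent_ids:
                errors.append(f"node '{node_id}': unknown agent '{agent_id}'")
        for skill_id in node.get("skills", []):
            if skill_id not in skill_ids:
                errors.append(f"node '{node_id}': unknown skill '{skill_id}'")

    # Every edge must connect defined nodes and name a defined gate.
    for edge in pm.get("edges", []):
        name = f"edge {edge.get('from')} -> {edge.get('to')}"
        for end in ("from", "to"):
            if edge.get(end) not in nodes:
                errors.append(f"{name}: unknown node '{edge.get(end)}'")
        if edge.get("gate") not in gates:
            errors.append(f"{name}: undefined gate '{edge.get('gate')}'")

    if not any(n.get("terminal") for n in nodes.values()):
        errors.append("no terminal node defined")
    return errors


# Hypothetical two-node map; the gate 'recon_done' is deliberately left undefined.
demo_map = {
    "agents": [{"id": "recon_agent"}, {"id": "report_agent"}],
    "skills": [{"id": "nmap_scan"}, {"id": "report_gen"}],
    "nodes": {
        "recon": {"agents": ["recon_agent"], "skills": ["nmap_scan"]},
        "report": {"agents": ["report_agent"], "skills": ["report_gen"], "terminal": True},
    },
    "edges": [{"from": "recon", "to": "report", "gate": "recon_done"}],
    "gates": {},
}
problems = validate_phase_map(demo_map)
print(problems)
```

Returning a list of problems rather than raising on the first one lets a map author see every broken reference in a single pass.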
3.3 Predicate Expression Language
Gates, conditions, and exit conditions use a lightweight predicate language evaluated against the finding graph (WP02, Level 4):
| Expression | Meaning |
| --- | --- |
| findings.count(type='open_port') >= 1 | At least one open port discovered |
| findings.exists(service='ssh') | SSH service present in findings |
| findings.exists(type='vulnerability', severity in ['high','critical']) | A high/critical vuln found |
| findings.count(type='exploit_attempt', status='failed') >= 3 | Three failed exploit attempts |
| all(branch in completed for branch in fanout_branches) | All parallel branches finished |
| findings.exists(type='access', level in ['user','root']) | Access obtained |
| scope.contains(target) | Target is within authorised scope |
| agent_budget_remaining(agent_id) > 0 | Agent still has turn budget |
Predicates are pure functions over the finding graph and engagement state, with no side effects. Gates and exit conditions are evaluated after each phase execution, not before, so the accumulated findings determine the path dynamically; node-level preconditions are checked when a node is taken from the worklist.
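To make the "pure function over the finding graph" idea concrete, here is a small sketch of how `count` and `exists` could behave. It is an illustration under stated assumptions, not the WP02 finding graph: `FindingStore` is a hypothetical in-memory stand-in, and a list-valued filter is used to approximate the `level in [...]` form, which is not valid Python keyword syntax.

```python
from dataclasses import dataclass, field


@dataclass
class FindingStore:
    """Hypothetical in-memory stand-in for the finding graph."""
    items: list = field(default_factory=list)

    def add(self, **attrs):
        self.items.append(attrs)

    @staticmethod
    def _matches(finding, filters):
        for key, wanted in filters.items():
            value = finding.get(key)
            if isinstance(wanted, (list, tuple, set)):
                if value not in wanted:      # membership test, e.g. level in [...]
                    return False
            elif value != wanted:            # plain equality, e.g. service='ssh'
                return False
        return True

    def count(self, **filters):
        return sum(1 for f in self.items if self._matches(f, filters))

    def exists(self, **filters):
        return self.count(**filters) > 0


# Gates as side-effect-free functions of the store.
def recon_gate(findings):
    return findings.count(type="open_port") >= 1


def access_gate(findings):
    return findings.exists(type="access", level=["user", "root"])


findings = FindingStore()
findings.add(type="open_port", port=22, service="ssh")
before = (recon_gate(findings), access_gate(findings))

# A later phase records access; the same gate now passes on re-evaluation.
findings.add(type="access", level="user")
after = (recon_gate(findings), access_gate(findings))
print(before, after)
```

Because the gate functions only read from the store, evaluating them repeatedly is safe, which is what allows a gate that failed earlier to pass after later findings arrive.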
3.4 Agent and Skill Binding
Each node binds agents and skills:
```yaml
nodes:
  enum_http:
    phase: enumeration
    agents: [web_enum_agent]
    skills: [gobuster_dirbrute, nikto_scan, whatweb_fingerprint, curl_probe, wfuzz_param]
    service: http
    conditions:
      - findings.exists(service='http')
    exit_conditions:
      - findings.count(type='web_endpoint') >= 5
      - agent_budget_remaining('web_enum_agent') <= 2
```
This binding serves three purposes:
- Specialisation: The web_enum_agent is prompted with web enumeration context, not generic enumeration context. It knows HTTP, directories, parameters, and cookies, not SMB shares or Kerberos.
- Skill scoping: Only the five listed skills are available to the agent in this node. The agent cannot invoke metasploit_exploit or linpeas_privesc during enumeration. This prevents premature exploitation and keeps the agent focused.
- Budget control: Each agent has a turn budget. The exit condition includes a budget check so the node completes even if the agent is still finding things but has exhausted its budget.
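Skill scoping can be pictured as an allow-list wrapped around the skill registry. The sketch below is one way this could be enforced, not the platform's implementation: `ScopedSkillSet` and `SkillScopeError` are hypothetical names, and the registry entries are stand-in callables rather than real tool invocations.

```python
class SkillScopeError(PermissionError):
    """Raised when an agent asks for a skill its current node does not expose."""


class ScopedSkillSet:
    """Expose only the skills listed on a node; refuse everything else."""

    def __init__(self, node_id, allowed_ids, registry):
        self.node_id = node_id
        # Copy only the allowed entries so out-of-scope skills are unreachable.
        self._allowed = {sid: registry[sid] for sid in allowed_ids}

    def invoke(self, skill_id, *args, **kwargs):
        if skill_id not in self._allowed:
            raise SkillScopeError(
                f"skill '{skill_id}' is not available in node '{self.node_id}'"
            )
        return self._allowed[skill_id](*args, **kwargs)


# Stand-in registry: each "skill" just returns a string describing the call.
registry = {
    "gobuster_dirbrute": lambda target: f"dirbrute:{target}",
    "metasploit_exploit": lambda target: f"exploit:{target}",
}
scope = ScopedSkillSet("enum_http", ["gobuster_dirbrute"], registry)

allowed_result = scope.invoke("gobuster_dirbrute", "10.10.10.5")
try:
    scope.invoke("metasploit_exploit", "10.10.10.5")
    blocked = None
except SkillScopeError as exc:
    blocked = str(exc)
print(allowed_result, "|", blocked)
```

Enforcing the scope at invocation time, rather than relying on the prompt alone, means an agent that attempts an out-of-phase tool gets an explicit refusal instead of silently running it.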
4. State Transition Model
4.1 Engagement Lifecycle States
The orchestrator tracks the overall engagement in a state machine that wraps the phase map walker:
```
┌──────────┐    load map    ┌──────────┐   plan built   ┌──────────┐
│   IDLE   │───────────────►│ LOADING  │───────────────►│ PLANNING │
└──────────┘                └──────────┘                └────┬─────┘
                                                             │
                                                             ▼
┌──────────┐   all nodes    ┌──────────┐   gate pass    ┌──────────┐
│   DONE   │◄───────────────│ FINALISE │◄───────────────│ RUNNING  │◄──────────┐
└──────────┘                └──────────┘                └────┬─────┘           │
                                                             │ modify          │
                                                             ▼                 │
                                                        ┌──────────┐           │
                                                        │ MODIFYING│           │
                                                        └────┬─────┘           │
                                                             │                 │
                                                             ▼                 │
                                                        ┌──────────┐  resume   │
                            pause ─────────────────────►│  PAUSED  │───────────┘
                                                        │ (human)  │
                                                        └──────────┘
```
| State | Description |
| --- | --- |
| IDLE | No engagement loaded. Waiting for phase map selection. |
| LOADING | Phase map YAML being parsed and validated. |
| PLANNING | Orchestrator builds execution plan from graph: resolves entry node, initialises worklist, assigns agent pools. |
| RUNNING | Graph walker executing active nodes. Gates evaluated after each node. |
| MODIFYING | Runtime modification in progress: phase being skipped, added, or reordered. Walker paused, worklist adjusted. |
| PAUSED | Human operator has paused engagement for review. No agent activity. |
| FINALISE | Terminal node (report) reached. Finding graph frozen, report generated. |
| DONE | Engagement complete. Report available. |
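The lifecycle can be expressed as a table of allowed transitions that rejects anything else. The sketch below is illustrative only: the state names come from the table above, but the exact transition set is an assumption, in particular the MODIFYING → RUNNING and MODIFYING → PAUSED edges, which the source does not spell out. `Engagement` is a hypothetical wrapper class.

```python
from enum import Enum


class EngagementState(Enum):
    IDLE = "IDLE"
    LOADING = "LOADING"
    PLANNING = "PLANNING"
    RUNNING = "RUNNING"
    MODIFYING = "MODIFYING"
    PAUSED = "PAUSED"
    FINALISE = "FINALISE"
    DONE = "DONE"


S = EngagementState

# Assumed transition table; MODIFYING's outgoing edges are a guess.
ALLOWED_TRANSITIONS = {
    S.IDLE: {S.LOADING},
    S.LOADING: {S.PLANNING},
    S.PLANNING: {S.RUNNING},
    S.RUNNING: {S.MODIFYING, S.PAUSED, S.FINALISE},
    S.MODIFYING: {S.RUNNING, S.PAUSED},
    S.PAUSED: {S.RUNNING},
    S.FINALISE: {S.DONE},
    S.DONE: set(),
}


class Engagement:
    """Tracks the lifecycle state and refuses transitions not in the table."""

    def __init__(self):
        self.state = S.IDLE
        self.history = [S.IDLE]

    def transition(self, new_state):
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


engagement = Engagement()
for step in (S.LOADING, S.PLANNING, S.RUNNING, S.PAUSED, S.RUNNING, S.FINALISE, S.DONE):
    engagement.transition(step)
print([s.name for s in engagement.history])
```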
4.2 Per-Node Execution State
Each node within the RUNNING state follows its own micro-state machine:
```
┌─────────┐  conditions   ┌──────────┐  agents spawn  ┌─────────┐
│ PENDING │──────────────►│ EVALUATE │───────────────►│ ACTIVE  │
└────┬────┘     pass      └────┬─────┘                └────┬────┘
     │                         │                           │
     │ fail                    │ conditions                │ agents
     │                         │ fail                      │ complete
     ▼                         ▼                           ▼
┌─────────┐               ┌──────────┐                ┌──────────┐
│ SKIPPED │               │ DEFERRED │                │CHECK_EXIT│
└─────────┘               └────┬─────┘                └────┬─────┘
                               │ re-eval                   │
                               └──► (back to EVALUATE)     │ exit
                                                           │ conditions
                                                           │ met
                                                           ▼
                                                      ┌──────────┐
                                                      │ COMPLETE │
                                                      └────┬─────┘
                                                           │ evaluate
                                                           │ outgoing
                                                           │ gates
                                                           ▼
                                                     ┌────────────┐
                                                     │ TRANSITION │
                                                     └────────────┘
```
| Node State | Description |
| --- | --- |
| PENDING | Node in worklist, not yet evaluated. |
| EVALUATE | Node-level preconditions being checked. |
| SKIPPED | Preconditions failed permanently (e.g. service not present). Node will not execute. |
| DEFERRED | Preconditions not yet met but may be later (e.g. waiting for a branch to finish). Node re-queued. |
| ACTIVE | Assigned agents executing with assigned skills. |
| CHECK_EXIT | Agent execution complete; exit conditions being evaluated. |
| COMPLETE | Exit conditions met. Node marked done. Outgoing gates evaluated. |
| TRANSITION | Successor nodes enqueued based on gate evaluation. |
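The three-way outcome of the EVALUATE step (run, defer, or skip) is the part of this micro-state machine most easily confused, so a tiny sketch may help. It assumes the walker can tell whether a failed precondition might still be satisfied later; `after_evaluate` is a hypothetical helper, and how "may be met later" is decided is left open here.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "PENDING"
    EVALUATE = "EVALUATE"
    SKIPPED = "SKIPPED"
    DEFERRED = "DEFERRED"
    ACTIVE = "ACTIVE"
    CHECK_EXIT = "CHECK_EXIT"
    COMPLETE = "COMPLETE"
    TRANSITION = "TRANSITION"


def after_evaluate(conditions_met: bool, may_be_met_later: bool) -> NodeState:
    """Decide where a node goes once its preconditions have been checked."""
    if conditions_met:
        return NodeState.ACTIVE        # agents spawn
    if may_be_met_later:
        return NodeState.DEFERRED      # re-queued and re-evaluated later
    return NodeState.SKIPPED           # failed permanently; never executes


outcomes = [
    after_evaluate(True, False),
    after_evaluate(False, True),
    after_evaluate(False, False),
]
print([o.name for o in outcomes])
```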
5. The Graph-Walking Algorithm
The orchestrator (WP03) interprets the Phase Map at runtime using a worklist-driven graph walker. The walker maintains a set of active nodes, executes each, evaluates gates, and enqueues successors.
5.1 Walker Implementation
```python
class PhaseMapWalker:
    """Interprets a Phase Map as a worklist-driven graph traversal.

    Edges are treated as dicts ("from", "to", "gate", optional "label"), matching
    add_phase/reorder_phase. PredicateParser and the map's edges_from/edges_to
    helpers are provided elsewhere in the orchestrator.
    """

    def __init__(self, phase_map, finding_graph, agent_factory, skill_registry,
                 map_registry=None):
        self.map = phase_map
        self.findings = finding_graph
        self.agent_factory = agent_factory
        self.skill_registry = skill_registry
        self.map_registry = map_registry or {}  # map name -> phase map (for sub_maps)
        self.completed = set()
        self.active = set([self.map.entry_node])
        self.deferred = set()
        self.skipped = set()
        self.results = {}
        self.node_state = {}        # node_id -> NodeState
        self.modifications = []     # runtime mod log

    def run(self):
        """Main loop: walk the graph until all nodes are done or skipped."""
        while self.active or self.deferred:
            if not self.active:
                # Re-evaluate deferred nodes; conditions may now be met
                self.re_evaluate_deferred()
                if not self.active:
                    break  # deadlock: nothing can proceed
            node = self.active.pop()
            self.execute_node(node)

        # Execute terminal node(s) if not already done
        for node_id, node_def in self.map.nodes.items():
            if node_def.terminal and node_id not in self.completed:
                self.execute_node(node_id)

    def execute_node(self, node_id):
        node_def = self.map.nodes[node_id]

        # ── Parallel fanout ──
        if node_def.type == "parallel_fanout":
            self.completed.add(node_id)
            for branch in node_def.branches:
                if self.evaluate_conditions(branch):
                    self.active.add(branch)
                else:
                    self.skipped.add(branch)
            # Park the declared join node until every branch has finished
            if getattr(node_def, "join", None):
                self.deferred.add(node_def.join)
            return

        # ── Parallel join ──
        if node_def.type == "parallel_join":
            if self.join_ready(node_id):
                self._run_phase(node_id, node_def)
            else:
                self.deferred.add(node_id)
            return

        # ── Standard node ──
        if not self.evaluate_conditions(node_id):
            # Check if conditions could be met later
            if self.conditions_deferrable(node_id):
                self.deferred.add(node_id)
            else:
                self.skipped.add(node_id)
            return

        # Execute sub-maps (nesting / pivot) before phase execution
        for sub in node_def.sub_maps or []:
            if self.evaluate_predicate(sub["condition"]):
                sub_walker = PhaseMapWalker(
                    self.map_registry[sub["map"]],
                    self.findings,  # shared finding graph
                    self.agent_factory,
                    self.skill_registry,
                    self.map_registry,
                )
                sub_walker.run()
                self.modifications.append(
                    f"sub_map '{sub['map']}' executed from node '{node_id}'"
                )

        self._run_phase(node_id, node_def)

    def _run_phase(self, node_id, node_def):
        """Spawn agents, assign skills, execute phase."""
        agents = [self.agent_factory.create(aid) for aid in node_def.agents]
        skills = [self.skill_registry.get(sid) for sid in node_def.skills]

        for agent in agents:
            agent.set_skills(skills)
            agent.set_finding_graph(self.findings)
            agent.set_phase_context(node_def.phase, node_def.service)
            result = agent.run_until_exit_or_budget(
                exit_conditions=node_def.exit_conditions
            )
            self.results.setdefault(node_id, []).append(result)

        self.completed.add(node_id)
        self.node_state[node_id] = "COMPLETE"

        # Evaluate outgoing edges and enqueue successors whose gates pass
        for edge in self.map.edges_from(node_id):
            if not self.evaluate_gate(edge["gate"]):
                continue
            if str(edge.get("label", "")).startswith("backtrack"):
                # Backtrack edges (e.g. "backtrack_remap") re-queue a node that already ran
                self.completed.discard(edge["to"])
            self.active.add(edge["to"])

    def join_ready(self, node_id):
        """True once every source of a join, and each fanout's branches, is done."""
        sources = {edge["from"] for edge in self.map.edges_to(node_id)}
        for src in list(sources):
            sources.update(getattr(self.map.nodes[src], "branches", None) or [])
        return sources <= (self.completed | self.skipped)

    def conditions_deferrable(self, node_id):
        """Assumption: failed preconditions may still be met while other nodes are queued."""
        return bool(self.active)

    def evaluate_gate(self, gate_name):
        return self.map.gates[gate_name].evaluate(self.findings, self)

    def evaluate_conditions(self, node_id):
        node_def = self.map.nodes[node_id]
        return all(self.evaluate_predicate(c) for c in (node_def.conditions or []))

    def evaluate_predicate(self, expr):
        return PredicateParser.eval(expr, self.findings, self)

    def re_evaluate_deferred(self):
        ready = set()
        for node_id in list(self.deferred):
            is_join = self.map.nodes[node_id].type == "parallel_join"
            if self.evaluate_conditions(node_id) and (
                not is_join or self.join_ready(node_id)
            ):
                ready.add(node_id)
                self.deferred.discard(node_id)
        self.active |= ready

    # ── Runtime modification API ──

    def skip_phase(self, node_id, reason="operator_skip"):
        self.active.discard(node_id)
        self.deferred.discard(node_id)
        self.skipped.add(node_id)
        self.modifications.append(f"SKIP {node_id}: {reason}")

    def add_phase(self, node_def, after_node_id, gate="always"):
        """Insert a new node and edge at runtime."""
        self.map.nodes[node_def.id] = node_def
        self.map.edges.append(
            {"from": after_node_id, "to": node_def.id, "gate": gate}
        )
        self.active.add(node_def.id)
        self.modifications.append(f"ADD {node_def.id} after {after_node_id}")

    def reorder_phase(self, node_id, before_node_id):
        """Move a node to execute before another node."""
        # Remove old edges involving node_id
        self.map.edges = [
            e for e in self.map.edges
            if e["from"] != node_id and e["to"] != node_id
        ]
        # Insert new edge node_id -> before_node_id so node_id runs first,
        # and enqueue node_id because it no longer has an incoming edge
        self.map.edges.insert(0, {
            "from": node_id, "to": before_node_id, "gate": "always"
        })
        self.active.add(node_id)
        self.modifications.append(f"REORDER {node_id} before {before_node_id}")
```
5.2 Conditional Gate Evaluation
Gates are predicates over the finding graph (the persistent store from WP02, Level 4). The finding graph is a directed graph of discoveries — hosts, ports, services, vulnerabilities, credentials, access levels, pivot targets — that accumulates across all phases and agents. Gates query this graph to determine whether a transition should fire.
| Gate | Predicate | Fires When |
| --- | --- | --- |
| recon_exit_conditions_met | findings.count(type='open_port') >= 1 | At least one open port found |
| service_exists:ssh | findings.exists(service='ssh') | SSH service discovered |
| service_exists:http | findings.exists(service='http') | HTTP service discovered |
| service_exists:smb | findings.exists(service='smb') | SMB service discovered |
| all_branches_complete | all(b in completed for b in fanout_branches) | All parallel enum branches done |
| vulns_discovered | findings.count(type='vulnerability') >= 1 | At least one vulnerability identified |
| access_obtained | findings.exists(type='access', level in ['user','root']) | User or root access achieved |
| exploit_failed | findings.count(type='exploit_attempt', status='failed') >= 3 | Three failed exploit attempts (counted in total, not necessarily consecutive) |
| creds_obtained | findings.exists(type='credential') | Credentials found |
| pivot_target_discovered | findings.exists(type='pivot_target') | A new internal target found via pivot |
Gates are evaluated after each node completes, not before. This allows the finding graph to determine the path dynamically. A gate that fails on one evaluation may pass on a later one if subsequent findings satisfy its predicate — this is what enables backtracking and deferred-node re-evaluation.
5.3 Parallelism
The parallel_fanout node declares multiple enumeration branches that are logically concurrent. In the current implementation, branches are executed sequentially with context isolation: each branch gets its own LLM context window, preventing cross-service confusion (e.g. the SMB enumeration agent does not see HTTP findings in its context). True parallel execution (concurrent LLM calls via async tool execution) is planned, and the schema already supports it: the walker's active set can hold multiple nodes, and an async executor would run them concurrently with thread-safe finding graph updates.
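As a rough picture of what the planned async executor might look like, the sketch below runs branches concurrently with `asyncio`, caps concurrency with a semaphore (standing in for max_parallel_branches), and serialises writes to a shared result list. This is not the platform's executor: `run_branches` and `fake_enum` are hypothetical, and the branch body is a short sleep rather than an LLM or tool call.

```python
import asyncio


async def run_branches(branches, run_branch, max_parallel):
    """Run branch coroutines concurrently, at most max_parallel at a time."""
    limit = asyncio.Semaphore(max_parallel)
    write_lock = asyncio.Lock()
    findings = []

    async def guarded(branch):
        async with limit:                  # respect the concurrency cap
            result = await run_branch(branch)
        async with write_lock:             # serialise shared-state updates
            findings.extend(result)
        return branch

    # gather() returns results in the order the branches were submitted.
    completed = await asyncio.gather(*(guarded(b) for b in branches))
    return list(completed), findings


# Demo instrumentation: track how many branches run at the same time.
stats = {"active": 0, "peak": 0}


async def fake_enum(branch):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)              # stand-in for real enumeration work
    stats["active"] -= 1
    return [{"type": "service_detail", "branch": branch}]


done, found = asyncio.run(
    run_branches(["enum_ssh", "enum_http", "enum_smb"], fake_enum, max_parallel=2)
)
print(done, len(found), stats["peak"])
```

In a single-threaded event loop the lock is not strictly required for a list append; it is shown because a real finding graph shared across threads or processes would need an equivalent guard.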
5.4 Nesting (Pivot Sub-Maps)
When a phase produces credentials or discovers a new internal target, the sub_map gate fires, spawning a nested Phase Map walker for the pivot target:
```
Primary engagement: 10.10.10.5
└─ EXPLOIT obtains creds for user 'admin'
   └─ sub_map condition: creds_obtained → fires
      └─ New walker instantiated: target=10.10.10.6 (discovered via pivot)
         Entry: RECON → ENUM → VULN → EXPLOIT → POST-EXP
         └─ Walker shares the primary finding graph
         └─ Results merged into primary finding graph
         └─ Nested walker completes → primary walker continues
```
The nested walker uses the same phase map (or a different one — e.g. a lighter pivot_recon map) but targets the new host. Findings from the nested walker are written to the shared finding graph, making them available to the primary walker's gate evaluation. This mirrors real pivot chains and is entirely absent from existing tools.
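Since a phase map may reference itself as a sub-map, nested walkers can in principle recurse indefinitely when two hosts expose each other as pivot targets. The source does not describe a guard for this, so the following is purely an illustrative sketch of one possible safeguard: a visited set plus a depth limit. `run_engagement`, `max_depth`, and the pivot table are hypothetical.

```python
def run_engagement(target, discover_pivots, visited=None, depth=0, max_depth=3):
    """Walk a target, then recurse into its pivot targets, visiting each host once."""
    visited = set() if visited is None else visited
    if target in visited or depth > max_depth:
        return []                          # already covered, or nested too deep
    visited.add(target)
    order = [target]                       # stand-in for running a full walker
    for pivot in discover_pivots(target):
        order += run_engagement(pivot, discover_pivots, visited, depth + 1, max_depth)
    return order


# Hypothetical pivot relationships: .5 and .6 reference each other.
pivots = {
    "10.10.10.5": ["10.10.10.6"],
    "10.10.10.6": ["10.10.10.5", "10.10.10.7"],
}
walk_order = run_engagement("10.10.10.5", lambda t: pivots.get(t, []))
print(walk_order)
```

The visited set is shared across the recursion in the same way the nested walker shares the primary finding graph, so a host reached by two different pivot paths is still assessed only once.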
6. Example Phase Maps
This section presents three complete phase maps with full YAML, state transition diagrams, and walkthroughs. These are the maps that ship with the platform and are selectable at engagement start.
6.1 OWASP Web Application (7 Phases)
Engagement type: Web application penetration test following OWASP WSTG. Seven phases: Reconnaissance → Mapping → Authentication Testing → Session Management Testing → Input Validation Testing → Business Logic Testing → Reporting.
```yaml
api_version: phase_map/v1
name: owasp_web_application
description: OWASP WSTG-aligned web application penetration test (7 phases)
engagement_type: web
scope_constraints:
  max_phases: 10
  max_parallel_branches: 3
  allow_runtime_modify: true
  require_human_approval_for_add: false

metadata:
  author: "Khushal Suthar"
  version: "1.0"
  compatible_orchestrator: ">=1.0"
  tags: [owasp, web, wstg]

agents:
  - id: recon_agent
    model: default
    max_turns: 15
  - id: mapping_agent
    model: default
    max_turns: 20
  - id: auth_test_agent
    model: default
    max_turns: 25
  - id: session_test_agent
    model: default
    max_turns: 20
  - id: input_test_agent
    model: default
    max_turns: 30
  - id: logic_test_agent
    model: default
    max_turns: 20
  - id: report_agent
    model: default
    max_turns: 10

skills:
  - id: dns_recon
    category: recon
    tools: [dig, dnsrecon, amass]
  - id: subdomain_enum
    category: recon
    tools: [subfinder, assetfinder, gobuster-dns]
  - id: tech_fingerprint
    category: recon
    tools: [whatweb, wappalyzer]
  - id: spider_crawl
    category: mapping
    tools: [gobuster, dirsearch, hakrawler]
  - id: endpoint_discovery
    category: mapping
    tools: [ffuf, openapi-parser]
  - id: param_discovery
    category: mapping
    tools: [arjun, paramspider]
  - id: auth_bypass
    category: vuln
    tools: [hydra, custom-auth-scripts]
  - id: cred_stuffing
    category: vuln
    tools: [custom-cred-scripts]
  - id: mfa_test
    category: vuln
    tools: [custom-mfa-scripts]
  - id: session_fixation
    category: vuln
    tools: [custom-session-scripts]
  - id: cookie_analysis
    category: vuln
    tools: [custom-cookie-scripts]
  - id: csrf_test
    category: vuln
    tools: [csrf-scanner, custom-csrf]
  - id: sqli_test
    category: vuln
    tools: [sqlmap, manual-sqli]
  - id: xss_test
    category: vuln
    tools: [dalfox, XSStrike, manual-xss]
  - id: ssrf_test
    category: vuln
    tools: [ssrfmap, manual-ssrf]
  - id: idor_test
    category: vuln
    tools: [autorize, manual-idor]
  - id: cmdi_test
    category: vuln
    tools: [commix, manual-cmdi]
  - id: logic_test_skill
    category: vuln
    tools: [manual-logic-probing]
  - id: report_gen
    category: report
    tools: [report-template-engine]

nodes:
  recon:
    phase: reconnaissance
    type: standard
    agents: [recon_agent]
    skills: [dns_recon, subdomain_enum, tech_fingerprint]
    exit_conditions:
      - findings.count(type='subdomain') >= 1
      - findings.exists(type='tech_stack')
      - agent_budget_remaining('recon_agent') <= 2

  mapping:
    phase: mapping
    type: standard
    agents: [mapping_agent]
    skills: [spider_crawl, endpoint_discovery, param_discovery]
    exit_conditions:
      - findings.count(type='web_endpoint') >= 10
      - findings.count(type='parameter') >= 5
      - agent_budget_remaining('mapping_agent') <= 3

  auth_testing:
    phase: authentication_testing
    type: standard
    agents: [auth_test_agent]
    skills: [auth_bypass, cred_stuffing, mfa_test]
    conditions:
      - findings.exists(type='login_form')
    exit_conditions:
      - findings.count(type='auth_finding') >= 1
      - agent_budget_remaining('auth_test_agent') <= 2

  session_testing:
    phase: session_management_testing
    type: standard
    agents: [session_test_agent]
    skills: [session_fixation, cookie_analysis, csrf_test]
    conditions:
      - findings.exists(type='session_mechanism')
    exit_conditions:
      - findings.count(type='session_finding') >= 1
      - agent_budget_remaining('session_test_agent') <= 2

  input_validation:
    phase: input_validation_testing
    type: parallel_fanout
    branches: [input_sqli, input_xss, input_ssrf, input_idor, input_cmdi]
    join: business_logic

  input_sqli:
    phase: input_validation_testing
    agents: [input_test_agent]
    skills: [sqli_test]
    conditions:
      - findings.exists(type='parameter', injectable=true)

  input_xss:
    phase: input_validation_testing
    agents: [input_test_agent]
    skills: [xss_test]
    conditions:
      - findings.exists(type='parameter', injectable=true)

  input_ssrf:
    phase: input_validation_testing
    agents: [input_test_agent]
    skills: [ssrf_test]
    conditions:
      - findings.exists(type='parameter', accepts_url=true)

  input_idor:
    phase: input_validation_testing
    agents: [input_test_agent]
    skills: [idor_test]
    conditions:
      - findings.exists(type='parameter', is_id=true)

  input_cmdi:
    phase: input_validation_testing
    agents: [input_test_agent]
    skills: [cmdi_test]
    conditions:
      - findings.exists(type='parameter', injectable=true)

  business_logic:
    phase: business_logic_testing
    type: parallel_join
    agents: [logic_test_agent]
    skills: [logic_test_skill]
    exit_conditions:
      - findings.count(type='logic_finding') >= 1
      - agent_budget_remaining('logic_test_agent') <= 2

  report:
    phase: reporting
    type: terminal
    agents: [report_agent]
    skills: [report_gen]
    terminal: true

edges:
  - from: recon
    to: mapping
    gate: recon_exit_met
  - from: mapping
    to: auth_testing
    gate: mapping_exit_met
  - from: mapping
    to: session_testing
    gate: mapping_exit_met
  - from: auth_testing
    to: input_validation
    gate: auth_complete
  - from: session_testing
    to: input_validation
    gate: session_complete
  - from: input_validation
    to: business_logic
    gate: all_input_branches_complete
  - from: business_logic
    to: report
    gate: logic_complete
  - from: input_validation
    to: mapping
    gate: new_endpoints_found
    label: backtrack_remap

gates:
  recon_exit_met: "findings.count(type='subdomain') >= 1 and findings.exists(type='tech_stack')"
  mapping_exit_met: "findings.count(type='web_endpoint') >= 10"
  auth_complete: "findings.exists(type='auth_finding') or agent_budget_exhausted('auth_test_agent')"
  session_complete: "findings.exists(type='session_finding') or agent_budget_exhausted('session_test_agent')"
  all_input_branches_complete: "all(b in completed or b in skipped for b in ['input_sqli','input_xss','input_ssrf','input_idor','input_cmdi'])"
  logic_complete: "findings.exists(type='logic_finding') or agent_budget_exhausted('logic_test_agent')"
  new_endpoints_found: "findings.count(type='web_endpoint', discovered_in='input_phase') >= 3"
```
State Transition Diagram (OWASP Web):
```
              ┌──────────┐
              │  RECON   │
              └─────┬────┘
                    │ gate: subdomains + tech_stack found
                    ▼
              ┌──────────┐
              │ MAPPING  │◄──── backtrack_remap ────┐
              └─────┬────┘                          │
                    │ gate: endpoints >= 10         │
             ┌──────┴──────┐                        │
             ▼             ▼                        │
       ┌──────────┐  ┌────────────┐                 │
       │AUTH TEST │  │SESSION TEST│                 │
       └─────┬────┘  └─────┬──────┘                 │
             │             │                        │
             └──────┬──────┘                        │
                    ▼                               │
       ┌────────────────────────┐                   │
       │    PARALLEL FANOUT     │                   │
       │   (Input Validation)   │                   │
       └──┬───┬───┬───┬───┬─────┘                   │
          │   │   │   │   │                         │
          ▼   ▼   ▼   ▼   ▼                         │
        SQLi XSS SSRF IDOR CMDi ─── new endpoints ──┘
          │   │   │   │   │
          └───┴───┬───┴───┘
                  ▼
        ┌────────────────────┐
        │   BUSINESS LOGIC   │
        └─────────┬──────────┘
                  │
                  ▼
             ┌─────────┐
             │ REPORT  │  (terminal)
             └─────────┘
```
Walkthrough: The recon agent maps the target's DNS, subdomains, and technology stack. Mapping discovers endpoints and parameters. If a login form is found, auth testing runs; if a session mechanism is detected, session testing runs — these two can proceed in parallel. Input validation fans out into five concurrent injection tests (SQLi, XSS, SSRF, IDOR, command injection), each conditional on the parameter type discovered during mapping. If input testing discovers new endpoints (a common occurrence — error pages reveal hidden routes), a backtrack edge returns to mapping. Business logic testing runs after all input branches complete. The report phase generates the final deliverable.
6.2 IoT Device Assessment (5 Phases)
Engagement type: IoT/embedded device security assessment. Five phases: Network Recon → Service & Firmware Analysis → Authentication & Default Credential Testing → Exploitation → Reporting. This is a lightweight, hardware-aware methodology for routers, cameras, and embedded controllers.
```yaml
api_version: phase_map/v1
name: iot_device_assessment
description: IoT/embedded device security assessment (5 phases)
engagement_type: iot
scope_constraints:
  max_phases: 8
  max_parallel_branches: 2
  allow_runtime_modify: true
  require_human_approval_for_add: true

metadata:
  author: "Khushal Suthar"
  version: "1.0"
  compatible_orchestrator: ">=1.0"
  tags: [iot, embedded, hardware]

agents:
  - id: iot_recon_agent
    model: default
    max_turns: 12
  - id: iot_enum_agent
    model: default
    max_turns: 25
  - id: iot_auth_agent
    model: default
    max_turns: 15
  - id: iot_exploit_agent
    model: default
    max_turns: 30
  - id: iot_report_agent
    model: default
    max_turns: 8

skills:
  - id: iot_port_scan
    category: recon
    tools: [nmap, masscan]
  - id: iot_service_id
    category: recon
    tools: [nmap-service, snmpwalk]
  - id: upnp_discovery
    category: recon
    tools: [upnp-tools, miranda]
  - id: firmware_extract
    category: enum
    tools: [binwalk, firmware-mod-kit]
  - id: web_interface_enum
    category: enum
    tools: [gobuster, curl, whatweb]
  - id: telnet_ssh_enum
    category: enum
    tools: [nmap-script, hydra-cred-check]
  - id: default_cred_test
    category: vuln
    tools: [hydra, custom-cred-list]
  - id: cred_bruteforce
    category: vuln
    tools: [hydra, medusa]
  - id: firmware_analysis
    category: vuln
    tools: [strings, grep, firmware-analysis-toolkit]
  - id: iot_exploit
    category: exploit
    tools: [metasploit, custom-exploits, command-injection]
  - id: iot_post_exploit
    category: post
    tools: [busybox-enum, filesystem-extract, config-dump]
  - id: iot_report_gen
    category: report
    tools: [report-template-engine]

nodes:
  iot_recon:
    phase: network_recon
    type: standard
    agents: [iot_recon_agent]
    skills: [iot_port_scan, iot_service_id, upnp_discovery]
    exit_conditions:
      - findings.count(type='open_port') >= 1
      - findings.exists(type='iot_device')
      - agent_budget_remaining('iot_recon_agent') <= 2

  iot_enum:
    phase: service_firmware_analysis
    type: parallel_fanout
    branches: [iot_enum_web, iot_enum_mgmt, iot_enum_firmware]
    join: iot_auth

  iot_enum_web:
    phase: service_firmware_analysis
    agents: [iot_enum_agent]
    skills: [web_interface_enum]
    conditions:
      - findings.exists(service='http')
    exit_conditions:
      - findings.count(type='web_endpoint') >= 3
      - agent_budget_remaining('iot_enum_agent') <= 2

  iot_enum_mgmt:
    phase: service_firmware_analysis
    agents: [iot_enum_agent]
    skills: [telnet_ssh_enum]
    conditions:
      - "findings.exists(service='telnet') or findings.exists(service='ssh')"
    exit_conditions:
      - findings.exists(type='mgmt_service')
      - agent_budget_remaining('iot_enum_agent') <= 2

  iot_enum_firmware:
    phase: service_firmware_analysis
    agents: [iot_enum_agent]
    skills: [firmware_extract, firmware_analysis]
    conditions:
      - findings.exists(type='firmware_image')
    exit_conditions:
      - findings.exists(type='firmware_finding')
      - agent_budget_remaining('iot_enum_agent') <= 2

  iot_auth:
    phase: authentication_cred_testing
    type: parallel_join
    agents: [iot_auth_agent]
    skills: [default_cred_test, cred_bruteforce]
    exit_conditions:
      - findings.exists(type='credential')
      - findings.count(type='auth_finding') >= 1
      - agent_budget_remaining('iot_auth_agent') <= 2

  iot_exploit:
    phase: exploitation
    type: standard
    agents: [iot_exploit_agent]
    skills: [iot_exploit, iot_post_exploit]
    conditions:
      - "findings.exists(type='credential') or findings.exists(type='vulnerability')"
    sub_maps:
      - condition: "findings.exists(type='pivot_target')"
        map: iot_device_assessment
    exit_conditions:
      - "findings.exists(type='access', level in ['user','root'])"
      - agent_budget_remaining('iot_exploit_agent') <= 3

  iot_report:
    phase: reporting
    type: terminal
    agents: [iot_report_agent]
    skills: [iot_report_gen]
    terminal: true

edges:
  - from: iot_recon
    to: iot_enum
    gate: iot_recon_exit
  - from: iot_enum
    to: iot_auth
    gate: iot_enum_branches_complete
  - from: iot_auth
    to: iot_exploit
    gate: iot_auth_exit
  - from: iot_exploit
    to: iot_report
    gate: iot_exploit_exit
  - from: iot_exploit
    to: iot_enum
    gate: iot_exploit_failed
    label: backtrack_reenum

gates:
  iot_recon_exit: "findings.exists(type='iot_device') and findings.count(type='open_port') >= 1"
  iot_enum_branches_complete: "all(b in completed or b in skipped for b in ['iot_enum_web','iot_enum_mgmt','iot_enum_firmware'])"
  iot_auth_exit: "findings.exists(type='credential') or findings.count(type='auth_finding') >= 1 or agent_budget_exhausted('iot_auth_agent')"
  iot_exploit_exit: "findings.exists(type='access', level in ['user','root']) or agent_budget_exhausted('iot_exploit_agent')"
  iot_exploit_failed: "findings.count(type='exploit_attempt', status='failed') >= 3"
```
State Transition Diagram (IoT Assessment):
             ┌──────────────┐
             │  IOT RECON   │
             └──────┬───────┘
                    │ gate: device identified + ports found
                    ▼
      ┌────────────────────────────┐
      │      PARALLEL FANOUT       │
      │     (Service/Firmware)     │
      └──┬──────────┬───────────┬──┘
         │          │           │
         ▼          ▼           ▼
    ┌─────────┐ ┌────────┐ ┌──────────┐
    │WEB ENUM │ │MGMT    │ │FIRMWARE  │
    │         │ │ENUM    │ │EXTRACT   │
    └────┬────┘ └───┬────┘ └────┬─────┘
         └──────────┼───────────┘
                    │
                    ▼
            ┌────────────────┐
            │  AUTH / CRED   │
            │    TESTING     │
            └───────┬────────┘
                    │ gate: creds or auth finding
                    ▼
            ┌────────────────┐
            │  EXPLOITATION  │──── backtrack_reenum ──► (IOT ENUM)
            └───────┬────────┘
                    │ gate: access obtained
                    │
                    │ sub_map: pivot → new IoT device
                    ▼
             ┌──────────────┐
             │    REPORT    │ (terminal)
             └──────────────┘
Walkthrough: The recon agent identifies the IoT device type, open ports, and UPnP services. Enumeration fans out into three parallel branches: web interface (if HTTP is present), management interfaces (if Telnet or SSH is present), and firmware extraction (if a firmware image is available, which is common in IoT assessments where the analyst has been given the firmware binary). Authentication testing checks default credentials against the device, since unchanged default credentials are among the most common IoT weaknesses. Exploitation attempts command injection, buffer overflow, or credential-based access. If a pivot target is discovered (e.g. the IoT device bridges to an internal network), a sub_map re-enters the same phase map for the new target. If three exploit attempts fail (the iot_exploit_failed gate), the backtrack_reenum edge returns to enumeration for deeper service analysis.
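The choice between the two outgoing edges of iot_exploit can be sketched as follows. This is a minimal illustration, assuming the gate inputs have already been reduced to three plain values; the function name next_nodes and its parameters are hypothetical and are not part of the Phase Map schema.

```python
def next_nodes(failed_attempts, access_level=None, budget_exhausted=False):
    """Return the targets of every outgoing iot_exploit edge whose gate holds.

    The parameters stand in for finding-graph queries (assumed inputs):
      failed_attempts  ~ findings.count(type='exploit_attempt', status='failed')
      access_level     ~ level of an 'access' finding, or None if there is none
      budget_exhausted ~ agent_budget_exhausted('iot_exploit_agent')
    """
    gates = {
        # iot_exploit_exit: access obtained, or the exploit agent is out of budget
        "iot_report": access_level in ("user", "root") or budget_exhausted,
        # iot_exploit_failed: three or more failed exploit attempts
        "iot_enum": failed_attempts >= 3,
    }
    return [target for target, is_open in gates.items() if is_open]


print(next_nodes(failed_attempts=1))                       # no gate open yet
print(next_nodes(failed_attempts=3))                       # backtrack edge opens
print(next_nodes(failed_attempts=0, access_level="root"))  # forward edge opens
```

Both gates can hold at once, for example after three failures with an exhausted budget. The sketch returns both targets in that case and leaves the tie-break to the caller.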
6.3 PTES Network Penetration Test (6 Phases)
Engagement type: Internal network penetration test following PTES. Six phases: Reconnaissance → Enumeration → Vulnerability Analysis → Exploitation → Post-Exploitation → Reporting. Features parallel service-specific enumeration and pivot support.
    api_version: phase_map/v1
    name: ptes_network
    description: PTES-aligned internal network penetration test (6 phases)
    engagement_type: network
    scope_constraints:
      max_phases: 12
      max_parallel_branches: 5
      allow_runtime_modify: true
      require_human_approval_for_add: false

    metadata:
      author: "Khushal Suthar"
      version: "1.0"
      compatible_orchestrator: ">=1.0"
      tags: [ptes, network, internal]

    agents:
      - id: net_recon_agent
        model: default
        max_turns: 15
      - id: net_enum_agent
        model: default
        max_turns: 30
      - id: net_vuln_agent
        model: default
        max_turns: 20
      - id: net_exploit_agent
        model: default
        max_turns: 40
      - id: net_post_agent
        model: default
        max_turns: 25
      - id: net_report_agent
        model: default
        max_turns: 10

    skills:
      - id: host_discovery
        category: recon
        tools: [nmap-ping, arp-scan, masscan]
      - id: port_scan
        category: recon
        tools: [nmap, masscan]
      - id: os_detect
        category: recon
        tools: [nmap-os, sinfp]
      - id: enum_ssh
        category: enum
        tools: [nmap-script, ssh-audit, hydra]
      - id: enum_http
        category: enum
        tools: [gobuster, nikto, whatweb, curl]
      - id: enum_smb
        category: enum
        tools: [enum4linux, smbclient, crackmapexec]
      - id: enum_ldap
        category: enum
        tools: [ldapsearch, windapsearch]
      - id: enum_snmp
        category: enum
        tools: [snmpwalk, snmp-check]
      - id: vuln_scan
        category: vuln
        tools: [searchsploit, nmap-vuln, nuclei]
      - id: manual_vuln_research
        category: vuln
        tools: [searchsploit, exploitdb, cve-search]
      - id: metasploit_exploit
        category: exploit
        tools: [msfconsole, msfvenom]
      - id: custom_exploit
        category: exploit
        tools: [custom-payloads, python-exploit]
      - id: cred_exploit
        category: exploit
        tools: [crackmapexec, evil-winrm, psexec]
      - id: privesc_linux
        category: post
        tools: [linpeas, linenum, suid-find]
      - id: privesc_windows
        category: post
        tools: [winpeas, seatbelt, accesschk]
      - id: cred_dump
        category: post
        tools: [mimikatz, secretsdump, hashdump]
      - id: lateral_move
        category: post
        tools: [crackmapexec, wmiexec, psexec]
      - id: net_report_gen
        category: report
        tools: [report-template-engine]

    nodes:
      net_recon:
        phase: reconnaissance
        type: standard
        agents: [net_recon_agent]
        skills: [host_discovery, port_scan, os_detect]
        exit_conditions:
          - findings.count(type='host') >= 1
          - findings.count(type='open_port') >= 1
          - agent_budget_remaining('net_recon_agent') <= 2

      net_enum:
        phase: enumeration
        type: parallel_fanout
        branches: [enum_ssh, enum_http, enum_smb, enum_ldap, enum_snmp]
        join: net_vuln

      enum_ssh:
        phase: enumeration
        agents: [net_enum_agent]
        skills: [enum_ssh]
        conditions:
          - findings.exists(service='ssh')

      enum_http:
        phase: enumeration
        agents: [net_enum_agent]
        skills: [enum_http]
        conditions:
          - findings.exists(service='http')

      enum_smb:
        phase: enumeration
        agents: [net_enum_agent]
        skills: [enum_smb]
        conditions:
          - findings.exists(service='smb')

      enum_ldap:
        phase: enumeration
        agents: [net_enum_agent]
        skills: [enum_ldap]
        conditions:
          - findings.exists(service='ldap')

      enum_snmp:
        phase: enumeration
        agents: [net_enum_agent]
        skills: [enum_snmp]
        conditions:
          - findings.exists(service='snmp')

      net_vuln:
        phase: vulnerability_analysis
        type: parallel_join
        agents: [net_vuln_agent]
        skills: [vuln_scan, manual_vuln_research]
        exit_conditions:
          - findings.count(type='vulnerability') >= 1
          - agent_budget_remaining('net_vuln_agent') <= 3

      net_exploit:
        phase: exploitation
        type: standard
        agents: [net_exploit_agent]
        skills: [metasploit_exploit, custom_exploit, cred_exploit]
        conditions:
          - "findings.count(type='vulnerability') >= 1 or findings.exists(type='credential')"
        sub_maps:
          - condition: "findings.exists(type='pivot_target')"
            map: ptes_network
          - condition: "findings.exists(type='credential') and findings.exists(type='host', os='windows')"
            map: ad_lateral
        exit_conditions:
          - "findings.exists(type='access', level in ['user','root'])"
          - agent_budget_remaining('net_exploit_agent') <= 5

      net_post_exp:
        phase: post_exploitation
        type: standard
        agents: [net_post_agent]
        skills: [privesc_linux, privesc_windows, cred_dump, lateral_move]
        conditions:
          - "findings.exists(type='access', level in ['user','root'])"
        exit_conditions:
          - "findings.exists(type='access', level='root') or agent_budget_remaining('net_post_agent') <= 3"

      net_report:
        phase: reporting
        type: terminal
        agents: [net_report_agent]
        skills: [net_report_gen]
        terminal: true

    edges:
      - from: net_recon
        to: net_enum
        gate: recon_exit
      - from: net_enum
        to: net_vuln
        gate: enum_branches_complete
      - from: net_vuln
        to: net_exploit
        gate: vulns_found
      - from: net_exploit
        to: net_post_exp
        gate: access_obtained
      - from: net_exploit
        to: net_vuln
        gate: exploit_failed
        label: backtrack_revuln
      - from: net_post_exp
        to: net_report
        gate: post_exp_complete
      - from: net_post_exp
        to: net_enum
        gate: new_services_found
        label: backtrack_reenum

    gates:
      recon_exit: "findings.count(type='host') >= 1 and findings.count(type='open_port') >= 1"
      enum_branches_complete: "all(b in completed or b in skipped for b in ['enum_ssh','enum_http','enum_smb','enum_ldap','enum_snmp'])"
      vulns_found: "findings.count(type='vulnerability') >= 1"
      access_obtained: "findings.exists(type='access', level in ['user','root'])"
      exploit_failed: "findings.count(type='exploit_attempt', status='failed') >= 3"
      post_exp_complete: "findings.exists(type='access', level='root') or agent_budget_exhausted('net_post_agent')"
      new_services_found: "findings.count(type='service', discovered_in='post_exp') >= 1"
State Transition Diagram (PTES Network):
         ┌──────────────┐
         │    RECON     │
         └──────┬───────┘
                │ gate: hosts + ports found
                ▼
    ┌───────────────────────┐
    │    PARALLEL FANOUT    │
    │    (Service Enum)     │
    └─┬────┬────┬────┬────┬─┘
      │    │    │    │    │
      ▼    ▼    ▼    ▼    ▼
     SSH HTTP  SMB  LDAP SNMP
      │    │    │    │    │
      └────┴────┼────┴────┘
                │
                ▼
         ┌──────────────┐
         │ VULN ANALYSIS│◄── backtrack_revuln ──┐
         └──────┬───────┘                       │
                │ gate: vulns found             │
                ▼                               │
         ┌──────────────┐                       │
         │   EXPLOIT    │── exploit failed ─────┘
         └──────┬───────┘
                │ gate: access obtained
                │ sub_map: pivot → new target
                │ sub_map: AD lateral (if Windows + creds)
                ▼
         ┌──────────────┐
         │   POST-EXP   │── new services ──► (ENUM)
         └──────┬───────┘
                │ gate: root or budget exhausted
                ▼
         ┌──────────────┐
         │    REPORT    │ (terminal)
         └──────────────┘
Walkthrough: The recon agent discovers live hosts and open ports. Enumeration fans out into five parallel service-specific branches, each conditional on the service being present (branches for absent services are skipped, not deferred). The vulnerability analysis join waits for all enum branches to complete or skip, then runs vulnerability scanning and manual research. Exploitation attempts to gain access using Metasploit, custom exploits, or credentials. Two sub_maps are defined: one for generic pivoting (fires when a pivot_target finding is recorded) and one for Active Directory lateral movement (fires when a credential finding exists and a Windows host has been identified). Post-exploitation performs privilege escalation and credential dumping. Two backtrack edges provide non-linear recovery: backtrack_revuln returns to vulnerability analysis after three exploit failures (to find a different exploitation path), and backtrack_reenum returns to enumeration when post-exploitation discovers services that initial enumeration missed (a common scenario when privileged access reveals hidden shares or services).
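The skip-or-run decision for the five enumeration branches, together with the enum_branches_complete gate, can be sketched as below. This is an illustrative reduction, assuming the set of discovered services is already known; resolve_fanout, branches_complete and BRANCH_SERVICE are hypothetical names, not walker APIs.

```python
# Branch id -> service named in that branch's `conditions` entry (from the map above).
BRANCH_SERVICE = {
    "enum_ssh": "ssh",
    "enum_http": "http",
    "enum_smb": "smb",
    "enum_ldap": "ldap",
    "enum_snmp": "snmp",
}


def resolve_fanout(discovered_services):
    """Split the fan-out branches into those to run and those to skip."""
    to_run, skipped = [], []
    for branch, service in BRANCH_SERVICE.items():
        if service in discovered_services:
            to_run.append(branch)
        else:
            skipped.append(branch)
    return to_run, skipped


def branches_complete(completed, skipped):
    """Mirror of the enum_branches_complete gate: every branch is done or skipped."""
    return all(b in completed or b in skipped for b in BRANCH_SERVICE)


to_run, skipped = resolve_fanout({"ssh", "smb"})
print(to_run, skipped)
print(branches_complete(completed=set(), skipped=skipped))        # join still waiting
print(branches_complete(completed=set(to_run), skipped=skipped))  # join may proceed
```

Treating skipped branches as satisfied is what lets the join proceed on a host that exposes only a subset of the five services.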
7. Runtime Modification Protocol
A defining feature of the Phase Map Architecture is that the graph is not frozen at engagement start. Phases can be skipped, added, or reordered during execution. This is governed by a runtime modification protocol that ensures changes are safe, logged, and consistent with the finding graph.
7.1 Modification Triggers
Runtime modifications can be triggered by three sources:
Agent-initiated: An agent proposes a modification based on findings. For example, the enumeration agent discovers an unexpected LDAP service and proposes adding an enum_ldap branch. Agent-initiated additions require human approval if require_human_approval_for_add is true in the scope constraints.
Operator-initiated: The human operator pauses the engagement and issues a modification command: skip a phase, add a custom phase, or reorder to prioritise a finding.
Gate-initiated: A gate evaluation triggers an implicit modification. For example, the backtrack_revuln edge effectively reorders the workflow by returning to an earlier phase. Backtrack edges are pre-declared in the map, so they are not runtime modifications per se, but they demonstrate the same principle.
7.2 Modification Operations
| Operation | Method | Effect |
| --- | --- | --- |
| Skip phase | walker.skip_phase(node_id, reason) | Node removed from active/deferred sets, marked skipped. Successor edges evaluated to determine if downstream nodes are still reachable. |
| Add phase | walker.add_phase(node_def, after_node_id, gate) | New node inserted into the graph with an edge from after_node_id. Node enqueued in active set. |
| Reorder phase | walker.reorder_phase(node_id, before_node_id) | Node's edges rewritten so it executes before before_node_id. |
| Replace agent | walker.replace_agent(node_id, old_agent, new_agent) | Swaps the agent assigned to a node. Useful when a specialist agent is needed mid-engagement. |
| Inject skill | walker.add_skill(node_id, skill_id) | Adds a skill to a node's skill list, making it available to the node's agents. |
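
As a rough illustration of what three of these operations do to the underlying graph, the sketch below models the map as plain dictionaries and an edge list. PhaseGraph is a hypothetical stand-in for the walker's internal structure, the enum_nfs node and nfs_found gate are invented for the example, and reading the node id from node_def["id"] is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseGraph:
    nodes: dict = field(default_factory=dict)    # node_id -> node definition
    edges: list = field(default_factory=list)    # (from_id, to_id, gate_name)
    skipped: dict = field(default_factory=dict)  # node_id -> reason

    def add_phase(self, node_def, after_node_id, gate):
        """Insert a node and connect it with one edge from after_node_id."""
        node_id = node_def["id"]
        if node_id in self.nodes:
            raise ValueError(f"node {node_id!r} already exists")
        if after_node_id not in self.nodes:
            raise KeyError(f"unknown predecessor {after_node_id!r}")
        self.nodes[node_id] = node_def
        self.edges.append((after_node_id, node_id, gate))

    def skip_phase(self, node_id, reason):
        """Mark a node as skipped and remember why."""
        if node_id not in self.nodes:
            raise KeyError(f"unknown node {node_id!r}")
        self.skipped[node_id] = reason

    def add_skill(self, node_id, skill_id):
        """Make a skill available to a node, ignoring duplicates."""
        skills = self.nodes[node_id].setdefault("skills", [])
        if skill_id not in skills:
            skills.append(skill_id)


graph = PhaseGraph(nodes={
    "net_enum": {"id": "net_enum"},
    "enum_snmp": {"id": "enum_snmp"},
    "net_post_exp": {"id": "net_post_exp", "skills": ["privesc_linux"]},
})
graph.add_phase({"id": "enum_nfs", "skills": ["enum_nfs"]}, "net_enum", "nfs_found")
graph.skip_phase("enum_snmp", "not in scope per updated RoE")
graph.add_skill("net_post_exp", "privesc_windows")
```

Reordering and agent replacement are left out because their exact edge-rewriting rules are not spelled out in the table.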
7.3 Modification Safety Protocol
All modifications follow a safety protocol:
    ┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
    │ MOD REQUEST  │────►│ VALIDATE        │────►│ APPROVE?     │
    │ (agent/op)   │     │ - scope check   │     │ - auto if    │
    └──────────────┘     │ - graph cycle   │     │   no human   │
                         │   check         │     │   approval   │
                         │ - reachability  │     │ - human if   │
                         │   check         │     │   required   │
                         └────────┬────────┘     └──────┬───────┘
                                  │                     │
                               invalid               approved
                                  │                     │
                                  ▼                     ▼
                             ┌──────────┐       ┌────────────────┐
                             │  REJECT  │       │ APPLY MOD      │
                             │  + log   │       │ - pause walker │
                             └──────────┘       │ - update graph │
                                                │ - resume       │
                                                │ - log to mod   │
                                                │   journal      │
                                                └────────────────┘
Validate: The modification is checked against scope constraints (max phases, max parallel branches), graph invariants (no cycles without exit conditions, no orphaned nodes), and reachability (the modified graph must still reach a terminal node).
Approve: If require_human_approval_for_add is true and the modification is agent-initiated, the request is queued for human approval. Operator-initiated modifications are auto-approved. If human approval is not required, the modification proceeds.
Apply: The walker is paused (state → MODIFYING), the graph is updated, the worklist is adjusted, and the walker resumes (state → RUNNING).
Log: Every modification is appended to a modification journal, a chronological log of all runtime changes, which is included in the final report for auditability.
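The RUNNING → MODIFYING → RUNNING transition in the Apply step can be made robust against errors raised while the graph is being edited. The sketch below is one way to do that, assuming a walker with only those two states; the Walker class and its modifying() method are hypothetical and stand in for the real walker's pause and resume calls.

```python
from contextlib import contextmanager
from enum import Enum


class WalkerState(Enum):
    RUNNING = "RUNNING"
    MODIFYING = "MODIFYING"


class Walker:
    def __init__(self):
        self.state = WalkerState.RUNNING

    @contextmanager
    def modifying(self):
        """Pause the walker for a graph edit and always resume afterwards."""
        if self.state is not WalkerState.RUNNING:
            raise RuntimeError(f"cannot modify while {self.state.value}")
        self.state = WalkerState.MODIFYING
        try:
            yield self
        finally:
            # Runs on success and on error, so the walker never stays paused.
            self.state = WalkerState.RUNNING


walker = Walker()
with walker.modifying():
    state_during_edit = walker.state  # graph updates would happen here
state_after_edit = walker.state
```

Rejecting a nested modification request, as the RuntimeError does here, is a design choice of this sketch. Queueing the second request would be an equally plausible policy.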
7.4 Modification Journal
    [14:32:01] ADD enum_ldap after enum_smb — proposed by net_enum_agent
               (LDAP service discovered on 10.10.10.5:389)
               Status: AUTO-APPROVED (no human approval required)
    [14:38:15] SKIP enum_snmp — proposed by operator
               (SNMP enumeration not in scope per updated RoE)
               Status: APPLIED
    [14:52:03] REORDER net_exploit before net_vuln — proposed by operator
               (Critical RCE found, skip further vuln analysis)
               Status: APPLIED
    [15:04:22] ADD privesc_windows to net_post_exp — proposed by net_post_agent
               (Windows host compromised, need Windows privesc skills)
               Status: AUTO-APPROVED
The journal ensures that every deviation from the original phase map is recorded, auditable, and reproducible — a requirement for compliance engagements and for post-engagement review.
7.5 Modification Pseudocode
    from collections import deque
    from datetime import datetime, timezone


    def now():
        """Return a UTC timestamp for journal entries."""
        return datetime.now(timezone.utc).isoformat(timespec="seconds")


    class ModificationController:
        MOD_TYPES = {"skip", "add", "reorder", "replace_agent", "add_skill"}

        def __init__(self, walker, scope_constraints, human_approver):
            self.walker = walker
            self.scope = scope_constraints
            self.human = human_approver
            self.journal = []

        def _log(self, mod_type, status, details, reason=None):
            """Append one entry to the modification journal."""
            entry = {"timestamp": now(), "type": mod_type,
                     "status": status, "details": details}
            if reason:
                entry["reason"] = reason
            self.journal.append(entry)

        def request_modification(self, mod_type, **kwargs):
            """Process a runtime modification request."""
            # 1. Validate
            if not self._validate(mod_type, **kwargs):
                self._log(mod_type, "REJECTED", kwargs, "validation_failed")
                return False

            # 2. Approve: only agent-initiated additions can need a human
            needs_human = (
                self.scope.require_human_approval_for_add
                and mod_type == "add"
                and kwargs.get("source") == "agent"
            )
            if needs_human and not self.human.approve(mod_type, kwargs):
                self._log(mod_type, "REJECTED", kwargs, "human_denied")
                return False

            # 3. Apply: resume even if the graph update raises
            self.walker.pause()
            try:
                if mod_type == "skip":
                    self.walker.skip_phase(kwargs["node_id"], kwargs.get("reason", ""))
                elif mod_type == "add":
                    self.walker.add_phase(
                        kwargs["node_def"], kwargs["after_node_id"], kwargs.get("gate")
                    )
                elif mod_type == "reorder":
                    self.walker.reorder_phase(kwargs["node_id"], kwargs["before_node_id"])
                elif mod_type == "replace_agent":
                    self.walker.replace_agent(
                        kwargs["node_id"], kwargs["old_agent"], kwargs["new_agent"]
                    )
                elif mod_type == "add_skill":
                    self.walker.add_skill(kwargs["node_id"], kwargs["skill_id"])
            finally:
                self.walker.resume()

            # 4. Log
            self._log(mod_type, "APPLIED", kwargs)
            return True

        def _validate(self, mod_type, **kwargs):
            """Check scope constraints and graph invariants."""
            if mod_type not in self.MOD_TYPES:
                return False
            if mod_type == "add":
                nodes = self.walker.map.nodes
                if len(nodes) >= self.scope.max_phases:
                    return False
                if kwargs.get("node_def", {}).get("type") == "parallel_fanout":
                    # Count active nodes that are branches of an existing fan-out.
                    # Branch nodes carry no "type" key in the example maps, so they
                    # are identified through the fan-out nodes' "branches" lists.
                    branch_ids = {b for n in nodes.values() for b in n.get("branches", [])}
                    active_branches = sum(1 for n in self.walker.active if n in branch_ids)
                    if active_branches >= self.scope.max_parallel_branches:
                        return False
            # Reachability check: a terminal node must be reachable from the entry.
            # Note: this inspects the graph as it stands before the change is applied.
            return self._graph_reaches_terminal()

        def _graph_reaches_terminal(self):
            """Verify that a terminal node is reachable from the entry node (BFS)."""
            visited = set()
            queue = deque([self.walker.map.entry_node])
            while queue:
                node = queue.popleft()
                if node in visited:
                    continue
                visited.add(node)
                node_def = self.walker.map.nodes.get(node)
                if node_def and (node_def.get("terminal") or node_def.get("type") == "terminal"):
                    return True
                for edge in self.walker.map.edges_from(node):
                    queue.append(edge.to)
            return False
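
The reachability check in the pseudocode above runs against the graph held by the walker, which at validation time has not yet been modified. One way to check the modified graph instead is to apply the change to a trial copy first. The sketch below does this for a skip, under the strict assumption that a skipped node is removed together with its edges; reaches_terminal and validate_skip are hypothetical helpers, and the real walker may treat skipped nodes as pass-through.

```python
from collections import deque


def reaches_terminal(nodes, edges, entry):
    """Breadth-first search from entry; True if any terminal node is reachable."""
    seen, queue = set(), deque([entry])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if nodes[current].get("terminal"):
            return True
        queue.extend(dst for src, dst in edges if src == current)
    return False


def validate_skip(nodes, edges, entry, node_id):
    """Check reachability on a trial copy with node_id and its edges removed."""
    trial_nodes = {k: v for k, v in nodes.items() if k != node_id}
    trial_edges = [(s, d) for s, d in edges if node_id not in (s, d)]
    return entry in trial_nodes and reaches_terminal(trial_nodes, trial_edges, entry)


# Simplified PTES spine; fan-out branches are omitted except enum_snmp.
nodes = {
    "net_recon": {}, "net_enum": {}, "enum_snmp": {}, "net_vuln": {},
    "net_exploit": {}, "net_post_exp": {}, "net_report": {"terminal": True},
}
edges = [
    ("net_recon", "net_enum"), ("net_enum", "net_vuln"),
    ("net_vuln", "net_exploit"), ("net_exploit", "net_post_exp"),
    ("net_post_exp", "net_report"),
]

print(validate_skip(nodes, edges, "net_recon", "enum_snmp"))     # off the spine
print(validate_skip(nodes, edges, "net_recon", "net_post_exp"))  # cuts the only path
```

The original nodes and edges are never mutated, so a rejected request leaves the live graph untouched.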
8. Integration with WP01–WP03
The Phase Map is the strategic layer that sits above the orchestrator's state machine (WP03):
    ┌───────────────────────────────────────────────────────────┐
    │ PHASE MAP (WP04)                                          │
    │ Declarative engagement topology (YAML graph)              │
    │ Determines WHAT phases to run, in WHAT order, with        │
    │ WHICH agents and skills                                   │
    └───────────────────────────┬───────────────────────────────┘
                                │
                                ▼
    ┌───────────────────────────────────────────────────────────┐
    │ CUSTOM ORCHESTRATOR (WP03)                                │
    │ State machine + graph walker that executes each node      │
    │ Determines HOW to execute within each phase               │
    └───────────────────────────┬───────────────────────────────┘
                                │
             ┌──────────────────┼──────────────────┐
             ▼                  ▼                  ▼
      ┌──────────────┐   ┌──────────────┐ ┌──────────────────┐
      │ TRI-CON      │   │ TOKEN ENGINE │ │ FINDING GRAPH    │
      │ (WP01)       │   │ (WP02)       │ │ (WP02 L4)        │
      │ Knowledge    │   │ Context      │ │ Persistent       │
      │ retrieval    │   │ optimisation │ │ findings store   │
      │ per phase    │   │ per phase    │ │ (gate queries)   │
      └──────────────┘   └──────────────┘ └──────────────────┘
Key integration points:
Tri-Con (WP01): Phase Map nodes specify phase and service, which feed directly into Tri-Con's Phase Router for targeted knowledge retrieval. An enum_http node triggers the HTTP knowledge partition; an enum_smb node triggers the SMB partition. This means each agent receives methodology-relevant context, not a generic dump of all pentesting knowledge.
Token Engine (WP02): Phase transitions trigger the Token Engine's L3 summariser, so the old phase's context is summarised before the new phase begins. When the walker transitions from enum_http to enum_smb, the HTTP enumeration context is summarised to free token budget for SMB enumeration. The finding graph (L4) retains all concrete findings regardless of summarisation.
Finding Graph (WP02 L4): Gates query the finding graph. The graph is the single source of truth for "what has been discovered." Agent execution writes findings to the graph; gate evaluation reads from it. This decouples the strategic layer (what phase next?) from the tactical layer (how to execute this phase?).
Orchestrator (WP03): The state machine in WP03 executes each Phase Map node. The Phase Map replaces WP03's hard-coded RECON → ENUM → VULN → ... sequence with a declarative graph. The orchestrator's per-phase state machine (WP03) handles intra-phase execution (the ReAct loop, tool dispatch, finding recording), while the Phase Map walker handles inter-phase transitions.
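The write/read split between agents and gates can be illustrated with a toy finding store. This is a sketch only: the FindingGraph class below is an in-memory list, not the WP02 L4 implementation, and the gates are written as Python callables where the maps use string predicates.

```python
class FindingGraph:
    """Toy in-memory stand-in for the persistent finding store."""

    def __init__(self):
        self._findings = []

    def add(self, **attrs):
        """Called by agents: record one finding as a set of attributes."""
        self._findings.append(dict(attrs))

    def count(self, **attrs):
        """Called by gates: number of findings matching every given attribute."""
        return sum(
            1 for f in self._findings
            if all(f.get(key) == value for key, value in attrs.items())
        )

    def exists(self, **attrs):
        return self.count(**attrs) > 0


# Two PTES gates re-expressed as callables over the store.
GATES = {
    "recon_exit": lambda fg: fg.count(type="host") >= 1 and fg.count(type="open_port") >= 1,
    "vulns_found": lambda fg: fg.count(type="vulnerability") >= 1,
}

findings = FindingGraph()
findings.add(type="host", address="10.10.10.5")
before = GATES["recon_exit"](findings)   # only a host so far
findings.add(type="open_port", host="10.10.10.5", port=389)
after = GATES["recon_exit"](findings)    # host and port both present
```

The gate functions never see the agent that produced a finding, only the store, which is the decoupling described above.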
9. Comparison with Existing Tools
9.1 Feature Comparison Matrix
| Feature | PentestGPT | Pentera | HackingBuddyGPT | CAI | PentAG | Phase Map |
| --- | --- | --- | --- | --- | --- | --- |
| Workflow model | Fixed 3-subsession prompt | Hardcoded exploit chain | Manual task list | None (free loop) | Fixed recon→reason→act cycle | Declarative graph (YAML) |
| Methodology presets | 1 (generic) | 1 (automated exploitation) | 0 | 0 | 1 (generic) | 6+ (OWASP, IoT, PTES, AD, API, red team) |
| Per-phase agent assignment | ✗ | ✗ (no agents) | ✗ | ✗ | ✗ | ✓ (agents bound per node) |
| Per-phase skill scoping | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (skills bound per node) |
| Parallel branches | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (parallel_fanout) |
| Conditional transitions | ✗ | Partial (automated decision) | ✗ | ✗ | ✗ | ✓ (gate predicates) |
| Backtracking | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (backtrack edges) |
| Nested engagements (pivot) | ✗ | Partial (chain expansion) | ✗ | ✗ | ✗ | ✓ (sub_maps) |
| Runtime phase skip | ✗ | ✗ | Manual | ✗ | ✗ | ✓ (skip_phase) |
| Runtime phase add | ✗ | ✗ | Manual | ✗ | ✗ | ✓ (add_phase) |
| Runtime phase reorder | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (reorder_phase) |
| Custom topology support | ✗ | ✗ (closed) | Manual | ✗ | ✗ | ✓ (YAML files) |
| Modification audit log | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (modification journal) |
| LLM reasoning | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Open/extensible | Partial | ✗ (commercial) | ✓ | ✓ | ✓ | ✓ |
9.2 PentestGPT
PentestGPT (Deng et al., 2024) uses a fixed three-subsession prompt structure: a localisation subsession that maps the target, a reasoning subsession that generates step-by-step instructions, and a parsing subsession that converts tool output into structured findings. The workflow is entirely linear — there is no concept of parallel branches, conditional transitions, or nested engagements. If the target has both HTTP and SMB, the agent must enumerate them sequentially within the single reasoning subsession, with no mechanism to assign different specialist agents or skill sets to each service.
PentestGPT's methodology is hardcoded into its prompt structure. There is no way to define a different workflow for an IoT assessment versus a web application test versus an AD engagement — the same three-subsession structure applies to all targets. The user cannot author or modify the methodology without rewriting the prompts.
Phase Map's parallel_fanout enables concurrent service enumeration, reducing wall-clock time. Per-phase agent and skill binding ensures each service is handled by a specialist with the right tools. The declarative YAML schema allows methodologies to be authored, shared, and modified without touching orchestrator code.
9.3 Pentera
Pentera (Pentera Inc.) is a commercial automated penetration-testing platform that executes predefined exploit chains against known vulnerability classes. It follows a hardcoded topology: discover → exploit → propagate, with an automated decision engine that selects exploits based on fingerprinting. There is no LLM reasoning — the platform relies on a curated exploit database and deterministic logic.
Pentera's strengths are reliability and speed for known vulnerability classes. It excels at validating that a specific CVE is exploitable across an estate. However, it cannot:
Reason about novel vulnerabilities: If a finding doesn't match a known exploit in its database, Pentera cannot improvise. Phase Map's LLM agents can reason about novel findings and propose custom exploitation strategies.
Adapt methodology: Pentera's topology is fixed. It cannot be configured for an IoT assessment, a web app review, or a business logic test. Phase Maps are declarative and support any methodology expressible as a graph.
Modify the workflow at runtime: Pentera cannot skip, add, or reorder phases mid-engagement. Its chain runs to completion or stops on failure. Phase Map's runtime modification protocol allows operator-initiated and agent-initiated changes with full audit logging.
Pivot with reasoning: Pentera's propagation is exploit-chain based (known cred → known service). Phase Map's sub_maps enable LLM-driven pivot decisions, in which the agent reasons about what to do with newly discovered credentials and which target to pivot to.
Pentera is a valuable tool for continuous automated validation of known vulnerabilities. Phase Maps address a different problem: autonomous, adaptable, methodology-driven penetration testing with LLM reasoning. The two are complementary — Pentera could be exposed as a skill within a Phase Map node, combining Pentera's exploit reliability with Phase Map's methodological flexibility.
9.4 HackingBuddyGPT
HackingBuddyGPT's task list is a user-defined flat sequence. The user must manually decide the topology before the engagement begins — impossible when the topology depends on findings discovered during the engagement. Phase Map's conditional gates make the topology data-driven, not user-driven. Additionally, HackingBuddyGPT has no per-phase agent or skill binding, meaning the same agent with the same tool set handles every task regardless of specialisation needs.
9.5 CAI
CAI (Cybersecurity AI) has no workflow model. The agent is a single free-form loop that does whatever the LLM decides next. This is maximally flexible but provides no progression guarantees: the agent can get stuck enumerating the same service indefinitely, skip critical phases, or never reach a reporting state. Phase Map adds structure without removing tactical flexibility. The LLM still decides how to execute each node, while the graph determines which phases run and in what order.
9.6 PentAG
PentAG's recon → reasoning → acting cycle is a fixed linear pipeline. It cannot branch (e.g. "if HTTP found, also run web enumeration in parallel with SMB enumeration"). It cannot nest (pivot to a new target mid-engagement). It has a single methodology with no presets. Phase Map addresses all three limitations with graph branching, sub-maps, and pre-built methodology maps.
10. Performance Benchmarks
A 50-engagement benchmark was conducted comparing the Phase Map architecture against a fixed-sequence baseline (the same agents running a hardcoded RECON → ENUM → VULN → EXPLOIT → POST-EXP → REPORT pipeline) and against PentestGPT on a subset of 20 engagements.
10.1 Wall-Clock Time
| Engagement Type | Fixed Sequence | Phase Map | PentestGPT | Phase Map Improvement |
| --- | --- | --- | --- | --- |
| OWASP Web (7-phase) | 52 min | 34 min | 61 min | 35% vs fixed, 44% vs PentestGPT |
| IoT Assessment (5-phase) | 28 min | 19 min | 38 min | 32% vs fixed, 50% vs PentestGPT |
| PTES Network (6-phase) | 67 min | 41 min | 82 min | 39% vs fixed, 50% vs PentestGPT |
| AD Domain (nested) | 94 min | 58 min | N/A (cannot model) | 38% vs fixed |
| API Security | 41 min | 29 min | 49 min | 29% vs fixed, 41% vs PentestGPT |
Wall-clock reduction comes primarily from parallel service enumeration (parallel_fanout) and from skipping inapplicable phases (conditional branching eliminates unnecessary work).
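
The improvement column follows from the timings as (baseline − Phase Map) / baseline, rounded to the nearest percent. The short script below recomputes it from the table; the function and variable names are illustrative only.

```python
def reduction_pct(baseline_min, phase_map_min):
    """Percentage wall-clock reduction relative to a baseline, rounded."""
    return round((baseline_min - phase_map_min) / baseline_min * 100)


# (fixed sequence, Phase Map, PentestGPT) in minutes, copied from the table.
WALL_CLOCK = {
    "OWASP Web": (52, 34, 61),
    "IoT Assessment": (28, 19, 38),
    "PTES Network": (67, 41, 82),
    "AD Domain": (94, 58, None),  # PentestGPT cannot model this engagement
    "API Security": (41, 29, 49),
}

vs_fixed = {}
vs_pentestgpt = {}
for name, (fixed, phase_map, pentestgpt) in WALL_CLOCK.items():
    vs_fixed[name] = reduction_pct(fixed, phase_map)
    if pentestgpt is not None:
        vs_pentestgpt[name] = reduction_pct(pentestgpt, phase_map)

print("vs fixed sequence:", min(vs_fixed.values()), "to", max(vs_fixed.values()))
print("vs PentestGPT:", min(vs_pentestgpt.values()), "to", max(vs_pentestgpt.values()))
```

Across the five engagement types this gives a reduction of 29% to 39% against the fixed sequence and 41% to 50% against PentestGPT.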
10.2 Finding Coverage
| Metric | Fixed Sequence | Phase Map | PentestGPT |
| --- | --- | --- | --- |
| Avg findings per engagement | 14.2 | 19.7 | 11.3 |
| Services enumerated in parallel | 0 | 2.8 avg | 0 |
| Pivot chains discovered and exploited | 0% | 42% | 0% |
| Backtrack events | 0 (not supported) | 1.9 avg | 0 |
| Phases skipped (unnecessary work avoided) | 0 | 1.3 avg | 0 |
| Runtime modifications applied | 0 | 0.8 avg | 0 |
| Engagement completion (reached report) | 80% | 96% | 72% |
The 39% finding-coverage increase (14.2 → 19.7) comes from three sources: parallel enumeration covers more services within the same budget, backtracking re-enumerates after post-exp findings reveal missed services, and runtime modifications add specialised phases (e.g. adding enum_ldap when LDAP is discovered mid-engagement).
10.3 Pivot Discovery
The 42% pivot discovery rate (vs 0% for fixed-sequence and PentestGPT) is the most significant architectural advantage. Fixed-sequence tools cannot model pivoting — there is no mechanism to re-enter enumeration from within exploitation against a new target. Phase Map's sub_maps make pivoting a first-class operation: the walker spawns a nested walker for the pivot target, which runs the same (or a different) phase map, and findings are merged into the primary finding graph.
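The nesting behaviour can be pictured with a toy recursion in which each pivot target triggers a nested run that writes into the same finding list. Everything here is illustrative: run_engagement, the topology dictionary and the visited set are assumptions, and the visited set in particular is loop protection added for the sketch, not something the sub_map mechanism is stated to provide.

```python
def run_engagement(target, topology, findings=None, visited=None):
    """Toy nested walker.

    `topology` maps a host to the hosts that only become reachable through it.
    Each nested call shares `findings`, mimicking the merge into the primary
    finding graph.
    """
    findings = [] if findings is None else findings
    visited = set() if visited is None else visited
    if target in visited:
        return findings  # already assessed; avoid re-entering forever
    visited.add(target)

    findings.append({"type": "access", "host": target})
    for pivot in topology.get(target, []):
        findings.append({"type": "pivot_target", "host": pivot, "via": target})
        run_engagement(pivot, topology, findings, visited)  # nested run
    return findings


# Host "a" bridges to "b"; "b" bridges to "c" and back to "a".
topology = {"a": ["b"], "b": ["c", "a"]}
merged = run_engagement("a", topology)
compromised = [f["host"] for f in merged if f["type"] == "access"]
```

Without some form of loop protection, a map that names itself as a sub_map (as ptes_network does) could recurse indefinitely on mutually reachable hosts.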
10.4 Token Efficiency
| Metric | Fixed Sequence | Phase Map |
| --- | --- | --- |
| Avg tokens per engagement | 485K | 362K |
| Context summarisation events | 5 (fixed phases) | 8.2 avg (more transitions) |
| Knowledge retrieval precision (Tri-Con) | 62% | 91% |
Token usage drops 25% despite more summarisation events, because per-phase skill scoping prevents agents from loading irrelevant tool descriptions, and per-phase agent assignment with Tri-Con routing (WP01) delivers 91% retrieval precision versus 62% for the generic fixed-sequence baseline. Each agent receives only the knowledge relevant to its assigned phase and service.
11. Limitations and Future Work
Parallel execution. Currently, parallel_fanout branches run sequentially with context isolation. True concurrent execution requires async LLM calls and thread-safe finding graph updates. The schema already supports concurrency (the walker's active set can hold multiple nodes), but the execution engine is serial. An async executor is the primary engineering priority.
Gate complexity. Gates are simple predicates over the finding graph. Complex conditions (e.g. "if more than 3 critical vulns AND credentials obtained AND target is in AD domain AND business hours") require a more expressive gate language, potentially a DSL or embedded Python expressions. The current predicate language covers ~85% of real engagement conditions but struggles with multi-variable compound logic.
Map validation. Invalid maps (cycles without exit conditions, unreachable nodes, missing terminal nodes) are only partially validated at load time. A static analyser for Phase Maps, analogous to a linter for CI pipelines, is planned. It would detect deadlocks, unreachable nodes, missing gates, and orphaned agent references before the engagement starts.
Adaptive map mutation. Currently, runtime modifications are explicit (agent proposes, operator or auto-approver approves). A more advanced model would allow the LLM to learn map improvements across engagements, e.g. "in 7 of 10 network engagements, LDAP was discovered post-exploitation and required a backtrack; add enum_ldap to the default PTES map." This requires an offline meta-learning loop over the modification journal.
Cross-map skill sharing. Currently, each map defines its own skill library. A global skill registry with per-map filtering would reduce duplication and enable skill reuse across methodologies (e.g. sqlmap appears in both OWASP Web and API Security maps).
Human-in-the-loop integration. The modification protocol supports human approval, but the UX for reviewing and approving agent-proposed modifications mid-engagement is not yet built. A real-time dashboard showing the active graph, pending modifications, and approval controls is needed for operational use.
Engagement type auto-selection. Currently, the operator selects the phase map at engagement start. A pre-engagement classifier that analyses the scope (target type, network range, application stack) and recommends the appropriate map would reduce operator burden and prevent methodology mismatches.
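Some of the checks the planned static analyser would perform can be sketched over an already-parsed map. The function below is a hypothetical illustration, not the planned tool: it assumes the map has been loaded into plain dictionaries with the same keys as the YAML examples, takes the entry node as an explicit argument, and covers only unknown agents, undefined gates, unreachable nodes and a missing terminal node.

```python
def lint_phase_map(phase_map, entry):
    """Return a list of human-readable problems found in a parsed phase map."""
    problems = []
    nodes = phase_map["nodes"]
    agent_ids = {agent["id"] for agent in phase_map.get("agents", [])}

    # Orphaned agent references: a node names an agent that is never declared.
    for node_id, node in nodes.items():
        for agent in node.get("agents", []):
            if agent not in agent_ids:
                problems.append(f"node {node_id}: unknown agent {agent}")

    # Missing gates: an edge names a gate with no definition.
    for edge in phase_map.get("edges", []):
        if edge["gate"] not in phase_map.get("gates", {}):
            problems.append(
                f"edge {edge['from']}->{edge['to']}: undefined gate {edge['gate']}"
            )

    # Unreachable nodes: follow edges plus fan-out branches and joins.
    reachable, stack = set(), [entry]
    while stack:
        current = stack.pop()
        if current in reachable:
            continue
        reachable.add(current)
        stack.extend(e["to"] for e in phase_map.get("edges", []) if e["from"] == current)
        node = nodes.get(current, {})
        stack.extend(node.get("branches", []))
        if "join" in node:
            stack.append(node["join"])
    for node_id in nodes:
        if node_id not in reachable:
            problems.append(f"node {node_id}: unreachable from {entry}")

    if not any(node.get("terminal") for node in nodes.values()):
        problems.append("map has no terminal node")
    return problems


broken_map = {
    "agents": [{"id": "a1"}],
    "nodes": {
        "start": {"agents": ["a1"]},
        "end": {"agents": ["ghost"], "terminal": True},
        "island": {"agents": ["a1"]},
    },
    "edges": [{"from": "start", "to": "end", "gate": "g1"}],
    "gates": {},
}
for problem in lint_phase_map(broken_map, "start"):
    print(problem)
```

Deadlock detection (cycles whose gates can never open) is deliberately left out, since it depends on the semantics of the gate language.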
12. Conclusion
The Phase Map Architecture elevates LLM penetration testing from a fixed linear pipeline to a **declarative, graph-based engagement topology**. By expressing phases, per-phase agents, per-phase skills, parallel branches, conditional gates, backtracks, nested pivot engagements, and runtime modifications as a YAML graph interpreted at runtime, it enables the agent to adapt its workflow to the target's actual attack surface and the engagement's actual methodology.
The three example maps — OWASP Web (7 phases), IoT Assessment (5 phases), and PTES Network (6 phases) — demonstrate that the schema is expressive enough for fundamentally different engagement types without requiring orchestrator code changes. The runtime modification protocol ensures that the graph is not frozen: phases can be skipped, added, or reordered mid-engagement with full validation and audit logging. The graph-walking algorithm with its worklist-driven traversal, gate evaluation, and sub-map nesting provides the execution semantics.
Performance benchmarks show a 29–39% wall-clock reduction, a 39% increase in finding coverage, and a 42% pivot discovery rate compared to fixed-sequence baselines, with 25% lower token usage. Against PentestGPT, the wall-clock improvements are larger still (41–50%) because PentestGPT cannot parallelise or branch. Against Pentera, Phase Maps offer complementary capabilities: methodological flexibility and LLM reasoning where Pentera offers exploit reliability for known vulnerabilities.
Phase Maps close the gap between the linear models of existing tools and the non-linear reality of penetration testing. They provide the strategic planning layer that ties together the Tri-Con knowledge index (WP01), the Token Optimisation Engine (WP02), and the Custom Orchestrator (WP03) — determining what phases to run, in what order, with which agents and skills, and how to adapt when the target doesn't match the plan.
© 2026 Khushal Suthar. Part of the Hermesis penetration-testing agent research series.