WP05 — Skill-Based Platform: A Never-Changing Core with a Shared Skill Library
> Thesis: The platform's core binary never ships a new release. Every module — every scanner, every fuzzer, every reporter, every agent — is assembled at runtime from skills pulled out of a shared, versioned Skill Library. Adding support for a new technology is no longer an engineering project that touches the core; it is a documentation project that drops a new skill package into the library. The result is a security platform whose capability surface grows continuously while its attack surface for regression stays frozen at zero.
This whitepaper specifies the architecture, schema, loader, runtime composition, versioning, marketplace, and safety model of a Skill-Based Platform. It is written to be implementable: every section contains enough concrete detail (YAML schemas, sequence diagrams rendered as ASCII, code-shaped pseudocode, threat models) that a small team could begin construction within a sprint. It also situates the design against two prior-art systems, Legion's plugin framework and CAI's modular agents, and explains where the Skill-Based Platform deliberately diverges.
1. Motivation: Why the Core Must Never Change
Every security platform of meaningful age carries a scar pattern: the core was once small and correct, then feature requests arrived, then the core grew, then the core became the thing everyone is afraid to touch. A refactor that should have taken a day now takes a quarter because the core is simultaneously the scheduler, the transport, the plugin host, the policy engine, the report renderer, and the catalog of every technology the platform has ever been asked to look at. Releases become coordination problems. A regression in the HTTP/2 parser delays the release that also fixes a memory leak in the MQTT fuzzer, because both live in the same binary and ship on the same train.
The Skill-Based Platform inverts this. The core is a deliberately tiny, deliberately boring program. It does exactly four things:
It does not know what HTTP is. It does not know what MQTT is. It does not know what a Kubernetes API server is. It knows how to read a skill manifest, validate it, materialize the components the manifest describes, wire those components into a directed graph, run that graph under a resource envelope, and emit results. The knowledge of what to do lives entirely in skills. The knowledge of how to do it safely lives in the core. This separation is the single most important property of the system, and every other design decision in this document exists to defend it.
The payoff is mechanical. When a customer asks, "Can you scan our new IoT protocol stack?", the answer is never "we'll put it on the roadmap for Q3." The answer is "we'll have a skill for it by Friday." Adding a technology is no longer a core change; it is a skill package, and a skill package is 1–2 hours of focused work for a domain expert. Zero downtime, zero regression risk to the core, zero coordination with a release train. The skill is written, validated against the schema, pushed to the marketplace, and available to every deployed platform instance within the library's propagation window (seconds for a cloud-hosted library, minutes for an on-prem mirror).
2. What Is a Skill?
A skill is the smallest unit of capability the platform understands. It is a self-contained, declaratively described package that teaches the platform how to engage with one specific technology, technique, or reporting format. Critically, a skill is not code that runs inside the core. A skill is a manifest (the skill.yaml) plus a set of assets (knowledge documents, command definitions, pattern libraries, parser scripts, report templates). The core reads the manifest, loads the assets, and uses them to compose a module at runtime. The skill itself is inert until the core activates it.
Every skill contains exactly five component slots, each of which is optional but at least one must be present:
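- Knowledge: domain documents the platform can surface to operators or feed to an LLM agent as context.
- Commands: declarative command definitions, e.g., "nuclei -t cves/2021/CVE-2021-41773.yaml -u {target}".
- Patterns: rules that identify findings in output, e.g., "403 followed by path containing ../ → Path Traversal, High."
- Parser: a script or expression that turns raw command output into normalized findings.
- Report: a template that renders findings for human consumption.

A skill that only has knowledge is a reference skill: the platform can surface it to operators or feed it to an LLM agent as context, but it won't run commands. A skill that has commands and patterns but no parser relies on the core's default parser. A skill that has only a report template is a presentation skill that formats findings from other skills. This composability is intentional: skills are Lego bricks, not monoliths.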
2.1 Skill Identity
Every skill has a globally unique identity composed of three parts:
- Category (e.g., web, iot, cloud): the broad domain.
- Slug (e.g., apache-path-traversal): the specific capability within the category.
- Version (semver, e.g., 1.2.0): the skill's own version, independent of the core.
The fully qualified ID is category/slug@version, e.g., web/apache-path-traversal@1.2.0. This ID is how skills reference each other (a cloud skill may depends_on a network skill for port discovery), how the marketplace indexes them, and how operators pin specific versions in policy.
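As an illustration of the category/slug@version form, the following is a minimal Python sketch of parsing and formatting a fully qualified ID. The names SkillId, parse_skill_id, and format_skill_id are hypothetical, and the character rules in the regular expression (lowercase identifiers, strict MAJOR.MINOR.PATCH) are assumptions rather than part of the specification.

```python
import re
from typing import NamedTuple


class SkillId(NamedTuple):
    category: str
    slug: str
    version: str


# Assumed grammar: lowercase category and slug, strict MAJOR.MINOR.PATCH version.
_SKILL_ID = re.compile(
    r"^(?P<category>[a-z][a-z0-9-]*)/"
    r"(?P<slug>[a-z0-9][a-z0-9-]*)@"
    r"(?P<version>\d+\.\d+\.\d+)$"
)


def parse_skill_id(text: str) -> SkillId:
    """Split 'category/slug@version' into its three parts, or raise ValueError."""
    match = _SKILL_ID.match(text)
    if match is None:
        raise ValueError(f"not a fully qualified skill ID: {text!r}")
    return SkillId(**match.groupdict())


def format_skill_id(skill_id: SkillId) -> str:
    """Inverse of parse_skill_id."""
    return f"{skill_id.category}/{skill_id.slug}@{skill_id.version}"
```

Rejecting anything that does not match the full pattern keeps version pins in policy unambiguous: an ID without a version is an error here, not an implicit "latest".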
2.2 Skill Package Layout
A skill is distributed as a directory or a tarball with a fixed layout:
    web/apache-path-traversal/
    ├── skill.yaml              # The manifest (required)
    ├── knowledge/
    │   ├── overview.md         # Domain knowledge
    │   └── references.yaml     # CVE list, config defaults
    ├── commands/
    │   └── nuclei-cve.yaml     # Command spec
    ├── patterns/
    │   └── traversal.yaml      # Pattern definitions
    ├── parser/
    │   └── parse.py            # Parser script (sandboxed)
    └── report/
        └── template.j2         # Jinja2 report template
Every component is optional. The manifest declares which components are present and how they relate.
3. Skill Schema (YAML)
The skill.yaml manifest is the contract between a skill author and the platform core. It is validated at load time against a JSON Schema derived from the specification below. A manifest that fails validation is rejected before any of its assets are read — the core never partially loads a skill.
3.1 Top-Level Structure
    # skill.yaml: top-level
    apiVersion: skill.platform/v1    # Schema version, pinned
    kind: Skill                      # Currently only "Skill"; reserved for future kinds

    id:
      category: web
      slug: apache-path-traversal
      version: 1.2.0                 # SemVer; must match the package version

    meta:
      name: "Apache 2.4.49 Path Traversal (CVE-2021-41773)"
      description: |
        Detects and verifies CVE-2021-41773 and CVE-2021-42013 in Apache HTTP
        Server versions 2.4.49 and 2.4.50. Maps confirmed findings to CVSS 7.5.
      author: "platform-security-team"
      license: "Apache-2.0"
      homepage: "https://marketplace.platform/skills/web/apache-path-traversal"
      tags: [apache, cve, path-traversal, web, http]
      min_core_version: "1.0.0"      # Minimum core version that can load this skill
      supported_targets:             # What kinds of targets this skill applies to
        - web-url
        - ip-address
        - hostname

    # Component declarations (each is optional)
    components:
      knowledge:
        documents:
          - path: knowledge/overview.md
            title: "CVE-2021-41773 Technical Overview"
          - path: knowledge/references.yaml
            title: "CVE References and Defaults"
      commands:
        - id: nuclei-scan
          spec: commands/nuclei-cve.yaml
      patterns:
        - id: traversal-detect
          spec: patterns/traversal.yaml
      parser:
        type: python                 # "python" | "jq" | "jsonata" | "builtin"
        entrypoint: parser/parse.py
        function: parse_nuclei_output
      report:
        template: report/template.j2
        format: markdown             # "markdown" | "html" | "json" | "pdf"

    # Dependency graph
    depends_on:
      - skill: web/http-reachability@^1.0
        optional: true               # If present, run first; if absent, skip
        purpose: "Confirm target is reachable on HTTP/HTTPS before launching nuclei."

    # Runtime profile: tells the core how to schedule this skill
    runtime:
      executor: container            # "container" | "process" | "http" | "inline"
      container:
        image: ghcr.io/platform/nuclei:3.0.4
      timeout_seconds: 120
      cpu: "0.5"
      memory: "256Mi"
      network: egress-only           # "none" | "egress-only" | "full"
      concurrency: 4                 # Max parallel invocations of this skill
      retry:
        max_attempts: 2
        backoff: exponential

    # Safety declarations (see Section 9)
    safety:
      impact: "non-destructive"      # "passive" | "non-destructive" | "active" | "destructive"
      requires_confirmation: false   # If true, operator must approve before execution
      scope:
        - "web-path-traversal-detection"
      data_handling:
        stores_raw_output: true
        stores_credentials: false
        pii_risk: low
3.2 Command Spec
Each command is specified declaratively so the core can validate, parameterize, and sandbox it without executing arbitrary code:
    # commands/nuclei-cve.yaml
    id: nuclei-scan
    type: container                  # "container" | "process" | "http-request"
    container:
      image: ghcr.io/platform/nuclei:3.0.4
      argv:
        - "nuclei"
        - "-jsonl"                   # JSON Lines output; nuclei 3.x replaced -json with -jsonl
        - "-t"
        - "cves/2021/CVE-2021-41773.yaml"
        - "-u"
        - "{target.url}"
      env:
        NUCLEI_RATE_LIMIT: "{policy.rate_limit | default(150)}"
    inputs:
      - name: target
        type: object
        required: true
        schema:
          url: { type: string, format: uri }
    outputs:
      - name: raw_findings
        type: json-stream            # One JSON object per line
      - name: exit_code
        type: integer
The {target.url} and {policy.rate_limit | default(150)} placeholders are resolved by the core's template engine at invocation time. The core never passes raw user input directly into a command; all interpolation goes through a typed, schema-validated input contract.
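The placeholder resolution described here can be sketched in Python as follows. This is an illustration under assumptions, not the core's template engine: the placeholder grammar in the regular expression, the function names (render, resolve_argv, validate_target), and the rule that a target URL must be http or https are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches "{a.b}" and "{a.b | default(x)}". This grammar is an assumption.
_PLACEHOLDER = re.compile(
    r"\{\s*([a-z_]+(?:\.[a-z_]+)*)\s*(?:\|\s*default\(([^)]*)\))?\s*\}"
)


def _lookup(context, dotted):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = context
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_target(target):
    """Enforce the input contract (url: string, format uri) before interpolation."""
    url = target.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"target.url is not a valid http(s) URI: {url!r}")


def render(template, context):
    """Substitute every placeholder in one string, applying default(...) if given."""
    def substitute(match):
        path, default = match.group(1), match.group(2)
        value = _lookup(context, path)
        if value is None:
            if default is None:
                raise KeyError(f"no value for placeholder {path!r}")
            value = default
        return str(value)
    return _PLACEHOLDER.sub(substitute, template)


def resolve_argv(argv, context):
    """Validate inputs, then render each argv element separately (no shell involved)."""
    validate_target(context.get("target", {}))
    return [render(arg, context) for arg in argv]
```

Because each argv element is rendered on its own and never joined into a shell string, a value containing spaces or shell metacharacters stays a single argument.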
3.3 Pattern Spec
Patterns identify findings in command output or raw traffic. They are declarative so the core can optimize, cache, and audit them:
    # patterns/traversal.yaml
    id: traversal-detect
    matchers:
      - id: confirmed-traversal
        where: output.json
        condition: |
          item.matched == true and
          item.template-id contains "CVE-2021-41773"
        severity: high
        confidence: confirmed
        extract:
          cve: "item.info.reference | match('CVE-\\d{4}-\\d+')"
          matched_url: "item.matched-at"
          ip: "item.ip"
      - id: probable-traversal
        where: output.json
        condition: item.matched == true and item.template-id contains "path-traversal"
        severity: medium
        confidence: probable
    normalize_to:                    # Maps to the platform's normalized Finding schema
      category: web
      class: path-traversal
      cwe: "CWE-22"
3.4 Parser Spec
The parser transforms raw command output into the platform's normalized Finding schema. It can be a Python function (sandboxed — see Section 9), a jq expression, a JSONata expression, or the builtin default:
    # Declared in skill.yaml under components.parser
    parser:
      type: python
      entrypoint: parser/parse.py
      function: parse_nuclei_output
      input: raw_findings            # Which command output to consume
      output: findings               # Which module output to produce
    # parser/parse.py: runs in a restricted sandbox
    from platform.sdk import Finding, Severity


    def parse_nuclei_output(raw_findings: list[dict]) -> list[Finding]:
        """Convert nuclei JSON records into normalized Finding objects."""
        results = []
        for item in raw_findings:
            info = item.get("info") or {}
            # "reference" may be missing, null, or an empty list.
            references = info.get("reference") or [None]
            results.append(Finding(
                skill_id="web/apache-path-traversal@1.2.0",
                title=f"Path Traversal: {item.get('template-id')}",
                severity=Severity.from_string(info.get("severity", "info")),
                target=item.get("matched-at", ""),
                evidence=item,
                cve=references[0],
            ))
        return results
3.5 Report Template
The report template renders findings for human consumption. It uses Jinja2 with a restricted filter set:
    {# report/template.j2 #}
    # Apache Path Traversal: {{ findings | length }} finding(s)

    {% for f in findings %}
    ## {{ f.title }}

    Severity: {{ f.severity | upper }}
    Target: {{ f.target }}
    CVE: {{ f.cve or "N/A" }}
    Evidence: {{ f.evidence | tojson(indent=2) }}

    {% endfor %}
    {% if findings | length == 0 %}
    No path traversal findings detected. Apache target appears patched against CVE-2021-41773 and CVE-2021-42013.
    {% endif %}
4. Skill Categories
The Skill Library is organized into six top-level categories. These are not arbitrary buckets — each category maps to a distinct engagement model, a distinct target topology, and a distinct safety envelope. The core uses the category to select default runtime profiles, default safety rules, and default report scaffolding.
4.1 Web
Skills that engage web applications, APIs, and HTTP services. Targets are URLs, hostnames, or IP:port pairs. Typical skills: SQL injection detection, XSS detection, path traversal, directory enumeration, JWT analysis, GraphQL introspection, WAF fingerprinting. Commands are usually HTTP requests or containerized scanners (nuclei, ffuf, sqlmap). Safety tends toward non-destructive to active. The Web category is the most populous because the technology surface is the largest and changes the fastest.
4.2 Mobile
Skills that engage mobile applications (APK, IPA) and mobile backend APIs. Targets are binary artifacts or bundle identifiers. Typical skills: APK static analysis (manifest extraction, hardcoded secrets, insecure storage), IPA static analysis, Frida hook recipes, certificate pinning bypass, deep link analysis, backend API discovery from mobile traffic. Commands are typically containerized static analyzers (MobSF, apktool, jadx) or dynamic instrumentation (Frida). Safety is non-destructive for static, active for dynamic. Mobile skills frequently declare depends_on entries that point to Web skills for the backend API phase.
4.3 IoT
Skills that engage embedded devices, firmware, and IoT protocols. Targets are IP addresses, firmware blobs, or protocol endpoints. Typical skills: firmware extraction (binwalk), hardcoded credential detection in firmware, MQTT broker enumeration, CoAP recon, Zigbee/BLE sniffer recipes, default credential brute force, UPnP enumeration. Commands are a mix of containerized tools (binwalk, nmap scripts) and protocol-specific clients. Safety is active for live devices (a brute force can lock out a device), non-destructive for firmware analysis. IoT skills carry the highest requires_confirmation rate because misconfigured active scans can brick consumer hardware.
4.4 Network
Skills that engage network infrastructure: routers, switches, firewalls, VPN concentrators, load balancers. Targets are IP ranges, CIDR blocks, or hostnames. Typical skills: port scanning (nmap), service fingerprinting, TLS cipher analysis, SNMP enumeration, BGP route analysis, firewall rule inference, VPN detection. Commands are containerized network tools. Safety spans passive (banner analysis from Shodan-style data) to active (nmap with NSE scripts). Network skills are heavily composed — a network discovery skill produces targets that feed into Web, IoT, or Cloud skills.
4.5 Cloud
Skills that engage cloud providers and cloud-native infrastructure: AWS, GCP, Azure, Kubernetes, Terraform, CloudFormation. Targets are cloud accounts, regions, clusters, or IaC repositories. Typical skills: S3 bucket exposure detection, IAM policy analysis, EKS RBAC audit, Terraform static analysis (tfsec, checkov), Kubernetes pod security analysis, cloud metadata endpoint checks, exposed secrets in CI configs. Commands are typically cloud CLI calls (aws, gcloud, az, kubectl) or IaC scanners. Safety is non-destructive (read-only API calls) for most skills, active for a small number that create canary resources. Cloud skills are the most policy-sensitive: the safety declarations must respect the customer's cloud tenancy boundaries.
4.6 AI/ML
Skills that engage AI/ML systems: LLM endpoints, model serving infrastructure, training pipelines, datasets. Targets are API endpoints, model artifacts, or dataset repositories. Typical skills: LLM prompt injection testing, model extraction attack detection, training data poisoning detection, model card validation, inference endpoint DoS testing, vector database exposure detection, agent tool-use abuse testing. Commands are HTTP requests to inference endpoints or containerized adversarial toolkits. Safety is non-destructive for analysis skills, active for red-team skills that send crafted inputs. This is the newest category and the fastest-growing; it is also the category where the skill schema is most likely to evolve, because the attack surface is still being mapped.
4.7 Cross-Category Composition
Categories are not silos. A typical engagement composes skills across categories. A cloud pen-test might run: cloud/eks-rbac-audit → network/port-scan (on discovered nodes) → web/api-reachability (on discovered services) → ai/llm-endpoint-test (on discovered inference endpoints). The core's composition engine (Section 6) handles the data flow between categories. The category system exists for organization, defaults, and safety policy — not for runtime isolation.
5. Skill Loader Architecture
The skill loader is the core's front door. It is the only component that reads from the Skill Library, and it is the only component that can introduce new skills into a running platform. Its design is deliberately simple and deliberately strict: it validates everything, trusts nothing, and fails closed.
5.1 Loader Pipeline
The loader runs a five-stage pipeline for every skill it encounters:
    ┌──────────────────────────────────────────────────────────────────────────┐
    │                          SKILL LOADER PIPELINE                           │
    │                                                                          │
    │ ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
    │ │ 1. Fetch │─▶│ 2. Parse │─▶│3. Validate│─▶│4. Compile │─▶│5. Register│  │
    │ └──────────┘  └──────────┘  └───────────┘  └───────────┘  └───────────┘  │
    │                                                                          │
    │ Fetch:    Pull skill package from library (local FS or remote)           │
    │ Parse:    Read skill.yaml, expand into in-memory manifest tree           │
    │ Validate: JSON Schema validation + semantic checks + safety              │
    │ Compile:  Resolve templates, pre-compile parsers, index patterns         │
    │ Register: Add to skill registry, notify composition engine               │
    └──────────────────────────────────────────────────────────────────────────┘
Stage 1: Fetch. The loader retrieves the skill package from the configured library source. Sources are pluggable: local filesystem (file://), HTTP registry (https://marketplace.platform/), S3 mirror (s3://), or a Git repository. The loader verifies the package signature (Sigstore/cosign) before proceeding. Unsigned packages are rejected in production mode; an --allow-unsigned flag exists for development only and is gated behind a core configuration flag that a skill cannot set.
Stage 2: Parse. The loader reads skill.yaml, resolves any include: directives (skills can include shared YAML fragments), and expands the manifest into an in-memory tree. Parsing failures (malformed YAML, unknown fields in strict mode) cause the skill to be rejected with a structured error. The core never guesses.
Stage 3 — Validate. This is the most important stage. Validation has three layers:
Structural validation: The manifest is checked against the JSON Schema for apiVersion: skill.platform/v1. Unknown fields in strict mode are rejected. Required fields are checked. Type constraints are enforced.
Semantic validation: Cross-field checks run. min_core_version must be ≤ the running core version. depends_on references must point to skills that exist (or are being loaded in the same batch). runtime.executor must be one the core supports. safety.impact must be compatible with the platform's configured engagement mode (a destructive skill cannot be loaded if the platform is in safe mode).
Safety validation: The safety declarations are checked against the platform's safety policy. A skill that declares requires_confirmation: true is allowed; a skill that should declare it but doesn't (based on its commands) is flagged for manual review. The loader maintains a heuristic classifier that inspects command argv for high-risk patterns (e.g., rm, dd, mkfs, --destroy) and escalates the impact declaration if the skill author under-declared it.
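The impact escalation in the safety layer can be sketched as below. This is a deliberately naive illustration: the token list comes from the examples in the text, while the ordering of impact levels and the names classify_argv and effective_impact are assumptions. A real classifier would need far more context than a token match.

```python
# Impact levels ordered from least to most dangerous (ordering assumed from Section 3.1).
IMPACT_ORDER = ["passive", "non-destructive", "active", "destructive"]

# High-risk tokens taken from the examples in the text; not an exhaustive list.
HIGH_RISK_TOKENS = {"rm", "dd", "mkfs", "--destroy"}


def classify_argv(argv: list[str]) -> str:
    """Infer a minimum impact level from the command line alone."""
    for token in argv:
        base = token.rsplit("/", 1)[-1]  # "/bin/rm" -> "rm"
        if base in HIGH_RISK_TOKENS or base.startswith("mkfs."):
            return "destructive"
    return "passive"


def effective_impact(declared: str, argv: list[str]) -> tuple[str, bool]:
    """Return (impact to enforce, whether the declaration was escalated)."""
    inferred = classify_argv(argv)
    if IMPACT_ORDER.index(inferred) > IMPACT_ORDER.index(declared):
        return inferred, True
    return declared, False
```

The classifier only ever raises the declared impact. A skill that over-declares is left alone, which keeps the check fail-closed.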
Stage 4 — Compile. The loader resolves all template placeholders in command specs against the platform's default policy context (not against a specific target — that happens at invocation time). Parsers are pre-compiled: Python parsers are byte-compiled and checked for sandbox violations (no import os, no open() outside approved paths, no subprocess, no network access). Jinja2 templates are compiled and checked for forbidden filters/tags. Patterns are compiled into matcher objects and indexed by the fields they extract, so the composition engine can query "which skills can produce a cve field?" in O(1).
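One way to approximate the parser check in the Compile stage is a static pass over the parser's syntax tree, sketched below with Python's standard ast module. The forbidden lists and the name sandbox_violations are assumptions. The sketch is also stricter than the text, since it rejects every open() call rather than allowing approved paths. A static check like this is only a first filter and does not replace runtime isolation.

```python
import ast

# Assumed deny lists; the text names os, open(), subprocess, and network access.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}
FORBIDDEN_CALLS = {"open", "eval", "exec", "__import__"}


def sandbox_violations(source: str) -> list[str]:
    """Return a description of every forbidden import or call found in source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

A loader following this approach would reject the skill whenever the returned list is non-empty.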
Stage 5 — Register. The validated, compiled skill is added to the in-process Skill Registry, a concurrent map keyed by fully-qualified skill ID. The registry publishes an event (skill.loaded) that the composition engine, the policy engine, and the operator console subscribe to. From this point, the skill is available for composition. Registration is atomic: a skill is either fully registered or not registered at all. There is no "partially loaded" state.
5.2 Registry Design
The Skill Registry is the core's source of truth for what skills are available. It is:
In-memory for hot-path performance. A typical platform instance has 200–500 skills loaded; the registry is a HashMap<String, CompiledSkill> with an RCU (read-copy-update) pattern, so readers never block and never wait on writers.
Append-only for versions. A new version of a skill is a new entry (web/apache-path-traversal@1.2.1 is a distinct key from web/apache-path-traversal@1.2.0). Old versions are not evicted unless an explicit GC policy runs, so running modules that pinned an old version are never broken by a new release.
Queryable. The registry supports queries by category, by tag, by supported_targets, by safety.impact, by depends_on, and by "can produce field X." These queries power the composition engine's skill selection.
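The three registry properties can be illustrated with a small copy-on-write sketch in Python. The class name, the "produces" key on a compiled skill, and the method names are hypothetical. Writers build a new snapshot and swap a single reference, which mimics the read-copy-update idea; a real implementation with concurrent writers would additionally serialize register calls with a lock.

```python
class SkillRegistry:
    """Append-only registry keyed by fully qualified skill ID."""

    def __init__(self):
        # One snapshot, replaced wholesale on every write:
        # (skills by ID, skill IDs by produced field).
        self._snapshot = ({}, {})

    def register(self, skill_id, compiled):
        skills, by_field = self._snapshot
        if skill_id in skills:
            raise KeyError(f"{skill_id} is already registered")
        new_skills = {**skills, skill_id: compiled}
        new_by_field = {field: set(ids) for field, ids in by_field.items()}
        # "produces" is a hypothetical key listing the fields a skill can extract.
        for field in compiled.get("produces", ()):
            new_by_field.setdefault(field, set()).add(skill_id)
        self._snapshot = (new_skills, new_by_field)  # single reference swap

    def get(self, skill_id):
        return self._snapshot[0][skill_id]

    def producers_of(self, field):
        """Answer 'which skills can produce field X?' with one dict lookup."""
        return sorted(self._snapshot[1].get(field, ()))

    def versions_of(self, category_slug):
        """All registered versions of one skill, in lexicographic (not semver) order."""
        prefix = category_slug + "@"
        return sorted(k for k in self._snapshot[0] if k.startswith(prefix))
```

Readers always see either the old snapshot or the new one, never a half-updated index, which matches the "fully registered or not at all" rule from Stage 5.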
5.3 Hot Loading and Unloading
Skills can be loaded, upgraded, and unloaded while the platform is running — this is the "zero downtime" promise. The mechanism:
Load: New skill goes through the pipeline, enters the registry. New module requests can use it immediately. Existing running modules are unaffected (they pinned their skill versions at composition time).
Upgrade: A new version of a skill is loaded as a new registry entry. The old version remains. Module requests that don't pin a version get the latest by default; requests that pin a version get what they pinned. An operator can issue a skill.roll command to drain old-version references and evict the old entry.
Unload: An operator issues skill.unload web/apache-path-traversal@1.2.0. The core marks the skill as "draining." New module requests cannot use it. Running modules that reference it are allowed to finish. When the last reference is released, the skill is evicted from the registry and its compiled assets are dropped from memory.
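The draining behavior can be sketched with simple reference counting. The class and method names below are hypothetical, and the sketch is single-threaded; a concurrent core would guard these counters with a lock or atomics.

```python
class SkillEntry:
    def __init__(self, skill_id):
        self.skill_id = skill_id
        self.refs = 0          # running modules currently pinned to this version
        self.draining = False  # set by unload; blocks new references


class DrainingRegistry:
    def __init__(self):
        self._entries = {}

    def load(self, skill_id):
        self._entries[skill_id] = SkillEntry(skill_id)

    def acquire(self, skill_id):
        """Called when a new module pins the skill at composition time."""
        entry = self._entries.get(skill_id)
        if entry is None or entry.draining:
            raise LookupError(f"{skill_id} is not available for new modules")
        entry.refs += 1

    def release(self, skill_id):
        """Called when a running module finishes."""
        entry = self._entries[skill_id]
        entry.refs -= 1
        self._evict_if_idle(entry)

    def unload(self, skill_id):
        """Mark the skill as draining; evict at once if nothing references it."""
        entry = self._entries[skill_id]
        entry.draining = True
        self._evict_if_idle(entry)

    def is_loaded(self, skill_id):
        return skill_id in self._entries

    def _evict_if_idle(self, entry):
        if entry.draining and entry.refs == 0:
            del self._entries[entry.skill_id]
```

In this sketch, eviction happens either in unload (no references) or in the release that drops the last reference, so in-flight work is never interrupted.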
At no point during any of these operations does the core binary change, restart, or lose in-flight work.
6. Runtime Composition
Composition is the process by which the core assembles a set of skills into a runnable module graph in response to an operator request. This is where the platform's real power lives: an operator does not say "run skill X," they say "assess target Y for risk class Z," and the composition engine selects and wires the appropriate skills.
6.1 The Composition Request
An operator (or an automated trigger, or an LLM agent) submits a composition request:
    # composition request
    target:
      type: web-url
      value: "https://apache-vuln.example.com"
    engagement:
      mode: safe                     # "safe" | "standard" | "aggressive"
      categories: [web, network]     # Optionally constrain; null = all applicable
      skills: null                   # Optionally pin specific skills; null = auto-select
      max_duration_seconds: 1800
    policy:
      rate_limit: 150
      scope: "production-readonly"
6.2 Skill Selection
The composition engine queries the Skill Registry to find skills that:
Declare supported_targets matching the request's target type.
Are in the requested categories (if specified).
Have a safety.impact compatible with the engagement mode (safe allows passive and non-destructive; standard adds active; aggressive adds destructive but still requires explicit confirmation).
Are not blocked by the platform's denylist.
Have all their depends_on satisfied (transitively).
The result is a candidate set. The engine then applies a relevance filter: it scores each candidate by how well its meta.tags and knowledge match the target's fingerprint (technology stack, open ports, HTTP headers, etc.). The target fingerprint is produced by a lightweight recon phase that runs a small set of always-present core skills (port scan, HTTP probe, TLS probe) before the main composition. This is the one place the core has "baked-in" knowledge — and it is deliberately minimal: just enough to fingerprint the target so skill selection is intelligent.
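The selection criteria and relevance filter can be sketched as a filter followed by a sort. The dictionary shape for a skill, the tag-overlap score, and the function names are assumptions made for illustration; the dependency check is omitted, and the impact table restates the engagement-mode rule from the list above.

```python
# Which safety.impact values each engagement mode admits.
ALLOWED_IMPACT = {
    "safe": {"passive", "non-destructive"},
    "standard": {"passive", "non-destructive", "active"},
    "aggressive": {"passive", "non-destructive", "active", "destructive"},
}


def select_candidates(skills, target_type, mode, categories=None, denylist=frozenset()):
    """Apply target, category, impact, and denylist criteria (dependencies not checked)."""
    allowed = ALLOWED_IMPACT[mode]
    return [
        s for s in skills
        if target_type in s["supported_targets"]
        and (categories is None or s["category"] in categories)
        and s["impact"] in allowed
        and s["id"] not in denylist
    ]


def rank_by_relevance(candidates, fingerprint_tags):
    """Order candidates by tag overlap with the target fingerprint (a naive score)."""
    def score(skill):
        return len(set(skill["tags"]) & set(fingerprint_tags))
    return sorted(candidates, key=lambda s: (-score(s), s["id"]))
```

With the composition request above (mode safe, categories web and network), an active SQL injection skill would be filtered out, and a skill tagged apache would rank first against a fingerprint containing apache.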
6.3 Graph Construction
Selected skills are wired into a Directed Acyclic Graph (DAG). Each skill is a node. depends_on creates edges. Skills that produce outputs consumed by other skills' inputs create data-flow edges. The engine performs a topological sort and emits an execution plan:
    ┌────────────────────────────────────────────────────────────────┐
    │                    COMPOSITION DAG EXAMPLE                     │
    │                                                                │
    │  ┌──────────────┐                                              │
    │  │ network/     │                                              │
    │  │ port-scan    │                                              │
    │  └──────┬───────┘                                              │
    │         │ ports[]                                              │
    │         ▼                                                      │
    │  ┌──────────────┐     ┌───────────────┐                        │
    │  │ web/http-    │────▶│ web/apache-   │                        │
    │  │ reachability │     │ path-traversal│                        │
    │  └──────┬───────┘     └───────┬───────┘                        │
    │         │ reachable?          │ findings[]                     │
    │         ▼                     │                                │
    │  ┌──────────────┐             │        ┌──────────────┐        │
    │  │ web/tls-     │             │        │ report/      │        │
    │  │ cipher-audit │             │        │ executive-   │        │
    │  └──────┬───────┘             │        │ summary      │        │
    │         │ findings[]          │        └──────▲───────┘        │
    │         │                     │               │                │
    │         └─────────────────────┴───────────────┘                │
    │                    merged findings[]                           │
    └────────────────────────────────────────────────────────────────┘
The DAG is the module. It is what runs. The core executes it node by node, respecting dependencies, concurrency limits, and safety policy. Each node's output feeds into a typed data bus; downstream nodes consume typed inputs. Type mismatches are caught at composition time, not at runtime.
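The topological sort over the example DAG can be sketched with Python's standard graphlib module. The function name execution_plan and the grouping of nodes into "waves" that may run in parallel are illustrative choices, not the core's actual plan format.

```python
from graphlib import CycleError, TopologicalSorter


def execution_plan(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; every node in a wave has all dependencies satisfied."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The example DAG above: each key maps to the set of nodes it depends on.
EXAMPLE_DAG = {
    "web/http-reachability": {"network/port-scan"},
    "web/apache-path-traversal": {"web/http-reachability"},
    "web/tls-cipher-audit": {"web/http-reachability"},
    "report/executive-summary": {"web/apache-path-traversal", "web/tls-cipher-audit"},
}
```

For EXAMPLE_DAG this yields four waves: the port scan, then the reachability check, then the two web skills together, then the report. A cyclic depends_on declaration fails in prepare(), which corresponds to rejecting the composition before anything runs.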
6.4 Execution
The core's execution engine runs the DAG:
Schedule: Nodes with no unmet dependencies are scheduled. The core respects each skill's runtime.concurrency limit — if a skill says concurrency: 4, no more than 4 instances of that skill run simultaneously across all active modules.
Invoke: For each node, the core materializes the skill's command with the target and policy context, sends it to the executor (container runtime, process spawner, HTTP client, or inline Python sandbox), and streams output back.
Parse: Raw output is fed to the skill's parser, producing normalized Finding objects.
Match: Patterns are applied to findings and raw output to classify, enrich, and deduplicate.
Publish: Findings are published to the data bus, to the operator console (streamed), and to durable storage (batched).
Report: When the DAG completes (or when an operator requests an interim report), the report templates from each skill are rendered with the accumulated findings, producing a unified report.
6.5 Dynamic Re-Composition
The DAG is not frozen at composition time. The engine supports dynamic re-composition: if a skill's output reveals a new sub-target (e.g., the HTTP probe discovers an API endpoint that wasn't in the initial target spec), the engine can request a sub-composition for that sub-target, pulling in additional skills mid-run. This is how the platform handles the reality that security engagement is inherently exploratory — you don't know what you'll find until you start looking.
Dynamic re-composition is bounded: the engine has a configurable depth limit (default 3) and a total-skill-count limit (default 50 skills per module) to prevent runaway expansion. Every dynamic re-composition is logged and visible to the operator in real time.
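The two bounds can be illustrated with a breadth-first expansion that stops at a depth limit and a total skill count. The callbacks discover (sub-targets revealed by a target) and select (skills chosen for a target) are hypothetical stand-ins for the engine's internals, and treating the initial target as depth 0 is an assumption.

```python
from collections import deque


def expand(root, discover, select, max_depth=3, max_skills=50):
    """Return (plan, log): plan is a list of (target, skill) pairs, log records events."""
    plan, log = [], []
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        target, depth = queue.popleft()
        for skill in select(target):
            if len(plan) >= max_skills:
                log.append(("skill-limit-reached", target))
                return plan, log
            plan.append((target, skill))
        if depth >= max_depth:
            continue  # compose this target, but do not expand it further
        for sub_target in discover(target):
            if sub_target not in seen:
                seen.add(sub_target)
                log.append(("recompose", sub_target))
                queue.append((sub_target, depth + 1))
    return plan, log
```

The log list stands in for the real-time operator visibility the text requires: every sub-composition and every limit hit leaves an entry.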
7. Versioning and Compatibility
The Skill-Based Platform has three independent version axes that must never be conflated:
7.1 Core Version
The core binary has a semver version. It changes rarely — the design goal is "never," but security patches and executor upgrades will necessitate occasional releases. The core version defines:
Which apiVersion of the skill schema it understands (currently skill.platform/v1).
Which executors are available (container, process, http, inline).
Which safety policies are enforced.
A core version bump is a platform-wide event. It is tested against the entire skill library in CI before release. The core maintains backward compatibility with at least one prior major schema version, loading older skills in a compatibility shim.
7.2 Skill Version
Each skill has its own semver version, independent of the core. A skill author bumps the skill version when they change any component — a new CVE reference, a tuned pattern, a parser bugfix. The min_core_version field in the manifest declares the oldest core that can load this skill. This is the only coupling between skill and core, and it is one-directional: a skill declares what it needs; the core never declares what skills it needs.
Skill versioning follows semver with a security-specific convention:
Patch (1.2.0 → 1.2.1): Bugfix in parser, pattern tuning, knowledge doc update. No new command, no new finding class. Safe to auto-upgrade.
Minor (1.2.1 → 1.3.0): New sub-check, new pattern, expanded target support. No breaking change to outputs. Safe to auto-upgrade within an engagement.
Major (1.3.0 → 2.0.0): Breaking change — renamed findings, changed output schema, removed command, changed safety impact. Requires operator acknowledgment before upgrade.
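The upgrade convention and the min_core_version rule can be expressed as a few comparisons on version tuples. The sketch below assumes plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes; the function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple (no pre-release support)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return 'patch', 'minor', or 'major' for an upgrade from old to new."""
    old_v, new_v = parse_semver(old), parse_semver(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def can_auto_upgrade(old: str, new: str) -> bool:
    """Patch and minor bumps auto-upgrade; major bumps need operator acknowledgment."""
    return classify_bump(old, new) != "major"


def core_can_load(min_core_version: str, running_core_version: str) -> bool:
    """The skill's declared minimum must not exceed the running core version."""
    return parse_semver(min_core_version) <= parse_semver(running_core_version)
```

Comparing tuples rather than strings matters: as strings, "1.10.0" sorts before "1.9.0", while as tuples (1, 10, 0) is correctly the newer version.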
7.3 Library Version
The Skill Library as a whole has a manifest version — a signed list of all skill IDs and versions it currently contains. This is used for reproducibility: an engagement report includes the library manifest hash, so the exact skill set that produced the report can be reconstructed. This is critical for audit and for comparing engagement results over time.
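A reproducible manifest hash needs a canonical encoding, so that the same skill set always hashes to the same value. The sketch below is one possible scheme, assuming the manifest is just the list of fully qualified skill IDs, canonicalized as sorted, deduplicated JSON and hashed with SHA-256. The function name is hypothetical, and signing is out of scope here.

```python
import hashlib
import json


def library_manifest_hash(skill_ids) -> str:
    """SHA-256 over a canonical JSON encoding of the sorted, deduplicated skill IDs."""
    canonical = json.dumps(sorted(set(skill_ids)), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting before hashing makes the result independent of load order, so two platform instances with the same skills report the same hash in their engagement reports.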
7.4 Compatibility Matrix
| Change | Core bump? | Skill bump? | Library bump? | Running modules affected? |
| --- | --- | --- | --- | --- |
| New skill added | No | N/A | Yes | No |
| Skill patch upgrade | No | Patch | Yes | No (pinned at composition) |
| Skill minor upgrade | No | Minor | Yes | No (pinned at composition) |
| Skill major upgrade | No | Major | Yes | No (pinned; operator must roll) |
| New executor in core | Minor | No | No | No |
| Schema version bump | Major | All skills update min_core_version | Yes | No (compat shim) |
| Safety policy change | Patch | No | No | New modules only |
The key property: no change in any version axis ever breaks a running module. Modules pin their skill versions and core version at composition time. The only way to affect a running module is to explicitly cancel it.
8. The Skill Marketplace
The Skill Library is not just a filesystem — it is a marketplace: a versioned, signed, searchable registry of skills that can be published, discovered, subscribed to, and audited. The marketplace model is what makes "adding new tech = 1–2 hours" scale beyond a single team to an entire ecosystem.
8.1 Roles
Authors: Write skills. Can be internal (platform team, red team, a customer's security team) or external (community, vendors). Authors have identities (OIDC-backed) and signing keys.
Consumers: Platform instances that load skills from the marketplace. A consumer is configured with one or more marketplace subscriptions.
Curators: Marketplace operators who review skills before they are promoted to stable. Curators do not gate publishing to dev or beta, but they gate promotion to stable and verified.
Auditors: Read-only access to the full marketplace history for compliance. Every publish, upgrade, takedown, and download is logged.
8.2 Publication Channels
Every skill exists in one of four channels, which map to its maturity and trust level:
| Channel | Meaning | Auto-load? | Signed? | Curated? |
| --- | --- | --- | --- | --- |
| dev | Author's working version | No (dev only) | Optional | No |
| beta | Publicly available, seeking feedback | Opt-in | Required | No |
| stable | Reviewed, production-ready | Yes (default) | Required | Yes |
| verified | Curator-audited, safety-reviewed, SLA-backed | Yes (recommended) | Required + attestation | Yes + safety audit |
A platform instance's configuration declares which channels it will auto-load from. A production deployment typically auto-loads stable and verified only; a research deployment might add beta.
8.3 Publication Workflow
    Author writes skill → local validation (CLI: skill validate)
            ↓
    Author signs package (cosign) → publishes to dev channel
            ↓
    Author requests promotion to beta → marketplace runs automated checks
            ↓
    Curator reviews → promotes to stable (or rejects with feedback)
            ↓
    Safety team audits → promotes to verified (adds safety attestation)
Automated checks on promotion include: schema validation, parser sandbox validation, command safety classification, dependency resolution test, and a dry-run composition against a synthetic target to verify the skill produces valid findings without errors.
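The promotion path can be read as a small state machine. The following is a minimal sketch under stated assumptions: the names `CHANNEL_ORDER`, `REQUIRED_GATES`, and `can_promote`, the gate labels, and the rule that a skill moves up one channel at a time are all illustrative and not part of the platform's specified API.

```python
# Hypothetical sketch of channel promotion gating; all names are illustrative.
CHANNEL_ORDER = ["dev", "beta", "stable", "verified"]

# Gates assumed to be required before a skill may enter each channel,
# mirroring the workflow in 8.3 (automated checks, curator review, safety audit).
REQUIRED_GATES = {
    "beta": {"automated_checks"},
    "stable": {"automated_checks", "curator_review"},
    "verified": {"automated_checks", "curator_review", "safety_audit"},
}


def can_promote(current: str, target: str, passed_gates: set[str]) -> bool:
    """Allow promotion only to the next channel, and only if every gate passed."""
    cur = CHANNEL_ORDER.index(current)
    tgt = CHANNEL_ORDER.index(target)
    if tgt != cur + 1:
        return False  # assumption: no skipping channels, no demotion via this path
    return REQUIRED_GATES[target] <= passed_gates


print(can_promote("dev", "beta", {"automated_checks"}))     # allowed
print(can_promote("beta", "stable", {"automated_checks"}))  # curator review missing
```

Keeping the gate sets as data rather than branching logic means a marketplace operator could tighten a channel's requirements without touching the promotion code.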
8.4 Discovery
The marketplace exposes a search API and a web UI. Skills are discoverable by:
Full-text search over meta.name, meta.description, meta.tags.
Category browse.
Target type browse ("what skills can scan a Kubernetes cluster?").
CVE/CWE lookup ("which skills detect CWE-22?").
Dependency graph browse ("what skills depend on web/http-reachability?").
Each skill's marketplace page shows: description, version history, channel, author identity, signing key, safety declarations, dependency graph, sample findings, and download/load counts.
8.5 Subscriptions and Sync
A platform instance subscribes to marketplace feeds. The core's library sync daemon polls (or receives webhooks for) new and updated skills matching the subscription filter. A subscription filter might be: "all verified skills in web and cloud categories, plus stable skills tagged apache or kubernetes." New skills matching the filter are fetched, signature-verified, and hot-loaded per Section 5.3. This is how a deployed platform stays current without anyone touching it.
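As an illustration, the example filter quoted above could be expressed as a list of rules, any one of which admits a skill. This is a sketch only: `SkillMeta`, `Rule`, and `wanted` are hypothetical names, and the real subscription format is not specified in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillMeta:
    """Hypothetical subset of a skill's marketplace metadata."""
    skill_id: str
    channel: str
    category: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Rule:
    """One clause of a subscription filter. Empty sets mean 'no constraint'."""
    channel: str
    categories: frozenset = frozenset()
    tags: frozenset = frozenset()

    def matches(self, skill: SkillMeta) -> bool:
        if skill.channel != self.channel:
            return False
        if self.categories and skill.category not in self.categories:
            return False
        if self.tags and not (self.tags & skill.tags):
            return False
        return True


# "All verified skills in web and cloud categories,
#  plus stable skills tagged apache or kubernetes."
SUBSCRIPTION = [
    Rule(channel="verified", categories=frozenset({"web", "cloud"})),
    Rule(channel="stable", tags=frozenset({"apache", "kubernetes"})),
]


def wanted(skill: SkillMeta, rules=SUBSCRIPTION) -> bool:
    """A skill is fetched if any rule of the subscription matches it."""
    return any(rule.matches(skill) for rule in rules)
```

The sync daemon would apply a predicate like `wanted` to each announced skill before the Fetch step, so non-matching skills are never downloaded.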
8.6 Private Skills
Not every skill should be public. A customer's internal security team may write skills for proprietary protocols or custom APIs. The marketplace supports private skills: skill packages published to a private registry that requires authentication. The core's library sync daemon handles private registries with the same pipeline as public ones, adding an auth step to Fetch. Private skills never appear in public search and never propagate to other consumers.
9. Safety Rules and Sandboxing
The Skill-Based Platform runs code from many authors against real targets. This is inherently dangerous. The safety model is defense-in-depth: every layer assumes the layer above it has been compromised.
9.1 Impact Classification
Every skill declares an impact level. The core enforces this as a hard gate — a skill cannot run if its impact exceeds the engagement mode's allowance.
| Impact | Definition | Minimum Engagement Mode |
| --- | --- | --- |
| passive | No packets sent to target. Analysis of pre-existing data (Shodan, certificates, public datasets). | safe |
| non-destructive | Sends requests to target but does not attempt to modify state. Scanning, fingerprinting, vuln detection. | safe |
| active | Sends crafted requests that may trigger observable behavior. Exploit verification, brute force, fuzzing. | standard |
| destructive | May modify or damage target state. Exploit payloads, DoS, configuration changes. | aggressive + confirmation |
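The gate in this table reduces to a rank comparison. The sketch below assumes the three engagement modes are strictly ordered (safe < standard < aggressive); the names `MIN_MODE`, `MODE_RANK`, and `impact_gate` are illustrative.

```python
# Minimum engagement mode per impact level, taken from the table in 9.1.
MIN_MODE = {
    "passive": "safe",
    "non-destructive": "safe",
    "active": "standard",
    "destructive": "aggressive",
}

# Assumed strict ordering of engagement modes.
MODE_RANK = {"safe": 0, "standard": 1, "aggressive": 2}


def impact_gate(impact: str, mode: str) -> str:
    """Return 'deny', 'confirm', or 'allow' for a skill's declared impact."""
    if MODE_RANK[mode] < MODE_RANK[MIN_MODE[impact]]:
        return "deny"
    # Destructive skills always pass through the confirmation gate.
    return "confirm" if impact == "destructive" else "allow"


print(impact_gate("active", "safe"))             # deny
print(impact_gate("active", "standard"))         # allow
print(impact_gate("destructive", "aggressive"))  # confirm
```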
9.2 Confirmation Gate
Skills that declare requires_confirmation: true (or that are classified as destructive) are held at a confirmation gate. The core pauses the module, surfaces the skill, its impact, its target, and its commands to the operator, and waits for explicit approval. If the operator does not approve within the module's timeout, the skill is skipped. This is non-bypassable: the confirmation gate is in the core, not in the skill.
9.3 Command Sandboxing
Commands run in the executor the skill declares. For the container executor (the most common), the core enforces:
Read-only root filesystem. The skill's container cannot write to its own filesystem except to a tmpfs /tmp. This prevents a compromised skill from persisting payloads.
No host mounts. The container has no access to the host filesystem, host network, host PID space, or host IPC.
Network policy. The container's network is governed by the skill's runtime.container.network declaration:
- none: No network. For offline parsers and static analysis.
- egress-only: Outbound to the target only. The core injects iptables/eBPF rules that restrict egress to the target's IP and port. No inbound. No lateral movement.
- full: Unrestricted network. Requires impact: destructive and operator confirmation. Logged at packet level.
Resource limits. CPU, memory, and PID limits are enforced via cgroups. A skill that OOMs is killed and its failure is recorded; it does not crash the core.
Timeout. Every command has a timeout. A command that exceeds it is killed (SIGKILL after SIGTERM grace period). No skill can hang the platform.
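The timeout rule (SIGTERM, a grace period, then SIGKILL) can be sketched with the standard library alone. This is a process-level illustration, not the container executor itself; `run_with_timeout` and its parameters are hypothetical names.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s, grace_s=2.0):
    """Run argv; on timeout send SIGTERM, then SIGKILL after a grace period.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            proc.wait()
        return proc.returncode, True


# A command that would otherwise hang for 30 seconds is cut off after 0.5 s.
code, timed_out = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    timeout_s=0.5,
    grace_s=0.5,
)
print(timed_out)
```

The caller records the failure and moves on; nothing in the timed-out command can extend its own deadline, which is what "no skill can hang the platform" requires.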
9.4 Parser Sandboxing
Python parsers run in a restricted execution environment:
No filesystem access except the skill's own package directory (read-only).
No network access.
No subprocess. os.system, subprocess.*, and os.exec* are blocked.
No eval/exec/compile of dynamic code.
No import of unsafe modules. An allowlist permits: json, re, datetime, collections, itertools, platform.sdk. Everything else is denied.
Memory and CPU limits. Parsers run in a worker pool with per-task resource limits. A parser that loops is killed.
Structured output only. A parser must return a list of Finding objects (validated by the core) or raise a ParseError. It cannot return arbitrary objects.
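Two of the rules above (the import allowlist and the ban on eval/exec/compile) lend themselves to a static pre-check at load time. The sketch below uses Python's `ast` module and is an assumption about how such a check might look; `check_parser_source` is a hypothetical name, module names are matched exactly, and a static scan of this kind complements the runtime sandbox rather than replacing it.

```python
import ast

# Allowlist from 9.4; names are matched exactly (an assumption of this sketch).
ALLOWED_MODULES = {"json", "re", "datetime", "collections", "itertools", "platform.sdk"}
BANNED_CALLS = {"eval", "exec", "compile"}


def check_parser_source(source: str) -> list[str]:
    """Return a list of violations found in a parser's source; empty means clean."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            imported = [node.module or ""]  # relative imports have no module name
        else:
            imported = []
        for name in imported:
            if name not in ALLOWED_MODULES:
                violations.append(f"line {node.lineno}: import of '{name}' is not allowlisted")
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            violations.append(f"line {node.lineno}: call to '{node.func.id}' is blocked")
    return violations


print(check_parser_source("import os\nresult = eval('1 + 1')\n"))
```

Because the source is only parsed and never executed, the check itself is safe to run on untrusted packages.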
9.5 Supply Chain Safety
Every skill package is signed (Sigstore/cosign). The core verifies the signature at Fetch time and checks it against the marketplace's trust store. A skill signed by an unknown key is rejected in production. The marketplace records a transparency log (Rekor-style) of every publish, so the history of a skill is publicly auditable. The core pins the marketplace's signing root in its configuration, which can only be changed by a core restart with a new config file — not by a skill, not by an API call, not by an operator command.
9.6 Safety Policy as Code
The platform's safety policy is itself a declarative document, versioned and auditable:
# safety-policy.yaml
engagement_modes:
  safe:
    allow_impact: [passive, non-destructive]
    require_confirmation_for: []
    max_concurrency: 8
  standard:
    allow_impact: [passive, non-destructive, active]
    require_confirmation_for: [destructive]
    max_concurrency: 16
  aggressive:
    allow_impact: [passive, non-destructive, active, destructive]
    require_confirmation_for: [destructive]
    max_concurrency: 32

denylist:
  skills: []        # Explicit skill IDs to block
  categories: []    # Category-level blocks
  authors: []       # Author-level blocks (trust revocation)

allowlist:
  channels: [stable, verified]          # Only load from these channels
  private_registries: [corp-internal]   # Additional private registries

data_handling:
  redact_secrets_in_reports: true
  redact_pii_in_reports: true
  encrypt_raw_output_at_rest: true
  raw_output_retention_days: 30
The safety policy is loaded at core startup and can be hot-reloaded from disk (the core watches the file). It cannot be modified by a skill. It is the outermost ring of the defense-in-depth model.
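To make the denylist and allowlist sections concrete, here is one way a load decision could be evaluated against the policy. It is a sketch under assumptions: the policy is shown as an already-parsed dictionary, the denylist entries are invented examples, the skill fields (`id`, `category`, `author`, `channel`, `registry`) are hypothetical, and the precedence (denylist first, then registry, then channel) is this sketch's choice rather than specified behavior.

```python
# Parsed form of the denylist/allowlist sections of safety-policy.yaml.
# The denylist entries here are invented examples for illustration.
POLICY = {
    "denylist": {
        "skills": ["web/example-blocked-skill"],
        "categories": [],
        "authors": ["example-revoked-author"],
    },
    "allowlist": {
        "channels": ["stable", "verified"],
        "private_registries": ["corp-internal"],
    },
}


def may_load(skill: dict, policy: dict = POLICY) -> tuple[bool, str]:
    """Decide whether a skill may be loaded; returns (allowed, reason)."""
    deny, allow = policy["denylist"], policy["allowlist"]
    # Assumption: any denylist hit wins over every allowlist entry.
    for list_name, field in (("skills", "id"), ("categories", "category"), ("authors", "author")):
        if skill[field] in deny[list_name]:
            return False, f"{field} '{skill[field]}' is denylisted"
    # Assumption: registry None means the public marketplace.
    registry = skill.get("registry")
    if registry is not None and registry not in allow["private_registries"]:
        return False, f"registry '{registry}' is not allowlisted"
    if skill["channel"] not in allow["channels"]:
        return False, f"channel '{skill['channel']}' is not allowlisted"
    return True, "ok"


print(may_load({"id": "web/a", "category": "web", "author": "alice", "channel": "beta"}))
```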
10. Comparison: Legion Plugins vs. CAI Modular Agents vs. Skill-Based Platform
The Skill-Based Platform is not the first system to pursue extensibility without core changes. Two prior art systems are worth comparing in detail: Legion's plugin framework and CAI's modular agents. Both influenced this design; both have properties the Skill-Based Platform deliberately improves upon.
10.1 Legion Plugins
Legion (a security automation framework) uses a plugin system where each plugin is a Python module that registers itself with the framework at import time. Plugins define tasks, hooks, and UI components. The framework's core loads plugins from a plugin directory, calls their register() function, and integrates their capabilities.
Strengths:
Simple mental model: write a Python file, drop it in a directory, it works.
Plugins have full access to the framework's internal APIs, enabling deep integration.
Hot-reloadable in some configurations.
Weaknesses that the Skill-Based Platform addresses:
Plugins are code, not declarations. A Legion plugin is a Python module that runs arbitrary code at import time. This means a plugin can do anything — modify the framework's internals, monkey-patch other plugins, access the filesystem, make network calls — during registration. The Skill-Based Platform's skills are declarative manifests with sandboxed components; the core never executes skill code at load time.
No schema contract. Legion plugins conform to a convention (duck typing), not a schema. If a plugin doesn't implement an expected method, the framework fails at runtime, often with opaque errors. The Skill-Based Platform validates every skill against a JSON Schema at load time; a non-conforming skill is rejected with a structured error before it can affect the system.
Tight coupling. Legion plugins often import each other directly, creating a tangled dependency web. The Skill-Based Platform's depends_on is declarative and resolved by the core, not by the skills themselves. Skills never import each other; they declare what they need and the core wires it.
Safety is opt-in. A Legion plugin can run arbitrary code with no safety declaration. The framework trusts the plugin author. The Skill-Based Platform requires every skill to declare its impact, runs commands in sandboxed executors, and validates parser code against an allowlist. Safety is not optional; it is structural.
Versioning is implicit. Legion plugins have versions, but the framework doesn't manage compatibility. Two plugins that depend on different versions of a shared helper can conflict silently. The Skill-Based Platform's semver + min_core_version + library manifest hash makes every dependency explicit and every composition reproducible.
No marketplace. Legion plugins are distributed as Git repos or Python packages. There is no curated, signed, channel-tiered registry. The Skill-Based Platform's marketplace provides discovery, trust tiers, supply-chain signing, and audit logging.
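Core-resolved depends_on amounts to a topological sort over declared dependencies. The sketch below uses the standard library's `graphlib`; `resolve_order` is a hypothetical name and the manifests are reduced to a mapping from skill ID to its depends_on list.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_order(manifests: dict[str, list[str]]) -> list[str]:
    """Return skill IDs ordered so that every dependency precedes its dependents.

    Raises LookupError for a dependency that is not in the library and
    graphlib.CycleError for circular dependencies.
    """
    declared = {dep for deps in manifests.values() for dep in deps}
    missing = declared - set(manifests)
    if missing:
        raise LookupError(f"unsatisfied depends_on: {sorted(missing)}")
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(manifests).static_order())


order = resolve_order({
    "web/apache-path-traversal": ["web/http-reachability"],
    "web/http-reachability": [],
})
print(order)  # ['web/http-reachability', 'web/apache-path-traversal']
```

Since the skills only declare IDs, a missing or circular dependency is detected before any skill code runs, which is the contrast with import-time coupling drawn above.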
10.2 CAI Modular Agents
CAI (a Cybersecurity AI framework) uses a modular agent architecture where different agent types (recon agent, exploit agent, report agent) are composed into a workflow. Each agent is a Python class that implements a specific interface (run(), observe(), act()). Agents communicate via a shared state object. An orchestrator selects and sequences agents based on the task.
Strengths:
Agent abstraction maps well to security workflows (recon → exploit → report).
Agents can be swapped or added without changing the orchestrator.
The shared state object enables rich inter-agent communication.
Weaknesses that the Skill-Based Platform addresses:
Agents are behavioral abstractions, not capability units. A CAI agent bundles behavior (what to do) with implementation (how to do it) in a single Python class. The Skill-Based Platform separates these: a skill declares what it knows and what it does, but the how (execution, sandboxing, scheduling) is the core's job. This separation means the same skill can be run by different executors (container, process, HTTP) without changing the skill.
Agent communication is untyped. CAI agents share a state object that is typically a dict or a loosely-typed context. Type errors surface at runtime. The Skill-Based Platform's composition DAG has typed data-flow edges; type mismatches are caught at composition time.
No declarative knowledge. CAI agents embed their domain knowledge in Python code. The Skill-Based Platform's skills have a knowledge component that is Markdown + structured YAML, readable by both humans and LLM agents. This makes skills useful even when they're not executing — an operator or an AI agent can query the knowledge base directly.
Safety is per-agent, not systemic. A CAI agent implements its own safety checks (if it chooses to). There is no systemic enforcement. The Skill-Based Platform's safety policy is a core-level, non-bypassable gate that applies to every skill uniformly.
Versioning and reproducibility. CAI workflows are reproducible only if you pin every agent's version and the shared state schema. In practice, this is rarely done. The Skill-Based Platform's library manifest hash makes every composition cryptographically reproducible.
LLM integration. CAI agents are often LLM-driven, meaning their behavior is non-deterministic. The Skill-Based Platform supports LLM-driven skills (an ai/ skill can use an LLM for analysis), but the skill itself is deterministic in its declaration — the LLM is a tool the skill uses, not the skill's definition. This keeps the composition graph predictable even when individual nodes use AI.
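The typed data-flow edges mentioned above can be illustrated with a small composition-time check. This is a sketch with invented details: `SkillIO`, `check_edges`, the port names, and the type names (`Host`, `HttpEndpoint`, `FindingList`) are hypothetical, and type compatibility is reduced to exact name equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillIO:
    """Hypothetical declaration of a skill's typed ports."""
    inputs: dict   # input port name -> type name
    outputs: dict  # output port name -> type name


def check_edges(skills: dict, edges: list) -> list[str]:
    """Validate edges of the form (producer, out_port, consumer, in_port)."""
    errors = []
    for src, out_port, dst, in_port in edges:
        produced = skills[src].outputs.get(out_port)
        expected = skills[dst].inputs.get(in_port)
        if produced is None or expected is None:
            errors.append(f"{src}.{out_port} -> {dst}.{in_port}: unknown port")
        elif produced != expected:
            errors.append(f"{src}.{out_port} -> {dst}.{in_port}: {produced} is not {expected}")
    return errors


SKILLS = {
    "web/http-reachability": SkillIO(
        inputs={"target": "Host"}, outputs={"endpoint": "HttpEndpoint"}
    ),
    "web/apache-path-traversal": SkillIO(
        inputs={"endpoint": "HttpEndpoint"}, outputs={"findings": "FindingList"}
    ),
}

good = [("web/http-reachability", "endpoint", "web/apache-path-traversal", "endpoint")]
print(check_edges(SKILLS, good))  # []
```

An empty error list is the precondition for starting the module; any mismatch is reported before a single command is executed.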
10.3 Summary Comparison
| Property | Legion Plugins | CAI Modular Agents | Skill-Based Platform |
| --- | --- | --- | --- |
| Extension unit | Python module | Agent class | Declarative skill package |
| Schema contract | Convention (duck typing) | Interface (abstract class) | JSON Schema (validated) |
| Load-time safety | None (code runs at import) | None (code runs at init) | Full (declarative validation + sandbox) |
| Runtime safety | Opt-in per plugin | Opt-in per agent | Systemic, core-enforced |
| Dependency model | Python imports | Shared state object | Declarative depends_on, core-resolved |
| Versioning | Implicit | Implicit | Semver + library manifest hash |
| Reproducibility | Low | Low | Cryptographic |
| Knowledge representation | Embedded in code | Embedded in code | Separate knowledge component (Markdown + YAML) |
| Report generation | Per-plugin, ad hoc | Per-agent, ad hoc | Declarative report template per skill |
| Marketplace | None | None | Signed, channel-tiered, auditable |
| Zero-downtime upgrade | Sometimes (hot-reload) | No | Yes (versioned registry + pinning) |
| Core changes for new tech | Sometimes (new hooks) | Sometimes (new orchestrator logic) | Never |
The fundamental difference is one of philosophy. Legion and CAI both extend a framework by adding code that runs inside it. The Skill-Based Platform extends a core by adding declarations that the core interprets. The core's code never grows; only its data does. This is the property that makes "the core never changes" literally true, not aspirational.
11. Operational Scenarios
11.1 Adding a New Technology (The 1–2 Hour Promise)
A customer reports they've deployed a new IoT protocol gateway using Thread/Matter. The platform has no Thread skill. The timeline:
Minute 0–20: Security engineer reads the Matter specification's threat model. Identifies three checks: (a) commissioning window exposure, (b) default vendor key reuse, (c) BLE pairing replay. Drafts knowledge/overview.md with the CVE references and protocol background.
Minute 20–50: Engineer writes commands/matter-recon.yaml, a containerized Python script that sends Matter commissioning queries to the target. Writes patterns/matter-findings.yaml with three matchers, one per check. Each matcher maps to a normalized finding class.
Minute 50–70: Engineer writes parser/parse.py (30 lines — parse the command's JSON output into Finding objects). Writes report/template.j2 (a Jinja2 template that renders the three finding types).
Minute 70–80: Engineer writes skill.yaml, runs skill validate locally. Fixes the two validation errors (a missing supported_targets entry and an under-declared impact that the safety classifier caught). Re-validates. Passes.
Minute 80–90: Engineer signs the package and publishes to dev. Runs a dry-run composition against a test target. Findings render correctly.
Minute 90–100: Engineer requests promotion to beta. Automated checks pass. Engineer opens a curator review ticket.
Curator reviews (out of band, ~1 day for stable, ~1 week for verified). The skill is usable in beta immediately by any platform opted into the beta channel.
Total active engineering time: ~100 minutes. Zero core changes. Zero downtime. Zero regression risk. The skill is available to every platform instance subscribed to the beta channel within the sync daemon's poll interval (default 60 seconds).
11.2 Upgrading a Skill Mid-Engagement
An engagement is running web/apache-path-traversal@1.2.0. The skill author publishes 1.2.1 (a pattern tuning fix that reduces false positives). The library sync daemon fetches 1.2.1 and hot-loads it into the registry. The running module is unaffected — it pinned 1.2.0 at composition time. The next module request that doesn't pin a version gets 1.2.1. An operator who wants the running module to benefit from the fix can either restart the module (it will compose with 1.2.1) or issue a skill.roll command, which drains the old version after the current scan pass completes and lets the next pass use the new version.
11.3 Emergency Takedown
A skill is found to have a safety issue — its parser, in an edge case, exfiltrates target data to an external endpoint (a violation of the sandbox rules that slipped through review). The curator issues a takedown command. The marketplace marks the skill as revoked, adds it to the global denylist, and propagates the revocation to all subscribed platform instances within seconds. The core's library sync daemon receives the revocation, unloads the skill immediately (killing any running instances of it with a logged reason), and prevents future loads. An audit log entry is created with the takedown reason, the curator identity, and the timestamp. The skill's author is notified and can publish a fixed version, which goes through the full review pipeline again.
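On the consumer side, handling a revocation comes down to four steps: denylist, unload, stop running instances, and audit. The sketch below models them with in-memory state only; `CoreState` and its fields are hypothetical, and "stopping" a running skill instance is reduced to removing it from a module's active set.

```python
from datetime import datetime, timezone


class CoreState:
    """Toy model of the core state touched by a takedown."""

    def __init__(self):
        self.loaded = set()    # skill IDs currently in the registry
        self.running = {}      # module ID -> set of skill IDs in use
        self.denylist = set()  # revoked skill IDs
        self.audit_log = []

    def load(self, skill_id: str) -> None:
        if skill_id in self.denylist:
            raise PermissionError(f"{skill_id} is revoked")
        self.loaded.add(skill_id)

    def revoke(self, skill_id: str, reason: str, curator: str) -> list:
        """Apply a revocation and return the modules that were using the skill."""
        self.denylist.add(skill_id)    # prevent future loads
        self.loaded.discard(skill_id)  # unload immediately
        affected = [m for m, skills in self.running.items() if skill_id in skills]
        for module_id in affected:
            self.running[module_id].discard(skill_id)  # stop the running instance
        self.audit_log.append({
            "event": "takedown",
            "skill": skill_id,
            "reason": reason,
            "curator": curator,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "affected_modules": affected,
        })
        return affected
```

Adding the skill to the denylist before unloading it closes the window in which a concurrent load request could bring the revoked package back.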
11.4 Air-Gapped Deployment
A customer runs the platform in an air-gapped environment. They mirror the marketplace to an internal registry on a weekly cadence (a signed bundle of all stable and verified skills). The core's library sync daemon points at the internal registry. All signing keys are pre-distributed. The core operates identically — it never needs internet access. Private skills are published directly to the internal registry. The only difference from a cloud-connected deployment is the sync cadence.
12. Failure Modes and Mitigations
| Failure | Impact | Mitigation |
| --- | --- | --- |
| Skill package corrupted in transit | Skill won't load | Signature verification fails; skill rejected; operator notified; fallback to previous version |
| Parser loops infinitely | CPU spike in parser worker | Worker killed after CPU limit; skill marked as error; module continues with other skills |
| Command container OOMs | Skill fails | Container killed by cgroup limit; skill marked as error; module continues |
| Skill declares wrong impact | Safety violation | Safety classifier detects mismatch at load time; skill rejected or auto-escalated; never runs under-declared |
| Marketplace is unreachable | No new skills loaded | Core continues with already-loaded skills; sync daemon retries with backoff; operator notified |
| Skill has unsatisfied depends_on | Skill can't compose | Loader flags it; composition engine skips it; operator sees "skill X skipped: missing dependency Y" |
| Two skills produce conflicting findings | Duplicate or contradictory findings | Core runs a deduplication pass post-DAG; findings with same class + target + cve are merged, highest severity wins |
| LLM-driven skill hallucinates a finding | False positive | ai/ skills must declare confidence: low by default; findings below a confidence threshold are quarantined for operator review |
| Malicious skill attempts sandbox escape | Potential compromise | Defense-in-depth: container isolation (no host access), parser sandbox (no FS/network/subprocess), signature verification (no unsigned code), safety policy (core-level gate). Escape attempts are logged and the skill is auto-revoked. |
| Core bug | Platform-wide | Core is tiny (<5k LOC), heavily fuzzed, and has a 100% branch coverage requirement. Bugs are rare and fixable without touching skills. |
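The deduplication rule in the table (same class + target + cve, highest severity wins) can be sketched in a few lines. The severity scale, the dictionary shape of a finding, and the CVE identifier below are placeholders chosen for illustration; the sketch keeps the highest-severity finding per key and does not attempt to merge evidence.

```python
# Assumed severity scale; the document does not enumerate the levels.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def dedupe(findings: list) -> list:
    """Collapse findings sharing (class, target, cve), keeping the most severe."""
    merged = {}
    for finding in findings:
        key = (finding["class"], finding["target"], finding.get("cve"))
        kept = merged.get(key)
        if kept is None or SEVERITY_RANK[finding["severity"]] > SEVERITY_RANK[kept["severity"]]:
            merged[key] = finding
    return list(merged.values())


findings = [
    # "CVE-0000-0001" is a placeholder identifier, not a real CVE.
    {"class": "path-traversal", "target": "10.0.0.5:80", "cve": "CVE-0000-0001", "severity": "medium"},
    {"class": "path-traversal", "target": "10.0.0.5:80", "cve": "CVE-0000-0001", "severity": "high"},
    {"class": "path-traversal", "target": "10.0.0.6:80", "cve": "CVE-0000-0001", "severity": "low"},
]
for f in dedupe(findings):
    print(f["target"], f["severity"])
```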
13. Roadmap
13.1 Phase 1 — Core MVP (Weeks 1–6)
Skill loader with skill.platform/v1 schema validation.
Skill registry with hot-load/unload.
Container and process executors.
Composition DAG with static skill selection (no fingerprinting yet).
Basic safety policy (impact gate, confirmation gate, container sandbox).
10 reference skills across Web and Network categories.
13.2 Phase 2 — Composition Intelligence (Weeks 7–12)
Target fingerprinting (lightweight recon phase).
Dynamic re-composition with depth/count limits.
Typed data-flow edges with composition-time validation.
Skill scoring and relevance filtering.
30 additional skills across all six categories.
13.3 Phase 3 — Marketplace (Weeks 13–20)
Marketplace registry with dev/beta/stable/verified channels.
Sigstore signing and transparency log.
Publication workflow with automated checks.
Discovery API and web UI.
Library sync daemon with subscription filters.
13.4 Phase 4 — Ecosystem (Weeks 21+)
Private registries and air-gapped mirrors.
LLM agent integration (AI agents as skill consumers and as ai/ skill components).
Skill dependency graph visualization and impact analysis.
Community contribution program with curator onboarding.
Performance: parser worker pooling, pattern index optimization, composition cache.
Appendix: Glossary
| Term | Definition |
| --- | --- |
| Skill | The smallest unit of platform capability. A declarative package with knowledge, commands, patterns, parser, and report template. |
| Skill Library | The collection of all available skills, organized by category and version. |
| Skill Registry | The in-process, in-memory index of loaded skills. The core's source of truth for what is available. |
| Core | The never-changing platform binary. Loads skills, composes modules, enforces safety, streams results. |
| Module | A running instance of a composition DAG. The unit of execution. |
| Composition | The process of selecting skills and wiring them into a DAG for a given target and engagement. |
| Finding | The normalized output unit of any skill. Has severity, confidence, target, evidence, class, and CWE. |
| Engagement Mode | A safety tier (safe, standard, aggressive) that gates which impact levels are allowed. |
| Impact | A skill's declared potential to affect the target: passive, non-destructive, active, destructive. |
| Marketplace | The signed, versioned, channel-tiered registry where skills are published, discovered, and subscribed to. |
| Channel | A maturity/trust tier in the marketplace: dev, beta, stable, verified. |
| Library Manifest Hash | A cryptographic hash of the library's complete skill inventory at a point in time. Used for reproducibility. |
| DAG | Directed Acyclic Graph. The structure of a composed module. Nodes are skills; edges are dependencies and data flows. |
| Executor | The runtime that executes a skill's command: container, process, http, or inline. |
This document specifies the Skill-Based Platform architecture as of schema version skill.platform/v1. The design is stable; the skill library is not. That is the point.