From Checklists to Playbooks: Making Pentest Workflows Agent-Friendly

Wed Dec 10, 2025

A lot of teams have a good instinct for how to approach a web or network test:

  • Start with scope and rules of engagement,
  • Do some light recon,
  • Map the attack surface,
  • Then carefully validate anything that looks interesting.

But that knowledge often lives in people’s heads, scattered notes, and half‑remembered scripts. That works fine until you want to:

  • Onboard new testers quickly,
  • Bring in more automation, or
  • Let “agents” (LLM‑driven or otherwise) handle the boring parts safely.

I’ve been working on turning that instinct into something much more explicit, but still private: a small, structured “manual” that humans and tools can both understand.

This post talks about the shape of that manual, not the internals.


The problem: humans improvise, agents need a map

Humans are good at reading between the lines:

  • “This is production, so I probably shouldn’t hammer it.”
  • “This endpoint smells like business logic; be gentle and think before fuzzing.”
  • “We’re clearly off‑scope if we go here.”

Agents don’t have that gut feel.

If you want to safely delegate parts of an engagement, you need a way to encode intent:

  • What’s allowed here?
  • How “loud” am I allowed to be?
  • Which flow should I follow for this kind of target?
  • When must I stop and ask a person?
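
One way to encode those answers is as a small, structured task spec that travels with whatever an agent (or a new tester) is asked to do. Here is a minimal sketch in Python; the field names, enum values, and example data are all invented for illustration, not lifted from the actual manual:

    from dataclasses import dataclass, field
    from enum import Enum

    # Illustrative shape only; the real manual's format may differ.


    class Noise(Enum):
        """How 'loud' a piece of work is allowed to be."""
        PASSIVE = "passive"        # look, don't touch
        LOW = "low"                # single requests at browsing pace
        MODERATE = "moderate"      # light automated scanning
        AGGRESSIVE = "aggressive"  # fuzzing and brute force; rarely allowed


    @dataclass
    class TaskIntent:
        """Explicit answers to the four questions above, attached to every task."""
        in_scope: list[str]       # hosts or URL patterns that are fair game
        out_of_scope: list[str]   # anything matching these is a hard no
        max_noise: Noise          # upper bound on how loud the task may get
        workflow: str             # which named flow to follow for this target
        ask_first: list[str] = field(default_factory=list)  # steps that need a human


    # Example: a quiet recon task against a single staging host.
    intent = TaskIntent(
        in_scope=["staging.example.com"],
        out_of_scope=["*.payments.example.com"],
        max_noise=Noise.LOW,
        workflow="web-app-recon",
        ask_first=["credential guessing", "anything that writes data"],
    )

A human can skim that in seconds; an agent can be stopped the moment it tries to step outside it.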

That’s the gap the manual is trying to close.


Environments matter as much as targets

The manual also distinguishes where you are, not just what you’re targeting:

  • Production vs staging vs development vs lab.
  • External perimeter vs internal network.
  • Business‑critical vs low‑risk systems.

The same technique can be completely fine in a lab and totally unacceptable in production. Rather than leaving that to memory, each environment profile spells out:

  • Default mode,
  • Things that are never acceptable,
  • Things that require explicit sign‑off.
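
Concretely, an environment profile doesn’t need to be much more than a small record. Another illustrative sketch, with invented names and values:

    from dataclasses import dataclass

    # Illustrative only; the real profiles carry whatever fields the policy needs.


    @dataclass(frozen=True)
    class EnvironmentProfile:
        """What an environment allows by default, forbids outright, or gates on sign-off."""
        name: str
        default_noise: str                 # e.g. "passive" in production, "aggressive" in lab
        never_allowed: tuple[str, ...]     # hard prohibitions; no approval overrides these
        requires_signoff: tuple[str, ...]  # allowed only with explicit human approval


    PRODUCTION = EnvironmentProfile(
        name="production",
        default_noise="passive",
        never_allowed=("destructive tests", "high-volume fuzzing"),
        requires_signoff=("authenticated scanning", "any write operation"),
    )

    LAB = EnvironmentProfile(
        name="lab",
        default_noise="aggressive",
        never_allowed=(),
        requires_signoff=(),
    )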

This is the bridge between “we have a policy doc somewhere” and “the tooling actually behaves the way the policy expects”.


Guardrails and approval points

The riskiest parts of an engagement are usually obvious to a senior tester:

  • Large password or MFA code spaces,
  • High‑volume fuzzing,
  • Anything involving big data pulls,
  • Anything that writes or changes configuration.

The manual bakes those into a tiny list of “must ask first” moments.

When a human is driving, it’s just a reminder. When an agent is driving, it’s a hard stop:

“I’ve reached a potentially dangerous decision. Here’s what I’m thinking and why. Do you approve?”

That one pattern alone goes a long way towards making automated help feel like a safety net instead of a liability.
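
As a sketch of what that hard stop can look like in code: a single choke point that every risky step has to pass through. The action names and message format below are invented for illustration:

    class ApprovalRequired(Exception):
        """Raised when an agent hits a 'must ask first' moment."""


    # Invented examples of gated actions; the real list lives in the manual.
    NEVER_ALLOWED = {"destructive tests"}
    ASK_FIRST = {
        "large password or MFA guessing",
        "high-volume fuzzing",
        "large data pulls",
        "writing or changing configuration",
    }


    def gate(action: str, rationale: str) -> None:
        """Single choke point every risky step has to pass through."""
        if action in NEVER_ALLOWED:
            raise PermissionError(f"{action!r} is never allowed on this engagement")
        if action in ASK_FIRST:
            # For a human this is a logged reminder; for an agent it is a hard
            # stop until someone explicitly approves.
            raise ApprovalRequired(
                f"Potentially dangerous step: {action!r}.\n"
                f"Reasoning: {rationale}\n"
                "Waiting for explicit approval before continuing."
            )


    # Example: an agent about to fuzz an interesting endpoint.
    # gate("high-volume fuzzing", "parameter 'id' reflects input unsanitised")
    # raises ApprovalRequired, and a human decides what happens next.

The mechanism is deliberately boring; the value is that the “must ask first” list lives in one place and can’t be skipped by accident.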


Why bother?

None of this replaces experience or judgement. You still need people who understand the business, the stack, and the threat model.

What it does give you is:

  • A way to scale that experience across people and tools,
  • A language you can share with agents without exposing your internal playbook,
  • And a clearer separation between “what we do” and “how a given tool happens to implement it today”.

The manual I’ve been building is private by design, and it will stay that way. But the pattern is general:

  • Define modes,
  • Define workflows,
  • Define environments,
  • Define guardrails.

If you get those four right, the rest of your automation – including agent‑style workflows – has something solid to stand on.
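
If it helps to see those four side by side, the top level of such a manual can be as small as this. Every name and value below is invented for the sake of the sketch:

    # A purely illustrative top-level shape for the four pieces.
    manual = {
        "modes": {
            "stealth": "passive observation only",
            "standard": "normal interactive testing",
        },
        "workflows": {
            "web-app-recon": [
                "confirm scope and rules of engagement",
                "light recon",
                "map the attack surface",
                "carefully validate anything interesting",
            ],
        },
        "environments": {
            "production": {"default_noise": "passive", "never": ["high-volume fuzzing"]},
            "lab": {"default_noise": "aggressive", "never": []},
        },
        "guardrails": [
            "large password or MFA code spaces",
            "high-volume fuzzing",
            "big data pulls",
            "anything that writes or changes configuration",
        ],
    }

Everything else – scripts, agents, checklists – can then be checked against that shape rather than against someone’s memory.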