From Checklists to Playbooks: Making Pentest Workflows Agent-Friendly

Wed Dec 10, 2025

A lot of teams have a good instinct for how to approach a web or network test:

  • Start with scope and rules of engagement,
  • Do some light recon,
  • Map the attack surface,
  • Then carefully validate anything that looks interesting.

But that knowledge often lives in people’s heads, scattered notes, and half‑remembered scripts. That works fine until you want to:

  • Onboard new testers quickly,
  • Bring in more automation, or
  • Let “agents” (LLM‑driven or otherwise) handle the boring parts safely.

I’ve been working on turning that instinct into something much more explicit, but still private: a small, structured “manual” that humans and tools can both understand.

This post talks about the shape of that manual, not the internals.


The problem: humans improvise, agents need a map

Humans are good at reading between the lines:

  • “This is production, so I probably shouldn’t hammer it.”
  • “This endpoint smells like business logic; be gentle and think before fuzzing.”
  • “We’re clearly off‑scope if we go here.”

Agents don’t have that gut feel.

If you want to safely delegate parts of an engagement, you need a way to encode intent:

  • What’s allowed here?
  • How “loud” am I allowed to be?
  • Which flow should I follow for this kind of target?
  • When must I stop and ask a person?
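
One way to encode those answers is as a small, structured task spec that travels with whatever an agent (or a new tester) is asked to do. Here is a minimal sketch in Python; the field names, enum values, and example data are all invented for illustration, not lifted from the actual manual:

    from dataclasses import dataclass, field
    from enum import Enum

    # Illustrative shape only; the real manual's format may differ.


    class Noise(Enum):
        """How 'loud' a piece of work is allowed to be."""
        PASSIVE = "passive"        # look, don't touch
        LOW = "low"                # single requests at browsing pace
        MODERATE = "moderate"      # light automated scanning
        AGGRESSIVE = "aggressive"  # fuzzing and brute force; rarely allowed


    @dataclass
    class TaskIntent:
        """Explicit answers to the four questions above, attached to every task."""
        in_scope: list[str]       # hosts or URL patterns that are fair game
        out_of_scope: list[str]   # anything matching these is a hard no
        max_noise: Noise          # upper bound on how loud the task may get
        workflow: str             # which named flow to follow for this target
        ask_first: list[str] = field(default_factory=list)  # steps that need a human


    # Example: a quiet recon task against a single staging host.
    intent = TaskIntent(
        in_scope=["staging.example.com"],
        out_of_scope=["*.payments.example.com"],
        max_noise=Noise.LOW,
        workflow="web-app-recon",
        ask_first=["credential guessing", "anything that writes data"],
    )

A human can skim that in seconds; an agent can be stopped the moment it tries to step outside it.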

That’s the gap the manual is trying to close.


Environments matter as much as targets

The manual also distinguishes where you are, not just what you’re targeting:

  • Production vs staging vs development vs lab.
  • External perimeter vs internal network.
  • Business‑critical vs low‑risk systems.

The same technique can be completely fine in a lab and totally unacceptable in production. Rather than leaving that to memory, each environment profile spells out:

  • Default mode,
  • Things that are never acceptable,
  • Things that require explicit sign‑off.
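
Concretely, an environment profile doesn’t need to be much more than a small record. Another illustrative sketch, with invented names and values:

    from dataclasses import dataclass

    # Illustrative only; the real profiles carry whatever fields the policy needs.


    @dataclass(frozen=True)
    class EnvironmentProfile:
        """What an environment allows by default, forbids outright, or gates on sign-off."""
        name: str
        default_noise: str                 # e.g. "passive" in production, "aggressive" in lab
        never_allowed: tuple[str, ...]     # hard prohibitions; no approval overrides these
        requires_signoff: tuple[str, ...]  # allowed only with explicit human approval


    PRODUCTION = EnvironmentProfile(
        name="production",
        default_noise="passive",
        never_allowed=("destructive tests", "high-volume fuzzing"),
        requires_signoff=("authenticated scanning", "any write operation"),
    )

    LAB = EnvironmentProfile(
        name="lab",
        default_noise="aggressive",
        never_allowed=(),
        requires_signoff=(),
    )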

This is the bridge between “we have a policy doc somewhere” and “the tooling actually behaves the way the policy expects”.


Guardrails and approval points

The riskiest parts of an engagement are usually obvious to a senior tester:

  • Large password or MFA code spaces,
  • High‑volume fuzzing,
  • Anything involving big data pulls,
  • Anything that writes or changes configuration.

The manual bakes those into a tiny list of “must ask first” moments.

When a human is driving, it’s just a reminder. When an agent is driving, it’s a hard stop:

“I’ve reached a potentially dangerous decision. Here’s what I’m thinking and why. Do you approve?”

That one pattern alone goes a long way towards making automated help feel like a safety net instead of a liability.
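
As a sketch of what that hard stop can look like in code: a single choke point that every risky step has to pass through. The action names and message format below are invented for illustration:

    class ApprovalRequired(Exception):
        """Raised when an agent hits a 'must ask first' moment."""


    # Invented examples of gated actions; the real list lives in the manual.
    NEVER_ALLOWED = {"destructive tests"}
    ASK_FIRST = {
        "large password or MFA guessing",
        "high-volume fuzzing",
        "large data pulls",
        "writing or changing configuration",
    }


    def gate(action: str, rationale: str) -> None:
        """Single choke point every risky step has to pass through."""
        if action in NEVER_ALLOWED:
            raise PermissionError(f"{action!r} is never allowed on this engagement")
        if action in ASK_FIRST:
            # For a human this is a logged reminder; for an agent it is a hard
            # stop until someone explicitly approves.
            raise ApprovalRequired(
                f"Potentially dangerous step: {action!r}.\n"
                f"Reasoning: {rationale}\n"
                "Waiting for explicit approval before continuing."
            )


    # Example: an agent about to fuzz an interesting endpoint.
    # gate("high-volume fuzzing", "parameter 'id' reflects input unsanitised")
    # raises ApprovalRequired, and a human decides what happens next.

The mechanism is deliberately boring; the value is that the “must ask first” list lives in one place and can’t be skipped by accident.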


Why bother?

None of this replaces experience or judgement. You still need people who understand the business, the stack, and the threat model.

What it does give you is:

  • A way to scale that experience across people and tools,
  • A language you can share with agents without exposing your internal playbook,
  • And a clearer separation between “what we do” and “how a given tool happens to implement it today”.

The manual I’ve been building is private by design, and it will stay that way. But the pattern is general:

  • Define modes,
  • Define workflows,
  • Define environments,
  • Define guardrails.

If you get those four right, the rest of your automation – including agent‑style workflows – has something solid to stand on.
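
If it helps to see those four side by side, the top level of such a manual can be as small as this. Every name and value below is invented for the sake of the sketch:

    # A purely illustrative top-level shape for the four pieces.
    manual = {
        "modes": {
            "stealth": "passive observation only",
            "standard": "normal interactive testing",
        },
        "workflows": {
            "web-app-recon": [
                "confirm scope and rules of engagement",
                "light recon",
                "map the attack surface",
                "carefully validate anything interesting",
            ],
        },
        "environments": {
            "production": {"default_noise": "passive", "never": ["high-volume fuzzing"]},
            "lab": {"default_noise": "aggressive", "never": []},
        },
        "guardrails": [
            "large password or MFA code spaces",
            "high-volume fuzzing",
            "big data pulls",
            "anything that writes or changes configuration",
        ],
    }

Everything else – scripts, agents, checklists – can then be checked against that shape rather than against someone’s memory.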