How AI Pen Testing Actually Works (and Where It Breaks)

Description

Episode Summary

AI is starting to change penetration testing, but most people are asking the wrong question. In this episode of Secured, Cole Cornford sits down with Brendan Dolan-Gavitt, AI researcher at XBOW and former NYU professor, to unpack what autonomous pen testing really is, what it can reliably do today, and what still needs humans.

They explore why AI agents are great at scaling the boring parts of testing, like authenticated workflows and broad vulnerability coverage across huge attack surfaces, and why that does not automatically translate to deep, context-aware exploitation. The conversation also gets into the messy parts: AI systems overclaiming “serious” findings, business logic flaws that are hard to verify, audit expectations, and why scope control needs real guardrails, not vibes. From agent traces and validation models to cost curves and creative exfiltration tricks, this episode is a grounded look at where AI helps AppSec and where it can still cause damage if you trust it too much.

Timestamps

00:00 – Intro

03:10 – From academia to building autonomous security tools

05:00 – Human pen testers vs AI agents: what is actually different

06:40 – Where AI helps most: boring tasks and low-hanging fruit

08:30 – Scale: a thousand targets vs hiring a thousand testers

10:20 – Accessibility, economics, and Jevons paradox

12:30 – Accountability: audit evidence, traces, and “who signs off”

14:40 – Scope control: avoiding prod and preventing out-of-scope actions

16:20 – Safety checkers, overseer agents, and persuasion resistance

18:40 – The cost question: VC money, inference pricing, and efficiency

21:20 – When AI wastes money and why prioritisation matters

23:50 – Failure mode: overclaiming business “vulnerabilities”

26:10 – Validation agents and adversarial peer review

28:40 – The scary clever stuff: exfiltrating files as images

31:00 – What AI finds well: XSS, SQLi, file traversal, hard proof bugs

33:10 – What AI struggles with: business logic and contextual judgement

35:20 – Hype vs skepticism and why nobody has a crystal ball

🐙 Secured is grateful to be sponsored and supported by Chainguard.

Chainguard is the trusted source for open source. Get hardened, secure, production-ready builds so your team can ship faster, stay compliant, and reduce risk. Download your free CVE Reduction Assessment at https://dayone.fm/chainguard

This podcast uses the following third-party services for analysis:

Podtrac - https://analytics.podtrac.com/privacy-policy-gdrp
Spotify Ad Analytics - https://www.spotify.com/us/legal/ad-analytics-privacy-policy/
