AI-Powered Autonomous Penetration Testing Agent
Published at USENIX Security 2024
Official Website: pentestgpt.com
- Autonomous Agent - Agentic pipeline for intelligent, autonomous penetration testing
- Session Persistence - Save and resume penetration testing sessions
- Docker-First - Isolated, reproducible environment with security tools pre-installed
In Progress: Multi-model support for OpenAI, Gemini, and other LLM providers
- AI-Powered Challenge Solver - Leverages LLM advanced reasoning to perform penetration testing and CTFs
- Live Walkthrough - Tracks steps in real-time as the agent works through challenges
- Multi-Category Support - Web, Crypto, Reversing, Forensics, PWN, Privilege Escalation
- Real-Time Feedback - Watch the AI work with live activity updates
- Extensible Architecture - Clean, modular design ready for future enhancements
- Docker (required) - Install Docker
- LLM Provider (choose one):
  - Anthropic API Key from console.anthropic.com
  - Claude OAuth Login (requires Claude subscription)
  - OpenRouter for alternative models at openrouter.ai
  - Tutorial: Using Local Models with Claude Code

```bash
# Clone and build
git clone --recurse-submodules https://github.com/GreyDGL/PentestGPT.git
cd PentestGPT
make install

# Configure authentication (first time only)
make config

# Connect to container
make connect
```

Note: The `--recurse-submodules` flag downloads the benchmark suite. If you already cloned without it, run: `git submodule update --init --recursive`

```bash
cd benchmark/standalone-xbow-benchmark-runner
python3 run_benchmarks.py --range 1-1 --pattern-flag
```

See Benchmark Documentation for detailed usage.

| Command | Description |
|---|---|
| `make install` | Build the Docker image |
| `make config` | Configure API key (first-time setup) |
| `make connect` | Connect to container (main entry point) |
| `make stop` | Stop container (config persists) |
| `make clean-docker` | Remove everything including config |

```bash
# Interactive TUI mode (default)
pentestgpt --target 10.10.11.234

# Non-interactive mode
pentestgpt --target 10.10.11.100 --non-interactive

# With challenge context
pentestgpt --target 10.10.11.50 --instruction "WordPress site, focus on plugin vulnerabilities"
```

Keyboard Shortcuts: F1 Help | Ctrl+P Pause/Resume | Ctrl+Q Quit
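
For scripted runs, the invocations above can be assembled programmatically. The following is a minimal Python sketch, not part of PentestGPT: the helper name `build_pentestgpt_command` is hypothetical, it uses only flags that appear in this README (`--target`, `--non-interactive`, `--instruction`, `--no-telemetry`), and it assumes those flags can be combined freely, which this README does not confirm.

```python
import shlex


def build_pentestgpt_command(target, instruction=None,
                             non_interactive=False, no_telemetry=False):
    """Build an argument list for the pentestgpt CLI (hypothetical helper).

    Only flags documented in this README are emitted. Whether they can all
    be combined in one invocation is an assumption.
    """
    cmd = ["pentestgpt", "--target", target]
    if non_interactive:
        cmd.append("--non-interactive")
    if instruction:
        cmd += ["--instruction", instruction]
    if no_telemetry:
        cmd.append("--no-telemetry")
    return cmd


cmd = build_pentestgpt_command(
    "10.10.11.50",
    instruction="WordPress site, focus on plugin vulnerabilities",
    non_interactive=True,
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed straight to `subprocess.run(cmd)` inside the container without shell quoting issues.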
PentestGPT supports routing requests to local LLM servers (LM Studio, Ollama, text-generation-webui, etc.) running on your host machine.
- Local LLM server with an OpenAI-compatible API endpoint
  - LM Studio: Enable server mode (default port 1234)
  - Ollama: Run `ollama serve` (default port 11434)

```bash
# Configure PentestGPT for local LLM
make config
# Select option 4: Local LLM

# Start your local LLM server on the host machine
# Then connect to the container
make connect
```

Edit `scripts/ccr-config-template.json` to customize:
- `localLLM.api_base_url`: Your LLM server URL (default: `host.docker.internal:1234`)
- `localLLM.models`: Available model names on your server
- Router section: Which models handle which operations

| Route | Purpose | Default Model |
|---|---|---|
| `default` | General tasks | `openai/gpt-oss-20b` |
| `background` | Background operations | `openai/gpt-oss-20b` |
| `think` | Reasoning-heavy tasks | `qwen/qwen3-coder-30b` |
| `longContext` | Large context handling | `qwen/qwen3-coder-30b` |
| `webSearch` | Web search operations | `openai/gpt-oss-20b` |
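
To plan a custom routing setup before editing the template, the default table can be modeled as a plain mapping. This Python sketch is illustrative only: the helper `override_routes` and the model name `my-local-model` are hypothetical, and it makes no claim about how routes are laid out inside `scripts/ccr-config-template.json`, so check the template itself for the actual format.

```python
import json

# Default route-to-model assignments, copied from the table above.
DEFAULT_ROUTES = {
    "default": "openai/gpt-oss-20b",
    "background": "openai/gpt-oss-20b",
    "think": "qwen/qwen3-coder-30b",
    "longContext": "qwen/qwen3-coder-30b",
    "webSearch": "openai/gpt-oss-20b",
}


def override_routes(routes, **overrides):
    """Return a copy of `routes` with selected routes pointed at other models.

    Raises KeyError for a route name that is not in the table, which
    catches typos such as `longcontext`.
    """
    unknown = set(overrides) - set(routes)
    if unknown:
        raise KeyError(f"unknown route(s): {sorted(unknown)}")
    updated = dict(routes)
    updated.update(overrides)
    return updated


# "my-local-model" is a placeholder for a model served by your own LLM server.
custom = override_routes(DEFAULT_ROUTES, think="my-local-model")
print(json.dumps(custom, indent=2))
```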
- Connection refused: Ensure your LLM server is running and listening on the configured port
- Docker networking: Use `host.docker.internal` (not `localhost`) to access host services from Docker
- Check CCR logs: Inside the container, run `cat /tmp/ccr.log`
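
A quick way to narrow down a "connection refused" error is to test whether the server port accepts TCP connections at all. The sketch below uses only the Python standard library. The function name `port_is_open` is hypothetical, and the sketch assumes Python is available wherever you run it. A successful connection shows only that something is listening, not that it speaks an OpenAI-compatible API.

```python
import socket

# Default ports mentioned in this README.
DEFAULT_PORTS = {"LM Studio": 1234, "Ollama": 11434}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

From inside the container, `port_is_open("host.docker.internal", DEFAULT_PORTS["Ollama"])` returning `False` suggests the server is not running, is bound to a different port, or is not reachable from Docker.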
PentestGPT collects anonymous usage data to help improve the tool. This data is sent to our Langfuse project and includes:
- Session metadata (target type, duration, completion status)
- Tool execution patterns (which tools are used, not the actual commands)
- Flag detection events (that a flag was found, not the flag content)
No sensitive data is collected: command outputs, credentials, and actual flag values are never transmitted.

```bash
# Via command line flag
pentestgpt --target 10.10.11.234 --no-telemetry

# Via environment variable
export LANGFUSE_ENABLED=false
```

PentestGPT includes 104 XBOW validation benchmarks for comprehensive testing and evaluation.

```bash
cd benchmark/standalone-xbow-benchmark-runner
python3 run_benchmarks.py --range 1-10 --pattern-flag   # Run benchmarks 1-10
python3 run_benchmarks.py --all --pattern-flag          # Run all 104 benchmarks
python3 run_benchmarks.py --retry-failed                # Retry failed benchmarks
python3 run_benchmarks.py --dry-run --range 1-5         # Preview without executing
```

PentestGPT achieved an 86.5% success rate (90/104 benchmarks) on the XBOW validation suite:
- Cost: Average $1.11, Median $0.42 per successful benchmark
- Time: Average 6.1 minutes, Median 3.3 minutes per successful benchmark
- Success rates by difficulty:
  - Level 1: 91.1%
  - Level 2: 74.5%
  - Level 3: 62.5%
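
The headline figures can be sanity-checked with a few lines of arithmetic. The Python sketch below reproduces the 86.5% rate from the 90/104 count and derives rough totals for the successful runs from the reported averages. The totals are derived estimates, not reported results, and they exclude the failed benchmarks because no cost or time is given for them here.

```python
# Figures reported above.
total_benchmarks = 104
solved = 90
avg_cost_usd = 1.11      # per successful benchmark
avg_minutes = 6.1        # per successful benchmark

failed = total_benchmarks - solved
success_rate = solved / total_benchmarks

# Derived estimates covering successful runs only.
total_cost_usd = solved * avg_cost_usd
total_minutes = solved * avg_minutes

print(f"Success rate: {success_rate:.1%} ({solved}/{total_benchmarks}, {failed} failed)")
print(f"Estimated cost of successful runs: ${total_cost_usd:.2f}")
print(f"Estimated time of successful runs: {total_minutes / 60:.1f} hours")
```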
For detailed benchmark results, analysis, and automated testing instructions, see the Benchmark Documentation.
The previous multi-LLM version (v0.15) supporting OpenAI, Gemini, Deepseek, and Ollama is archived in legacy/:

```bash
cd legacy && pip install -e . && pentestgpt --reasoning gpt-4o
```

If you use PentestGPT in your research, please cite our paper:

```bibtex
@inproceedings{299699,
  author    = {Gelei Deng and Yi Liu and Víctor Mayoral-Vilches and Peng Liu and Yuekang Li and Yuan Xu and Tianwei Zhang and Yang Liu and Martin Pinzger and Stefan Rass},
  title     = {{PentestGPT}: Evaluating and Harnessing Large Language Models for Automated Penetration Testing},
  booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
  year      = {2024},
  isbn      = {978-1-939133-44-1},
  address   = {Philadelphia, PA},
  pages     = {847--864},
  url       = {https://www.usenix.org/conference/usenixsecurity24/presentation/deng},
  publisher = {USENIX Association},
  month     = aug
}
```

Distributed under the MIT License. See LICENSE.md for more information.

Disclaimer: This tool is for educational purposes and authorized security testing only. The authors do not condone any illegal use. Use at your own risk.
- Research supported by Quantstamp and NTU Singapore