GitHub - sleeepeer/PIArena: [To appear in ACL 2026] PIArena: A Platform for Prompt Injection Evaluation

A Platform for Prompt Injection Evaluation

PIArena is an easy-to-use toolbox and also a comprehensive benchmark for researching prompt injection attacks and defenses. It provides:

Plug-and-play Attacks & Defenses – Easily integrate state-of-the-art defenses into your workflow to protect your LLM system against prompt injection attacks. You can also play with existing attack strategies to perform a better research.
Systematic Evaluation Benchmark – End-to-end evaluation pipeline enables you to easily evaluate attacks / defenses on various datasets.
Add Your Own – You can also easily integrate your own attack or defense into our benchmark to systematically assess how well it perform.

News

[4/9/2026] 🎉 PIArena is accepted to ACL 2026 Main, see you in San Diego!

📝 Quick Start

⚙️ Installation

Clone the project and setup python environment:

git clone git@github.com:sleeepeer/PIArena.git cd PIArena conda create -n piarena python=3.10 -y conda activate piarena pip install -r requirements.txt pip install --upgrade setuptools pip pip install -e . # Install piarena as an editable package

Login to HuggingFace 🤗 with your HuggingFace Access Token, you can find it at this link:

huggingface-cli login

📌 Ready-to-use Tools

You can simply import attacks and defenses and integrate them into your own code. Please see details in Attack docs and Defense docs.

from piarena.attacks import get_attack from piarena.defenses import get_defense from piarena.llm import Model llm = Model("Qwen/Qwen3-4B-Instruct-2507") defense = get_defense("promptguard") attack = get_attack("combined")

📈 Run Evaluation

Use main.py to run the benchmark:

# Using CLI arguments python main.py --dataset squad_v2 --attack direct --defense none # Using a YAML config file python main.py --config configs/experiments/my_experiment.yaml # Run many experiments in parallel across GPUs # Edit the configuration section in scripts/run.py to set GPUs, datasets, attacks, defenses # The scheduler automatically assigns jobs to the least-loaded GPU python scripts/run.py

Available Datasets: Please see HuggingFace/PIArena.

Available Attacks:

none - No attack (baseline)
direct - Directly attack using injected prompt (default)
combined - Formalizing and Benchmarking Prompt Injection Attacks and Defenses
ignore - Ignore Previous Prompt: Attack Techniques For Language Models
completion - Prompt injection attacks against GPT-3
character - Delimiters won’t save you from prompt injection
nanogcg - GCG and nanoGCG
tap - TAP: A Query-Efficient Method for Jailbreaking Black-Box LLMs
pair - PAIR: Jailbreaking black box large language models in twenty queries
strategy_search - Strategy search attack based on defense feedback introduced in PIArena.

Available Defenses:

none - No defense (baseline, default)
datasentinel - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
attentiontracker - Attention Tracker: Detecting Prompt Injection Attacks in LLMs
piguard - PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
promptguard - Meta Prompt Guard
secalign - SecAlign: Defending Against Prompt Injection with Preference Optimization (uses Meta-SecAlign model)
promptlocate - PromptLocate: Localizing Prompt Injection Attacks
promptarmor - PromptArmor: Simple yet Effective Prompt Injection Defenses
pisanitizer - PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
datafilter - Defending Against Prompt Injection with DataFilter

🔍 Search-based Attacks

PIArena supports search-based attacks (PAIR, TAP, Strategy Search) that iteratively refine injected prompts using an attack LLM. Use main_search.py for these attacks:

# --attack can be tap, pair, strategy_search python main_search.py --dataset squad_v2 --attack strategy_search --defense datafilter \ --backend_llm Qwen/Qwen3-4B-Instruct-2507 --attacker_llm Qwen/Qwen3-4B-Instruct-2507 # Run many search experiments in parallel # Edit scripts/run_search.py to configure GPUs, attacks, defenses, datasets python scripts/run_search.py

See Strategy Search for details.

🔍 Reinforcement Learning-based Attacks

Building upon PIArena (including defenses and benchmarks), this repository provides the code for PISmith, a reinforcement learning-based framework for red teaming prompt injection defenses.

🤖 Agent Benchmarks

PIArena also supports agentic benchmarks: InjecAgent, AgentDojo and AgentDyn.

Setup Agent Benchmarks

# AgentDojo / AgentDyn cd agents/agentdojo && pip install -e . && cd ../..

InjecAgent Evaluation

python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none

AgentDojo / AgentDyn Evaluation

# Original AgentDojo suite with OpenAI API export OPENAI_API_KEY="Your API Key Here" python main_agentdojo.py --model gpt-5-mini --attack none --suite workspace # Original AgentDojo suite with a PIArena defense python main_agentdojo.py --model meta-llama/Llama-3.1-8B-Instruct --attack tool_knowledge --defense datafilter --suite workspace # Merged AgentDyn suite with a PIArena defense python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense datafilter --suite shopping # Benchmark-native defense from the merged AgentDojo / AgentDyn tree python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense prompt_guard_2_detector --suite shopping

The same main_agentdojo.py entrypoint is used for both benchmark families:

AgentDojo suites: workspace, slack, travel, banking
AgentDyn suites: shopping, github, dailylife

PIArena integrates defenses to work in AgentDojo and AgentDyn. Benchmark-native defenses such as tool_filter, repeat_user_prompt, piguard_detector, and prompt_guard_2_detector are also available through the same runner.

🙋🏻‍♀️ Add your own attacks / defenses

Please see Extending PIArena for full details.

Citation

If you find our paper or the code useful, please kindly cite the following paper:

@article{geng2026piarena, title={PIArena: A Platform for Prompt Injection Evaluation}, author={Geng, Runpeng and Yin, Chenlong and Wang, Yanting and Chen, Ying and Jia, Jinyuan}, journal={arXiv preprint arXiv:2604.08499}, year={2026} }

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
agents		agents
assets		assets
configs		configs
datasets		datasets
docs		docs
piarena		piarena
scripts		scripts
website		website
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
main.py		main.py
main_agentdojo.py		main_agentdojo.py
main_injecagent.py		main_injecagent.py
main_search.py		main_search.py
print_results.ipynb		print_results.ipynb
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Platform for Prompt Injection Evaluation

News

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo / AgentDyn Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

A Platform for Prompt Injection Evaluation

News

Table of Contents

📝 Quick Start

⚙️ Installation

📌 Ready-to-use Tools

📈 Run Evaluation

🔍 Search-based Attacks

🔍 Reinforcement Learning-based Attacks

🤖 Agent Benchmarks

Setup Agent Benchmarks

InjecAgent Evaluation

AgentDojo / AgentDyn Evaluation

🙋🏻‍♀️ Add your own attacks / defenses

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages