AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Code for our paper "AdvAgent: Controllable Blackbox Red-teaming on Web Agents".

Setup

Create a virtual environment, for example with conda:

conda create -n AdvAgent python=3.12.2
conda activate AdvAgent

Clone this repository and enter it:

git clone https://github.com/AI-secure/AdvAgent.git
cd AdvAgent

Install dependencies:

pip install -r requirements.txt

Export your OpenAI API key and Hugging Face token as environment variables. (Our pipeline supports attacking various large language models such as GPT, Gemini, and Claude; here we take GPT as the example.)

export OPENAI_API_KEY=<YOUR_KEY>
export HUGGING_FACE_HUB_TOKEN=<YOUR_KEY>
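Since later steps call external APIs, it may help to confirm the keys are visible to Python before starting. The following is a minimal sketch, not part of the repository; the helper name `missing_keys` is hypothetical, and the variable names are simply the two exported in this section.

```python
import os

# Variable names taken from the export commands in the Setup section.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGING_FACE_HUB_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank.

    `env` defaults to the real process environment; passing a dict makes
    the check easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Report the result without aborting, so the check is safe to run anywhere.
missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set.")
```

If you attack a model other than GPT, extend `REQUIRED_KEYS` with whatever key names that provider's client expects.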

Data

We conduct experiments on the Mind2Web dataset and test our approach against the state-of-the-art web agent framework, SeeAct.

Download the Multimodal-Mind2Web source data from Hugging Face and store it in data/Multimodal-Mind2Web/data/.

Download the SeeAct source data and store it in data/seeact_source_data/.

Run Demo

Data Generation

Construct the training set and test set

Run the notebook data_generation.ipynb to filter data from the source dataset and construct the training set and test set.

Build datasets for SFT and DPO

Run training_data_generation.sh to test the quality of the data in the training set and construct datasets for SFT and DPO.

After completing the Data Generation section, your file structure should look like this:

task_demo_-1_aug
├── attack_dataset.json
├── subset_test_data_aug
│   ├── train.json
│   ├── test.json
│   ├── augmented_dataset.json
│   ├── predictions
│   │   ├── prediction-4api-augment-data.jsonl
│   │   ├── augmented_dataset_correct.json
│   │   └── prediction-4api-augment-data-correct.jsonl
│   └── imgs
│       └── f5da4b14-026d-4a10-ab89-f5720418f2b4_9016ffb6-7468-4495-ad07-756ac9f2af03.jpg
└── together
    └── data
        └── sft_train_data.jsonl
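The expected layout can be checked programmatically before moving on to training. This is a rough sketch rather than a script shipped with the repository: `missing_outputs` is a hypothetical helper, and the root directory `data/task_demo_-1_aug` is an assumption based on the paths used elsewhere in this README.

```python
from pathlib import Path

# Relative paths copied from the file structure listed above. The image
# under imgs/ is left out because its name looks like a sample-specific ID.
EXPECTED_FILES = [
    "attack_dataset.json",
    "subset_test_data_aug/train.json",
    "subset_test_data_aug/test.json",
    "subset_test_data_aug/augmented_dataset.json",
    "subset_test_data_aug/predictions/prediction-4api-augment-data.jsonl",
    "subset_test_data_aug/predictions/augmented_dataset_correct.json",
    "subset_test_data_aug/predictions/prediction-4api-augment-data-correct.jsonl",
    "together/data/sft_train_data.jsonl",
]


def missing_outputs(root):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Assumed location of the generated data; adjust if your layout differs.
for rel in missing_outputs("data/task_demo_-1_aug"):
    print("missing:", rel)
```

An empty result means every listed file is present; it says nothing about whether the contents are valid.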

Model Training

SFT

We fine-tune the model through Together AI's API. The basic training process is outlined below; for more instructions, please refer to the Together AI docs.

Set up the Together AI API key:

export TOGETHER_API_KEY=<YOUR_KEY>

Upload training dataset:

together files upload "xxx.jsonl"

Train the SFT model:

together fine-tuning create \
  --training-file "file-xxx" \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --lora \
  --batch-size 16

Download the SFT model:

together fine-tuning download "ft-xxx"

You can store the SFT model in the path data/task_demo_-1_aug/together/new_models/.
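Before uploading the training file in the steps above, a quick local check that every line parses as a JSON object can save a failed upload. The sketch below only validates JSONL syntax; it does not check the field names Together AI expects, which are not specified here, and `invalid_jsonl_lines` is a hypothetical helper.

```python
import json


def invalid_jsonl_lines(path):
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, exc.msg))
                continue
            # Each training example is assumed to be a JSON object.
            if not isinstance(record, dict):
                problems.append((number, "not a JSON object"))
    return problems
```

For example, `invalid_jsonl_lines("data/task_demo_-1_aug/together/data/sft_train_data.jsonl")` should return an empty list for a well-formed file, assuming that is where your SFT data was written.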

DPO

Run dpo_training.sh to train the DPO model.

Select the best checkpoint based on the training curve, then run dpo_model_merge.sh to merge it.
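Checkpoint selection can be done by eye from the training curve, but a small script makes the choice reproducible. The sketch below assumes the logged curve is available as a list of dictionaries with `step` and `eval_loss` keys (the shape Hugging Face's Trainer stores under `log_history` in `trainer_state.json`) and that "best" means lowest evaluation loss. Both are assumptions; dpo_training.sh may log differently, and you may prefer another criterion.

```python
def best_step(log_history, metric="eval_loss"):
    """Return (step, value) for the lowest logged `metric`, or None if absent.

    Entries without the metric (for example, training-loss-only records)
    are ignored. On ties, the earliest logged entry wins.
    """
    points = [
        (entry["step"], entry[metric])
        for entry in log_history
        if "step" in entry and metric in entry
    ]
    if not points:
        return None
    return min(points, key=lambda point: point[1])


# Made-up numbers, purely for illustration.
history = [
    {"step": 100, "loss": 0.90},
    {"step": 100, "eval_loss": 0.62},
    {"step": 200, "eval_loss": 0.55},
    {"step": 300, "eval_loss": 0.58},
]
print(best_step(history))
```

The returned step identifies which saved checkpoint to pass to dpo_model_merge.sh; how that script takes its input is not covered here.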

Evaluation

Run evaluation.sh to evaluate the SFT and DPO models.

Citation

If you find this code useful, please cite our paper:
