AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Code for our paper AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Setup

Create a virtual environment, for example with conda:

conda create -n AdvAgent python=3.12.2
conda activate AdvAgent

Clone this repository:

git clone https://github.com/AI-secure/AdvAgent.git
cd AdvAgent

Install dependencies:

pip install -r requirements.txt

Set the OpenAI API key and any other required keys as environment variables:
(Our pipeline supports attacking various large language models, such as GPT, Gemini, and Claude. Here, we use GPT as the example target.)

export OPENAI_API_KEY=<YOUR_KEY>
export HUGGING_FACE_HUB_TOKEN=<YOUR_KEY>
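As a quick sanity check before running the pipeline, the following minimal sketch reports which of the two variables above are still unset. The helper name `missing_env_vars` is hypothetical and not part of this repository.

```python
import os

# The two variables exported in the setup step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "HUGGING_FACE_HUB_TOKEN")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```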

Data

We conduct experiments on the Mind2Web dataset and test our approach against the state-of-the-art web agent framework, SeeAct.

Download the Multimodal-Mind2Web source data from Hugging Face and store it in data/Multimodal-Mind2Web/data/.

Download the SeeAct source data and store it in data/seeact_source_data/.

Run Demo

Data Generation

Construct the training set and test set

Run the notebook data_generation.ipynb to filter data from the source dataset and construct the training set and test set.

Build datasets for SFT and DPO

Run training_data_generation.sh to test the quality of the data in the training set and construct datasets for SFT and DPO.

After completing the Data Generation section, your file structure should look like this:

├──task_demo_-1_aug
   ├──attack_dataset.json
   ├──subset_test_data_aug
   │   ├── train.json
   │   ├── test.json
   │   ├── augmented_dataset.json
   │   ├── predictions
   │   │   ├── prediction-4api-augment-data.jsonl
   │   │   ├── augmented_dataset_correct.json
   │   │   └── prediction-4api-augment-data-correct.jsonl
   │   └── imgs
   │       └── f5da4b14-026d-4a10-ab89-f5720418f2b4_9016ffb6-7468-4495-ad07-756ac9f2af03.jpg
   └── together
       └── data
           └── sft_train_data.jsonl
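To confirm that the Data Generation step produced the layout above, a small check like the following can list any missing entries. This is a sketch, not a script shipped with the repository: the helper `missing_paths` is hypothetical, the root `data/task_demo_-1_aug` is assumed from the model path used in the SFT section, and the `imgs` directory is checked only for existence because the image file names vary.

```python
from pathlib import Path

# Paths relative to task_demo_-1_aug, taken from the tree above.
EXPECTED = (
    "attack_dataset.json",
    "subset_test_data_aug/train.json",
    "subset_test_data_aug/test.json",
    "subset_test_data_aug/augmented_dataset.json",
    "subset_test_data_aug/predictions/prediction-4api-augment-data.jsonl",
    "subset_test_data_aug/predictions/augmented_dataset_correct.json",
    "subset_test_data_aug/predictions/prediction-4api-augment-data-correct.jsonl",
    "subset_test_data_aug/imgs",
    "together/data/sft_train_data.jsonl",
)


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumed location of the generated data; adjust if yours differs.
    for rel in missing_paths("data/task_demo_-1_aug"):
        print("missing:", rel)
```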

Model Training

SFT

We fine-tune the model through Together AI's API. The basic training process is as follows (for more instructions, please refer to the Together AI docs).

Set up the Together AI API key:

export TOGETHER_API_KEY=<YOUR_KEY> 

Upload training dataset:

together files upload "xxx.jsonl" 
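Before uploading, it can help to confirm that the training file is well-formed JSONL. The sketch below only checks that every non-blank line parses as a JSON object; it does not validate the fields Together AI expects, and the helper `invalid_jsonl_lines` is hypothetical.

```python
import json
import sys


def invalid_jsonl_lines(lines):
    """Return 1-based numbers of lines that are not JSON objects.

    Blank lines are skipped. Hypothetical helper for illustration.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict):
            bad.append(number)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_jsonl.py path/to/training_file.jsonl
    with open(sys.argv[1], encoding="utf-8") as handle:
        problems = invalid_jsonl_lines(handle)
    print("invalid lines:", problems if problems else "none")
```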

Train the SFT model:

together fine-tuning create \
  --training-file "file-xxx" \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --lora \
  --batch-size 16

Download the SFT model:

together fine-tuning download "ft-xxx" 

You can store the SFT model in the path data/task_demo_-1_aug/together/new_models/.

DPO

Run dpo_training.sh to train the DPO model.
Then select the best model from training based on the training curve, and run dpo_model_merge.sh to merge the model.

Evaluation

Run evaluation.sh to evaluate the SFT and DPO models.

Citation

If you find this code useful, please cite our paper:
