Create a virtual environment, for example with conda:

    conda create -n AdvAgent python=3.12.2
    conda activate AdvAgent

Install dependencies:

    pip install -r requirements.txt

Clone this repository:

    git clone https://github.com/AI-secure/AdvAgent.git

Set the OpenAI API key and other keys as environment variables:
(Our pipeline supports attacking various large language models such as GPT, Gemini, and Claude. Here, we take attacking GPT as an example.)

    export OPENAI_API_KEY=<YOUR_KEY>
    export HUGGING_FACE_HUB_TOKEN=<YOUR_KEY>

We conduct experiments on the Mind2Web dataset and test our approach against the state-of-the-art web agent framework, SeeAct.
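As a quick sanity check before running the pipeline, the keys exported in the setup step can be verified from Python. This is a minimal sketch, not part of the repository: the helper name `missing_keys` is hypothetical, and only the two variable names come from the export commands in this README.

```python
import os

# Variable names taken from the export commands in the setup step.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGING_FACE_HUB_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`.

    `missing_keys` is a hypothetical helper, not part of AdvAgent.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report the result without printing any key values.
missing = missing_keys()
print("Missing keys:", ", ".join(missing) if missing else "none")
```

If you attack a model other than GPT, the tuple of required names would presumably change; the names for other providers are not given here, so they are left out.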
Download the source data Multimodal-Mind2Web from Hugging Face and store it in the path data/Multimodal-Mind2Web/data/.
Download the SeeAct source data and store it in the path data/seeact_source_data/.
Run the notebook data_generation.ipynb to filter data from the source dataset and construct the training set and test set.
Run training_data_generation.sh to test the quality of the data in the training set and construct datasets for SFT and DPO.
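Before running the notebook and the script above, it can help to confirm that both source datasets are where the steps expect them. The following is a small sketch under the assumption that it is run from the repository root; the helper name `empty_or_missing` is hypothetical, and the two directory paths are the ones named in the download steps.

```python
from pathlib import Path

# Directories named in the download steps, relative to the repository root.
SOURCE_DIRS = (
    "data/Multimodal-Mind2Web/data",
    "data/seeact_source_data",
)


def empty_or_missing(root=".", source_dirs=SOURCE_DIRS):
    """Return the source directories under `root` that are absent or empty.

    `empty_or_missing` is a hypothetical helper, not part of AdvAgent.
    """
    problems = []
    for rel in source_dirs:
        path = Path(root) / rel
        # A directory counts as usable only if it exists and has an entry.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


problems = empty_or_missing()
print("Missing or empty:", ", ".join(problems) if problems else "none")
```

The check only looks for the presence of files; it does not verify that the downloads are complete or in the format the notebook expects.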
After completing the Data Generation section, your file structure should look like this:
├── task_demo_-1_aug
    ├── attack_dataset.json
    ├── subset_test_data_aug
    │   ├── train.json
    │   ├── test.json
    │   ├── augmented_dataset.json
    │   ├── predictions
    │   │   ├── prediction-4api-augment-data.jsonl
    │   │   ├── augmented_dataset_correct.json
    │   │   └── prediction-4api-augment-data-correct.jsonl
    │   └── imgs
    │       └── f5da4b14-026d-4a10-ab89-f5720418f2b4_9016ffb6-7468-4495-ad07-756ac9f2af03.jpg
    └── together
        └── data
            └── sft_train_data.jsonl

We fine-tune the model by calling Together AI's API. The basic training process is as follows (for more instructions, please refer to the Together AI docs):
Set up Together AI API key:

    export TOGETHER_API_KEY=<YOUR_KEY>

Upload training dataset:

    together files upload "xxx.jsonl"

Train the SFT model:

    together fine-tuning create \
      --training-file "file-xxx" \
      --model "mistralai/Mistral-7B-Instruct-v0.2" \
      --lora \
      --batch-size 16

Download the SFT model:

    together fine-tuning download "ft-xxx"

You can store the SFT model in the path data/task_demo_-1_aug/together/new_models/.
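The upload step expects a JSONL training file, so a malformed line is cheaper to catch locally than after an upload. The sketch below only checks that every line parses as a JSON object; it does not validate the schema Together AI requires, which is described in their docs. The helper name `invalid_jsonl_lines` is hypothetical.

```python
import json
from pathlib import Path


def invalid_jsonl_lines(path):
    """Return 1-based line numbers in `path` that are not JSON objects.

    Blank lines and non-object values (lists, strings, numbers) are
    reported as invalid. Hypothetical helper, not part of AdvAgent.
    """
    bad = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append(number)
                continue
            if not isinstance(record, dict):
                bad.append(number)
    return bad


# Example call (path assumed from the file structure shown earlier):
# invalid_jsonl_lines("data/task_demo_-1_aug/together/data/sft_train_data.jsonl")
```

An empty list means every line parsed as a JSON object. The example path assumes the task_demo_-1_aug tree sits under data/, as the SFT model path suggests.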
Run dpo_training.sh to train the DPO model.
Select the best trained model based on the training curve, then run dpo_model_merge.sh to merge the model.
Run evaluation.sh to evaluate the SFT and DPO models.
If you find this code useful, please cite our paper:
