| 💾 Code | 📄 Paper | 🌐 Website |
|---|---|---|
| 🤗 Dataset | 🤖 Models | 📦 PyPI |
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
This repository contains the code for the A3 framework, which uses LLMs to systematically generate synthetic web agent training data by decomposing the annotation process into three roles: Task Designer, Annotator, and Supervisor.
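As a rough illustration of this three-role decomposition, the sketch below wires together three placeholder functions. Everything in it is hypothetical: the names `design_tasks`, `annotate`, `supervise`, and the `Trajectory` type are not part of the `agent_as_annotators` package, and the roles are stubbed with plain Python instead of LLM calls. The idea that the Supervisor accepts or rejects the Annotator's trajectories is an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical record of one attempt at a task."""
    task: str
    actions: list[str] = field(default_factory=list)


def design_tasks(persona: str) -> list[str]:
    # Task Designer (stub): a real system would prompt an LLM here.
    return [
        f"As {persona}, find the cheapest item",
        f"As {persona}, post a comment",
    ]


def annotate(task: str) -> Trajectory:
    # Annotator (stub): a real system would act in a web environment.
    # Here only "find" tasks yield any actions, so the other task fails.
    actions = ["click(search)", "type(query)"] if "find" in task else []
    return Trajectory(task=task, actions=actions)


def supervise(trajectory: Trajectory) -> bool:
    # Supervisor (stub): assumed here to accept or reject a trajectory.
    return len(trajectory.actions) > 0


def generate_examples(persona: str) -> list[Trajectory]:
    """Run the three roles in sequence and keep only accepted trajectories."""
    trajectories = [annotate(task) for task in design_tasks(persona)]
    return [t for t in trajectories if supervise(t)]


if __name__ == "__main__":
    for t in generate_examples("a student"):
        print(t.task, t.actions)
```

Keeping each role behind its own function means one role can be swapped out, for example to use a different model, without touching the other two.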
```bash
pip install agent-as-annotators
```

Or install from source:
```bash
git clone https://github.com/McGill-NLP/agent-as-annotators.git
cd agent-as-annotators
pip install -e .
```

To evaluate a model, first serve it with vLLM:

```bash
vllm serve --config configs/vllm/Qwen3.5-9B.yaml
```

Then run the evaluation:

```bash
a3-eval --benchmark webarena_test --model A3-qwen3.5-9b
```

The A3 pipeline generates synthetic training data in 5 steps:
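```bash
# Step 1: create personas
python scripts/create_personas.py

# Step 2: explore the environment and generate task intents
a3-explore
python scripts/generate_task_intents.py

# Step 3: create the A3-Synth task configs
python scripts/create_synth_configs.py

# Step 4: collect trajectories
a3-synth --benchmark a3_synth --model gemini-3-pro

# Step 5: convert trajectories to JSON and generate RFT data
python scripts/convert_trajectories_to_json.py
python scripts/generate_rft_data.py
```

Then fine-tune a model:

```bash
a3-train --config configs/train/qwen3.5-9b.json
```

Training uses supervised fine-tuning (SFT) with FSDP for multi-GPU parallelism. See `configs/train/` for hyperparameters and `configs/accelerate/` for FSDP configuration.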
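The data-generation commands above (everything before `a3-train`) can be chained from a small driver script. The following is a minimal sketch, not part of the repository: `run_pipeline` and `PIPELINE_COMMANDS` are hypothetical names, and it assumes the commands are run from the repository root with the package installed. It defaults to a dry run so the plan can be checked before anything is executed.

```python
import shlex
import subprocess

# Commands copied from the steps above, in the order they are listed.
PIPELINE_COMMANDS = [
    "python scripts/create_personas.py",
    "a3-explore",
    "python scripts/generate_task_intents.py",
    "python scripts/create_synth_configs.py",
    "a3-synth --benchmark a3_synth --model gemini-3-pro",
    "python scripts/convert_trajectories_to_json.py",
    "python scripts/generate_rft_data.py",
]


def run_pipeline(commands=PIPELINE_COMMANDS, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the parsed
    argument lists are returned so the plan can be inspected first.
    """
    planned = [shlex.split(command) for command in commands]
    if not dry_run:
        for argv in planned:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    for argv in run_pipeline():
        print(" ".join(argv))
```

Calling `run_pipeline(dry_run=False)` executes the commands for real. Because `check=True` stops at the first failing command, a later step never runs on missing output from an earlier one.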
| Command | Description |
|---|---|
| `a3-eval` | Run evaluation on WebArena, VisualWebArena, WorkArena, MiniWoB |
| `a3-synth` | Run trajectory collection for A3-Synth |
| `a3-explore` | Run environment exploration |
| `a3-train` | Fine-tune a model with SFT |
| `a3-screen-utils` | Screen session management utilities |
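After installation, a quick way to confirm that these entry points are available is to look them up on `PATH`. The snippet below is a small sketch using only the standard library; `find_commands` and `A3_COMMANDS` are hypothetical names introduced here, and the command list is copied from the table above.

```python
import shutil

# Command names taken from the table above.
A3_COMMANDS = ["a3-eval", "a3-synth", "a3-explore", "a3-train", "a3-screen-utils"]


def find_commands(names=A3_COMMANDS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(f"{name}: {path or 'not found'}")
```

A command reported as `not found` usually means the package was installed into a different Python environment than the one currently active.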
```
agent-as-annotators/
  agent_as_annotators/      # Core package
    cli/                    # CLI entry points (eval, synth, explore, train)
    modeling.py             # Agent model wrapper (vLLM, Gemini, OpenAI)
    prompts/                # All prompt templates
    judge/                  # Inverted evaluation protocol (Judge module)
    benchmarks/a3_synth/    # A3-Synth benchmark registration
    exploration/            # Exploration task registration
    utils/                  # Utilities
    configs/a3_synth/       # A3-Synth task configurations
  configs/
    model_configs.json      # Model registry
    train/                  # Training hyperparameters
    vllm/                   # vLLM serving configs
    accelerate/             # FSDP configs
  scripts/                  # Data pipeline scripts
```