
🛠️ Setup - 🏋 Usage - 💻 Demo - 🌐 Ecosystem - 🚀 AgentLab - 🌟 Contributors - 📄 Paper - 📝 Citation


pip install browsergym

Warning

BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research. It is not meant to be a consumer product. Use with caution!

Tip

🚀 Check out AgentLab ✨! A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.


Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row).

BrowserGym includes the following benchmarks by default:

  • MiniWoB
  • WebArena
  • VisualWebArena
  • WorkArena
  • AssistantBench
  • WebLINX

Designing new web benchmarks with BrowserGym is easy: it simply requires inheriting from the AbstractBrowserTask class.
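A custom task has to implement, among other methods, a `validate()` method that scores the episode. As a rough sketch, and assuming (verify against the `AbstractBrowserTask` docstrings of your installed version) that `validate()` returns a `(reward, done, message, info)` tuple and that chat messages are dicts with `"role"` and `"message"` keys, the answer-checking logic of a hypothetical question-answering task could live in a plain function. `check_final_answer` is a name made up for this example, not part of BrowserGym.

```python
from typing import Any


def check_final_answer(
    chat_messages: list[dict[str, Any]], expected: str
) -> tuple[float, bool, str, dict[str, Any]]:
    """Hypothetical helper that a custom task's validate() could delegate to.

    Assumes each chat message is a dict with "role" and "message" keys and that
    the agent's replies use the "assistant" role (an assumption to verify).
    Returns a (reward, done, message, info) tuple.
    """
    answers = [m["message"] for m in chat_messages if m.get("role") == "assistant"]
    if not answers:
        # No answer yet: zero reward, episode continues, no new user message.
        return 0.0, False, "", {}
    answer = answers[-1]
    success = answer.strip().lower() == expected.strip().lower()
    # The first answer ends the episode, whether it is right or wrong.
    return (1.0 if success else 0.0), True, "", {"answer": answer}
```

Inside a subclass, `validate(self, page, chat_messages)` could then simply return `check_final_answer(chat_messages, self.expected)`, where `self.expected` is a hypothetical attribute set during `setup()`.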

🛠️ Setup

To use browsergym, install one of the following packages:

pip install browsergym                # (recommended) everything below
pip install browsergym-experiments    # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core           # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob        # core + miniwob
pip install browsergym-webarena       # core + webarena
pip install browsergym-visualwebarena # core + visualwebarena
pip install browsergym-workarena      # core + workarena
pip install browsergym-assistantbench # core + assistantbench
pip install weblinx-browsergym        # core + weblinx
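After installing, a quick standard-library check can report which of these packages are importable in the current environment. The module names below are an assumption based on the `browsergym.<benchmark>` import pattern used in the usage examples; `weblinx-browsergym` is left out because its import name is not stated here.

```python
import importlib.util

# Assumed import names, following the browsergym.<benchmark> pattern.
BROWSERGYM_MODULES = {
    "core": "browsergym.core",
    "experiments": "browsergym.experiments",
    "miniwob": "browsergym.miniwob",
    "webarena": "browsergym.webarena",
    "visualwebarena": "browsergym.visualwebarena",
    "workarena": "browsergym.workarena",
    "assistantbench": "browsergym.assistantbench",
}


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent means "not installed".
        return False


def installed_benchmarks() -> dict[str, bool]:
    """Map each short package name to whether its module is importable."""
    return {name: is_installed(module) for name, module in BROWSERGYM_MODULES.items()}


if __name__ == "__main__":
    for name, ok in installed_benchmarks().items():
        print(f"{name:15s} {'installed' if ok else 'missing'}")
```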

Then set up Playwright by running:

playwright install chromium

Finally, each benchmark has its own specific setup, which requires following additional steps.

🏗️ Development setup

To install browsergym locally for development, use the following commands:

git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install

Contributions are welcome! 😊

🏋 Usage

Boilerplate code to run an agent on an interactive, open-ended task:

import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)

# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

# release the environment
env.close()
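To make the loop above do something, `action` must be an action string. As a toy illustration only, the hypothetical `echo_agent` policy below replies to the latest user message. It assumes that `obs["chat_messages"]` is a list of dicts with `"role"` and `"message"` keys, and that the default action set accepts `send_msg_to_user(...)` and `noop()` calls; check the action space documentation of your BrowserGym version before relying on either.

```python
def echo_agent(obs: dict) -> str:
    """Hypothetical toy policy: echo the latest user chat message back to the user."""
    user_messages = [
        m["message"] for m in obs.get("chat_messages", []) if m.get("role") == "user"
    ]
    if not user_messages:
        # Nothing to answer yet: do nothing for this step.
        return "noop()"
    reply = f"You said: {user_messages[-1]}"
    # repr() yields a valid Python string literal, so quotes inside the message stay escaped.
    return f"send_msg_to_user({reply!r})"
```

In the loop, `action = ...` would become `action = echo_agent(obs)`.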

MiniWoB

import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))

WorkArena

import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/workarena")]
print("\n".join(env_ids))

WebArena

import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/webarena")]
print("\n".join(env_ids))

VisualWebArena

import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))

AssistantBench

import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [env_id for env_id in gym.envs.registry.keys() if env_id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
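The task-listing pattern repeated in each benchmark example can be factored into one helper. This is a sketch: `list_env_ids` is a hypothetical name, and it assumes every benchmark task id has the form `browsergym/<benchmark>.<task>`, as in all the examples above. Matching on the trailing dot keeps one benchmark name from matching another that merely shares its prefix.

```python
from collections.abc import Iterable


def list_env_ids(registry_keys: Iterable[str], benchmark: str) -> list[str]:
    """Return the sorted BrowserGym env ids that belong to one benchmark."""
    # The trailing dot restricts matches to "browsergym/<benchmark>.<task>" ids.
    prefix = f"browsergym/{benchmark}."
    return sorted(key for key in registry_keys if key.startswith(prefix))
```

With a benchmark package imported, usage would look like `print("\n".join(list_env_ids(gym.envs.registry.keys(), "webarena")))`. Note that `sorted` is lexicographic, so `webarena.310` sorts before `webarena.4`.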

💻 Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:

# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium

Our demo agent uses OpenAI as a backend, so be sure to set your OPENAI_API_KEY environment variable.

Launch the demo agent as follows:

# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398
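To launch several of the demo tasks above one after another, the commands can be built programmatically. This is a sketch: `demo_command` and `run_demos` are hypothetical helpers, only the `--task_name` and `--start_url` flags shown above are used, and the script path assumes you run from the repository root.

```python
import subprocess
import sys
from typing import Optional

DEMO_SCRIPT = "demo_agent/run_demo.py"  # assumes the current directory is the BrowserGym checkout


def demo_command(task_name: str, start_url: Optional[str] = None) -> list[str]:
    """Build the argv list for one demo run, using the current Python interpreter."""
    cmd = [sys.executable, DEMO_SCRIPT, "--task_name", task_name]
    if start_url is not None:
        cmd += ["--start_url", start_url]
    return cmd


def run_demos(task_names: list[str]) -> None:
    """Run the demo agent on each task in turn, stopping at the first failing run."""
    for name in task_names:
        subprocess.run(demo_command(name), check=True)
```

For example, `run_demos(["miniwob.click-test", "webarena.4"])` runs the two demos sequentially; `check=True` raises `CalledProcessError` as soon as one exits with a non-zero status.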

You can customize the demo by setting model_name to your preferred LLM (gpt-4o-mini by default), enabling screenshots for vision-language models (VLMs) with use_screenshot, and much more! To list all available options, run:

python demo_agent/run_demo.py --help

🌐 Ecosystem

  • AgentLab: Seamlessly run agents on benchmarks, collect and analyse traces.
  • WorkArena(++): A benchmark for web agents on the ServiceNow platform.
  • WebArena: A benchmark of realistic web tasks on self-hosted domains.
  • VisualWebArena: A benchmark of realistic visual web tasks on self-hosted domains.
  • MiniWoB(++): A collection of over 100 web tasks on synthetic web pages.
  • WebLINX: A dataset of real-world web interaction traces.
  • AssistantBench: A benchmark of realistic and time-consuming tasks on the open web.
  • DoomArena: A framework for AI agent security testing that supports injecting attacks into the web pages of BrowserGym environments.

🌟 Contributors

BrowserGym contributors

📝 Citing This Work

If you wish to cite BrowserGym, please use the following two BibTeX entries:

@article{chezelles2025browsergym,
  title={The BrowserGym Ecosystem for Web Agent Research},
  author={Thibault Le Sellier de Chezelles and Maxime Gasse and Alexandre Lacoste and Massimo Caccia and Alexandre Drouin and L{\'e}o Boisvert and Megh Thakkar and Tom Marty and Rim Assouel and Sahar Omidi Shayegan and Lawrence Keunho Jang and Xing Han L{\`u} and Ori Yoran and Dehan Kong and Frank F. Xu and Siva Reddy and Graham Neubig and Quentin Cappart and Russ Salakhutdinov and Nicolas Chapados},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=5298fKGmv3},
  note={Expert Certification}
}

@inproceedings{workarena2024,
  title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
  author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {11642--11662},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v235/drouin24a.html},
}

Here is an example of how they can be used:

We use the BrowserGym framework for our experiments \cite{workarena2024,chezelles2025browsergym}.
