This directory contains utility scripts for the LiveBench project.
This section covers the E2B backend, the default provider (CODE_SANDBOX_PROVIDER=e2b). The BoxLite backend depends on: pip install "boxlite[sync]>=0.6.0".
The build_e2b_template.py script creates a custom E2B sandbox environment with preinstalled Python packages that are commonly needed for GDPVal tasks. Use this when running E2B (CODE_SANDBOX_PROVIDER=e2b, default). This eliminates the ModuleNotFoundError issues that agents frequently encounter when trying to create documents, spreadsheets, presentations, and PDFs.
Analysis of agent terminal logs (in livebench/data/agent_data/GLM-4.7-test/terminal_logs/) revealed frequent failures due to missing Python packages:
- ModuleNotFoundError: No module named 'pptx'
- ModuleNotFoundError: No module named 'reportlab'
- ModuleNotFoundError: No module named 'docx'
- And many others...
Agents would attempt to install these packages at runtime using pip install, but this:
- Wastes tokens and time
- Doesn't persist across sandbox instances
- Sometimes fails due to network or permission issues
Create a custom E2B sandbox template with all commonly needed packages preinstalled. This is based on:
- GDPVal Task Analysis (220 tasks from gdpval/data/train-00000-of-00001.parquet):
  - Word/DOCX: 126 tasks (57%)
  - PDF: 92 tasks (42%)
  - Excel/XLSX: 81 tasks (37%)
  - Charts/Visualization: 39 tasks (18%)
  - PowerPoint/PPT: 34 tasks (15%)
- Agent Terminal Logs Analysis (from livebench/data/agent_data/*/terminal_logs/):
  - Identified actual package import failures
  - Confirmed which packages are most frequently needed
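The log analysis above can be approximated with a short script. This is a minimal sketch, not the analysis actually used for this project: the function name `count_missing_modules` is hypothetical, and it assumes the logs contain the standard Python `ModuleNotFoundError` message as plain text.

```python
import re
from collections import Counter

# Matches the module name in lines such as:
#   ModuleNotFoundError: No module named 'pptx'
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def count_missing_modules(log_text: str) -> Counter:
    """Count how often each top-level module is reported missing."""
    counts = Counter()
    for name in MISSING_MODULE.findall(log_text):
        # 'docx.shared' and 'docx' both point at the same missing package.
        counts[name.split(".")[0]] += 1
    return counts


# Illustrative input only; real logs would be read from the terminal_logs directory.
sample = "\n".join([
    "ModuleNotFoundError: No module named 'pptx'",
    "ModuleNotFoundError: No module named 'docx'",
    "ModuleNotFoundError: No module named 'docx.shared'",
])
print(count_missing_modules(sample).most_common())
```

Grouping by top-level module keeps submodule failures from being counted as separate packages.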
The custom template includes 19 packages:
Document Creation:
- python-docx - Word documents
- python-pptx - PowerPoint presentations
- reportlab - PDF generation
- PyPDF2 - PDF reading/manipulation

Spreadsheets:
- openpyxl - Excel .xlsx files
- xlsxwriter - Excel writing
- xlrd - Excel .xls reading

Data Manipulation:
- pandas - Data analysis
- numpy - Numerical computing

Visualization:
- matplotlib - Charts and graphs
- seaborn - Statistical visualizations
- plotly - Interactive visualizations

Utilities:
- pillow - Image processing
- requests - HTTP requests
- beautifulsoup4 - HTML parsing
- lxml - XML processing
- python-dateutil - Date/time utilities
- tabulate - Table formatting
- pyyaml - YAML parsing
See what packages will be installed:

    python scripts/build_e2b_template.py --list-packages

Test the configuration without building:

    export E2B_API_KEY=your_api_key_here
    python scripts/build_e2b_template.py --dry-run

Generate the template files (Dockerfile, e2b.toml, build script, README):

    export E2B_API_KEY=your_api_key_here
    python scripts/build_e2b_template.py --alias gdpval-workspace

This creates files in e2b-templates/gdpval-workspace/:

- Dockerfile - Template definition
- e2b.toml - E2B configuration
- build.sh - Build script
- README.md - Template documentation
Three options:

Option A: Using the build script

    cd e2b-templates/gdpval-workspace
    ./build.sh

Option B: Manual build

    # Install E2B CLI
    npm install -g @e2b/cli
    # Login
    e2b login
    # Build
    cd e2b-templates/gdpval-workspace
    e2b template build

Option C: Via E2B Dashboard
- Go to https://e2b.dev/dashboard
- Create new template
- Upload the Dockerfile or paste its contents
- Build and get the template ID
After building, update your .env file:

    CODE_SANDBOX_PROVIDER=e2b
    E2B_TEMPLATE_ID=<your-new-template-id>

Then restart your LiveBench agent.
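Before restarting, it can help to confirm that the settings are actually visible to the process. The following is a sketch of such a pre-flight check, not part of LiveBench: `check_e2b_env` is a hypothetical helper, and treating values that start with `your_` or `<` as unfilled placeholders is an assumption based on the examples in this document.

```python
import os


def check_e2b_env(env=None) -> list[str]:
    """Return a list of configuration problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # e2b is the documented default when the variable is unset.
    provider = env.get("CODE_SANDBOX_PROVIDER", "e2b")
    if provider != "e2b":
        return problems  # other backends are out of scope for this check
    for name in ("E2B_API_KEY", "E2B_TEMPLATE_ID"):
        value = env.get(name, "")
        # Assumption: placeholder values copied from the docs are not valid.
        if not value or value.startswith("your_") or value.startswith("<"):
            problems.append(f"{name} is missing or still a placeholder")
    return problems


for problem in check_e2b_env():
    print(problem)
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.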
After building your template, test that all packages are installed:

    # Test with default template ID from environment
    export E2B_API_KEY=your_api_key_here
    export E2B_TEMPLATE_ID=your_template_id_here
    python scripts/test_e2b_template.py

    # Or specify template ID directly
    python scripts/test_e2b_template.py --template-id tpl_abc123xyz

The test script will:
- Create a sandbox with your template
- Try importing all 19 packages
- Report which packages work and which fail
- Exit with success/failure status
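The import check in the steps above comes down to mapping each pip package name to its import name, since several differ (for example, python-docx is imported as `docx`). This is a minimal sketch of that idea, not the contents of test_e2b_template.py: `IMPORT_NAMES` and `check_imports` are hypothetical names, and the code checks the local interpreter, whereas the real script runs its checks inside the sandbox.

```python
import importlib.util

# pip package name -> top-level import name
IMPORT_NAMES = {
    "python-docx": "docx",
    "python-pptx": "pptx",
    "reportlab": "reportlab",
    "PyPDF2": "PyPDF2",
    "openpyxl": "openpyxl",
    "xlsxwriter": "xlsxwriter",
    "xlrd": "xlrd",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "pillow": "PIL",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "python-dateutil": "dateutil",
    "tabulate": "tabulate",
    "pyyaml": "yaml",
}


def check_imports(import_names: dict) -> dict:
    """Map each package to True if its module can be found, else False."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in import_names.items()
    }


results = check_imports(IMPORT_NAMES)
for package, found in results.items():
    print(f"{'OK     ' if found else 'MISSING'} {package}")
# A real test script would exit non-zero here if any package is missing.
```

`find_spec` locates a module without executing it, which is faster than importing but will not catch packages that are installed yet fail at import time.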

    usage: build_e2b_template.py [-h] [--alias ALIAS] [--dry-run] [--list-packages]

    Build E2B custom sandbox template with preinstalled packages

    optional arguments:
      -h, --help       show this help message and exit
      --alias ALIAS    Template alias name (default: gdpval-workspace)
      --dry-run        Print what would be done without actually building
      --list-packages  List packages that would be installed and exit

    usage: test_e2b_template.py [-h] [--template-id TEMPLATE_ID]

    Test E2B custom template package availability

    optional arguments:
      -h, --help            show this help message and exit
      --template-id TEMPLATE_ID
                            E2B template ID to test (defaults to E2B_TEMPLATE_ID env var)

- Dockerfile: Extends e2bdev/code-interpreter:latest with preinstalled packages
- e2b.toml: E2B template configuration
- build.sh: Convenience script for building the template
- README.md: Template-specific documentation
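To make the Dockerfile entry concrete, here is a sketch of how a package list could be turned into a Dockerfile that extends the base image. The function `render_dockerfile` is hypothetical, and the exact layout of the file that build_e2b_template.py generates may differ. Only the base image name is taken from this document.

```python
def render_dockerfile(
    packages: list[str],
    base_image: str = "e2bdev/code-interpreter:latest",
) -> str:
    """Render Dockerfile text that installs the given pip packages."""
    if not packages:
        return f"FROM {base_image}\n"
    # Sort and de-duplicate so regenerated files produce stable diffs.
    pkg_lines = " \\\n".join(f"    {name}" for name in sorted(set(packages)))
    return (
        f"FROM {base_image}\n\n"
        f"RUN pip install --no-cache-dir \\\n{pkg_lines}\n"
    )


print(render_dockerfile(["python-docx", "openpyxl", "reportlab"]))
```

Installing everything in a single `RUN` step keeps the image to one extra layer.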
Install or reinstall BoxLite sync extras:

    pip install "boxlite[sync]>=0.6.0"

Get your API key from https://e2b.dev/dashboard and set it:

    export E2B_API_KEY=your_api_key_here

Or add it to your .env file.
Install the E2B CLI:

    npm install -g @e2b/cli

To add more packages:

- Edit scripts/build_e2b_template.py
- Add packages to the get_required_packages() function
- Re-run the script to regenerate template files
- Rebuild the template
- E2B Documentation: https://e2b.dev/docs
- E2B Custom Templates: https://e2b.dev/docs/quickstart/install-custom-packages
- E2B Dashboard: https://e2b.dev/dashboard
- livebench/tools/productivity/code_execution_sandbox.py - Supports parallel E2B/BoxLite backends (e2b default)
- explore_gdpval.py - Explores GDPVal task data
- gdpval/data/train-00000-of-00001.parquet - GDPVal task dataset