Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Scripts Directory

This directory contains utility scripts for the LiveBench project.

E2B Custom Sandbox Template Builder

This section is for the E2B backend (default provider: CODE_SANDBOX_PROVIDER=e2b). Default BoxLite backend dependency: pip install "boxlite[sync]>=0.6.0".

Overview

The build_e2b_template.py script creates a custom E2B sandbox environment with preinstalled Python packages that are commonly needed for GDPVal tasks. Use this when running E2B (CODE_SANDBOX_PROVIDER=e2b, default). This eliminates the ModuleNotFoundError issues that agents frequently encounter when trying to create documents, spreadsheets, presentations, and PDFs.

Problem

Analysis of agent terminal logs (in livebench/data/agent_data/GLM-4.7-test/terminal_logs/) revealed frequent failures due to missing Python packages:

  • ModuleNotFoundError: No module named 'pptx'
  • ModuleNotFoundError: No module named 'reportlab'
  • ModuleNotFoundError: No module named 'docx'
  • And many others...

Agents would attempt to install these packages at runtime using pip install, but this:

  1. Wastes tokens and time
  2. Doesn't persist across sandbox instances
  3. Sometimes fails due to network or permission issues

Solution

Create a custom E2B sandbox template with all commonly needed packages preinstalled. This is based on:

  1. GDPVal Task Analysis (220 tasks from gdpval/data/train-00000-of-00001.parquet):

    • Word/DOCX: 126 tasks (57%)
    • PDF: 92 tasks (42%)
    • Excel/XLSX: 81 tasks (37%)
    • Charts/Visualization: 39 tasks (18%)
    • PowerPoint/PPT: 34 tasks (15%)
  2. Agent Terminal Logs Analysis (from livebench/data/agent_data/*/terminal_logs/):

    • Identified actual package import failures
    • Confirmed which packages are most frequently needed

Preinstalled Packages

The custom template includes 19 packages:

Document Creation:

  • python-docx - Word documents
  • python-pptx - PowerPoint presentations
  • reportlab - PDF generation
  • PyPDF2 - PDF reading/manipulation

Spreadsheets:

  • openpyxl - Excel .xlsx files
  • xlsxwriter - Excel writing
  • xlrd - Excel .xls reading

Data Manipulation:

  • pandas - Data analysis
  • numpy - Numerical computing

Visualization:

  • matplotlib - Charts and graphs
  • seaborn - Statistical visualizations
  • plotly - Interactive visualizations

Utilities:

  • pillow - Image processing
  • requests - HTTP requests
  • beautifulsoup4 - HTML parsing
  • lxml - XML processing
  • python-dateutil - Date/time utilities
  • tabulate - Table formatting
  • pyyaml - YAML parsing

Usage

1. List Packages

See what packages will be installed:

python scripts/build_e2b_template.py --list-packages

2. Dry Run

Test the configuration without building:

export E2B_API_KEY=your_api_key_here python scripts/build_e2b_template.py --dry-run

3. Create Template Files

Generate the template files (Dockerfile, e2b.toml, build script, README):

export E2B_API_KEY=your_api_key_here python scripts/build_e2b_template.py --alias gdpval-workspace

This creates files in e2b-templates/gdpval-workspace/:

  • Dockerfile - Template definition
  • e2b.toml - E2B configuration
  • build.sh - Build script
  • README.md - Template documentation

4. Build the Template

Two options:

Option A: Using the build script

cd e2b-templates/gdpval-workspace ./build.sh

Option B: Manual build

# Install E2B CLI npm install -g @e2b/cli # Login e2b login # Build cd e2b-templates/gdpval-workspace e2b template build

Option C: Via E2B Dashboard

  1. Go to https://e2b.dev/dashboard
  2. Create new template
  3. Upload the Dockerfile or paste its contents
  4. Build and get the template ID

5. Update Configuration

After building, update your .env file:

CODE_SANDBOX_PROVIDER=e2b E2B_TEMPLATE_ID=<your-new-template-id>

Then restart your LiveBench agent.

Testing the Template

After building your template, test that all packages are installed:

# Test with default template ID from environment export E2B_API_KEY=your_api_key_here export E2B_TEMPLATE_ID=your_template_id_here python scripts/test_e2b_template.py # Or specify template ID directly python scripts/test_e2b_template.py --template-id tpl_abc123xyz

The test script will:

  1. Create a sandbox with your template
  2. Try importing all 19 packages
  3. Report which packages work and which fail
  4. Exit with success/failure status

Script Options

build_e2b_template.py

usage: build_e2b_template.py [-h] [--alias ALIAS] [--dry-run] [--list-packages] Build E2B custom sandbox template with preinstalled packages optional arguments: -h, --help show this help message and exit --alias ALIAS Template alias name (default: gdpval-workspace) --dry-run Print what would be done without actually building --list-packages List packages that would be installed and exit 

test_e2b_template.py

usage: test_e2b_template.py [-h] [--template-id TEMPLATE_ID] Test E2B custom template package availability optional arguments: -h, --help show this help message and exit --template-id TEMPLATE_ID E2B template ID to test (defaults to E2B_TEMPLATE_ID env var) 

Files Generated

  • Dockerfile: Extends e2bdev/code-interpreter:latest with preinstalled packages
  • e2b.toml: E2B template configuration
  • build.sh: Convenience script for building the template
  • README.md: Template-specific documentation

Troubleshooting

"SyncCodeBox import failed"

Install or reinstall BoxLite sync extras:

pip install "boxlite[sync]>=0.6.0"

"E2B_API_KEY environment variable is not set"

Get your API key from https://e2b.dev/dashboard and set it:

export E2B_API_KEY=your_api_key_here

Or add it to your .env file.

"e2b CLI not found"

Install the E2B CLI:

npm install -g @e2b/cli

Adding More Packages

  1. Edit scripts/build_e2b_template.py
  2. Add packages to the get_required_packages() function
  3. Re-run the script to regenerate template files
  4. Rebuild the template

References

Related Files

  • livebench/tools/productivity/code_execution_sandbox.py - Supports parallel E2B/BoxLite backends (e2b default)
  • explore_gdpval.py - Explores GDPVal task data
  • gdpval/data/train-00000-of-00001.parquet - GDPVal task dataset