CausalVideoAnnotator is designed to support researchers and practitioners in labeling videos with causal structures, beyond conventional frame-level tags. It enables annotators to capture attributes and affordances by linking objects in a structured format. The toolkit is tailored for building high-quality datasets for causal reasoning, vision-language-action models, and embodied AI research. This toolkit was initiated by Yushun Xiang and Dingxiang Luo.
First, download the required model checkpoints. SAM-2.1 (or SAM-2) checkpoints can be obtained by running:
```bash
chmod +x hfd.sh
./hfd.sh facebook/sam2.1-hiera-large --local-dir facebook/sam2.1-hiera-large
```

Tips:
- To use the Hugging Face model downloader (`hfd.sh`), make sure `aria2` is installed beforehand.
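After the download finishes, a quick sanity check is possible. This is a minimal sketch that assumes the `--local-dir` target from the command above; `checkpoint_ready` is a hypothetical helper, not part of the toolkit:

```python
from pathlib import Path

# Assumed download target from the hfd.sh command above.
CKPT_DIR = Path("facebook/sam2.1-hiera-large")

def checkpoint_ready(ckpt_dir: Path = CKPT_DIR) -> bool:
    """Heuristic check: the directory exists and contains at least one entry."""
    return ckpt_dir.is_dir() and any(ckpt_dir.iterdir())
```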
Create a directory for your videos and place the files to be annotated inside:

```bash
mkdir videos
```

Then move the videos you want to annotate into the `videos` folder.
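Since the backend expects clips under `videos/<task>/<view>/`, a short script can organize them. This is a sketch assuming loose clips sit in the current directory; `demo_task` and `front_view` are placeholder names:

```python
from pathlib import Path
import shutil

# Placeholder task/view names; replace with your own labels.
dest = Path("videos") / "demo_task" / "front_view"
dest.mkdir(parents=True, exist_ok=True)

# Move any loose .mp4 clips from the current directory into place.
for clip in Path(".").glob("*.mp4"):
    shutil.move(str(clip), dest / clip.name)
```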
This project uses uv to manage the Python virtual environment.
```bash
uv sync
uv run backend.py
```

Then open http://localhost:8000 in your browser.

- `backend.py`: FastAPI app entrypoint; serves `/static` and the API.
- `src/`: Python modules (`sam2_pipeline.py`, `utils.py`, components, views).
- `static/`: Frontend assets (`index.html`, `*.js`, `styles.css`).
- `videos/` and `annotations/`: created at runtime; local storage for media and results. Place videos under `videos/<task>/<view>/file.mp4`.
- Other helpers: `transfer_video_files.py`, `utils.md`, `tools.txt`.
- Create env (Python 3.10+):

```bash
python -m venv .venv && source .venv/bin/activate
```

- Install deps (minimal):

```bash
pip install fastapi uvicorn opencv-python numpy torch torchvision Pillow python-multipart
```

- Optional models/tools: SAM2/ByteTrack as noted in `README.md`.
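A stdlib-only check can confirm the minimal dependencies are importable; note that the import names for `opencv-python` and `Pillow` differ from their pip package names. A small sketch:

```python
from importlib.util import find_spec

# Import names corresponding to the pip packages listed above
# (opencv-python -> cv2, Pillow -> PIL).
required = ("fastapi", "uvicorn", "cv2", "numpy", "torch", "torchvision", "PIL")
missing = [name for name in required if find_spec(name) is None]
print("missing modules:", ", ".join(missing) if missing else "none")
```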
- Run backend:

```bash
python backend.py
```

(serves at `http://localhost:8000`, static at `/static`).

- Alternative run:

```bash
uvicorn backend:app --reload
```

(from repo root).
- Python: PEP 8, 4-space indent, type hints where practical. Keep functions small and pure in `src/`.
- JavaScript: camelCase for identifiers; keep files lowercase with hyphens when multiword (e.g., `file-browser.js`).
- Paths: use `Path`/`os.path` and never assume absolute paths under `videos/` or `annotations/`.
- Docstrings/comments: concise; explain why over what.
- Framework: pytest (suggested). Place tests under `tests/`, named `test_*.py`.
- Quick run: `pytest -q` (after `pip install pytest`).
- Aim for unit tests around `src/utils.py` and integration tests hitting FastAPI routes with `httpx`/`TestClient`.
- Commits: imperative mood, concise subject. Prefer prefix tags when relevant: `[feat]`, `[fix]`, `[chore]`. Keep one logical change per commit.
- PRs: include summary, rationale, screenshots for UI changes, and reproduction steps. Link related issues and list test coverage or manual checks.
- Large models (SAM/trackers) are optional but GPU-recommended; document your setup.
- Backend enforces listing under `videos/`; keep inputs within that tree to avoid traversal.
- Do not commit large media or model weights; use `.gitignore` and external storage.
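The listing restriction can be enforced with a resolve-and-compare check. `resolve_listing_path` is a hypothetical helper sketching the idea, not the backend's actual code:

```python
from pathlib import Path

VIDEOS_ROOT = Path("videos").resolve()

def resolve_listing_path(user_path: str) -> Path:
    """Resolve a client-supplied relative path, rejecting escapes from videos/."""
    candidate = (VIDEOS_ROOT / user_path).resolve()
    # resolve() collapses ".." segments, so a prefix check on the result suffices.
    if not candidate.is_relative_to(VIDEOS_ROOT):
        raise ValueError(f"path escapes videos/: {user_path}")
    return candidate
```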

