robit-man/Video-Depth-Anything-Live

Video Depth Anything ~ Livestream

Additions:

  • Remote Inference on camera stream (unstable and insecure, but functional!)
  • VR-compatible frontend for experimental telepresence in three.js
  • Signaling server for establishing communication between inference and client

First pull model weights

bash get_weights.sh

Run Automatic VENV Setup + Flask Output of Depth Estimation from Camera Stream

python3 auto.py
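A Flask camera-stream endpoint like the one above typically serves JPEG-encoded depth frames as a `multipart/x-mixed-replace` response. A minimal stdlib sketch of that framing (auto.py's actual route names and frame source may differ; this is illustrative only):

```python
# Sketch of the MJPEG framing a Flask depth-stream endpoint commonly uses.
# The frame source here is a fake JPEG payload; auto.py would encode real
# depth frames from the camera instead.
def mjpeg_chunks(jpeg_frames):
    """Wrap each JPEG frame in multipart/x-mixed-replace boundaries."""
    for frame in jpeg_frames:
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + frame + b"\r\n")

# With Flask, this generator would back a route roughly like:
#   return Response(mjpeg_chunks(camera_frames()),
#                   mimetype="multipart/x-mixed-replace; boundary=frame")
chunks = list(mjpeg_chunks([b"\xff\xd8fake-jpeg\xff\xd9"]))
print(chunks[0].startswith(b"--frame"))  # True
```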

Run Automatic Optimized Camera Stream Estimation and Output

python3 auto_fast.py

Run Remote Inference Capable Server Demo

  1. Create a Glitch Account

  2. Create a Project

  3. Upload the contents of the signaling folder

  4. Replace the signaling server URL in server.py (line 278) with your Glitch project's URL

  5. Replace the server link on line 19 of client.js with the same URL

  6. Upload the contents of the Client folder to a new project

  7. Run the command below, then visit the URL of the client Glitch project to connect and facilitate remote inference on your local camera stream

python3 server.py
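Conceptually, the signaling server only relays connection metadata (offers and answers) between the inference server and the browser client so the two peers can find each other. A minimal in-memory sketch of that relay role (the real project uses a Node app hosted on Glitch; the class and method names here are illustrative assumptions):

```python
import json

class SignalingRelay:
    """Holds the latest offer/answer per room so two peers can exchange them."""
    def __init__(self):
        self.rooms = {}

    def post(self, room, role, payload):
        # A peer deposits its session description under its role.
        self.rooms.setdefault(room, {})[role] = payload

    def get(self, room, role):
        # The other peer polls for the counterpart's description.
        return self.rooms.get(room, {}).get(role)

relay = SignalingRelay()
offer = json.dumps({"sdp": "..."})   # placeholder session description
relay.post("demo", "offer", offer)
print(relay.get("demo", "offer") == offer)  # True
```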

ORIGINAL REPOSITORY README CONTENT


Sili Chen · Hengkai Guo · Shengnan Zhu · Feihu Zhang
Zilong Huang · Jiashi Feng · Bingyi Kang
ByteDance
†Corresponding author

Paper PDF Project Page

This work presents Video Depth Anything, based on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with other diffusion-based models, it offers faster inference, fewer parameters, and more consistent depth accuracy.


News

  • 2025-02-08: Enable autocast inference. Support grayscale video, NPZ and EXR output formats.
  • 2025-01-21: Paper, project page, code, models, and demo are all released.

Release Notes

  • 2025-02-08: 🚀🚀🚀 Inference speed and memory usage improvement

    | Model | Latency FP32 (ms) | Latency FP16 (ms) | VRAM FP32 (GB) | VRAM FP16 (GB) |
    |---|---|---|---|---|
    | Video-Depth-Anything-V2-Small | 9.1 | 7.5 | 7.3 | 6.8 |
    | Video-Depth-Anything-V2-Large | 67 | 14 | 26.7 | 23.6 |

    The Latency and GPU VRAM results are obtained on a single A100 GPU with input of shape 1 x 32 x 518 × 518.
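Since each forward pass processes 32 frames (the 1 x 32 x 518 × 518 input above), the table's latencies imply throughput far above real time. A quick back-of-envelope check:

```python
# Frames per forward pass used in the benchmark above.
frames = 32

# FP16 latencies (ms) taken from the table.
latencies_ms = {
    "Video-Depth-Anything-V2-Small": 7.5,
    "Video-Depth-Anything-V2-Large": 14.0,
}

for model, ms in latencies_ms.items():
    fps = frames / (ms / 1000.0)   # convert ms -> s, then frames per second
    print(f"{model}: ~{fps:.0f} frames/s")
```

Actual end-to-end throughput will be lower once video decoding, preprocessing, and output encoding are included.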

Pre-trained Models

We provide two models of varying scales for robust and consistent video depth estimation:

| Model | Params | Checkpoint |
|---|---|---|
| Video-Depth-Anything-V2-Small | 28.4M | Download |
| Video-Depth-Anything-V2-Large | 381.8M | Download |

Usage

Preparation

git clone https://github.com/DepthAnything/Video-Depth-Anything
cd Video-Depth-Anything
pip install -r requirements.txt

Download the checkpoints listed here and put them under the checkpoints directory.

bash get_weights.sh

Inference a video

python3 run.py --input_video ./assets/example_videos/davis_rollercoaster.mp4 --output_dir ./outputs --encoder vitl

Options:

  • --input_video: path to the input video
  • --output_dir: directory to save the output results
  • --input_size (optional): input size for model inference; default 518
  • --max_res (optional): maximum resolution for model inference; default 1280
  • --encoder (optional): vits for Video-Depth-Anything-V2-Small, vitl for Video-Depth-Anything-V2-Large
  • --max_len (optional): maximum length of the input video; -1 means no limit
  • --target_fps (optional): target fps of the input video; -1 means the original fps
  • --fp32 (optional): use fp32 precision for inference; fp16 is the default
  • --grayscale (optional): save the grayscale depth map without applying a color palette
  • --save_npz (optional): save the depth map in NPZ format
  • --save_exr (optional): save the depth map in EXR format
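For downstream use, an NPZ output can be read back with NumPy. A small round-trip sketch (the key name `depth` inside the archive is an assumption here; check `archive.files` if your output uses a different key):

```python
import numpy as np

# Simulate a saved depth map like run.py's --save_npz output
# (the archive's actual key name may differ; 'depth' is assumed).
depth = np.random.rand(518, 518).astype(np.float32)
np.savez("depth_demo.npz", depth=depth)

with np.load("depth_demo.npz") as archive:
    print(archive.files)          # list the keys stored in the archive
    restored = archive["depth"]

print(restored.shape, restored.dtype)  # (518, 518) float32
```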

Citation

If you find this project useful, please consider citing:

@article{video_depth_anything,
  title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
  author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
  journal={arXiv:2501.12375},
  year={2025}
}

LICENSE

Video-Depth-Anything-Small model is under the Apache-2.0 license. Video-Depth-Anything-Large model is under the CC-BY-NC-4.0 license. For business cooperation, please send an email to Hengkai Guo at guohengkaighk@gmail.com.
