```
conda create -n pvsm python=3.11
conda activate pvsm

# Install torch and torchvision based on your environment configuration
pip install -r requirements.txt
```

There is a known issue with the current release of gsplat==1.5.3, so please install gsplat from source for now:
```
# Install gsplat from source
pip install git+https://github.com/nerfstudio-project/gsplat.git
```

Download DINOv3-ViT-B and place it under `metric_checkpoints/`.
Download our pre-trained model checkpoints:
After downloading, organize your checkpoints directory as follows:
```
metric_checkpoints/
├── pvsm_finetuned_full.pt            # Our trained full 24-layer model
├── pvsm_finetuned_small.pt           # Our trained smaller 12-layer model
├── dinov3-vitb16-pretrain-lvd1689m   # DINOv3 checkpoint
│   ├── config.json
│   ├── LICENSE.md
│   ├── model.safetensors
│   ├── preprocessor_config.json
│   └── README.md
├── imagenet-vgg-verydeep-19.mat      # (Optional) for training
└── map-anything                      # (Optional) for dataset generation
    ├── config.json
    ├── model.safetensors
    └── README.md
```

For a quick interactive demo, please follow the instructions and unzip the downloaded example data (22.3 MB) to your local machine.
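Before running anything, it can help to sanity-check the layout above. The snippet below is a hypothetical stdlib-only helper (not part of the repo) that reports which required checkpoint files are missing:

```python
# Hypothetical helper (not part of the repo): verify that the required
# checkpoint files from the layout above are present under metric_checkpoints/.
from pathlib import Path

REQUIRED = [
    "pvsm_finetuned_full.pt",
    "pvsm_finetuned_small.pt",
    "dinov3-vitb16-pretrain-lvd1689m/config.json",
    "dinov3-vitb16-pretrain-lvd1689m/model.safetensors",
    "dinov3-vitb16-pretrain-lvd1689m/preprocessor_config.json",
]

def missing_checkpoints(root="metric_checkpoints"):
    """Return the required paths that are missing under `root`."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", *missing, sep="\n  ")
    else:
        print("All required checkpoints found.")
```

The optional files (`imagenet-vgg-verydeep-19.mat`, `map-anything`) are deliberately left out of the check, since they are only needed for training and dataset generation.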
To launch the interactive web-based demo:
```
torchrun --nproc_per_node 1 --standalone viser_demo.py --config-name runs/pvsm_finetuned_small
```

The demo starts a web server. Open your browser and navigate to the displayed URL to interact with the model.
System Requirements:
- Small model: ~2.5GB VRAM
- Full model: ~3.0GB VRAM
Note: The gsplat rendering served by the demo is compressed, so the displayed quality may be lower than the model's actual output.
To run inference on a dataset:
```
python inference.py --config-name runs/pvsm_finetuned_small
```

Or, for the full model:
```
python inference.py --config-name runs/pvsm_finetuned_full
```

To train the model:
```
torchrun --nproc_per_node <num_gpus> train.py --config-name runs/pvsm_finetuned_small
```

Configuration:
- Training configurations are located in `configs/runs/`
- Model configurations are in `configs/model/`
- Dataset configurations are in `configs/dataset/`
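The `--config-name` flag above suggests a Hydra-style config tree. As a rough illustration only (the key names below are assumptions, not the project's actual schema), a run config under `configs/runs/` would typically compose a model and dataset config via a defaults list:

```yaml
# Hypothetical sketch of a run config -- key names are assumptions,
# not the project's real schema.
defaults:
  - /model: pvsm_small      # picked from configs/model/
  - /dataset: example_set   # picked from configs/dataset/

training:
  batch_size: 4
  max_steps: 100000
```

Individual values can then be overridden on the command line in the usual Hydra style, e.g. `train.py --config-name runs/pvsm_finetuned_small training.batch_size=2`.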
API Keys: Before training, create configs/api_keys.yaml with your WandB API key:
```
wandb: YOUR_WANDB_KEY
```

You can use `configs/api_keys_example.yaml` as a template.
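As a minimal sketch, a hypothetical helper (not part of the repo) can read the key back with the standard library alone, assuming the simple one-line `wandb: KEY` format shown above:

```python
# Hypothetical helper: read the WandB key from the one-line
# "wandb: YOUR_WANDB_KEY" format shown above (stdlib only, no PyYAML).
from pathlib import Path

def read_wandb_key(path="configs/api_keys.yaml"):
    """Return the wandb key from `path`, or None if absent."""
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "wandb":
            return value.strip() or None
    return None
```

A full YAML parser (e.g. PyYAML's `yaml.safe_load`) is the more robust choice if the file grows beyond this single key.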
If you find this work useful in your research, please consider citing:
```
@article{wu_pvsm_2026,
  title={From Rays to Projections: Better Inputs for Feed-Forward View Synthesis},
  author={Wu, Zirui and Jiang, Zeren and Oswald, Martin R. and Song, Jie},
  journal={arXiv preprint arXiv:2601.05116},
  year={2026}
}
```

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE.md for details.
This work is built upon the LVSM codebase.
