
Large-Vocabulary 3D Diffusion Model with Transformer

S-Lab, Nanyang Technological University; The Chinese University of Hong Kong; Shanghai AI Laboratory

DiffTF can generate large-vocabulary 3D objects with rich semantics and realistic texture.

📖 For more visual results, check out our project page.

Installation

Clone this repository and navigate to it in your terminal. Then run:

bash install_difftf.sh

This will install the Python packages that the scripts depend on.

Preparing data

Training

I. Triplane fitting

1. Training the shared decoder
```bash
conda activate difftf
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Omniobject3D: checkpoints are saved in ./Checkpoint/omni_sharedecoder
python -m torch.distributed.launch --nproc_per_node 8 ./Triplanerecon/train.py \
    --config ./Triplanerecon/configs/omni/train.txt \
    --datadir ./dataset/Omniobject3D/renders \
    --basedir ./Checkpoint \
    --expname omni_sharedecoder

# ShapeNet: checkpoints are saved in ./Checkpoint/shapenet_sharedecoder
python -m torch.distributed.launch --nproc_per_node 8 ./Triplanerecon/train.py \
    --config ./Triplanerecon/configs/shapenet_car/train.txt \
    --datadir ./dataset/ShapeNet/renders_car \
    --basedir ./Checkpoint \
    --expname shapenet_sharedecoder
```
2. Triplane fitting
```bash
conda activate difftf

# Omniobject3D: fitted triplanes are saved in ./Checkpoint/omni_triplane
# --num_gpu 1 --idx 0 fits all triplanes on a single GPU
# --decoderdir points to the shared-decoder checkpoint from step 1
python ./Triplanerecon/train_single_omni.py \
    --config ./Triplanerecon/configs/omni/train_single.txt \
    --num_gpu 1 --idx 0 \
    --datadir ./dataset/Omniobject3D/renders \
    --basedir ./Checkpoint \
    --expname omni_triplane \
    --decoderdir ./Checkpoint/omni_sharedecoder/300000.tar

# ShapeNet: fitted triplanes are saved in ./Checkpoint/shapenet_triplane
python ./Triplanerecon/train_single_shapenet.py \
    --config ./Triplanerecon/configs/shapenet_car/train_single.txt \
    --num_gpu 1 --idx 0 \
    --datadir ./dataset/ShapeNet/renders_car \
    --basedir ./Checkpoint \
    --expname shapenet_triplane \
    --decoderdir ./Checkpoint/shapenet_sharedecoder/300000.tar

# To fit triplanes with 8 GPUs instead:
bash multi_omni.sh 8
bash multi_shapenet.sh 8
```

Note: All related hyperparameters and settings are specified in the config files, which can be found in ./configs/shapenet or ./configs/omni.
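The code builds on nerf-pytorch (see Acknowledgement below), whose config files are plain key = value text parsed by configargparse, with keys mirroring the command-line flags above. A purely illustrative sketch, not the shipped file; the actual configs contain additional keys:

```text
# illustrative config sketch (keys mirror the CLI flags; contents assumed)
expname = omni_sharedecoder
basedir = ./Checkpoint
datadir = ./dataset/Omniobject3D/renders
```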

3. Preparing triplane for diffusion
```bash
# Prepare the fitted triplanes for diffusion training.
# --mode is the dataset name (omni or shapenet).
python ./Triplanerecon/extract.py \
    --basepath ./Checkpoint/omni_triplane \
    --mode omni \
    --newpath ./Checkpoint/omni_triplane_fordiffusion
```

II. Training Diffusion

```bash
cd ./3dDiffusion
export PYTHONPATH=$PWD:$PYTHONPATH
conda activate difftf
cd scripts

# Checkpoints are saved in ./Checkpoint/difftf_omni
python image_train.py \
    --datasetdir ./Checkpoint/omni_triplane_fordiffusion \
    --expname difftf_omni
```

You may also want to train in a distributed manner. In this case, run the same command with mpiexec:

```bash
mpiexec -n 8 python image_train.py \
    --datasetdir ./Checkpoint/omni_triplane_fordiffusion \
    --expname difftf_omni
```

Note: Training hyperparameters are set in image_train.py, while architecture hyperparameters are set in ./improved_diffusion/script_util.py.

Note: Our fitted triplanes can be downloaded via this link.

Inference

I. Sampling triplane using trained diffusion

Our pre-trained model can be found in difftf_checkpoint/omni.

```bash
# Generated triplanes are saved in --save_path
python image_sample.py \
    --model_path ./Checkpoint/difftf_omni/model.pt \
    --num_samples=5000 \
    --save_path ./Checkpoint/difftf_omni
```
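The sampler writes the generated triplanes as a NumPy .npz archive (e.g. samples_5000x18x256x256.npz, the file referenced in the rendering step below). A minimal sketch for inspecting such an archive; the helper name and the assumption that the triplanes are the first stored array are ours, not the repo's:

```python
import numpy as np

def load_triplanes(path):
    """Load generated triplanes from an .npz archive (first stored array)."""
    with np.load(path) as data:
        return data[data.files[0]]

# Example, using the path produced by the sampling command above:
# arr = load_triplanes("./Checkpoint/difftf_omni/samples_5000x18x256x256.npz")
# The filename suggests arr.shape == (5000, 18, 256, 256):
# 5000 samples, each an 18-channel 256x256 triplane feature map.
```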

II. Rendering triplane using shared decoder

Our pre-trained shared decoder can be found in difftf_checkpoint/triplane decoder.zip.

```bash
# Omniobject3D: results are saved in ./Checkpoint/ddpm_omni_vis
# --ft_path:      checkpoint of the shared decoder
# --triplanepath: generated triplanes from the sampling step
# --mesh 0:       whether to also export meshes
# --testvideo:    save all rendered images as a video
python ddpm_vis.py \
    --config ./configs/omni/ddpm.txt \
    --ft_path ./Checkpoint/omni_triplane_fordiffusion/003000.tar \
    --triplanepath ./Checkpoint/difftf_omni/samples_5000x18x256x256.npz \
    --basedir ./Checkpoint \
    --expname ddpm_omni_vis \
    --mesh 0 \
    --testvideo

# ShapeNet: results are saved in ./Checkpoint/ddpm_shapenet_vis
python ddpm_vis.py \
    --config ./configs/shapenet_car/ddpm.txt \
    --ft_path ./Checkpoint/shapenet_car_triplane_fordiffusion/003000.tar \
    --triplanepath ./Checkpoint/difftf_shapenet/samples_5000x18x256x256.npz \
    --basedir ./Checkpoint \
    --expname ddpm_shapenet_vis \
    --mesh 0 \
    --testvideo
```
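The 18 feature channels in each generated sample correspond to the tri-plane representation. Assuming they are the concatenation of three axis-aligned planes with 6 features each, which we infer from the archive shape rather than from the repo, splitting them is a single reshape:

```python
import numpy as np

def split_planes(triplane, n_planes=3):
    """Split a (C, H, W) triplane feature map into n_planes of C // n_planes channels."""
    c, h, w = triplane.shape
    assert c % n_planes == 0, "channel count must divide evenly across planes"
    return triplane.reshape(n_planes, c // n_planes, h, w)

planes = split_planes(np.zeros((18, 256, 256), dtype=np.float32))
print(planes.shape)  # (3, 6, 256, 256)
```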

References

If you find DiffTF useful for your work, please cite:

```bibtex
@article{cao2023large,
  title   = {Large-Vocabulary 3D Diffusion Model with Transformer},
  author  = {Cao, Ziang and Hong, Fangzhou and Wu, Tong and Pan, Liang and Liu, Ziwei},
  journal = {arXiv preprint arXiv:2309.07920},
  year    = {2023}
}
```
Acknowledgement

The code is built on improved-diffusion and nerf-pytorch. We would like to express our sincere thanks to their contributors.

🗞️ License

Distributed under the S-Lab License. See LICENSE for more information.

About

Official PyTorch implementation of DiffTF (accepted by ICLR 2024).