snap-research/dfm
[NeurIPS 2025] Decomposable Flow Matching

Improving Progressive Generation with Decomposable Flow Matching

Project Page arXiv

Moayed Haji-Ali*, Willi Menapace*, Ivan Skorokhodov, Arpit Sahni, Sergey Tulyakov, Vicente Ordonez, Aliaksandr Siarohin

Snap Research & Rice University


πŸš€ Check Out Our Latest Work! πŸŽ₯πŸ”Š

One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
Learn how dynamic compute allocation in DiTs can accelerate convergence by up to 2.5× while letting a single model operate flexibly across a wide range of inference compute budgets. This repository also contains the training and inference scripts for ELIT.

TL;DR: Decomposable Flow Matching (DFM) is a simple framework to progressively generate visual modalities scale-by-scale, achieving up to 50% faster convergence compared to Flow Matching. DFM applies flow matching independently at each level of a multi-scale representation (e.g., a Laplacian pyramid) in an end-to-end fashion, staying compatible with standard flow-matching pipelines while improving quality and convergence speed.
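As a concrete illustration of the multi-scale representation mentioned above, here is a minimal Laplacian-pyramid decomposition in PyTorch. This is an illustrative sketch, not the repo's actual code; DFM's decomposition may use different filters and scale counts.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(x, num_scales=2):
    """Split x (B, C, H, W) into num_scales bands: successively finer
    detail bands plus a coarse low-pass residual. DFM applies flow
    matching independently to each such band."""
    bands = []
    cur = x
    for _ in range(num_scales - 1):
        down = F.avg_pool2d(cur, kernel_size=2)
        up = F.interpolate(down, scale_factor=2, mode="nearest")
        bands.append(cur - up)  # detail band at this scale
        cur = down
    bands.append(cur)           # coarsest band
    return bands                # finest-first

def reconstruct(bands):
    """Invert the pyramid: upsample the coarse band and add back details."""
    cur = bands[-1]
    for detail in reversed(bands[:-1]):
        cur = F.interpolate(cur, scale_factor=2, mode="nearest") + detail
    return cur
```

With nearest-neighbor up/downsampling the decomposition is exactly invertible, which is what lets the per-scale flows compose back into the full signal.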


Disclaimer

This repo provides a reimplementation of DFM on top of SiT, following the REPA setup. The architecture does not exactly match the one used in the paper, and results may differ. Below, we provide a comparison between SiT and DFM produced with this repo.


Method Implementation

Decomposable Flow Matching (DFM) combines multiscale decomposition with Flow Matching. DFM progressively synthesizes different representation scales by generating the coarse-structure scale first and incrementally refining it with finer scales.

  • DFM Architecture: Modifies the DiT architecture to use per-scale patchification and timestep-embedding layers while keeping the core DiT architecture untouched.
  • DFM Training: Samples the stage count from a categorical distribution, draws each stage flow-timestep from a logit-normal distribution biased toward lower noise in early stages, and trains one DiT backbone to jointly predict all stage-wise velocities.
  • DFM Inference: Denoises the coarsest stage first for T₁ steps and activates each subsequent stage once the preceding one reaches a predetermined per-stage noise threshold.
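The training-time sampling described above can be sketched as follows. This is a hypothetical illustration: the stage weights mirror the repo's defaults, but the exact categorical/logit-normal parameterization (e.g., the per-stage mean shifts) is an assumption, not the paper's.

```python
import torch

def sample_stage_timesteps(batch_size, stage_weights=(0.9, 0.1), scale=1.0):
    """Sketch of DFM-style stage/timestep sampling.

    - The number of active stages is drawn from a categorical
      distribution over stage_weights.
    - Each stage's flow timestep is drawn from a logit-normal
      distribution; coarser stages are biased toward lower noise by
      shifting the mean of the underlying normal (shift values are
      illustrative).
    """
    num_stages = len(stage_weights)
    weights = torch.tensor(stage_weights)
    # How many stages are active for each sample in the batch (1-indexed)
    active = torch.multinomial(weights, batch_size, replacement=True) + 1

    # Logit-normal timesteps: t = sigmoid(z), z ~ N(shift_s, scale)
    shifts = torch.linspace(-1.0, 1.0, num_stages)  # coarse -> low noise
    z = shifts[None, :] + scale * torch.randn(batch_size, num_stages)
    t = torch.sigmoid(z)  # one timestep per stage, in (0, 1)
    return active, t
```

A single DiT backbone then receives all active bands with their per-stage timestep embeddings and is trained to predict the stage-wise velocities jointly.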

Experimental Results

| Method        | FID   | sFID | IS    | Precision | Recall |
|---------------|-------|------|-------|-----------|--------|
| SiT-XL/2      | 33.24 | 8.55 | 48.22 | 0.308     | 0.581  |
| DFM-SiT-XL/2  | 18.27 | 6.51 | 85.50 | 0.452     | 0.557  |

Pretrained checkpoints of the above experiments will be released soon.


1. Environment Setup

```bash
conda create -n dfm python=3.9 -y
conda activate dfm
pip install -r requirements.txt
```

2. Dataset

2.1 Dataset Download

Download ImageNet. Then run the following processing and VAE latent extraction scripts.

```bash
# Convert raw ImageNet data to a ZIP archive at 256x256 resolution
python dataset_tools.py convert \
  --source=[YOUR_DOWNLOAD_PATH]/ILSVRC/Data/CLS-LOC/train \
  --dest=[TARGET_PATH]/images \
  --resolution=256x256 \
  --transform=center-crop-dhariwal

# Convert the pixel data to VAE latents
python dataset_tools.py encode \
  --source=[TARGET_PATH]/images \
  --dest=[TARGET_PATH]/vae-sd
```

Here, YOUR_DOWNLOAD_PATH is the directory where you downloaded the dataset, and TARGET_PATH is the directory where you will save the preprocessed images and corresponding compressed latent vectors. This directory will be used for your experiment scripts.


3. Training

Training uses the unified train.py script with YAML configuration files or CLI arguments. Update data_dir in the config to point to your data directory.

```bash
# From CLI args
accelerate launch train.py --model [MODEL_NAME] --exp-name [EXP_NAME] --data-dir [DATA_DIR]

# Or from a YAML config
accelerate launch train.py --config [CONFIG_PATH] --data-dir [DATA_DIR]
```

where [MODEL_NAME] can be specified as a SiT or DFM-SiT model (e.g., SiT-XL/2 or DFM-SiT-XL/2).

Sample training configurations can be found in experiments/train

Example Training

```bash
# From CLI args
accelerate launch train.py --model DFM-SiT-XL/2 --exp-name dfm-sit-xl-2-256px --data-dir [DATA_DIR]

# Or from a YAML config
accelerate launch train.py --config experiments/train/dfm_sit_b_256.yaml --data-dir [DATA_DIR]
```

Key DFM Hyperparameters

The main DFM-specific options to adjust are:

| Parameter                 | Description                                                        | Default     |
|---------------------------|--------------------------------------------------------------------|-------------|
| model                     | Model architecture: SiT-B/2, SiT-XL/2, DFM-SiT-B/2, DFM-SiT-XL/2, etc. | —       |
| stages_count              | Number of stages in DFM                                            | 2           |
| stage_weights             | Sampling weight of each stage during training                      | [0.9, 0.1]  |
| num_steps_per_scale       | Number of inference steps for each stage                           | [40, 10]    |
| stage_sampling_thresholds | Noise threshold at which the next stage's generation starts        | [0.1]       |

Please refer to the paper for guidelines on choosing DFM hyperparameters.
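Put together, a training config fragment using these options might look like the following. This is a hypothetical sketch built from the table above; the actual schema and field names live in the configs under experiments/train.

```yaml
# Hypothetical DFM config fragment; see experiments/train/ for real configs.
model: DFM-SiT-XL/2
stages_count: 2
stage_weights: [0.9, 0.1]
num_steps_per_scale: [40, 10]
stage_sampling_thresholds: [0.1]
data_dir: [DATA_DIR]
```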


4. Sampling

Sampling uses the unified generate.py script with DDP:

4.1 SiT

```bash
# From CLI args
torchrun --nproc_per_node=8 generate.py \
  --model SiT-B/2 --ckpt exps/sit-b-2-256px/checkpoints/0400000.pt

# Or from a YAML config
torchrun --nproc_per_node=8 generate.py \
  --config experiments/generation/sit_b_256.yaml \
  --ckpt exps/sit-b-2-256px/checkpoints/0400000.pt
```

4.2 DFM-SiT

```bash
# From CLI args
torchrun --nproc_per_node=8 generate.py \
  --model DFM-SiT-B/2 --ckpt exps/dfm-sit-b-2-256px/checkpoints/0400000.pt

# Or from a YAML config
torchrun --nproc_per_node=8 generate.py \
  --config experiments/generation/dfm_sit_b_256.yaml \
  --ckpt exps/dfm-sit-b-2-256px/checkpoints/0400000.pt
```

5. Evaluation

We provide evaluation scripts in experiments/evaluation/ that generate samples and compute FID, sFID, IS, Precision, and Recall.

```bash
bash experiments/evaluation/eval_dfm_sit_b_256.sh
```

This will generate samples under the results/ directory and write an .npz file that can be used for evaluation. To run the reference TensorFlow evaluation on ImageNet, we use the ADM evaluation suite.
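Before handing the .npz file to the reference evaluator, a quick sanity check can catch resolution or dtype mismatches. This assumes the common ADM-style layout (a single uint8 array of shape (N, H, W, 3), typically stored under "arr_0"); verify against the actual file produced by this repo.

```python
import numpy as np

def check_samples(path, expected_res=256):
    """Sanity-check a generated sample archive before evaluation.

    Assumes the ADM-style convention: one uint8 array of shape
    (N, H, W, 3) stored under the archive's first key.
    """
    data = np.load(path)
    arr = data[list(data.keys())[0]]
    assert arr.dtype == np.uint8, f"expected uint8, got {arr.dtype}"
    assert arr.ndim == 4 and arr.shape[3] == 3, f"unexpected shape {arr.shape}"
    assert arr.shape[1] == arr.shape[2] == expected_res, f"unexpected resolution {arr.shape[1:3]}"
    return arr.shape[0]  # number of samples
```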

Note: Please make sure that the model hyperparameters match the training ones, and refer to the paper for guidelines on choosing DFM inference hyperparameters.


Acknowledgement

This code is mainly built upon REPA. We thank the authors for open-sourcing their codebase.


BibTeX

```bibtex
@article{dfm,
  title={Improving Progressive Generation with Decomposable Flow Matching},
  author={Moayed Haji-Ali and Willi Menapace and Ivan Skorokhodov and Arpit Sahni and Sergey Tulyakov and Vicente Ordonez and Aliaksandr Siarohin},
  journal={arXiv preprint arXiv:2506.19839},
  year={2025}
}
```
