
[docs] Add NeMo Automodel training guide#13306

Draft
pthombre wants to merge 5 commits into huggingface:main from pthombre:automodel_docs

Conversation

@pthombre


What does this PR do?

Adds a new documentation page for NeMo Automodel, NVIDIA's PyTorch DTensor-native training library for fine-tuning and pretraining
diffusion models at scale. NeMo Automodel integrates directly with Diffusers — it loads pretrained models from the Hugging Face Hub using
Diffusers model classes and generates outputs via Diffusers pipelines with no checkpoint conversion needed.

The new guide covers:

  • Supported models (Wan 2.1, FLUX.1-dev, HunyuanVideo 1.5)
  • Installation
  • Data preparation and preprocessing
  • Training configuration (annotated YAML reference)
  • Single-node and multi-node training launch
  • Generation / inference with fine-tuned checkpoints
  • How NeMo Automodel integrates with the Diffusers ecosystem
  • Hardware requirements
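
For reviewers skimming the PR, the "training configuration" bullet refers to a single YAML recipe that drives the whole run. A rough sketch of the shape (field names here are illustrative only — the guide's annotated YAML reference documents the real schema):

```yaml
# Illustrative sketch, NOT the actual NeMo Automodel schema;
# see the guide's annotated YAML reference for the real field names.
model:
  pretrained_model_name_or_path: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  mode: finetune   # the guide notes `mode: pretrain` for pretraining from scratch
```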

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@stevhliu @sayakpaul

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
@sayakpaul requested a review from stevhliu on March 23, 2026 03:22
@stevhliu left a comment


super nice, thanks for the docs!

Comment on lines +19 to +25
### Why NeMo Automodel?

- **Hugging Face native**: Train any Diffusers-format model from the Hub with no checkpoint conversion — day-0 support for new model releases.
- **Any scale**: The same YAML recipe and training script runs on 1 GPU or across hundreds of nodes. Parallelism is configuration, not code.
- **High performance**: FSDP2 distributed training with multiresolution bucketed dataloading and pre-encoded latent space training for maximum GPU utilization.
- **Hackable**: Linear training scripts with YAML configuration files. No hidden trainer abstractions — you can read and modify the entire training loop.
- **Open source**: Apache 2.0 licensed, NVIDIA-supported, and actively maintained.

i would integrate this info in the opening intro paragraph to simplify the structure a bit

- **Hackable**: Linear training scripts with YAML configuration files. No hidden trainer abstractions — you can read and modify the entire training loop.
- **Open source**: Apache 2.0 licensed, NVIDIA-supported, and actively maintained.

### Workflow overview

hmm i don't know if this workflow overview adds much value beyond what plain words could convey?


| Model | Hugging Face ID | Task | Parameters |
|-------|----------------|------|------------|
| Wan 2.1 T2V 1.3B | [`Wan-AI/Wan2.1-T2V-1.3B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) | Text-to-Video | 1.3B |

let's remove the backticks around the model name since it's not a code element

Comment on lines +267 to +268
> [!TIP]
> Full example configs for all models are available in the [NeMo Automodel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/diffusion/finetune).

Suggested change
> [!TIP]
> Full example configs for all models are available in the [NeMo Automodel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/diffusion/finetune).
Comment on lines +264 to +265
> [!NOTE]
> NeMo Automodel also supports **pretraining** diffusion models from randomly initialized weights. Set `mode: pretrain` in the model config. Pretraining example configs are available in the [NeMo Automodel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/diffusion/pretrain).

Suggested change
> [!NOTE]
> NeMo Automodel also supports **pretraining** diffusion models from randomly initialized weights. Set `mode: pretrain` in the model config. Pretraining example configs are available in the [NeMo Automodel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/diffusion/pretrain).

## Launch training

**Single-node training:**

let's also use the <hfoptions> tags for single-node training and multi-node training
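
For context, `<hfoptions>` is the Hugging Face doc-builder syntax for tabbed content blocks. The two launch modes would be wrapped roughly like this (the `id` values below are illustrative):

```
<hfoptions id="launch">
<hfoption id="single-node">

<!-- single-node launch command goes here -->

</hfoption>
<hfoption id="multi-node">

<!-- multi-node launch command goes here -->

</hfoption>
</hfoptions>
```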


After training, generate videos or images from text prompts using the fine-tuned checkpoint.

**Wan 2.1 (single-GPU):**

also use <hfoptions> tags here

- **Scalable training for Diffusers models**: NeMo Automodel adds distributed training capabilities (FSDP2, multi-node, multiresolution bucketing) that go beyond what the built-in Diffusers training scripts provide, while keeping the same model and pipeline interfaces.
- **Shared ecosystem**: any model, LoRA adapter, or pipeline component from the Diffusers ecosystem remains compatible throughout the training and inference workflow.
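
The "multiresolution bucketing" mentioned above can be sketched in a few lines: samples are grouped by resolution so every batch has a single tensor shape and no padding is wasted. This is only an illustration of the idea, not NeMo Automodel's actual dataloader implementation:

```python
from collections import defaultdict

def bucket_by_resolution(samples, batch_size):
    """Group samples by (height, width) and batch within each bucket.

    Hypothetical sketch of multiresolution bucketing: every returned batch
    contains samples of exactly one resolution, so no padding is needed.
    """
    buckets = defaultdict(list)
    for sample in samples:
        buckets[(sample["height"], sample["width"])].append(sample)
    batches = []
    for group in buckets.values():
        for i in range(0, len(group), batch_size):
            batches.append(group[i : i + batch_size])
    return batches

# Toy dataset with two distinct resolutions.
samples = [
    {"id": 0, "height": 480, "width": 832},
    {"id": 1, "height": 720, "width": 1280},
    {"id": 2, "height": 480, "width": 832},
]
batches = bucket_by_resolution(samples, batch_size=2)
# Every batch is homogeneous in resolution.
assert all(len({(s["height"], s["width"]) for s in b}) == 1 for b in batches)
```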

## Hardware requirements

let's add the hardware requirements to the ## Installation section. It's better for users to know what the requirements are up front :)

pthombre and others added 4 commits March 23, 2026 18:24
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
adding contacts into the readme
