
add mark_not_offload() interface for cpu_offload_v1 #2770

Open
lhb8125 wants to merge 2 commits into NVIDIA:main from lhb8125:hongbinl/offload_activation_mxfp8_fix

Conversation

@lhb8125
Contributor

@lhb8125 lhb8125 commented Mar 17, 2026

Description

This PR adds a mark_not_offload() interface to the cpu_offload v1 code path, so that MCore's offloading feature can reuse mark_not_offload() to exclude specific tensors from being offloaded.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
@ksivaman ksivaman requested a review from pggPL March 17, 2026 05:59
@ksivaman
Member

@lhb8125 Could you please update the PR description? @pggPL for review

@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR wires up mark_not_offload() for the V1 CPU offload code path (NVTE_CPU_OFFLOAD_V1=1). Previously, the function returned early without doing anything when the V1 path was active. The change correctly adds a delegation call in cpu_offload.py and introduces a new mark_not_offload implementation in cpu_offload_v1.py.

However, the V1 implementation contains a logic bug that makes it a no-op:

  • cpu_offload.py — the delegation fix is correct and straightforward.
  • cpu_offload_v1.py — the new mark_not_offload sets _TE_do_not_offload = True on the underlying tensors. This attribute is checked exclusively in the non-V1 path (OffloadableLayerState._check_if_offload, cpu_offload.py:463). The V1 path decides whether to offload via tensor_need_offloading_checker_activations, which only checks for hasattr(tensor, "activation_offloading"). As a result, if a tensor was previously marked with mark_activation_offload, calling mark_not_offload will not prevent it from being offloaded in V1. The V1 implementation should instead remove the activation_offloading attribute (matching the opt-in semantics of the V1 offloading mechanism), similar to how mark_activation_offload sets it.
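The no-op described above can be demonstrated with a minimal sketch. This is illustrative only, not the actual Transformer Engine source: a plain object stands in for a torch.Tensor (TE stores plain attributes on tensors), and the checker reproduces the opt-in gate the summary describes.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor; TE stores plain attributes on tensors."""

def tensor_need_offloading_checker_activations(tensor):
    # V1's opt-in gate as described in the review: a tensor is offloaded
    # iff the `activation_offloading` attribute is present. Nothing here
    # ever reads `_TE_do_not_offload`.
    return hasattr(tensor, "activation_offloading")

t = FakeTensor()
t.activation_offloading = True   # set by mark_activation_offload
t._TE_do_not_offload = True      # what the PR's V1 mark_not_offload sets
# The opt-out flag is never consulted, so the tensor is still offloaded:
assert tensor_need_offloading_checker_activations(t)
```

As the sketch shows, setting an attribute that the V1 decision function never reads cannot change its outcome, which is why the review rates the new function a silent no-op.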

Confidence Score: 2/5

  • Not safe to merge — the core V1 implementation sets an attribute that is never checked, making mark_not_offload a silent no-op in the V1 path.
  • The delegation in cpu_offload.py is correct, but the actual V1 implementation in cpu_offload_v1.py uses the wrong attribute (_TE_do_not_offload instead of removing activation_offloading). The V1 offloading decision is entirely gated on activation_offloading, so the new function does not achieve its stated goal.
  • transformer_engine/pytorch/cpu_offload_v1.py — specifically the new mark_not_offload function (lines 52–60).

Important Files Changed

Filename Overview
transformer_engine/pytorch/cpu_offload_v1.py Adds mark_not_offload to the V1 code path and imports prepare_for_saving/restore_from_saved, but the implementation sets _TE_do_not_offload which is never consulted by V1's tensor_need_offloading_checker — the function is effectively a no-op.
transformer_engine/pytorch/cpu_offload.py Small, correct change: delegates to v1_code_path.mark_not_offload before returning early when NVTE_CPU_OFFLOAD_V1 is set, fixing the previously silent no-op.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["mark_not_offload(*tensors) called\n(cpu_offload.py)"] --> B{NVTE_CPU_OFFLOAD_V1?}
    B -- Yes --> C["v1_code_path.mark_not_offload(*tensors)\n(cpu_offload_v1.py)"]
    B -- No --> D["prepare_for_saving(*tensors)"]
    D --> E["setattr(tensor, '_TE_do_not_offload', True)"]
    E --> F["restore_from_saved(tensor_obj, tensors)"]
    F --> G["Non-V1 push_tensor checks\n_TE_do_not_offload ✅"]
    C --> H["prepare_for_saving(*tensors)"]
    H --> I["setattr(tensor, '_TE_do_not_offload', True)"]
    I --> J["restore_from_saved(tensor_obj, tensors)"]
    J --> K["V1 tensor_need_offloading_checker\nchecks activation_offloading only ❌\n_TE_do_not_offload is never read"]
    style K fill:#ffcccc,stroke:#cc0000
    style G fill:#ccffcc,stroke:#009900

Last reviewed commit: 730d7a1

Comment on lines +52 to +60
def mark_not_offload(*tensors: torch.Tensor):
    """Marks tensors to prevent them from being offloaded."""
    tensors, tensor_obj = prepare_for_saving(*tensors)

    for tensor in tensors:
        if tensor is not None:
            setattr(tensor, "_TE_do_not_offload", True)

    restore_from_saved(tensor_obj, tensors)

P1 _TE_do_not_offload is never checked in V1 path — function is a no-op

The V1 code path decides whether to offload a tensor using tensor_need_offloading_checker_activations, which checks only for hasattr(tensor, "activation_offloading") (see cpu_offload_v1.py line 738–739). The attribute _TE_do_not_offload set here is never consulted in the V1 path, so calling this function after mark_activation_offload will not prevent offloading.

To confirm: the only place _TE_do_not_offload is read in the whole codebase is cpu_offload.py line 463 (OffloadableLayerState._check_if_offload), which belongs to the non-V1 code path.

For mark_not_offload to actually work in V1, the implementation should mirror how mark_activation_offload operates — i.e., remove (or otherwise gate on) the activation_offloading attribute:

def mark_not_offload(*tensors: torch.Tensor):
    """Marks tensors to prevent them from being offloaded."""
    for tensor in tensors:
        if tensor is None:
            continue
        if type(tensor) in [torch.Tensor, torch.nn.Parameter]:
            if hasattr(tensor, "activation_offloading"):
                del tensor.activation_offloading
        else:
            data_tensors = tensor.get_data_tensors()
            for t in data_tensors:
                if t is not None and hasattr(t, "activation_offloading"):
                    del t.activation_offloading

Alternatively, tensor_need_offloading_checker_activations could be updated to also gate on _TE_do_not_offload, but that would be a more invasive change.
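That more invasive alternative might look like the following sketch. This is hypothetical, not the actual TE source: a plain object stands in for a tensor, and the checker variant adds an opt-out test on top of the opt-in gate described above.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor; TE stores plain attributes on tensors."""

def tensor_need_offloading_checker_activations(tensor):
    # Hypothetical variant of V1's checker: keep the opt-in gate on
    # `activation_offloading`, but also honor the opt-out flag that the
    # PR's mark_not_offload sets on the tensor.
    return hasattr(tensor, "activation_offloading") and not getattr(
        tensor, "_TE_do_not_offload", False
    )

t = FakeTensor()
t.activation_offloading = True
assert tensor_need_offloading_checker_activations(t)      # opted in
t._TE_do_not_offload = True
assert not tensor_need_offloading_checker_activations(t)  # opted out again
```

Under this variant the attribute the PR already sets would take effect, at the cost of touching the V1 offloading decision logic itself rather than only the marking helpers.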

@lhb8125
Contributor Author

lhb8125 commented Mar 17, 2026

/te-ci pytorch L1

