
OOM when training with long sequences despite using dynamic batch size and sequence parallel #1522

@Dodojordi


Problem
I need to configure Megatron for training a 4B model with 64K max-response-len using PPO on 4x H200 GPUs.
I'm experiencing Out of Memory (OOM) errors when training with 64K sequences, even with --use-dynamic-batch-size and --sequence-parallel enabled.

My Understanding
I set --max-tokens-per-gpu accordingly. I expected:

  • Dynamic batch size would automatically adjust micro_batch_size based on sequence length
  • This would prevent activations from exceeding GPU memory
  • optimizer.step() memory usage would remain stable regardless of global batch size

However, it seems unlikely that OOM can be avoided simply by reducing the global batch size, since optimizer memory is independent of the global batch size.
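To illustrate the first expectation, here is a minimal sketch of how a dynamic batcher might pack samples under a per-GPU token budget. This is a hypothetical illustration of the expected behavior of --use-dynamic-batch-size, not the actual Megatron/slime implementation:

```python
# Hypothetical sketch of dynamic micro-batching under a per-GPU token
# budget -- NOT the real implementation, just the behavior I expect
# from --use-dynamic-batch-size / --max-tokens-per-gpu.

def pack_micro_batches(seq_lens, max_tokens_per_gpu):
    """Greedily group samples so each micro batch stays under the budget."""
    batches, current, current_tokens = [], [], 0
    for n in seq_lens:
        if n > max_tokens_per_gpu:
            raise ValueError(f"single sequence of {n} tokens exceeds the budget")
        if current and current_tokens + n > max_tokens_per_gpu:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(n)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# With a 65536-token budget, a 64K response fills a micro batch by itself,
# so micro_batch_size effectively drops to 1 for the longest samples.
print(pack_micro_batches([65536, 2048, 2048, 65536], 65536))
# -> [[65536], [2048, 2048], [65536]]
```

So even with dynamic batching working as expected, the longest samples still put a full 64K sequence of activations on each rank.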

My Config

```shell
TP_SIZE=2
PP_SIZE=1
CP_SIZE=1
EP_SIZE=1
ETP_SIZE=1
MAX_LEN=$((1024 * 64))                                # 64K
MAX_TOKENS_PER_GPU=$((($MAX_LEN / $CP_SIZE) + 1024))  # ~65K
ROLLOUT_BATCH_SIZE=16
N_SAMPLES_PER_PROMPT=4
NUM_STEPS_PER_ROLLOUT=4

--rollout-batch-size $ROLLOUT_BATCH_SIZE \
--n-samples-per-prompt $N_SAMPLES_PER_PROMPT \
--rollout-max-response-len $MAX_LEN \
--colocate \
--actor-num-gpus-per-node 2 \
--tensor-model-parallel-size 2 \
--sequence-parallel \
--pipeline-model-parallel-size 1 \
--context-parallel-size 1 \
--recompute-granularity full \
--recompute-method uniform \
--recompute-num-layers 1 \
--use-dynamic-batch-size \
--max-tokens-per-gpu 65536 \
--transformer-impl transformer_engine \
--bf16 \
--fp8-format e4m3 \
--fp8-recipe blockwise
```
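For reference, a back-of-envelope check of the numbers in my config (note that the shell arithmetic yields 66560, while the flag I actually pass is 65536). The activation estimate below uses an assumed model shape (a 4B model with 36 layers and hidden size 2560) and is only illustrative:

```python
# Back-of-envelope check of the config above.
MAX_LEN = 1024 * 64                     # 64K max response length
CP_SIZE = 1                             # context parallel degree
TP_SIZE = 2                             # tensor parallel degree

max_tokens_per_gpu = MAX_LEN // CP_SIZE + 1024
print(max_tokens_per_gpu)               # -> 66560 (but the flag passes 65536)

# With --recompute-granularity full, roughly only each layer's input
# activations are retained: ~ num_layers * seq_len * hidden * 2 bytes (bf16),
# split across TP ranks when --sequence-parallel is on.
# ASSUMED model shape: 36 layers, hidden size 2560.
num_layers, hidden = 36, 2560
act_bytes = num_layers * MAX_LEN * hidden * 2 // TP_SIZE
print(f"{act_bytes / 2**30:.2f} GiB")   # -> 5.62 GiB of retained layer inputs
```

Even under these rough assumptions, retained layer inputs alone are several GiB per rank for a single 64K sample, before logits, optimizer state, and the colocated rollout engine are counted.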

Is there any way to avoid the OOM problem during long sequence training?
Any guidance appreciated! 🙏
