Pull requests: PaddlePaddle/PaddleNLP
Normalize gates on expert dim before calculating seq_aux_loss
#11160 opened Nov 3, 2025 by lshpku
support sharding stage3 for deepseekv3 model (labels: contributor)
#11149 opened Oct 23, 2025 by AlAuAu
【FlexCheckpoint】fix_the_optimizer_init (labels: contributor)
#11123 opened Sep 27, 2025 by zty-king
2 tasks
hack offload optimizer to eliminate one offload & reload of the master weight
#11111 opened Sep 23, 2025 by Wennie396
add script for training gpt3 on XPU machine using flagcx as comm backend (labels: contributor, stale)
#11014 opened Aug 26, 2025 by mikethegoblin
2 tasks
[NOT MERGE]Pr adapt flex checkpoint (labels: contributor, stale)
#10996 opened Aug 25, 2025 by zty-king
2 tasks
[BUG]: fix the bug in PretrainedModel.recompute_disable() (labels: contributor, stale)
#10988 opened Aug 21, 2025 by hongjx175
2 tasks
recompute support offload tensor (labels: stale)
#10981 opened Aug 21, 2025 by blacksheep-Aristotle
2 tasks
moe_layer support fine_grained_forward (labels: stale)
#10980 opened Aug 21, 2025 by blacksheep-Aristotle
2 tasks
update expert parallel init logic (labels: stale)
#10966 opened Aug 18, 2025 by blacksheep-Aristotle
2 tasks