Gen-Verse / ReasonFlux · 502 stars · [NeurIPS 2025 Spotlight] ReasonFlux (long-CoT), ReasonFlux-PRM (process reward model) and ReasonFlux-Coder (code generation) · Topics: reinforcement-learning, chain-of-thought, llm-rlhf, sft-data, o1-mini, o1-preview, deepseek-v3, deepseek-r1 · Updated Sep 27, 2025 · Python
ssbuild / llm_rlhf · 26 stars · Implements reinforcement-learning training for LLMs such as GPT-2, LLaMA, and BLOOM · Topics: lora, reward, trl, llm, rlhf, trlx, llm-rlhf · Updated Sep 19, 2023 · Python