michaelnny (Michael Hu)

Pinned

rl4llm

RL4LLM: A Research-Friendly RL Framework for LLM Post-Tuning

Python

alpha_zero

A PyTorch implementation of DeepMind's AlphaZero agent to play Go and Gomoku board games

Python, 162 stars, 36 forks

deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.

Python, 120 stars, 12 forks

muzero

A PyTorch implementation of DeepMind's MuZero agent

Python, 36 stars, 6 forks

InstructLLaMA

Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to Instru…

Jupyter Notebook, 56 stars, 13 forks