michaelnny
  • Shanghai

Pinned

  1. rl4llm Public

    RL4LLM: A Research-Friendly RL Framework for LLM Post-Tuning

    Python

  2. alpha_zero Public

    A PyTorch implementation of DeepMind's AlphaZero agent to play Go and Gomoku board games

    Python, 162 stars, 36 forks

  3. deep_rl_zoo Public

    A collection of deep reinforcement learning algorithms implemented in PyTorch to solve Atari games and classic control tasks such as CartPole, LunarLander, and MountainCar.

    Python, 120 stars, 12 forks

  4. muzero Public

    A PyTorch implementation of DeepMind's MuZero agent

    Python, 36 stars, 6 forks

  5. InstructLLaMA Public

    Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to Instru…

    Jupyter Notebook, 56 stars, 13 forks