iBacklight (Haoran Qi)

Pinned

uarm-artemis-official/Robots_Basic_Frame_TypeC Public

C 2
PipelineLLM Public

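PipelineLLM is a systematic learning project on large language model (LLM) post-training. It covers the full technical stack, from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.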

Python 19 3
reinforcement-learning Public

Forked from dennybritz/reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

Jupyter Notebook 1
AlbertaSat/ex2_obc_software Public

Main repository for Athena service & equipment handler implementations

C 10 7