Hi, thanks for sharing your code.
I wonder where the iterative self-improvement is happening in your code.
Are SFT and RL performed offline, or are they run iteratively? If it’s the latter, could you point me to the relevant code repository?
Thanks