During my own testing, and based on people's feedback, it seems that some kernels on M1 have precision issues in RoPE, likely due to sin/cos.
#27
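
For anyone who wants a rough feel for how sensitive the RoPE sin/cos terms are to reduced precision, here is a minimal sketch. It is not the kernel code and it does not reproduce the M1 behaviour: it only rounds the rotation angle to float32 before calling sin/cos and compares against the float64 result. The helper names (`to_f32`, `rope_angles`, `max_sincos_error`), the base of 10000, and the head dimension used below are assumptions for illustration.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rope_angles(pos: int, head_dim: int, base: float = 10000.0) -> list:
    """Rotation angles pos * base^(-2i/d) for each pair of dimensions.

    Assumes the common RoPE frequency schedule; the actual kernels may differ.
    """
    return [pos * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def max_sincos_error(pos: int, head_dim: int, base: float = 10000.0) -> float:
    """Largest |sin| or |cos| deviation caused by rounding the angle to float32."""
    worst = 0.0
    for angle in rope_angles(pos, head_dim, base):
        a32 = to_f32(angle)  # simulate a lower-precision angle
        err = max(
            abs(math.sin(a32) - math.sin(angle)),
            abs(math.cos(a32) - math.cos(angle)),
        )
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the error changes with position.
    for pos in (1, 128, 4096):
        print(pos, max_sincos_error(pos, head_dim=64))
```

The error from angle rounding alone tends to grow with position, since float32 has fewer fractional bits available for large angles. Fast or approximate sin/cos implementations in a GPU kernel could add further error on top of this, which this sketch does not model.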