ROCm 7.2
bugfixes and performance improvements
hunyuan
FP8 support
finish optimisations
sync master
implement Whisper encoder FA2
ROCm