feat: add w4afp8 on Hopper GPUs#287

Open
foreverrookie wants to merge 1 commit into deepseek-ai:main from foreverrookie:feat/w4a8

@foreverrookie foreverrookie commented Feb 9, 2026

Hi, this is the novita.ai team.

Performance test (W4Afp8 vs FP8) on H200, nvcc 12.9:

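| groups | m/grp | n | k | W4 time (us) | W4 GB/s | FP8 time (us) | FP8 GB/s | Speedup |
|---|---|---|---|---|---|---|---|---|
| 8 | 16 | 4096 | 7168 | 48 | 2643 | 68 | 3460 | 1.42x |
| 8 | 24 | 4096 | 7168 | 48 | 2658 | 68 | 3480 | 1.42x |
| 8 | 32 | 4096 | 7168 | 48 | 2690 | 68 | 3486 | 1.42x |
| 8 | 40 | 4096 | 7168 | 48 | 2724 | 69 | 3499 | 1.44x |
| 8 | 48 | 4096 | 7168 | 48 | 2721 | 68 | 3516 | 1.42x |
| 8 | 56 | 4096 | 7168 | 86 | 1543 | 76 | 3170 | 0.88x |
| 8 | 64 | 4096 | 7168 | 85 | 1564 | 75 | 3239 | 0.88x |
| 8 | 16 | 7168 | 2048 | 32 | 2036 | 40 | 2963 | 1.25x |
| 8 | 24 | 7168 | 2048 | 32 | 2049 | 40 | 2981 | 1.25x |
| 8 | 32 | 7168 | 2048 | 32 | 2095 | 40 | 3004 | 1.25x |
| 8 | 40 | 7168 | 2048 | 32 | 2120 | 40 | 3027 | 1.25x |
| 8 | 48 | 7168 | 2048 | 32 | 2161 | 40 | 3059 | 1.25x |
| 8 | 56 | 7168 | 2048 | 31 | 2244 | 44 | 2843 | 1.42x |
| 8 | 64 | 7168 | 2048 | 42 | 1666 | 45 | 2768 | 1.07x |
| 16 | 16 | 4096 | 7168 | 91 | 2782 | 131 | 3613 | 1.44x |
| 16 | 24 | 4096 | 7168 | 92 | 2787 | 131 | 3633 | 1.42x |
| 16 | 32 | 4096 | 7168 | 91 | 2832 | 131 | 3634 | 1.44x |
| 16 | 40 | 4096 | 7168 | 91 | 2838 | 131 | 3655 | 1.44x |
| 16 | 48 | 4096 | 7168 | 91 | 2859 | 131 | 3669 | 1.44x |
| 16 | 56 | 4096 | 7168 | 128 | 2051 | 140 | 3466 | 1.09x |
| 16 | 64 | 4096 | 7168 | 128 | 2068 | 141 | 3458 | 1.10x |
| 16 | 16 | 7168 | 2048 | 56 | 2305 | 71 | 3346 | 1.27x |
| 16 | 24 | 7168 | 2048 | 56 | 2336 | 72 | 3368 | 1.29x |
| 16 | 32 | 7168 | 2048 | 55 | 2411 | 71 | 3402 | 1.29x |
| 16 | 40 | 7168 | 2048 | 56 | 2423 | 72 | 3434 | 1.29x |
| 16 | 48 | 7168 | 2048 | 56 | 2467 | 72 | 3457 | 1.29x |
| 16 | 56 | 7168 | 2048 | 67 | 2094 | 75 | 3334 | 1.12x |
| 16 | 64 | 7168 | 2048 | 68 | 2077 | 79 | 3170 | 1.16x |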
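
For reference, the Speedup column appears to be the FP8 time divided by the W4 time, so values below 1.00x mean W4Afp8 was slower for that shape. A minimal sketch that recomputes it for a few rows copied from the results above; the `speedup` helper and the ratio definition are assumptions for illustration, not code from this PR:

```python
# (groups, m/grp, n, k, w4_us, fp8_us), copied from three rows of the results.
rows = [
    (8, 16, 4096, 7168, 48, 68),
    (8, 56, 4096, 7168, 86, 76),
    (16, 64, 7168, 2048, 68, 79),
]


def speedup(w4_us, fp8_us):
    # Assumed definition: FP8 latency divided by W4Afp8 latency.
    return fp8_us / w4_us


# Round to two decimals to match the precision used in the Speedup column.
results = [round(speedup(w4_us, fp8_us), 2) for *_, w4_us, fp8_us in rows]

for (groups, m_per_group, n, k, *_), s in zip(rows, results):
    print(f"groups={groups} m/grp={m_per_group} n={n} k={k}: {s:.2f}x")
```

The recomputed ratios for these three rows agree with the reported Speedup values.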