Port Baremetal: Add Cortex-M33 MPS3-AN524 platform support#1579

Draft
willieyz wants to merge 2 commits into main from port-m33-an524

willieyz (Contributor) commented Feb 25, 2026

Add bare-metal platform support for ARM Cortex-M33 on MPS3-AN524 FPGA. Works on both QEMU (qemu-system-arm -M mps3-an524) and real hardware.
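
As a rough illustration of the QEMU invocation mentioned above, the following sketch builds a command line for running a bare-metal ELF image. Only `qemu-system-arm -M mps3-an524` comes from this PR; the `-nographic`, `-semihosting`, and `-kernel` options and the file name `test_mlkem512.elf` are assumptions for illustration, and the exec wrapper in the PR may pass different arguments.

```python
import shlex


def qemu_command(elf_path, machine="mps3-an524"):
    """Build an argv list for running a bare-metal ELF under QEMU."""
    return [
        "qemu-system-arm",
        "-M", machine,        # machine model named in the PR description
        "-nographic",         # assumption: no display, serial routed to stdio
        "-semihosting",       # assumption: the test binary exits via semihosting
        "-kernel", elf_path,  # assumption: the image is loaded with -kernel
    ]


# Hypothetical test binary name, used only for illustration.
cmd = qemu_command("test_mlkem512.elf")
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting issues.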

Note that the ML*_CONFIG_REDUCE_RAM configuration option is not implemented in mlkem-native, so this port skips it.

  • CMSIS patches for AN524 board configuration
  • UART driver at 0x41303000 (32 MHz, 115200 baud)
  • Memory: 512KB Flash @ 0x10000000, 128KB SRAM @ 0x20000000
  • 96KB stack for ML-DSA-87 support
  • CI integration for automated testing
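
The figures in the list above can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming that "KB" means 1024 bytes, that the stack is carved out of the 128KB SRAM, and that the UART uses a plain integer clock divider (clock divided by baud rate); none of these assumptions is confirmed by the PR text.

```python
# Values taken from the PR description.
FLASH_BASE, FLASH_SIZE = 0x10000000, 512 * 1024
SRAM_BASE, SRAM_SIZE = 0x20000000, 128 * 1024
STACK_SIZE = 96 * 1024
UART_CLOCK_HZ = 32_000_000
BAUD_RATE = 115_200

# End addresses (exclusive) of the two memory regions.
flash_end = FLASH_BASE + FLASH_SIZE
sram_end = SRAM_BASE + SRAM_SIZE

# SRAM left for everything other than the stack (assumes the stack lives in SRAM).
non_stack_sram = SRAM_SIZE - STACK_SIZE

# Assumption: integer divider, so the achieved baud rate differs slightly.
baud_divisor = UART_CLOCK_HZ // BAUD_RATE
actual_baud = UART_CLOCK_HZ / baud_divisor
baud_error_pct = 100.0 * (actual_baud - BAUD_RATE) / BAUD_RATE

print(f"flash: {FLASH_BASE:#010x}..{flash_end:#010x}")
print(f"sram:  {SRAM_BASE:#010x}..{sram_end:#010x}")
print(f"non-stack SRAM: {non_stack_sram // 1024} KiB")
print(f"baud divisor: {baud_divisor}, error: {baud_error_pct:.2f}%")
```

Under these assumptions, a 96KB stack leaves 32KB of SRAM for data and BSS, which is consistent with the commit message's remark that the allocation size needs to be tunable.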
willieyz force-pushed the port-m33-an524 branch 3 times, most recently from a61e82c to bae13fa, on February 25, 2026 at 13:21
oqs-bot (Contributor) commented Feb 25, 2026

CBMC Results (ML-KEM-512)

Full Results (187 proofs)
Proof Current Previous Change
**TOTAL** 1372s 1186s +15.7%
mlk_indcpa_enc 175s 152s +15%
mlk_keccak_squeezeblocks_x4 159s 129s +23%
mlk_indcpa_keypair_derand 115s 99s +16%
mlk_rej_uniform_c 86s 79s +9%
mlk_polyvec_basemul_acc_montgomery_cached_c 84s 64s +31%
mlk_poly_rej_uniform 48s 38s +26%
poly_ntt_native 32s 29s +10%
polyvec_basemul_acc_montgomery_cached_native 22s 19s +16%
keccakf1600x4_permute_native_x4 21s 19s +11%
mlk_ntt_layer 18s 13s +38%
mlk_poly_reduce_native 17s 14s +21%
mlk_poly_decompress_d4_native 15s 11s +36%
mlk_poly_frommsg 15s 8s +88%
mlk_polyvec_add 15s 17s -12%
mlk_indcpa_dec 14s 11s +27%
mlk_poly_decompress_d10_native 14s 12s +17%
mlk_keccak_absorb_once_x4 11s 8s +38%
mlk_ntt_butterfly_block 11s 8s +38%
mlk_poly_frombytes_native 10s 8s +25%
mlk_polymat_permute_bitrev_to_custom 10s 8s +25%
mlk_keccak_squeezeblocks 9s 5s +80%
mlk_poly_rej_uniform_x4 8s 8s +0%
mlk_fqmul 7s 7s +0%
mlk_shake256x4 7s 5s +40%
keccakf1600_permute_native 6s 8s -25%
kem_dec 6s 5s +20%
mlk_keccak_squeeze_once 6s 6s +0%
poly_decompress_d10_native_x86_64 6s 5s +20%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 5s 3s +67%
mlk_invntt_layer 5s 4s +25%
mlk_poly_getnoise_eta1122_4x 5s 2s +150%
mlk_poly_getnoise_eta1_4x 5s 3s +67%
mlk_poly_mulcache_compute_c 5s 3s +67%
poly_decompress_d4_native_x86_64 5s 5s +0%
intt_native_aarch64 4s 2s +100%
intt_native_x86_64 4s 2s +100%
keccakf1600x4_extract_bytes_native 4s 3s +33%
kem_check_pk 4s 4s +0%
kem_enc_derand 4s 4s +0%
kem_keypair_derand 4s 3s +33%
mlk_ct_cmov_zero 4s 3s +33%
mlk_ct_get_optblocker_i32 4s 1s +300%
mlk_gen_matrix 4s 3s +33%
mlk_keccak_absorb_once 4s 6s -33%
mlk_poly_add 4s 5s -20%
mlk_poly_cbd_eta2 4s 4s +0%
mlk_poly_compress_d10_c 4s 3s +33%
mlk_poly_compress_d4_c 4s 2s +100%
mlk_poly_compress_d5_native 4s 2s +100%
mlk_poly_decompress_d4_c 4s 3s +33%
mlk_poly_decompress_dv 4s 2s +100%
mlk_poly_getnoise_eta1_4x_native 4s 3s +33%
mlk_poly_mulcache_compute 4s 2s +100%
mlk_poly_ntt 4s 3s +33%
mlk_poly_tobytes 4s 3s +33%
mlk_poly_tomont_native 4s 2s +100%
mlk_polyvec_mulcache_compute 4s 3s +33%
mlk_polyvec_ntt 4s 5s -20%
ntt_native_aarch64 4s 4s +0%
poly_frombytes_native_x86_64 4s 4s +0%
poly_reduce_native_aarch64 4s 1s +300%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 4s 4s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 4s 3s +33%
rej_uniform_native_x86_64 4s 4s +0%
keccakf1600x4_xor_bytes_native 3s 2s +50%
kem_keypair 3s 4s -25%
mlk_check_pct 3s 2s +50%
mlk_ct_cmask_neg_i16 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 3s +0%
mlk_gen_matrix_serial 3s 2s +50%
mlk_keccakf1600_extract_bytes 3s 3s +0%
mlk_keccakf1600_extract_bytes (big endian) 3s 3s +0%
mlk_keccakf1600_xor_bytes 3s 3s +0%
mlk_keccakf1600_xor_bytes (big endian) 3s 2s +50%
mlk_keccakf1600x4_extract_bytes 3s 2s +50%
mlk_keypair_getnoise 3s 2s +50%
mlk_matvec_mul 3s 2s +50%
mlk_montgomery_reduce 3s 1s +200%
mlk_poly_compress_d11_c 3s 4s -25%
mlk_poly_compress_d11_native 3s 1s +200%
mlk_poly_compress_d4 3s 3s +0%
mlk_poly_compress_d4_native 3s 3s +0%
mlk_poly_compress_d5_c 3s 2s +50%
mlk_poly_decompress_d10_c 3s 5s -40%
mlk_poly_decompress_du 3s 2s +50%
mlk_poly_invntt_tomont 3s 1s +200%
mlk_poly_reduce 3s 3s +0%
mlk_poly_reduce_c 3s 2s +50%
mlk_poly_sub 3s 3s +0%
mlk_poly_tomont 3s 1s +200%
mlk_poly_tomsg 3s 3s +0%
mlk_polyvec_basemul_acc_montgomery_cached 3s 2s +50%
mlk_polyvec_decompress_du 3s 2s +50%
mlk_polyvec_tobytes 3s 1s +200%
mlk_rej_uniform 3s 1s +200%
mlk_scalar_compress_d4 3s 1s +200%
mlk_scalar_signed_to_unsigned_q 3s 2s +50%
mlk_sha3_512 3s 2s +50%
mlk_shake128_absorb_once 3s 3s +0%
mlk_shake128_squeezeblocks 3s 3s +0%
mlk_shake256 3s 3s +0%
mlk_value_barrier_u32 3s 2s +50%
mlk_value_barrier_u8 3s 2s +50%
nttunpack_native_x86_64 3s 4s -25%
poly_compress_d11_native_x86_64 3s 2s +50%
poly_decompress_d11_native_x86_64 3s 1s +200%
poly_getnoise_eta1122_4x_native 3s 2s +50%
poly_tobytes_native_aarch64 3s 2s +50%
poly_tomont_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 4s -25%
rej_uniform_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 1s +100%
kem_check_sk 2s 2s +0%
kem_enc 2s 2s +0%
mlk_barrett_reduce 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 1s +100%
mlk_ct_cmask_nonzero_u8 2s 3s -33%
mlk_ct_get_optblocker_u32 2s 4s -50%
mlk_ct_memcmp 2s 2s +0%
mlk_ct_sel_int16 2s 3s -33%
mlk_keccakf1600_permute 2s 4s -50%
mlk_keccakf1600x4_permute 2s 1s +100%
mlk_keccakf1600x4_xor_bytes 2s 2s +0%
mlk_poly_cbd_eta1 2s 1s +100%
mlk_poly_compress_d10 2s 4s -50%
mlk_poly_compress_d10_native 2s 2s +0%
mlk_poly_compress_d5 2s 3s -33%
mlk_poly_compress_du 2s 4s -50%
mlk_poly_compress_dv 2s 1s +100%
mlk_poly_decompress_d11_c 2s 3s -33%
mlk_poly_decompress_d11_native 2s 1s +100%
mlk_poly_decompress_d4 2s 2s +0%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_d5_c 2s 1s +100%
mlk_poly_decompress_d5_native 2s 1s +100%
mlk_poly_frombytes 2s 2s +0%
mlk_poly_frombytes_c 2s 4s -50%
mlk_poly_getnoise_eta2 2s 2s +0%
mlk_poly_invntt_tomont_c 2s 2s +0%
mlk_poly_mulcache_compute_native 2s 1s +100%
mlk_poly_tobytes_c 2s 3s -33%
mlk_poly_tobytes_native 2s 4s -50%
mlk_poly_tomont_c 2s 3s -33%
mlk_polyvec_compress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 1s +100%
mlk_polyvec_invntt_tomont 2s 1s +100%
mlk_polyvec_permute_bitrev_to_custom 2s 2s +0%
mlk_polyvec_permute_bitrev_to_custom_native 2s 4s -50%
mlk_polyvec_reduce 2s 3s -33%
mlk_polyvec_tomont 2s 1s +100%
mlk_scalar_compress_d11 2s 2s +0%
mlk_scalar_decompress_d10 2s 1s +100%
mlk_scalar_decompress_d11 2s 2s +0%
mlk_scalar_decompress_d5 2s 4s -50%
mlk_sha3_256 2s 2s +0%
mlk_shake128x4_squeezeblocks 2s 1s +100%
poly_compress_d10_native_x86_64 2s 3s -33%
poly_compress_d4_native_x86_64 2s 3s -33%
poly_compress_d5_native_x86_64 2s 1s +100%
poly_decompress_d5_native_x86_64 2s 2s +0%
poly_invntt_tomont_native 2s 4s -50%
poly_reduce_native_x86_64 2s 3s -33%
poly_tobytes_native_x86_64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 1s 1s +0%
keccak_f1600_x4_native_avx2 1s 3s -67%
mlk_ct_sel_uint8 1s 1s +0%
mlk_poly_compress_d11 1s 1s +0%
mlk_poly_decompress_d10 1s 3s -67%
mlk_poly_decompress_d11 1s 2s -50%
mlk_poly_ntt_c 1s 1s +0%
mlk_scalar_compress_d1 1s 3s -67%
mlk_scalar_compress_d10 1s 1s +0%
mlk_scalar_compress_d5 1s 1s +0%
mlk_scalar_decompress_d4 1s 2s -50%
mlk_shake128x4_absorb_once 1s 2s -50%
mlk_value_barrier_i32 1s 4s -75%
ntt_native_x86_64 1s 4s -75%
poly_mulcache_compute_native_aarch64 1s 4s -75%
poly_mulcache_compute_native_x86_64 1s 3s -67%
poly_tomont_native_aarch64 1s 1s +0%
rej_uniform_native 1s 4s -75%
sys_check_capability 1s 1s +0%
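
The Change column in the table above appears to be the relative difference between the Current and Previous timings. The sketch below reproduces a few rows under that assumption; the helper name `change` is hypothetical and is not part of the bot's tooling.

```python
def change(current_s, previous_s, digits=0):
    """Relative change from previous to current, formatted as a signed percentage."""
    pct = 100.0 * (current_s - previous_s) / previous_s
    return f"{pct:+.{digits}f}%"


# Rows copied from the ML-KEM-512 table above.
print(change(1372, 1186, digits=1))  # TOTAL
print(change(175, 152))              # mlk_indcpa_enc
print(change(15, 17))                # mlk_polyvec_add
print(change(8, 8))                  # mlk_poly_rej_uniform_x4
```

Because timings are reported in whole seconds, proofs that take only a few seconds can show very large percentage swings (for example 1s to 4s reads as +300%), so the TOTAL row and the long-running proofs are likely the more meaningful signal.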
oqs-bot (Contributor) commented Feb 25, 2026

CBMC Results (ML-KEM-768)

Full Results (187 proofs)
Proof Current Previous Change
**TOTAL** 1432s 1413s +1.3%
mlk_indcpa_enc 247s 243s +2%
mlk_indcpa_keypair_derand 238s 231s +3%
mlk_keccak_squeezeblocks_x4 122s 118s +3%
mlk_rej_uniform_c 70s 73s -4%
polyvec_basemul_acc_montgomery_cached_native 60s 57s +5%
mlk_polyvec_basemul_acc_montgomery_cached_c 50s 46s +9%
mlk_poly_rej_uniform 31s 29s +7%
mlk_polyvec_add 27s 26s +4%
poly_ntt_native 26s 22s +18%
keccakf1600x4_permute_native_x4 18s 20s -10%
mlk_poly_reduce_native 14s 14s +0%
mlk_ntt_layer 13s 14s -7%
mlk_poly_decompress_d4_native 13s 12s +8%
mlk_poly_decompress_d10_native 12s 11s +9%
mlk_poly_frommsg 12s 13s -8%
mlk_indcpa_dec 10s 14s -29%
mlk_keccak_absorb_once_x4 9s 9s +0%
mlk_keccak_squeezeblocks 9s 6s +50%
mlk_poly_frombytes_native 9s 6s +50%
mlk_keccak_squeeze_once 7s 7s +0%
mlk_poly_rej_uniform_x4 7s 7s +0%
mlk_keccakf1600x4_permute 6s 1s +500%
mlk_ntt_butterfly_block 6s 8s -25%
mlk_poly_decompress_d10_c 6s 5s +20%
keccakf1600_permute_native 5s 5s +0%
kem_dec 5s 5s +0%
mlk_fqmul 5s 7s -29%
mlk_invntt_layer 5s 6s -17%
mlk_poly_add 5s 6s -17%
mlk_poly_compress_d11_c 5s 3s +67%
mlk_polymat_permute_bitrev_to_custom 5s 5s +0%
mlk_scalar_decompress_d4 5s 2s +150%
mlk_scalar_decompress_d5 5s 1s +400%
poly_decompress_d10_native_x86_64 5s 3s +67%
poly_frombytes_native_x86_64 5s 4s +25%
keccakf1600x4_extract_bytes_native 4s 2s +100%
keccakf1600x4_xor_bytes_native 4s 4s +0%
kem_keypair_derand 4s 3s +33%
mlk_ct_cmov_zero 4s 3s +33%
mlk_ct_memcmp 4s 2s +100%
mlk_gen_matrix 4s 4s +0%
mlk_montgomery_reduce 4s 1s +300%
mlk_poly_getnoise_eta1_4x 4s 3s +33%
mlk_poly_invntt_tomont_c 4s 4s +0%
mlk_poly_mulcache_compute_c 4s 3s +33%
mlk_polyvec_compress_du 4s 2s +100%
mlk_scalar_compress_d11 4s 2s +100%
mlk_sha3_256 4s 2s +100%
mlk_shake256x4 4s 4s +0%
poly_compress_d11_native_x86_64 4s 3s +33%
poly_compress_d4_native_x86_64 4s 1s +300%
poly_decompress_d5_native_x86_64 4s 2s +100%
poly_mulcache_compute_native_aarch64 4s 3s +33%
poly_mulcache_compute_native_x86_64 4s 1s +300%
poly_reduce_native_aarch64 4s 2s +100%
poly_tobytes_native_x86_64 4s 2s +100%
intt_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_avx2 3s 3s +0%
kem_check_pk 3s 2s +50%
kem_enc_derand 3s 2s +50%
mlk_check_pct 3s 3s +0%
mlk_ct_cmask_neg_i16 3s 2s +50%
mlk_ct_sel_uint8 3s 1s +200%
mlk_keccak_absorb_once 3s 5s -40%
mlk_keccakf1600_permute 3s 4s -25%
mlk_keccakf1600_xor_bytes 3s 2s +50%
mlk_poly_cbd_eta1 3s 1s +200%
mlk_poly_compress_d10 3s 2s +50%
mlk_poly_compress_d10_c 3s 6s -50%
mlk_poly_compress_d11 3s 2s +50%
mlk_poly_compress_d11_native 3s 2s +50%
mlk_poly_compress_d4_native 3s 3s +0%
mlk_poly_compress_d5 3s 4s -25%
mlk_poly_decompress_d11 3s 1s +200%
mlk_poly_decompress_du 3s 3s +0%
mlk_poly_decompress_dv 3s 5s -40%
mlk_poly_getnoise_eta1122_4x 3s 3s +0%
mlk_poly_getnoise_eta2 3s 1s +200%
mlk_poly_tobytes 3s 4s -25%
mlk_poly_tobytes_c 3s 3s +0%
mlk_poly_tomont_native 3s 1s +200%
mlk_polyvec_basemul_acc_montgomery_cached 3s 3s +0%
mlk_polyvec_frombytes 3s 2s +50%
mlk_polyvec_permute_bitrev_to_custom_native 3s 3s +0%
mlk_polyvec_reduce 3s 2s +50%
mlk_scalar_compress_d10 3s 2s +50%
mlk_shake128_squeezeblocks 3s 2s +50%
ntt_native_x86_64 3s 4s -25%
nttunpack_native_x86_64 3s 3s +0%
poly_compress_d5_native_x86_64 3s 2s +50%
poly_decompress_d4_native_x86_64 3s 4s -25%
poly_tobytes_native_aarch64 3s 3s +0%
poly_tomont_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 3s 3s +0%
rej_uniform_native 3s 2s +50%
rej_uniform_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
kem_check_sk 2s 1s +100%
kem_enc 2s 2s +0%
mlk_barrett_reduce 2s 3s -33%
mlk_ct_cmask_nonzero_u16 2s 3s -33%
mlk_ct_cmask_nonzero_u8 2s 3s -33%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_gen_matrix_serial 2s 2s +0%
mlk_keccakf1600_extract_bytes (big endian) 2s 3s -33%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_extract_bytes 2s 1s +100%
mlk_matvec_mul 2s 1s +100%
mlk_poly_compress_d10_native 2s 1s +100%
mlk_poly_compress_d4 2s 2s +0%
mlk_poly_compress_d5_c 2s 2s +0%
mlk_poly_compress_dv 2s 2s +0%
mlk_poly_decompress_d11_c 2s 2s +0%
mlk_poly_decompress_d11_native 2s 4s -50%
mlk_poly_decompress_d4 2s 2s +0%
mlk_poly_decompress_d4_c 2s 2s +0%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_d5_c 2s 2s +0%
mlk_poly_frombytes 2s 1s +100%
mlk_poly_frombytes_c 2s 3s -33%
mlk_poly_getnoise_eta1_4x_native 2s 2s +0%
mlk_poly_invntt_tomont 2s 2s +0%
mlk_poly_ntt 2s 3s -33%
mlk_poly_ntt_c 2s 4s -50%
mlk_poly_reduce_c 2s 1s +100%
mlk_poly_tomont 2s 3s -33%
mlk_poly_tomont_c 2s 4s -50%
mlk_poly_tomsg 2s 1s +100%
mlk_polyvec_decompress_du 2s 1s +100%
mlk_polyvec_permute_bitrev_to_custom 2s 4s -50%
mlk_polyvec_tobytes 2s 4s -50%
mlk_polyvec_tomont 2s 3s -33%
mlk_rej_uniform 2s 2s +0%
mlk_scalar_compress_d1 2s 2s +0%
mlk_scalar_compress_d4 2s 4s -50%
mlk_scalar_decompress_d10 2s 2s +0%
mlk_shake128_absorb_once 2s 4s -50%
mlk_shake128x4_absorb_once 2s 4s -50%
mlk_shake128x4_squeezeblocks 2s 2s +0%
mlk_shake256 2s 2s +0%
mlk_value_barrier_i32 2s 2s +0%
mlk_value_barrier_u32 2s 3s -33%
mlk_value_barrier_u8 2s 1s +100%
ntt_native_aarch64 2s 3s -33%
poly_compress_d10_native_x86_64 2s 2s +0%
poly_invntt_tomont_native 2s 3s -33%
poly_reduce_native_x86_64 2s 1s +100%
poly_tomont_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 2s 3s -33%
rej_uniform_native_x86_64 2s 1s +100%
intt_native_x86_64 1s 4s -75%
keccak_f1600_x1_native_aarch64 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 1s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 3s -67%
kem_keypair 1s 1s +0%
mlk_ct_get_optblocker_u32 1s 2s -50%
mlk_ct_get_optblocker_u8 1s 3s -67%
mlk_ct_sel_int16 1s 2s -50%
mlk_keccakf1600_extract_bytes 1s 1s +0%
mlk_keccakf1600x4_xor_bytes 1s 1s +0%
mlk_keypair_getnoise 1s 2s -50%
mlk_poly_cbd_eta2 1s 1s +0%
mlk_poly_compress_d4_c 1s 3s -67%
mlk_poly_compress_d5_native 1s 2s -50%
mlk_poly_compress_du 1s 2s -50%
mlk_poly_decompress_d10 1s 1s +0%
mlk_poly_decompress_d5_native 1s 3s -67%
mlk_poly_mulcache_compute 1s 1s +0%
mlk_poly_mulcache_compute_native 1s 2s -50%
mlk_poly_reduce 1s 2s -50%
mlk_poly_sub 1s 4s -75%
mlk_poly_tobytes_native 1s 3s -67%
mlk_polyvec_invntt_tomont 1s 3s -67%
mlk_polyvec_mulcache_compute 1s 3s -67%
mlk_polyvec_ntt 1s 3s -67%
mlk_scalar_compress_d5 1s 2s -50%
mlk_scalar_decompress_d11 1s 2s -50%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_sha3_512 1s 2s -50%
poly_decompress_d11_native_x86_64 1s 3s -67%
poly_getnoise_eta1122_4x_native 1s 4s -75%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 1s 4s -75%
sys_check_capability 1s 2s -50%
willieyz force-pushed the port-m33-an524 branch 3 times, most recently from aca0c50 to 23ae351, on February 25, 2026 at 15:04
oqs-bot (Contributor) commented Feb 25, 2026

CBMC Results (ML-KEM-1024)

Full Results (187 proofs)
Proof Current Previous Change
**TOTAL** 1845s 1744s +5.8%
mlk_indcpa_keypair_derand 387s 378s +2%
mlk_indcpa_enc 298s 275s +8%
polyvec_basemul_acc_montgomery_cached_native 142s 127s +12%
mlk_keccak_squeezeblocks_x4 140s 124s +13%
mlk_rej_uniform_c 79s 70s +13%
mlk_polyvec_basemul_acc_montgomery_cached_c 72s 70s +3%
mlk_poly_rej_uniform 38s 36s +6%
poly_ntt_native 37s 27s +37%
mlk_polyvec_add 28s 25s +12%
keccakf1600x4_permute_native_x4 19s 21s -10%
mlk_poly_reduce_native 16s 15s +7%
mlk_indcpa_dec 15s 15s +0%
mlk_ntt_layer 14s 11s +27%
mlk_polyvec_ntt 14s 14s +0%
mlk_poly_frommsg 13s 11s +18%
mlk_poly_decompress_d5_native 12s 14s -14%
mlk_poly_decompress_d11_native 11s 12s -8%
mlk_polymat_permute_bitrev_to_custom 11s 10s +10%
mlk_poly_compress_d11_c 10s 8s +25%
mlk_poly_frombytes_native 10s 7s +43%
mlk_invntt_layer 8s 4s +100%
mlk_poly_rej_uniform_x4 8s 8s +0%
kem_dec 7s 6s +17%
mlk_fqmul 7s 6s +17%
mlk_keccak_absorb_once_x4 7s 9s -22%
mlk_keccak_squeezeblocks 7s 6s +17%
mlk_ntt_butterfly_block 7s 7s +0%
poly_frombytes_native_x86_64 7s 4s +75%
keccakf1600_permute_native 6s 7s -14%
kem_enc_derand 6s 4s +50%
mlk_gen_matrix 6s 9s -33%
mlk_gen_matrix_serial 6s 6s +0%
mlk_keccak_squeeze_once 6s 7s -14%
kem_check_pk 5s 4s +25%
mlk_check_pct 5s 3s +67%
mlk_poly_add 5s 4s +25%
mlk_polyvec_mulcache_compute 5s 2s +150%
mlk_scalar_decompress_d4 5s 2s +150%
poly_decompress_d5_native_x86_64 5s 5s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 5s 2s +150%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 5s 1s +400%
keccakf1600x4_xor_bytes_native 4s 4s +0%
mlk_keccak_absorb_once 4s 4s +0%
mlk_poly_compress_d4 4s 2s +100%
mlk_poly_decompress_d11 4s 3s +33%
mlk_poly_mulcache_compute_c 4s 4s +0%
mlk_poly_tobytes_c 4s 2s +100%
mlk_polyvec_compress_du 4s 3s +33%
mlk_polyvec_permute_bitrev_to_custom_native 4s 4s +0%
mlk_polyvec_tomont 4s 2s +100%
poly_decompress_d11_native_x86_64 4s 4s +0%
poly_mulcache_compute_native_aarch64 4s 2s +100%
poly_mulcache_compute_native_x86_64 4s 4s +0%
rej_uniform_native 4s 3s +33%
sys_check_capability 4s 2s +100%
intt_native_x86_64 3s 1s +200%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 1s +200%
kem_check_sk 3s 4s -25%
kem_keypair 3s 1s +200%
kem_keypair_derand 3s 5s -40%
mlk_ct_cmask_nonzero_u8 3s 3s +0%
mlk_ct_cmov_zero 3s 4s -25%
mlk_ct_get_optblocker_i32 3s 3s +0%
mlk_ct_sel_uint8 3s 2s +50%
mlk_keccakf1600_extract_bytes (big endian) 3s 1s +200%
mlk_keccakf1600_permute 3s 4s -25%
mlk_keccakf1600_xor_bytes (big endian) 3s 2s +50%
mlk_keccakf1600x4_permute 3s 1s +200%
mlk_poly_cbd_eta1 3s 2s +50%
mlk_poly_compress_d5_c 3s 3s +0%
mlk_poly_decompress_d10 3s 5s -40%
mlk_poly_decompress_d10_c 3s 2s +50%
mlk_poly_decompress_d10_native 3s 2s +50%
mlk_poly_decompress_d11_c 3s 2s +50%
mlk_poly_decompress_d4 3s 3s +0%
mlk_poly_decompress_d5 3s 2s +50%
mlk_poly_decompress_d5_c 3s 2s +50%
mlk_poly_decompress_du 3s 4s -25%
mlk_poly_getnoise_eta1122_4x 3s 2s +50%
mlk_poly_getnoise_eta1_4x_native 3s 1s +200%
mlk_poly_invntt_tomont_c 3s 5s -40%
mlk_poly_mulcache_compute 3s 3s +0%
mlk_poly_ntt_c 3s 3s +0%
mlk_poly_sub 3s 2s +50%
mlk_poly_tobytes_native 3s 4s -25%
mlk_poly_tomont_c 3s 2s +50%
mlk_polyvec_basemul_acc_montgomery_cached 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom 3s 3s +0%
mlk_rej_uniform 3s 1s +200%
mlk_scalar_compress_d10 3s 3s +0%
mlk_scalar_compress_d4 3s 1s +200%
mlk_scalar_decompress_d10 3s 3s +0%
mlk_sha3_512 3s 1s +200%
mlk_shake256x4 3s 5s -40%
mlk_value_barrier_i32 3s 3s +0%
mlk_value_barrier_u8 3s 2s +50%
ntt_native_aarch64 3s 3s +0%
ntt_native_x86_64 3s 3s +0%
nttunpack_native_x86_64 3s 3s +0%
poly_compress_d10_native_x86_64 3s 2s +50%
poly_compress_d5_native_x86_64 3s 4s -25%
poly_decompress_d10_native_x86_64 3s 3s +0%
poly_decompress_d4_native_x86_64 3s 3s +0%
poly_tobytes_native_x86_64 3s 2s +50%
poly_tomont_native_x86_64 3s 1s +200%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 4s -25%
intt_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 3s -33%
kem_enc 2s 2s +0%
mlk_barrett_reduce 2s 3s -33%
mlk_ct_cmask_neg_i16 2s 1s +100%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 2s +0%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_ct_memcmp 2s 6s -67%
mlk_keccakf1600_extract_bytes 2s 1s +100%
mlk_keccakf1600_xor_bytes 2s 4s -50%
mlk_keccakf1600x4_extract_bytes 2s 1s +100%
mlk_keccakf1600x4_xor_bytes 2s 3s -33%
mlk_keypair_getnoise 2s 1s +100%
mlk_matvec_mul 2s 1s +100%
mlk_montgomery_reduce 2s 4s -50%
mlk_poly_cbd_eta2 2s 4s -50%
mlk_poly_compress_d10 2s 4s -50%
mlk_poly_compress_d11_native 2s 3s -33%
mlk_poly_compress_d4_native 2s 1s +100%
mlk_poly_compress_d5_native 2s 3s -33%
mlk_poly_compress_du 2s 3s -33%
mlk_poly_decompress_d4_c 2s 3s -33%
mlk_poly_decompress_d4_native 2s 1s +100%
mlk_poly_decompress_dv 2s 1s +100%
mlk_poly_frombytes 2s 3s -33%
mlk_poly_frombytes_c 2s 1s +100%
mlk_poly_getnoise_eta1_4x 2s 3s -33%
mlk_poly_getnoise_eta2 2s 3s -33%
mlk_poly_invntt_tomont 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 2s +0%
mlk_poly_reduce 2s 2s +0%
mlk_poly_tomont 2s 3s -33%
mlk_poly_tomont_native 2s 2s +0%
mlk_poly_tomsg 2s 2s +0%
mlk_polyvec_decompress_du 2s 3s -33%
mlk_polyvec_frombytes 2s 2s +0%
mlk_polyvec_invntt_tomont 2s 3s -33%
mlk_polyvec_reduce 2s 4s -50%
mlk_polyvec_tobytes 2s 1s +100%
mlk_scalar_compress_d11 2s 1s +100%
mlk_scalar_compress_d5 2s 3s -33%
mlk_scalar_decompress_d11 2s 1s +100%
mlk_scalar_decompress_d5 2s 2s +0%
mlk_shake128_absorb_once 2s 2s +0%
mlk_shake128x4_absorb_once 2s 3s -33%
mlk_shake128x4_squeezeblocks 2s 2s +0%
poly_compress_d11_native_x86_64 2s 4s -50%
poly_getnoise_eta1122_4x_native 2s 2s +0%
poly_reduce_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 2s 2s +0%
rej_uniform_native_aarch64 2s 1s +100%
rej_uniform_native_x86_64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 1s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 3s -67%
mlk_ct_sel_int16 1s 3s -67%
mlk_poly_compress_d10_c 1s 3s -67%
mlk_poly_compress_d10_native 1s 3s -67%
mlk_poly_compress_d11 1s 1s +0%
mlk_poly_compress_d4_c 1s 2s -50%
mlk_poly_compress_d5 1s 4s -75%
mlk_poly_compress_dv 1s 3s -67%
mlk_poly_ntt 1s 3s -67%
mlk_poly_reduce_c 1s 1s +0%
mlk_poly_tobytes 1s 2s -50%
mlk_scalar_compress_d1 1s 3s -67%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_sha3_256 1s 3s -67%
mlk_shake128_squeezeblocks 1s 1s +0%
mlk_shake256 1s 2s -50%
mlk_value_barrier_u32 1s 2s -50%
poly_compress_d4_native_x86_64 1s 1s +0%
poly_invntt_tomont_native 1s 4s -75%
poly_reduce_native_aarch64 1s 1s +0%
poly_tobytes_native_aarch64 1s 4s -75%
poly_tomont_native_aarch64 1s 1s +0%
willieyz force-pushed the port-m33-an524 branch 8 times, most recently from 9510d04 to 6a14b70, on March 3, 2026 at 09:06
Add bare-metal platform support for Cortex-M33 on MPS3-AN524, tested via QEMU (qemu-system-arm -M mps3-an524).

Note that the *_CONFIG_REDUCE_RAM configuration option is not implemented in mlkem-native, so this port skips it.

- Add platform makefile and QEMU exec wrapper for M33-AN524
- Platform files are provided by pqmx, see slothy-optimizer/pqmx#116
- Add Cortex-M33 DWT cycle counter support in HAL (distinct from Cortex-M55 PMU-based counting)
- Generalize Nix package from m55-an547 to pqmx to serve both M55-AN547 and M33-AN524 platforms
- Allow MLD_BUMP_ALLOC_SIZE to be overridden at compile time (we only have a 96 KiB stack)
- Add M33 baremetal test to CI matrix

Signed-off-by: willieyz <willie.zhao@chelpis.com>
Signed-off-by: willieyz <willie.zhao@chelpis.com>
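
One detail worth keeping in mind for the DWT-based cycle counting mentioned in the commit message: the DWT cycle counter (CYCCNT) is a 32-bit register, so it wraps around during long measurements. The sketch below shows the usual modular subtraction that stays correct across a single wrap; the function name `elapsed_cycles` is hypothetical, and this models the arithmetic only, not the HAL code in the PR.

```python
CYCCNT_BITS = 32
CYCCNT_MASK = (1 << CYCCNT_BITS) - 1


def elapsed_cycles(start, end):
    """Cycles between two 32-bit counter samples, correct across one wraparound."""
    return (end - start) & CYCCNT_MASK


# No wraparound between the two samples.
print(elapsed_cycles(100, 250))
# The counter wrapped past 0xFFFFFFFF between the two samples.
print(elapsed_cycles(0xFFFFFFF0, 0x00000010))
```

In C the same result falls out of unsigned 32-bit subtraction, so reading the counter into `uint32_t` values and subtracting them would typically be sufficient.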