
Where Does SIMD Help Post-Quantum Cryptography? A Micro-Architectural Study of ML-KEM on x86 AVX2
4 April 2026
We systematically decompose the sources of SIMD speedup for ML-KEM (Kyber) on Intel x86-64 with AVX2. By benchmarking four compilation variants, we show that GCC’s auto-vectorizer provides negligible benefit, whereas hand-written AVX2 assembly delivers a $35\times$–$56\times$ speedup for core arithmetic operations. This translates into an end-to-end KEM speedup of $5.4\times$–$7.1\times$.
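To make "core arithmetic" concrete: ML-KEM works on polynomials with $n = 256$ coefficients modulo $q = 3329$, and much of its cost is modular multiplication of 16-bit coefficients. The sketch below is a scalar version in the style of the public Kyber reference code, which uses Montgomery reduction with $R = 2^{16}$ and $q^{-1} \bmod 2^{16} = -3327$. The names `montgomery_reduce` and `poly_pointwise` are illustrative, and this is not the code benchmarked in the study. Each coefficient fits in one 16-bit lane, so a 256-bit AVX2 register holds 16 of them. A loop like `poly_pointwise` is the kind of kernel that a SIMD implementation processes 16 coefficients at a time.

```c
#include <stdint.h>

#define KYBER_N 256
#define KYBER_Q 3329
#define QINV (-3327) /* q^{-1} mod 2^16, stored as a signed 16-bit value */

/* Montgomery reduction: for |a| < q * 2^15, returns r with
 * r * 2^16 == a (mod q) and |r| < q.
 * Assumes GCC/Clang behavior: narrowing to int16_t wraps modulo 2^16,
 * and >> on negative values is an arithmetic shift. */
static int16_t montgomery_reduce(int32_t a) {
    /* t == a * q^{-1} (mod 2^16), so a - t*q is divisible by 2^16 */
    int16_t t = (int16_t)((int16_t)a * QINV);
    /* exact division by 2^16 via arithmetic right shift */
    return (int16_t)((a - (int32_t)t * KYBER_Q) >> 16);
}

/* Coefficient-wise product r[i] = a[i] * b[i] * 2^-16 (mod q).
 * Every iteration is independent, which is what lets a SIMD
 * implementation handle 16 coefficients per 256-bit register. */
static void poly_pointwise(int16_t r[KYBER_N],
                           const int16_t a[KYBER_N],
                           const int16_t b[KYBER_N]) {
    for (int i = 0; i < KYBER_N; i++)
        r[i] = montgomery_reduce((int32_t)a[i] * b[i]);
}
```

For example, `montgomery_reduce(65536)` returns 1, because $2^{16} \cdot 2^{-16} = 1$. `montgomery_reduce(KYBER_Q)` returns 0. The result stays in the Montgomery domain. Callers that need a plain residue multiply by $2^{16} \bmod q$ once at the end, instead of doing a full division on every product.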
