We systematically decompose the sources of SIMD speedup for ML-KEM (Kyber) on Intel x86-64 AVX2. By benchmarking four compilation variants, we demonstrate that GCC’s auto-vectorizer provides negligible benefit, and that hand-written AVX2 assembly delivers a – performance increase for core arithmetic operations. This drives an end-to-end KEM speedup of –.