Parameter-Aware and Instruction-Driven Dilithium Optimization on AVX2 and NEON
We improve the performance of the lattice-based signature scheme Dilithium on AVX2 and NEON by jointly exploiting its algorithmic properties, such as small coefficient bounds and high sparsity, and the distinct instruction-level profiles of the underlying architectures. On AVX2, we deploy a single-modulus 16-bit NTT for $c \cdot \mathbf{s}_i$ and a multi-moduli 16-bit NTT coupled with a vectorized CRT reconstruction for $c \cdot \mathbf{t}_0$. These instruction-level optimizations accelerate the ...
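The coefficient bounds that justify these two strategies can be checked numerically. The challenge $c$ has exactly $\tau$ nonzero coefficients in $\{-1, 1\}$, so each coefficient of $c \cdot \mathbf{s}_i$ is bounded by $\tau\eta$ (196 for Dilithium3, with $\tau = 49$, $\eta = 4$), which fits comfortably inside a single 16-bit modulus. Each coefficient of $c \cdot \mathbf{t}_0$, however, can reach $\tau \cdot 2^{12} = 200704 > 2^{15}$, so it must be recovered from several 16-bit residues via CRT. The scalar Python sketch below illustrates only this arithmetic argument. The moduli 7681 and 12289, the naive $O(n^2)$ negacyclic transform (evaluation at odd powers of a primitive 512th root of unity, not a vectorized butterfly NTT), and every function name are hypothetical choices made for illustration, not the paper's implementation.

```python
import random
from functools import lru_cache

N = 256               # ring Z[X]/(X^256 + 1)
D = 13                # t0 coefficients lie in (-2^12, 2^12]
TAU, ETA = 49, 4      # Dilithium3-style parameters
Q1, Q2 = 7681, 12289  # hypothetical 16-bit primes with 2N | q - 1

def negacyclic_schoolbook(a, b):
    """Exact integer product in Z[X]/(X^N + 1); serves as the reference."""
    out = [0] * N
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # c is sparse: skip zero coefficients
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] += ai * bj
            else:
                out[k - N] -= ai * bj  # X^N = -1
    return out

@lru_cache(maxsize=None)
def psi_powers(q):
    """psi^k mod q for k = 0..2N-1, where psi is a primitive 2N-th root of unity."""
    assert (q - 1) % (2 * N) == 0
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * N), q)
        if pow(psi, N, q) == q - 1:  # psi^N = -1, so psi has order exactly 2N
            return [pow(psi, k, q) for k in range(2 * N)]
    raise ValueError("no primitive 2N-th root of unity")

def ntt(a, q):
    """Naive negacyclic NTT: A_i = sum_j a_j * psi^((2i+1)j) mod q."""
    pw = psi_powers(q)
    return [sum(a[j] * pw[((2 * i + 1) * j) % (2 * N)] for j in range(N)) % q
            for i in range(N)]

def intt(ahat, q):
    """Inverse: a_j = N^-1 * sum_i A_i * psi^(-(2i+1)j) mod q."""
    pw, n_inv = psi_powers(q), pow(N, -1, q)
    return [n_inv * sum(ahat[i] * pw[(-(2 * i + 1) * j) % (2 * N)]
                        for i in range(N)) % q for j in range(N)]

def mul_mod_q(a, b, q):
    """Negacyclic product mod q via forward NTTs, pointwise product, inverse NTT."""
    return intt([x * y % q for x, y in zip(ntt(a, q), ntt(b, q))], q)

def center(x, m):
    """Map x mod m to the centered range (-m/2, m/2] for odd m."""
    r = x % m
    return r - m if r > m // 2 else r

def crt2(r1, q1, r2, q2):
    """Unique x in [0, q1*q2) with x = r1 (mod q1) and x = r2 (mod q2)."""
    return r1 + q1 * (((r2 - r1) * pow(q1, -1, q2)) % q2)

def c_times_s(c, s):
    # |coeff| <= TAU * ETA = 196 < Q1 / 2: one modulus recovers the exact value
    return [center(x, Q1) for x in mul_mod_q(c, s, Q1)]

def c_times_t0(c, t0):
    # |coeff| <= TAU * 2^12 = 200704 < Q1 * Q2 / 2: two moduli plus CRT suffice
    r1, r2 = mul_mod_q(c, t0, Q1), mul_mod_q(c, t0, Q2)
    return [center(crt2(x, Q1, y, Q2), Q1 * Q2) for x, y in zip(r1, r2)]

def sample_challenge(tau, rng):
    """Sparse c with exactly tau coefficients in {-1, 1}."""
    c = [0] * N
    for pos in rng.sample(range(N), tau):
        c[pos] = rng.choice((-1, 1))
    return c

if __name__ == "__main__":
    rng = random.Random(1)
    c = sample_challenge(TAU, rng)
    s = [rng.randint(-ETA, ETA) for _ in range(N)]
    t0 = [rng.randint(-(1 << (D - 1)) + 1, 1 << (D - 1)) for _ in range(N)]
    assert c_times_s(c, s) == negacyclic_schoolbook(c, s)
    assert c_times_t0(c, t0) == negacyclic_schoolbook(c, t0)
    print("ok")
```

Centering each result is what makes the small moduli safe: as long as the true coefficient magnitude stays below half the (combined) modulus, the residue determines it uniquely.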