Numerically Stable Cholesky-QR on GPU via Mixed-Precision Randomized Preconditioning
Cholesky-QR is among the fastest algorithms for computing the thin QR factorization of tall-and-skinny matrices on GPUs, relying entirely on BLAS-3 operations. However, it is numerically unstable: forming the Gram matrix squares the condition number, causing breakdown when $\kappa_2(\boldsymbol{A}) \gtrsim 10^8$. We present MRCQR (Mixed-Precision Randomized Cholesky-QR), a stable GPU algorithm that addresses this limitation. MRCQR uses a subsamp...
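
To make the instability described above concrete, here is a minimal NumPy sketch of plain (unpreconditioned) Cholesky-QR on the CPU. It is not MRCQR and not the authors' implementation; the function names `cholesky_qr`, `matrix_with_condition`, and `orthogonality_error` are hypothetical helpers introduced only for illustration. The test matrix is built with a prescribed 2-norm condition number so the loss of orthogonality can be observed as $\kappa_2(\boldsymbol{A})$ grows.

```python
import numpy as np

def cholesky_qr(A):
    """Plain Cholesky-QR (illustrative sketch, not MRCQR).

    Computes A = Q R using the Cholesky factor of the Gram matrix A^T A.
    """
    G = A.T @ A                      # Gram matrix; kappa_2(G) = kappa_2(A)^2
    L = np.linalg.cholesky(G)        # G = L L^T; raises LinAlgError if G is not numerically SPD
    R = L.T                          # upper-triangular factor
    # Q = A R^{-1}, obtained by solving L Q^T = A^T.
    # (A general solver is used here for simplicity; a triangular solve would be cheaper.)
    Q = np.linalg.solve(L, A.T).T
    return Q, R

def matrix_with_condition(m, n, kappa, seed=0):
    """Hypothetical helper: m-by-n matrix with singular values from 1 down to 1/kappa."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(kappa), n)
    return (U * s) @ V.T

def orthogonality_error(Q):
    """Spectral norm of Q^T Q - I; near machine precision for a well-behaved Q."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), 2)

if __name__ == "__main__":
    for kappa in (1e2, 1e6, 1e10):
        A = matrix_with_condition(2000, 20, kappa)
        try:
            Q, R = cholesky_qr(A)
            print(f"kappa={kappa:.0e}  ||Q^T Q - I||_2 = {orthogonality_error(Q):.2e}")
        except np.linalg.LinAlgError:
            # The Gram matrix is no longer numerically positive definite.
            print(f"kappa={kappa:.0e}  Cholesky factorization broke down")
```

In double precision the orthogonality error of this scheme is expected to grow roughly like the unit roundoff times $\kappa_2(\boldsymbol{A})^2$, which is why the factorization becomes unreliable, or fails outright, once $\kappa_2(\boldsymbol{A})$ approaches $10^8$.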