How I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS
How to fit a transformer model into a low-memory VPS using Dynamic Quantization and ONNX Runtime.
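
As a rough illustration of the approach named above, here is a minimal sketch, not the original article's code. It assumes a BERT-base-sized model (about 110 million parameters) that has already been exported to ONNX. The file names `model.onnx` and `model.int8.onnx` and the helper names are hypothetical. The size estimate covers weights only and is a best case, since not every tensor is necessarily quantized.

```python
from pathlib import Path


def estimate_weight_mib(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in MiB (ignores activations and runtime overhead)."""
    return num_params * bytes_per_param / (1024 ** 2)


def quantize_to_int8(src: str, dst: str) -> Path:
    """Write a dynamically quantized (int8 weights) copy of an ONNX model."""
    # Imported lazily so the size estimate works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(model_input=src, model_output=dst, weight_type=QuantType.QInt8)
    return Path(dst)


def load_session(model_path: str):
    """Open the quantized model on CPU with a single intra-op thread."""
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # assumption: a small VPS with one vCPU
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )


if __name__ == "__main__":
    params = 110_000_000  # assumption: BERT-base-sized model
    print(f"float32 weights: {estimate_weight_mib(params, 4):.0f} MiB")
    print(f"int8 weights:    {estimate_weight_mib(params, 1):.0f} MiB")
    # With onnxruntime installed and an exported model on disk (hypothetical paths):
    # quantize_to_int8("model.onnx", "model.int8.onnx")
    # session = load_session("model.int8.onnx")
```

Storing weights as int8 instead of float32 cuts their footprint to roughly a quarter, which is what leaves headroom for the runtime and tokenizer on a 1GB machine.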
Read the original article