How I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

Discussed on Hacker News

How to fit a transformer model onto a low-memory VPS using dynamic quantization and ONNX Runtime.
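
To make the memory argument concrete, here is a minimal sketch of the arithmetic behind int8 weight quantization, the part of dynamic quantization that shrinks the model on disk and in RAM. It is an illustration in plain Python, not the post's code and not ONNX Runtime's implementation (ONNX Runtime provides this through `quantize_dynamic` in `onnxruntime.quantization`, which the sketch does not call). The helper names `quantize_int8` and `dequantize`, the symmetric per-tensor scheme, and the toy weights are all assumptions made for the example.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme): floats -> int8 plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes and the shared scale."""
    return [q * scale for q in quantized]


# Toy weights, purely illustrative; a real model has millions of these.
weights = [0.5, -1.0, 0.25, 0.0]

fp32 = array("f", weights)          # 4 bytes per weight
q, scale = quantize_int8(weights)   # 1 byte per weight, plus one float scale

fp32_bytes = len(fp32) * fp32.itemsize
int8_bytes = len(q) * q.itemsize
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest round-trip error: {max_error:.6f}")
```

Storing each weight in one byte instead of four is where the roughly 4x reduction in weight memory comes from; the cost is a rounding error of at most half a quantization step per weight. In dynamic quantization the activations are quantized on the fly at inference time, which this sketch leaves out.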