GPT-2 124M checkpoint pre-trained on OpenWebText (27.5B tokens)
GPT-2 124M — OpenWebText Baseline Model Card

A 124M-parameter GPT-2 trained from scratch on OpenWebText data using a hand-written deep learning library (no PyTorch in the model or training path). ...
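
As a rough sanity check on the "124M" figure, the parameter count can be reproduced with plain arithmetic. This is only a sketch: the summary does not list the model's hyperparameters, so the values below are the standard GPT-2 small configuration and are assumptions, as is the tied output head. It also assumes the 27.5B figure in the title is the number of tokens seen during training.

```python
# Assumed hyperparameters: the standard GPT-2 small configuration.
# The summary does not state them, so these are illustrative only.
VOCAB_SIZE = 50257
CONTEXT_LEN = 1024
D_MODEL = 768
N_LAYERS = 12
D_FF = 4 * D_MODEL


def gpt2_param_count(vocab=VOCAB_SIZE, ctx=CONTEXT_LEN, d=D_MODEL,
                     layers=N_LAYERS, d_ff=D_FF):
    """Count parameters, assuming the output head reuses the token embedding."""
    embeddings = vocab * d + ctx * d               # token + position tables
    layer_norm = 2 * d                             # scale and bias
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # QKV projection + output projection
    mlp = (d * d_ff + d_ff) + (d_ff * d + d)       # up projection + down projection
    block = 2 * layer_norm + attention + mlp       # one transformer block
    return embeddings + layers * block + layer_norm  # plus the final layer norm


# Assumption: the 27.5B in the title counts tokens seen during training.
TRAINING_TOKENS = 27.5e9

n_params = gpt2_param_count()
tokens_per_param = TRAINING_TOKENS / n_params

print(f"{n_params:,} parameters")
print(f"{tokens_per_param:.0f} training tokens per parameter")
```

Under these assumptions the count comes to 124,439,808 parameters, or roughly 221 training tokens per parameter.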