GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...
ggml-org/llama.cpp: LLM inference in C/C++.