GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...
ggml-org/llama.cpp on GitHub: LLM inference in C/C++.
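The build instructions themselves are not reproduced in this excerpt, so the following is only a minimal sketch of how the -DGGML_CUDA toggle might be applied. It assumes the standard CMake configure-and-build flow and a directory named build. The helper names cuda_flag and build_commands are hypothetical, and checking for nvcc on the PATH is an illustrative heuristic for "CUDA toolkit available", not something llama.cpp itself does. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: choose ON when the CUDA compiler (nvcc) is on PATH,
# otherwise OFF for a CPU-only build. This heuristic is an assumption.
cuda_flag() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "ON"
  else
    echo "OFF"
  fi
}

# Hypothetical helper: print the configure and build commands for a given
# flag value. Nothing is executed here; the commands are only echoed.
build_commands() {
  flag="$1"
  echo "cmake -B build -DGGML_CUDA=${flag}"
  echo "cmake --build build --config Release"
}

build_commands "$(cuda_flag)"
```

To force a CPU-only build regardless of what is installed, call build_commands OFF and run the printed commands from the root of a llama.cpp checkout.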