GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen...

github.com · DEV, r/GooglePixel, r/LocalLLaMA · Covered by vettedconsumer.com + 29 more

LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

Covered in 46 articles

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com · Hacker News

I switched from LM Studio to llama.cpp, and I'm never going back to a bloated wrapper

howtogeek.com

Pairing Claude Code with Local Models

kdnuggets.com
