🧠 LLMs - kleinjm

Most people use Ollama or llama.cpp for local LLMs, but these are the tools I switch to when it gets serious

🤖AI Engineering

xda-developers.com

vLLM Internalised: The Mechanics of Modern LLM Inference

🤖AI Engineering Blog

medium.com

Unsloth Minimax M3 GGUF

🤖AI Engineering

huggingface.co · r/LocalLLaMA · Cited by 1 article

A reporting checklist for large language models in behavioural science

🤖AI Engineering Academic

nature.com

Mlx-optiq: per-layer mixed-precision LLM quantization for Apple Silicon

🤖AI Engineering Video Discussion Tutorial

mlx-optiq.com · Hacker News · Cited by 1 article

microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

🤖AI Engineering Code

github.com · DEV

Kimi K2.7-Code: Open-Weight 1T Model That Beats Claude Opus on Tool Use

🤖AI Engineering Blog

wowhow.cloud · DEV

Introduction to (Multimodal) LLM-as-a-Judge

🤖AI Engineering News Blog

yinghonglan.substack.com · Substack

Back to Basics: Build Your Own LLM from Scratch

💎Ruby

thejeshgn.com

How LLMs are Actually Trained

🤖AI Engineering News Blog

blog.algomaster.io

Get ChatGPT, Gemini, Claude, and more for life for $60

🤖AI Engineering

macworld.com

How ChatGPT Actually Works (Beginner Friendly)

🤖AI Agents Blog

medium.com

Why LLMs (still) lack taste

🤖AI Engineering

beyondtheprior.com · Hacker News

Have we made a unicorn? Continuous SVG-pelican style benchmark

🔥Hotwire Reference

havewemadeaunicorn.com · Hacker News

Chain-of-Thought Prompting Is Not What You Think

🤖AI Agents

siliconopera.com

Intelligent inference scheduling with llm-d on Red Hat AI

🤖AI Engineering

developers.redhat.com

Build Claude Alternative in Cloud in 20mins

🤖AI Engineering Reference

docs.dagploy.com · Hacker News

US blocks Claude Fable 5 and Mythos 5: is frontier AI now too dangerous?

🤖AI Engineering Blog

techzine.eu

Benchmarking Large Language Models for Safety Data Extraction

How to Run an LLM Locally: Ultimate Guide to Local AI 2026
