🤖 AI - kate.yang · Scour

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

💬LLMs Code

github.com··r/LocalLLaMA

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

huggingface.co··r/LocalLLaMA

Claude Fable 5 is Mythos for the masses

🤖AI in Games Blog

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?

💬LLMs Blog

The Transformer Architecture: A Step-by-Step Guide

✨Generative AI Blog

m7mdelyoussef.medium.com·

Start Up No.2680: Apple to relaunch Siri again, jet fuel shortage hits Brazil, astrophysicists see LLM future, and more

📰AI News Blog

theoverspill.blog·

GPU Servers for Best Performance

⚙️Game Engines

leaseweb.com··DEV

The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has

xda-developers.com·

On-device AI is a margin decision

💬LLMs Blog

ziraph.com··Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

💬LLMs Blog

dnhkng.github.io·

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

💬LLMs Blog

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

💬LLMs Blog

dnhkng.github.io·

What Are Tokens in LLMs?

💬LLMs Blog

bearisland.dev··Hacker News

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

💬LLMs Academic

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

Show HN: Ext-Infer

infer.displace.tech··Hacker News

Here's a llama.cpp CLI Command builder.

llamabuilding.com··r/LocalLLaMA

Critical Hugging Face Transformers flaw ran attacker code on a routine model load

siliconangle.com·

Tokenminning: Because Tokenmaxxing Is a Bad Idea

tokenminning.com··Hacker News

Issue #390 - The ML Engineer 🤖

📰AI News News Blog

machinelearning.substack.com··Substack
