🤖 LLM Inference - anarcher · Scour

Blazing fast on-device GenAI with LiteRT-LM 🦙llama.cpp

developers.googleblog.com·1d

Gemini’s AI Comeback, TPU Wars, & Karpathy Returns 🤖AI

briefing.forwardfuture.ai·18h

Meta's WhatsApp Incognito Chat puts AI conversations in a black box 🦙llama.cpp

LLM Observability with Self-Hosted Langfuse and vLLM 🦙llama.cpp

pyimagesearch.com·2d

QClaw: A Fully Local Agentic Assistant on the Arduino Uno Q 🦙llama.cpp

hackster.io·23h

ImpactArbiter – A PyTorch autograd trap for LLM memory bugs 🦙llama.cpp

github.com·2d·Hacker News

Ollama Doesn't Know Its GPU Is on Another Machine 🦙llama.cpp

loopholelabs.io·15h·Hacker News

A cheap fix that saves the AI $400M dollars a year and brings 4B people online ⚙️Zig

codecai.net·3d·Hacker News

Cerebras Brings Kimi K2.6 Inference to Enterprises 🤖AI

cerebras.ai·1d·Hacker News

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW) 🦙llama.cpp

semiengineering.com·11h

DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint 💾SQLite

dlcmh.github.io·3h·Hacker News

ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images 🦙llama.cpp

sleepingrobots.com·4d

Announcing OpenAI-compatible API support for Amazon SageMaker AI endpoints 🤖AI

aws.amazon.com·5h

I replaced GitHub Copilot with a self-hosted AI and I won’t go back ⚙️Zig

xda-developers.com·10h

Qwen3.6-27B-UD-Q4_K_XL.gguf · unsloth/Qwen3.6-27B-MTP-GGUF at main 🦙llama.cpp

huggingface.co·3d·r/LocalLLaMA

SpecSA: Bridging Speculative Decoding and Sparse Attention for Efficient LLM Inference 🦙llama.cpp

AI runs on tokens. There’s a missing artifact between them. 🤖AI

·2d

Towards local plug-and-play AI 🦙llama.cpp

adlrocha.substack.com·3d·Substack

Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX 🧠Memory Allocators

phoronix.com·13h·Hacker News

HF downloader utility tampermonkey 🦙llama.cpp

greasyfork.org·2d·r/LocalLLaMA
