🧠 LLMs - foglerek · Scour

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🌐Open Source AI Code

github.com··Hacker News

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

⚡Inference Blog

The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking

Token4Token — pay-per-token inference on Gnosis + Swarm

🌐Open Source AI

t4t.eth.link··Hacker News

our workplace LLM mass delusion

✍️Prompt Engineering Blog

blog.avas.space··Hacker News

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

⚡Inference Blog

Show HN: Ext-Infer

🌐Open Source AI

infer.displace.tech··Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🌐Open Source AI Blog

adambien.blog·

Machinic Psychopharmacology: Do LLMs Self-Medicate?

lesswrong.com··Hacker News

Intro — Sehastrajit

⚡Inference Blog

Speculators v0.5.0: DFlash support and online training

developers.redhat.com·

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🌐Open Source AI

Modernizing attendance ticketing in SAS Viya using SAS Agentic AI Accelerator

🤖AI Agents Blog

blogs.sas.com·

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

⚡Inference Code

github.com··r/LocalLLaMA

Researchers Build Self-Replicating AI Worm That Operates Entirely on Local, Open-Weight Models

🌐Open Source AI

thehackernews.com·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

⚡Inference Academic

MLPerf and the rise of latency-aware LLM benchmarking

Melanie Mitchell: What We Get Wrong About AI

✍️Prompt Engineering

yalereview.org··Substack, Hacker News

Build an AI-Powered Equipment Repair Assistant Using Amazon Bedrock AgentCore

👨‍💻Coding Agents Blog

aws.amazon.com·

Issue #390 - The ML Engineer 🤖

⚡Inference News Blog

machinelearning.substack.com··Substack
