🤖 AI Inference - nmarshall · Scour

CPritch/shiftpaper: Parallax wallpaper for Wayland with depth estimation, written in Rust + WGSL 📈Grafana

github.com·2d·Hacker News

imec IC-Link and TSMC 3DFabric Alliance Expansion Signals New Era of System-Level Scaling ⚡Hardware Acceleration

semiwiki.com·1d

KV Cache and Flash Attention with interactive diagrams 💾Cache Optimization

kvcache.cobanov.dev·9h·Hacker News

I ran this bulky LLM on an SBC cluster, and it's the most unhinged setup I've ever built ⚙️LLVM

xda-developers.com·6d

zero-intelligence/zero-intel: Every codebase has a confession. Most people never ask it the right question. 🔍Code Review

github.com·13h·Hacker News

The AI Inference Supercycle Is Here. These 2 Stocks Will Be the Biggest Winners of This Megatrend (Hint: It's Not Broadcom or Intel) 🏗️AI Infrastructure

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos 🏗️AI Infrastructure

huggingface.co·2d·Hacker News

Flash Getting Stacked High-Bandwidth Version 🔁Cache Coherence

semiengineering.com·6d

wojciechowskiapp/Kaption: Real-time in-game subtitle translation for Hoyoverse titles like Genshin Impact, Honkai: Star Rail on Windows 🗣️Voice Coding

github.com·2d·Hacker News

AI Inference Costs: The Wake-Up Call for 2026 and 2027 🏗️AI Infrastructure

blog.herlein.com·1d·Hacker News

What's in a GGUF, besides the weights - and what's still missing? 🤖LLMs

nobodywho.ooo·6d·Hacker News, r/LocalLLaMA

BuffaloTechRider/Autodidact: Self-learning AI agent that gets smarter and cheaper over time. Routes between local and cloud LLMs, learns from every interaction, remembers everything. 🤖AI agents

github.com·1d·Hacker News

2.3x KV Cache Compression at 32k Context 🏗Computer Architecture

github.com·6d·Hacker News

Software 3.0 🔧Software Engineering

dsebastien.net·2d

codexstar69/pi-listen: Hold-to-talk voice input for Pi CLI — Deepgram streaming STT with live transcription, voice commands, and cross-platform hold detection 🎙️Whisper

github.com·1d·Hacker News

PyTorch, rewritten from scratch in pure Rust 🔥Burn

github.com·6d·Hacker News

ImpactArbiter – A PyTorch autograd trap for LLM memory bugs ∀Lean4

github.com·2d·Hacker News

kouhxp/cheap-im: CPU-only voice agent approximating Thinking Machines' Interaction Models demo 🎚️Voice AI Systems

github.com·3d·Hacker News

chiennv2000/orthrus: Fast, lossless LLM inference via dual-view diffusion decoding. 💻Local LLMs

github.com·5d·Hacker News

LocalVibe – Pure-Rust local AI stack with MCP, in one binary (Apple Silicon) ☁️Serverless Rust

github.com·4d·Hacker News
