🤖 AI Inference - nmarshall · Scour

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents 🏗️AI Infrastructure

inferencebench.ai·5h·Hacker News

A cheap fix that saves the AI $400M dollars a year and brings 4B people online 🏗️AI Infrastructure

codecai.net·3d·Hacker News

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🔁Cache Coherence

supercomputing-system-ai-lab.github.io·2d·Hacker News

Command A+: Making sovereign agentic capabilities available to all 🤖AI Coding Tools

cohere.com·12h·Hacker News

Unleashing the Power of ONNX for Speedier SBERT Inference 🏗️AI Infrastructure

pub.towardsai.net·1d

Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control. 🔥Burn

github.com·11h·Hacker News

Let AI Agents Write Your Serving Stack with VibeServe 🏗️AI Infrastructure

syfi.cs.washington.edu·6d·Hacker News

Training a 22MB prompt injection classifier 🏗️AI Infrastructure

stackone.com·13h·Hacker News

DeepSeek V4 Flash: Bringing Frontier AI to the Home ⚡Hardware Acceleration

blog.jonathanpage.com·2d·Hacker News

kouhxp/yapsnap: Snap any video URL or audio file into plaintext. No GPU. No cloud. One command. 🎚️Audio Codecs

github.com·7h·Hacker News

The Best Open Source and Open-Weight LLM Models to Run Locally in 2026 💻Local LLMs

huggingface.co·2d

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 🏗️AI Infrastructure

theahmadosman.substack.com·7h·Substack, r/LocalLLaMA

Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token 🔥Burn

github.com·3d·Hacker News

The Oats Protocol – Open Agent Tools for Local Coding Agents 🧩Nomad

news.ycombinator.com·2d·Hacker News

I tried 4 LLM speedup techniques on CPU. Three made it slower. ⚙️Performance Profiling

deemwar-products.github.io·9h·Hacker News

Ollama Doesn't Know Its GPU Is on Another Machine ⚡Hardware Acceleration

loopholelabs.io·14h·Hacker News

Mistral SDK 🎨Design Systems

dsebastien.net·2d

A VERY lightweight open web-search tool for smaller local LLMs ⚙️DataFusion

github.com·6d·Hacker News, r/LocalLLaMA

I replaced GitHub Copilot with a self-hosted AI and I won’t go back 🤖AI Coding Tools

xda-developers.com·9h

With Its IPO Done, Cerebras Can Get Back To Pushing The AI Envelope 🧠Neuromorphic Chips

nextplatform.com·5d·Hacker News
