⚡ Fast AI Inference - emschwartz · Scour

How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops 🤖AI Video

youtube.com·23h

Sources: ByteDance has partnered with chipmaker InnoStar to develop an AI inference chip modeled after Groq's LPUs, which are built to run AI models at low cost... 🏗️LLM Infrastructure

·6d

FitMyLLM — Independent benchmarks for self-hosted AI 🏠Self-Hosting Discussion

lemmy.world·2d

Nvidia Pays $400 Million for AI Software Firm Kumo 🆕New AI

pymnts.com·17h

Bit-Exact AI Inference Verification Without Performance Tradeoffs 🏗️LLM Infrastructure Academic

Making Local LLM Go Brrr 🤖AI

seanpedersen.github.io·1d

Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway 🌍Distributed Systems Blog

cloud.google.com·2d

jmaczan/tiny-vllm: Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM 🏗️LLM Infrastructure Code

github.com·6d·Hacker News

Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W 📱Edge AI Optimization

armdevices.net·9h

3-Part Series: LLM Latency in Production (Part 1) 🧠LLM Inference

towardsai.net·1d

Where to Host Your Open-Source Model (Under 10B Parameters) 🤖AI

digitalocean.com·18h

Step 3.7 Flash – 198B-A11B MoE vision-language model 🤖AI

huggingface.co·5d·Hacker News

Nemotron 3 Ultra now available on AI Gateway 🪄Prompt Engineering

Intel's attempting to break into the AI market once more, but this time avoiding Nvidia's dominance in training by going for inference 🖥GPUs

·3d

Qwen3.7 Plus - Intelligence, Performance & Price Analysis 💰Tokenomics

artificialanalysis.ai·1d·Hacker News

Deploy Hermes Agent on OpenShift AI with vLLM model serving 🏗️LLM Infrastructure

developers.redhat.com·3d

Your first model deployment on Foundry Local on Azure Local: from catalog to inference in 10 minutes 💻Chips

techcommunity.microsoft.com·2d

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102 🤖AI Discussion Code

github.com·4h·r/LocalLLaMA

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and Nvidia 🤖AI Blog

developer.nvidia.com·2d·Hacker News

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition 🏗️LLM Infrastructure Academic
