⚡ Fast AI Inference - emschwartz · Scour

GGUF vs MLX: A Decision Guide, Not Another Benchmark 🤖AI

muhammadraza.me·2d

Majestic’s 128TB AI Server Aims to Smash the LLM Memory Wall 📱Edge AI Optimization

spectrum.ieee.org·3d·Hacker News

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102 🤖AI Discussion Code

github.com·1h·r/LocalLLaMA

Holo3.1: Fast & Local Computer Use Agents 🤖AI Blog

huggingface.co·2d

Multi-Lora-Continual-Learning 📅Resource Scheduling

trajectory.ai·5d·Hacker News

LLM, give me a JSON. Make no mistakes. 🤖AI

nobodywho.ooo·2d·Hacker News

Experience with "nvidia/LocateAnything-3B" 🤖AI

huggingface.co·6d·r/LocalLLaMA

"Optimal Cognitive Core"- specialized 1.7B model for grounded question answering 🤖AI

huggingface.co·1d·Hacker News

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro 🤖AI

lemongravy.me·3d·r/LocalLLaMA

3-Part Series: LLM Latency in Production (Part 1) 🤖AI

pub.towardsai.net·2d

paralleliq/piqc: Kubernetes scanner that discovers LLMs running on vLLM and extracts their deployment and runtime facts. 🏗️LLM Infrastructure Code

github.com·2d·Hacker News

Dropstone 1.5: Technical Report 🆕New AI

blankline.org·3d·Hacker News

KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators 📱Edge AI Optimization Academic

Independent benchmarks for self-hosted AI 🤖AI

fitmyllm.com·4d

Show HN: Zerostack, an open coding agent optimized for memory footprint 🤖AI

gi-dellav.github.io·13h·Hacker News

NVIDIA/cosmos: NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more. 🏗️LLM Infrastructure Code

NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local 💻Chips Blog

blogs.nvidia.com·2d

unsloth/Qwen3-8B-GGUF 🤖AI

huggingface.co·4d

Nemotron 3 Ultra announced: high-speed, leading US open weights intelligence 🆕New AI

artificialanalysis.ai·4d·Hacker News

Exclusive: Nvidia snaps up Kumo AI in latest acquisition 🚀Startups

·1d·Hacker News
