⚡ Vllm - unclamproot · Scour

The economics of speculative decoding

🤖LLM Inference Blog

fergusfinn.com · Hacker News

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software

🤖LLM News

cnx-software.com

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

🤖LLM Inference Code

github.com · r/LocalLLaMA

#065 - Claude writes 80% of Anthropic's own code, Cloudflare buys Vite, ChatGPT ships Dreaming memory

🤖LLM Inference

indiehacker.news

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

🤖LLM Academic

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

smolhub.com · r/LocalLLaMA

Issue #390 - The ML Engineer 🤖

🤖AI News Blog

machinelearning.substack.com · Substack

End-to-End Context Compression at Scale

🤖LLM Inference Academic

google/gemma-4-12B-it-qat-q4_0-gguf

huggingface.co

FOD#155: Continual Learning in LLMs: Why AI Models Need Sleep

turingpost.com

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🤖LLM Code

github.com · Hacker News

Gated DeltaNet, From First Principles

🤖LLM Inference Blog

sankalp.bearblog.dev

Build a Medical Report Analyzer on Dedicated Inference with Python

digitalocean.com

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

🤖LLM Inference Academic

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

🤖LLM Inference Blog

blogs.nvidia.com

The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again

🤖LLM Blog

See, Act, Correct: three levers for working with a code agent

🤖AI Blog

blog.owulveryck.info · Hacker News

How to cut the cost of long AI agent threads (without making the agent dumber)

🤖LLM Inference Blog

viktor.com · Hacker News

Benchmarking dots.tts on Strix Halo

sleepingrobots.com

Still: Amortized KV Cache Compaction in a Single Forward Pass

🤖LLM Inference Academic
