🧠 LLM Inference - akapaka · Scour

Ideogram-4-FP8 Brings High-Quality Text-to-Image Generation to More Hardware

hackernoon.com·

Less-relevant results

TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration

🔌Model Context Protocol Code

github.com··Hacker News

Youssof Altoukhi (@Youssofal_)

xcancel.com··r/LocalLLaMA

Apple rebuilt its on-device AI stack at WWDC 2026

🤖Machine Learning Blog

ziraph.com··Hacker News

Does anyone know what PCIe mode was used for these benchmarks?

🧠Local llm Code

github.com··r/LocalLLaMA

Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving

⚡LLM Quantization Academic

arxiv.org··Hacker News

The economics of speculative decoding

⚡LLM Quantization Blog

fergusfinn.com··Hacker News

Show HN: Magenta Real-Time Music Generation on iPhone, Without the GPU

🤖Machine Learning Code

github.com··Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🤖Qwen News

digg.com··Hacker News

princezuda/-RequiemGPT-: Fully open source and open weights built and trained by fable five with one prompt. An experience in how AI actually works

🤖Machine Learning Code

github.com··Hacker News

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

🧠Local llm Code

github.com··r/LocalLLaMA

How to cut the cost of long AI agent threads (without making the agent dumber)

🔌Model Context Protocol Blog

viktor.com··Hacker News

john-rocky/apple-silicon-llm-bench: Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models

🤖Qwen Code

github.com··Hacker News

See, Act, Correct: three levers for working with a code agent

🔌Model Context Protocol Blog

blog.owulveryck.info··Hacker News

The iPhone’s Last Stand

stratechery.com··Hacker News

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

🧠Local llm Code

github.com··Hacker News

mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp

🧠Local llm Code

github.com··r/LocalLLaMA

Do Transformers Need Three Projections? Systematic Study of QKV Variants

⚡LLM Quantization Academic

arxiv.org··Hacker News

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

🔌Model Context Protocol Blog

developer.nvidia.com··Hacker News

AI Coding Agents Have a Cost Visibility Problem

hackernoon.com·

