LLM serving frameworks

DiffusionGemma: The Developer Guide - Google Developers Blog

 🔧Systems-level optimizations for LLM serving  Content type: Blog

Unsloth Gemma 4 QAT

 Model optimizations in LLMs
unsloth.ai·

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 🧠Large Language Models (LLMs)

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

 📊AI Performance Profiling
everylocalai.com··DEV

AI Serving Platform That Adapts to Your Model

 📊AI Performance Profiling  Content type: Blog
databricks.com·

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 Model optimizations in LLMs  Content type: News  Content type: Blog
blog.google··Hacker News

Less-relevant results

Google's new open-weights model brings image-generation tricks to AI text generation

 🧠Large Language Models (LLMs)  Content type: News
theregister.com·

DiffusionGemma 26B A4B results on my 5090

 🧠Large Language Models (LLMs)

Fixing a stuck Ollama runner and building a GPU watchdog

 📊AI Performance Profiling

What Ollama Reveals About Local AI, Agents, and Open Models

 🤖Agents using LLMs  Content type: Blog
odsc.medium.com·

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 📊AI Performance Profiling

What's in the Box? A Field Guide to AI Models

 🧠Large Language Models (LLMs)  Content type: Blog
iankduncan.com·

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 🧠Large Language Models (LLMs)  Content type: Blog
ziraph.com··Hacker News

fix(ollama): use provider thinking default in SDK session factory (#9… · openclaw/openclaw@4f3c2cd

 🤖Agents using LLMs  Content type: Code
github.com·

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🔧Systems-level optimizations for LLM serving  Content type: Blog
dnhkng.github.io·

On-device AI is a margin decision

 🧠Large Language Models (LLMs)  Content type: Blog
ziraph.com··Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🔢Quantization of LLMs

An LLM that reviews your code, challenges your decisions, but never writes code for you

 💬Prompt optimizations for LLM serving  Content type: Blog
blog.adafruit.com·

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

 ⚙️AI Infrastructure Automation

How we fight GPU scarcity without compromise

 🧠Large Language Models (LLMs)  Content type: Blog
equixly.com··Hacker News
