LLMs

large language models, foundation models, pretraining, transformers

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

 💻GPU Computing  Content type: Blog
bric.pe.kr··DEV

Ask HN: Any Local LLM can I run without GPU for Local Agentic workflow AI?

 🔧MLOps  Content type: Discussion

Google open-sources speedy DiffusionGemma text diffusion model

 💻GPU Computing
siliconangle.com·

Expanding Apple Foundation Models to support image inputs and running in private cloud compute is a huge upgrade. The old models really weren’t capable...

 🔧MLOps
manton.org·

Google Gemma4 12B released

 🔥PyTorch  Content type: Blog
medium.com·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

 💻GPU Computing
everylocalai.com··DEV

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

 🔧MLOps

How we fight GPU scarcity without compromise

 💻GPU Computing  Content type: Blog
equixly.com··Hacker News

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

 🏗️AI Infra
phoronix.com··r/artificial

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 🔧MLOps  Content type: Blog
adambien.blog·

A Plea to the Labs: Let the Models Diagnose.

 🔧MLOps  Content type: Blog

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 💻GPU Computing  Content type: News  Content type: Blog
blog.google··Hacker News

How Gemma Collins’ dad saved her from financial ruin & helped rake in £1.4M last year… he even lives with her & fiancé

 🔥PyTorch  Content type: News
thesun.co.uk·

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

 Distributed Training  Content type: Academic
arxiv.org·

Report: GKE Inference Gateway delivers up to 92% faster AI responses

 🏗️AI Infra  Content type: Blog

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?

 🏗️AI Infra  Content type: Blog
medium.com·

Humans and LLMs share a mental disorder: Fugue Lock

 🐧Operating Systems
vwwwv.org··Hacker News

Gemma 4 31B Runs Fastest on SambaCloud

 🌐Networking
sambanova.ai·

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

 🐍Python

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

 🐍Python  Content type: Code
github.com··Hacker News
