LLMs

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🤖AI Agents  Content type: Code
github.com··Hacker News

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 vibe coding  Content type: Academic
arxiv.org·

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

 🐛Bug Bounty
zozo123.github.io··Hacker News

Google open-sources speedy DiffusionGemma text diffusion model

 🌐Open Source
siliconangle.com·

Why LLMs (still) lack taste

 🐛Bug Bounty

Using Scikit-LLM with Open-Source LLMs

 vibe coding

A Plea to the Labs: Let the Models Diagnose.

 🐛Bug Bounty  Content type: Blog

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

 🌐Open Source
phoronix.com·

The Rise of Agentic AI: What Every Engineer Should Learn

 🤖AI Agents  Content type: Blog
medium.com·

Using local LLMs for agentic coding

 🤖AI Agents  Content type: Blog
blog.alexewerlof.com·

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

 🤖AI Agents  Content type: Blog
blogs.nvidia.com·

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 🐛Bug Bounty  Content type: Blog
adambien.blog·

How we fight GPU scarcity without compromise

 🤖AI Agents  Content type: Blog
equixly.com··Hacker News

Slack bot for the whole team, not per-seat

 🔑Authentication  Content type: Discussion
plugand.ai··Hacker News

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

 vibe coding  Content type: News  Content type: Blog
developer.nvidia.com·

How LLMs work | Practical Leaders

 🤖AI Agents

What are AI parameters — and why does everyone keep talking about billions of them?

 🤖AI Agents  Content type: Blog
medium.com·

Less-relevant results

The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has

 🤖AI Agents
xda-developers.com·

local llm on laptop 780M GPU using llama + gemma 4 qat

 vibe coding  Content type: Blog
alper.bearblog.dev·

Report: GKE Inference Gateway delivers up to 92% faster AI responses

 🤖AI Agents  Content type: Blog
