AI

local llm on laptop 780M GPU using llama + gemma 4 qat

 🤖Transformers  Content type: Blog
alper.bearblog.dev

Here's a llama.cpp CLI Command builder.

 🤖Transformers

DiffusionGemma: The Developer Guide - Google Developers Blog

 🤖Machine Learning  Content type: Blog

DiffusionGemma: 4x Faster Text Generation

 🤖Machine Learning  Content type: News  Content type: Blog

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 🤖Transformers  Content type: Code
github.com · r/LocalLLaMA

Machinic Psychopharmacology: Do LLMs Self-Medicate?

 🚀Model Serving
lesswrong.com · Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🚀Model Serving  Content type: News  Content type: Blog

Qwen 3.6 27B AutoRound GGUF, need your feedback

 🤖Transformers

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

 🧠Deep Learning
smolhub.com · r/LocalLLaMA

Youssof Altoukhi (@Youssofal_)

 🚀Model Serving
xcancel.com · r/LocalLLaMA

Homebrew, Again

 🛠️Feature Engineering  Content type: Blog
jerryz.bearblog.dev

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

 🐍Programming

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

 🦙Claude
lesswrong.com

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

 🦙Claude  Content type: Code
github.com · r/LocalLLaMA

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

 🚀Model Serving

Can activation verbalizers surface an internal chain of thought?

 🤖Transformers
lesswrong.com

How to reduce capability degradation from off-model SFT

 🤖Machine Learning
lesswrong.com

A handy llama-server launcher with easy model and configuration customisation

 🤖Transformers  Content type: Code
github.com · r/LocalLLaMA

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

 🤖Machine Learning  Content type: Blog

Defeating Introspection Adapters (and Why Threat Models Matter)

 🤖Machine Learning
lesswrong.com
