Transformers


A handy llama-server launcher with easy model and configuration customisation

馃AIContent type: Code
github.comr/LocalLLaMA

Here's a llama.cpp CLI Command builder.

馃AI
Less-relevant results

DiffusionGemma: 4x Faster Text Generation

馃AIContent type: NewsContent type: Blog

local llm on laptop 780M GPU using llama + gemma 4 qat

馃AIContent type: Blog
alper.bearblog.dev

Qwen 3.6 27B AutoRound GGUF, need your feedback

馃AI

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

馃AI
smolhub.comr/LocalLLaMA

Can activation verbalizers surface an internal chain of thought?

馃AI
lesswrong.com

DiffusionGemma: The Developer Guide - Google Developers Blog

馃AIContent type: Blog

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

馃AIContent type: Code
github.comr/LocalLLaMA

How to reduce capability degradation from off-model SFT

馃AI
lesswrong.com

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

馃AIContent type: NewsContent type: Blog

[PoC] server: support requantizing kv cache by wadealexc · Pull Request #24134 · ggml-org/llama.cpp

馃AIContent type: Code
github.comr/LocalLLaMA

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

馃AIContent type: Blog

Defeating Introspection Adapters (and Why Threat Models Matter)

馃AI
lesswrong.com

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

馃AIContent type: Code
github.comr/LocalLLaMA

How Far Apart Does a Model Think Its Tokens Are?

馃Claude
lesswrong.com

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

馃Deep Learning

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org llama.cpp · Discussion #24102

馃AIContent type: DiscussionContent type: Code
github.comr/LocalLLaMA

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

馃AI
lesswrong.com
