Scour
Transformers
Specific: Attention Mechanism, BERT, GPT, Language Models

Scoured 23 posts in 9.7 ms

A handy llama-server launcher with easy model and configuration customisation
AI · Code · github.com · 3 days ago · r/LocalLLaMA

Here's a llama.cpp CLI Command builder.
AI · llamabuilding.com · 1 day ago · r/LocalLLaMA

Less-relevant results

DiffusionGemma: 4x Faster Text Generation
AI · News, Blog · blog.google · 3 hours ago · Hacker News, r/LocalLLaMA, r/singularity

local llm on laptop 780M GPU using llama + gemma 4 qat
AI · Blog · alper.bearblog.dev · 4 days ago

Qwen 3.6 27B AutoRound GGUF, need your feedback
AI · huggingface.co · 1 day ago · r/LocalLLaMA

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
AI · smolhub.com · 2 days ago · r/LocalLLaMA

Can activation verbalizers surface an internal chain of thought?
AI · lesswrong.com · 3 days ago

DiffusionGemma: The Developer Guide - Google Developers Blog
AI · Blog · developers.googleblog.com · 19 hours ago · r/LocalLLaMA

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss
AI · Code · github.com · 1 day ago · r/LocalLLaMA

Magenta RealTime 2: Open and Local Live Music Models
Natural Language Processing · magenta.withgoogle.com · 6 days ago · Hacker News, r/LocalLLaMA

How to reduce capability degradation from off-model SFT
AI · lesswrong.com · 2 days ago

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
AI · News, Blog · kaitchup.substack.com · 4 days ago · r/LocalLLaMA

[PoC] server: support requantizing kv cache by wadealexc · Pull Request #24134 · ggml-org/llama.cpp
AI · Code · github.com · 6 days ago · r/LocalLLaMA

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
AI · Blog · huggingface.co · 2 days ago · Hacker News, r/LocalLLaMA

Defeating Introspection Adapters (and Why Threat Models Matter)
AI · lesswrong.com · 6 days ago

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.
AI · Code · github.com · 3 days ago · r/LocalLLaMA

How Far Apart Does a Model Think Its Tokens Are?
Claude · lesswrong.com · 2 days ago

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
Deep Learning · huggingface.co · 6 days ago · Hacker News, r/LocalLLaMA

Qwen3.6 + MTP: Calculated context size is smaller when I use `--spec-draft-type-* q4_0`. is this normal? · ggml-org/llama.cpp · Discussion #24102
AI · Discussion, Code · github.com · 5 days ago · r/LocalLLaMA

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?
AI · lesswrong.com · 4 days ago