💬Prompt EngineeringDEV CommunityContent type: Blog

llama-bench skipped FA on capable GPUs — b9437 corrects it (opens in new tab)

What flipped in b9437 Build b9437, published on May 30, 2026 at 20:56 UTC , ships two targeted default-value corrections to llama-bench. Flash attention (-fa) shifts from a hard-coded off to auto (LLAMA_FLASH_ATTN_TYPE_AUTO), and the GPU-layer count (-ngl) changes from the legacy sentinel 99 to -1. Both values now match what llama-server and llama-cli already used — the bench tool was simply never updated to track them until this build. Quick Answer: Before b9437 (published May 30, 2026) , ll...

Read the original article
Sign in to keep reading the full article.

Keyboard Shortcuts

Navigation

Next / previous post
j/k
Open post
oorEnter
Preview post
v

Post Actions

Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Save / unsave
s

Recommendations

Add interest / feed
Enter
Not interested
x

Go to

Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Discover
gb
Search
/

General

Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help