V4 compresses KV to 13.5%; video memory runs 10x faster

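DeepSeek-V4 builds "indexing + sparsity" into its core architecture: decoding no longer keeps the full KV cache resident in GPU memory, and instead uses a Neural Memory Indexer to fetch relevant history segments on demand. In long-context evaluations, the KV footprint is compressed to 13.5%, while downstream accuracy rises slightly, by 0.6 percentage points.

Video world models move their memory into latent space, skipping the round trip through pixels. Mirage no longer builds an explicit point cloud in RGB space; end-to-end generation is 10.57x faster, GPU memory use falls to 1/55, and it reaches SOTA on WorldScore.

Models that can answer correctly from an image still fail when they have to act. SpatialWorld has agents reason about spatial relations while operating in a first-person environment. Even the strongest model averages only a 17.4% success rate, and the bottleneck lies in active exploration and long-horizon planning rather than single-step reasoning.

Imitation learning breaks down out of distribution, and the fix need not be a larger policy network. DARP retrieves expert demonstrations at inference time and explicitly models the difference vector between the query and its neighbors, improving on standard behavior cloning by 15–46% across multiple domains.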
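To make the retrieval idea in the DARP item concrete, here is a minimal sketch of inference-time retrieval that uses query-to-neighbor difference vectors. It is not DARP's actual model: the function names (`retrieve_neighbors`, `predict_action`), the Euclidean nearest-neighbor search, and the local linear fit are all assumptions chosen for illustration, standing in for whatever learned components the paper uses.

```python
import numpy as np


def retrieve_neighbors(query, demo_states, k):
    """Return indices of the k demonstration states closest to the query.

    Hypothetical helper: plain Euclidean distance over raw state vectors.
    """
    dists = np.linalg.norm(demo_states - query, axis=1)
    return np.argsort(dists)[:k]


def predict_action(query, demo_states, demo_actions, k=5):
    """Estimate an action for `query` from retrieved expert demonstrations.

    Assumption: a local linear model over difference vectors,
        action_i ~= action_at_query + (state_i - query) @ W,
    fitted by least squares on the k retrieved neighbors. The intercept
    of that fit is the estimated action at the query itself.
    """
    idx = retrieve_neighbors(query, demo_states, k)
    diffs = demo_states[idx] - query  # difference vectors, shape (k, state_dim)
    # Column of ones for the intercept, then one column per state dimension.
    design = np.hstack([np.ones((len(idx), 1)), diffs])
    coef, *_ = np.linalg.lstsq(design, demo_actions[idx], rcond=None)
    return coef[0]  # intercept row, shape (action_dim,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(50, 2))           # synthetic expert states
    actions = states @ np.array([[1.0, -2.0],   # synthetic linear expert
                                 [0.5, 3.0]])
    far_query = np.array([4.0, -4.0])           # well outside the demo states
    print(predict_action(far_query, states, actions))
```

Because the correction is expressed in terms of the offset between the query and each neighbor, the estimate can extrapolate beyond the demonstrations instead of simply copying the nearest expert action, which is the intuition behind modeling the difference vector explicitly.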
