NVIDIA packs five modalities into one set of weights

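NVIDIA packs language, image, video, audio, and action into a single set of weights: Cosmos 3 uses one mixture-of-transformers to bet that a single model can handle every modality, and third-party evaluations rate it the best open-source model on all three of text-to-image, image-to-video, and robot policy.

The same KV quantization method is fine in prefill but drifts further and further off in long decoding: KVarN points out that the error accumulates across time steps, uses variance normalization to suppress outlier token scales, and sets a new state of the art for KV quantization at 2 bits, with no calibration required and a vLLM implementation available.

Writing what was learned temporarily in context back into the weights: setting the metaphor aside, the mechanism in "Language Models Need Sleep" is distillation plus self-rehearsal on synthetic data; but the abstract does not directly answer the two hard questions of what to write and how to prevent forgetting.

Sampling budget goes from a hand-tuned threshold to a learnable policy: the question of how many samples to draw is formalized as an MDP, and RL trains a small controller that can run on a CPU, achieving a better trade-off than strong baselines between sampling less and not losing accuracy.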
