DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
submitted by yogthos to technology
1 point | 0 comments

The hardware efficiency gains are honestly the most interesting part of the paper. The main reason DeepSeek-V4 is so cheap to run is that it bypasses the quadratic cost of standard attention for massive context windows. The authors built a hybrid attention architecture that interleaves Compressed Sparse Attention and Heavily Compressed Attention. Standard models keep every single token in the KV cache, which absolutely kills m...
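To put rough numbers on the KV cache point, here is a back-of-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the model dimensions (60 layers, 8 KV heads, head dimension 128, 2-byte values), and the compression ratio are hypothetical and not taken from the paper. It also models compression as simply "one cached entry per group of tokens", which is a simplification and not a description of how Compressed Sparse Attention or Heavily Compressed Attention actually work.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim,
                   bytes_per_value=2, compression=1):
    """Approximate KV cache size in bytes.

    `compression` is a hypothetical ratio: every `compression` tokens
    share a single cached key/value entry (1 means no compression).
    """
    cached_entries = -(-tokens // compression)  # ceiling division
    # The factor 2 covers one key vector and one value vector per entry.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * cached_entries


def attention_pairs(tokens, compression=1):
    """Query-key score pairs when every query scores every cached entry."""
    return tokens * -(-tokens // compression)


if __name__ == "__main__":
    n = 1_000_000  # a million-token context
    dims = dict(layers=60, kv_heads=8, head_dim=128)  # hypothetical model
    for ratio in (1, 4, 64):
        gb = kv_cache_bytes(n, compression=ratio, **dims) / 1e9
        pairs = attention_pairs(n, compression=ratio)
        print(f"compression={ratio:>3}: cache ~ {gb:.2f} GB, score pairs = {pairs:.2e}")
```

With no compression the cache grows linearly with context length and the number of score pairs grows quadratically, so at a million tokens both get large very quickly. Shrinking the number of cached entries cuts both terms by the same factor in this toy model.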