🧠 Transformer Architecture · magazine.sebastianraschka.com

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

Covers 3 stories, including "Foundation Models for Agentic Coding and Long-Horizon Work"

Covered by tldr.tech and kite.kagi.com

Discussed on Hacker News (3 threads) and r/LocalLLaMA

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs

Covered in 2 articles

Gemini Extended Thinking ✨, ChatGPT finance 📱, Claude Code at scale 👨‍💻

In other languages

kite.kagi.com

Open-Weight Models Are Expanding Companies' Foothold in the LLM Market