🧠 LLM Research - inarcissuss

🗣️Large Language Models Machine Learning Mastery·

Clustering Unstructured Text with LLM Embeddings and HDBSCAN

HRM-Text: Efficient Pretraining Beyond Scaling

Covers sapientinc/HRM-Text: HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

Discussed on Hacker News

🧠LLM ByteByteGo Newsletter·

Large Language Models vs Small Language Models

Covers 6 stories including Attention is all you need (2017)

⚡Transformers astledsa.substack.com·

Tree Transformers

Discussed on Substack

🗣️Large Language Models medium.com

Large Language Models: Architectures, Pretraining, and Roadmaps

🗣️Large Language Models medium.com

How LLMs Actually Work

🗣️Large Language Models medium.com

Temperature and Sampling in Transformers: How LLMs Decide the Next Word

🧠AI Models Bloomberg

Tech Disruptors: Invisible Technologies on RLHF and LLM Training

🤖AI/ML medium.com

The Coming War Between Memory and Compute in AI Systems

📄AI Papers arXiv·

RoFormer: Enhanced Transformer with Rotary Position Embedding

Covered by 13 sources including pathtostaff.com, DEV Community

🗣️Large Language Models IT之家·

Fujitsu introduces PHOTON framework: 1.2B model delivers 475x the multi-query performance of Transformer

🤖AI Development hamanlp.org·

Learn Zig by building an LLM from scratch

Covers Zig Software Foundation ⚡ Zig Programming Language

Discussed on Hacker News

🤖LLM, Agent Deep (Learning) Focus·

Agentic RL: Frameworks and Best Practices

Covers 3 stories including MCP is an open protocol that standardizes how apps provide context to LLMs

Discussed on Substack

🤖AI Development medium.com

Why I Stopped Focusing on ML Algorithms and Started Focusing on Data and Systems

🗣️Large Language Models foojay·

BoxLang 1.14.0: Query Transformers – Take Full Control of Your Query Results

🤖Artificial Intelligence SiliconANGLE·

Mirendil raises $200M to speed up scientific research with AI

🤖AI Development GitHub·

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

Discussed on Hacker News

🔬Deep Learning Neuroscience News·

LoRA: Low-Rank Adaptation of Large Language Models

Train LLM from Scratch

Human Memory Limits Make AI Better at Grammar