🧠 LLM Research - inarcissuss

Large Language Models: Architectures, Pretraining, and Roadmaps

HRM-Text: Efficient Pretraining Beyond Scaling

Discussed on Hacker News

Temperature and Sampling in Transformers: How LLMs Decide the Next Word

Clustering Unstructured Text with LLM Embeddings and HDBSCAN

Memorization in large language models in medicine: prevalence, characteristics, and implications

The Coming War Between Memory and Compute in AI Systems

Fujitsu introduces the PHOTON framework: 1.2B model delivers 475x the multi-query performance of a Transformer

Large Language Models vs Small Language Models

Why I Stopped Focusing on ML Algorithms and Started Focusing on Data and Systems

Tree Transformers

Discussed on Substack

CellTosg2Sequence: A Unified Text-Omics-Signaling-Graph Large Language Model for Single-Cell Analysis

Mirendil raises $200M to speed up scientific research with AI

Transformer-based operator learning framework for self-energy in strongly correlated systems

Tech Disruptors: Invisible Technologies on RLHF and LLM Training

How LLMs Actually Work

LLM Refusal Behavior on Open-Weight Model

Discussed on Hacker News

OpenAI debuts Jalapeño, a custom chip built to cut ChatGPT costs and reduce Nvidia reliance

LoRA: Low-Rank Adaptation of Large Language Models

Train LLM from Scratch

Which tokens does a hybrid model predict better?