🧠 LLM Training

Rust port of transformers (1M lines of code)

Discussed on Hacker News

fareedkhan-dev.github.io·

Train LLM from Scratch

Discussed on Hacker News

Pareto LoRA: Mitigating Modality Imbalance in Unified Multimodal Models via Pareto-Optimal Gradient Integration

Memorization in large language models in medicine: prevalence, characteristics, and implications

huggingface.co·

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Tech Disruptors: Invisible Technologies on RLHF and LLM Training

Machine Learning Blog·

Pre-Training Isn’t Bitter Enough

Covered by Deep Learning Weekly

I Finally Used Hugging Face. Here’s What I Built and What I Actually Learned.

lesswrong.com·

Alignment pretraining could backfire

Covers Teaching Claude why

chierhu.medium.com·

Scaling Self-Play with Self-Guidance: An AlphaZero-Style Path for Language Models

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

Discussed on Hacker News

mlx-lora-studio.netlify.app·

MLX LoRA Studio — Fine-tune LLMs on your Mac

Covers ml-explore/mlx

LoRA: I Trained <1% of a 1.5B Model and Matched a Full Fine-Tune

Discussed on DEV

Tox21mer, A transformer foundation model for Tox21 high-throughput concentration-response curves data

shahzadasghar.medium.com·

When 95% of AI’s Brain is English, the Rest of the World Pays a Tax

mateostarcevicfilipovic.medium.com·

I tested 8 free AI image upscalers so you don’t have to. Only 4 are real.

The AI Model That Hijacks the Computer That Loads It

AMD at MLPerf Training 6.0: Instinct MI355X approaches Blackwell and scales across multiple servers for the first time

i-programmer.info·

Stanford's CME296 Diffusion & Large Vision Models

Show HN: Chess bot based on the transformer architecture

Discussed on Hacker News
