🤖 Transformers - hello · Scour

Transformers Architecture: How Google’s ‘Attention Is All You Need’ Changed Deep Learning Forever

pub.towardsai.net·6h

💬Natural Language Processing

Deciphering Human Language for Machines: A Developer's Guide to NLP

dev.to·7h·

Discuss: DEV

💬Natural Language Processing

Continuous Autoregressive Language Models

shaochenze.github.io·9h·

Discuss: Hacker News

How Self-Attention Actually Works (Simple Explanation)

dev.to·1h·

Discuss: DEV

💬Natural Language Processing

Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning

arxiv.org·7h

Feature Stores 2.0: The Next Frontier of Scalable Data Engineering for AI

hackernoon.com·7h

🎨Design Systems

An introduction to program synthesis (Part II) - Automatically generating features for machine learning

mchav.github.io·1h·

Discuss: r/programming

🎭Program Synthesis

Post-training methods for language models

developers.redhat.com·1d

💬Prompt Engineering

Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)

nbtab.com·1d·

Discuss: DEV

Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)

sebastianraschka.com·2d·

Discuss: r/LLM

💬Prompt Engineering

Beyond Standard LLMs

magazine.sebastianraschka.com·23h·

Discuss: Hacker News, r/LLM

🎯Reinforcement Learning

No Free Lunch: Deconstruct Efficient Attention with MiniMax M2

lmsys.org·1d

How to Create Your Own AI GPT: A Developer’s Guide

dev.to·1d·

Discuss: DEV

💬Prompt Engineering

Generalizing Test-Time Compute-Optimal Scaling as an Optimizable Graph

huggingface.co·7h·

Discuss: Hacker News

Topographical sparse mapping: A training framework for deep learning models

sciencedirect.com·14h·

Discuss: Hacker News

👁️Computer Vision

Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique

venturebeat.com·16h

⚡Incremental Computation

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

arxiv.org·1d

👁️Computer Vision

An underqualified reading list about the transformer architecture

fvictorio.github.io·5d·

Discuss: Hacker News

💬Prompt Engineering

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

paperium.net·3d·

Discuss: DEV

💬Prompt Engineering

Automated Figure-Text Alignment & Knowledge Extraction for Scientific Literature

dev.to·13h·

Discuss: DEV
