Scour
LLM Training
LLM training, pretraining, RLHF, model training, arxiv ML
Scoured 236 posts in 23.4 ms

pathtostaff.com · 4 days ago
Self-Attention Solved the Sequential Bottleneck
Covers 14 stories, including: Attention is all you need (2017)
Covered by: tldr.tech
Discussed on: Hacker News

arXiv · 22 hours ago
Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

medium.com · 4 days ago
The AI Model That Hijacks the Computer That Loads It

pyimagesearch.com · 2 days ago
Google DeepMind’s Gemma 4: MoE, Efficiency Tricks, and Benchmarks
Covers 7 stories, including: "GitHub here. You can follow the build instructions below as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inferen..."

igor´sLAB · 5 days ago
AMD at MLPerf Training 6.0: Instinct MI355X approaches Blackwell and scales across multiple servers for the first time

Simon Willison’s Weblog · 2 days ago
Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code
Covers 3 stories, including: Hugging Face – Fun chat with your own Artificial Intelligence
Covered by: indiehacker.news
Discussed on: Hacker News

alvinashcraft.com · 15 hours ago
Dew Drop - June 24, 2026 (#4697)

Hugging Face · 23 hours ago
Qwen-AgentWorld-35B-A3B: a 3B-active MoE trained to simulate MCP, terminal, SWE, Android, web and OS environments
Covers 2 stories, including: vllm-project/vllm
Covered by: GitHub, news.smol.ai
Discussed on: r/LocalLLaMA

GitHub · 6 days ago
zai-org/GLM-5
Covers 9 stories, including: GLM-5.2 (6 minute read)
Covered by 5 sources, including: DEV Community, The Decoder

IEEE Spectrum · 5 days ago
IEEE Rolls Out Large Language Models Virtual Training Course
Covers 4 stories, including: How to Compress DICOM (.dcm) Images from 1.4 MB to KB Using Python?
Covered by: contextmaestro.com

fitservers.com · 1 day ago
The Production-Ready Guide to Self-Hosting LLaMA 3 on a GPU Dedicated Server

arXiv · 22 hours ago
TuringViT: Making SOTA Vision Transformers Accessible to All

VentureBeat · 1 day ago
Enterprise-grade AI image generation in 2 seconds is here: Krea 2 Raw and Turbo available as open weights under custom license
Covers: Krea (@krea_ai) on X

YouTube (video) · 6 days ago
Token Injection: Crashing LLM Inference With Special Tokens

biorxiv.org · 2 days ago
CellTosg2Sequence: A Unified Text-Omics-Signaling-Graph Large Language Model for Single-Cell Analysis

fineset.io · 1 day ago
Show HN: Describe a research topic, get a daily-updated ArXiv/S2 dataset
Covered by: Hugging Face
Discussed on: Hacker News

Microsoft Developer Blogs · 6 days ago
Outcome-driven learning systems: Enterprise RL with OpenEnv and Foundry
Covers 3 stories, including: SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Covered by: threadreaderapp.com

NVIDIA Blog · 2 days ago
NAIRR Science Program Reshapes Scientific Research, Powered by NVIDIA AI Infrastructure

Hugging Face · 2 days ago
Experimenting with the Proposed Cross-Origin Storage API in Transformers.js
Covers: Origin private file system – MDN
Covered by: Blogccasion
Discussed on: Hacker News

kaggle.com · 3 days ago
If a 270M Model Already Worked, Why Did I Fine-Tune a 7B One?
Discussed on: DEV