🧠 LLM Training
GitHub · 4 days ago
Rust port of transformers (1M lines of code)
Discussed on Hacker News
fareedkhan-dev.github.io · 1 day ago
Train LLM from Scratch
Discussed on Hacker News
arxiv.org · 5 days ago
Pareto LoRA: Mitigating Modality Imbalance in Unified Multimodal Models via Pareto-Optimal Gradient Integration
Nature · 3 days ago
Memorization in large language models in medicine: prevalence, characteristics, and implications
huggingface.co · 4 days ago
Beyond LoRA: Can you beat the most popular fine-tuning technique?
bloomberg.com · 7 hours ago
Tech Disruptors: Invisible Technologies on RLHF and LLM Training
Machine Learning Blog · 5 days ago
Pre-Training Isn’t Bitter Enough
Covered by Deep Learning Weekly
medium.com · 5 days ago
I Finally Used Hugging Face. Here’s What I Built and What I Actually Learned.
lesswrong.com · 5 days ago
Alignment pretraining could backfire
Covers: Teaching Claude why
chierhu.medium.com · 4 days ago
Scaling Self-Play with Self-Guidance: An AlphaZero-Style Path for Language Models
GitHub · 3 days ago
Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
Discussed on Hacker News
mlx-lora-studio.netlify.app · 3 days ago
MLX LoRA Studio — Fine-tune LLMs on your Mac
Covers: ml-explore/mlx
kaggle.com · 1 day ago
LoRA: I Trained <1% of a 1.5B Model and Matched a Full Fine-Tune
Discussed on DEV
biorxiv.org · 2 days ago
Tox21mer, A transformer foundation model for Tox21 high-throughput concentration-response curves data
shahzadasghar.medium.com · 4 days ago
When 95% of AI’s Brain is English, the Rest of the World Pays a Tax
mateostarcevicfilipovic.medium.com · 6 days ago
I tested 8 free AI image upscalers so you don’t have to. Only 4 are real.
medium.com · 2 days ago
The AI Model That Hijacks the Computer That Loads It
igor´sLAB · 3 days ago
AMD at MLPerf Training 6.0: Instinct MI355X approaches Blackwell and scales across multiple servers for the first time
i-programmer.info · 6 days ago
Stanford's CME296 Diffusion & Large Vision Models
GitHub · 5 days ago
Show HN: Chess bot based on the transformer architecture
Discussed on Hacker News