🔤 Tokenization
Specific
tokenizer, BPE, byte pair encoding, subword tokenization
Scoured 18 posts in 22.2 ms
🗣️
LLMs
arXiv
·
1d
1 day ago
QuechuaTok: Morphological Boundary Accuracy as a Necessary Metric for Tokenizer Evaluation in Agglutinative Low-Resource Languages
⚡
Transformers
colobu.com
·
5d
5 days ago
How Exactly Do LLMs Work?
🧠
LLM Research
GitHub
·
6d
6 days ago
Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
Discussed on
Hacker News
Less-relevant results
⚡
Transformers
Hugging Face
·
5d
5 days ago
[NEW MODEL] SupraLabs started the Any2Any model family!
Discussed on
r/LocalLLaMA
🤖
AI Development
hamanlp.org
·
1d
1 day ago
Learn Zig by building an LLM from scratch
Covers
Zig Software Foundation ⚡ Zig Programming Language
Discussed on
Hacker News
🤖
AI
aircityshops.com
·
3d
3 days ago
Zero Weights Graph Language Engine (MSE-GLM)
Discussed on
Hacker News
🗣️
Large Language Models
arXiv
·
2d
2 days ago
Phonemes to the Rescue: Multilingual Tokenization Based on International Phonetic Alphabet
🤖
AI Development
inkdroid
·
3d
3 days ago
Bookmarks - book, ai, map, llm
Covers 7 stories, including AI Economics for Dummies
🔧
Tool Use
GitHub
·
1d
1 day ago
Show HN: Heddle, content-addressed contracts for spec-driven agent loops
Discussed on
Hacker News
⚡
LLM Optimization
Luke Salamone's Blog
·
5d
5 days ago
Semantic Search in Under 3MB
Discussed on
Hacker News
🔬
AI for Science
Environmental Research Letters
·
3d
3 days ago
Deciphering the contribution of submarine groundwater discharge to estuary hypoxia
⚡
LLM Optimization
arXiv
·
23h
23 hours ago
Minimax PAC Bounds for Learning in Exogenous Contextual MDPs
🔌
Claude Plugins
GitHub
·
3d
3 days ago
Open Source Openrouter – Routatic
Covers 2 stories, including Amazon Bedrock – Build genAI applications and agents at production scale – AWS
Discussed on
Hacker News
🗣️
Large Language Models
arXiv
·
6d
6 days ago
Toten: Knowledge-Based Ontological Tokenization Of Physical Quantities And Technical Notation In Brazilian Portuguese
💬
LLM Prompting
GitHub
·
6d
6 days ago
chatstore – persistent chat history service for LLM apps, zero infrastructure
Discussed on
DEV
🧠
LLM Research
arXiv
·
6d
6 days ago
IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources
💬
LLM Prompting
GitHub
·
4d
4 days ago
Show HN: Crespo – Tree-sitter AST blueprints instead of raw code for LLMs
Covered by
DEV Community
Discussed on
Hacker News
🧠
Agent Memory
GitHub
·
5d
5 days ago
Letheo – a Cognitive Runtime for agent memory in Rust (forgetting by physics)
Discussed on
Hacker News