✂️ Tokenization
Text Splitting, Word Boundaries, NLP Pipeline, Lexical Analysis
Compute Optimal Tokenization (2 minute read)
🔢 Kolmogorov Complexity · arxiviq.substack.com · 6d · Substack

Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization
📝 TextRank · arxiv.org · 3h

AI Paper Review: Language Models are Few-Shot Learners (GPT-3)
🤖 Transformer Architecture · freecodecamp.org · 10h

Mistral's Open TTS, Anthropic's Activation Translator, and Matt Pocock's Skills Repo: Tokenizer #28
🎭 Anthropic Claude · newsletter.artofsaience.com · 1d

Compute Optimal Tokenization: Scaling Laws for Data Compression in LLMs
🔢 Kolmogorov Complexity · co-tok.github.io · 4h · Hacker News

Tokenizer Tampering
🌱 Stemming · hiddenlayer.com · 23h

NLP · Machine Learning
💬 Natural Language Processing · medium.com · 6d

I Built a Tokenizer From Scratch.
🏭 Code Generation · medium.com · 3d

Text Analysis for Hybrid Search: Tokenization, Stopwords & Accent Folding
🌱 Stemming · weaviate.io · 5d

Empowering Language Model Applications: Understanding and Evaluating Vector Databases in Production
💬 Natural Language Processing · mlops.community · 5d

Reinforcing Recursive Language Models (18 minute read)
🧠 LLM Reasoning · alphaxiv.org · 6d · Hacker News

SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark
🌱 Stemming · arxiv.org · 3h

Structure tokens sharpen the feature vocabulary of protein language models
🌱 Stemming · biorxiv.org · 3d

A More Word-like Image Tokenization for MLLMs
🔗 RAG · arxiv.org · 3h

Guided Generation for LLM Outputs
🏭 Code Generation · mlops.community · 5d

WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens
🔗 RAG · arxiv.org · 3h

OmniGene-4: A Unified Bio-Language MoE Model with Router-Level Interpretability
🤖 Transformer Architecture · biorxiv.org · 3d

Vision Foundation Models as Generalist Tokenizers for Image Generation
🔗 RAG · arxiv.org · 3h

Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models
🤖 Transformer Architecture · arxiv.org · 1d

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
🔢 Kolmogorov Complexity · arxiv.org · 1d