🔤 Tokenization
BPE, WordPiece, SentencePiece, Subword Encoding
Scoured 25 posts in 9.9 ms
LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling
📝 NLP · Content type: Academic · arxiv.org · 6 days ago
Vibe Diaries: Training Nanochat
🤖 AI · vibediary.dev · 2 days ago · Hacker News
The PM’s Playbook for Shipping AI Features That Actually Work in Production
📊 Statistics · Content type: Blog · oreilly.com · 20 hours ago
How Far Apart Does a Model Think Its Tokens Are?
⚡ Speculative Decoding · lesswrong.com · 3 days ago
Aperio: Lightweight search engine in Rust – GBs of data in < 1ms, < 256MB RAM
🔍 Information Retrieval · Content type: Code · github.com · 5 days ago · Hacker News, r/opensource
AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens
🎨 Generative AI · Content type: Academic · arxiv.org · 2 days ago
Less-relevant results
The Read Model Zoo: Projections Beyond Tables - EventSourcingDB
📊 Data Science · Content type: Blog, Reference · docs.eventsourcingdb.io · 3 days ago · Hacker News
Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech
🤖 AI · Content type: Academic · arxiv.org · 16 hours ago
A Taxonomy of Real-World Asset Tokenization for Blockchain-Based Financial Infrastructure
📝 NLP · Content type: Academic · arxiv.org · 1 day ago
DREAM: Dynamic Refinement of Early Assignment Mappings
🎯 Recommender Systems · Content type: Academic · arxiv.org · 2 days ago
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🤖 AI · Content type: Code · github.com · 4 days ago · Hacker News
UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data
🦾 Robotics · Content type: Academic · arxiv.org · 16 hours ago
Steganography Without Modification: Hidden Communication via LLM Seeds
📝 NLP · Content type: Academic · arxiv.org · 1 day ago · Hacker News
Neural Field Tokenizations with Hierarchy and Spatial Locality Priors
👁️ Computer Vision · Content type: Academic · arxiv.org · 1 day ago
Balancing Image Compression and Generation with Bootstrapped Tokenization
📝 NLP · Content type: Academic · arxiv.org · 5 days ago
LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts
📝 NLP · Content type: Academic · arxiv.org · 16 hours ago
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding
📝 NLP · Content type: Academic · arxiv.org · 6 days ago
ChannelTok: Efficient Flexible-Length Vision Tokenization
📝 NLP · Content type: Academic · arxiv.org · 6 days ago
Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 1 day ago
MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers
📝 NLP · Content type: Academic · arxiv.org · 6 days ago