Tokenization


LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

 📝NLP  Content type: Academic
arxiv.org

Vibe Diaries: Training Nanochat

 🤖AI
vibediary.dev · Hacker News

The PM’s Playbook for Shipping AI Features That Actually Work in Production

 📊Statistics  Content type: Blog
oreilly.com

How Far Apart Does a Model Think Its Tokens Are?

 Speculative Decoding
lesswrong.com

Aperio: Lightweight search engine in Rust – GBs of data in < 1ms, < 256MB RAM

 🔍Information Retrieval  Content type: Code

AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens

 🎨Generative AI  Content type: Academic
arxiv.org

Less-relevant results

The Read Model Zoo: Projections Beyond Tables - EventSourcingDB

 📊Data Science  Content type: Blog  Content type: Reference

Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech

 🤖AI  Content type: Academic
arxiv.org

A Taxonomy of Real-World Asset Tokenization for Blockchain-Based Financial Infrastructure

 📝NLP  Content type: Academic
arxiv.org

DREAM: Dynamic Refinement of Early Assignment Mappings

 🎯Recommender Systems  Content type: Academic
arxiv.org

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🤖AI  Content type: Code
github.com · Hacker News

UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data

 🦾Robotics  Content type: Academic
arxiv.org

Steganography Without Modification: Hidden Communication via LLM Seeds

 📝NLP  Content type: Academic
arxiv.org · Hacker News

Neural Field Tokenizations with Hierarchy and Spatial Locality Priors

 👁️Computer Vision  Content type: Academic
arxiv.org

Balancing Image Compression and Generation with Bootstrapped Tokenization

 📝NLP  Content type: Academic
arxiv.org

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

 📝NLP  Content type: Academic
arxiv.org

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

 📝NLP  Content type: Academic
arxiv.org

ChannelTok: Efficient Flexible-Length Vision Tokenization

 📝NLP  Content type: Academic
arxiv.org

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

 ✍️Prompt Engineering  Content type: Academic
arxiv.org

MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers

 📝NLP  Content type: Academic
arxiv.org
