From Text to Token: How Tokenization Pipelines Work
paradedb.com · 23h
🔤 Tokenization
The RAG Playbook: A Data Science Guide to Document Chunking
pub.towardsai.net · 6h
🔄 LLM RAG Pipelines
YouTube gets ~5% CTR lift on Shorts by replacing embedding tables with Semantic IDs
shaped.ai · 23h
📊 Feed Optimization
OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
arxiv.org · 19h
🧠 LLM Inference
Writing regex is pure joy. You can't convince me otherwise.
triangulatedexistence.mataroa.blog · 21h
📑 Inverted Indexes
Open Lineage
usenix.org · 19h
📘 TypeScript
How different AI engines generate and cite answers
searchengineland.com · 11h
📊 Feed Optimization
Assuring Agent Safety Evaluations By Analysing Transcripts
lesswrong.com · 13h
🏆 LLM Benchmarking
timelinize/timelinize
github.com · 22h
🗜️ Zstd
Personal Knowledge Management Systems & Digital Gardens
lavenderlit.bearblog.dev · 17h
🔎 Inverted Index
MultiPar 1.3.3.5 Beta / 1.3.2.9
majorgeeks.com · 15h
📄 File Formats
Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js
biblos.app · 2h · Discuss: Hacker News
🚀 LanceDB
Show HN: 1M retail interior image dataset for computer vision (UK/US/EU)
groceryinsight.com · 11h · Discuss: Hacker News
📊 Vector Databases
You don't avoid the chaos. You filter it.
threadreaderapp.com · 6h
🧹 Spam Filters
Introducing the SambaNova SDK
sambanova.ai · 17h
🔧 Developer tools
GoMem is a high-performance memory allocator library for Go
github.com · 21h
🧠 Memory Allocators
Vite is like the United Nations of JavaScript
stackoverflow.blog · 15h
🔧 Developer tools
A.I. Slop Is Here
nytimes.com · 18h
💳 Content Monetization
Neuro-Symbolic AI
en.wikipedia.org · 9h · Discuss: Hacker News
🧠 LLM Inference
Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems
arxiv.org · 19h
🧭 Content Discovery