🔤 Character Classification
Unicode Processing, Character Sets, Text Parsing, SMT Applications
Scoured 181693 posts in 24.6 ms
Extract PDF text in your browser with LiteParse for the web
📄 PDF Internals
simonwillison.net · 2d · Hacker News
Unicode Shape Detector
📐 Geometric Hashing
unicode-atlas.vercel.app · 6d
‘Dexter: Resurrection’ Season 2's Most Important Returning Character Sets up Dexter's Biggest Threat Yet
🔢 Denotational Semantics
movieweb.com · 2d
From ASCII to Unicode: How Computers Understand Text
🔤 Unicode Normalization
tiniacoleyba.com · 6d
Decoding Text Spans for Efficient and Accurate Named-Entity Recognition
⚙️ Compression Benchmarking
arxiv.org · 2d
Language Modeling Without Neural Networks
📝 Text Compression
nathan.rs · 5d · Hacker News
Unicode quick reference
🔤 Unicode Normalization
pixelbeat.org · 5d
Article: Redesigning Banking PDF Table Extraction: A Layered Approach with Java
✅ Format Verification
infoq.com · 4d
The fastest way to match characters on ARM processors?
🚀 SIMD Parsing
lemire.me · 6d · Lobsters, Hacker News
haifengl/smile: Statistical Machine Intelligence & Learning Engine
🧠 Machine Learning
github.com · 3d · Hacker News
My Notes on Makemore Part1: Building a Character-Level Language Model from Scratch
⟷ Bidirectional Grammars
medium.com · 5d
Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
🗂️ Vector Search
arxiv.org · 1d
Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
🔤 Unicode Normalization
arxiv.org · 3d
Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
⚙️ Compression Benchmarking
arxiv.org · 4d
Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition
📼 Cassette Combinators
arxiv.org · 4d
From Handwriting to Structured Data: Benchmarking AI Digitisation of Handwritten Forms
🤖 Manuscript AI
arxiv.org · 4d
PIIBench: A Unified Multi-Source Benchmark Corpus for Personally Identifiable Information Detection
⚙️ Compression Benchmarking
arxiv.org · 5d
Suffix Random Access via Function Inversion: A Key for Asymmetric Streaming String Algorithms
🌊 Streaming Compression
arxiv.org · 3d
Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond
🗂️ Vector Databases
arxiv.org · 4d
SIF: Semantically In-Distribution Fingerprints for Large Vision-Language Models
👁️ Perceptual Hashing
arxiv.org · 4d