One Weird Hashing Trick
notes.hella.cheap·12w·

Here’s a little thing that’s obvious once you see it but isn’t obvious initially (or at least it wasn’t to me).

Say you want to generate low-dimensional projections of bag-of-words (or bag-of-ngrams) vectors (in the rest of this post wherever I say “word” you can substitute “ngram”). Jhonson-Lindenstrauss says if you have the bag-of-words vector you can get a good projection by just multiplying it by a random matrix (!) and the distortion scales logarithmically (!!) not with the number of dimensions but actually with the database size (!!!). And even better that matrix can just be a matrix of 1 or -1 with equal probability.

But if the number of input di…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help