What is SimHash?

Hello, I’m Maneshwar. I’m currently building FreeDevTools online, *one place for all dev tools, cheat codes, and TLDRs*: a free, open-source hub where developers can quickly find and use tools without having to search all over the internet.

At a high level:

  • SimHash is a hashing/fingerprinting algorithm introduced by Moses Charikar in 2002 and widely used for near-duplicate detection.
  • Unlike a cryptographic hash (where even a tiny change to the input should produce a completely different output), SimHash is built so that similar inputs produce similar hashes, i.e., fingerprints with a small Hamming distance between them.
  • It is often used for large-scale document and web-page deduplication, spam detection, content clustering, etc.
  • I…
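
To make the "similar inputs, similar hashes" idea concrete, here is a minimal sketch of how SimHash is commonly described: hash every token, let each token vote on every bit position, and keep the sign of each vote total. This is an illustration, not a production implementation. The word-level tokenizer, the term-frequency weights, the use of BLAKE2b as the per-token hash, and the function names `tokenize`, `simhash`, and `hamming_distance` are all my own assumptions.

```python
import hashlib
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (an assumed feature choice)."""
    return re.findall(r"\w+", text.lower())


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-wide fingerprint; `bits` must be a multiple of 8, at most 512."""
    weights = Counter(tokenize(text))  # assumed weighting: term frequency
    totals = [0] * bits
    for token, weight in weights.items():
        # Any well-mixed hash works here; BLAKE2b is just a convenient stdlib choice.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            # Each token votes +weight where its hash bit is 1, -weight where it is 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:  # a positive total sets bit i of the fingerprint
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    near_dup = "the quick brown fox jumped over the lazy dog"
    unrelated = "stock markets closed higher after the rate decision"
    print(hamming_distance(simhash(original), simhash(near_dup)))
    print(hamming_distance(simhash(original), simhash(unrelated)))
```

Running it prints two Hamming distances. The near-duplicate pair will typically be much closer than the unrelated pair, though on texts this short the gap is noisy, so treat the numbers as a demonstration rather than a benchmark.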
