Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Updated Aug 28, 2023 - Python
Implementations of algorithms for big data using Python, NumPy, and pandas.
A Java program that checks for plagiarism across multiple documents using shingling, MinHashing, and locality-sensitive hashing.
Code for Shingling
Data-based analysis of the structure and linguistic characteristics of the Bible and the Quran
Finding Similar Items: Textually Similar Documents
Implementing Locality Sensitive Hashing for DNA Sequences.
Duplicate Detection on Hoaxy Dataset
Finding Similar Items: Textually Similar Documents
Data Mining Algorithms
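
Several of the repositories above combine shingling with MinHashing to find similar documents. As a rough illustration of that idea, here is a minimal Python sketch. It is not taken from any of the listed repositories; the function names (`shingles`, `jaccard`, `minhash_signature`, `estimate_similarity`), the word-level shingles, the default shingle size of 3, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of `text` (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    # One member of an assumed hash family: BLAKE2b salted with the seed.
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8, salt=salt)
    return int.from_bytes(digest.digest(), "big")


def minhash_signature(shingle_set, num_hashes=64):
    """Return the minimum hash value per seed; raises ValueError on an empty set."""
    if not shingle_set:
        raise ValueError("cannot build a MinHash signature for an empty set")
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox leaps over the lazy dog"
    set_a, set_b = shingles(doc_a), shingles(doc_b)
    print("exact Jaccard:", jaccard(set_a, set_b))
    print("MinHash estimate:", estimate_similarity(
        minhash_signature(set_a), minhash_signature(set_b)))
```

The exact Jaccard score compares the full shingle sets, while the MinHash estimate compares only fixed-length signatures, which is what makes the approach practical for large collections. Locality-sensitive hashing would typically go one step further and bucket these signatures by bands so that only likely matches are compared; that step is left out here.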