For a university project, we had to detect plagiarism in a dataset of nearly 4 million scientific papers. We combined several techniques and algorithms to process a dataset of that size and produce results.

The project can be found at: https://raa.ms/LSDE/