Big Data: Finding plagiarism in 4TB of scientific papers
For a university project, we had to find plagiarism in a dataset of nearly 4 million scientific papers (about 4 TB). Processing a dataset this large took a combination of techniques and algorithms before we could get results.
The project can be found at: https://raa.ms/LSDE/