Almost Optimal Algorithms for Detecting Near-Duplicates in Domain-Independent Big Data
In this chapter, we propose Merge-Filter Representative-based Clustering (Merge-Filter-RC), a general, domain-independent method for finding near-duplicate records within and across different data sources. Following that,...