Machine Learning Problems: The Easy Parts

Contify

So, we implemented de-duplication algorithms to significantly reduce the resources required to process the information in these documents. We therefore had to write our own version of hierarchical clustering that returns a large number of tight clusters of documents that are duplicates or just rephrased. Entity Identification. Named Entities.
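
To make the idea of hierarchical clustering that yields many tight clusters more concrete, here is a minimal sketch in Python. It is not the implementation described above, which the excerpt does not show. The word-shingle representation, the Jaccard distance, the complete-linkage rule, and the names `shingles`, `jaccard_distance`, `tight_clusters`, `max_distance`, and `k` are all assumptions chosen for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a document (assumed representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means the shingle sets are identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def tight_clusters(docs, max_distance=0.5, k=3):
    """Complete-linkage agglomerative clustering that stops at a distance cutoff.

    Complete linkage scores two clusters by their farthest pair of members,
    so every pair inside a returned cluster is within max_distance. A low
    cutoff therefore gives many small, tight clusters of near-duplicates.
    This naive version is cubic in the number of documents; it is a sketch,
    not something tuned for large corpora.
    """
    sets = [shingles(d, k) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        return max(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (x, y), best = min(
            ((pair, linkage(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > max_distance:
            break  # Merging further would loosen the clusters.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid.
    return clusters


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over a lazy dog",
    "stock markets fell sharply on monday morning",
]
print(tight_clusters(docs, max_distance=0.5))
print(tight_clusters(docs, max_distance=0.7))
```

Each returned cluster is a list of indices into `docs`. Keeping one representative per cluster and dropping the rest is one way the downstream processing cost could be reduced. Raising `max_distance` trades tightness for coverage, grouping rephrased documents along with exact copies.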
