Tag: Data Quality

How to deal with the annoying implications of changing data sources

Let’s discuss a common scenario in AI consulting. The client provides access to data sources, such as CSVs or databases, that aren’t in a production environment. Why? Usually, they’re exploring the value of the project, do not want to disclose too much data, and want to prevent technical problems from happening at the […]

Written on Nov 20, 2022

Stop removing outliers just because!

Outliers are data points that stand out for being different from the remaining data distribution. An outlier can be: an odd value in a feature; a data point distant from the centroid of the data; or a data point in a region of low density, but between areas of high density. Suppose you have been working […]

Written on Nov 14, 2022

Privacy Preserving Machine Learning

This article reports my work at NILG.AI during a curricular internship on privacy-preserving Machine Learning. Trip data is any type of data that connects the origin and destination of a person’s travel and is generated in countless ways as we move about our day and interact with systems connected to the internet. But why is […]

Written on Aug 16, 2022

Objectively Estimating Data Quality

In Artificial Intelligence, it is important to measure the quality of the data we are trying to use. For instance, if we want to classify a cervix image according to the degree of cancer, how do we know if that image follows the acquisition protocol and can be used for diagnosing the patient [1] so […]

Written on Feb 27, 2020