There is always one more data bug
If we, as data scientists, receive a dataset from a reliable source, we should go ahead with the analysis (classification, clustering, deep learning, etc.), right? Well, yes, that’s what most of us (myself included) often do, especially under a tight deadline. However, this can be dangerous. Let me describe some rude awakenings I suffered over the past decades, and remind you of some fast and easy preventive measures.

Examples (a.k.a. horror stories)

E1 Geographical data

Two decades ago, we got access to a public dataset of cross-roads in California, […]