Reliable data is 90% of the data scientist’s work

An article in the UK newspaper The Telegraph today claims to have uncovered a huge scandal in the measurement and use of temperature records from South American weather stations. The article claims that temperature readings have been reversed to show a 1 degree Celsius rise over the past 40 years when in fact temperatures have been cooling.

Whether or not there is any truth in the article is of huge importance and interest. But the meta-message to take away here is that data must be vetted, reliable, and trustworthy before models can be built and decisions taken.

It is no surprise, then, that a data scientist may find themselves spending much more time obtaining, cleaning, checking, and re-checking data than analysing it. This is just how it should be, and it is also why a data scientist is unique in their role as the curator of data. The article is a timely reminder that this responsibility should be taken extremely seriously and executed with the utmost integrity.
