How to clean up your data

Raw data is generally messy. Most projects require a substantial investment in ‘cleaning’: removing or repairing empty records, duplicates, unidentifiable inputs, and other anomalies that make it harder to discern patterns in the data.
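
As a rough illustration of these cleaning steps, here is a minimal Python sketch that uses only the standard library. The function name clean_records, the field names "name" and "country", and the sample rows are all hypothetical, chosen for the example. It assumes that a record with a blank "name" counts as unidentifiable and that duplicates should be matched case-insensitively. Real datasets will need their own rules.

```python
def clean_records(records, required=("name",)):
    """Drop empty records, unidentifiable inputs, and duplicates.

    `required` lists the fields a record needs to be identifiable;
    the default of ("name",) is an assumption made for this sketch.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Normalize: turn None into "" and strip surrounding whitespace.
        normalized = {
            key: "" if value is None else str(value).strip()
            for key, value in record.items()
        }
        # Skip empty records (every field is blank).
        if not any(normalized.values()):
            continue
        # Skip unidentifiable inputs (a required field is blank).
        if any(not normalized.get(field) for field in required):
            continue
        # Skip duplicates, comparing field values case-insensitively.
        fingerprint = tuple(
            sorted((key, value.lower()) for key, value in normalized.items())
        )
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample data, invented for this example.
sample_rows = [
    {"name": "Acme Corp ", "country": "US"},
    {"name": "acme corp", "country": "us"},   # duplicate of the first row
    {"name": "", "country": ""},              # empty record
    {"name": None, "country": "FR"},          # unidentifiable: no name
    {"name": "Globex", "country": "DE"},
]

cleaned_rows = clean_records(sample_rows)
```

With this sample, only the first "Acme Corp" row and the "Globex" row survive. Keeping the first occurrence of a duplicate is a design choice, not a rule; depending on the project it may be better to keep the most complete or most recent version.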

The following resources can help you understand this process, make problems easier to spot, and offer techniques and tips for data cleaning:

The Data Journalism Handbook has a section on cleaning messy data.
The Online Journalism Blog also has blog posts on various aspects of data cleaning.

Have you applied or developed a practice that you would like to share with the influence mapping community? Edit this post on GitHub!
