April Roundup: Thoughts and tools for & from the data practitioners

Stories and thoughts

Explore Panama Papers and its data model


As you may know, you can now search the database to find out who is behind almost 320,000 offshore companies and trusts from the Panama Papers investigation. The ICIJ team also suggests how to use the database.

Check out the graph data model used by the ICIJ and explore how they constructed it using Cypher in Neo4j. You can also watch Mar Cabra, lead editor of the #PanamaPapers investigation, at the last GraphConnect Europe conference in London explaining “how they use tech to tell great stories”.


Learning from Lombardi


“In Lombardi’s case, even his early scribbles on a project are more informative, because they show a fundamentally human thought process, of trying to draw the story out of the mass of data he had collected. This is the opposite of many computational approaches that begin with a mass of data, followed by an often failed attempt to simplify it”.

This article was originally prepared in 2009 by Ben Fry, founder of @FathomInfo, and it describes his experience of getting to know Lombardi’s creative process.


Thoughts on open data


What's wrong with open-data sites, and how we can fix them: By Cesar Hidalgo, an MIT associate professor whose team recently created Data USA.

‘Free your data’ is over. Now, we need data to be free: Nicolas Kayser-Bril, co-founder of Journalism++, on why data journalists should be more careful with government data and collect more data independently.

Over-politeness is the fatal flaw in the open data movement: According to Tom Steinberg, “overly-friendly collaboration between governments and transparency advocates sucks the oxygen out of the room”. See also the response to the article by John Wonderlich (Sunlight Foundation).


Tools


Onodo, an open source network analysis tool for non-tech users (Influence Mapping)

The tool will feature a replicable, collaborative platform that facilitates integration with other tools and allows users to import and integrate bulk data from multiple sources. The beta version will be launched by the end of May and presented at Democracy Lab.

April's Toolkit

Polyglot // A natural language pipeline that supports massive multilingual applications for language analysis.

ParseKit // Enigma’s new infrastructure for building and managing data pipelines.

Dataproofer // This tool is built to automate the process of checking a dataset for errors or potential mistakes.

Municipal Money API // It publishes the financial information of South African municipalities in a machine-friendly format.

Represent // Browse the latest votes and bills, see how often lawmakers vote against their parties and compare voting records (API here)

Aleph // Tool for indexing large amounts of both unstructured (PDF, Word, HTML) and structured (CSV, XLS, SQL) data for easy browsing and search. It is built with investigative reporting as a primary use case, and it allows cross-referencing mentions of well-known entities.


Reports and Resources


Your friendly guide to colors in data visualisation

Lisa Charlotte Rost asked her Twitter followers: “Can somebody tell me how to get better with color? My color decisions are awful”. These are the websites and tools that her followers recommended, with her thoughts around them.


Map


The government GitHub ecosystem: Using GitHub’s API, Emmanuel Feld compiled a database of government GitHub organizations, their repositories, members, and contributors, and dove into the data.


 

Next Events