Personas / User needs in Investigative Journalism

Let’s give names to our use cases so that we can refer to them more precisely in future conversations.

Find people data

Discover relationships between people
Be able to identify people quickly, with fewer hours of Googling
Find out what a person owns (regardless of country)
Who are a person’s friends, contacts, business partners, enemies?
Finding all info linking a person to another
Know how reliable and recent information is
Find all info on a person
Correlate online (social) activity with sources, documents
An MP monitoring portal powered by CSO data that lives on a mainstream media outlet site
Know who the influencing people in a country are
Worldwide patterns in legislative voting? Do women vote differently from men on social policy? Is there a correlation between legislators’ age & support for same-sex marriage? etc.

Find story ideas

Figure out the most important facts in a set of documents quickly
Be able to find stories I did not know I had
Given some data, find the important facts - the stories
A way to detect / ID patterns in/between files and share results
Find out whether anything in the news anywhere is connected to my data

Track changes

How do we keep track of changing data (monthly or yearly, etc.)?
Update an old story with fresh data… for free
Get notified each time some new info about a person becomes available
Notify me when new data emerge concerning a story I’m reporting
Data isn’t always perfect, it sometimes needs revision. How does a tool deal with updates of connections?
Notify me when a politician changes point of view
I want automatic passive discovery of entities across different document stores

Data verification

Can any document confirm some rumor I heard elsewhere?
Is there a newer/older/different version of this document?
Trust that a document is not a forgery
Cite source documents in my final story
See articles that have already been written about a document

Workflow

Toolkit to pilot a structured journalism activity w/in a media house
Keep track of my research
Make corrections and annotations on documents
I want a way to securely edit docs collaboratively
I want a free open-source Nuix-like front end

Languages

Tabula in other alphabets!

Follow the money

Visualize the amount of money flowing through a production & supply chain, including investment and trading
Find suspicious corporate structures by pattern
Who is paying for something (building, company, event, etc.)
Find out who owns a particular asset
Find out all online traces of a company
Quickly find a company’s structure
Discover & tabulate financial relationships between entities of interest

Anomalies

See the extreme records in a dataset (biggest, smallest)
Is there anything unusual about this document?
Find anomalies in financial data

Data processing

Wipe metadata from a document
Integrate & automate entity searches w/in media houses’ CMS
A way to ID dirty document files + group for cleaning
See if two datasets are connected without waiting to clean them
I want the ability to OCR a hundred-page PDF on a standard computer in under a second
Search another person’s document
Join two databases to find insights neither could provide alone
Manually go through docs to fill up a spreadsheet
Work with a dataset that’s too large for Excel
Sort things
Count things
Map out places mentioned in document
How long is the word that was redacted in this document?
Correlate data between different types of media.

Spreadsheets

Better applications where the spreadsheet is the interface (i.e. more featureful spreadsheets for enrichment, reconciliation, etc.)
Have template spreadsheets for different story models
Database look-up from Google Spreadsheets.

Provenance / Sources

Source data linking
Structuring ad hoc data as a part of the journalism process
Machine readable provenance for news stories

Entity, Identification, Matching

Match similar entity, allow manual match
Automated data cleaning tracker - always know the source of your data.
Entity reconciliation API (for deduplication) (i.e. I have entities, I want to match them up against other people’s entities. I want to be able to match up each database’s entities).
Fuzzy - match two lists, quickly.
Be able to match lists of names (PEPs) to companies’ databases.
Learn easy ways to reconcile and match data, names, IDs
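
To make the "fuzzy match two lists" need concrete, here is a minimal sketch using only Python's standard library. The function names (`normalize`, `similarity`, `fuzzy_match`), the 0.85 threshold and the sample names are assumptions made for illustration; they do not describe any existing tool discussed at the conference.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_match(left, right, threshold=0.85):
    """For each name in `left`, keep its best candidate in `right`
    if the similarity reaches `threshold` (an assumed cut-off)."""
    matches = []
    for name in left:
        best = max(right, key=lambda cand: similarity(name, cand), default=None)
        if best is None:
            continue  # nothing to compare against
        score = similarity(name, best)
        if score >= threshold:
            matches.append((name, best, round(score, 2)))
    return matches


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    people = ["Jon Smith", "Maria Garcia"]
    registry = ["John Smith", "Acme Holdings Ltd."]
    for left_name, right_name, score in fuzzy_match(people, registry):
        print(left_name, "~", right_name, score)
```

This compares every name with every candidate, so it only suits short lists. The matches it proposes would still need the manual review mentioned above.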

Identify relationships

Identifying people and linking people from different areas
Understand nature of the relationship (opposition v/s alliance)

Workflow

Process very quickly: Time matters!
Be secure
Spend less time reviewing data quality
Simple data analysis (Not “R”)
Bookmark
Have a Pre-launch checklist for data projects
Manage different types of access rights (public, private, share)

Monitoring alerts - Fetch patterns

Have an AI that feeds journalists with alerts of organised crime and corruption
Hit lists search / alerts
Alert me when a partner adds a document for someone I am interested in
Monitor entities information from news and databases and rank it
Have my database search the Internet for me for people I am interested in or add the docs to my database
A tool that keeps on investigating after the reporter has lost interest in the topic
If this then that / Simple alerts for datasets that are regularly published
How do I keep track of relationships?
Look for patterns in data using machine learning

Document display

Very efficient viewer with search highlights
Display original document
Highlights

Import data docs

Microwork platform for uploading handwritten data
Import docs when I don’t know the languages in them
Import any doc in any format
Import structured data or unstructured data into system
Turn forms into spreadsheet
OCR
Read disclosure documents and make a spreadsheet of major board decisions.

Skunkworks

Open Source transliteration that works
Global address parsing.

Federated search data docs cross queries

Figure out who has documents I need to see
A service to find who may have data matching a query, i.e. “yes-no” response with information on how to contact that source.
Search the database of other partners (API)
Database that allows for complex boolean or other search functionality
Run large groups of people or companies against other people’s (or my own) database
Where to look for data on my topic
Federated search: Search OpenCorporates, LittleSis, Poderopedia, etc.
Search other news sources, LexisNexis for poor people
Be able to mine all social networks at once

Visualization / geocoding

Understanding domain specifics hierarchy
On entry into the database, all geographic things are converted to latitude and longitude (geocoded)
Export from database directly to visualization tool, map or timeline
Timelines
Draw a map of the ownership structure of companies in my story
Visualize networks, maps, occurrence

Editorial

Explaining complex relationships to readers

Entity and facet extraction

Entity extraction in any language
Define new facets, depending on the dataset
Knowing what to look for in a pile of documents
Entity-extraction in different languages
Pull names quickly off of a webpage

Misc

I’d like a tool that would measure the cost of corruption in terms of impact (environ’ / pockets / social).
Conflict of interest lead finder
Contribute data with Sunset Secrecy / we will keep this secret / closed and only for 30 days
Signature detection comparison
Tool that makes money
Needle in a haystack
See a system structure
Merge info from multiple sources
Identify things

London Conference 2015