Personas / User needs in Investigative Journalism

Let’s give names to our use cases so that we can refer to them more precisely in future conversations.

Find people data

  • Discover relationships between people
  • Be able to identify people quickly, with fewer hours of Googling
  • Find out what a person owns (regardless of country)
  • Who are a person’s friends, contacts, business partners, enemies?
  • Find all information linking one person to another
  • Know how reliable and recent information is
  • Find all info on a person
  • Correlate online (social media) activity with sources and documents
  • An MP monitoring portal powered by CSO data that lives on a mainstream media outlet’s site
  • Know who the influential people in a country are
  • Worldwide patterns in legislative voting? Do women vote differently from men on social policy? Is there a correlation between legislators’ age and support for same-sex marriage? etc.

Find story ideas

  • Figure out the most important facts in a set of documents quickly
  • Be able to find stories I did not know I had
  • Given some data, find the important facts - the stories
  • A way to detect / ID patterns in/between files and share results
  • Find out whether anything in the news, anywhere, is connected to my data

Track changes

  • How do we keep track of data that changes (monthly, yearly, etc.)?
  • Update an old story with fresh data… for free
  • Get notified each time some new info about a person becomes available
  • Notify me when new data emerge concerning a story I’m reporting
  • Data isn’t always perfect; it sometimes needs revision. How should a tool handle updates to connections?
  • Notify me when a politician changes point of view
  • I want automatic passive discovery of entities across different document stores
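A minimal sketch of the “keep track of changing data” need, assuming each published release can be loaded as a dict that maps a stable record ID to a record (the function name `diff_snapshots` and the sample records are made up for illustration). Comparing two releases yields what was added, removed, and changed, which is the raw material for the notifications listed above.

```python
"""Compare two snapshots of a dataset keyed by a stable record ID.

Assumption (hypothetical): each snapshot is a dict {record_id: {field: value}}.
"""


def diff_snapshots(old, new):
    """Return (added, removed, changed) between two snapshots."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {}
    for k in old.keys() & new.keys():
        if old[k] != new[k]:
            # Keep only the fields whose values differ, as (old, new) pairs.
            fields = old[k].keys() | new[k].keys()
            changed[k] = {
                f: (old[k].get(f), new[k].get(f))
                for f in fields
                if old[k].get(f) != new[k].get(f)
            }
    return added, removed, changed


if __name__ == "__main__":
    # Made-up company register releases, for illustration only.
    january = {
        "C1": {"name": "Acme Ltd", "director": "J. Smith"},
        "C2": {"name": "Beta SA", "director": "M. Rossi"},
    }
    february = {
        "C1": {"name": "Acme Ltd", "director": "K. Jones"},
        "C3": {"name": "Gamma BV", "director": "P. de Vries"},
    }
    added, removed, changed = diff_snapshots(january, february)
    print(sorted(added), sorted(removed), changed)
```

Keying on a stable ID is the design choice that matters here: if a source renumbers its records between releases, the diff would need to fall back on entity matching instead.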

Data verification

  • Can any document confirm some rumor I heard elsewhere?
  • Is there a newer/older/different version of this document?
  • Trust that a document is not a forgery
  • Cite source documents in my final story
  • See articles that have already been written about a document

Workflow

  • Toolkit to pilot a structured journalism activity w/in a media house
  • Keep track of my research
  • Make corrections and annotations on documents
  • I want a way to securely edit docs collaboratively
  • I want a free open-source Nuix-like front end

Languages

  • Tabula in other alphabets!

Follow the money

  • Visualize the amount of money flowing through a production & supply chain, including investment and trading
  • Find suspicious corporate structures by pattern
  • Who is paying for something (building, company, event, etc.)
  • Find out who owns a particular asset
  • Find out all online traces of a company
  • Quickly find a company’s structure
  • Discover & tabulate financial relationships between entities of interest

Anomalies

  • See the extreme records in a dataset (biggest, smallest)
  • Is there anything unusual about this document?
  • Find anomalies in financial data
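One way to approach “see the extreme records” and “find anomalies”, sketched under the assumption that records are dicts with a numeric field (function names and sample amounts are hypothetical). `heapq` pulls out the biggest and smallest values, and a median-absolute-deviation (modified z-score) test flags values far from the typical one. The 3.5 cut-off is a common rule of thumb, not a tuned value.

```python
import heapq
import statistics


def extremes(records, key, n=3):
    """Return the n largest and n smallest records by the numeric field `key`."""
    largest = heapq.nlargest(n, records, key=lambda r: r[key])
    smallest = heapq.nsmallest(n, records, key=lambda r: r[key])
    return largest, smallest


def robust_outliers(records, key, threshold=3.5):
    """Flag records whose modified z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), which is less distorted by
    the outliers themselves than a mean/standard-deviation z-score.
    """
    values = [r[key] for r in records]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median; no spread to measure
    return [r for r in records if 0.6745 * abs(r[key] - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical payments; the last one is deliberately out of line.
    amounts = [120, 95, 110, 130, 105, 98, 25000]
    payments = [{"id": i, "amount": a} for i, a in enumerate(amounts)]
    print(extremes(payments, "amount", n=2))
    print(robust_outliers(payments, "amount"))
```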

Data processing

  • Wipe metadata from a document
  • Integrate & automate entity searches within media houses’ CMSs
  • A way to identify dirty document files and group them for cleaning
  • See if two datasets are connected without waiting to clean them first
  • I want the ability to OCR a hundred page PDF on a standard computer in under a second
  • Search another person’s documents
  • Join two databases to find insights neither could provide alone
  • Manually go through docs to fill up a spreadsheet
  • Work with a dataset that’s too large for Excel
  • Sort things
  • Count things
  • Map out places mentioned in a document
  • How long is the word that was redacted in this document?
  • Correlate data between different types of media.
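A rough sketch of joining two datasets before they are fully cleaned, assuming both are lists of dicts with a name column (the helper names and sample rows are hypothetical). Names are normalized (accents, case, punctuation, spacing) and the two lists are joined on that key. This only catches trivial variations; messier names need the fuzzy matching listed under entity matching.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Crude normalization: strip accents, punctuation, case and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(no_punct.split())


def join_on_name(left, right, left_key, right_key):
    """Inner-join two lists of dicts on a normalized name field."""
    index = defaultdict(list)
    for row in right:
        index[normalize(row[right_key])].append(row)
    matches = []
    for row in left:
        # .get() avoids creating empty entries in the defaultdict.
        for other in index.get(normalize(row[left_key]), []):
            matches.append((row, other))
    return matches


if __name__ == "__main__":
    # Hypothetical rows: public contracts vs. officials' declared interests.
    contracts = [
        {"supplier": "ACME Holdings, Ltd.", "amount": 50000},
        {"supplier": "Beta Group", "amount": 12000},
    ]
    declarations = [{"official": "Jane Doe", "company": "Acme Holdings Ltd"}]
    for contract, declaration in join_on_name(contracts, declarations, "supplier", "company"):
        print(contract["supplier"], "->", declaration["official"])
```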

Spreadsheets

  • Better applications where the spreadsheet is the interface (e.g. more featureful spreadsheets for enrichment, reconciliation, etc.)
  • Have template spreadsheets for different story models
  • Database look-up from Google Spreadsheets.

Provenance / Sources

  • Source data linking
  • Structuring ad hoc data as a part of the journalism process
  • Machine readable provenance for news stories

Entity, Identification, Matching

  • Match similar entities, and allow manual matching
  • Automated data cleaning tracker - always know the source of your data.
  • Entity reconciliation API (for deduplication), i.e. I have entities and I want to match them up against other people’s entities; I want to be able to match up each database’s entities.
  • Fuzzy-match two lists, quickly.
  • Be able to match lists of names (PEPs) against company databases.
  • Learn easy ways to reconcile and match data, names, IDs
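A minimal fuzzy-matching sketch using the standard-library `difflib.SequenceMatcher`; the thresholds, function names, and sample names are assumptions for illustration. Scores at or above `accept` are matched automatically, and scores in the grey zone are set aside for a human to confirm, which covers the “allow manual match” need.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def fuzzy_match(names, candidates, accept=0.9, review=0.75):
    """Match each name to its best-scoring candidate.

    Returns (matched, needs_review): confident matches, and borderline
    ones (best candidate plus rounded score) for manual confirmation.
    Compares every pair, so cost grows with len(names) * len(candidates).
    """
    matched, needs_review = {}, {}
    for name in names:
        scored = [(similarity(name, c), c) for c in candidates]
        if not scored:
            continue
        score, best = max(scored)
        if score >= accept:
            matched[name] = best
        elif score >= review:
            needs_review[name] = (best, round(score, 2))
    return matched, needs_review


if __name__ == "__main__":
    # Hypothetical PEP list vs. names from a company register.
    peps = ["JOHN SMITH", "Jon Smith", "J. Smith"]
    directors = ["John Smith", "Mary Jones"]
    matched, needs_review = fuzzy_match(peps, directors)
    print(matched)
    print(needs_review)
```

Because every pair is compared, very large lists would need blocking (for example, only comparing names that share a first letter or token) or a search index first.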

Identify relationships

  • Identify and link people from different areas
  • Understand the nature of the relationship (opposition vs. alliance)

Workflow

  • Process very quickly: Time matters!
  • Be secure
  • Spend less time reviewing data quality
  • Simple data analysis (Not “R”)
  • Bookmark
  • Have a Pre-launch checklist for data projects
  • Manage different types of access rights (public, private, shared)

Monitoring alerts - Fetch patterns

  • Have an AI that feeds journalists with alerts of organised crime and corruption
  • Hit lists search / alerts
  • Alert me when a partner adds a document for someone I am interested in
  • Monitor information on entities from news and databases, and rank it
  • Have my database search the Internet for people I am interested in, or add the documents it finds to my database
  • A tool that keeps on investigating after the reporter lost interest in the topic
  • If this then that / Simple alerts for datasets that are regularly published
  • How do I keep track of relationships?
  • Look for patterns in data using machine learning
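A sketch of the simplest kind of hit-list alert, assuming new documents arrive as plain text keyed by an ID (function names and the sample watch list are hypothetical). The watched names are compiled into one case-insensitive regular expression, and every whole-word hit is reported. Exact-string matching misses spelling variants, so in practice it would be paired with fuzzy matching.

```python
import re


def build_hit_list(names):
    """Compile one case-insensitive regex matching any watched name as whole words.

    Assumes each name starts and ends with a word character, since the
    pattern is wrapped in \\b word boundaries.
    """
    # Longest names first, so "John Smith Jr" is preferred over "John Smith".
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def scan(documents, pattern):
    """Yield (doc_id, matched_text) for every hit in a batch of new documents."""
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            yield doc_id, match.group(0)


if __name__ == "__main__":
    pattern = build_hit_list(["Acme Holdings", "Jane Doe"])
    new_docs = {
        "d1": "Payment from ACME HOLDINGS to a shell company.",
        "d2": "Nothing relevant here.",
        "d3": "Jane Doe signed; Jane Doenitz did not.",
    }
    for doc_id, hit in scan(new_docs, pattern):
        print(f"ALERT {doc_id}: {hit}")
```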

Document display

  • Very efficient viewer with search highlights
  • Display original document
  • Highlights

Import data docs

  • Microwork platform for uploading handwritten data
  • Import docs when I don’t know the languages in them
  • Import any doc in any format
  • Import structured data or unstructured data into system
  • Turn forms into spreadsheets
  • OCR
  • Read disclosure documents and make a spreadsheet of major board decisions.

Skunkworks

  • Open Source transliteration that works
  • Global address parsing.

Federated search / cross-queries across data and docs

  • Figure out who has documents I need to see
  • A service to find who may have data matching a query, i.e. “yes-no” response with information on how to contact that source.
  • Search the database of other partners (API)
  • Database that supports complex Boolean and other search functionality
  • Run large groups of people or companies against other people’s (or my own) database
  • Where to look for data on my topic
  • Federated search: Search OpenCorporates, LittleSis, Poderopedia, etc.
  • Search other news sources: a LexisNexis for poor people
  • Be able to mine all social networks at once

Visualization / geocoding

  • Understand domain-specific hierarchies
  • On entry into the database, all geographic references are converted to latitude and longitude (geocoded)
  • Export from database directly to visualization tool, map or timeline
  • Timelines
  • Draw a map of the ownership structure of companies in my story
  • Visualize networks, maps, occurrences

Editorial

  • Explaining complex relationships to readers

Entity and facet extraction

  • Entity extraction in any language
  • Define new facets, depending on the dataset
  • Knowing what to look for in a pile of documents
  • Entity extraction in different languages
  • Pull names quickly off of a webpage

Misc

  • I’d like a tool that would measure the cost of corruption in terms of impact (environment / pockets / social).
  • Conflict of interest lead finder
  • Contribute data with sunset secrecy (“we will keep this secret/closed, but only for 30 days”)
  • Signature detection and comparison
  • Tool that makes money
  • Needle in a haystack
  • See a system structure
  • Merge info from multiple sources
  • Identify things