Let’s give names to our use cases, so that we can refer to them more precisely in future conversations.
Find people data
- Discover relationships between people
- Be able to identify people quickly, with fewer hours of Googling
- Find out what a person owns (regardless of country)
- Who are a person’s friends, contacts, business partners, enemies?
- Finding all info linking a person to another
- Know how reliable and recent information is
- Find all info on a person
- Correlate online (social) activity with sources, documents
- An MP monitoring portal powered by CSO data that lives on a mainstream media outlet site
- Know who the influencing people in a country are
- Worldwide patterns in legislative voting? Do women vote differently from men on social policy? Is there a correlation between legislators’ age & support for same-sex marriage? etc.
Find story ideas
- Figure out the most important facts in a set of documents quickly
- Be able to find stories I did not know I had
- Given some data, find the important facts - the stories
- A way to detect / ID patterns in/between files and share results
- Getting to know if anything inside news anywhere is connected to my data
Track changes
- How do we keep track of changing data (monthly or yearly, etc.)
- Update an old story with fresh data… for free
- Get notified each time some new info about a person becomes available
- Notify me when new data emerge concerning a story I’m reporting
- Data isn’t always perfect, it sometimes needs revision. How does a tool deal with updates of connections?
- Notify me when a politician changes point of view
- I want automatic passive discovery of entities across different document stores
Data verification
- Can any document confirm some rumor I heard elsewhere?
- Is there a newer/older/different version of this document?
- Trust that a document is not a forgery
- Cite source documents in my final story
- See articles that have already been written about a document
Workflow
- Toolkit to pilot a structured journalism activity w/in a media house
- Keep track of my research
- Make corrections and annotations on documents
- I want a way to securely edit docs collaboratively
- I want a free open-source Nuix-like front end
Languages
- Tabula in other alphabets!
Follow the money
- Visualize the amount of money flowing through a production & supply chain, including investment and trading
- Find suspicious corporate structures by pattern
- Who is paying for something (building, company, event, etc.)
- Find out who owns a particular asset
- Find out all online traces of a company
- Quickly find a company’s structure
- Discover & tabulate financial relationships between entities of interest
Anomalies
- See the extreme records in a dataset (biggest, smallest)
- Is there anything unusual about this document
- Find anomalies in financial data
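As a rough illustration of the first and last items above, the sketch below pulls the smallest and largest records from a small table and flags outliers using a modified z-score based on the median absolute deviation. The `payments` data, the field names and the 3.5 threshold are all made up for the example; this is one common heuristic, not a recommendation for any particular dataset.

```python
from statistics import median


def extremes(records, key):
    """Return the (smallest, largest) records by the given numeric field."""
    smallest = min(records, key=lambda r: r[key])
    largest = max(records, key=lambda r: r[key])
    return smallest, largest


def outliers(records, key, threshold=3.5):
    """Flag records whose modified z-score exceeds the threshold.

    The score uses the median absolute deviation (MAD), which is less
    distorted by the extreme values themselves than a mean-based score.
    """
    values = [r[key] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [r for r in records if 0.6745 * abs(r[key] - med) / mad > threshold]


# Hypothetical payment records, invented for illustration only.
payments = [
    {"payee": "A", "amount": 1200},
    {"payee": "B", "amount": 1350},
    {"payee": "C", "amount": 1100},
    {"payee": "D", "amount": 1280},
    {"payee": "E", "amount": 98000},
]

smallest, largest = extremes(payments, "amount")
flagged = outliers(payments, "amount")
```

A flagged record is only a lead to check by hand; a large payment can be perfectly legitimate.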
Data processing
- Wipe metadata from a document
- Integrate & automate entity searches w/in media houses’ CMS
- A way to ID dirty document files + group for cleaning
- See if two datasets are connected without waiting to clean (it)
- I want the ability to OCR a hundred-page PDF on a standard computer in under a second
- Search another person’s document
- Join two databases to find insights neither could provide alone
- Manually go through docs to fill up a spreadsheet
- Work with a dataset that’s too large for Excel
- Sort things
- Count things
- Map out places mentioned in document
- How long is the word that was redacted in this document?
- Correlate data between different types of media.
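Several of the items above (join two databases, sort things, count things) can be sketched with nothing beyond the Python standard library. The `donations` and `contracts` tables and their field names are invented for the example; real data would first need the cleaning and matching described elsewhere in this list.

```python
from collections import Counter


def join_on(left, right, left_key, right_key):
    """Inner-join two lists of dicts on the given key fields."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# Hypothetical datasets, invented for illustration only.
donations = [
    {"donor": "Acme Ltd", "party": "Blue", "amount": 5000},
    {"donor": "Birch & Co", "party": "Red", "amount": 1200},
    {"donor": "Cedar Group", "party": "Blue", "amount": 800},
]
contracts = [
    {"supplier": "Acme Ltd", "ministry": "Transport", "value": 900000},
    {"supplier": "Dune Partners", "ministry": "Health", "value": 40000},
]

# Join: which donors also hold public contracts?
donor_contractors = join_on(donations, contracts, "donor", "supplier")

# Sort: largest donations first.
by_amount = sorted(donations, key=lambda r: r["amount"], reverse=True)

# Count: number of donations per party.
per_party = Counter(r["party"] for r in donations)
```

The join only works here because the names are spelled identically in both tables, which is rarely the case in practice.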
Spreadsheets
- Better applications where spreadsheet is the interface (i.e. more featureful spreadsheets for enrichment, reconciliation, etc.)
- Have template spreadsheets for different story models
- Database look-up from Google Spreadsheets.
Provenance / Sources
- Source data linking
- Structuring ad hoc data as a part of the journalism process
- Machine readable provenance for news stories
Entity, Identification, Matching
- Match similar entity, allow manual match
- Automated data cleaning tracker - always know the source of your data.
- Entity reconciliation API (for deduplication) (i.e. I have entities, I want to match them up against other people’s entities. I want to be able to match up each database’s entities).
- Fuzzy - match two lists, quickly.
- Be able to match lists of names (PEPs) to companies’ databases.
- Learn easy ways to reconcile and match data, names, IDs
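To make the "fuzzy - match two lists" item concrete, here is a minimal sketch using Python's standard `difflib` and `unicodedata` modules. The `normalize` and `fuzzy_match` helpers, the sample names and the 0.85 cutoff are assumptions made for the example; dedicated reconciliation tools handle far more cases (transliteration, titles, company suffixes).

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and sort the words of a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(sorted(no_accents.lower().split()))


def fuzzy_match(left, right, cutoff=0.85):
    """Map each name in `left` to its closest name in `right`, or None."""
    by_norm = {normalize(r): r for r in right}
    matches = {}
    for name in left:
        hits = difflib.get_close_matches(
            normalize(name), list(by_norm), n=1, cutoff=cutoff
        )
        matches[name] = by_norm[hits[0]] if hits else None
    return matches


# Hypothetical name lists, invented for illustration only.
list_a = ["Jon Smith", "Maria Gonzales", "Li Wei", "Ana Petrov"]
list_b = ["John Smith", "María González", "Wei Li", "Peter Novak"]

matches = fuzzy_match(list_a, list_b)
```

Sorting the words lets "Li Wei" and "Wei Li" line up, at the cost of sometimes merging different people; every automatic match should still allow a manual check, as the first item in this section asks.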
Identify relationships
- Identifying people and linking people from different areas
- Understand the nature of the relationship (opposition vs. alliance)
Workflow
- Process very quickly: Time matters!
- Be secure
- Spend less time reviewing data quality
- Simple data analysis (Not “R”)
- Bookmark
- Have a Pre-launch checklist for data projects
- Manage different types of access rights (public, private, share)
Monitoring alerts - Fetch patterns
- Have an AI that feeds journalists with alerts of organised crime and corruption
- Hit lists search / alerts
- Alert me when a partner adds a document for someone I am interested in
- Monitor entities information from news and databases and rank it
- Have my database search the Internet for me for people I am interested in or add the docs to my database
- A tool that keeps on investigating after the reporter lost interest in the topic
- If this then that / Simple alerts for datasets that are regularly published
- How do I keep track of relationships
- Look for patterns in data using machine learning
Document display
- Very efficient viewer with search highlights
- Display original document
- Highlights
Import data docs
- Microwork platform for uploading hand written data
- Import docs when I don’t know the languages in them
- Import any doc in any format
- Import structured data or unstructured data into system
- Turn forms into spreadsheet
- OCR
- Read disclosure documents and make a spreadsheet of major board decisions.
Skunkworks
- Open Source transliteration that works
- Global address parsing.
Federated search data docs cross queries
- Figure out who has documents I need to see
- A service to find who may have data matching a query, i.e. “yes-no” response with information on how to contact that source.
- Search the database of other partners (API)
- Database that allows for complex boolean or other search functionality
- Run large groups of people or companies against other people’s (or my own) database
- Where to look for data on my topic
- Federated search: Search OpenCorporates, LittleSis, Poderopedia, etc.
- Search other news sources, LexisNexis for poor people
- Be able to mine all social networks at once
Visualization / geocoding
- Understanding domain specifics hierarchy
- On entry into the database, all geographic things are converted to latitude and longitude (geocode)
- Export from database directly to visualization tool, map or timeline
- Timelines
- Draw a map of the ownership structure of companies in my story
- Visualize networks, maps, occurrence
Editorial
- Explaining complex relationships to readers
Entity and facet extraction
- Entity extraction in any language
- Define new facets, depending on the dataset
- Knowing what to look for in a pile of documents
- Entity-extraction in different languages
- Pull names quickly off of a webpage
Misc
- I’d like a tool that would measure the cost of corruption in terms of impact (environ’ / pockets / social).
- Global Address Parsing.
- Conflict of interest lead finder
- Contribute data with Sunset Secrecy / we will keep this secret / closed and only for 30 days
- Open Source transliteration that works
- Signature detection comparison
- Tool that makes money
- Needle in a haystack
- See a system structure
- Merge info from multiple sources
- Identify things