Looking at the possibilities for standards to increase data interoperability between our datasets and tools, including the idea of a common API for investigative entity mapping tools: the “Who’s Got Dirt?” API. Presented by James McKinney.
Frame of the presentation
Collaboration models: How the people work together
Interoperability APIs and formats: How the software works together
Workflow definitions: What should the software do
- There are a lot of different databases out there, and the usual way to find out whether one of them holds data matching your query still ends up being e-mail
APIs
- First we work with what we have: LittleSis, OpenCorporates, OpenOil, Poderopedia, Quién Manda, Open Duka, CrocTail, Grano, etc.
- The first problem around this is that you have to work around distributed queries
- You end up looking through every database that might help
- So you have a big issue around discoverability: finding the information you need
So, the current common process is:
- Consult the list of websites / APIs
- Visit each website / API
- Read each website’s documentation / read each API reference
- Perform searches on each website / send requests to each API
- Read the data and interpret its schema (whether it’s CSV, JSON, XML, HTML)
The proposed process is:
- Consult one website / API
- Visit one website / API
- Read one website’s documentation / read one API reference
- Perform searches on one website / send requests to one API
- Read the data and interpret its schema (learn only one response format)
(Shows Demo)
One request format
GET /entities?queries=<queries>
{
  "q0": {
    "query": {
      "type": "Entity",
      "name~=": "John Smith",
      "jurisdiction_code|=": ["gb", "ie"],
      "memberships": [{
        "role": "director",
        "inactive": false
      }]
    }
  }
}
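The request above could be assembled roughly as follows. This is only a sketch: it assumes the `queries` parameter carries the JSON object URL-encoded in the query string (the convention used by the OpenRefine Reconciliation Service API), and `example.org` and `entities_url` are placeholders, since the notes do not give the real service address.

```ruby
require 'json'
require 'uri'

# Hypothetical base URL; the real service address is not given in these notes.
BASE_URL = 'https://example.org/entities'

# Build a GET URL whose `queries` parameter holds the JSON-encoded batch.
# Assumption: the API expects URL-encoded JSON in the query string.
def entities_url(queries)
  "#{BASE_URL}?#{URI.encode_www_form(queries: JSON.generate(queries))}"
end

queries = {
  'q0' => {
    'query' => {
      'type' => 'Entity',
      'name~=' => 'John Smith',
      'jurisdiction_code|=' => ['gb', 'ie'],
      'memberships' => [{ 'role' => 'director', 'inactive' => false }],
    },
  },
}

url = entities_url(queries)
```

Keying each query (`q0`, `q1`, and so on) lets one request carry several searches at once.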
One request format, one response format
- Metaweb Query Language (MQL) from Freebase
- OpenRefine Reconciliation Service API
- Popolo terms for field names
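In MQL, a suffix on the key selects the comparison: `~=` is a pattern match and `|=` means "one of". A minimal sketch of how such keys might be split into a field name and an operator (the helper `parse_key` is hypothetical, and treating a bare key as an exact match is an assumption):

```ruby
# Operators borrowed from MQL: `~=` is a pattern match, `|=` is "one of".
OPERATORS = ['~=', '|='].freeze

# Split an MQL-style key into its field name and operator.
# Assumption: a key without a suffix means an exact match ('=').
def parse_key(key)
  operator = OPERATORS.find { |op| key.end_with?(op) }
  return [key, '='] unless operator

  [key.delete_suffix(operator), operator]
end

parse_key('name~=')              # => ["name", "~="]
parse_key('jurisdiction_code|=') # => ["jurisdiction_code", "|="]
parse_key('type')                # => ["type", "="]
```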
Endpoints
- /entities
- /relations
- /lists
Implementation
- A library that accepts MQL parameters and returns the URL to send to each API
require 'whos_got_dirt'
require 'faraday'
input = {
  'name~=' => 'John Smith',
  'jurisdiction_code|=' => ['gb', 'ie'],
  'memberships' => [{
    'role' => 'director',
    'inactive' => false,
  }],
}
url = WhosGotDirt::Requests::Person::OpenCorporates.new(input).to_s
#=> "https://api.opencorporates.com/officers/search?q=John+Smith&position=director&inactive=false&jurisdiction_code=gb%7Cie&order=score"
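To give a feel for what the request class does, here is a simplified stand-in, not the library's actual code. The function name `officers_search_url` is hypothetical; the parameter names (`q`, `position`, `inactive`, `jurisdiction_code`, `order`) are read off the example URL above.

```ruby
require 'uri'

# Simplified, hypothetical stand-in for the library's request class.
# `order=score` is hard-coded to match the example URL.
def officers_search_url(input)
  membership = (input['memberships'] || []).first || {}
  params = {
    'q' => input['name~='],
    'position' => membership['role'],
    'inactive' => membership['inactive'],
    # "One of" values are joined with a pipe, as in the example URL.
    'jurisdiction_code' => Array(input['jurisdiction_code|=']).join('|'),
    'order' => 'score',
  }
  # Drop parameters the caller did not supply.
  params.reject! { |_, value| value.nil? || value == '' }
  "https://api.opencorporates.com/officers/search?#{URI.encode_www_form(params)}"
end

input = {
  'name~=' => 'John Smith',
  'jurisdiction_code|=' => ['gb', 'ie'],
  'memberships' => [{ 'role' => 'director', 'inactive' => false }],
}

url = officers_search_url(input)
```

Each supported API would need its own mapping like this one, which is the work the library takes off the user's hands.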
- A library that accepts an API response and returns results in Popolo format
response = Faraday.get(url)
results = WhosGotDirt::Responses::Person::OpenCorporates.new(response).to_a
#=> [{"@type"=>"Person",
#     "name"=>"JOHN SMITH",
#     "updated_at"=>"2014-10-25T00:34:16+00:00",
#     "identifiers"=>[{"identifier"=>"46065070", "scheme"=>"OpenCorporates"}],
#     "links"=>[{"url"=>"https://opencorporates.com/officers/46065070", "note"=>"…"}],
#     "memberships"=>
#      [{"role"=>"director",
#        "organization"=>
#         {"name"=>"EVOLUTION (GB) LIMITED",
#          "identifiers"=>[{"identifier"=>"05997209", "scheme"=>"Company Registry"}],
#          "links"=>[{"url"=>"https://opencorporates.com/companies/gb/05997209", "note"=>"…"}],
#          "jurisdiction_code"=>"gb"}}],
#     "current_status"=>"CURRENT",
#     "jurisdiction_code"=>"gb",
#     "occupation"=>"MANAGER",
#     "sources"=>
#      [{"url"=>"https://api.opencorporates.com/officers/search?inactive=false&jurisdiction_code=gb%7Cie&order=score&position=director&q=John+Smith", "note"=>"OpenCorporates"}]}]
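The response side is the mirror image: source-specific fields are renamed to Popolo terms. A rough sketch of that idea follows. The `raw` record is a made-up, simplified stand-in rather than the actual OpenCorporates payload, and `to_popolo_person` is a hypothetical helper, not part of the library.

```ruby
# Made-up, simplified source record; not the actual OpenCorporates payload.
raw = {
  'id' => 46065070,
  'name' => 'JOHN SMITH',
  'position' => 'director',
  'company' => { 'name' => 'EVOLUTION (GB) LIMITED', 'jurisdiction_code' => 'gb' },
}

# Map a source record to Popolo-style field names, as in the output above.
def to_popolo_person(raw, scheme)
  {
    '@type' => 'Person',
    'name' => raw['name'],
    'identifiers' => [{ 'identifier' => raw['id'].to_s, 'scheme' => scheme }],
    'memberships' => [{
      'role' => raw['position'],
      'organization' => {
        'name' => raw['company']['name'],
        'jurisdiction_code' => raw['company']['jurisdiction_code'],
      },
    }],
  }
end

person = to_popolo_person(raw, 'OpenCorporates')
```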
- A server that handles all the ugly parts (error handling, request queueing, response caching, etc.)
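As an illustration of two of those "ugly parts", response caching and error handling could look something like this. The class `CachingFetcher` is hypothetical and says nothing about how the actual server is built; the fetch step is passed in as a block so the sketch runs without network access.

```ruby
# Hypothetical wrapper: caches responses by URL and turns fetch errors
# into a result the caller can inspect instead of an exception.
class CachingFetcher
  def initialize(&fetch)
    @fetch = fetch # block that takes a URL and returns a response body
    @cache = {}
  end

  def get(url)
    return @cache[url] if @cache.key?(url)

    @cache[url] = { ok: true, body: @fetch.call(url) }
  rescue StandardError => e
    # Failures are reported but not cached, so a later call can retry.
    { ok: false, error: e.message }
  end
end

calls = 0
fetcher = CachingFetcher.new do |url|
  calls += 1
  raise 'upstream timeout' if url.include?('broken')

  "response for #{url}"
end

first = fetcher.get('https://example.org/a')
second = fetcher.get('https://example.org/a') # served from the cache
failed = fetcher.get('https://example.org/broken')
```

With several upstream APIs behind one endpoint, a single slow or failing source should degrade the result set rather than fail the whole request, which is why errors come back as values here.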
Project scope
- A field or feature is only a candidate for inclusion if at least two APIs support it
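The scope rule above can be expressed as a simple count. In this sketch the API names and field lists are illustrative only and do not describe any real service; `candidate_fields` is a hypothetical helper.

```ruby
# Illustrative data only: which query fields each API is imagined to support.
SUPPORTED = {
  'api_a' => %w[name jurisdiction_code role],
  'api_b' => %w[name role birth_date],
  'api_c' => %w[name address],
}.freeze

# Keep the fields that at least `minimum` APIs support.
def candidate_fields(supported, minimum = 2)
  supported.values.flat_map(&:uniq).tally.select { |_, count| count >= minimum }.keys
end

candidate_fields(SUPPORTED) # => ["name", "role"]
```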
Roadmap
- Add /relations and /lists endpoints
- Support more APIs
- Update Popolo