The Who’s got dirt? API provides a single access point to multiple APIs of influence data on the web. It proxies requests to the supported APIs, so that users only need to learn a single request format and a single response format.
Documentation
Basics
Who’s got dirt? recognizes three types of influence data:
- An entity is a person or an organization: for example, a company.
- A relation exists between entities: for example, a person is an officer of a company.
- A list is a listing of entities: for example, the companies within a corporate grouping.
Supported APIs
Who’s got dirt? supports the following endpoints of these APIs of influence data:
- CorpWatch has information on public companies and their subsidiaries, based on SEC filings. (docs)
  - /companies.json, queried via /entities
- LittleSis has information on American people and organizations in business and government, and the relationships between them. (docs)
  - /entities.xml, queried via /entities
  - /lists.xml, queried via /lists
- OpenCorporates has information on over 90 million companies around the world. (docs)
  - /companies/search, queried via /entities
  - /corporate_groupings/search, queried via /lists
  - /officers/search, queried via /relations
- OpenDuka has information on Kenyan people and organizations. (docs)
- OpenOil has information on oil concessions around the world. (docs)
  - /concession/search, queried via /relations
- Poderopedia has information on Chilean organizations and people in business and politics. (docs)
Don’t see an API you use? Please request its support in this issue.
The Who’s got dirt? API’s request format supports all filters of the supported APIs, and its response format returns all data from the supported APIs. In other words, there is no loss of functionality in using the Who’s got dirt? API.
API Keys
An API key is required to proxy requests to some APIs. You may register for API keys at:
- CorpWatch
- LittleSis (required)
- OpenCorporates (sometimes required)
- OpenDuka (required)
- OpenOil (required)
- Poderopedia (required)
API Limits & Pagination
APIs limit the number of results returned per page:
- CorpWatch: maximum 5000 per page
- LittleSis: maximum 1000 per page for entities and 100 per page for lists
- OpenCorporates: maximum 30 per page without API key or 100 per page with API key
- OpenDuka: no pagination, all results returned
- OpenOil: no maximum per page
- Poderopedia: no pagination, 10 results returned
To change the number of results returned per page, use the limit parameter. To paginate, use the page parameter.
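As a rough sketch of paging, the helper below builds one queries value per page. The text above names the limit and page parameters but not where they go, so placing both inside query next to the other parameters is an assumption, as is the helper name paged_queries.

```ruby
require 'json'

# Hypothetical helper: builds the JSON for the `queries` parameter for one page.
# Assumption: `limit` and `page` sit inside `query` like any other parameter.
def paged_queries(query, page:, limit:)
  JSON.generate('q0' => { 'query' => query.merge('limit' => limit, 'page' => page) })
end

# Build the first two pages of 30 results each; nothing is sent over the network.
first_two_pages = (1..2).map do |page|
  paged_queries({ 'name~=' => 'ACME Inc.' }, page: page, limit: 30)
end

puts first_two_pages
```

A limit of 30 stays within the smallest per-page maximum listed above (OpenCorporates without an API key).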
API Security
Some APIs do not support HTTPS:
- CorpWatch
- OpenDuka
- Poderopedia
Also, if you do not trust the public API at https://whosgotdirt.herokuapp.com/, please read the technical documentation to deploy your own private API.
API Terms & Conditions
Please be aware of each API’s terms and conditions:
- CorpWatch
- LittleSis (CC BY-SA 3.0 US)
- OpenCorporates (ODbL 1.0)
- OpenDuka (CC BY-NC 3.0)
- OpenOil (CC BY-SA 4.0)
- Poderopedia
Usage
The Who’s got dirt? API’s base URL is https://whosgotdirt.herokuapp.com/.
Each endpoint (/entities, for example) accepts a single query string parameter queries. For the request GET /entities?queries=<queries>, <queries> may look like:
{
  "q0": {
    "query": {
      "name~=": "John Smith",
      "jurisdiction_code|=": ["gb", "ie"],
      "memberships": [{
        "role": "director",
        "inactive": false
      }]
    },
    "endpoints": [
      "CorpWatch",
      "OpenCorporates"
    ]
  }
}
You may use any query ID instead of q0. You may submit multiple queries with different query IDs. You may use the POST HTTP method if the query string is too long.
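The sketch below submits two queries under different IDs and prepares both a GET and a POST form of the same request. The query IDs are arbitrary, and sending queries as a form-encoded body field in the POST is an assumption; the text above only says that POST may be used.

```ruby
require 'json'
require 'net/http'
require 'uri'

BASE_URL = 'https://whosgotdirt.herokuapp.com/'

# Two queries with arbitrary IDs, submitted in a single request.
queries = {
  'smith' => { 'query' => { 'name~=' => 'John Smith' } },
  'acme'  => { 'query' => { 'name~=' => 'ACME Inc.' } }
}

uri = URI.join(BASE_URL, 'entities')

# GET: the JSON goes in the `queries` query string parameter.
get_uri = uri.dup
get_uri.query = URI.encode_www_form('queries' => JSON.generate(queries))

# POST: assumption: the same `queries` parameter is sent form-encoded in the body.
post = Net::HTTP::Post.new(uri)
post.set_form_data('queries' => JSON.generate(queries))

# Neither request is sent here. To send the POST:
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(post) }
puts get_uri
puts post.body
```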
You may use endpoints within each query to request the given endpoints only. The valid values for endpoints are:
CorpWatch
LittleSis
OpenCorporates
OpenDuka
OpenOil
Poderopedia
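A minimal sketch of restricting a query to specific endpoints, checking the names locally against the list above before any request is made. The helper name restricted_query is hypothetical and not part of the API.

```ruby
require 'json'

# The valid values for `endpoints`, as listed above.
VALID_ENDPOINTS = %w[CorpWatch LittleSis OpenCorporates OpenDuka OpenOil Poderopedia].freeze

# Hypothetical helper: wraps a query and restricts it to the given endpoints,
# rejecting unknown names locally.
def restricted_query(query, endpoints)
  unknown = endpoints - VALID_ENDPOINTS
  raise ArgumentError, "unknown endpoints: #{unknown.join(', ')}" unless unknown.empty?

  { 'query' => query, 'endpoints' => endpoints }
end

queries = { 'q0' => restricted_query({ 'name~=' => 'John Smith' }, %w[CorpWatch OpenCorporates]) }
puts JSON.generate(queries)
```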
Query format
The format of query within each query is inspired by the Metaweb Query Language. Each property name (name, for example) in query may be followed by an MQL operator (~=, for example). If no operator follows a property name, the operator is equality. (In the tables below, = denotes equality, but you should never append = to a property name: for example, use name, not name=.) The other operators are:
~=
  The pattern matching operator tests whether a property contains a word or phrase.
  "name~=": "ACME Inc."
|=
  The "one of" operator tests whether a property is equal to any value in an array.
  "country_code|=": ["gb", "us"]
>=
  The greater-than-or-equal operator tests whether a property is greater than or equal to a value.
  "founding_date>=": "2010-01-01"
>
  The greater-than operator tests whether a property is greater than a value.
  "founding_date>": "2010-01-01"
<=
  The less-than-or-equal operator tests whether a property is less than or equal to a value.
  "founding_date<=": "2010-01-01"
<
  The less-than operator tests whether a property is less than a value.
  "founding_date<": "2010-01-01"
a:
  While not an operator, a property prefix (a:, for example) can be used to express the AND operator.
  "a:industry_code": "be_nace_2008-66191", "b:industry_code": "be_nace_2008-66199"
Not all APIs support all parameters (created_at, for example) and operators (|=, for example). See the tables below for each API’s support for parameters and operators.
If a parameter or operator is unsupported by an API, it is silently ignored.
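To illustrate how property names, operators and prefixes combine into keys, here is a small sketch. The helper property_key is hypothetical; it only concatenates strings in the way the operator descriptions above suggest, and it rejects = because equality is written with no suffix.

```ruby
require 'json'

# The operators described above; equality has no suffix.
OPERATORS = ['~=', '|=', '>=', '>', '<=', '<'].freeze

# Hypothetical helper: builds a property key such as "name~=" or "a:industry_code".
def property_key(name, operator = nil, prefix: nil)
  unless operator.nil? || OPERATORS.include?(operator)
    raise ArgumentError, "unsupported operator: #{operator}"
  end

  key = "#{name}#{operator}"
  prefix ? "#{prefix}:#{key}" : key
end

query = {
  property_key('name', '~=')                 => 'ACME Inc.',
  property_key('country_code', '|=')         => %w[gb us],
  property_key('founding_date', '>=')        => '2010-01-01',
  property_key('industry_code', prefix: 'a') => 'be_nace_2008-66191',
  property_key('industry_code', prefix: 'b') => 'be_nace_2008-66199'
}

puts JSON.pretty_generate(query)
```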
Error handling
Errors may occur at the request, query or response level.
Request errors
- If the queries parameter is invalid JSON, a 400 Bad Request is returned.
- If the queries parameter is missing, blank, or not a JSON object, a 422 Unprocessable Entity is returned.
For example, GET /entities returns:
{
  "status": "422 Unprocessable Entity",
  "messages": [{
    "message": "parameter 'queries' must be provided"
  }]
}
Query errors
- If a query in the queries parameter is not a JSON object, has no query, or has a query that is not a JSON object, an error message is returned for that query.
For example, GET /entities?queries={"q0":{}} returns:
{
  "status": "200 OK",
  "q0": {
    "count": 0,
    "result": [],
    "messages": [{
      "message": "'query' must be provided"
    }]
  }
}
API errors
- If an API returns an error, an error message is returned for that response. Who’s got dirt? returns the API’s original error message, however cryptic.
For example, GET /entities?queries={"q0":{"query":{"type":"Person","name":"John Smith"}}} returns:
{
  "status": "200 OK",
  "q0": {
    "count": 100,
    "result": [
      …
    ],
    "messages": [{
      "info": {
        "url": "https://api.littlesis.org/entities.xml?q=John+Smith"
      },
      "status": "401 Unauthorized",
      "message": "Your request must include a query parameter named \"_key\" with a valid API key value. To obtain an API key, visit http://api.littlesis.org/register."
    }, {
      "info": {
        "url": "http://api.poderopedia.org/visualizacion/search?alias=John+Smith&entity=persona"
      },
      "status": "400 Bad Request",
      "message": "400 BAD REQUEST"
    }]
  }
}
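Because query and API errors arrive with a 200 OK status, a client has to inspect the messages arrays itself. The sketch below gathers messages from all three levels of a parsed response body; collect_messages is a hypothetical helper, and the two sample bodies are copied from the examples above.

```ruby
require 'json'

# Hypothetical helper: gathers error messages from a parsed response body.
# Request errors carry a top-level "messages" array; query and API errors
# carry a "messages" array inside each query's response.
def collect_messages(body)
  found = []
  (body['messages'] || []).each do |m|
    found << { level: 'request', message: m['message'] }
  end
  body.each do |query_id, value|
    next unless value.is_a?(Hash) # skips "status" and the top-level "messages"

    (value['messages'] || []).each do |m|
      found << { level: query_id, status: m['status'], message: m['message'] }
    end
  end
  found
end

request_error = JSON.parse(<<~'BODY')
  {"status": "422 Unprocessable Entity", "messages": [{"message": "parameter 'queries' must be provided"}]}
BODY

query_error = JSON.parse(<<~'BODY')
  {"status": "200 OK", "q0": {"count": 0, "result": [], "messages": [{"message": "'query' must be provided"}]}}
BODY

puts collect_messages(request_error).inspect
puts collect_messages(query_error).inspect
```

API errors include a status of their own, which is why the helper keeps it when present.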
Endpoints
Entities
The endpoint is GET /entities?queries=<queries>.
This table documents which operators, if any, are supported by each API for each parameter. You may need to scroll the table to the right to see all columns.
Note: The type parameter is required by Poderopedia.
Parameter | Definition | Example | CorpWatch |
LittleSis |
OpenCorporates |
OpenDuka |
Poderopedia |
---|---|---|---|---|---|---|---|
API key 1 | Supply an API key. | "corp_watch_api_key": "..." |
= |
= |
= |
= |
= |
limit |
Limit the number of results. | "limit": 5 |
= |
= |
= |
||
name |
Find entities by name. | "name~=": "ACME Inc." |
~= |
~= |
~= |
~= |
~= |
classification |
Find entities by classification. | "classification": "LLC" |
= |
= |= |
|||
created_at |
Find entities by the creation date of the metadata. | "created_at>=": "2010-01-01" |
>= |
||||
founding_date |
Find organizations by founding date. | "founding_date": "2010-01-01" |
= >= > <= < |
||||
dissolution_date |
Find organizations by dissolution date. | "dissolution_date": "2010-01-01" |
= >= > <= < |
||||
identifiers |
Find entities by identifier. | "identifiers": [{ "identifier": "911653725", "scheme": "SEC Central Index Key" }] |
= |
||||
identifiers |
Find entities by identifier scheme. | "identifiers": [{ "identifier": "911653725", "scheme": "SEC Central Index Key" }] |
= |
||||
contact_details 2 |
Find entities by address. | "contact_details": [{ "type": "address", "value~=": "52 London" }] |
~= |
~= |
|||
industry_code |
Find organizations by industry (SIC) code. | "industry_code": "2011" |
= |
= |= a: |
|||
sector_code |
Find organizations by SIC sector. | "sector_code": "4100" |
= |
||||
substring_match |
Match within words on name~= and address queries. |
"substring_match": 1 |
= |
||||
country_code |
Find entities by country code. | "country_code": "US" |
= |
= |= |
|||
subdiv_code |
Find entities by country subdivision code. | "subdiv_code": "OR" |
= |
||||
year |
Find organizations with SEC filings in a given year. | "year": 2005 |
= >= <= |
||||
source_type |
Find organizations that appear as "filers" in SEC filings or as subsidiaries ("relationships") only. | "source_type": "relationships" |
= |
||||
num_children |
Find organizations by the number of direct descendants in a hierarchy. | "num_children": 3 |
= |
||||
num_parents |
Find organizations by the number of direct ancestors in a hierarchy. | "num_parents": 2 |
= |
||||
top_parent_id |
Find organizations within the hierarchy of another organization. | "top_parent_id": "cw_7324" |
= |
||||
search_all |
Match descriptions and summaries on name~= queries. |
"search_all": 1 |
= |
||||
jurisdiction_code |
Find organizations by jurisdiction code. | "jurisdiction_code": "gb" |
= |= |
||||
current_status |
Find organizations by status. | "current_status": "Dissolved" |
= |
||||
inactive |
Find active or inactive organizations. | "inactive": false |
= |
||||
branch |
Find branch or non-branch organizations. | "branch": true |
= |
||||
nonprofit |
Find nonprofit or other organizations. | "nonprofit": true |
= |
||||
type |
Find entities of the class Person or Organization . |
"type": "Person" |
= |
Ruby Example
require 'cgi'
require 'json'
require 'open-uri'

# The queries parameter: a single query with the ID "q0".
queries = <<-EOL
{
  "q0": {
    "query": {
      "name~=": "John Smith",
      "jurisdiction_code|=": ["gb", "ie"],
      "memberships": [{
        "role": "director",
        "inactive": false
      }]
    }
  }
}
EOL

# Round-trip through the JSON library to validate the JSON and strip whitespace.
value = JSON.dump(JSON.parse(queries))
#=> {"q0":{"query":{"name~=":"John Smith","jurisdiction_code|=":["gb","ie"],"memberships":
#   [{"role":"director","inactive":false}]}}}

# Escape the JSON for use in the query string.
url = "https://whosgotdirt.herokuapp.com/entities?queries=#{CGI.escape(value)}"
#=> https://whosgotdirt.herokuapp.com/entities?queries=%7B%22q0%22%3A%7B%22query%22%3A%7B%22name%7E%3D%22%3A%22John+Smith%22%2C...

# URI.open comes from open-uri; a bare open(url) no longer opens URLs as of Ruby 3.0.
results = JSON.parse(URI.open(url).read)
#=> {"q0"=>
# {"count"=>3915,
# "result"=>
# [{"name"=>"JOHN SMITH",
# "updated_at"=>"2014-10-25T00:34:16+00:00",
# "identifiers"=>[{"identifier"=>"46065070", "scheme"=>"OpenCorporates"}],
# "contact_details"=>[],
# "links"=>[{"url"=>"https://opencorporates.com/officers/46065070", "note"=>"OpenCorporates URL"}],
# "memberships"=>
# [{"role"=>"director",
# "start_date"=>"2006-11-24",
# "organization"=>
# {"name"=>"EVOLUTION (GB) LIMITED",
# "identifiers"=>[{"identifier"=>"05997209", "scheme"=>"Company Register"}],
# "links"=>[{"url"=>"https://opencorporates.com/companies/gb/05997209", "note"=>"OpenCorporates URL"}],
# "jurisdiction_code"=>"gb"}}],
# "current_status"=>"CURRENT",
# "jurisdiction_code"=>"gb",
# "occupation"=>"MANAGER",
# "sources"=>
# [{"url"=>"https://api.opencorporates.com/officers/search?inactive=false&jurisdiction_code=gb%7Cie&order=score&position=director&q=John+Smith",
# "note"=>"OpenCorporates"}]},
# ...]
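Once parsed, the response is an ordinary hash keyed by query ID. As a sketch of reading it, the code below lists each entity's memberships from a trimmed copy of the sample output above, so it runs without a network call. The helper name membership_lines is hypothetical.

```ruby
require 'json'

# A trimmed copy of the sample response above, so the sketch runs offline.
results = JSON.parse(<<~'BODY')
  {"q0": {"count": 3915, "result": [{
    "name": "JOHN SMITH",
    "memberships": [{
      "role": "director",
      "start_date": "2006-11-24",
      "organization": {"name": "EVOLUTION (GB) LIMITED", "jurisdiction_code": "gb"}
    }]
  }]}}
BODY

# Hypothetical helper: one line of text per membership, per query.
def membership_lines(results)
  results.flat_map do |query_id, response|
    next [] unless response.is_a?(Hash) # skip non-query keys such as "status"

    response['result'].flat_map do |entity|
      (entity['memberships'] || []).map do |membership|
        organization = membership.dig('organization', 'name')
        "#{query_id}: #{entity['name']} is #{membership['role']} of #{organization}"
      end
    end
  end
end

lines = membership_lines(results)
puts lines
```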
Relations
The API endpoint is GET /relations?queries=<queries>.
Parameter | Definition | Example | OpenCorporates | OpenOil |
---|---|---|---|---|
API key 1 | Supply an API key. | "open_oil_api_key": "..." | = | = |
limit | Limit the number of results. | "limit": 5 | = | = |
subject.name | Find related entities by name. | "subject": [{ "name~=": "John Smith" }] | ~= | = |
subject.birth_date | Find related people by birth date. | "subject": [{ "birth_date": "2010-01-01" }] | = >= > <= < | |
subject.contact_details 2 | Find related entities by address. | "subject": [{ "contact_details": [{ "type": "address", "value~=": "52 London" }] }] | ~= | |
jurisdiction_code | Find officerships by jurisdiction code. | "jurisdiction_code": "gb" | = \|= | |
role | Find officerships by role. | "role": "ceo" | = | |
inactive | Find active or inactive officerships. | "inactive": false | = | |
country_code | Find concessions by country code. | "country_code": "BR" | | = |
status | Find concessions with a "licensed" or "unlicensed" status. | "status": "licensed" | | = |
type | Find concessions with an "offshore" or "onshore" type. | "type": "offshore" | | = |
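As an illustration of a relations request, the sketch below searches for officerships using parameters from the table above and restricts the query to OpenCorporates, the API whose officer search is queried via /relations. The combination of filters is only an example, and no request is sent.

```ruby
require 'json'
require 'uri'

# Active directorships held by people named "John Smith" in gb or ie.
queries = {
  'q0' => {
    'query' => {
      'subject' => [{ 'name~=' => 'John Smith' }],
      'jurisdiction_code|=' => %w[gb ie],
      'role' => 'director',
      'inactive' => false
    },
    'endpoints' => ['OpenCorporates']
  }
}

uri = URI('https://whosgotdirt.herokuapp.com/relations')
uri.query = URI.encode_www_form('queries' => JSON.generate(queries))
puts uri
```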
Lists
The endpoint is GET /lists?queries=<queries>.
Parameter | Definition | Example | LittleSis | OpenCorporates |
---|---|---|---|---|
API key 1 | Supply an API key. | "little_sis_api_key": "..." | = | = |
limit | Limit the number of results. | "limit": 5 | = | = |
name | Find lists by name. | "name~=": "Barclays" | ~= | ~= |
Footnotes
1. Each API has its own API key parameter:
   - corp_watch_api_key
   - little_sis_api_key
   - open_corporates_api_key
   - open_duka_api_key
   - open_oil_api_key
   - poderopedia_api_key
2. Only contact_details with a type of address are supported.
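A small sketch of attaching API keys to a query. The parameter names come from footnote 1, plus open_oil_api_key, which appears in the Relations table example. The helper with_api_keys and the placeholder key value are hypothetical, and placing the key inside query follows the tables, which list the API key among the query parameters.

```ruby
# API key parameter names, keyed by the API names used for `endpoints`.
API_KEY_PARAMETERS = {
  'CorpWatch' => 'corp_watch_api_key',
  'LittleSis' => 'little_sis_api_key',
  'OpenCorporates' => 'open_corporates_api_key',
  'OpenDuka' => 'open_duka_api_key',
  'OpenOil' => 'open_oil_api_key',
  'Poderopedia' => 'poderopedia_api_key'
}.freeze

# Hypothetical helper: returns a copy of the query with the given keys added.
# `keys` maps an API name to your key for that API.
def with_api_keys(query, keys)
  keys.each_with_object(query.dup) do |(api, key), merged|
    parameter = API_KEY_PARAMETERS.fetch(api) # raises KeyError for unknown APIs
    merged[parameter] = key
  end
end

query = with_api_keys({ 'name~=' => 'Barclays' }, { 'LittleSis' => 'YOUR_LITTLESIS_KEY' })
puts query
```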
Response formats
A set of JSON Schemas describes the response formats of entities, relations and lists.
The Entity schema is a combination of the Person and Organization models in Popolo, a format used by dozens of civil society organizations, businesses and governments to model government and legislative data.
The Relation schema combines terms from RDF and Schema.org, with a few additional properties shared with other Popolo models.
The List schema is a JSON Schema version of Schema.org’s ItemList, with a few additional properties shared with other Popolo models.
Each API may return additional properties not modeled in the schema.
These schemas are based entirely on what the APIs publish, and therefore do not fulfill all use cases an influence data project may encounter. However, they may serve as a starting point for future work in that direction.
Notes
After performing an initial query – for example, a search for companies – a common use case is to perform a second query using the results of the first query – for example, a search for all officers of those companies. Who’s got dirt? does not (yet) support this use case, because such second-level queries are more numerous and variable across APIs (issue #15). However, you may nonetheless use the results of the first Who’s got dirt? query to perform your own API-specific second query.
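One possible starting point for such a follow-up is to collect the identifiers returned by the first query and pass them to the API of your choice. The sketch below groups identifiers by scheme. The helper identifiers_by_scheme is hypothetical, and the response is an offline stand-in assembled from identifier values in the sample output earlier in this document, not a real API response.

```ruby
require 'json'

# Hypothetical helper: groups the identifiers in one query's response by scheme,
# as input for your own API-specific follow-up requests.
def identifiers_by_scheme(response)
  response['result'].each_with_object(Hash.new { |h, k| h[k] = [] }) do |entity, grouped|
    (entity['identifiers'] || []).each do |id|
      grouped[id['scheme']] << id['identifier']
    end
  end
end

# Illustrative stand-in for a first query's response.
first = JSON.parse(<<~'BODY')
  {"q0": {"count": 2, "result": [
    {"name": "JOHN SMITH", "identifiers": [{"identifier": "46065070", "scheme": "OpenCorporates"}]},
    {"name": "EVOLUTION (GB) LIMITED", "identifiers": [{"identifier": "05997209", "scheme": "Company Register"}]}
  ]}}
BODY

puts identifiers_by_scheme(first['q0'])
```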
Differences from the Metaweb Query Language (MQL)
The API’s request and response formats are inspired by the Metaweb Query Language and the OpenRefine Reconciliation Service API. The differences are:
- Does not support MQL Read parameters other than query.
- Does not support MQL directives other than limit; instead of counting results, returns a new count field at the same level as the result field.
- Does not support asking for values or wildcards; instead returns all fields.
- Does not return a status field at the query level, because it would need to report the status of many responses. Does not return a code field.