Part of “Demos: Solutions for Document Management” (10-minute presentations of existing document platforms). Presented by Smari McCarthy.
About to be launched.
- 2,300 registered users, roughly 100 active
- 550k docs, 2.5k research tickets (20M entities)
- Python, Django; Postgres, Elasticsearch; Angular, jQuery
- Some modules already released under AGPL; more modules will be released in the coming months
Challenges
- Sitting on multiple terabytes of data. Need to process, index, understand
- Structured and unstructured data
- Work with 17 different languages in 4 different writing systems. Our journalists in the field are trying to get their work done, and we develop whatever they need. Rather than building a single-purpose tool, we are aiming for a more general, adaptable tool that can serve other needs and can also be used for editing.
- Over 30 collaborators, some of them unwilling to share their data
- Work environment
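To illustrate the multi-script challenge above, here is a minimal sketch of how incoming text might be labelled by writing system before indexing. It is not taken from the platform: the function name `dominant_script` is hypothetical, and the approach (reading the first word of each character's Unicode name) is a rough heuristic assumed for illustration only.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return a rough script label (e.g. "LATIN") for most letters in text.

    Hypothetical helper: uses the first word of each letter's Unicode
    character name as an approximate script label.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    # No letters at all (digits, punctuation, empty string) gives None.
    return counts.most_common(1)[0][0] if counts else None
```

A label like this could be used to pick a language-specific analyzer or tokenizer per document; a production system would more likely rely on a dedicated language-detection library.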
Tool features
- Business Registries (Database of databases): Good starting point for investigations
- Request information
- File storage (similar to Dropbox), where files are either public within the network or private
- Document search: a database of 50 thousand documents; anything that you mark as public can be searched
- Media search: Location, dates, keywords
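The public/private distinction in file storage and document search can be sketched as a visibility filter applied before text matching. This is an in-memory illustration under stated assumptions, not the platform's implementation: the `Document` fields and the `search` function are hypothetical, and the rule that owners can also search their own private files is an assumption not stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str
    owner: str
    public: bool = False  # private unless explicitly marked public


def search(documents, query, user):
    """Return documents matching query that user is allowed to see."""
    q = query.lower()
    return [
        d for d in documents
        # Visibility check first: public, or owned by the searching user (assumed rule).
        if (d.public or d.owner == user)
        and (q in d.title.lower() or q in d.text.lower())
    ]
```

In a real deployment backed by a search engine, the same idea would typically be expressed as a filter clause on the query rather than a Python list comprehension.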
Roadmap
- All in heavy development
- Currently supports research tickets, file management, database lookup, meta search.
- Entity database. For example, pattern search is rather important: patterns of corporate structures that are key indications of wrongdoing can bring up interesting investigations.
- Editorial pipeline, which is roughly the journalists' back end. Their work process is an absolute mess, and the aim is to address this pipeline. For example, there is currently no system for sharing and editing documents.
- Developed internally at OCCRP with a focus on internal needs; will expand as OCCRP’s needs expand (sustainability through necessity).
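As an illustration of the pattern search over corporate structures mentioned in the roadmap, the sketch below flags circular ownership, where an entity ends up in its own ownership chain. The choice of pattern, the function name `circular_owners`, and the dictionary representation of ownership are all assumptions made for this example; the talk does not specify which patterns the entity database would look for or how they would be stored.

```python
def circular_owners(owns):
    """Return the set of entities that appear in their own ownership chain.

    owns: dict mapping an owner to the list of entities it owns
    (hypothetical representation of an ownership graph).
    """
    def reachable(start):
        # Iterative depth-first walk over everything start owns, directly or indirectly.
        seen, stack = set(), list(owns.get(start, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(owns.get(node, []))
        return seen

    return {entity for entity in owns if entity in reachable(entity)}
```

For example, with A owning B, B owning C, and C owning A, all three are flagged, while an outside shareholder of A is not. Over millions of entities a graph database or a dedicated graph library would likely be a better fit than this plain traversal.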