DocumentCloud

Part of “Demos: Solutions for Document Management” (10-minute presentations of existing document platforms). DocumentCloud was represented in this session by Jonathan Stray.


  • Entirely grant-funded since 2009: 2.6 million.
  • Users and documents: 1.4k organizations; 2.3 million documents, 29 million pages. Documents can be shared with different people or within your organization. They are the largest of these platforms by number of active users.
  • Smooth import: many supported formats with a viewer, automatic OCR, multiple languages, organization by project, easy collaboration.
  • The only one of these projects with an embeddable viewer. This is key to their success, because it addresses exactly what journalists want to do with their documents: show what they found. It is a good example of a feature that is useful for journalists in general, not necessarily for IJ.
  • Collaborative analysis (Public, Private, Public per project) and editing tools.
  • Analysis tools: entity extraction is supported in a limited way (via OpenCalais). Entities are shown per document, not across a set of documents. The back-end is Solr.
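
The difference between per-document and set-level entities noted above can be illustrated with a minimal sketch. It does not use DocumentCloud, OpenCalais, or Solr; the document names, the entity lists, and the function `entities_across_set` are all hypothetical, standing in for whatever an extraction service returns per document.

```python
from collections import Counter

# Hypothetical per-document entity lists (illustrative data only, not
# output from DocumentCloud or OpenCalais).
entities_by_document = {
    "memo-001": ["Acme Corp", "Jane Doe", "Springfield"],
    "memo-002": ["Acme Corp", "John Roe"],
    "contract-17": ["Acme Corp", "Jane Doe", "Jane Doe"],
}


def entities_across_set(per_document):
    """Count how many documents in the set mention each entity."""
    counts = Counter()
    for entities in per_document.values():
        # set() so that each document counts at most once per entity
        counts.update(set(entities))
    return counts


set_level = entities_across_set(entities_by_document)
for entity, n_docs in set_level.most_common():
    print(f"{entity}: {n_docs} document(s)")
```

Counting documents rather than raw mentions is one possible design choice: it highlights entities that link several documents together, which is the view a per-document listing does not give.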

Access

  • Open source, with the main platform and various components available at https://github.com/documentcloud. Offers open access to search public documents that have been uploaded. Accounts for uploading and processing documents are available to journalists. Part of IRE.

Future

  • One of the reasons they received their latest funding is to develop a sustainability model focused on customers. They are currently experimenting with giving accounts to other people, and with asking people to pay for their accounts.
  • Roadmap: improvements to processing speed and stability; a responsive, mobile-ready viewer; improved reporting of user analytics and document processing status; embed support (deployed earlier this year); a redesign of the user workspace and product site.
  • Sustainability model: to be announced in the first quarter of 2016.