Foundations of a solid anti-corruption data infrastructure


Joining up data and standards for anti-corruption

A solid anti-corruption data infrastructure can only be built when the relevant datasets can communicate with each other. The more connections there are between them, the better the chance of using the datasets to spot potential corruption red flags. Based on the priority datasets for building an anti-corruption data infrastructure (see Table 4), a set of core data elements has been identified and matched to available open data standards.

A data standard is a framework for how data should be collected and published, including how to describe individuals and organisations, how to record specific events or transactions, and how to organise data so that it meets minimum quality requirements. Using a standardised approach means that different datasets can talk to each other. Moreover, adhering to open data standards helps ensure that a larger number of users can benefit from the available data.
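As a rough illustration of why shared identifiers matter, the Python sketch below links a list of contract awards to a company register through a common (scheme, id) organisation identifier. All records, field names and the 90-day threshold are hypothetical, and the scheme value `GB-COH` is used only as an example register prefix. The "recently incorporated supplier" check is included only to show how a connection between two datasets can surface a possible red flag; it is not a validated indicator.

```python
from datetime import date

# Hypothetical, simplified records: contract awards and a company register
# that both identify organisations with the same (scheme, id) pair.
awards = [
    {"award_id": "A-1", "supplier": ("GB-COH", "01234567"), "date": date(2023, 5, 10)},
    {"award_id": "A-2", "supplier": ("GB-COH", "07654321"), "date": date(2023, 6, 1)},
    {"award_id": "A-3", "supplier": ("GB-COH", "99999999"), "date": date(2023, 7, 3)},
]
companies = {
    ("GB-COH", "01234567"): {"name": "Alpha Ltd", "incorporated": date(2010, 1, 15)},
    ("GB-COH", "07654321"): {"name": "Beta Ltd", "incorporated": date(2023, 4, 20)},
}


def link_awards(awards, companies, min_age_days=90):
    """Join awards to company records on the shared identifier.

    Returns (linked, unmatched, flagged). The flag (supplier incorporated
    fewer than `min_age_days` before the award) is an illustrative check only.
    """
    linked, unmatched, flagged = [], [], []
    for award in awards:
        company = companies.get(award["supplier"])
        if company is None:
            # No matching identifier: the two datasets cannot be connected here.
            unmatched.append(award["award_id"])
            continue
        age_days = (award["date"] - company["incorporated"]).days
        linked.append((award["award_id"], company["name"], age_days))
        if age_days < min_age_days:
            flagged.append(award["award_id"])
    return linked, unmatched, flagged


linked, unmatched, flagged = link_awards(awards, companies)
print(linked)
print(unmatched)  # ['A-3']
print(flagged)    # ['A-2']
```

The join only works because both hypothetical datasets describe suppliers with the same identifier; if one used company names and the other registration numbers, no reliable connection could be made.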

It is desirable that governments and civil society review the data that is currently available and agree on a roadmap for disclosing it as open data. At the same time, it is important to review how the data is structured and assess whether it needs to be restructured to meet open data standards.
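Restructuring often means mapping a flat, internally defined record onto the nested structure a standard expects. The sketch below maps a hypothetical legacy spreadsheet row onto a minimal release shaped like the Open Contracting Data Standard (OCDS), using only a small subset of OCDS release fields (`ocid`, `id`, `date`, `tag`, `initiationType`, `parties`, `buyer`, `awards`). The column names, the `ocds-abc123` prefix and the `XX-GOV` buyer scheme are placeholders; a real publication should be checked against the official OCDS schema and validation tools.

```python
import json

# A hypothetical flat row from a legacy procurement spreadsheet.
legacy_row = {
    "contract_no": "2023-045",
    "agency": "Ministry of Health",
    "agency_code": "MOH",
    "supplier_name": "Beta Ltd",
    "supplier_reg_no": "07654321",
    "amount": "125000.50",
    "currency": "USD",
    "award_date": "2023-06-01",
}

# Placeholder prefix; real OCID prefixes are assigned to each publisher.
OCID_PREFIX = "ocds-abc123"


def to_ocds_release(row):
    """Map a flat row onto a minimal OCDS-style release (simplified sketch)."""
    buyer_id = f"XX-GOV-{row['agency_code']}"  # hypothetical identifier scheme
    supplier_id = f"GB-COH-{row['supplier_reg_no']}"
    timestamp = f"{row['award_date']}T00:00:00Z"
    return {
        "ocid": f"{OCID_PREFIX}-{row['contract_no']}",
        "id": f"{row['contract_no']}-award",
        "date": timestamp,
        "tag": ["award"],
        "initiationType": "tender",
        "parties": [
            {"id": buyer_id, "name": row["agency"], "roles": ["buyer"]},
            {
                "id": supplier_id,
                "name": row["supplier_name"],
                "roles": ["supplier"],
                "identifier": {"scheme": "GB-COH", "id": row["supplier_reg_no"]},
            },
        ],
        "buyer": {"id": buyer_id, "name": row["agency"]},
        "awards": [
            {
                "id": f"{row['contract_no']}-1",
                "date": timestamp,
                # Amounts are stored as numbers, not strings.
                "value": {"amount": float(row["amount"]), "currency": row["currency"]},
                "suppliers": [{"id": supplier_id, "name": row["supplier_name"]}],
            }
        ],
    }


release = to_ocds_release(legacy_row)
print(json.dumps(release, indent=2))
```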

Table 5. Summary of priority data standards for building an anti-corruption data infrastructure

Open Contracting Data Standard

Guidance for disclosing data about public contracting processes in open formats, from the planning stage through to implementation. Extensions for other types of contracting, such as public-private partnerships and concessions, are under development. More information:

Open Contracting Partnership (CSO)

Fiscal Data Package

Schema for publishing and consuming fiscal data, especially data generated during the planning and execution of budgets. It supports data on expenditures and revenues. More information:

Open Knowledge Foundation (CSO)

Popolo

Popolo is an initiative to develop open government data specifications. Its goal is to "define data interchange formats and data models so that organizations can spend less time transforming and modeling data and more time applying it to the problems they face". It standardises data on people, organisations, motions and votes, events and speeches, among other things. More information:

Global Beneficial Ownership Register

An open schema under development for collecting and publishing beneficial ownership data globally. It will enable users to record, in a standardised way, data about the ultimate beneficial owner of an asset (such as land) or of an organisation or entity (such as a company) across different countries. More information:

Open Ownership (Global Coalition)

Open Corporates Schema

Schema for publishing and consuming data on companies worldwide, including data on jurisdiction, incorporation date, shareholders and subsidiaries. It recently incorporated beneficial ownership data released by the UK Government. More information:

Open Corporates (Private firm)
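Standards become more useful when their identifiers line up. The sketch below turns an OCDS-style party identifier into an OpenCorporates company URL, showing how a procurement record might be connected to company data. The scheme-to-jurisdiction mapping is a hand-written, assumed subset rather than an official crosswalk, and the code relies only on the public URL pattern `https://opencorporates.com/companies/<jurisdiction>/<company number>`; it makes no API calls.

```python
from urllib.parse import quote

# Assumed crosswalk from register prefixes (as used in OCDS party
# identifiers) to OpenCorporates jurisdiction codes. Illustrative subset
# only; verify each entry against the published code lists before use.
SCHEME_TO_JURISDICTION = {
    "GB-COH": "gb",  # UK Companies House
    "NO-BRC": "no",  # Norway, Brønnøysund Register Centre (assumed)
}


def opencorporates_url(identifier):
    """Return an OpenCorporates company URL for an OCDS-style identifier, or None."""
    jurisdiction = SCHEME_TO_JURISDICTION.get(identifier.get("scheme"))
    company_id = identifier.get("id")
    if jurisdiction is None or not company_id:
        # Unknown scheme or missing number: this party cannot be linked.
        return None
    return f"https://opencorporates.com/companies/{jurisdiction}/{quote(str(company_id))}"


print(opencorporates_url({"scheme": "GB-COH", "id": "07654321"}))
# https://opencorporates.com/companies/gb/07654321
print(opencorporates_url({"scheme": "XX-UNKNOWN", "id": "1"}))
# None
```

Returning `None` rather than guessing keeps unlinkable records visible, which is itself a useful measure of how well two datasets connect.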

The Anti-Corruption Open Up Guide

Box 6. The G20 Open Data Portals: enablers of Anti-Corruption Data?

The G20 has recently pushed the open data agenda globally. Because its members account for 85% of gross world product (GWP), 80% of world trade and two-thirds of the world's population, the actions they implement can set trends across the world. With this in mind, the open data portals of the G20 countries were reviewed to assess how easy it is to identify anti-corruption-related datasets.

To start, only 16 of the 20 members have an open data portal. China, South Africa, South Korea and Turkey have not yet launched a portal where open government datasets can be accessed and downloaded. In total, these open data portals contain 593,220 datasets. The three countries with the most datasets available are Canada (41.3%), the United States (33.7%) and the United Kingdom (4.4%).

Using this sample, a series of corruption-related words, in each portal's official language, were searched for through the portal's own search engine. For example, searches for “Corruption” and “Anti-corruption” returned only 114 and 311 datasets, respectively. Even the larger of these counts represents only about 0.05% of the available datasets, so very few datasets are directly classified as resources that could be used for anti-corruption purposes. Saudi Arabia, Mexico, Germany, Brazil and Argentina returned no results for either search.

Although these results are not conclusive regarding the existence of anti-corruption data, they do show that better categorisation or search mechanisms are needed to access such data. In fact, the number of fixed data categories ranges from 9 to 33 across portals, which makes it difficult to find data on similar issues across countries. In addition, 50% of the open data portals reviewed (Australia, Argentina, Brazil, France, Germany, Indonesia, Japan and the USA) allow users to tag datasets freely, making it possible to search for information outside the standard categories.

Regardless of the approach each country chooses, there is clearly a great opportunity for G20 governments to make their open data portals enablers of anti-corruption strategies.
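The percentages in Box 6 can be checked directly from the reported counts. The short calculation below uses only the figures given in the box; the combined figure is an upper bound that assumes the two searches returned no overlapping datasets, which the review does not state.

```python
# Figures reported in Box 6.
total_datasets = 593_220
hits = {"Corruption": 114, "Anti-corruption": 311}

# Share of all portal datasets returned by each search term.
shares = {term: count / total_datasets for term, count in hits.items()}
for term, share in shares.items():
    print(f"{term}: {share:.3%}")
# Corruption: 0.019%
# Anti-corruption: 0.052%

# Upper bound, valid only if the two result sets do not overlap.
combined = sum(hits.values()) / total_datasets
print(f"Combined (upper bound): {combined:.3%}")
# Combined (upper bound): 0.072%

# Share of the 16 portals that allow free tagging (8 are listed in Box 6).
print(f"Portals with free tagging: {8 / 16:.0%}")
# Portals with free tagging: 50%
```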
