1 Introduction

In this work, we use the Corruption Perceptions Index (CPI) 2020 score time series, which covers the years since 2012, to show how the level of corporate and political corruption varies across regions and over time. The index is compiled and published annually by Transparency International, a global non-governmental organization that monitors and publicizes corporate and political corruption. The data can be found under this link.

The data set provides scores on a scale of 0 to 100, with higher scores indicating lower levels of perceived corruption, for 180 countries and territories around the world. The scores are based on a combination of surveys and assessments of corruption in the public sector, carried out by experts and business executives.

The CPI is widely used by academics, policymakers, and journalists as a measure of corruption levels in different countries, and to monitor trends in corruption over time. The 2020 data set is the most recent available, and includes scores for each country and territory for every year from 2012 to 2020, allowing for longitudinal analysis of trends in perceived corruption.

The variables in the data set include:

Country: name of the country
ISO3: a three-letter country code
Region: region codes, in particular:
- SSA: Sub-Saharan Africa
- ECA: Eastern Europe and Central Asia
- MENA: Middle East and North Africa (excluding high-income countries)
- AME: Latin America and the Caribbean
- AP: East Asia and Pacific
- WE/EU: Western Europe and the European Union
CPI score YYYY: Corruption Perceptions Index in year YYYY, for each year from 2012 to 2020. Scores range from 0 to 100; a higher score means a lower level of perceived corruption.
Rank YYYY: the rank of the country based on its CPI score in year YYYY.
Sources YYYY: the number of data sources behind the CPI score in year YYYY.
Standard error YYYY: the standard error of the CPI score in year YYYY.

In total, the data set has 180 rows and 34 columns. It is quite clean once the first two rows of the CSV file are skipped. However, we modify the data structure in a few ways, as demonstrated in the following section.

2 Visualisation

First, we load the libraries, set the default theme for ggplot visualizations, and load the data.

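library(tidyverse)
library(pander)

# Default ggplot2 theme for all plots in this report
theme_set(theme_minimal(base_size = 11,
                        base_family = "mono"))

# The first two rows of the CSV file are not part of the table, so skip them
df <- read_csv("GlobalCorruption.csv", skip = 2)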
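As a quick sanity check of the loading step, the sketch below writes a miniature file with the same layout (two preamble lines, then the header row) and reads it back with skip = 2. The file contents, the country names, and the names tmp and toy are hypothetical and serve only as an illustration. With the real file, dim(df) should report the 180 rows and 34 columns stated in the introduction.

```r
library(readr)

# Hypothetical miniature file: two preamble lines, a header row, two data rows.
# All values are made up for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Corruption Perceptions Index toy example",
  "Illustrative values only",
  "Country,ISO3,Region,CPI score 2020,CPI score 2019",
  "Aland,ALA,WE/EU,80,79",
  "Borduria,BOR,ECA,35,36"
), tmp)

# skip = 2 drops the two preamble lines so the third line becomes the header
toy <- read_csv(tmp, skip = 2, col_types = cols())

dim(toy)    # number of rows and columns
names(toy)  # column names, including the per-year CPI score columns
```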

Our question concerns the level of corporate and political corruption in different regions over the years. Hence, we keep only the required columns: Region and the CPI scores for each year.

cpi <- df %>%
  select(Region, contains("CPI"))

We then pivot the table to long format. The data then have only three columns: one for the region, one for the year (the YYYY extracted from the original column names CPI score YYYY), and one for the CPI score. The following are sample rows:

cpi <- cpi %>%
  pivot_longer(cols = -Region) %>%
  mutate(
    # Extract the year from column names such as "CPI score 2020"
    years = as.integer(str_extract(name, "[0-9]+"))
  ) %>%
  rename(CPI = value) %>%
  select(-name)

cpi %>%
  head(3) %>%
  pander(caption = "sample rows")

sample rows
Region	CPI	years
WE/EU	88	2020
WE/EU	87	2019
WE/EU	88	2018
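The same reshaping can probably be done in a single pivot_longer() call, which may be easier to read. The sketch below is an alternative under stated assumptions, not the code used in this report: toy_wide and toy_long are hypothetical names, the values are made up, and the resulting column order (Region, years, CPI) differs from the table above.

```r
library(tidyverse)

# Made-up wide table with the same column naming scheme as the CPI data
toy_wide <- tribble(
  ~Region, ~`CPI score 2020`, ~`CPI score 2019`,
  "WE/EU", 88, 87,
  "SSA",   30, 31
)

toy_long <- toy_wide %>%
  pivot_longer(
    cols = -Region,
    names_to = "years",                           # new column holding the year
    names_prefix = "CPI score ",                  # strip the common prefix
    names_transform = list(years = as.integer),   # convert "2020" to 2020L
    values_to = "CPI"                             # new column holding the score
  )

toy_long
```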

Next, we create a helper column holding the average CPI score for each region. We use this column to reorder the x-axis in the visualization so that the differences between regions are clearer.

cpi <- cpi %>%
  group_by(Region) %>%
  mutate(avg_cpi = mean(CPI, na.rm = TRUE)) %>%
  ungroup()
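To illustrate what the reordering does, the sketch below applies base R's reorder() to a small made-up table. It also shows a possible alternative to the helper column: reorder() can compute the per-region mean itself when given FUN = mean and na.rm = TRUE. The names toy and ordered_region are hypothetical.

```r
library(tidyverse)

# Made-up scores for illustration only, including one missing value
toy <- tribble(
  ~Region, ~CPI,
  "SSA",   30,
  "SSA",   NA,
  "WE/EU", 88,
  "WE/EU", 80,
  "ECA",   35
)

# Factor levels are sorted by the mean CPI of each region, ignoring NAs
ordered_region <- reorder(toy$Region, toy$CPI, FUN = mean, na.rm = TRUE)

levels(ordered_region)  # regions from lowest to highest mean score
```

Inside aes(), the equivalent expression would be reorder(Region, CPI, FUN = mean, na.rm = TRUE), which avoids storing the average in every row.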

Finally, we create our visualization:

cpi %>%
  ggplot(aes(reorder(Region, avg_cpi), CPI, fill = as.factor(years))) +
  geom_boxplot() +
  theme(legend.position = "bottom",
        plot.title = element_text(size = 13, family = "serif")) +
  guides(fill = guide_legend(nrow = 1)) +
  labs(fill = "year", x = NULL)

Distribution of CPI over Time by Regions

From the visualization, we can see that:

SSA and ECA have the lowest CPI scores overall, which suggests a high level of perceived corruption.
WE/EU has the highest CPI scores overall, which suggests a low level of perceived corruption.
Over the years, based on the median, the CPI score decreased slightly for SSA and WE/EU and increased slightly for the remaining regions. However, CPI scores remained relatively stable between 2012 and 2020 in all regions.
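Reading medians off a boxplot is approximate, so the trend statements could be checked numerically. The sketch below computes the median score per region and year and the change between 2012 and 2020. The table toy and its values are made up for illustration and are not taken from the CPI data; median_change is a hypothetical name. With the real data, cpi would take the place of toy.

```r
library(tidyverse)

# Made-up values for illustration only; not taken from the CPI data
toy <- tribble(
  ~Region, ~years, ~CPI,
  "SSA",   2012L,  30,
  "SSA",   2012L,  34,
  "SSA",   2012L,  40,
  "SSA",   2020L,  29,
  "SSA",   2020L,  33,
  "SSA",   2020L,  41,
  "ECA",   2012L,  33,
  "ECA",   2012L,  35,
  "ECA",   2020L,  36,
  "ECA",   2020L,  38
)

median_change <- toy %>%
  group_by(Region, years) %>%
  summarise(median_cpi = median(CPI, na.rm = TRUE), .groups = "drop") %>%
  # One column per year, so the first and last year can be compared directly
  pivot_wider(names_from = years, values_from = median_cpi,
              names_prefix = "median_") %>%
  mutate(change = median_2020 - median_2012)

median_change
```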

3 Further work

We worked with the data at the region level. The visualization above shows outliers, which suggests looking at the country level. Moreover, we focused on the CPI score; future work could also examine the countries' ranks based on CPI scores.
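As a possible starting point for the country-level follow-up, the sketch below flags the countries that a boxplot would draw as outliers, using the usual rule of 1.5 times the interquartile range beyond the quartiles. It assumes the Country column is kept when selecting columns, which the pipeline above does not do. The table toy, the country labels, and the scores are made up for illustration.

```r
library(tidyverse)

# Made-up country-level scores for one region and year
toy <- tribble(
  ~Country, ~Region, ~years, ~CPI,
  "A",      "SSA",   2020L,  25,
  "B",      "SSA",   2020L,  28,
  "C",      "SSA",   2020L,  30,
  "D",      "SSA",   2020L,  32,
  "E",      "SSA",   2020L,  35,
  "F",      "SSA",   2020L,  66
)

outliers <- toy %>%
  group_by(Region, years) %>%
  mutate(
    q1  = quantile(CPI, 0.25, na.rm = TRUE),  # lower quartile within the group
    q3  = quantile(CPI, 0.75, na.rm = TRUE),  # upper quartile within the group
    iqr = q3 - q1
  ) %>%
  ungroup() %>%
  # Keep only scores beyond 1.5 * IQR from the quartiles
  filter(CPI < q1 - 1.5 * iqr | CPI > q3 + 1.5 * iqr) %>%
  select(Country, Region, years, CPI)

outliers
```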

Corruption Perceptions Index

the level of corporate and political corruption in different regions over the years

2023-03-20
