Recreating The Economist's Corruption and Human Development Scatterplot

This visualization is part of my ongoing ggplot2 project to recreate a scatter plot that The Economist published online in 2011. The original chart plots the Corruption Perceptions Index (CPI) against the Human Development Index (HDI) for a range of countries. Using ggplot2 in R, I rebuild the chart step by step: points colored by region, a fitted trend line, and text labels for a selected set of countries. The aim is to visualize how perceived corruption relates to human development across countries.

Link to the publication: https://www.economist.com/graphic-detail/2011/12/02/corrosive-corruption

# installing the necessary packages
install.packages(c("tidyverse", "ggthemes"), repos = "https://cran.rstudio.com/")

## package 'tidyverse' successfully unpacked and MD5 sums checked
## package 'ggthemes' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\HP\AppData\Local\Temp\Rtmp02laYQ\downloaded_packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)

# importing the data
data <- read.csv("F:\\R practice\\Udemy R Course Notes\\R-for-Data-Science-and-Machine-Learning\\Training Exercises\\Capstone and Data Viz Projects\\Data Visualization Project\\Economist_Assignment_Data.csv")

head(data)

##   X     Country HDI.Rank   HDI CPI            Region
## 1 1 Afghanistan      172 0.398 1.5      Asia Pacific
## 2 2     Albania       70 0.739 3.1 East EU Cemt Asia
## 3 3     Algeria       96 0.698 2.9              MENA
## 4 4      Angola      148 0.486 2.0               SSA
## 5 5   Argentina       45 0.797 3.0          Americas
## 6 6     Armenia       86 0.716 2.6 East EU Cemt Asia

View(data)

# Map x = CPI and y = HDI, color the points by Region,
# and add a trend line fitted as HDI ~ log(CPI)
p1 <- ggplot(data, aes(x = CPI, y = HDI)) +
  geom_point(aes(color = Region), shape = 1, size = 5) +
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red")
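
To make the trend line less of a black box, here is a minimal sketch of the same kind of model fitted directly with lm(). It uses only the six rows printed by head(data), so the coefficients are illustrative and will differ from the fit on the full data set. The names sample_rows, fit, and preds are my own and do not appear elsewhere in this project.

```r
# Six rows copied from the head(data) output; the full CSV is not used here.
sample_rows <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Armenia"),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  CPI = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  stringsAsFactors = FALSE
)

# Same formula that geom_smooth() receives: HDI regressed on the natural log of CPI.
fit <- lm(HDI ~ log(CPI), data = sample_rows)
coef(fit)

# Predicted HDI at CPI = 2, 4, 8. Under this model each doubling of CPI
# adds the same amount to the predicted HDI.
preds <- predict(fit, newdata = data.frame(CPI = c(2, 4, 8)))
preds
```

Because the predictor is log(CPI), the fitted line appears as a curve on the untransformed CPI axis, which is why the red line in the plot bends rather than running straight.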

# Label only a subset of countries so the text stays readable
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")
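
One thing to watch with this approach: a name that does not match the Country column exactly is dropped without any warning. The sketch below shows one possible check, again using only the six rows printed by head(data). The vector labels_wanted and its deliberately misspelled entry are hypothetical and exist only for this illustration.

```r
# Six rows copied from the head(data) output, standing in for the full data frame.
sample_rows <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Armenia"),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  CPI = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  stringsAsFactors = FALSE
)

# Hypothetical label list; "Argentinaa" is a deliberate misspelling.
labels_wanted <- c("Afghanistan", "Argentinaa", "Armenia")

# Names that were requested but do not occur in the data.
missing_labels <- setdiff(labels_wanted, sample_rows$Country)
missing_labels

# Rows that would actually receive a label in geom_text().
labelled <- subset(sample_rows, Country %in% labels_wanted)
labelled$Country
```

Running the same setdiff() call with pointsToLabel and data$Country before plotting would list any country names that need correcting.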

# Add the selected labels, a theme, and the title and axis labels.
# The axis name is set once in labs(); the earlier scale_x_continuous(name = ...)
# call duplicated it with a missing closing parenthesis in the label text.
p1 + geom_text(aes(label = Country), color = "gray20",
               data = subset(data, Country %in% pointsToLabel), check_overlap = TRUE) +
  theme_calc() +
  labs(title = "Corruption and human development",
       x = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
       y = "Human Development Index, 2011 (1 = best)")

Anoop S Hari

2023-12-01