This visualization is part of my ongoing ggplot project, which recreates a scatter plot published online by The Economist in 2011. The plot shows the relationship between corruption and human development: the Corruption Perceptions Index (CPI) on the x-axis against the Human Development Index (HDI) on the y-axis. I rebuild it with ggplot2 in R to visualize the patterns in the data and to see how perceived corruption relates to the overall development of countries.
Link to the publication: https://www.economist.com/graphic-detail/2011/12/02/corrosive-corruption
# installing the necessary packages
install.packages(c("tidyverse", "ggthemes"), repos = "https://cran.rstudio.com/")
## package 'tidyverse' successfully unpacked and MD5 sums checked
## package 'ggthemes' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\HP\AppData\Local\Temp\Rtmp02laYQ\downloaded_packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
# importing the data
data <- read.csv("F:\\R practice\\Udemy R Course Notes\\R-for-Data-Science-and-Machine-Learning\\Training Exercises\\Capstone and Data Viz Projects\\Data Visualization Project\\Economist_Assignment_Data.csv")
head(data)
##   X     Country HDI.Rank   HDI CPI            Region
## 1 1 Afghanistan      172 0.398 1.5      Asia Pacific
## 2 2     Albania       70 0.739 3.1 East EU Cemt Asia
## 3 3     Algeria       96 0.698 2.9              MENA
## 4 4      Angola      148 0.486 2.0               SSA
## 5 5   Argentina       45 0.797 3.0          Americas
## 6 6     Armenia       86 0.716 2.6 East EU Cemt Asia
View(data)
# Map x = CPI and y = HDI globally; color = Region is mapped in geom_point()
p1 <- ggplot(data, aes(x = CPI, y = HDI)) +
  geom_point(aes(color = Region), shape = 1, size = 5) +
  # trend line: linear model of HDI on log(CPI), without a confidence band
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red")
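The red line drawn by geom_smooth() with formula = y ~ log(x) is an ordinary least-squares fit of HDI on log(CPI). As a minimal sketch, the same kind of model can be fitted directly with lm() to inspect its coefficients. The data frame `toy` below is made up for illustration and is not the Economist dataset; its HDI values are constructed to follow log(CPI) exactly so the result is easy to check.

```r
# Hypothetical toy data, NOT the Economist dataset: HDI is constructed
# to depend exactly on log(CPI), so the fitted coefficients are known.
toy <- data.frame(CPI = c(1.5, 2, 3, 4.5, 6, 8, 9.5))
toy$HDI <- 0.3 + 0.25 * log(toy$CPI)

# The same model form that geom_smooth(method = "lm", formula = y ~ log(x)) fits
fit <- lm(HDI ~ log(CPI), data = toy)
coef(fit)  # intercept and slope on the log(CPI) scale

# Predicted HDI over a grid of CPI values, i.e. points along the fitted curve
grid <- data.frame(CPI = seq(1, 10, by = 0.5))
grid$HDI_hat <- predict(fit, newdata = grid)
head(grid)
```

Replacing `toy` with the real data frame would give the coefficients behind the plotted trend line, assuming the CPI and HDI columns contain no missing values.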
# Label only a subset of countries so the plot stays readable
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")
p1 + geom_text(aes(label = Country), color = "gray20",
               data = subset(data, Country %in% pointsToLabel),
               check_overlap = TRUE) +
  theme_calc() +
  # axis titles are set once, in labs()
  labs(title = "Corruption and Human Development",
       x = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
       y = "Human Development Index, 2011 (1 = Best)")
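Because subset() with %in% silently drops any name that matches no row, a misspelled entry in the label vector simply produces no label and no warning. A quick setdiff() check can surface such names before plotting. The sketch below uses a hypothetical toy data frame and label vector (`toy` and `toy_labels` are illustrative names, not part of the original data).

```r
# Hypothetical toy data standing in for the real data frame
toy <- data.frame(Country = c("Spain", "France", "Norway", "Japan"),
                  stringsAsFactors = FALSE)

# Hypothetical label vector with one deliberate misspelling
toy_labels <- c("France", "Japan", "Spian")

# Names requested for labelling that match no row in the data
missing_labels <- setdiff(toy_labels, toy$Country)
missing_labels

# Rows that would actually receive a label
labelled <- subset(toy, Country %in% toy_labels)
labelled
```

Running the same check as setdiff(pointsToLabel, data$Country) on the real objects would list any label that cannot be matched to a country.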