In this project, we’ll recreate a visualization in the style of The Economist: a scatter plot of the relationship between the “Human Development Index” (HDI) and the “Corruption Perceptions Index” (CPI) of countries.
library(data.table)
library(ggplot2)
df <- fread(input = 'Economist Data.csv',header = TRUE)
head(df)
##        Country HDI.Rank      HDI CPI            Region
## 1: Afghanistan      172 0.398750 1.5      Asia Pacific
## 2:     Albania       70 0.761875 3.1 East EU Cemt Asia
## 3:     Algeria       96 0.711250 2.9              MENA
## 4:      Angola      148 0.503750 2.0               SSA
## 5:   Argentina       45 0.823125 3.0          Americas
## 6:     Armenia       86 0.722500 2.6 East EU Cemt Asia
A high CPI indicates a low level of corruption.
pl <- ggplot(df, aes(x=CPI,y=HDI,color=Region)) + geom_point(shape=1,size=2)
pl
Adding a trend line:
pl2 <- pl + geom_smooth(aes(group=1), method = 'lm', formula = y~log(x), se=FALSE, color='red')
pl2
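To see what the trend line represents, the same model can be fitted directly with `lm()`. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, with invented values, not the Economist data) and assumes the same formula as in `geom_smooth()`, that is, HDI regressed on the logarithm of CPI.

```r
# Toy data with invented values (NOT taken from 'Economist Data.csv')
toy <- data.frame(
  CPI = c(1.5, 2.0, 2.6, 3.0, 4.5, 6.0, 7.5, 9.0),
  HDI = c(0.40, 0.50, 0.72, 0.82, 0.80, 0.85, 0.90, 0.94)
)

# Same model form as geom_smooth(method = 'lm', formula = y ~ log(x))
fit <- lm(HDI ~ log(CPI), data = toy)
coef(fit)  # intercept and slope on the log(CPI) scale

# Predicted HDI on a grid of CPI values: the curve a trend line would trace
grid <- data.frame(CPI = seq(1, 10, by = 0.5))
grid$HDI_hat <- predict(fit, newdata = grid)
head(grid)
```

Because the model is linear in log(CPI), the fitted curve rises steeply at low CPI values and flattens at high ones.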
Adding labels:
There are far too many countries to label every point in the plot, so we label only a handful of them:
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
"Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
"India", "Italy", "China", "South Africa", "Spain",
"Botswana", "Cape Verde", "Bhutan", "Australia", "France",
"United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
"New Zealand", "Singapore")
pl3 <- pl2 + geom_text(aes(label=Country), color='gray20', data = subset(df, Country %in% pointsToLabel),check_overlap = TRUE)
pl3
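A name in the label vector that does not match any value in the Country column is dropped silently by `%in%`, so it can be worth checking for mismatches before plotting. The sketch below shows one possible check using `setdiff()`. The data frame `toy`, the vector `wanted` and all values in them are hypothetical.

```r
# Toy data with invented values (NOT taken from 'Economist Data.csv')
toy <- data.frame(
  Country = c("Spain", "France", "Norway", "Japan"),
  CPI = c(6.2, 7.0, 9.0, 8.0),
  HDI = c(0.88, 0.88, 0.94, 0.90),
  stringsAsFactors = FALSE
)

# Names we would like to label; "Atlantis" is deliberately not in the data
wanted <- c("Spain", "Norway", "Atlantis")

# Rows that would actually receive a label
labelled <- subset(toy, Country %in% wanted)

# Requested names with no matching row (typos, alternative spellings, ...)
missing_names <- setdiff(wanted, toy$Country)
print(labelled)
print(missing_names)
```

An empty result from `setdiff()` means every requested label has a matching row.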
Making it look a little nicer and setting the names, limits and breaks of the x and y axes:
pl4 <- pl3 + theme_bw()
pl4
pl5 <- pl4 +
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10=least corrupt)",
                     limits = c(.9, 10.5), breaks = 1:10) +
  scale_y_continuous(name = "Human Development Index, 2011 (1=Best)",
                     limits = c(0.2, 1.0))
pl5
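One thing to keep in mind with `limits` on a scale: by default, ggplot2 treats values outside the limits as missing, so they are dropped before layers such as `geom_smooth()` are computed. If the goal is only to zoom in, `coord_cartesian()` keeps all the data. The sketch below contrasts the two on a made-up data frame (`toy`, invented values); it is an illustration, not part of the original plot.

```r
library(ggplot2)

# Toy data with invented values; two points lie outside the x range 0.9 to 10.5
toy <- data.frame(x = c(0.5, 2, 5, 11), y = c(0.3, 0.5, 0.8, 0.9))

p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Scale limits: out-of-range points become missing and are removed (with a warning)
p_scale <- p + scale_x_continuous(limits = c(0.9, 10.5))

# Coordinate limits: the view is zoomed, but every point stays in the data
p_zoom <- p + coord_cartesian(xlim = c(0.9, 10.5))
```

Printing `p_scale` and `p_zoom` shows the same visible region, but only the second keeps all rows available to statistical layers.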
Adding a title:
pl6 <- pl5 + ggtitle("Corruption and human development")
pl6
Finally, adding the Economist theme from the ggthemes package:
library(ggthemes)
pl6 + theme_economist_white()