The purpose of this project is to use data visualization to explore the relationship between corruption, measured by the Corruption Perceptions Index (CPI), and human development, measured by the Human Development Index (HDI) from the UN Human Development Report, across nations. The data for the project is taken from the article ‘Corrosive corruption’ published in The Economist.
# Load the libraries
library(ggplot2)
library(ggthemes)
library(data.table)
# Load the data
df <- fread('Economist_Assignment_Data.csv', drop=1)
str(df)
## Classes 'data.table' and 'data.frame': 173 obs. of 5 variables:
## $ Country : chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ HDI.Rank: int 172 70 96 148 45 86 2 19 91 53 ...
## $ HDI : num 0.398 0.739 0.698 0.486 0.797 0.716 0.929 0.885 0.7 0.771 ...
## $ CPI : num 1.5 3.1 2.9 2 3 2.6 8.8 7.8 2.4 7.3 ...
## $ Region : chr "Asia Pacific" "East EU Cemt Asia" "MENA" "SSA" ...
## - attr(*, ".internal.selfref")=<externalptr>
Let’s create a scatter plot object called pl. We will specify x=CPI and y=HDI and color=Region as aesthetics.
pl <- ggplot(df, aes(CPI, HDI, color=Region)) + geom_point()
pl
We can see a plot of HDI vs CPI. The points are colored by region.
Let’s change the points to be larger empty circles.
pl <- ggplot(df, aes(CPI, HDI, color=Region)) + geom_point(shape=1, size=4)
pl
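Let’s add a single line of best fit. Because color=Region is set for the whole plot, a smoother would otherwise be fitted per region; group=1 forces one line across all points.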
pl2 <- pl + geom_smooth(aes(group=1))
pl2
## `geom_smooth()` using method = 'loess'
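To illustrate what group=1 changes, here is a minimal sketch that counts the fitted lines with and without it. The CPI and HDI values are the first ten rows printed by str(df) above; the two region labels "A" and "B" and the helper n_lines are hypothetical, and method='lm' is used here only so the tiny sample fits cleanly.

```r
library(ggplot2)

# CPI/HDI: first ten rows shown by str(df); Region labels are made up for the demo
toy <- data.frame(
  CPI = c(1.5, 3.1, 2.9, 2, 3, 2.6, 8.8, 7.8, 2.4, 7.3),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716, 0.929, 0.885, 0.7, 0.771),
  Region = rep(c("A", "B"), each = 5)
)

base <- ggplot(toy, aes(CPI, HDI, color = Region)) + geom_point()

# Without group = 1 the smoother inherits the color grouping: one line per region
per_region <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# With group = 1 all points are pooled into a single fit
single <- base +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, color = "red")

# Hypothetical helper: number of distinct groups drawn by the smooth layer (layer 2)
n_lines <- function(p) length(unique(layer_data(p, 2)$group))

n_lines(per_region)
n_lines(single)
```

The first count should equal the number of regions in the toy data and the second should be one.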
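Let’s replace the default loess curve with a linear model fitted on log(CPI), drop the confidence band, and draw the trendline in red.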
pl2 <- pl + geom_smooth(aes(group=1), method='lm', formula=y~log(x),
                        se=FALSE, color='red')
pl2
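The trendline drawn by method='lm' with formula=y~log(x) corresponds to an ordinary least-squares fit of HDI on log(CPI). As a rough sketch of that model outside ggplot2, the code below fits it on the first ten CPI and HDI values printed by str(df) above (not the full 173 rows, so the coefficients will differ from the plotted line). The names cpi, hdi, fit and grid are illustrative.

```r
# First ten CPI/HDI values as printed by str(df); a small sample, not the full data
cpi <- c(1.5, 3.1, 2.9, 2, 3, 2.6, 8.8, 7.8, 2.4, 7.3)
hdi <- c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716, 0.929, 0.885, 0.7, 0.771)

# Model of the form HDI = a + b * log(CPI)
fit <- lm(hdi ~ log(cpi))
coef(fit)

# Predictions over a grid of CPI values trace out the fitted curve
grid <- data.frame(cpi = seq(1, 10, by = 1))
grid$hdi_hat <- predict(fit, newdata = grid)
grid
```

A positive slope on log(CPI) means predicted HDI rises quickly at low CPI values and flattens as CPI increases, which is the shape of the red curve.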
Let’s add a text label for every country to the plot.
pl3 <- pl2 + geom_text(aes(label = Country))
pl3
Observation: there are far too many labels, and they overlap so heavily that the plot is unreadable.
Let’s label only a selected set of countries to keep the plot easy to read.
# Select specific countries
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway",
                   "Japan", "New Zealand", "Singapore", "Cuba")
# Add countries to plot
pl3 <- pl2 + geom_text(aes(label=Country), color='gray20',
                       data=subset(df, Country %in% pointsToLabel),
                       check_overlap = TRUE)
pl3
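Because %in% matches names exactly, any entry in pointsToLabel that is misspelled, or spelled differently from the data, is silently left off the plot. A quick check with setdiff can catch this. The sketch below uses only the four country names visible in the str(df) output as a stand-in for df$Country, and a hypothetical list wanted with one deliberate misspelling.

```r
# Stand-in for df$Country: the four names visible in the str(df) output
available <- c("Afghanistan", "Albania", "Algeria", "Angola")

# Hypothetical label list with one deliberate misspelling ("Albnia")
wanted <- c("Afghanistan", "Angola", "Albnia")

# Requested labels that have no exact match in the data
missing_labels <- setdiff(wanted, available)
missing_labels
```

With the real objects, the equivalent call would be setdiff(pointsToLabel, df$Country); an empty result means every requested label exists in the data.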
Let’s apply the black-and-white theme to the plot.
pl4 <- pl3 + theme_bw()
pl4
Let’s set the x- and y-axis limits and tick marks, and add axis labels and a title to the plot.
pl5 <- pl4 + scale_x_continuous(limits = c(.9, 10.5), breaks=1:10) +
  scale_y_continuous(limits = c(.2, 1.0)) +
  labs(x="Corruption Perceptions Index, 2011 (10=least corrupt)",
       y="Human Development Index, 2011 (1=Best)",
       title='Corruption and Human Development')
pl5