The goal of this guide is to show how you can make a graph similar to the one in the image below. I will walk you through how
to create the graph using ggplot2 and how we can add more functionality for the user by making it interactive. I will
also highlight a few changes we need to make to our code to make this easier!
If you would like the data set for this, feel free to send me an email: amcgarvey271@gmail.com
This data set is from my Data Science and Machine Learning with R Course. The original assignment was to make a scatter plot similar to the image above.
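# Keep text columns as character so values can be edited later (already the default in R >= 4.0)
my.data <- read.csv("Economist_Assignment_Data.csv", stringsAsFactors = FALSE)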
head(my.data)
##   X     Country HDI.Rank   HDI CPI            Region
## 1 1 Afghanistan      172 0.398 1.5      Asia Pacific
## 2 2     Albania       70 0.739 3.1 East EU Cemt Asia
## 3 3     Algeria       96 0.698 2.9              MENA
## 4 4      Angola      148 0.486 2.0               SSA
## 5 5   Argentina       45 0.797 3.0          Americas
## 6 6     Armenia       86 0.716 2.6 East EU Cemt Asia
# Removing the first column since it's just a row index
my.data <- my.data[, -1]
# Fixing the typo
my.data$Region[my.data$Region == "East EU Cemt Asia"] <- "East EU Cent Asia"
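A quick way to confirm that the rename took effect is to list the distinct region values. The sketch below uses a small made-up data frame (`toy` is a stand-in, not the real data set) so that it runs without the CSV file.

```r
# Toy stand-in for my.data; the rows are illustrative only
toy <- data.frame(
  Country = c("Albania", "Algeria", "Armenia"),
  Region  = c("East EU Cemt Asia", "MENA", "East EU Cemt Asia"),
  stringsAsFactors = FALSE  # keep Region as character so new values can be assigned
)

# Same replacement as above, applied to the toy data
toy$Region[toy$Region == "East EU Cemt Asia"] <- "East EU Cent Asia"

# The misspelled value should no longer appear
regions <- unique(toy$Region)
print(regions)
```

On the real data, the same check would be unique(my.data$Region). Note that if Region is a factor (the read.csv() default before R 4.0), assigning a value that is not an existing level produces NA with a warning, so the column should be character for this replacement to work.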
library(ggplot2)
sp <- ggplot(my.data, aes(x = CPI, y = HDI))
sp + geom_point(aes(color = Region))
This looks nice by itself! Let’s change those dots to open circles and add a trend line.
sp + geom_point(aes(color = Region), shape = 21, size = 4) + # This changes the shape of the data points
  # stat_smooth() will add our trend line
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red")
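For reference, formula = y ~ log(x) asks for a straight-line fit of HDI against the logarithm of CPI, which is why the trend line appears curved on the plot. Here is a minimal sketch of the same kind of model fitted directly with lm(), using made-up numbers rather than the real data (`toy`, `fit`, and `grid` are illustrative names):

```r
# Made-up values that follow HDI = 0.3 + 0.25 * log(CPI) exactly
toy <- data.frame(CPI = c(1.5, 2.0, 3.0, 5.0, 9.0))
toy$HDI <- 0.3 + 0.25 * log(toy$CPI)

# Linear model on log(CPI), the same model form stat_smooth() is asked to fit
fit <- lm(HDI ~ log(CPI), data = toy)
coefs <- unname(coef(fit))
print(round(coefs, 3))

# Predicted HDI along a grid of CPI values traces the curved trend line
grid <- data.frame(CPI = seq(1, 10, by = 0.5))
grid$HDI <- predict(fit, newdata = grid)
```

Because the toy values were generated from the formula with no noise, the fit recovers the intercept and slope used to build them.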
Now we need to add the country names to the data points as they did in the original graph. To avoid having every data point labeled, we need to perform an extra step!
# These are all the countries we want labeled
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")
sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  # Label only the selected countries; check_overlap = TRUE skips any label
  # that would overlap one already drawn
  geom_text(aes(label = Country), color = "gray20",
            data = subset(my.data, Country %in% pointsToLabel), check_overlap = TRUE)
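Because the labels are matched by exact name, any entry in the label vector that is spelled or worded differently from the Country column is silently skipped. One way to check for this is setdiff(), sketched here on toy vectors (`wanted` and `available` are hypothetical names and values, not part of the original code):

```r
# Hypothetical names we want labeled, including one that will not match
wanted <- c("Norway", "Japan", "Britain")
# Hypothetical values of a Country column
available <- c("Norway", "Japan", "United Kingdom", "Sweden")

# Names requested for labeling that have no exact match in the data
unmatched <- setdiff(wanted, available)
print(unmatched)
```

On the real data, the equivalent call would be setdiff(pointsToLabel, my.data$Country); an empty result means every requested country was found.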
As you can probably see, we have a bit of a scaling issue. Let’s fix that and adjust our axis labels to something more meaningful as well.
sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  geom_text(aes(label = Country), color = "gray20",
            data = subset(my.data, Country %in% pointsToLabel), check_overlap = TRUE) +
  # Formatting our x-axis
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
                     limits = c(1, 10), breaks = 1:10) +
  # Formatting our y-axis
  scale_y_continuous(name = "Human Development Index, 2011 (1 = best)", limits = c(0.2, 1)) +
  labs(title = "Corruption and Human Development") +
  # Changing the theme to a white background
  theme_bw()
This looks great on its own, but let’s see how plotly can make this even better!
library(plotly)
There’s one change we need to make when setting the parameters to the base graph. I found this doesn’t affect the regular ggplot2 graph but makes it easier to display the interactive visuals with plotly.
sp <- ggplot(my.data, aes(x = CPI, y = HDI, label = Country))
# This "label = Country" is the only change
# It will allow us to see the country while hovering over the data point
sp2 <- sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
                     limits = c(1, 10), breaks = 1:10) +
  scale_y_continuous(name = "Human Development Index, 2011 (1 = best)", limits = c(0.2, 1)) +
  labs(title = "Corruption and Human Development") +
  theme_bw()
# This tells plotly to use our graph assigned to sp2 and what info to display while hovering over each data point
# "label" is the Country data we assigned in the previous section
ggplotly(sp2, tooltip = c("HDI", "CPI", "label"))
Now we can hover over each point to view the data associated with it!