How Plotly Can Enhance Your Graphs in a Single Line

The goal of this guide is to show how you can make a graph similar to the one in the image below. I will walk you through
creating the graph with ggplot2 and then show how we can give the user more functionality by making it interactive. I will
also highlight the small change we need to make to our code for this to work!

If you would like the data set for this, feel free to send me an email: amcgarvey271@gmail.com

Reference Graph

This data set is from my Data Science and Machine Learning with R Course. The original assignment was to make a scatter plot similar to the image above.

Loading in the Dataset

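# stringsAsFactors = FALSE keeps Region as character (the default since R 4.0.0),
# so the typo fix below also works on older versions of R
my.data <- read.csv("Economist_Assignment_Data.csv", stringsAsFactors = FALSE)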
head(my.data)

##   X     Country HDI.Rank   HDI CPI            Region
## 1 1 Afghanistan      172 0.398 1.5      Asia Pacific
## 2 2     Albania       70 0.739 3.1 East EU Cemt Asia
## 3 3     Algeria       96 0.698 2.9              MENA
## 4 4      Angola      148 0.486 2.0               SSA
## 5 5   Argentina       45 0.797 3.0          Americas
## 6 6     Armenia       86 0.716 2.6 East EU Cemt Asia

# Removing the first column since it's just a row index
my.data <- my.data[, -1]

# Fixing the typo
my.data$Region[my.data$Region == "East EU Cemt Asia"] <- "East EU Cent Asia"
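To confirm the cleanup worked, it helps to list the column names and the distinct region names afterwards. The sketch below runs without the CSV: it rebuilds only the six rows printed by head() above in a small data frame (toy.data is a name made up for this example, not part of the original assignment).

```r
# Rebuild the six rows shown by head() so the example runs without the CSV.
# toy.data is a hypothetical stand-in for my.data.
toy.data <- data.frame(
  X        = 1:6,
  Country  = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Armenia"),
  HDI.Rank = c(172, 70, 96, 148, 45, 86),
  HDI      = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  CPI      = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  Region   = c("Asia Pacific", "East EU Cemt Asia", "MENA", "SSA",
               "Americas", "East EU Cemt Asia"),
  stringsAsFactors = FALSE  # keep Region as character so the replacement works
)

# Same two cleanup steps as above
toy.data <- toy.data[, -1]
toy.data$Region[toy.data$Region == "East EU Cemt Asia"] <- "East EU Cent Asia"

# The index column and the misspelled label should both be gone
names(toy.data)
sort(unique(toy.data$Region))
```

Running the same two checks on my.data should show no X column and no "East EU Cemt Asia" entry.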

Walkthrough to create the original graph

Load ggplot2

library(ggplot2)

Create the base scatter plot

sp <- ggplot(my.data, aes(x = CPI, y = HDI))
sp + geom_point(aes(color = Region))

This looks nice by itself! Let’s change those dots to open circles and add a trend line.

sp + geom_point(aes(color = Region), shape = 21, size = 4) + # shape = 21 draws open circles, size = 4 makes them larger
  
  # stat_smooth() will add our trend line
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red")
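If you are curious what that trend line is, stat_smooth() with method = "lm" and formula = y ~ log(x) fits an ordinary linear model of HDI on log(CPI). Here is a minimal sketch of the same fit done by hand. It uses only the six CPI/HDI pairs printed by head() earlier, so the coefficients are illustrative and will not match the full data set; the names cpi, hdi, fit and grid are made up for this example.

```r
# Six (CPI, HDI) pairs copied from the head() output earlier in this guide
cpi <- c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6)
hdi <- c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716)

# The model behind the trend line: HDI = a + b * log(CPI)
fit <- lm(hdi ~ log(cpi))
coef(fit)

# Predicted HDI along a grid of CPI values; plotted, these trace the curve
grid <- data.frame(cpi = seq(1.5, 3.1, length.out = 5))
grid$hdi_hat <- predict(fit, newdata = grid)
grid
```

Because the model is linear in log(CPI), the line bends on the original CPI scale: each extra point of CPI adds less to the predicted HDI than the one before.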

Now we need to add the country names to the data points as they did in the original graph. To avoid having every data point labeled, we need to perform an extra step!

# These are all the countries we want labeled
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")

sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  
  # Label only the selected countries; check_overlap = TRUE drops any label
  # that would overlap one already drawn
  geom_text(aes(label = Country), color = "gray20",
            data = subset(my.data, Country %in% pointsToLabel), check_overlap = TRUE)
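One thing to watch with this approach: %in% quietly skips any name that doesn't match a country in the data exactly, so a misspelled entry simply never gets a label. A quick setdiff() check catches that. The sketch below uses hypothetical stand-ins (countries and wanted) with one deliberately misspelled name; with the real data you would compare pointsToLabel against my.data$Country.

```r
# Hypothetical stand-ins: country names from the head() output, plus a
# label list with one deliberately misspelled entry
countries <- c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Armenia")
wanted    <- c("Afghanistan", "Argentina", "Armenai")

# Names in the label list that match no country in the data
setdiff(wanted, countries)

# Names that will actually be labeled
intersect(wanted, countries)
```

An empty result from setdiff() means every name in the list will be found.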

As you can probably see, we have a bit of a scaling issue. Let’s fix that and adjust our axis labels to something more meaningful as well.

sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  stat_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  geom_text(aes(label = Country), color = "gray20", 
            data = subset(my.data, Country %in% pointsToLabel), check_overlap = TRUE) + 
  
  # Formatting our x-axis
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
                     limits = c(1,10), breaks = 1:10) + 
  
  # Formatting our y-axis
  scale_y_continuous(name = "Human Development Index, 2011 (1 = best)", limits = c(0.2, 1)) + 
  labs(title = "Corruption and Human Development") +
  
  # Changing the theme to a white background
  theme_bw()

This looks great on its own, but let’s see how plotly can make this even better!

Walkthrough to create the interactive graph

Load plotly

library(plotly)

There’s one change we need to make when setting the aesthetics of the base graph. I found this doesn’t affect the regular ggplot2 graph, but it gives plotly the country name to show in the interactive version.

sp <- ggplot(my.data, aes(x = CPI, y = HDI, label = Country))  
# This "label = Country" is the only change
# It will allow us to see the country while hovering over the data point

sp2 <- sp + geom_point(aes(color = Region), shape = 21, size = 4) +
  geom_smooth(method = "lm", formula = y ~ log(x), se = FALSE, color = "red") +
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10 = least corrupt)",
                     limits = c(1,10), breaks = 1:10) + 
  scale_y_continuous(name = "Human Development Index, 2011 (1 = best)", limits = c(0.2, 1)) + 
  labs(title = "Corruption and Human Development") +
  theme_bw()

The one line to take this graph to the next level!

# This tells plotly to use our graph assigned to sp2 and what info to display while hovering over each data point
# "label" is the Country data we assigned in the previous section

ggplotly(sp2, tooltip = c("HDI", "CPI", "label"))

Now we can hover over each point to view the data associated with it!
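If you want more control over the wording of the hover box, one common pattern is to build the text yourself with plotly's unofficial text aesthetic and pass tooltip = "text". This is only a sketch, not the method used for the graph above. It assumes ggplot2 and plotly are installed and again uses the six rows from head() (toy.data, p and widget are names made up for this example). ggplot2 may warn that it is ignoring the unknown text aesthetic, which is expected here.

```r
library(ggplot2)
library(plotly)

# Six rows copied from the head() output; toy.data is a hypothetical stand-in
toy.data <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina", "Armenia"),
  HDI     = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  CPI     = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  Region  = c("Asia Pacific", "East EU Cent Asia", "MENA", "SSA",
              "Americas", "East EU Cent Asia"),
  stringsAsFactors = FALSE
)

# "text" is not a ggplot2 aesthetic; plotly reads it when building tooltips.
# <br> starts a new line inside the hover box.
p <- ggplot(toy.data, aes(x = CPI, y = HDI)) +
  geom_point(aes(color = Region,
                 text = paste0(Country, "<br>HDI: ", HDI, "<br>CPI: ", CPI)),
             shape = 21, size = 4) +
  theme_bw()

# Show only the custom text when hovering
widget <- ggplotly(p, tooltip = "text")
widget
```

The trade-off is a little more code in exchange for choosing exactly what each hover box says.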

Want to narrow down your search?

Click a region in the legend to hide it from the graph.
Double-click a region to show only that region.