This exercise introduces some advanced packages and syntax for producing data visualizations. We will examine just two visualizations: (i) waffle plots and (ii) interactive plots. There are many others.
As you examine each visualization, it is worth keeping Healy (2016) in mind: Which visualizations are useful and why (or why not)?
NOTE: You should knit this RMD to HTML (not PDF) when you are finished. PDF cannot handle the interactive features we will add to our visualizations.
Download the sports.xlsx dataset from Canvas, which contains data on revenue for major sports leagues in the US and Canada. It is based on data from Wikipedia.
The dataset contains the following variables:
- League
- Sport
- Teams
- Revenue (US billions)
- Average attendance (2017, thousands)
- TV Revenue (US billions)
- Games per Team per Year

Load the dataset in the setup chunk. Then, take a moment to explore the dataset.
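A setup chunk along the following lines could load and inspect the data. This is only a sketch: it assumes the readxl package (which is not one of the packages named in this exercise) and assumes that sports.xlsx sits in the same folder as your RMD file. The object name `sports` matches the name used in the later chunks.

```r
# Assumption: sports.xlsx is in the working directory; change the path if not.
sports_path <- "sports.xlsx"

if (file.exists(sports_path)) {
  # Assumption: readxl is installed; it is one common way to read .xlsx files.
  sports <- readxl::read_excel(sports_path)

  # Explore the structure and the first few rows of the dataset.
  str(sports)
  head(sports)
} else {
  message("Could not find ", sports_path, "; download it from Canvas first.")
}
```

Note that read_excel keeps column names exactly as they appear in the spreadsheet, spaces included, which is why names such as `Average attendance (2017, thousands)` must be wrapped in backticks in later code.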
You will also require several external packages. Install and load them in the setup chunk. They are:
- ggplot2
- waffle
- plotly

We will first examine a less common visualization of univariate data: the waffle plot.
Waffle plots are similar to pie charts, but they use squares or images instead of pie slices to depict proportions (or frequencies). Waffle plots require the ggplot2 and waffle packages.
The chunk below gives the basic syntax for a waffle plot, where:
- x denotes a vector whose values you wish to plot.
- r is your desired block height (e.g., 5 blocks high / 5 rows of blocks).
- s is the size setting for the blocks; in the waffle package, it controls the width of the border that separates the individual blocks.
- col denotes your color palette.

You don’t need to change my code. Just examine it. Then proceed below.
waffle(x,
       rows = r,
       size = s,
       colors = col,
       title = "A title",
       xlab = "A label to show the scale")
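To make the template concrete, here is a minimal sketch with each placeholder filled in. The counts, category names, and colors are invented purely for illustration and have nothing to do with the sports dataset.

```r
library(ggplot2)
library(waffle)

# Hypothetical counts, invented for illustration only.
x <- c(Apples = 12, Oranges = 8, Pears = 4)

# One color per category, in the same order as the values in x.
col <- c("firebrick", "darkorange", "forestgreen")

fruit_plot <- waffle(x,
                     rows = 4,      # 4 blocks high
                     size = 1,      # thin border between blocks
                     colors = col,
                     title = "Fruit sold (hypothetical)",
                     xlab = "1 square = 1 piece of fruit")
fruit_plot
```

The 24 values fill a grid that is 4 blocks high and 6 blocks wide, with one square per unit.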
Let’s create a waffle plot that shows average attendance for different sports leagues in 2017. We will do so in three steps. You can complete each step in the chunk below by modifying my code:
1. Create a new object named attendance that is simply the Average attendance (2017, thousands) variable from the dataset.
2. Label the different values of the new object: To which sports leagues does each value pertain?
3. Create your waffle plot. Use rows = 6 and size = 1. Examine the output.
# Create a new object named `attendance` that equals `Average attendance (2017, thousands)`
attendance <- sports$`Average attendance (2017, thousands)`
# Label the different values in `attendance` using the `League` variable.
names(attendance) <- sports$League
# Create the waffle plot (rows = 6, size = 1)
waffle(attendance,
       rows = 6,
       size = 1)
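Because each square in a waffle plot stands for one whole unit, fractional values cannot be drawn exactly. If your attendance values are not whole numbers, one option is to round them first so that the number of squares per league is explicit. The sketch below uses a hypothetical stand-in vector (`attendance_demo`, with invented league names and values), not the real data.

```r
library(ggplot2)
library(waffle)

# Hypothetical stand-in for `attendance` (in thousands); NOT the real data.
attendance_demo <- c("League A" = 67.4, "League B" = 30.2, "League C" = 17.9)

# Round to whole thousands so that each square represents 1,000 attendees.
attendance_rounded <- round(attendance_demo)

attendance_plot <- waffle(attendance_rounded,
                          rows = 6,
                          size = 1,
                          title = "Average attendance (hypothetical values)",
                          xlab = "1 square = 1,000 attendees")
attendance_plot
```

Stating the scale in xlab tells the reader what a single square is worth, which matters once the values have been rounded.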
Let’s use the plotly package to produce an interactive plot.
The plotly package has a function called ggplotly that converts a ggplot object to an interactive plot. When users pass their cursor over the different points in an interactive plot, a pop-up will appear that gives the values associated with them.
The chunk below gives the basic syntax for an interactive plot. You do not need to modify my code. Just examine it. Then proceed below.
plot <- ggplot(data,
               aes(xvar,
                   yvar)) +
  geom_point()
ggplotly(plot)
The first portion of the code should be familiar by now. The code simply creates a new scatterplot named plot.
To create an interactive scatterplot, we simply apply the ggplotly function to the scatterplot object.
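As a runnable illustration of these two steps, the sketch below uses mtcars, a dataset that ships with R, as a stand-in for your own data. The object names `static_plot` and `interactive_plot` are arbitrary choices for this example.

```r
library(ggplot2)
library(plotly)

# Step 1: build an ordinary ggplot scatterplot.
# mtcars is a built-in dataset used here only as a stand-in.
static_plot <- ggplot(mtcars,
                      aes(wt,
                          mpg)) +
  geom_point()

# Step 2: convert the ggplot object to an interactive plotly widget.
interactive_plot <- ggplotly(static_plot)
interactive_plot
```

When knitted to HTML, hovering over a point should show its wt and mpg values.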
In the chunk below, create a scatterplot that depicts the relationship between attendance and revenue across sports leagues. Convert your plot to an interactive plot.
For 2 points of extra credit on the assignment, use additional graphical features to make your plot publication quality.
plot <- ggplot(sports,
               aes(`Average attendance (2017, thousands)`,
                   `Revenue (US billions)`)) +
  theme_minimal() +
  geom_point(alpha = 0.9, size = 2, aes(color = League)) +
  xlab("Average Attendance (thousands)") +
  ylab("Revenue (US billions)") +
  labs(title = "Attendance Versus Revenue (2017)") +
  theme(plot.title = element_text(face = "bold",
                                  hjust = 0.5, size = 12))
ggplotly(plot)
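One possible refinement is to control what the pop-up displays. ggplotly has a tooltip argument that selects which aesthetics appear, and a common pattern is to map a label to a `text` aesthetic. The sketch below again uses the built-in mtcars data as a stand-in; the `model` column is created here only for illustration.

```r
library(ggplot2)
library(plotly)

# Stand-in data: copy mtcars and store the car names in a regular column.
cars <- mtcars
cars$model <- rownames(mtcars)

# Map the car name to the `text` aesthetic so it can be shown on hover.
labeled_plot <- ggplot(cars,
                       aes(wt,
                           mpg,
                           text = model)) +
  geom_point()

# Show only the label and the x and y values in the pop-up.
labeled_interactive <- ggplotly(labeled_plot, tooltip = c("text", "x", "y"))
labeled_interactive
```

The same idea could be applied to the sports plot by mapping League to `text`, so that each pop-up names the league it describes.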