This lab references practice problems from “R for Data Science”, Chapter 3: Data Visualisation: https://r4ds.had.co.nz/data-visualisation.html
First, load the tidyverse package:
library(tidyverse)
The diamonds dataset is built into the ggplot2 package.
Its help page, titled “Prices of over 50,000 round cut diamonds”, describes it as “a dataset containing the prices and other attributes of almost 54,000 diamonds.” Run the following to read the description of each variable:
?diamonds
What kinds of variables are we working with?
# look at the structure of the data
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
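To summarise the storage type of every column at once, one option (a small sketch, not part of the original lab) is purrr's map_chr(), which is loaded with the tidyverse. It keeps only the first class of each column, so ordered factors show up as "ordered"; the name var_types is just for illustration.

```r
library(tidyverse)

# Apply class() to every column and keep its first entry:
# "ordered" for ordered factors, "numeric" for doubles, "integer" for whole numbers
var_types <- map_chr(diamonds, ~ class(.x)[1])
var_types
```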
Let’s start with a basic bar graph, a univariate graphical tool that shows how often each level of a categorical variable occurs. By default, geom_bar() plots the count of each level on the y-axis.
ggplot(data=diamonds)+
geom_bar(aes(x=cut))
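The bar heights above are simply row counts for each level of cut. A minimal sketch of computing them as a table with dplyr's count(), plus the relative frequencies; the second plot, which puts proportions on the y-axis with after_stat(prop) and group = 1, assumes ggplot2 3.3.0 or later. The object names cut_counts and p_prop are illustrative.

```r
library(tidyverse)

# Same numbers geom_bar() draws: one row per level of cut
cut_counts <- diamonds %>%
  count(cut) %>%
  # Relative frequency: each count divided by the total number of diamonds
  mutate(prop = n / sum(n))
cut_counts

# Plot relative frequency instead of counts; group = 1 makes the
# proportions add up to 1 across all bars rather than within each bar
p_prop <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))
p_prop
```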
# Apply cut to the color aesthetic
ggplot(data=diamonds)+
geom_bar(aes(x=cut, color=cut))
Does the output of the graphic above match what you imagined? Why or why not?
Now try changing the aesthetic to fill.
# Apply cut to the fill aesthetic
ggplot(data=diamonds)+
geom_bar(aes(x=cut, fill=cut))
Does this match what you were hoping for?
What do you notice about this color palette?
Hint: How is it different from the following example using the mpg dataset?
ggplot(data=mpg)+
geom_bar(aes(x=class, fill=class))
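One way to investigate the hint is to compare how the two variables are stored. In the diamonds data, cut is an ordered factor, and ggplot2 fills ordered factors with an ordinal (viridis) palette by default, while class in mpg is a plain character vector and gets the evenly spaced hue palette. A sketch, assuming you want the hue palette for cut as well, is to add scale_fill_hue(); the name p_hue is illustrative.

```r
library(tidyverse)

# cut is an ordered factor; class in mpg is a character vector
is.ordered(diamonds$cut)
class(mpg$class)

# Override the default ordinal palette with the hue palette
p_hue <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = cut)) +
  scale_fill_hue()
p_hue
```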
By default, when we map a variable other than the one on the x-axis to the fill aesthetic, ggplot2 creates a stacked bar graph.
# Look at the frequency of cut
# Apply clarity to the fill aesthetic
ggplot(data=diamonds)+
geom_bar(aes(x=cut, fill=clarity))
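The stacked default is equivalent to writing position = "stack". To see the numbers behind the stacks, a minimal sketch counts every cut-by-clarity combination (the segment heights) and then sums within each cut (the full bar heights). The names cut_clarity and cut_totals are illustrative.

```r
library(tidyverse)

# One row per cut-by-clarity combination: the height of each stacked segment
cut_clarity <- diamonds %>%
  count(cut, clarity)
cut_clarity

# Summing the segments within a cut gives that cut's total bar height
cut_totals <- cut_clarity %>%
  group_by(cut) %>%
  summarise(total = sum(n))
cut_totals
```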
When we are interested in comparing proportions across groups, we can use the `position="fill"` argument, which stretches every bar to the same height of 1.
# add position="fill" to your graph above
ggplot(data=diamonds)+
geom_bar(aes(x=cut, fill=clarity), position="fill")
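With position = "fill", each segment shows the share of a clarity grade within one cut. A minimal sketch of computing those shares directly, so the plot can be checked against a table; the name clarity_props is illustrative.

```r
library(tidyverse)

# Proportion of each clarity grade within each cut;
# these are the segment heights when every bar is scaled to 1
clarity_props <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
clarity_props
```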
When we are interested in comparing counts across groups, we can use the `position="dodge"` argument, which places the bars for each group side by side.
# add position="dodge" to your graph above
ggplot(data=diamonds)+
geom_bar(aes(x=cut, fill=clarity), position="dodge")
“Voter Registrations Are Way, Way Down During the Pandemic” (Jun 26, 2020) by Kaleigh Rogers and Nathaniel Rakich
https://fivethirtyeight.com/features/voter-registrations-are-way-way-down-during-the-pandemic/
How are graphics used to tell the authors’ story?
What geometries are used?
What does the raw data look like?
# Import data
vreg<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv",
header=TRUE)
head(vreg)
## Jurisdiction Year Month New.registered.voters
## 1 Arizona 2016 Jan 25852
## 2 Arizona 2016 Feb 51155
## 3 Arizona 2016 Mar 48614
## 4 Arizona 2016 Apr 30668
## 5 Arizona 2020 Jan 33229
## 6 Arizona 2020 Feb 50853
Relevel the Month variable so that it’s in chronological order:
# Set the levels of Month so that they're in calendar order (i.e., not alphabetical)
vreg$Month<-factor(vreg$Month,
levels=c("Jan", "Feb", "Mar", "Apr", "May"))
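Why the explicit levels matter: a small base R sketch (the vector months is illustrative) showing that factor() sorts labels alphabetically unless levels are supplied. It also shows a side effect worth checking for: any value not listed in levels becomes NA.

```r
months <- c("Jan", "Feb", "Mar", "Apr", "May")

# Default: levels are sorted alphabetically
levels(factor(months))

# Explicit levels keep calendar order
levels(factor(months, levels = months))

# A value missing from the levels is converted to NA
is.na(factor("Jun", levels = months))
```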
### USE spread() FROM tidyr (superseded by pivot_wider(), but still supported)
vregYear<-vreg%>%
spread(Year, New.registered.voters)
### RENAME THE COLUMNS
colnames(vregYear)<-c("Jurisdiction", "Month", "Y2016", "Y2020")
Add a column for the change in new registrations from 2016 to 2020.
### mutate() FROM dplyr
vregChange<-vregYear%>%
mutate(change=Y2020-Y2016)
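The reshape, rename, and mutate steps can also be written as one pipe with pivot_wider(), the current replacement for spread(). Its names_prefix argument produces the Y2016/Y2020 names directly, so the separate colnames() step is not needed. To keep the sketch runnable offline, it uses a hypothetical four-row excerpt (vreg_az) built only from the Arizona January and February rows printed by head(vreg) above; applied to the full vreg, the same pipe should give the same columns as vregChange.

```r
library(tidyverse)

# Hypothetical excerpt of vreg: Arizona, Jan and Feb, taken from head(vreg)
vreg_az <- tibble(
  Jurisdiction = "Arizona",
  Year = c(2016, 2016, 2020, 2020),
  Month = factor(c("Jan", "Feb", "Jan", "Feb"),
                 levels = c("Jan", "Feb", "Mar", "Apr", "May")),
  New.registered.voters = c(25852, 51155, 33229, 50853)
)

# Widen by Year (prefixing the new column names with "Y"),
# then compute the 2020 minus 2016 difference for each month
vregChange_az <- vreg_az %>%
  pivot_wider(names_from = Year, values_from = New.registered.voters,
              names_prefix = "Y") %>%
  mutate(change = Y2020 - Y2016)
vregChange_az
```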
# type code/answer here