This lab references practice problems from “R for Data Science”, Chapter 3: Data Visualisation (https://r4ds.had.co.nz/data-visualisation.html).
First, load the tidyverse package:
library(tidyverse)
The diamonds dataset is built into the ggplot2 package.
Its help page, titled “Prices of over 50,000 round cut diamonds”, describes it as “a dataset containing the prices and other attributes of almost 54,000 diamonds.” To see the full list of variables, open the help page:
?diamonds
What kinds of variables are we working with?
# look at the structure of the data
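A minimal way to answer this, assuming the packages are loaded as above, is to print the structure of the data. `str()` is base R and `glimpse()` comes from dplyr; the last line lists each column's class. The factor columns (`cut`, `color`, `clarity`) are the categorical variables, and the remaining columns are numeric.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse()

# One line per column: its type and its first few values
str(diamonds)
glimpse(diamonds)

# Class of each column: "ordered" marks the categorical (factor) columns
sapply(diamonds, function(col) class(col)[1])
```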
Let’s start with a basic bar graph. A bar graph is a univariate graphical tool for categorical data: each bar shows how often one category occurs, and by default geom_bar() plots the count for each category.
ggplot(data=diamonds)+
geom_bar(aes(x=cut))
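The graph above plots raw counts. If you want relative frequencies instead, one option is to map `y` to the proportion that geom_bar() computes internally; the R4DS chapter's section on statistical transformations uses the same idea. `group = 1` makes the proportions sum to 1 across all bars rather than within each bar. This sketch assumes ggplot2 3.3.0 or later, which provides `after_stat()`.

```r
library(ggplot2)

# Bar heights are proportions of all diamonds rather than raw counts;
# group = 1 treats every bar as one group so the proportions sum to 1
p <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))
p
```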
# Apply cut to the color aesthetic
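One possible answer, as a sketch: map `cut` to the `color` aesthetic. For bars, `color` controls only the outline of each bar, which may explain why the result looks different from what you expected.

```r
library(ggplot2)

# color changes only the bar outlines; the bar interiors keep the default fill
p_color <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, color = cut))
p_color
```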
Does the output of the graphic above match what you imagined? Why or why not?
Now try changing the aesthetic to fill.
# Apply cut to the fill aesthetic
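A sketch of the fill version: the same plot with `cut` mapped to `fill` instead of `color`, so each bar's interior gets its own color.

```r
library(ggplot2)

# fill colors the bar interiors, one color per level of cut
p_fill <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = cut))
p_fill
```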
Does this match what you were hoping for?
What do you notice about this color palette?
Hint: How is it different from the following example using the mpg dataset?
ggplot(data=mpg)+
geom_bar(aes(x=class, fill=class))
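One way to investigate the hint is to compare how the two variables are stored. In recent versions of ggplot2, an ordered factor such as `cut` gets a viridis color scale by default, while an unordered variable such as `class` (a character vector) gets evenly spaced hues.

```r
library(ggplot2)

# cut is an ordered factor: Fair < Good < Very Good < Premium < Ideal
is.ordered(diamonds$cut)
levels(diamonds$cut)

# class is a plain character vector with no inherent order
class(mpg$class)
```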
By default, when we map a second variable (other than the one on the x-axis) to color or fill, ggplot2 stacks the bars, creating a stacked bar graph.
# Look at the frequency of cut
# Apply clarity to the fill aesthetic
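A sketch of the stacked version: keep `cut` on the x-axis and map `clarity` to `fill`. Each bar's total height is still the count for that cut; the colored segments split it by clarity.

```r
library(ggplot2)

# Frequency of cut, with each bar stacked by clarity
# (position = "stack" is the default for geom_bar)
p_stack <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = clarity))
p_stack
```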
When we are interested in comparing proportions across groups, we can use the `position = "fill"` argument.
# add position="fill" to your graph above
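A sketch of the proportional version: adding `position = "fill"` stretches every bar to the same height of 1, so the segments show the share of each clarity within a cut. Note that the y-axis is still labeled "count" unless you relabel it, for example with `labs(y = "proportion")`.

```r
library(ggplot2)

# position = "fill" rescales every bar to height 1,
# so each segment shows a proportion within that cut
p_prop <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = clarity), position = "fill")
p_prop
```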
When we are interested in comparing counts across groups, we can use the `position = "dodge"` argument.
# add position="dodge" to your graph above
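A sketch of the dodged version: with `position = "dodge"` the clarity bars sit side by side within each cut, so every bar starts at zero and the heights are directly comparable counts.

```r
library(ggplot2)

# position = "dodge" places the clarity bars next to each other within each cut
p_dodge <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = clarity), position = "dodge")
p_dodge
```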
“Voter Registrations Are Way, Way Down During the Pandemic” (Jun 26, 2020) by Kaleigh Rogers and Nathaniel Rakich
https://fivethirtyeight.com/features/voter-registrations-are-way-way-down-during-the-pandemic/
How are graphics used to tell the author’s story?
What geometries are used?
What does the raw data look like?
# Import data
vreg<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv",
header=TRUE)
head(vreg)
## Jurisdiction Year Month New.registered.voters
## 1 Arizona 2016 Jan 25852
## 2 Arizona 2016 Feb 51155
## 3 Arizona 2016 Mar 48614
## 4 Arizona 2016 Apr 30668
## 5 Arizona 2020 Jan 33229
## 6 Arizona 2020 Feb 50853
Relevel the Month variable so that it's in calendar order:
# Relevel the Month variable so that it's in calendar order (i.e., not alphabetical)
vreg$Month<-factor(vreg$Month,
levels=c("Jan", "Feb", "Mar", "Apr", "May"))
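To see why the releveling matters, here is a small sketch on a hand-typed vector of month abbreviations (not the voter data itself). Without explicit `levels`, `factor()` sorts the values alphabetically.

```r
# Without levels, factor() orders the months alphabetically
months <- c("Jan", "Feb", "Mar", "Apr", "May")
levels(factor(months))
# "Apr" "Feb" "Jan" "Mar" "May"

# With explicit levels, they follow the calendar
levels(factor(months, levels = c("Jan", "Feb", "Mar", "Apr", "May")))
# "Jan" "Feb" "Mar" "Apr" "May"
```

Any value that is not listed in `levels` becomes `NA`, so it is worth checking that the list covers every month present in the data.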
### USE spread() FROM tidyr
vregYear<-vreg%>%
spread(Year, New.registered.voters)
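`spread()` still works, but tidyr now marks it as superseded and recommends `pivot_wider()` for new code. Below is a sketch of the equivalent reshape, run on just the six rows printed by `head(vreg)` above so it works offline. `names_prefix = "Y"` produces the Y2016 and Y2020 column names directly, which makes the separate renaming step unnecessary.

```r
library(tidyr)   # pivot_wider()
library(dplyr)   # %>% and tibble()

# The six rows shown by head(vreg) above
vreg_head <- tibble(
  Jurisdiction = "Arizona",
  Year = c(2016L, 2016L, 2016L, 2016L, 2020L, 2020L),
  Month = factor(c("Jan", "Feb", "Mar", "Apr", "Jan", "Feb"),
                 levels = c("Jan", "Feb", "Mar", "Apr", "May")),
  New.registered.voters = c(25852L, 51155L, 48614L, 30668L, 33229L, 50853L)
)

# One row per Jurisdiction-Month, one column per year
vreg_wide <- vreg_head %>%
  pivot_wider(names_from = Year,
              values_from = New.registered.voters,
              names_prefix = "Y")
vreg_wide
```

Because only six rows are used, March and April have no 2020 value here and show up as `NA`.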
### RENAME THE COLUMNS
colnames(vregYear)<-c("Jurisdiction", "Month", "Y2016", "Y2020")
Add a column for the change in new registrations (2020 minus 2016).
### mutate() FROM dplyr
vregChange<-vregYear%>%
mutate(change=Y2020-Y2016)
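To check what `mutate()` produces, here is a sketch on the Arizona January and February figures from `head(vreg)` above, typed in already in the wide layout. A positive change means more new registrations in 2020 than in 2016; a negative change means fewer. `geom_col()` is one way to plot the result, because its bar heights come from a column rather than from counting rows.

```r
library(dplyr)
library(ggplot2)

# Arizona rows from head(vreg), already reshaped to one column per year
vregYear_az <- tibble(
  Jurisdiction = "Arizona",
  Month = factor(c("Jan", "Feb"), levels = c("Jan", "Feb", "Mar", "Apr", "May")),
  Y2016 = c(25852L, 51155L),
  Y2020 = c(33229L, 50853L)
)

# change > 0: more new registrations in 2020; change < 0: fewer
vregChange_az <- vregYear_az %>%
  mutate(change = Y2020 - Y2016)
vregChange_az$change  # 7377 -302

# geom_col() uses the change values as bar heights (no counting)
ggplot(vregChange_az) +
  geom_col(aes(x = Month, y = change))
```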
# type code/answer here