This lab references practice problems from “R for Data Science”, Chapter 3: Data Visualisation: https://r4ds.had.co.nz/data-visualisation.html
First, load the tidyverse package:
library(tidyverse)
The mpg dataset is built into the ggplot2 package.
Fuel economy data from 1999 to 2008 for 38 popular models of cars
Description: This dataset contains a subset of the fuel economy data that the EPA makes available on https://fueleconomy.gov/. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.
?mpg
Make a scatterplot of displacement (displ) vs highway miles per gallon (hwy).
ggplot(data=mpg)+
geom_point(mapping=aes(x=displ, y=hwy))
Run ggplot(data=mpg). What do you see?
# type code/answer here
How many rows are in mpg? How many columns?
# type code/answer here
What does the drv variable describe? Read the help file.
# type code/answer here
Make a scatterplot of hwy vs cyl.
# type code/answer here
What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
# type code/answer here
Add color to our basic scatterplot.
# color based on the class of the vehicle
What do you notice about the color palette?
# transparency based on the class of the vehicle
Is this a good use of the transparency aesthetic? Why or why not?
# shape based on the class of the vehicle
# size based on the class of the vehicle
If you set an aesthetic outside of the aesthetic mapping (that is, outside aes()), it is applied as a fixed value to every point in that layer rather than varying with the data.
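As a minimal sketch of this idea, the plot below passes color as an argument to geom_point() itself, outside aes(), so every point is drawn in the same fixed color. The object name p_blue is only for illustration.

```r
library(ggplot2)

# color is an argument of geom_point(), not part of aes(),
# so it is a fixed setting applied to every point in the layer
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

print(p_blue)
```

Because the color is not mapped to a variable, no legend is drawn for it.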
ggplot(data=mpg)+
geom_point(mapping=aes(x=displ, y=hwy, color="blue"))
Which variables in mpg are categorical? Which variables are continuous?
# type code/answer here
# type code/answer here
# type code/answer here
Faceting is a wonderful tool for looking at subsets or groups in your data.
# facet_wrap for class
# facet_grid on two variables: drv and cyl
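One possible way to fill in the two faceting prompts above is sketched here. The choice of nrow = 2 and the object names p_wrap and p_grid are assumptions for illustration, not part of the lab.

```r
library(ggplot2)

# facet_wrap: one panel per vehicle class, wrapped into two rows
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# facet_grid: rows split by drv, columns split by cyl
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

print(p_wrap)
print(p_grid)
```

facet_wrap() suits a single grouping variable with many levels, while facet_grid() lays out every combination of two variables, including combinations that have no data.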
Create a scatterplot with geom_point().
# geom_point
You can add a line to your data! By default, geom_smooth() fits a loess curve (a locally weighted regression) when there are fewer than 1,000 observations, but you can specify a linear model instead (method=lm).
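A minimal sketch of switching the smoother to a linear model is shown here. Passing formula = y ~ x explicitly is a choice made for this example so that ggplot2 does not print a message about the formula it picked; the object name p_lm is illustrative.

```r
library(ggplot2)

# Scatterplot with a straight-line fit instead of the default smoother
p_lm <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(p_lm)
```

Dropping the method argument from the call returns to the default smoother, which makes the two fits easy to compare.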
# geom_smooth
# use the group variable
# use the color variable to create groups
# Use point and smooth at the same time
What geom would you use to draw a line chart? A histogram? An area chart?
Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
ggplot(data=mpg, mapping=aes(x=displ, y=hwy, color=drv))+
geom_point()+
geom_smooth(se=FALSE)
What does the se argument to geom_smooth() do?
Will these two graphs look different? Why/why not?
ggplot(data=mpg, mapping=aes(x=displ, y=hwy))+
geom_point()+
geom_smooth()
ggplot()+
geom_point(data=mpg, mapping=aes(x=displ, y=hwy))+
geom_smooth(data=mpg, mapping=aes(x=displ, y=hwy))
“How Every NFL Team’s Fans Lean Politically”
https://fivethirtyeight.com/features/how-every-nfl-teams-fans-lean-politically/
How are graphics used to tell the author’s story?
What geometries are used?
What does the raw data look like?
# Import data
sports<-read.csv("https://raw.githubusercontent.com/kitadasmalley/FA2020_DataViz/main/data/NFL_fandom_data.csv",
header=TRUE)
head(sports)
##                          DMA  NFL  NBA  MLB  NHL NASCAR  CBB  CFB PctTrumpVote
## 1      Abilene-Sweetwater TX 0.45 0.21 0.14 0.02   0.04 0.03 0.11         0.79
## 2                  Albany GA 0.32 0.30 0.09 0.01   0.08 0.03 0.17         0.59
## 3 Albany-Schenectady-Troy NY 0.40 0.20 0.20 0.08   0.06 0.03 0.04         0.44
## 4    Albuquerque-Santa Fe NM 0.53 0.21 0.11 0.03   0.03 0.04 0.06         0.40
## 5              Alexandria LA 0.42 0.28 0.09 0.01   0.05 0.03 0.12         0.70
## 6                  Alpena MI 0.28 0.13 0.21 0.12   0.10 0.07 0.09         0.64
# Tidy the data
## Use gather to create:
### column for sport (categorical variable)
### Column for search interest (numeric - percent)
sportsT<-sports%>%
gather("sport", "searchInterest",-c(DMA, PctTrumpVote))
# Set the factor levels of sport so that it's in the right order
sportsT$sport<-factor(sportsT$sport,
                      levels=c("NBA", "MLB", "NHL", "NFL", "CBB", "NASCAR", "CFB"))
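gather() still works, but the tidyr documentation marks it as superseded in favour of pivot_longer(). As a hedged sketch of the same reshaping with pivot_longer(), the example below uses a small toy data frame (the first two rows and a subset of columns copied from the head(sports) output above) instead of the full download. The names toy and toyT are illustrative only.

```r
library(tidyr)

# Toy data: first two rows, subset of columns, from head(sports) above
toy <- data.frame(
  DMA = c("Abilene-Sweetwater TX", "Albany GA"),
  NFL = c(0.45, 0.32),
  NBA = c(0.21, 0.30),
  PctTrumpVote = c(0.79, 0.59)
)

# Every column except DMA and PctTrumpVote becomes a
# (sport, searchInterest) pair, one row per DMA and sport
toyT <- pivot_longer(
  toy,
  cols = -c(DMA, PctTrumpVote),
  names_to = "sport",
  values_to = "searchInterest"
)

print(toyT)
```

The result has the same columns as the gather() version, though the rows are ordered by DMA rather than by sport.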
# type code/answer here