Introduction to ggplot2: Part 1 (Points)

Content Reference:

This lab references practice problems from “R for Data Science” - Chapter 3: Data Visualisation https://r4ds.had.co.nz/data-visualisation.html

In this lab we will discuss and apply:

Data and aesthetic mappings
Geometric Objects
Faceting

Example 1: MPG Dataset

First, load the tidyverse package.

library(tidyverse)

The mpg dataset is built into the ggplot2 package.

Fuel economy data from 1999 to 2008 for 38 popular models of cars

Description: This dataset contains a subset of the fuel economy data that the EPA makes available on https://fueleconomy.gov/. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.

?mpg

A. Basic Scatter Plot

Make a scatterplot of highway miles per gallon (hwy) vs engine displacement (displ).

ggplot(data=mpg)+
  geom_point(mapping=aes(x=displ, y=hwy))

Warm-up Exercises

Run ggplot(data=mpg). What do you see?

# type code/answer here

How many rows are in mpg? How many columns?

# type code/answer here

What does the drv variable describe? Read the help file.

# type code/answer here

Make a scatterplot of hwy vs cyl.

# type code/answer here

What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

# type code/answer here

B. Aesthetic Mappings

I. Color

Add color to our basic scatterplot.

# color based on the class of the vehicle
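One possible way to fill in this step is sketched below. It assumes the same displ and hwy axes as the basic scatterplot and maps the class column to color inside aes(); the object name p_color is only illustrative.

```r
library(ggplot2)

# Map the categorical variable `class` to the color aesthetic.
# Because the mapping sits inside aes(), ggplot2 assigns one color
# per class and adds a legend automatically.
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_color
```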

What do you notice about the color palette?

II. Transparency

# transparency based on the class of the vehicle
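A minimal sketch of this step, assuming the same axes as before and using the alpha aesthetic for transparency. ggplot2 is expected to warn that alpha is not advised for a discrete variable, which is worth keeping in mind for the question that follows. The name p_alpha is illustrative.

```r
library(ggplot2)

# Map `class` to alpha (transparency).
# class is unordered, so the light-to-dark ordering implied by alpha
# has no real meaning here; ggplot2 issues a warning about this.
p_alpha <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

p_alpha
```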

Is this a good use of the transparency aesthetic? Why?

III. Shape

# shape based on the class of the vehicle
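This step could be sketched as follows, again assuming the displ and hwy axes. Note that mpg has seven classes while ggplot2 uses only six shapes by default, so it warns and leaves one class (suv) without a shape. The name p_shape is illustrative.

```r
library(ggplot2)

# Map `class` to point shape.
# The default shape palette covers at most six discrete values, so with
# seven classes ggplot2 warns and the seventh class is not drawn.
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

p_shape
```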

IV. Size

# size based on the class of the vehicle
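A hedged sketch of this step, under the same assumptions as the previous aesthetics. As with alpha, ggplot2 is expected to warn that size is not advised for a discrete variable, because size implies an ordering that class does not have. The name p_size is illustrative.

```r
library(ggplot2)

# Map `class` to point size.
# Size suggests "more" or "less", which does not apply to an
# unordered category, so treat this plot as a cautionary example.
p_size <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))

p_size
```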

V. Defining Aesthetics Outside the Mapping

If you set an aesthetic outside of the aesthetic mapping, that is, as an argument to the geom rather than inside aes(), it is treated as a fixed value and applied to every point in that layer.
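As an illustration, the sketch below sets color as a plain argument of geom_point() instead of placing it inside aes(). The choice of "blue" and the name p_fixed are only for demonstration.

```r
library(ggplot2)

# color is given outside aes(), so it is a constant, not a mapping:
# every point is drawn in blue and no legend is created.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

p_fixed
```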

Aesthetic Mapping: Group Exercises

What’s gone wrong with this code? Why are the points not blue?

ggplot(data=mpg)+
  geom_point(mapping=aes(x=displ, y=hwy, color="blue"))

Which variables in mpg are categorical? Which variables are continuous?

# type code/answer here

Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs continuous variables?

# type code/answer here

What happens if you map the same variable to multiple aesthetics?

# type code/answer here

C. Geometries

I. Points

Create a scatterplot with geom_point().

# geom_point

II. Smooth

You can add a smoothed trend line to your data. By default, geom_smooth() fits a flexible curve (loess for fewer than 1,000 observations, a GAM otherwise), and you can request a linear model instead with method="lm".

# geom_smooth
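One way this step might look is sketched below: the first plot uses geom_smooth() with its default settings, and the second asks for a straight-line fit. The object names p_smooth and p_lm are illustrative.

```r
library(ggplot2)

# Default smoother: ggplot2 prints a message naming the method it chose.
p_smooth <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# Linear model: a straight line with a confidence band.
p_lm <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy), method = "lm")

p_smooth
p_lm
```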

i. Defining Groups within smooth

# use the group variable

# use the color variable to create groups
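The two comments above could be filled in roughly as follows. This sketch assumes drv (the drive train) as the grouping variable; any categorical column would work the same way. The names p_group and p_colgroup are illustrative.

```r
library(ggplot2)

# group = drv: one smooth per drive type, all drawn in the same color
# and without a legend.
p_group <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))

# color = drv: also one smooth per drive type, but each curve gets its
# own color and a legend, because a discrete color mapping implies groups.
p_colgroup <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))

p_group
p_colgroup
```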

ii. Using multiple geometries

# Use point and smooth at the same time
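A possible sketch of combining geometries: layers are added with + and drawn in order. Here the x and y mappings are shared by both layers, while color is mapped only in the point layer, so the smooth stays a single curve. The variable choices and the name p_both are assumptions for illustration.

```r
library(ggplot2)

# Shared mappings go in ggplot(); layer-specific mappings go in the geom.
p_both <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +  # points colored by class
  geom_smooth()                               # one smooth over all points

p_both
```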

Geometry: Group Questions

What geom would you use to draw a line chart? A histogram? An area chart?
Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

ggplot(data=mpg, mapping=aes(x=displ, y=hwy, color=drv))+
  geom_point()+
  geom_smooth(se=FALSE)

What does the se argument to geom_smooth() do?
Will these two graphs look different? Why/why not?

ggplot(data=mpg, mapping=aes(x=displ, y=hwy))+
  geom_point()+
  geom_smooth()

ggplot()+
  geom_point(data=mpg, mapping=aes(x=displ, y=hwy))+
  geom_smooth(data=mpg, mapping=aes(x=displ, y=hwy))

Example 2: FiveThirtyEight

A. Read the Article

“How Every NFL Team’s Fans Lean Politically”

https://fivethirtyeight.com/features/how-every-nfl-teams-fans-lean-politically/

B. Discuss in Small Groups

How are graphics used to tell the author’s story?
What geometries are used?

C. The Data

What does the raw data look like?

# Import data
sports<-read.csv("https://raw.githubusercontent.com/kitadasmalley/FA2020_DataViz/main/data/NFL_fandom_data.csv", 
                 header=TRUE)

head(sports)

##                          DMA  NFL  NBA  MLB  NHL NASCAR  CBB  CFB PctTrumpVote
## 1      Abilene-Sweetwater TX 0.45 0.21 0.14 0.02   0.04 0.03 0.11         0.79
## 2                  Albany GA 0.32 0.30 0.09 0.01   0.08 0.03 0.17         0.59
## 3 Albany-Schenectady-Troy NY 0.40 0.20 0.20 0.08   0.06 0.03 0.04         0.44
## 4    Albuquerque-Santa Fe NM 0.53 0.21 0.11 0.03   0.03 0.04 0.06         0.40
## 5              Alexandria LA 0.42 0.28 0.09 0.01   0.05 0.03 0.12         0.70
## 6                  Alpena MI 0.28 0.13 0.21 0.12   0.10 0.07 0.09         0.64

D. Processing the Data

Tidy

# Tidy the data
## Use gather to create:
### column for sport (categorical variable)
### Column for search interest (numeric - percent)

sportsT<-sports%>%
  gather("sport", "searchInterest",-c(DMA, PctTrumpVote))
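In current versions of tidyr, gather() is superseded by pivot_longer(). The sketch below shows what an equivalent reshape might look like on a small stand-in data frame built from the first two rows printed above; the names sports_demo and sports_long are illustrative, and the row order of the result differs from what gather() returns.

```r
library(tidyr)

# Two rows copied from the head(sports) output, used as a stand-in
# so the example runs without downloading the full file.
sports_demo <- data.frame(
  DMA = c("Abilene-Sweetwater TX", "Albany GA"),
  NFL = c(0.45, 0.32), NBA = c(0.21, 0.30), MLB = c(0.14, 0.09),
  NHL = c(0.02, 0.01), NASCAR = c(0.04, 0.08), CBB = c(0.03, 0.03),
  CFB = c(0.11, 0.17), PctTrumpVote = c(0.79, 0.59)
)

# Keep DMA and PctTrumpVote as identifiers; stack the seven sport
# columns into a name column (sport) and a value column (searchInterest).
sports_long <- pivot_longer(
  sports_demo,
  cols = -c(DMA, PctTrumpVote),
  names_to = "sport",
  values_to = "searchInterest"
)

sports_long
```

Each market contributes one row per sport, so the two-row stand-in becomes 14 rows with four columns.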

Reorder

# Set the levels of the sport variable so that it's in the right order
sportsT$sport<-factor(sportsT$sport, 
                      levels=c("NBA", "MLB", "NHL", "NFL", "CBB", "NASCAR", "CFB"))
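To see why this reordering matters, here is a small base R sketch on a toy vector (sport_demo is a made-up name). Without explicit levels a factor sorts its values alphabetically, and ggplot2 follows the level order when it lays out discrete scales, legends, and facets.

```r
# Toy vector standing in for the sport column.
sport_demo <- c("NFL", "NBA", "CFB")

# Default: levels are sorted alphabetically.
f_default <- factor(sport_demo)
levels(f_default)

# Explicit levels: the order given here is the order plots will use.
f_ordered <- factor(sport_demo,
                    levels = c("NBA", "MLB", "NHL", "NFL", "CBB", "NASCAR", "CFB"))
levels(f_ordered)
```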

E. Recreate the graphic

# type code/answer here