library("tidyverse")
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages ----------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Run ggplot(data = mpg). What do you see?
ggplot(data = mpg)
This code creates an empty plot. The ggplot() function creates the background of the plot, but since no layers were specified with a geom function, nothing is drawn.
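To make the role of layers concrete, the following sketch adds a single geom layer to the same empty plot. The choice of displ and hwy as the x and y variables is an assumption made only for illustration; any pair of columns in mpg would do.

```r
library(ggplot2)

# The bare call only sets up the plot; it has no layers yet.
p_empty <- ggplot(data = mpg)

# Adding a geom layer gives ggplot2 something to draw.
# displ and hwy are columns of mpg, picked here purely as an example.
p_points <- p_empty + geom_point(mapping = aes(x = displ, y = hwy))

length(p_empty$layers)   # layers in the bare plot
length(p_points$layers)  # layers after adding geom_point()
```

Printing p_points should draw a scatter plot, whereas printing p_empty draws only the blank background.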
How many rows are in mtcars? How many columns?
There are 32 rows and 11 columns in the mtcars data frame.
nrow(mtcars)
## [1] 32
ncol(mtcars)
## [1] 11
The glimpse() function also displays the number of rows and columns in a data frame.
glimpse(mtcars)
## Observations: 32
## Variables: 11
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17...
## $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4,...
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8,...
## $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, ...
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3....
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150,...
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90,...
## $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,...
## $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,...
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3,...
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1,...
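As a compact alternative to calling nrow() and ncol() separately, base R's dim() returns both counts at once. The variable name dims below is only illustrative.

```r
# dim() returns the number of rows followed by the number of columns.
dims <- dim(mtcars)
dims
## [1] 32 11
```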
What does the drv variable describe? Read the help for ?mpg to find out.
The drv variable is a categorical variable that classifies cars as front-wheel, rear-wheel, or four-wheel drive.
| Value | Description |
|---|---|
| “f” | front-wheel drive |
| “r” | rear-wheel drive |
| “4” | four-wheel drive |
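The values listed in the help page can also be checked against the data itself. This is a minimal sketch that tabulates drv with dplyr's count(); the name drv_counts is just a placeholder.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# One row per distinct value of drv, with the number of cars in each group.
drv_counts <- count(mpg, drv)
drv_counts
```

The result should contain exactly three rows, one for each of the values "4", "f", and "r".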
Make a scatter plot of hwy vs. cyl.
ggplot(mpg, aes(x = hwy, y = cyl)) +
geom_point()
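The phrase "hwy vs. cyl" is sometimes read as putting hwy on the y-axis and cyl on the x-axis. Under that reading, which is an assumption about the intent of the exercise, the plot would be sketched with the aesthetics swapped:

```r
library(ggplot2)

# Same data and geom, with cyl on the x-axis and hwy on the y-axis.
p_swapped <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point()
p_swapped
```

Both versions show the same points; only the orientation of the axes differs.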
What happens if you make a scatter plot of class vs. drv? Why is the plot not useful?
The resulting scatterplot has only a few points.
ggplot(mpg, aes(x = class, y = drv)) +
geom_point()
A scatter plot is not a useful display of these variables since both drv and class are categorical variables. Since categorical variables typically take a small number of values, there are a limited number of unique combinations of (x, y) values that can be displayed. In this data, drv takes 3 values and class takes 7 values, meaning that there are only 21 possible combinations that could be plotted on a scatter plot of drv vs. class. Of these, only 12 combinations of (drv, class) are observed in the data.
count(mpg, drv, class)
## # A tibble: 12 x 3
## drv class n
## <chr> <chr> <int>
## 1 4 compact 12
## 2 4 midsize 3
## 3 4 pickup 33
## 4 4 subcompact 4
## 5 4 suv 51
## 6 f compact 35
## 7 f midsize 38
## 8 f minivan 11
## 9 f subcompact 22
## 10 r 2seater 5
## 11 r subcompact 9
## 12 r suv 11
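The two numbers quoted above, 21 possible combinations and 12 observed ones, can be reproduced directly. This sketch uses dplyr's n_distinct() and distinct(); the names n_possible and n_observed are illustrative.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Possible combinations: distinct drv values times distinct class values.
n_possible <- n_distinct(mpg$drv) * n_distinct(mpg$class)

# Observed combinations: distinct (drv, class) pairs present in the data.
n_observed <- nrow(distinct(mpg, drv, class))

n_possible
## [1] 21
n_observed
## [1] 12
```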
A simple scatter plot does not show how many observations there are for each (x, y) value. As such, scatterplots work best for plotting a continuous x and a continuous y variable, and when all (x, y) values are unique.
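One way to make the number of observations per combination visible is to encode the count in the plot. The sketch below shows two options from ggplot2, geom_count() and geom_tile(); these are suggested alternatives rather than part of the exercise, and the object names are placeholders.

```r
library(ggplot2)
library(dplyr)

# Option 1: geom_count() sizes each point by the number of observations
# at that (class, drv) position.
p_count <- ggplot(mpg, aes(x = class, y = drv)) +
  geom_count()

# Option 2: count each combination first, then map the count to tile fill.
drv_class_counts <- count(mpg, drv, class)
p_tile <- ggplot(drv_class_counts, aes(x = class, y = drv)) +
  geom_tile(aes(fill = n))

p_count
p_tile
```

In the tile plot, combinations that never occur in the data are simply left blank, which makes the 9 missing combinations easy to spot.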