The 5 Named Graphs

Harold Nelson

1/30/2022

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
load("county.rda")
load("cdc.Rdata")

Scatterplot

Review the section in ModernDive.

Use the county dataframe. Make a scatterplot with poverty on the x-axis and pop_change on the y-axis.

Solution

ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) + 
  geom_point()
## Warning: Removed 5 rows containing missing values (geom_point).

Alpha

Reduce the value of alpha in geom_point() to solve the overplotting problem.

Solution

ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) + 
  geom_point(alpha = .2)
## Warning: Removed 5 rows containing missing values (geom_point).

Size

Try using a reduced size (the default for geom_point() is 1.5) instead of alpha.

Solution

ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) + 
  geom_point(size = .1)
## Warning: Removed 5 rows containing missing values (geom_point).

Jitter

Use geom_jitter() instead of geom_point() to address the overplotting problem.

Solution

ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) + 
  geom_jitter()
## Warning: Removed 5 rows containing missing values (geom_point).
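
The amount of jitter can also be set explicitly. The sketch below is not part of the exercise: it uses the mpg data that ships with ggplot2 as a stand-in for county, and the width, height and alpha values are illustrative assumptions rather than recommendations.

```r
library(ggplot2)

# mpg ships with ggplot2; cty and hwy are whole numbers, so many points coincide.
p_jitter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  # Move each point by at most 0.3 units in each direction.
  geom_jitter(width = 0.3, height = 0.3, alpha = 0.4)

p_jitter
```

Jitter adds random noise, so the plot changes slightly each time it is drawn. Keep the amounts small relative to the spacing of the data.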

Smoother

Add a geom_smooth() layer.

ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) + 
  geom_point(size = .2) + geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (stat_smooth).
## Warning: Removed 5 rows containing missing values (geom_point).
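
The message above shows that geom_smooth() picked a smoothing method on its own. As a hedged illustration, a straight-line fit can be requested with method = "lm". The mpg data from ggplot2 stands in for county here, and se = FALSE is an optional choice that hides the confidence band.

```r
library(ggplot2)

# Same layering as the solution above, but with a linear fit instead of the default smoother.
p_lm <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(size = .2) +
  geom_smooth(method = "lm", se = FALSE, color = "red")

p_lm
```

Comparing the straight line with the default smoother is a quick way to see whether the relationship is roughly linear.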

Linegraph

Download the file UNRATE.csv from Moodle and use the “Import Dataset” feature in the Environment tab of the upper-right pane to load some recent unemployment rate data. Save the generated code in a chunk.

Solution

UNRATE <- read_csv("UNRATE.csv")
## Rows: 92 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): UNRATE
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
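
The message suggests declaring the column types. Here is a minimal sketch of that, assuming the column names DATE and UNRATE reported above. Because the Moodle file is not available here, the code writes the first three rows of the data (values taken from the glimpse() output in this document) to a temporary file and reads that instead. The names tmp and unrate_small are made up for the illustration.

```r
library(readr)

# Stand-in for UNRATE.csv: the first three rows of the data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("DATE,UNRATE",
             "2014-05-01,6.3",
             "2014-06-01,6.1",
             "2014-07-01,6.2"), tmp)

# Declaring the types up front avoids the column specification message.
unrate_small <- read_csv(
  tmp,
  col_types = cols(
    DATE   = col_date(),    # ISO dates such as 2014-05-01
    UNRATE = col_double()
  )
)

unrate_small
```

With the real file, the same col_types argument could be added to the read_csv("UNRATE.csv") call.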

Examine the data.

Solution

glimpse(UNRATE)
## Rows: 92
## Columns: 2
## $ DATE   <date> 2014-05-01, 2014-06-01, 2014-07-01, 2014-08-01, 2014-09-01, 20…
## $ UNRATE <dbl> 6.3, 6.1, 6.2, 6.1, 5.9, 5.7, 5.8, 5.6, 5.7, 5.5, 5.4, 5.4, 5.6…

Make a linegraph of the unemployment rate.

Solution

ggplot(data = UNRATE, 
       mapping = aes(x = DATE, y = UNRATE)) +
  geom_line()

Modify the graph.

In geom_line(), set linetype = "dotted". Then add a geom_point() layer.

Solution

ggplot(data = UNRATE, 
       mapping = aes(x = DATE, y = UNRATE)) +
  geom_line(linetype = "dotted") +
  geom_point()

Histogram

Review the histogram section in ModernDive.

Make a histogram of the variable weight in the cdc dataframe.

Solution

ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Try some different values of bins. The default is 30, so try 15 and 60.

Solution

ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram(bins = 15)

ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram(bins = 60)
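
The message from stat_bin() mentions binwidth, which sets the width of each bin in the units of the variable rather than the number of bins. A small sketch on the mpg data from ggplot2 (a stand-in for cdc) follows. The width of 2 and the white outline are arbitrary illustrative choices.

```r
library(ggplot2)

# Each bar covers 2 units of hwy; the white outline makes adjacent bars easier to tell apart.
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "white")

p_hist
```

Specifying binwidth also silences the `stat_bin()` message, as does specifying bins.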

Facets

Review the section, then use faceting to get separate histograms of the weights of men and women. Set nrow = 2.

Solution

ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram() +
  facet_wrap(~ gender, nrow = 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Boxplot

Review the section, then do a side-by-side boxplot of the weights of men and women.

Solution

ggplot(data = cdc, mapping = aes(x = gender, y = weight)) +
  geom_boxplot()

Boxplot Without Categories

The boxplot command in ggplot2 requires an x-variable. The base R boxplot does not. If you want to use the ggplot version without an x-variable, you can supply a constant. I recommend a string containing the name of the variable.

Solution

ggplot(data = cdc, aes(x = "Weight", y = weight)) + 
  geom_boxplot()
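
For comparison, here is a sketch of both approaches side by side, using the hwy variable of the mpg data that ships with ggplot2 as a stand-in for cdc. The name p_box is made up for the illustration.

```r
library(ggplot2)

# Base R: a single numeric vector is enough.
boxplot(mpg$hwy, ylab = "hwy")

# ggplot2: a constant string on the x-axis labels the single box.
p_box <- ggplot(data = mpg, aes(x = "hwy", y = hwy)) + 
  geom_boxplot()

p_box
```

Both plots summarize the same values, so the median line sits at the same height in each.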

Barplot

Review the section, then create a barplot of gender in the cdc dataframe.

Solution

ggplot(data = cdc, aes(x = gender)) + geom_bar()

geom_col()

geom_bar() counts the cases in raw data for you. If the counting has already been done, use geom_col() instead.

cdc %>% 
  group_by(gender) %>% 
  summarize(count = n()) %>% 
  ggplot(aes(x = gender, y = count)) +
  geom_col()
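
As a hedged aside, dplyr's count() is a shorthand for the group_by() and summarize() steps and stores the tally in a column named n. The sketch below uses the mpg data from ggplot2 as a stand-in for cdc and builds the same bars both ways. The names p_bar and p_col are made up for the illustration.

```r
library(ggplot2)
library(dplyr)

# geom_bar() does the counting itself.
p_bar <- ggplot(data = mpg, aes(x = class)) + 
  geom_bar()

# count() does the counting first, so geom_col() only has to draw the bars.
p_col <- mpg %>% 
  count(class) %>%   # one row per class, with the tally in column n
  ggplot(aes(x = class, y = n)) +
  geom_col()

p_bar
p_col
```

The two plots show the same bar heights. Only the y-axis label differs (count versus n).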