Harold Nelson
1/30/2022
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Review the section in ModernDive.
Use the county dataframe. Make a scatterplot with poverty on the x-axis and pop_change on the y-axis.
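A minimal sketch of one way to draw this plot. It assumes `county` is the data frame shipped with the openintro package; the exercise itself does not say where it comes from.

```r
library(ggplot2)
library(openintro)  # assumption: county is openintro::county

# Basic scatterplot: poverty on the x-axis, pop_change on the y-axis
ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) +
  geom_point()
```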
## Warning: Removed 5 rows containing missing values (geom_point).
Reduce the value of alpha in geom_point() to solve the overplotting problem.
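A possible version with transparency, again assuming `county` comes from the openintro package. The value `alpha = 0.1` is only an illustrative choice; try a few values and compare.

```r
library(ggplot2)
library(openintro)  # assumption: county is openintro::county

# Lower alpha makes each point more transparent, so dense regions look darker
ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) +
  geom_point(alpha = 0.1)
```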
## Warning: Removed 5 rows containing missing values (geom_point).
Try using a reduced size (default = 1.5) instead of alpha.
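A sketch of the same plot with smaller points, assuming `county` comes from the openintro package. The value `size = 0.2` matches the one used with geom_smooth() further down; other small values work too.

```r
library(ggplot2)
library(openintro)  # assumption: county is openintro::county

# Smaller points overlap less, so individual counties stay visible
ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) +
  geom_point(size = 0.2)
```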
## Warning: Removed 5 rows containing missing values (geom_point).
Use geom_jitter() instead of geom_point() to solve the problem.
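One way to write the jittered version, assuming `county` comes from the openintro package. The default amount of jitter is used here; the `width` and `height` arguments of geom_jitter() can be set if more or less noise is wanted.

```r
library(ggplot2)
library(openintro)  # assumption: county is openintro::county

# geom_jitter() adds a small random offset to each point before drawing it
ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) +
  geom_jitter()
```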
## Warning: Removed 5 rows containing missing values (geom_point).
Add a geom_smooth() layer.
ggplot(data = county, mapping = aes(x = poverty, y = pop_change)) +
  geom_point(size = 0.2) +
  geom_smooth(color = "red")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (stat_smooth).
## Warning: Removed 5 rows containing missing values (geom_point).
Download the file unrate.csv from Moodle and use the “Import Dataset” feature (Environment tab, upper-right pane) to load some recent unemployment rate data. Save the generated code in a chunk.
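The code that “Import Dataset” generates is usually a readr call along the lines of the sketch below. The file location is an assumption (here, the working directory), and the object name `UNRATE` is chosen to match the plotting code later in this document; adjust both to your setup.

```r
library(readr)
library(dplyr)

# Assumption: unrate.csv was saved in the current working directory
unrate_path <- "unrate.csv"

# Only read the file if it is actually there, so the chunk never errors
if (file.exists(unrate_path)) {
  UNRATE <- read_csv(unrate_path)  # prints the column specification
  glimpse(UNRATE)                  # compact overview of rows and columns
}
```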
## Rows: 92 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): UNRATE
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 92
## Columns: 2
## $ DATE <date> 2014-05-01, 2014-06-01, 2014-07-01, 2014-08-01, 2014-09-01, 20…
## $ UNRATE <dbl> 6.3, 6.1, 6.2, 6.1, 5.9, 5.7, 5.8, 5.6, 5.7, 5.5, 5.4, 5.4, 5.6…
Make a line plot of UNRATE against DATE. In geom_line(), set linetype = "dotted", then add a geom_point() layer.
ggplot(data = UNRATE,
       mapping = aes(x = DATE, y = UNRATE)) +
  geom_line(linetype = "dotted") +
  geom_point()
Review the section.
Make a histogram of the variable weight in the cdc dataframe.
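A minimal sketch of the histogram. It assumes `cdc` is already loaded with a numeric `weight` column and a `gender` column; if it is not, the code builds a small made-up stand-in (not real CDC data) so the example still runs.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# Histogram of weight with the default number of bins
ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram()
```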
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Try some different values of bins. The default is 30, so try 15 and 60.
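The two suggested values could be tried as follows. As before, `cdc` is assumed to be loaded already, with a made-up stand-in (not real CDC data) used only if it is missing.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# Fewer, wider bins give a smoother outline
ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram(bins = 15)

# More, narrower bins show finer detail (and more noise)
ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram(bins = 60)
```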
Review the section, then use faceting to get separate histograms of the weights of men and women. Set nrow = 2.
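A sketch of the faceted histograms, assuming the column that distinguishes men and women is `gender` (the name used in the barplot exercise). `cdc` is assumed to be loaded already; a made-up stand-in (not real CDC data) is used only if it is missing.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# One histogram per gender, stacked in two rows so the x-axes line up
ggplot(data = cdc, mapping = aes(x = weight)) +
  geom_histogram() +
  facet_wrap(~ gender, nrow = 2)
```

Stacking the panels vertically puts both histograms on a shared x-axis, which makes the shift between the two distributions easy to see.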
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Review the section, then do a side-by-side boxplot of the weights of men and women.
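A possible side-by-side boxplot, assuming the grouping column is `gender`. `cdc` is assumed to be loaded already; a made-up stand-in (not real CDC data) is used only if it is missing.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# A categorical x gives one box per group, drawn side by side
ggplot(data = cdc, mapping = aes(x = gender, y = weight)) +
  geom_boxplot()
```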
geom_boxplot() in ggplot2 is designed around an x-variable; the base R boxplot() is not. To draw a single box with the ggplot version, you can supply a constant for x. I recommend a string containing the name of the variable, which also labels the tick on the x-axis.
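The constant-x idea can be sketched like this, next to the base R call for comparison. `cdc` is assumed to be loaded already; a made-up stand-in (not real CDC data) is used only if it is missing.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# Base R: no x-variable needed
boxplot(cdc$weight)

# ggplot2: the string "weight" is a constant x, so there is a single box
ggplot(data = cdc, mapping = aes(x = "weight", y = weight)) +
  geom_boxplot()
```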
Review the section, then create a barplot of gender in the cdc dataframe.
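A minimal sketch of the barplot. `cdc` is assumed to be loaded already with a `gender` column; a made-up stand-in (not real CDC data) is used only if it is missing.

```r
library(ggplot2)

# Assumption: cdc is already in the session. Otherwise use a made-up stand-in.
if (!exists("cdc")) {
  set.seed(1)
  cdc <- data.frame(
    gender = rep(c("m", "f"), each = 100),
    weight = c(rnorm(100, mean = 190, sd = 35), rnorm(100, mean = 150, sd = 30))
  )
}

# geom_bar() counts the rows in each category and draws one bar per category
ggplot(data = cdc, mapping = aes(x = gender)) +
  geom_bar()
```

geom_bar() does the counting itself, so it fits raw data with one row per respondent. If the counts were already tabulated, geom_col() with a y aesthetic would be the usual choice.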