Harold Nelson
2022-09-19
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Rows: 19,997
## Columns: 15
## $ genhlth <fct> good, good, good, good, very good, very good, very good, v…
## $ exerany <dbl> 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1…
## $ hlthplan <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1…
## $ smoke100 <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0…
## $ height <dbl> 70, 64, 60, 66, 61, 64, 71, 67, 65, 70, 69, 69, 66, 70, 69…
## $ weight <int> 175, 125, 105, 132, 150, 114, 194, 170, 150, 180, 186, 168…
## $ wtdesire <int> 175, 115, 105, 124, 130, 114, 185, 160, 130, 170, 175, 148…
## $ age <int> 77, 33, 49, 42, 55, 55, 31, 45, 27, 44, 46, 62, 21, 69, 23…
## $ gender <fct> m, f, f, f, f, f, m, m, f, m, m, m, m, m, m, m, m, m, m, f…
## $ BMI <dbl> 25.10714, 21.45386, 20.50417, 21.30303, 28.33916, 19.56592…
## $ BMIDes <dbl> 25.10714, 19.73755, 20.50417, 20.01194, 24.56060, 19.56592…
## $ DesActRatio <dbl> 1.0000000, 0.9200000, 1.0000000, 0.9393939, 0.8666667, 1.0…
## $ BMICat <fct> Overweight, Normal, Normal, Normal, Overweight, Normal, Ov…
## $ BMIDesCat <fct> Overweight, Normal, Normal, Normal, Normal, Normal, Overwe…
## $ ageCat <fct> 58-99, 32-43, 44-57, 32-43, 44-57, 44-57, 18-31, 44-57, 18…
This dataframe is derived from the cdc dataframe, which you may have seen before. It has been cleaned and enhanced.
Create a basic scatterplot with height on the x-axis and weight on the y-axis.
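One possible solution, offered as a sketch rather than the intended answer. It assumes the dataframe is named cdc2, as in the code later in this handout. The object name p_basic is only illustrative, and the simulated fallback data is a hypothetical stand-in so the snippet also runs on its own.

```r
library(ggplot2)

# Use the real cdc2 if it is already loaded; otherwise fall back to a
# simulated stand-in (hypothetical values, for illustration only).
if (!exists("cdc2")) {
  set.seed(1)
  cdc2 <- data.frame(height = rnorm(500, mean = 67, sd = 4))
  cdc2$weight <- -180 + 5 * cdc2$height + rnorm(500, sd = 30)
}

# Basic scatterplot: height on the x-axis, weight on the y-axis.
p_basic <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_point()
p_basic
```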
There is a lot of overplotting: many points are drawn on top of one another. To reveal where the data is dense, set the alpha parameter in geom_point. Try values of 0.1 and 0.01.
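A sketch of how the two alpha values might be tried, assuming the dataframe is named cdc2. The plot names p_alpha_10 and p_alpha_01 are illustrative, and the simulated fallback is a hypothetical stand-in, not the real data.

```r
library(ggplot2)

# Use the real cdc2 if it is already loaded; otherwise fall back to a
# simulated stand-in (hypothetical values, for illustration only).
if (!exists("cdc2")) {
  set.seed(1)
  cdc2 <- data.frame(height = rnorm(500, mean = 67, sd = 4))
  cdc2$weight <- -180 + 5 * cdc2$height + rnorm(500, sd = 30)
}

# alpha is set outside aes(), so it applies as a constant to every point.
p_alpha_10 <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_point(alpha = 0.1)

p_alpha_01 <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_point(alpha = 0.01)

p_alpha_10
p_alpha_01
```

With alpha = 0.1, roughly ten overlapping points are needed to reach full opacity, so darker regions mark heavier concentrations of observations.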
## Exercise
Add a smoothing curve to the graph, accepting the defaults.
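One way this might look, assuming the dataframe is named cdc2 and that the points keep a low alpha. The name p_smooth and the simulated fallback data are hypothetical. Because the stand-in has fewer than 1,000 rows, the default smoother for it is loess rather than the gam reported for the full data.

```r
library(ggplot2)

# Use the real cdc2 if it is already loaded; otherwise fall back to a
# simulated stand-in (hypothetical values, for illustration only).
if (!exists("cdc2")) {
  set.seed(1)
  cdc2 <- data.frame(height = rnorm(500, mean = 67, sd = 4))
  cdc2$weight <- -180 + 5 * cdc2$height + rnorm(500, sd = 30)
}

# geom_smooth() with no arguments lets ggplot2 choose the method and formula.
p_smooth <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_point(alpha = 0.1) +
  geom_smooth()
p_smooth
```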
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Drop the geom_point, but include two smoothing curves, one with the default method and another with method = "lm".
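A possible sketch, assuming the dataframe is named cdc2. Coloring the linear fit red is an optional choice made here only to tell the two curves apart. The name p_two and the simulated fallback data are hypothetical.

```r
library(ggplot2)

# Use the real cdc2 if it is already loaded; otherwise fall back to a
# simulated stand-in (hypothetical values, for illustration only).
if (!exists("cdc2")) {
  set.seed(1)
  cdc2 <- data.frame(height = rnorm(500, mean = 67, sd = 4))
  cdc2$weight <- -180 + 5 * cdc2$height + rnorm(500, sd = 30)
}

# Two smoothing layers share the same x and y mapping:
# the first uses the default method, the second a straight-line fit.
p_two <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_smooth() +
  geom_smooth(method = "lm", color = "red")
p_two
```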
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
Use only the default smoother, but map color to gender in its aes.
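A sketch of one approach, assuming the dataframe is named cdc2 and has a two-level gender factor as shown in the glimpse output. The name p_gender and the simulated fallback data are hypothetical.

```r
library(ggplot2)

# Use the real cdc2 if it is already loaded; otherwise fall back to a
# simulated stand-in (hypothetical values, for illustration only).
if (!exists("cdc2")) {
  set.seed(1)
  cdc2 <- data.frame(
    gender = factor(rep(c("m", "f"), each = 250), levels = c("m", "f")),
    height = c(rnorm(250, mean = 70, sd = 3), rnorm(250, mean = 64, sd = 3))
  )
  cdc2$weight <- -180 + 5 * cdc2$height + rnorm(500, sd = 30)
}

# Mapping color inside the smoother's own aes() splits the data by gender,
# so one curve is fitted and drawn per group.
p_gender <- ggplot(cdc2, aes(x = height, y = weight)) +
  geom_smooth(aes(color = gender))
p_gender
```

Passing method = "lm" to the same geom_smooth call should give one straight-line fit per gender instead.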
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Repeat the previous exercise using method = "lm".
## `geom_smooth()` using formula 'y ~ x'
Use your tidyverse tools to construct a dataframe with one row for every combination of gender and height. In addition, each row should contain the mean value of weight and the count of original observations. Show the head, tail, and a summary() of the dataframe.
# One row per (gender, height) pair, with the mean weight and group size
combos = cdc2 %>%
  group_by(gender, height) %>%
  summarise(mean_weight = mean(weight),
            count = n()) %>%
  ungroup()  # drop the remaining grouping by gender
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
##   gender height mean_weight count
##   <fct>   <dbl>       <dbl> <int>
## 1 m          49        160      1
## 2 m          55         90      1
## 3 m          58        165      3
## 4 m          59        130      1
## 5 m          60        178     11
## 6 m          61        145.    15
## # A tibble: 6 × 4
##   gender height mean_weight count
##   <fct>   <dbl>       <dbl> <int>
## 1 f          71        179.    72
## 2 f          72        179.    56
## 3 f          73        200.    16
## 4 f          74        180.     8
## 5 f          77        174.     2
## 6 f          78        173.     3
##  gender     height       mean_weight        count
##  m:29   Min.   :48.00   Min.   : 90.0   Min.   :   1.0
##  f:28   1st Qu.:59.00   1st Qu.:145.3   1st Qu.:   3.0
##         Median :66.00   Median :165.0   Median :  56.0
##         Mean   :66.21   Mean   :169.8   Mean   : 350.8
##         3rd Qu.:73.00   3rd Qu.:190.6   3rd Qu.: 597.0
##         Max.   :84.00   Max.   :265.0   Max.   :1538.0
Create a scatterplot of height (x) and mean_weight (y) from the combos dataframe. Map color to gender and size to count.
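A possible sketch, assuming the summary dataframe is named combos with the columns gender, height, mean_weight and count shown above. The name p_combos is illustrative, and the small fallback table holds made-up values so the snippet runs on its own.

```r
library(ggplot2)

# Use the real combos if it already exists; otherwise fall back to a tiny
# hand-made stand-in (hypothetical values, for illustration only).
if (!exists("combos")) {
  combos <- data.frame(
    gender = factor(c("m", "m", "m", "f", "f", "f"), levels = c("m", "f")),
    height = c(66, 70, 74, 60, 64, 68),
    mean_weight = c(165, 185, 205, 125, 145, 165),
    count = c(300, 1200, 150, 400, 1500, 250)
  )
}

# Color distinguishes gender; point size shows how many original
# observations stand behind each mean.
p_combos <- ggplot(combos,
                   aes(x = height, y = mean_weight,
                       color = gender, size = count)) +
  geom_point()
p_combos
```

Mapping size to count makes it easier to discount means at the extreme heights, which the summary above shows can rest on as little as a single observation.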