Height Weight Gender

Harold Nelson

2022-09-19

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
load("cdc2.Rdata")
glimpse(cdc2)
## Rows: 19,997
## Columns: 15
## $ genhlth     <fct> good, good, good, good, very good, very good, very good, v…
## $ exerany     <dbl> 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1…
## $ hlthplan    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1…
## $ smoke100    <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0…
## $ height      <dbl> 70, 64, 60, 66, 61, 64, 71, 67, 65, 70, 69, 69, 66, 70, 69…
## $ weight      <int> 175, 125, 105, 132, 150, 114, 194, 170, 150, 180, 186, 168…
## $ wtdesire    <int> 175, 115, 105, 124, 130, 114, 185, 160, 130, 170, 175, 148…
## $ age         <int> 77, 33, 49, 42, 55, 55, 31, 45, 27, 44, 46, 62, 21, 69, 23…
## $ gender      <fct> m, f, f, f, f, f, m, m, f, m, m, m, m, m, m, m, m, m, m, f…
## $ BMI         <dbl> 25.10714, 21.45386, 20.50417, 21.30303, 28.33916, 19.56592…
## $ BMIDes      <dbl> 25.10714, 19.73755, 20.50417, 20.01194, 24.56060, 19.56592…
## $ DesActRatio <dbl> 1.0000000, 0.9200000, 1.0000000, 0.9393939, 0.8666667, 1.0…
## $ BMICat      <fct> Overweight, Normal, Normal, Normal, Overweight, Normal, Ov…
## $ BMIDesCat   <fct> Overweight, Normal, Normal, Normal, Normal, Normal, Overwe…
## $ ageCat      <fct> 58-99, 32-43, 44-57, 32-43, 44-57, 44-57, 18-31, 44-57, 18…

This dataframe is derived from the cdc dataframe, which you may have seen before. It has been cleaned and enhanced.

Exercise

Create a basic scatterplot with height on the x-axis and weight on the y-axis.

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight)) +
  geom_point()
g

Exercise

There is a lot of overplotting. To make some detail clear, use an alpha parameter in geom_point. Try values of .1 and .01.

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight)) +
  geom_point(alpha = .1)
g

g = cdc2 %>% 
  ggplot(aes(height,weight)) +
  geom_point(alpha = .01)
g

Exercise

Add a smoothing curve to the graph, accepting the defaults.

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight)) +
  geom_point() +
  geom_smooth()
g
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Exercise

Drop geom_point, but include two smoothing curves: one with the default method and another with method = "lm".

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight))  +
  geom_smooth() +
  geom_smooth(method = "lm")
g
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
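
Both curves are drawn in the same default color, so they can be hard to tell apart. One possible way to distinguish them is to set a fixed color on one of the layers. The sketch below runs on made-up data: the toy dataframe and the choice of red are assumptions for illustration, not part of cdc2. With so few rows, the default smoother here is loess rather than the gam used for cdc2.

```r
library(ggplot2)

# Hypothetical stand-in for cdc2 so the sketch runs on its own
set.seed(1)
toy = data.frame(height = rep(60:75, each = 5))
toy$weight = 2.5 * toy$height + rnorm(nrow(toy), sd = 15)

g = ggplot(toy, aes(height, weight)) +
  geom_smooth() +                            # default smoother, default color
  geom_smooth(method = "lm", color = "red")  # straight-line fit, drawn in red
g
```

Because the color is set outside aes(), it applies to that layer as a constant and does not create a legend.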

Exercise

Use only the default smoother, but map color to gender in its aes.

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight))  +
  geom_smooth(aes(color = gender)) 
g
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Exercise

Repeat the previous exercise using method = "lm".

Solution

g = cdc2 %>% 
  ggplot(aes(height,weight))  +
  geom_smooth(aes(color = gender), method = "lm")
g
## `geom_smooth()` using formula 'y ~ x'

Exercise

Use your tidyverse tools to construct a dataframe with one row for every combination of gender and height. In addition, each row should contain the mean value of weight and the count of original observations. Show the head(), tail(), and summary() of the dataframe.

Solution

combos = cdc2 %>% 
  group_by(gender,height) %>% 
  summarise(mean_weight = mean(weight),
            count = n()) %>% 
  ungroup()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
head(combos)
## # A tibble: 6 × 4
##   gender height mean_weight count
##   <fct>   <dbl>       <dbl> <int>
## 1 m          49        160      1
## 2 m          55         90      1
## 3 m          58        165      3
## 4 m          59        130      1
## 5 m          60        178     11
## 6 m          61        145.    15
tail(combos)
## # A tibble: 6 × 4
##   gender height mean_weight count
##   <fct>   <dbl>       <dbl> <int>
## 1 f          71        179.    72
## 2 f          72        179.    56
## 3 f          73        200.    16
## 4 f          74        180.     8
## 5 f          77        174.     2
## 6 f          78        173.     3
summary(combos)
##  gender     height       mean_weight        count       
##  m:29   Min.   :48.00   Min.   : 90.0   Min.   :   1.0  
##  f:28   1st Qu.:59.00   1st Qu.:145.3   1st Qu.:   3.0  
##         Median :66.00   Median :165.0   Median :  56.0  
##         Mean   :66.21   Mean   :169.8   Mean   : 350.8  
##         3rd Qu.:73.00   3rd Qu.:190.6   3rd Qu.: 597.0  
##         Max.   :84.00   Max.   :265.0   Max.   :1538.0
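
A table like this can be checked for consistency: the counts should add up to the number of original rows, and the count-weighted mean of mean_weight should equal the overall mean weight. The sketch below does this on a small made-up dataframe (toy and toy_combos are hypothetical names, not part of cdc2). It also passes .groups = "drop" to summarise(), which returns an ungrouped result without the message shown above, so a separate ungroup() is not needed.

```r
library(dplyr)

# Hypothetical stand-in for cdc2
toy = tibble(
  gender = factor(c("m", "m", "m", "f", "f", "f", "f")),
  height = c(70, 70, 72, 64, 64, 64, 66),
  weight = c(180, 190, 200, 120, 130, 140, 150)
)

toy_combos = toy %>%
  group_by(gender, height) %>%
  summarise(mean_weight = mean(weight),
            count = n(),
            .groups = "drop")   # ungrouped result, no message

# Counts should account for every original row
sum(toy_combos$count) == nrow(toy)

# The count-weighted mean of the group means equals the overall mean
all.equal(weighted.mean(toy_combos$mean_weight, toy_combos$count),
          mean(toy$weight))
```

The same two checks could be run against combos and cdc2, assuming weight has no missing values.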

Exercise

Create a scatterplot of height (x) and mean_weight (y) from the combos dataframe. Map color to gender and size to count.

Solution

g = combos %>% 
  ggplot(aes(x = height,
             y = mean_weight,
             color = gender,
             size = count)) +
  geom_point()
g