1 Setup

We’re using R Markdown to gather together into a single document the code we build, text commenting on and reacting to that code, and the output of the analyses we build.

1.1 Load packages and set theme

library(janitor)
library(knitr)
library(magrittr)
library(tidyverse)

theme_set(theme_bw())

Loading packages in R is like opening up apps on your phone. We need to tell R that, in addition to the base functions available in the software, we also have other functions we want to use.

  • The most important package is actually a series of packages called the tidyverse, which we’ll use in every R Markdown file we create this semester. The tidyverse includes several packages, all developed (in part) by Hadley Wickham, Chief Scientist at RStudio.
    • dplyr is the primary package we’ll use for data wrangling, cleaning and transformation
    • ggplot2 is the primary package we’ll use for visualization
    • other members of the tidyverse include packages for importing data, working with factors, and many other common activities.
  • The janitor package has some tools for examining, cleaning and tabulating data (including tabyl() and clean_names()) that we’ll use regularly.
  • The knitr package has a function called kable() that we’ll sometimes use to neaten up results.
  • The magrittr package includes a pipe function denoted %$% which enables us to use “pipes” to pull specific variables out of a data frame (tibble) even when the function we want to use isn’t part of the tidyverse.
  • It’s helpful to load the tidyverse package last.

2 Ingest the data to quicksur_raw

The data we’ll use is from the Quick Survey we did in Class 02 (pdf is available from the Class 02 README) and which I’ve also done in 2014-2020, in various forms.

2.1 Today’s Questions of Interest

A. What is the distribution of pulse rates among students in 431 since 2014?

B. Does the distribution of student heights change materially over time?

C. Do taller people appear to have paid less for their most recent haircut?

D. Does the relationship between height and haircut price look the same within each sex?

E. For the 2021 students specifically, do students have a more substantial tobacco history if they prefer to speak English or a language other than English?

2.2 Read in data from .csv file

quicksur_raw <- read_csv("data/quick_survey_2021.csv",
                         show_col_types = FALSE) %>%
    clean_names()

2.3 What do we have?

quicksur_raw
glimpse(quicksur_raw)
Rows: 440
Columns: 23
$ student     <dbl> 202158, 202157, 202156, 202155, 202154, 202153, 202152, 20~
$ glasses     <chr> "n", "n", "y", "y", "y", "y", "y", "y", "y", "n", "y", "n"~
$ pulse       <dbl> 82, 72, 76, 66, 72, 75, 96, 92, 64, 84, 60, 44, 65, 84, 80~
$ english     <chr> "n", "y", "y", "y", "n", "y", "y", "y", "y", "y", "y", "y"~
$ smoke       <dbl> 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1~
$ statsofar   <dbl> 5, 4, 5, 5, 5, 6, 3, 5, 7, 3, 5, 5, 5, 5, 4, 5, 4, 5, 5, 4~
$ h_left      <dbl> 0, 0, 0, 3, 1, 3, 4, 0, 0, 0, 1, 7, 1, 13, 0, 2, 0, 16, 16~
$ h_right     <dbl> 10, 10, 20, 7, 15, 17, 16, 17, 19, 10, 19, 5, 12, 0, 15, 1~
$ handedness  <chr> "1", "1", "1", "0.4", "0.88", "0.7", "0.6", "1", "1", "1",~
$ love_htcm   <dbl> 183, 183, 178, 178, 175, 188, 191, 188, 188, 180, 180, 180~
$ statfuture  <dbl> 6, NA, 7, 7, 7, 7, 5, 5, 7, 6, 7, 7, 5, 5, 6, 6, 6, 5, 7, ~
$ haircut     <dbl> 35, 22, 60, 30, 40, 2, 50, 0, 40, 18, 0, 0, 70, 35, 20, 25~
$ lecture     <dbl> 1, 1, 4, 5, 5, 3, 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3~
$ alone       <dbl> 1, 3, 4, 4, 4, 4, 2, 3, 3, 4, 4, 4, 4, 2, 3, 4, 2, 4, 4, 2~
$ height_in   <dbl> 67.0, 71.0, 68.0, 72.0, 70.0, 60.0, 67.0, 71.0, 65.0, 62.0~
$ hand_span   <dbl> 20.00, 20.00, 17.50, 23.50, 22.00, 16.50, 18.00, 22.50, 21~
$ favcolor    <chr> "blue", "green", "blue", "blue", "light blue", "blue", "gr~
$ lastsleep   <dbl> 4.0, 6.5, 8.0, 7.0, 6.0, 9.0, 8.0, 7.0, 6.0, 5.0, 7.5, 8.0~
$ sex         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ ageguess    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ year        <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021~
$ lovetrueage <dbl> 54.5, 54.5, 54.5, 54.5, 54.5, 54.5, 54.5, 54.5, 54.5, 54.5~
$ lovetrueht  <dbl> 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190~

2.4 Counting Categories

quicksur_raw %>% count(glasses)
quicksur_raw %>% 
    filter(year == "2021") %>%
    tabyl(favcolor)

2.5 Summarizing Quantities

summary(quicksur_raw$pulse)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  30.00   64.00   72.00   73.31   80.00  110.00      73 

3 Manage the data into qsdat

3.1 Variables we’ll look at today

Based on Questions A-E, we need to include the following variables in our analytic data frame.

  • student: student identification (numerical code)
  • year: indicates year when survey was taken (August)
  • english: y = prefers to speak English, else n
  • smoke: 1 = never smoker, 2 = quit, 3 = current
  • pulse: pulse rate (beats per minute)
  • height_in: student’s height (in inches)
  • haircut: price of student’s last haircut (in $)
  • sex: f (for female) or m (for male)

3.2 Select our variables from quicksur_raw

qsdat <- quicksur_raw %>%
    select(student, year, english, smoke, 
           pulse, height_in, haircut, sex)
qsdat

3.3 Initial Numeric Summaries

  • Is everything the “type” of variable it should be? Are we getting the summaries we expect?
summary(qsdat)
    student            year        english              smoke      
 Min.   :201401   Min.   :2014   Length:440         Min.   :1.000  
 1st Qu.:201620   1st Qu.:2016   Class :character   1st Qu.:1.000  
 Median :201818   Median :2018   Mode  :character   Median :1.000  
 Mean   :201801   Mean   :2018                      Mean   :1.082  
 3rd Qu.:202015   3rd Qu.:2020                      3rd Qu.:1.000  
 Max.   :202158   Max.   :2021                      Max.   :3.000  
                                                    NA's   :2      
     pulse          height_in       haircut           sex           
 Min.   : 30.00   Min.   :57.0   Min.   :  0.00   Length:440        
 1st Qu.: 64.00   1st Qu.:64.0   1st Qu.: 13.50   Class :character  
 Median : 72.00   Median :67.0   Median : 20.00   Mode  :character  
 Mean   : 73.31   Mean   :67.2   Mean   : 29.41                     
 3rd Qu.: 80.00   3rd Qu.:70.0   3rd Qu.: 39.00                     
 Max.   :110.00   Max.   :77.5   Max.   :250.00                     
 NA's   :73       NA's   :7      NA's   :9                          
  • Categorical variables should list the categories, with associated counts. To accomplish this, the variable needs to be represented in R with a factor, rather than as a character or numeric variable.
  • Quantitative variables should show the minimum, median, mean, maximum, etc.

3.4 Change the categorical variables described with numbers to factors

We want the year and smoke information treated as categorical, rather than as quantitative.

qsdat <- qsdat %>%
    mutate(year = as_factor(year),
           smoke = as_factor(smoke))

3.5 Change each character variable to a factor instead

We want to be able to summarize english and sex. We can use a little coding trick to change all of the character variables to factors.

qsdat <- qsdat %>%
    mutate(across(where(is_character), as_factor)) 
  • The across(where()) syntax tells R to change everything that gives a TRUE response to “is this a character variable?” into a factor variable.

3.6 Recheck the summaries and do range checks

  • Do these summaries make sense?
  • Are the minimum and maximum values appropriate?
  • How much missingness are we to deal with?
summary(qsdat)
    student            year    english     smoke         pulse       
 Min.   :201401   2020   :67   n   : 85   1   :409   Min.   : 30.00  
 1st Qu.:201620   2016   :64   y   :352   2   : 22   1st Qu.: 64.00  
 Median :201818   2019   :61   NA's:  3   3   :  7   Median : 72.00  
 Mean   :201801   2021   :58              NA's:  2   Mean   : 73.31  
 3rd Qu.:202015   2018   :51                         3rd Qu.: 80.00  
 Max.   :202158   2015   :49                         Max.   :110.00  
                  (Other):90                         NA's   :73      
   height_in       haircut         sex     
 Min.   :57.0   Min.   :  0.00   f   :125  
 1st Qu.:64.0   1st Qu.: 13.50   m   :129  
 Median :67.0   Median : 20.00   NA's:186  
 Mean   :67.2   Mean   : 29.41             
 3rd Qu.:70.0   3rd Qu.: 39.00             
 Max.   :77.5   Max.   :250.00             
 NA's   :7      NA's   :9                  
  • Are we getting counts for all variables that are categorical?
    • Do the category levels make sense?
  • Are we getting means and medians for all variables that are quantities?
    • Do the minimum and maximum values make sense for each of these quantities?
  • Which variables have missing data, as indicated by NA's?
  • Just to fill in the gap left by the table above, we might ask how many students we had responding each year.
qsdat %>% count(year)

4 Question A (Pulse Rates of our Students)

4.1 A first try at a histogram

  • What is the distribution of student pulse rates?
ggplot(data = qsdat, aes(x = pulse)) +
    geom_histogram(fill = "green", col = "blue")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 73 rows containing non-finite values (stat_bin).

How might we describe this distribution?

  • What is the center?
  • How much of a range around that center do we see? How spread out are the data?
  • What is the shape of this distribution?
    • Is it symmetric, or is it skewed to the left or to the right?

4.2 Fundamental Numerical Summaries

summary(qsdat$pulse)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  30.00   64.00   72.00   73.31   80.00  110.00      73 
  • How do the summary statistics help us describe the data?
  • Do the values make sense to you?
qsdat %$% mosaic::favstats(pulse)
  • How can we measure variation (spread) in our data?

4.3 Improving our Histogram

qsdat %>%
    filter(complete.cases(pulse)) %>%
    ggplot(data = ., aes(x = pulse)) +
    geom_histogram(fill = "seagreen", col = "white",
                   bins = 20) +
    labs(title = "Pulse Rates of Dr. Love's students",
         subtitle = "2014 - 2021",
         y = "Number of Students",
         x = "Pulse Rate (beats per minute)")

  • How did we deal with missing data?
  • How did we add axis labels and titles to the plot?
  • What is the distinction between fill and col in the histogram?
  • What’s the best choice for number of bins in the histogram?

5 Question B (Student Heights over Time)

  • Does the distribution of student heights change materially over time, in the period of 2014-2021?

5.1 Five-Number Summary of Heights for Each Year

qsdat %>%
    filter(complete.cases(height_in)) %>%
    group_by(year) %>%
    summarize(n = n(),
              min = min(height_in),
              q25 = quantile(height_in, 0.25),
              median = median(height_in), 
              q75 = quantile(height_in, 0.75),
              max = max(height_in))
  • Key summaries based on percentiles / quantiles
    • minimum = 0th, maximum = 100th, median = 50th
    • quartiles (25th, 50th and 75th percentiles)
    • Range is maximum - minimum
    • IQR (inter-quartile range) is 75th - 25th percentile
  • These summaries are generally more resistant to outliers than mean, standard deviation
  • Form the elements of a boxplot (box-and-whisker plot)

5.2 Comparison Boxplot of Heights by Year

qsdat %>%
    filter(complete.cases(height_in)) %>%
    ggplot(data = ., aes(x = year, y = height_in)) +
    geom_boxplot() +
    labs(title = "Heights of Dr. Love's students, by year",
         subtitle = "2014 - 2021",
         x = "Year",
         y = "Height (in inches)")

  • How did we deal with missing data here?
  • Box covers the middle half of the data (25th and 75th percentiles)
  • Solid line indicates median
  • Whiskers extend from the quartiles to the most extreme values that are not judged by Tukey’s “fences” method to be candidate outliers
    • Fences are drawn at 25th percentile - 1.5 IQR and 75th percentile + 1.5 IQR
  • Are any values candidate outliers by this method? For which years?
  • Was it important to change year to a factor instead of a numeric variable earlier?

5.3 Adding a Violin Plot to the Boxplot

  • When we’d like to better understand the shape of a distribution, we can amplify the boxplot.
qsdat %>%
    filter(complete.cases(height_in)) %>%
    ggplot(data = ., aes(x = year, y = height_in)) +
    geom_violin() +
    geom_boxplot(aes(fill = year), width = 0.3) +
    guides(fill = "none") +
    scale_fill_viridis_d() +
    labs(title = "Heights of Dr. Love's students, by year",
         subtitle = "2014 - 2021",
         x = "Year",
         y = "Height (in inches)")

  • How did we change the boxplot when we added the violin?
  • What would happen if we added the boxplot first and the violin second?
  • What does guides(fill = "none") do?
  • What does scale_fill_viridis_d() do?

5.4 Table of Means and Standard Deviations

qsdat %>%
    filter(complete.cases(height_in)) %>%
    group_by(year) %>%
    summarize(n = n(),
              mean = mean(height_in),
              sd = sd(height_in))

5.5 The Empirical Rule for Approximately Normal Distributions

  • If the data followed a Normal (or Gaussian) distribution, then approximately 68% of heights would be within 1 SD of the mean, and approximately 95% of heights would be within 2 SD of the mean.

In 2021, we had 55 students whose height_in was available.

  • Do those 55 students appear to be well modeled by a Normal distribution, based on the histogram below and the boxplot with violin we saw previously?
qsdat %>%
    filter(complete.cases(height_in)) %>%
    filter(year == "2021") %>%
    ggplot(data = ., aes(x = height_in)) +
    geom_histogram(fill = "salmon", col = "white",
                   binwidth = 1) +
    labs(title = "Heights of Dr. Love's students",
         subtitle = "2021 (n = 55 students with height data)",
         y = "Number of Students",
         x = "Height (inches)")

  • Of those, how many were within 1 SD of the mean, in other words, how many were in between 67.75 - 4.13 = 63.62 inches and 67.75 + 4.13 = 71.88 inches?
qsdat %>% filter(complete.cases(height_in)) %>%
    filter(year == "2021") %>%
    count(height_in >= 63.62 & height_in <= 71.88)
34/(34+21)
[1] 0.6181818
  • How many of the 55 height_in values gathered in 2021 were between 67.75 - (2)4.13 = 59.49 and 67.75 + 2(4.13) = 76.01?
qsdat %>% filter(complete.cases(height_in)) %>%
    filter(year == "2021") %>%
    count(height_in >= 59.49 & height_in <= 76.01)
54/(54+1)
[1] 0.9818182

6 Question C (Heights and Haircut Prices)

  • Do taller people pay less for their haircuts?

6.1 Converting inches to centimeters

qsdat <- qsdat %>%
    mutate(height_cm = height_in * 2.54)

6.2 Initial Numerical Summaries

qsdat %>%
    filter(complete.cases(haircut, height_cm)) %>%
    summarize(n = n(), median(haircut), 
              median(height_cm), median(height_in))

6.3 A First Scatterplot

  • We’ll include the straight line from a linear model, in red.
qsdat %>% filter(complete.cases(height_cm, haircut)) %>%
    ggplot(., aes(x = height_cm, y = haircut)) +
    geom_point(alpha = 0.3) + 
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = TRUE) +
    labs(x = "Height (in cm)",
         y = "Price of last haircut (in $)",
         title = "Do taller people pay less for haircuts?")

6.4 What is the (Pearson) correlation of height and haircut price?

qsdat %>% 
    select(height_in, height_cm, haircut) %>%
    filter(complete.cases(.)) %>%
    cor(.) %>%
    kable(digits = 2)
height_in height_cm haircut
height_in 1.00 1.00 -0.19
height_cm 1.00 1.00 -0.19
haircut -0.19 -0.19 1.00

6.5 What is the straight line regression model?

m1 <- qsdat %>% 
    filter(complete.cases(haircut, height_cm)) %$%
    lm(haircut ~ height_cm)

m1

Call:
lm(formula = haircut ~ height_cm)

Coefficients:
(Intercept)    height_cm  
   130.4566      -0.5916  
summary(m1)

Call:
lm(formula = haircut ~ height_cm)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.796 -17.780  -5.272   7.966 218.718 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 130.4566    25.3486   5.147 4.06e-07 ***
height_cm    -0.5916     0.1482  -3.991 7.73e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 29.38 on 427 degrees of freedom
Multiple R-squared:  0.03597,   Adjusted R-squared:  0.03371 
F-statistic: 15.93 on 1 and 427 DF,  p-value: 7.734e-05

6.6 Compare the lm fit to a loess smooth curve?

qsdat %>% filter(complete.cases(height_cm, haircut)) %>%
    ggplot(., aes(x = height_cm, y = haircut)) +
    geom_point(alpha = 0.3) + 
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = FALSE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) +
    labs(x = "Height (in cm)",
         y = "Price of last haircut (in $)",
         title = "Do taller people pay less for haircuts?")

  • Does a linear model appear to fit these data well?

7 Question D (Height, Haircut Price and Sex)

  • Does the haircut-height relationship look the same within each sex?
qsdat %>% tabyl(sex)

7.1 When do we see the missing sex data?

qsdat %>% tabyl(sex, year)

7.2 Some Numerical Summaries

qsdat %>%
    filter(complete.cases(sex, haircut, height_cm)) %>%
    group_by(sex) %>%
    summarize(n = n(), median(haircut), median(height_cm))

7.3 A scatterplot, identifying the sex of each subject

qsdat %>% filter(complete.cases(height_cm, haircut, sex)) %>%
    ggplot(., aes(x = height_cm, y = haircut, 
                  group = sex, col = sex)) +
    geom_point() + 
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    labs(x = "Height (in cm)",
         y = "Price of last haircut (in $)",
         title = "Do taller people pay less for haircuts?",
         subtitle = "accounting for sex")

7.4 An alternative approach to identifying sex, with facet_wrap

qsdat %>% filter(complete.cases(height_cm, haircut, sex)) %>%
    ggplot(., aes(x = height_cm, y = haircut, col = sex)) +
    geom_point() + 
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = FALSE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) +
    facet_wrap(~ sex) +
    labs(x = "Height (in cm)",
         y = "Price of last haircut (in $)",
         title = "Do taller people pay less for haircuts?",
         subtitle = "accounting for sex")

7.5 A third scatterplot looking at the same question

qsdat %>% filter(complete.cases(height_cm, haircut, sex)) %>%
    ggplot(., aes(x = height_cm, y = haircut, col = sex)) +
    geom_point() + 
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = FALSE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) +
    facet_grid(sex ~ .) +
    guides(col = "none") +
    labs(x = "Height (in cm)",
         y = "Price of last haircut (in $)",
         title = "Do taller people pay less for haircuts?",
         subtitle = "accounting for sex")

8 Question E (Tobacco and Language Preference in 2021)

  • Do students in the 2021 class have a more substantial history of tobacco use if they prefer to speak a language other than English?
qsdat2021 <- qsdat %>% 
    filter(year == "2021") %>%
    select(student, year, english, smoke)
summary(qsdat2021)
    student            year    english smoke 
 Min.   :202101   2021   :58   n:12    1:51  
 1st Qu.:202115   2014   : 0   y:46    2: 4  
 Median :202130   2015   : 0           3: 3  
 Mean   :202130   2016   : 0                 
 3rd Qu.:202144   2017   : 0                 
 Max.   :202158   2018   : 0                 
                  (Other): 0                 

No missing data.

8.1 Tabulating the categorical variables individually

qsdat2021 %>% tabyl(english)
qsdat2021 %>% tabyl(smoke) %>% adorn_pct_formatting()
  • What does adorn_pct_formatting() do?

8.2 Creating a cross-classification table (2 x 3)

qsdat2021 %>% tabyl(english, smoke)

8.3 Recode the smoke levels to more meaningful names in tobacco

qsdat2021 <- qsdat2021 %>% 
    mutate(tobacco = fct_recode(smoke, 
            "Never" = "1", "Quit" = "2", "Current" = "3"))

Check our work?

qsdat2021 %>% count(smoke, tobacco)
  • Everyone with smoke = 1 has tobacco as Never, etc.

8.4 Restate the cross-tabulation

Now we’ll use this new variable, and this time, add row and column totals.

qsdat2021 %>% tabyl(english, tobacco) %>% 
    adorn_totals(where = c("row", "col"))

Don’t forget to close your file with the session information.

9 Session Information

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggridges_0.5.3    mosaicData_0.20.2 ggformula_0.10.1  ggstance_0.3.5   
 [5] Matrix_1.3-4      lattice_0.20-44   forcats_0.5.1     stringr_1.4.0    
 [9] dplyr_1.0.7       purrr_0.3.4       readr_2.0.1       tidyr_1.1.3      
[13] tibble_3.1.3      ggplot2_3.3.5     tidyverse_1.3.1   magrittr_2.0.1   
[17] knitr_1.33        janitor_2.1.0    

loaded via a namespace (and not attached):
 [1] nlme_3.1-152      fs_1.5.0          lubridate_1.7.10  bit64_4.0.5      
 [5] httr_1.4.2        tools_4.1.1       backports_1.2.1   bslib_0.2.5.1    
 [9] utf8_1.2.2        R6_2.5.1          mgcv_1.8-36       DBI_1.1.1        
[13] colorspace_2.0-2  withr_2.4.2       tidyselect_1.1.1  gridExtra_2.3    
[17] leaflet_2.0.4.1   bit_4.0.4         compiler_4.1.1    cli_3.0.1        
[21] rvest_1.0.1       xml2_1.3.2        ggdendro_0.1.22   labeling_0.4.2   
[25] sass_0.4.0        mosaicCore_0.9.0  scales_1.1.1      digest_0.6.27    
[29] rmarkdown_2.10    pkgconfig_2.0.3   htmltools_0.5.1.1 labelled_2.8.0   
[33] dbplyr_2.1.1      highr_0.9         htmlwidgets_1.5.3 rlang_0.4.11     
[37] readxl_1.3.1      rstudioapi_0.13   jquerylib_0.1.4   farver_2.1.0     
[41] generics_0.1.0    jsonlite_1.7.2    crosstalk_1.1.1   vroom_1.5.4      
[45] Rcpp_1.0.7        munsell_0.5.0     fansi_0.5.0       lifecycle_1.0.0  
[49] stringi_1.7.3     yaml_2.2.1        snakecase_0.11.0  MASS_7.3-54      
[53] plyr_1.8.6        grid_4.1.1        parallel_4.1.1    ggrepel_0.9.1    
[57] crayon_1.4.1      haven_2.4.3       splines_4.1.1     hms_1.1.0        
[61] pillar_1.6.2      reprex_2.0.1      glue_1.4.2        evaluate_0.14    
[65] modelr_0.1.8      vctrs_0.3.8       tzdb_0.1.2        tweenr_1.0.2     
[69] cellranger_1.1.0  gtable_0.3.0      polyclip_1.10-0   assertthat_0.2.1 
[73] xfun_0.25         ggforce_0.3.3     broom_0.7.9       viridisLite_0.4.0
[77] mosaic_1.8.3      ellipsis_0.3.2