1 Goals and Objectives

1.1 Three Research Questions

  1. Among counties ranked in the 2021 County Health Rankings data, what is the nature of the association between the number of counties in a state, and the state’s population?
  2. Which states have larger populations than we’d expect, based on their number of counties?
  3. Which states have smaller populations than we’d expect, based on their county count?

1.2 What is new here?

  1. Using group_by and summarize to aggregate hierarchical data to a new level of analysis (from counties to “states.”)
  2. Labeling all of the points within a scatterplot.
  3. Switching from scientific notation to “comma” notation in a ggplot.
  4. The complete set of glance() results for a linear model.
  5. Answering a scientific question about the efficacy of a model in terms of poorly-fit points.
  6. Assessing the largest outlier in a regression model with standardized residuals, studentized residuals, and an outlier test provided by outlierTest() from the car package.
  7. Using the Box-Cox ladder (family) of power transformations and the boxCox() and powerTransform() functions provided by the car package.
  8. Developing a Cleveland dot plot to visualize the results of a situation where we might otherwise use a bar chart.

2 Setup

2.1 The YAML code used here

The YAML code used in this example is listed below…

title: "States, Counties and Population"
author: "Thomas E. Love"
date: "An Example for 431 Class 13: 2021-10-05"
output: 
    html_document:
        code_folding: show
        toc: TRUE
        toc_float: TRUE
        number_sections: TRUE
        df_print: paged

2.2 Packages and Settings

knitr::opts_chunk$set(comment=NA)

library(broom)
library(car) # note: new today
library(glue)
library(janitor)
library(tidyverse)

theme_set(theme_light())

3 Data Ingest and Key Variables

We’re looking at the 2021 County Health Rankings data you’re using for Project A.

  • Data are available at the county level.
  • Variable v051_rawvalue describes the county’s population.
  • county_ranked indicates if the county is ranked by County Health Rankings.
data_url <- "https://www.countyhealthrankings.org/sites/default/files/media/document/analytic_data2021.csv"
chr_2021_raw <- read_csv(data_url, skip = 1)

chr_2021_today <- chr_2021_raw %>% 
    select(fipscode, state, county, 
           county_ranked, v051_rawvalue) %>%
    filter(county_ranked == 1)

3.1 State-Specific Counts

For each of the “states” we are counting:

  • the number of ranked counties in 2021 CHR data
  • the population across those counties

Here, we use group_by() and summarize() to help us change the unit of analysis from the county to the “state” (I use “state” since DC is included.)

table1 <- chr_2021_today %>% 
    group_by(state) %>%
    summarize(ranked_counties = n(), 
              state_popn = sum(v051_rawvalue))

table1
  • table1 will be our data set for the remaining work in this Example.
  • The ranked counties identified here, in total, account for 3081 counties and 328175371 people.

4 Modeling: Approach A

4.1 A labeled scatterplot

Model state population as a function of its number of ranked counties.

ggplot(table1, aes(x = ranked_counties, 
                   y = state_popn, label = state)) +
    geom_label(size = 3) +
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = TRUE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) + 
    labs(title = "Approach A: No Transformation",
         subtitle = glue('Pearson correlation of state_popn and ranked_counties is {round_half_up(cor(table1$state_popn, table1$ranked_counties),2)}'),
         caption = "Note that, for example, 1e+07 = 10 million residents")

4.2 Eliminating Scientific Notation

Note the use of scales::comma here.

ggplot(table1, aes(x = ranked_counties, 
                   y = state_popn, label = state)) +
    geom_label(size = 3) +
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = TRUE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) + 
    scale_y_continuous(labels = scales::comma) +
    labs(title = "Approach A: No Transformation",
         subtitle = glue('Pearson correlation of state_popn and ranked_counties is {round_half_up(cor(table1$state_popn, table1$ranked_counties),2)}'))

4.3 Model A

modelA <- lm(state_popn ~ ranked_counties, data = table1)
tidy(modelA, conf.int = TRUE, conf.level = 0.95)

5 What is in the glance() function for a linear model?

glance(modelA)
  • \(R^2\) or r.squared is the proportion of variation in our outcome (here, state_popn) accounted for by the model using ranked_counties as a predictor. As you might expect, this is the square of the Pearson correlation coefficient.

  • Adjusted \(R^2\), or adj.r.squared is not a proportion of anything, but rather an index that penalizes the \(R^2\) value for the number of predictors in the model. This helps combat against a certain form of overfitting, where the greediness of \(R^2\) (it never decreases even if you add predictors with no association with the outcome) causes model-builders to overstate the quality of fit.

  • If \(n\) is the number of observations used to fit the model, and \(p\) is the number of predictors (not counting the intercept term) used in the model, then:

\[ R^2_{adj} = 1 - \left[ \frac{(1 - R^2)(n - 1)}{n - p - 1} \right] \]

  • The estimated standard error of the residuals, .sigma or \(\hat{\sigma}\).
  • The test statistic (F test), called statistic, which helps generate the …
  • p value for the ANOVA F test, called p.value.
  • df = degrees of freedom from the numerator of the ANOVA F test statistic, which is essentially a count of the non-intercept coefficients estimated by the model. This was \(p\) in our discussion of adjusted \(R^2\).
  • logLik = the log-likelihood of the model, which will help us estimate …
  • AIC = the Akaike Information Criterion for the model (lower values indicate better fits)
  • BIC = the Bayesian Information Criterion for the model (lower values indicate better fits)
  • deviance = the model deviance, which we’ll skip for now…
  • df.residual = degrees of freedom from the residual (denominator) of the ANOVA F test. This is \(n - p - 1\) from our adjusted \(R^2\) discussion.
  • nobs = number of observations used to fit the model. This is \(n\).

6 Assessing Residuals (Model A)

6.1 Basic Two Residual Plots

par(mfrow = c(1,2))
plot(modelA, which = c(1:2))

  • Consider what these plots have to say about the assumptions of linearity, constant variance and Normality.
  • Is it reasonable to assume that the data and model A support the assumptions of linear regression?

Which states have the largest residuals? Looks like row numbers 5, 10 and 35

modA_aug <- augment(modelA, data = table1)

slice(modA_aug, c(5, 10, 35))

6.2 Addressing Question 2

Which states have more population than we’d expect, based on their number of counties?

Here are the six states with the largest positive residual from Model A.

modA_aug %>% arrange(desc(.resid)) %>% head()
  • For example, California (CA) has 58 counties, which creates a prediction of 6.255 million people instead of the 39.5 million in population it actually has, so the residual is 33.256 million people.
  • What are the characteristics of these states? Note that the list below shows the top six states in terms of population.
modA_aug %>% arrange(desc(state_popn)) %>% head()

6.3 Addressing Question 3

Which states have smaller populations than we’d expect, based on their county count?

Here are the six states with the largest negative residual from Model A.

modA_aug %>% arrange(.resid) %>% head()
  • For example, Kansas (KS) has 104 counties, which creates a prediction of 9.669 million people instead of the 2.912 million in population Kansas’ counties have, so the residual is 6.756 million people.

6.4 Standardized Residuals

  • What does a standardized residual of 5.1 (for California) mean in a sample of 51?
  • The standardized residual is compared to a Normal distribution with mean 0 and standard deviation 1.
pnorm(q = 5.1, mean = 0, sd = 1, lower.tail = FALSE)
[1] 1.698267e-07
  • So, 1.7e-07 or 0.00000017 is the probability that a Normal distribution with mean 0 and standard deviation 1 would give us a result of 5.1 or higher.
  • 5 standard deviations above the mean should happen less often than 1.7 times in 10,000,000 attempts.
  • Can we give some level of understanding/context to that scale?
    • US population average deaths due to lightning per year is usually around 30, so the population yearly average is about 1 in 10 million.
  • Sometimes it’s helpful to think of a microlife - a unit of risk representing half an hour change of life expectancy, 1,000,000 half-hours is about 57 years and that roughly corresponds to a lifetime of adult exposure.
    • For instance, taking a statin is generally gaining you about 1 microlife per day of exposure, based on estimated life expectancy effects for men and women aged 35 years.
    • More examples are available on the microlife Wikipedia page and similar information is available on a [micromort] which is a unit of risk defined as a 1 in a million chance of death.
    • 1.7 ten-millionths of an average adult lifetime is about a 1/6 of a microlife, or about a 5 minute change in adult life expectancy.
    • The risk here is roughly equivalent to the additional risk expectancy associated with traveling 1 mile by motorcycle.

6.5 Standardized vs. Studentized Residuals

  • The residual for the \(i\)th observation is \(e_i\) = observed - predicted value for the \(i\)th observation
  • We have at least two different types of standardized residuals, in each case obtained by dividing the residuals by an estimate of its standard deviation.
    • Standardized residuals (also called internally studentized residuals) are calculated by taking into account the data point’s leverage on the model.
    • Studentized residuals (also called externally studentized residuals) are calculated from a model fit to every data point except the one you’re evaluating.

Technically, a standardized residual \(r_i\) (symbolized as .std.resid in augment() output from the broom package is calculated as:

\[ r_i = \frac{e_i}{MSE \sqrt(1-h_i)} \] where \(h_i\) is the leverage of the \(i\)th observation.

  • The leverage is symbolized by .hat in our augment() output from the broom package. It measures the degree to which the observation is an outlier along the predictor.
  • You can also obtain standardized residuals with rstandard() and studentized residuals with rstudent() applied to your model.
  • As we’ll see, the outlierTest() function uses a slightly different approaches to calculate the studentized results.
  • We usually use studentized residuals in testing outliers.
rstandard(modelA)[5]; rstudent(modelA)[5]
      5 
5.09585 
       5 
7.356453 

6.6 The outlierTest() function from car

Another approach is to use the outlierTest() function (available in the car package) to see if the most unusual value (in absolute value) is a surprise given the sample size. This uses the studentized residual rather than the standardized residual, which is just another way of measuring the outlier’s distance, but now not including that point in the calculation of the overall variation in the residuals.

outlierTest(modelA)
  rstudent unadjusted p-value Bonferroni p
5 7.356453         2.0938e-09   1.0678e-07

Our conclusion from the Bonferroni p value here (which is very, very small) is that there’s evidence to support the belief that we could have a serious problem with Normality, in that a studentized residual of 7.36 is out of the range we might reasonably expect to see given a Normal distribution of errors and 51 observations.

At any rate, California looks like a real outlier. Can we do better than our Model A?

7 Considering Potential Transformations

7.1 The Box-Cox ladder (family) of Power Transformations

\(\lambda\) Transformation Easy to Interpret
-2 \(1/y^2\), the inverse square Not typically
-1 \(1/y\), the inverse OK
-0.5 \(1/\sqrt{y}\), the inverse square root Not typically
0 \(log(y)\), the logarithm OK
0.5 \(\sqrt{y}\), the square root OK
1 y, the original (raw) outcome OK
2 \(y^2\), the square OK
3 \(y^3\), the cube Not typically
  • The easiest to interpret of these transformations are the inverse, logarithm, square root, and square. Use the others only in dire circumstances.

7.2 Using the boxCox() function from the car package

We’ll use the boxCox() function from the car package today, and also the powerTransform() function from the same package to help us identify a potential choice of power transformation.

boxCox(modelA)

powerTransform(modelA)
Estimated transformation parameter 
        Y1 
0.02489896 

So what transformation is this plot suggesting we try?

8 Approach B

Model the logarithm of state population as a function of its number of ranked counties.

ggplot(table1, aes(x = ranked_counties, y = log(state_popn), 
                   label = state)) +
    geom_label(size = 3) +
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = TRUE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) + 
    labs(title = "Approach B",
         subtitle = glue('Pearson correlation of log(state_popn) and ranked_counties is {round_half_up(cor(log(table1$state_popn), table1$ranked_counties),2)}'))

modelB <- lm(log(state_popn) ~ ranked_counties, 
             data = table1)
tidy(modelB, conf.int = TRUE, conf.level = 0.95)
glance(modelB)

8.1 Assessing Residuals (Model B)

par(mfrow = c(1,2))
plot(modelB, which = c(1:2))

  • Consider what these plots have to say about the assumptions of linearity, constant variance and Normality.
  • Is it reasonable to assume that the data and model B support the assumptions of linear regression?

Now, which states have the largest residuals? Still looks like row numbers 5, 10 and 35

modB_aug <- augment(modelB, data = table1)

slice(modB_aug, c(5, 10, 35))

8.2 Studentized Residuals?

The largest standardized residual is still California, but now it is 2.69. Is that a plausible value for the maximum of 51 values if the (standardized) residuals actually follow a Normal distribution?

pnorm(2.69, mean = 0, sd = 1, lower.tail = FALSE)
[1] 0.003572601

Or we could look at the studentized residual instead.

outlierTest(modelB)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
  rstudent unadjusted p-value Bonferroni p
5 2.888511          0.0057913      0.29536

Our conclusion from the Bonferroni p value here (which is larger than 0.05) is that there’s no evidence in the California value to suggest a serious problem with Normality. A maximum studentized residual of 2.89 is out of the range we might reasonably expect to see given a Normal distribution of errors and 51 observations.

This seems more reasonable. Could we do better still by taking the log of the predictor, as well?

9 Approach C

Model the logarithm of state population as a function of the logarithm of its number of ranked counties.

ggplot(table1, aes(x = log(ranked_counties), 
                   y = log(state_popn), label = state)) +
    geom_label(size = 3) +
    geom_smooth(method = "lm", col = "red",
                formula = y ~ x, se = TRUE) +
    geom_smooth(method = "loess", col = "blue",
                formula = y ~ x, se = FALSE) + 
    labs(title = "Approach C",
         subtitle = glue('Pearson correlation of log(state_popn) and log(ranked_counties) is {round_half_up(cor(log(table1$state_popn), log(table1$ranked_counties)),2)}'))

modelC <- lm(log(state_popn) ~ log(ranked_counties), 
             data = table1)
tidy(modelC, conf.int = TRUE, conf.level = 0.95)
glance(modelC)

9.1 Assessing Residuals (Model C)

par(mfrow = c(1,2))
plot(modelC, which = c(1:2))

  • Again, consider what these plots have to say about the assumptions of linearity, constant variance and Normality.
  • Is it reasonable to assume that the data and model C support the assumptions of linear regression?

Now, which states have the largest residuals in absolute value? Row numbers 5, 29 and 42

modC_aug <- augment(modelC, data = table1)

slice(modC_aug, c(5, 29, 42))

9.2 Studentized Residuals?

The largest standardized residual is still California, but now it is 2.13. Is that a plausible value for the maximum of 51 values if the (standardized) residuals actually follow a Normal distribution?

pnorm(2.46, mean = 0, sd = 1, lower.tail = FALSE)
[1] 0.006946851

Or we could look at the studentized residual instead.

outlierTest(modelC)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
  rstudent unadjusted p-value Bonferroni p
5 2.599649           0.012363      0.63052

10 Comparing B to C

Since Models B and C predict the same outcome (log(state_popn)) we can compare them directly on in-sample performance using glance.

bind_rows(glance(modelB), glance(modelC)) %>%
    mutate(model = c("B", "C"))

Model C looks like the stronger performer on all five summaries we’ve discussed.

Summary Stronger Model
\(R^2\) C (0.313 vs. 0.303)
Adjusted \(R^2\) C (0.299 vs. 0.288)
\(\sigma\) C (0.875 vs. 0.882)
AIC C (135.1 vs. 135.8)
BIC C (140.9 vs. 141.6)

10.1 Addressing Question 2 again

Using model C, which states have larger population than we’d expect, based on their number of counties?

Here are the six states with the largest positive residual from Model C.

modC_aug %>% mutate(logpop = log(state_popn)) %>%
    select(state, ranked_counties, state_popn, logpop, 
           .fitted, .resid, everything()) %>%
    arrange(desc(.resid)) %>% head()

10.2 Addressing Question 3 again

Using model C, which states have smaller population than we’d expect, based on their number of counties?

Here are the six states with the largest negative residual from Model C.

modC_aug %>% mutate(logpop = log(state_popn)) %>%
    select(state, ranked_counties, state_popn, logpop, 
           .fitted, .resid, everything()) %>%
    arrange(.resid) %>% head()

Which states are the smallest by population?

modC_aug %>% arrange(state_popn) %>% head()

11 Approach D

Calculate population per county in each state and use a bar chart and then a Cleveland dot plot to visualize the result.

table1 <- table1 %>%
    mutate(county_size = state_popn / ranked_counties)

11.1 Population per County by State: A Bar Chart

Let’s use geom_col() to make a bar chart to show our continuous variable (county_size) broken down by a categorical variable (state).

ggplot(table1, aes(x = state, y = county_size)) +
    geom_col() 

How could we improve this plot?

ggplot(table1, aes(x = reorder(state, desc(county_size)), 
                   y = county_size)) +
    geom_col(fill = "dodgerblue", 
             col = "black") +
    scale_y_continuous(labels = scales::comma) +
    theme(axis.text.x = element_text(size = 5)) +
    labs(y = "Population per Ranked County",
         x = "U.S. State or District")

Still tough to work with, I think.

11.2 A Cleveland Dot Plot

Cleveland dot plots are an excellent alternative to a simple bar chart, especially when you are comparing more than a few items.

ggplot(table1, aes(x = county_size, 
                   y = reorder(state, county_size))) +
    geom_point() +
    theme_bw() +
    scale_x_continuous(labels = scales::comma, 
                       sec.axis = dup_axis()) +
    labs(y = "State", x = "Population per County")

table1 %>% arrange(county_size)

12 Summarizing

12.1 Questions 1-3

We’ll leave it to you to craft a reasonable set of conclusions on the issues of interest, in light of the output provided here.

12.2 What Was New Here?

  1. Using group_by and summarize to aggregate hierarchical data to a new level of analysis (from counties to “states.”)
  2. Labeling all of the points within a scatterplot.
  3. Switching from scientific notation to “comma” notation in a ggplot.
  4. The complete set of glance() results for a linear model.
  5. Answering a scientific question about the efficacy of a model in terms of poorly-fit points.
  6. Assessing the largest outlier in a regression model with standardized residuals, studentized residuals, and an outlier test provided by outlierTest() from the car package.
  7. Using the Box-Cox ladder (family) of power transformations and the boxCox() and powerTransform() functions provided by the car package.
  8. Developing a Cleveland dot plot to visualize the results of a situation where we might otherwise use a bar chart.

12.3 Session Information

sessioninfo::session_info()
- Session info ---------------------------------------------------------------
 setting  value                       
 version  R version 4.1.1 (2021-08-10)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2021-10-05                  

- Packages -------------------------------------------------------------------
 package     * version date       lib source        
 abind         1.4-5   2016-07-21 [1] CRAN (R 4.1.0)
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
 backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
 bit           4.0.4   2020-08-04 [1] CRAN (R 4.1.0)
 bit64         4.0.5   2020-08-30 [1] CRAN (R 4.1.0)
 broom       * 0.7.9   2021-07-27 [1] CRAN (R 4.1.0)
 bslib         0.3.0   2021-09-02 [1] CRAN (R 4.1.1)
 car         * 3.0-11  2021-06-27 [1] CRAN (R 4.1.0)
 carData     * 3.0-4   2020-05-22 [1] CRAN (R 4.1.0)
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
 cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
 colorspace    2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
 crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
 curl          4.3.2   2021-06-23 [1] CRAN (R 4.1.0)
 data.table    1.14.0  2021-02-21 [1] CRAN (R 4.1.0)
 DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
 digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
 dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
 fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
 farver        2.1.0   2021-02-28 [1] CRAN (R 4.1.0)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
 forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
 foreign       0.8-81  2020-12-22 [1] CRAN (R 4.1.0)
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
 generics      0.1.0   2020-10-31 [1] CRAN (R 4.1.0)
 ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
 glue        * 1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
 haven         2.4.3   2021-08-04 [1] CRAN (R 4.1.0)
 highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
 hms           1.1.0   2021-05-17 [1] CRAN (R 4.1.0)
 htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
 janitor     * 2.1.0   2021-01-05 [1] CRAN (R 4.1.0)
 jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.0)
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.1.0)
 knitr         1.34    2021-09-09 [1] CRAN (R 4.1.1)
 labeling      0.4.2   2020-10-20 [1] CRAN (R 4.1.0)
 lattice       0.20-44 2021-05-02 [1] CRAN (R 4.1.0)
 lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.1.0)
 lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.1.0)
 magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
 Matrix        1.3-4   2021-06-01 [2] CRAN (R 4.1.1)
 mgcv          1.8-36  2021-06-01 [2] CRAN (R 4.1.1)
 modelr        0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
 nlme          3.1-152 2021-02-04 [2] CRAN (R 4.1.1)
 openxlsx      4.2.4   2021-06-16 [1] CRAN (R 4.1.0)
 pillar        1.6.2   2021-07-29 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
 Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
 readr       * 2.0.1   2021-08-10 [1] CRAN (R 4.1.1)
 readxl        1.3.1   2019-03-13 [1] CRAN (R 4.1.0)
 reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
 rio           0.5.27  2021-06-21 [1] CRAN (R 4.1.0)
 rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
 rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)
 rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
 rvest         1.0.1   2021-07-26 [1] CRAN (R 4.1.0)
 sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.0)
 scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 4.1.0)
 stringi       1.7.4   2021-08-25 [1] CRAN (R 4.1.1)
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
 tibble      * 3.1.4   2021-08-25 [1] CRAN (R 4.1.1)
 tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.1.0)
 tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
 tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
 tzdb          0.1.2   2021-07-20 [1] CRAN (R 4.1.0)
 utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
 vroom         1.5.5   2021-09-14 [1] CRAN (R 4.1.1)
 withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
 xfun          0.25    2021-08-06 [1] CRAN (R 4.1.0)
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.1.0)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
 zip           2.2.0   2021-05-31 [1] CRAN (R 4.1.0)

[1] C:/Users/Thomas/Documents/R/win-library/4.1
[2] C:/Program Files/R/R-4.1.1/library