Load packages

Chapter 10 Homework

This homework gives you practice working with a measurement dataset, airlift_mass_repeatability.csv, which contains repeated measurements of “blank” air sampling filters.

A couple of notes to consider when reporting your answers. The microbalance used to make these measurements reads out to the nearest microgram (\(\mu g\)), which is 0.000001 \(g\) or 0.001 \(mg\). Be careful not to overstate your precision when reporting descriptive statistics: use the round() function so that you report nothing finer than 0.1 \(\mu g\) (0.0001 \(mg\)). Here is some example code that uses the across() function from dplyr:: to round numeric output to four decimal places (appropriate for \(mg\) units in this exercise):

dplyr::mutate(across(.cols = where(is.numeric), .fns = ~ round(.x, digits = 4)))
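As a quick illustration of the rounding step, here is a minimal sketch applied to a small made-up summary table. The tibble `stats` and its values are hypothetical and are not taken from the homework data. The rounding is written as a lambda because recent dplyr versions deprecate passing extra arguments to across() through `...`.

```r
library(dplyr)

# Hypothetical summary values in mg, for illustration only
stats <- tibble(
  id      = factor(c("A", "B")),
  mean_mg = c(97.812345, 95.498761),
  sd_mg   = c(0.00123456, 0.00098765)
)

# Round every numeric column to four decimal places (0.0001 mg = 0.1 ug);
# the factor column id is left untouched by where(is.numeric)
stats_rounded <- stats %>%
  mutate(across(.cols = where(is.numeric), .fns = ~ round(.x, digits = 4)))

print(stats_rounded)
```

Note that tibble printing may show fewer significant figures than are stored; the rounding controls the stored values, not the print width.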

Question 1

Import the airlift_mass_repeatability.csv file into a data frame called blanks and perform the following data wrangling in a single pipe:

  • retain only the first 3 columns of data;
  • rename the columns with the names date, id, and mass_g;
  • convert the date column vector into a date class object using lubridate::;
  • convert the id variable to a class factor (this can be accomplished using base::as.factor() or forcats::as_factor());
  • create a new column vector named mass_mg by rescaling the mass_g data (i.e., convert \(g\) to \(mg\) by multiplying mass_g by 1000).
## # A tibble: 5 × 3
##   date       id    mass_mg
##   <date>     <fct>   <dbl>
## 1 2020-03-11 41669    97.8
## 2 2020-03-11 41669    97.8
## 3 2020-03-11 41669    97.8
## 4 2020-03-11 41671    97.6
## 5 2020-03-11 41671    97.6
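One possible shape for the Question 1 pipe is sketched below. It runs on an inline stand-in for the CSV so that it is self-contained. The original column names (`Date`, `Filter`, `Mass`, `Notes`), the year-month-day date format, and the mass values are all assumptions; check the real file before adapting it.

```r
library(dplyr)
library(readr)
library(lubridate)

# Hypothetical stand-in for airlift_mass_repeatability.csv; the real file's
# column names, date format, and extra columns may differ
csv_text <- paste(
  "Date,Filter,Mass,Notes",
  "2020-03-11,41669,0.097812,ok",
  "2020-03-11,41669,0.097813,ok",
  "2020-03-11,41671,0.097601,ok",
  sep = "\n"
)

# I() marks the string as literal data (requires readr 2.0 or later);
# with the real file, pass the file path instead
blanks_demo <- read_csv(I(csv_text), col_types = "cidc") %>%
  select(1:3) %>%                              # keep the first 3 columns
  rename(date = 1, id = 2, mass_g = 3) %>%     # rename by position
  mutate(
    date    = ymd(date),                       # character -> Date
    id      = as.factor(id),                   # integer -> factor
    mass_mg = mass_g * 1000                    # g -> mg
  )

print(blanks_demo)
```

This sketch keeps mass_g alongside mass_mg; a final select() could drop it if only three columns are wanted.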

Question 2

2a. Are there any NAs present in the data frame?
2b. How many unique filter IDs are present in this data frame?
2c. How many samples are present for each filter ID? Hint: look up the dplyr::count() function.
2d. Over how long a period were these blank measurements made? Hint: this can be done in base R with max() - min(), or with lubridate::interval() %>% as.duration().

## [1] 0
## [1] 41669 41671 41667 41666 41668
## Levels: 41666 41667 41668 41669 41671
## # A tibble: 5 × 2
##   id        n
##   <fct> <int>
## 1 41666    78
## 2 41667    78
## 3 41668    78
## 4 41669    76
## 5 41671    78
## Time difference of 35 days

2a: There are zero NAs present in the data frame.

2b: There are 5 unique filter IDs present in the data frame (41666, 41667, 41668, 41669, and 41671).

2c: Filter 41669 has 76 samples; each of the other four filters (41666, 41667, 41668, and 41671) has 78.

2d: The measurements were made over a 35-day period.
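The four checks in Question 2 could be written along the following lines. This is a sketch on a small made-up tibble named `demo`, not the real blanks data, so the numbers it produces are illustrative only.

```r
library(dplyr)

# Hypothetical miniature of the blanks data frame
demo <- tibble(
  date    = as.Date(c("2020-03-11", "2020-03-11", "2020-03-20", "2020-03-25")),
  id      = factor(c("41669", "41669", "41671", "41671")),
  mass_mg = c(97.812, 97.813, 97.601, 97.602)
)

n_missing <- sum(is.na(demo))             # 2a: total NA count over all columns
n_ids     <- n_distinct(demo$id)          # 2b: number of unique filter IDs
per_id    <- count(demo, id)              # 2c: one row per id with its count n
span      <- max(demo$date) - min(demo$date)  # 2d: a difftime object

print(n_missing)
print(n_ids)
print(per_id)
print(span)
```

Subtracting two Date values returns a difftime, which is why the result prints as "Time difference of ... days".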

Question 3

Group the blanks data frame by id and calculate the mean, median, and standard deviation of mass_mg for each filter id. Hint: use group_by() %>% summarise() to do this efficiently.

## # A tibble: 5 × 4
##   id     mean median    sd
##   <fct> <dbl>  <dbl> <dbl>
## 1 41666  98.3   98.3 0.001
## 2 41667  95.5   95.5 0.001
## 3 41668  98.0   98.0 0.001
## 4 41669  97.8   97.8 0.001
## 5 41671  97.6   97.6 0.001
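A grouped summary of this kind might look like the sketch below. The tibble `demo` and its mass values are hypothetical stand-ins for blanks, chosen only so the code runs on its own.

```r
library(dplyr)

# Hypothetical repeated blank masses (mg) for two filter ids
demo <- tibble(
  id      = factor(rep(c("41669", "41671"), each = 3)),
  mass_mg = c(97.811, 97.812, 97.813, 97.600, 97.602, 97.604)
)

demo_stats <- demo %>%
  group_by(id) %>%
  summarise(
    mean   = mean(mass_mg),
    median = median(mass_mg),
    sd     = sd(mass_mg),      # sample standard deviation (n - 1 denominator)
    .groups = "drop"           # return an ungrouped tibble
  )

print(demo_stats)
```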

Question 4

Calculate the limit of detection (LOD) for this measurement method. Note: you will need to calculate the standard deviation for each filter id (as done in Question 3) and then estimate LOD from \(LOD = 3\cdot \sigma_b\), where \(\sigma_b\) is the standard deviation of the repeated blank measurements for a given filter id.

## # A tibble: 5 × 2
##   id      LOD
##   <fct> <dbl>
## 1 41666 0.003
## 2 41667 0.003
## 3 41668 0.003
## 4 41669 0.003
## 5 41671 0.003
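The relation \(LOD = 3\cdot \sigma_b\) can be applied per filter id as sketched below, again on a hypothetical tibble `demo` rather than the real data. The column name `LOD` follows the output shown above; everything else is an assumption for illustration.

```r
library(dplyr)

# Hypothetical repeated blank masses (mg) for two filter ids
demo <- tibble(
  id      = factor(rep(c("41669", "41671"), each = 3)),
  mass_mg = c(97.811, 97.812, 97.813, 97.600, 97.602, 97.604)
)

demo_lod <- demo %>%
  group_by(id) %>%
  summarise(
    LOD = 3 * sd(mass_mg),     # LOD = 3 * sigma_b, in mg
    .groups = "drop"
  ) %>%
  mutate(LOD = round(LOD, digits = 4))  # report no finer than 0.0001 mg

print(demo_lod)
```

Because the masses are in mg, the resulting LOD is also in mg; multiply by 1000 to express it in \(\mu g\).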