This homework gives you practice working with a measurement
dataset: airlift_mass_repeatability.csv. The dataset
contains repeated measurements of “blank” air sampling filters.
A couple of notes to consider when reporting your answers to the
questions. The microbalance used to make these measurements reads out to
the nearest microgram (\(\mu g\)),
which is 0.000001 \(g\) or 0.001 \(mg\). Be careful when reporting
descriptive statistics so that you do not overstate your
precision. Use the round() function to
avoid reporting more than 0.1 \(\mu g\)
of precision (or 0.0001 \(mg\)). Here
is some example code that uses the across() function from
dplyr:: to round numeric output to four decimal places
(appropriate for \(mg\) units in this
exercise):
dplyr::mutate(across(.cols = where(is.numeric), .fns = ~ round(.x, digits = 4)))
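As a quick check of how the rounding step behaves, here is a minimal sketch on a made-up tibble. The object names `toy` and `toy_rounded` and the mass values are invented for illustration and are not taken from the homework data.

```r
library(dplyr)

# Made-up masses in mg, for illustration only (not from the homework file)
toy <- tibble(
  id      = c("A", "B"),
  mass_mg = c(97.81234567, 95.49876543)
)

# Round every numeric column to four decimal places (0.0001 mg = 0.1 ug)
toy_rounded <- toy %>%
  mutate(across(.cols = where(is.numeric), .fns = ~ round(.x, digits = 4)))

# Inspect the stored values directly rather than relying on the tibble printout
toy_rounded$mass_mg
```

Note that tibbles print only about three significant digits by default, so the console display can show fewer digits than are actually stored; pulling the column out as a vector shows the rounded values.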
Import the airlift_mass_repeatability.csv file into a
data frame called blanks and perform the following data
wrangling in a single pipe:
- retain only the following columns: date, id, and mass_mg
- convert the date column vector into a date class object using lubridate::
- convert the id variable to a class factor (this can be accomplished using base::as.factor() or forcats::as_factor())
- create mass_mg by rescaling the mass_g data (i.e., convert \(g\) to \(mg\) by multiplying mass_g by 1000)
## # A tibble: 5 × 3
## date id mass_mg
## <date> <fct> <dbl>
## 1 2020-03-11 41669 97.8
## 2 2020-03-11 41669 97.8
## 3 2020-03-11 41669 97.8
## 4 2020-03-11 41671 97.6
## 5 2020-03-11 41671 97.6
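One way the single-pipe wrangling could look is sketched below. Because the real file is not reproduced here, the sketch reads a few made-up rows from inline text; the raw column names (`date`, `id`, `mass_g`) and the year-month-day date format are assumptions about the file, so adjust the parser (for example `mdy()` instead of `ymd()`) if the real dates are stored differently.

```r
library(dplyr)
library(readr)
library(lubridate)

# Stand-in for the CSV file: made-up rows, assumed column names and date format.
# With the real file, pass "airlift_mass_repeatability.csv" instead of I(csv_text).
csv_text <- "date,id,mass_g
2020-03-11,41669,0.097812
2020-03-11,41671,0.097623
2020-03-12,41669,0.097813
"

blanks <- read_csv(
  I(csv_text),
  col_types = cols(date = col_character(), id = col_integer(), mass_g = col_double())
) %>%
  mutate(
    date    = ymd(date),        # character -> Date
    id      = as.factor(id),    # integer -> factor
    mass_mg = mass_g * 1000     # g -> mg
  ) %>%
  select(date, id, mass_mg)     # keep only the three requested columns

blanks
```

Creating mass_mg before the select() step matters here, since select() would otherwise fail on a column that does not exist yet.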
2a. Are there any NAs present in the data frame?
2b. How many unique filter IDs are present in this data frame?
2c. How many samples are present for each filter ID? Hint: look up the
dplyr::count() function.
2d. Over how long a period were these blank measurements made? Hint:
this can be done in base R with a max() - min() or with
lubridate::interval() %>% as.duration().
## [1] 0
## [1] 41669 41671 41667 41666 41668
## Levels: 41666 41667 41668 41669 41671
## # A tibble: 5 × 2
## id n
## <fct> <int>
## 1 41666 78
## 2 41667 78
## 3 41668 78
## 4 41669 76
## 5 41671 78
## Time difference of 35 days
2a: There are zero NAs present in the data frame.
2b: There are 5 unique filter IDs present in the data frame (41666, 41667, 41668, 41669, 41671).
2c: 41666 = 78, 41667 = 78, 41668 = 78, 41669 = 76, 41671 = 78.
2d: The measurements were made over a 35-day period.
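The four checks in question 2 can be sketched as follows. The tibble `blanks_toy` and all of its values are made up for illustration, so the numbers it produces are not the homework answers; only the function calls carry over to the real blanks data frame.

```r
library(dplyr)

# Made-up blank measurements, for illustration only
blanks_toy <- tibble(
  date    = as.Date(c("2020-03-11", "2020-03-11", "2020-03-20", "2020-03-25")),
  id      = factor(c("41666", "41667", "41666", "41667")),
  mass_mg = c(98.301, 95.502, 98.302, 95.501)
)

n_missing  <- sum(is.na(blanks_toy))                      # 2a: total NAs across all columns
n_filters  <- n_distinct(blanks_toy$id)                   # 2b: number of unique filter IDs
per_filter <- count(blanks_toy, id)                       # 2c: number of rows per filter ID
span       <- max(blanks_toy$date) - min(blanks_toy$date) # 2d: a difftime, in days

n_missing
n_filters
per_filter
span
```

Subtracting two Date values returns a difftime object, which is why the printed result reads "Time difference of ... days".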
Group the blanks data frame by id and
calculate the mean, median, and standard deviation of mass_mg for each
filter id. Hint: use group_by() %>% summarise() to do this
efficiently.
## # A tibble: 5 × 4
## id mean median sd
## <fct> <dbl> <dbl> <dbl>
## 1 41666 98.3 98.3 0.001
## 2 41667 95.5 95.5 0.001
## 3 41668 98.0 98.0 0.001
## 4 41669 97.8 97.8 0.001
## 5 41671 97.6 97.6 0.001
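A minimal sketch of the grouped summary, again on made-up data (`blanks_toy`, the filter labels "A" and "B", and the masses are all invented for illustration). The final rounding step follows the four-decimal-place guidance given at the start of the homework.

```r
library(dplyr)

# Made-up repeated weighings of two filters, for illustration only
blanks_toy <- tibble(
  id      = factor(c("A", "A", "A", "B", "B", "B")),
  mass_mg = c(98.301, 98.302, 98.303, 95.500, 95.502, 95.504)
)

blank_stats <- blanks_toy %>%
  group_by(id) %>%
  summarise(
    mean    = mean(mass_mg),
    median  = median(mass_mg),
    sd      = sd(mass_mg),
    .groups = "drop"            # return an ungrouped tibble
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 4)))

blank_stats
```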
Calculate the limit of detection (LOD) for this measurement method.
Note: you will need to calculate standard deviations for each filter
id (as done in question 3) and then estimate LOD from \(LOD = 3\cdot \sigma_b\), where \(\sigma_b\) is the standard deviation of the blank measurements, calculated for each filter
id.
## # A tibble: 5 × 2
## id LOD
## <fct> <dbl>
## 1 41666 0.003
## 2 41667 0.003
## 3 41668 0.003
## 4 41669 0.003
## 5 41671 0.003
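The LOD calculation is the per-filter standard deviation multiplied by three, which can be sketched like this. The data are made up and the column name `sd_b` is an illustrative choice standing in for \(\sigma_b\).

```r
library(dplyr)

# Made-up repeated weighings of two filters, for illustration only
blanks_toy <- tibble(
  id      = factor(c("A", "A", "A", "B", "B", "B")),
  mass_mg = c(98.301, 98.302, 98.303, 95.500, 95.502, 95.504)
)

lod <- blanks_toy %>%
  group_by(id) %>%
  summarise(sd_b = sd(mass_mg), .groups = "drop") %>%  # sigma_b for each filter
  mutate(LOD = round(3 * sd_b, digits = 4))            # LOD = 3 * sigma_b, in mg

lod
```

Rounding is applied only after multiplying by three, so the LOD is computed from the unrounded standard deviation rather than from an already rounded one.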