We have many TEROS sensors across the TEMPEST plots, and they often read noticeably different absolute values for the “same” location, even though their temporal patterns are consistent. There are also gaps in the sensor records, which can produce spikes or dips in plot averages when a sensor that reads differently from the others cuts in or out. This means we can do some automated QC, but we may also need to scrub individual sensors and interpolate across gaps.
First, let’s look at our datasets:
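(A sketch of the load-and-inspect step; the object name `teros` and the file name are placeholders for however the combined table is actually read in.)

```r
library(readr)

# Placeholder file name for the combined TEROS export
# (columns: id, plot, datetime_est, vwc_m3m3, tsoil_c, ec_uscm)
teros <- read_csv("tempest_teros_2022.csv")
teros
```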
## # A tibble: 321,794 × 6
## id plot datetime_est vwc_m3m3 tsoil_c ec_uscm
## <chr> <chr> <dttm> <dbl> <dbl> <dbl>
## 1 T135 Control 2022-06-20 00:00:00 0.277 18.8 52
## 2 T135 Control 2022-06-20 00:15:00 0.276 18.8 53
## 3 T135 Control 2022-06-20 00:30:00 0.277 18.8 54
## 4 T135 Control 2022-06-20 00:45:00 0.277 18.8 49
## 5 T135 Control 2022-06-20 01:00:00 0.277 18.8 54
## 6 T135 Control 2022-06-20 01:15:00 0.276 18.8 53
## 7 T135 Control 2022-06-20 01:30:00 0.277 18.8 50
## 8 T135 Control 2022-06-20 01:45:00 0.277 18.8 56
## 9 T135 Control 2022-06-20 02:00:00 0.277 18.8 51
## 10 T135 Control 2022-06-20 02:15:00 0.277 18.7 51
## # ℹ 321,784 more rows
What if we just calculate means? Do our plots look okay? NO. There are issues with sensors missing data, which we can see as spikes and dips.
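For reference, the naive mean calculation looks roughly like this (assuming the table above is named `teros`; the plotting code is illustrative, not the original figure code):

```r
library(dplyr)
library(ggplot2)

# Naive approach: average every sensor within a plot at each timestep
plot_means <- teros %>%
  group_by(plot, datetime_est) %>%
  summarise(vwc_mean = mean(vwc_m3m3, na.rm = TRUE), .groups = "drop")

# Spikes and dips appear wherever a sensor drops out of, or back into, the mean
ggplot(plot_means, aes(datetime_est, vwc_mean, color = plot)) +
  geom_line()
```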
Since I’ve seen those dips/spikes explained in other TEROS data by sensors cutting in and out, let’s count record lengths and compare:
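One way to do that count is sketched below; I'm assuming here that the IDs printed afterwards are the sensors with at least one missing record, and the exact filter used in the original analysis may differ:

```r
library(dplyr)

# Tally records per sensor and compare against the longest record
record_lengths <- teros %>%
  count(id, name = "n_records") %>%
  mutate(frac_complete = n_records / max(n_records))

# Sensor IDs with at least one missing record
record_lengths %>%
  filter(frac_complete < 1) %>%
  pull(id)
```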
## [1] "T001" "T002" "T003" "T004" "T007" "T008" "T009" "T012" "T013" "T014"
## [11] "T015" "T016" "T017" "T019" "T020" "T021" "T023" "T024" "T025" "T026"
## [21] "T027" "T028" "T029" "T030" "T031" "T032" "T034" "T035" "T036" "T037"
## [31] "T038" "T039" "T041" "T047" "T049" "T050" "T051" "T052" "T053" "T054"
## [41] "T055" "T056" "T057" "T058" "T060" "T062" "T063" "T064" "T065" "T066"
## [51] "T067" "T068" "T069" "T070" "T071" "T072" "T073" "T074" "T075" "T076"
## [61] "T077" "T078" "T082" "T083" "T084" "T086" "T087" "T088" "T090" "T093"
## [71] "T096" "T097" "T102" "T103" "T110" "T112" "T113" "T114" "T115" "T116"
## [81] "T119" "T122" "T125" "T129" "T132" "T135" "T136"
Good news first: no sensors need to be removed outright, because every sensor has a close-to-complete record. However, a good number of sensors aren’t fully complete. Let’s look at the data gaps: how many are there, and how long are they?
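A sketch of the gap count, assuming readings are expected every 15 minutes: expand each sensor onto a full 15-minute grid (the `full_grid` name is a placeholder) and count the rows that come back empty.

```r
library(dplyr)
library(tidyr)

# Expand each sensor onto a full 15-minute timeline; missing timestamps become NA rows
full_grid <- teros %>%
  group_by(id, plot) %>%
  complete(datetime_est = seq(min(datetime_est), max(datetime_est), by = "15 min")) %>%
  ungroup()

# Total number of missing readings
sum(is.na(full_grid$vwc_m3m3))
```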
## [1] 7017
First, the total: 7,017 missing records out of roughly 329,000 expected, so that’s good news.
Good news here too: all sensors are missing less than 2% of their data.
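That per-sensor check is the same count grouped by `id`, continuing from the hypothetical `full_grid` above:

```r
# Percentage of missing readings per sensor
full_grid %>%
  group_by(id) %>%
  summarise(pct_missing = 100 * mean(is.na(vwc_m3m3))) %>%
  arrange(desc(pct_missing))
```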
SO. Last check, how long are the gaps?
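Gap lengths fall out of a run-length encoding of the missing flag (again using the hypothetical `full_grid`):

```r
# Longest run of consecutive missing readings per sensor, in 15-minute steps
gap_lengths <- full_grid %>%
  group_by(id) %>%
  summarise(max_gap = {
    runs <- rle(is.na(vwc_m3m3))
    if (any(runs$values)) max(runs$lengths[runs$values]) else 0L
  })

max(gap_lengths$max_gap)
```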
The maximum gap length is 8 consecutive 15-minute readings (2 hours). Okay, actual last check: are these gaps happening when we would expect maximum change (i.e., during the start of the event)?
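A quick, illustrative way to check the timing is to histogram the missing timestamps (not the original figure code):

```r
library(ggplot2)

# Where in time do the missing readings fall?
full_grid %>%
  filter(is.na(vwc_m3m3)) %>%
  ggplot(aes(datetime_est)) +
  geom_histogram(bins = 100) +
  facet_wrap(~plot)
```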
They seem fairly evenly spaced. I’m okay with interpolation here, given that the alternative is leaving in spikes that we know are artifacts. Let’s go ahead and gap-fill.
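Here is a minimal sketch of the gap-fill, using linear interpolation (`zoo::na.approx`) within each sensor; whether the original analysis used linear interpolation or another method is an assumption on my part. The `gap_filled` flag and `_filled` columns correspond to the output below.

```r
library(dplyr)
library(zoo)

# Flag the missing rows, then linearly interpolate each variable within each sensor.
# na.rm = FALSE keeps any leading/trailing NAs instead of dropping those rows.
teros_filled <- full_grid %>%
  group_by(id) %>%
  arrange(datetime_est, .by_group = TRUE) %>%
  mutate(gap_filled      = is.na(vwc_m3m3),
         vwc_m3m3_filled = na.approx(vwc_m3m3, x = datetime_est, na.rm = FALSE),
         tsoil_c_filled  = na.approx(tsoil_c,  x = datetime_est, na.rm = FALSE),
         ec_uscm_filled  = na.approx(ec_uscm,  x = datetime_est, na.rm = FALSE)) %>%
  ungroup()

teros_filled
```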
## # A tibble: 328,811 × 10
## datetime_est plot id vwc_m3m3 tsoil_c ec_uscm gap_filled
## <dttm> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
## 1 2022-06-20 00:00:00 Control T135 0.277 18.8 52 FALSE
## 2 2022-06-20 00:00:00 Control T100 0.349 19.4 228 FALSE
## 3 2022-06-20 00:00:00 Control T132 0.392 18.7 491 FALSE
## 4 2022-06-20 00:00:00 Control T133 0.307 19.2 58 FALSE
## 5 2022-06-20 00:00:00 Control T019 0.372 18.8 148 FALSE
## 6 2022-06-20 00:00:00 Control T056 0.296 19.1 119 FALSE
## 7 2022-06-20 00:00:00 Control T129 0.388 19 110 FALSE
## 8 2022-06-20 00:00:00 Control T058 0.304 19.1 151 FALSE
## 9 2022-06-20 00:00:00 Control T059 0.420 19.2 133 FALSE
## 10 2022-06-20 00:00:00 Control T060 0.377 18.9 146 FALSE
## # ℹ 328,801 more rows
## # ℹ 3 more variables: vwc_m3m3_filled <dbl>, tsoil_c_filled <dbl>,
## # ec_uscm_filled <dbl>
We FINALLY have data that look sensible (black lines above). Let’s bin the data by taking means, plot one more time to make sure, then export.
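Sketched below with placeholder names, and assuming the final product is plot-level means at the original 15-minute resolution (the binning interval and output file name are assumptions):

```r
library(dplyr)
library(ggplot2)
library(readr)

# Average the gap-filled sensors within each plot at each timestep, then export
plot_means_filled <- teros_filled %>%
  group_by(plot, datetime_est) %>%
  summarise(vwc_m3m3 = mean(vwc_m3m3_filled, na.rm = TRUE),
            tsoil_c  = mean(tsoil_c_filled,  na.rm = TRUE),
            ec_uscm  = mean(ec_uscm_filled,  na.rm = TRUE),
            .groups = "drop")

ggplot(plot_means_filled, aes(datetime_est, vwc_m3m3, color = plot)) +
  geom_line()

# Placeholder output file name
write_csv(plot_means_filled, "tempest_teros_plot_means.csv")
```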