We have many TEROS sensors across the TEMPEST plots, and they often read noticeably different absolute values for the “same” location, even though their temporal patterns are consistent. There are also gaps in the sensor records, which can produce spikes or dips in plot averages when a sensor that reads differently from the others cuts in or out. This means we can do some automated QC, but we may also need to scrub individual sensors and interpolate across gaps.
First, let’s look at our datasets:
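(A sketch of the load-and-inspect step; the object name `teros` and the file name are placeholders for however the combined table is actually read in.)

```r
library(readr)

# Placeholder file name for the combined TEROS export
# (columns: id, plot, datetime_est, vwc_m3m3, tsoil_c, ec_uscm)
teros <- read_csv("tempest_teros_2022.csv")
teros
```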
## # A tibble: 321,794 × 6
## id plot datetime_est vwc_m3m3 tsoil_c ec_uscm
## <chr> <chr> <dttm> <dbl> <dbl> <dbl>
## 1 T135 Control 2022-06-20 00:00:00 0.277 18.8 52
## 2 T135 Control 2022-06-20 00:15:00 0.276 18.8 53
## 3 T135 Control 2022-06-20 00:30:00 0.277 18.8 54
## 4 T135 Control 2022-06-20 00:45:00 0.277 18.8 49
## 5 T135 Control 2022-06-20 01:00:00 0.277 18.8 54
## 6 T135 Control 2022-06-20 01:15:00 0.276 18.8 53
## 7 T135 Control 2022-06-20 01:30:00 0.277 18.8 50
## 8 T135 Control 2022-06-20 01:45:00 0.277 18.8 56
## 9 T135 Control 2022-06-20 02:00:00 0.277 18.8 51
## 10 T135 Control 2022-06-20 02:15:00 0.277 18.7 51
## # ℹ 321,784 more rows
What if we just calculate means? Do our plots look okay? NO. There are issues with sensors missing data, which we can see as spikes and dips.
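For reference, the naive mean calculation looks roughly like this (assuming the table above is named `teros`; the plotting code is illustrative, not the original figure code):

```r
library(dplyr)
library(ggplot2)

# Naive approach: average every sensor within a plot at each timestep
plot_means <- teros %>%
  group_by(plot, datetime_est) %>%
  summarise(vwc_mean = mean(vwc_m3m3, na.rm = TRUE), .groups = "drop")

# Spikes and dips appear wherever a sensor drops out of, or back into, the mean
ggplot(plot_means, aes(datetime_est, vwc_mean, color = plot)) +
  geom_line()
```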
Since I’ve seen those dips/spikes explained in other TEROS data by sensors cutting in and out, let’s count record lengths and compare:
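One way to do that count is sketched below; I'm assuming here that the IDs printed afterwards are the sensors with at least one missing record, and the exact filter used in the original analysis may differ:

```r
library(dplyr)

# Tally records per sensor and compare against the longest record
record_lengths <- teros %>%
  count(id, name = "n_records") %>%
  mutate(frac_complete = n_records / max(n_records))

# Sensor IDs with at least one missing record
record_lengths %>%
  filter(frac_complete < 1) %>%
  pull(id)
```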
## [1] "T001" "T002" "T003" "T004" "T007" "T008" "T009" "T012" "T013" "T014"
## [11] "T015" "T016" "T017" "T019" "T020" "T021" "T023" "T024" "T025" "T026"
## [21] "T027" "T028" "T029" "T030" "T031" "T032" "T034" "T035" "T036" "T037"
## [31] "T038" "T039" "T041" "T047" "T049" "T050" "T051" "T052" "T053" "T054"
## [41] "T055" "T056" "T057" "T058" "T060" "T062" "T063" "T064" "T065" "T066"
## [51] "T067" "T068" "T069" "T070" "T071" "T072" "T073" "T074" "T075" "T076"
## [61] "T077" "T078" "T082" "T083" "T084" "T086" "T087" "T088" "T090" "T093"
## [71] "T096" "T097" "T102" "T103" "T110" "T112" "T113" "T114" "T115" "T116"
## [81] "T119" "T122" "T125" "T129" "T132" "T135" "T136"
Good news first: no sensors need to be removed outright, because every sensor has a close-to-complete record. However, a good number of sensors aren’t fully complete. Let’s look at the data gaps: how many are there, and how long are they?
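A sketch of the gap count, assuming readings are expected every 15 minutes: expand each sensor onto a full 15-minute grid (the `full_grid` name is a placeholder) and count the rows that come back empty.

```r
library(dplyr)
library(tidyr)

# Expand each sensor onto a full 15-minute timeline; missing timestamps become NA rows
full_grid <- teros %>%
  group_by(id, plot) %>%
  complete(datetime_est = seq(min(datetime_est), max(datetime_est), by = "15 min")) %>%
  ungroup()

# Total number of missing readings
sum(is.na(full_grid$vwc_m3m3))
```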
## [1] 7017
First, the total: 7,017 missing records out of roughly 329,000 expected, so that’s good news.
Good news here too: all sensors are missing less than 2% of their data.
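That per-sensor check is the same count grouped by `id`, continuing from the hypothetical `full_grid` above:

```r
# Percentage of missing readings per sensor
full_grid %>%
  group_by(id) %>%
  summarise(pct_missing = 100 * mean(is.na(vwc_m3m3))) %>%
  arrange(desc(pct_missing))
```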
SO. Last check, how long are the gaps?
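Gap lengths fall out of a run-length encoding of the missing flag (again using the hypothetical `full_grid`):

```r
# Longest run of consecutive missing readings per sensor, in 15-minute steps
gap_lengths <- full_grid %>%
  group_by(id) %>%
  summarise(max_gap = {
    runs <- rle(is.na(vwc_m3m3))
    if (any(runs$values)) max(runs$lengths[runs$values]) else 0L
  })

max(gap_lengths$max_gap)
```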
The maximum gap length is 8 consecutive 15-minute readings (2 hours). Okay, actual last check: are these gaps happening when we would expect maximum change (i.e., during the start of the event)?
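A quick, illustrative way to check the timing is to histogram the missing timestamps (not the original figure code):

```r
library(ggplot2)

# Where in time do the missing readings fall?
full_grid %>%
  filter(is.na(vwc_m3m3)) %>%
  ggplot(aes(datetime_est)) +
  geom_histogram(bins = 100) +
  facet_wrap(~plot)
```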
They seem fairly evenly spaced. I’m okay with interpolation here, given that the alternative is leaving in spikes that we know are artifacts. Let’s go ahead and gap-fill.
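Here is a minimal sketch of the gap-fill, using linear interpolation (`zoo::na.approx`) within each sensor; whether the original analysis used linear interpolation or another method is an assumption on my part. The `gap_filled` flag and `_filled` columns correspond to the output below.

```r
library(dplyr)
library(zoo)

# Flag the missing rows, then linearly interpolate each variable within each sensor.
# na.rm = FALSE keeps any leading/trailing NAs instead of dropping those rows.
teros_filled <- full_grid %>%
  group_by(id) %>%
  arrange(datetime_est, .by_group = TRUE) %>%
  mutate(gap_filled      = is.na(vwc_m3m3),
         vwc_m3m3_filled = na.approx(vwc_m3m3, x = datetime_est, na.rm = FALSE),
         tsoil_c_filled  = na.approx(tsoil_c,  x = datetime_est, na.rm = FALSE),
         ec_uscm_filled  = na.approx(ec_uscm,  x = datetime_est, na.rm = FALSE)) %>%
  ungroup()

teros_filled
```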
## # A tibble: 328,811 × 10
## datetime_est plot id vwc_m3m3 tsoil_c ec_uscm gap_filled
## <dttm> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
## 1 2022-06-20 00:00:00 Control T135 0.277 18.8 52 FALSE
## 2 2022-06-20 00:00:00 Control T100 0.349 19.4 228 FALSE
## 3 2022-06-20 00:00:00 Control T132 0.392 18.7 491 FALSE
## 4 2022-06-20 00:00:00 Control T133 0.307 19.2 58 FALSE
## 5 2022-06-20 00:00:00 Control T019 0.372 18.8 148 FALSE
## 6 2022-06-20 00:00:00 Control T056 0.296 19.1 119 FALSE
## 7 2022-06-20 00:00:00 Control T129 0.388 19 110 FALSE
## 8 2022-06-20 00:00:00 Control T058 0.304 19.1 151 FALSE
## 9 2022-06-20 00:00:00 Control T059 0.420 19.2 133 FALSE
## 10 2022-06-20 00:00:00 Control T060 0.377 18.9 146 FALSE
## # ℹ 328,801 more rows
## # ℹ 3 more variables: vwc_m3m3_filled <dbl>, tsoil_c_filled <dbl>,
## # ec_uscm_filled <dbl>
We FINALLY have data that look sensible (black lines above). Let’s bin the data by taking means, plot one more time to make sure, then export.
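Sketched below with placeholder names, and assuming the final product is plot-level means at the original 15-minute resolution (the binning interval and output file name are assumptions):

```r
library(dplyr)
library(ggplot2)
library(readr)

# Average the gap-filled sensors within each plot at each timestep, then export
plot_means_filled <- teros_filled %>%
  group_by(plot, datetime_est) %>%
  summarise(vwc_m3m3 = mean(vwc_m3m3_filled, na.rm = TRUE),
            tsoil_c  = mean(tsoil_c_filled,  na.rm = TRUE),
            ec_uscm  = mean(ec_uscm_filled,  na.rm = TRUE),
            .groups = "drop")

ggplot(plot_means_filled, aes(datetime_est, vwc_m3m3, color = plot)) +
  geom_line()

# Placeholder output file name
write_csv(plot_means_filled, "tempest_teros_plot_means.csv")
```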