Creation date: 2018 February 27

Updated: 2018 March 22

Introduction

This document provides basic QA/QC assessments for the willow height dataset updated with 2016 and 2017 data entered from field sheets (version 2018-03-21). Analyses aim to characterize basic data structure and identify potential issues such as inconsistent missing-value codes, unexpected field values, and duplicate willow IDs.

Basic exploratory analyses

Pre-process data

  • Type conversions (e.g., IDs, treatment codes to character)
  • Identify then filter NA/missing for “plantht”
  • Create fields defining treatment classes and incorporate into siteid (“siteid2”, as defined in the water table dataset)
  • Create a field combining willid (not unique across all records) with site2 (“willid.site2”)
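
The pre-processing steps above might look roughly like the following in base R. This is a minimal sketch, not the code behind the report: the toy data frame and its values are invented for illustration, the "." separator in the composite key is an assumption, and only the column names (willid, site2, plantht, willid.site2) come from the output shown later in this document. The treatment-class field is left out because its definition is not given here.

```r
# Toy records; values are illustrative only, not taken from the dataset.
willow <- data.frame(
  willid  = c(101L, 101L, 102L),
  site2   = c("siteA", "siteB", "siteA"),
  plantht = c("120", "NA", "95"),
  stringsAsFactors = FALSE
)

# Type conversion: treat identifiers as character, not as numbers.
willow$willid <- as.character(willow$willid)

# Identify records with a missing height (true NA or the literal string "NA").
is_missing <- is.na(willow$plantht) | willow$plantht == "NA"

# Composite key: willid alone is not unique across sites.
willow$willid.site2 <- paste(willow$willid, willow$site2, sep = ".")

# Filter out the records flagged as missing.
willow_ok <- willow[!is_missing, ]
```

Flagging the missing records before filtering keeps a count of what was dropped, which is useful when reconciling record totals later.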

Questions/Issues:
1. “NA”, “na” and “missing” inconsistently used for ‘plantht’
2. Unsure on the interpretation of the “2” for the dam field. Three unique values: 0,1,2. Expected only 0 and 1. Interpreted as “obs”
3. Any special meaning to “ID” field or is this a file-specific index?
4. In previous QA/QC checks, it was noted that willid is not unique across all sites. Temporary fix combining “willid” and “siteid2”

CLEANING: make the missing values for ‘plantht’ consistent

Issue: As noted above, “NA”, “na” and “missing” are used inconsistently for ‘plantht’.

Solution: Values of plantht equal to “missing” or “na” were changed to NA.
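
One way to make the missing-value codes consistent is sketched below in base R. The input vector is hypothetical and simply reproduces the three spellings noted above; the case-insensitive match and whitespace trimming are assumptions that go slightly beyond the stated fix.

```r
# Hypothetical raw values showing the three spellings noted above.
plantht_raw <- c("120", "NA", "na", "missing", "87")

# Codes treated as missing; matching is case-insensitive after trimming.
missing_codes <- c("na", "missing")

plantht_clean <- plantht_raw
plantht_clean[tolower(trimws(plantht_raw)) %in% missing_codes] <- NA

# Count how many values were recoded, for the QA/QC record.
n_missing <- sum(is.na(plantht_clean))
```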

CLEANING: For plotting and analysis, remove all NA and type convert ‘plantht’ to numeric
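
A minimal base R sketch of this step follows, using an invented three-row data frame. It assumes the missing-value codes have already been recoded to NA, so any NA that appears after conversion would point to an unexpected non-numeric entry.

```r
# Toy data; plantht is still character after the missing-value cleanup.
willow <- data.frame(
  willid  = c("101", "102", "103"),
  plantht = c("120", NA, "87.5"),
  stringsAsFactors = FALSE
)

# Drop records with no height, then convert the remainder to numeric.
willow_num <- willow[!is.na(willow$plantht), ]
willow_num$plantht <- as.numeric(willow_num$plantht)

# A value that failed to convert would show up as a new NA here.
conversion_failures <- sum(is.na(willow_num$plantht))
```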

Cross-tabulation: Site vs year for Spring

Cross-tabulation: Site vs year for Fall

Cross-tabulation: Willow id vs year for Spring

  • Spring plantht measurements are missing for large stretches (2002-2016).
  • Some 2017 records have no data (n=0) and others are duplicated (n=2 or 3)

Cross-tabulation: Willow id vs year for Fall

  • Fall measurements are more complete over the period of record, but many records have no data (n=0) or are duplicated (n>1)
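
The cross-tabulations in this section can be sketched in base R with table(). The data below are invented, and the season coding ("spring"/"fall") is an assumption. Converting year to a factor with explicit levels keeps years with no records in the table, so gaps appear as n=0 rather than being dropped.

```r
# Toy data; values and season coding are illustrative assumptions.
willow <- data.frame(
  site   = c("A", "A", "B", "B", "B"),
  willid = c("1", "1", "2", "3", "3"),
  year   = c(2016, 2017, 2016, 2017, 2017),
  season = c("spring", "fall", "fall", "fall", "fall"),
  stringsAsFactors = FALSE
)

fall <- willow[willow$season == "fall", ]

# Site vs year: number of records per cell.
site_by_year <- table(fall$site, fall$year)

# Willow id vs year, with explicit year levels so empty years show as 0.
id_by_year <- table(fall$willid, factor(fall$year, levels = 2016:2017))

# Cells other than 1 are either gaps (0) or duplicates (> 1).
n_problem_cells <- sum(id_by_year != 1)
```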

CLEANING: Bring in missing historical data

From the above analyses, it’s clear that not all of the historical data are present. In particular, many Spring height measurements are missing.

## Parsed with column specification:
## cols(
##   ID1 = col_integer(),
##   site = col_character(),
##   year = col_integer(),
##   season = col_character(),
##   willid = col_integer(),
##   exp = col_integer(),
##   dam = col_integer(),
##   browse = col_integer(),
##   plantht = col_integer(),
##   browseintensity = col_double(),
##   production = col_double()
## )
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 5 parsing failures.
## # A tibble: 5 x 5
##     row col     expected               actual file
##   <int> <chr>   <chr>                  <chr>  <chr>
## 1  2827 plantht no trailing characters .5     'D:/Dropbox/PROJECTS/Yell_N~
## 2  2832 plantht no trailing characters .5     'D:/Dropbox/PROJECTS/Yell_N~
## 3  2836 plantht no trailing characters .5     'D:/Dropbox/PROJECTS/Yell_N~
## 4  3564 plantht no trailing characters .158   'D:/Dropbox/PROJECTS/Yell_N~
## 5  6528 plantht no trailing characters .4     'D:/Dropbox/PROJECTS/Yell_N~
##  [1] "ID1"             "site"            "year"           
##  [4] "season"          "willid"          "exp"            
##  [7] "dam"             "browse"          "plantht"        
## [10] "browseintensity" "production"
##  [1] "ID"           "site"         "year"         "season"      
##  [5] "willid"       "exp"          "dam"          "browse"      
##  [9] "plantht"      "treat"        "site2"        "willid.site2"
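
The five parsing failures above arise because plantht was parsed as an integer column while a few rows hold decimal values (.5, .158, .4). A possible fix, sketched here in base R with an invented inline CSV, is to declare plantht as numeric when reading. With readr, the corresponding change would be to pass `col_types = cols(plantht = col_double())` to `read_csv()`. Whether those decimal values are genuine measurements or entry errors is a separate question that this sketch does not settle.

```r
# Invented stand-in for the historical file; one row has a decimal height.
csv_text <- "site,year,season,willid,plantht
A,2003,spring,1,120
A,2003,spring,2,.5
B,2004,fall,3,87"

# Read plantht as numeric so decimals are kept instead of failing to parse.
historical <- read.csv(
  text = csv_text,
  colClasses = c(site = "character", season = "character",
                 willid = "character", plantht = "numeric"),
  stringsAsFactors = FALSE
)

# Rows with non-integer heights, listed for manual review.
to_review <- historical[historical$plantht != round(historical$plantht), ]
```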

Cross-tabulation: Site vs year for Spring

Cross-tabulation: Site vs year for Fall

Cross-tabulation: Willow id vs year for Spring

  • Spring plantht measurements are still missing for 2016.

Cross-tabulation: Willow id vs year for Fall

Create table of duplicate willid records

Duplicate willowid search: group_by(willid,year,season)

Includes duplicate willowid across sites, seasons, and years…

Duplicate willowid search: group_by(willid,year,season, site)

Only willid values duplicated within a site, season, and year. These may represent moved tags.
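
The two duplicate searches differ only in their grouping keys. The report's headings indicate dplyr's group_by(); the sketch below is a base R equivalent using aggregate(), with invented data and a hypothetical helper named count_dups().

```r
# Toy data: willid "1" appears at two sites, and twice at site A.
willow <- data.frame(
  willid = c("1", "1", "1", "2", "2"),
  site   = c("A", "B", "A", "A", "A"),
  year   = c(2017, 2017, 2017, 2017, 2017),
  season = c("fall", "fall", "fall", "fall", "spring"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count records per key combination, keep n > 1.
count_dups <- function(df, keys) {
  counts <- aggregate(list(n = rep(1L, nrow(df))), by = df[keys], FUN = sum)
  counts[counts$n > 1, ]
}

# Without site: also catches the same willid reused at different sites.
dups_any_site <- count_dups(willow, c("willid", "year", "season"))

# With site: only true within-site duplicates remain.
dups_same_site <- count_dups(willow, c("willid", "year", "season", "site"))
```

In this toy example the first search reports willid "1" with n=3, while the second narrows it to the two records at site A.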

Graphical evaluation

Heat map: count of observations by “willid” and “year” – Spring

Heat map: count of observations by “willid” and “year” – Fall

## Warning: Removed 285 rows containing non-finite values (stat_boxplot).

boxplots of Spring height measurements

## Warning: Removed 283 rows containing non-finite values (stat_boxplot).

boxplots of Spring height measurements

## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
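
The warnings above are raised when a plot silently drops rows whose height is NA or otherwise non-finite. A small base R sketch, using invented data and an assumed season coding, shows how those rows could be counted per season and removed explicitly before plotting, so that the number dropped is recorded rather than inferred from warnings.

```r
# Toy data; one spring record has no height.
willow <- data.frame(
  season  = c("spring", "spring", "fall", "fall"),
  plantht = c(120, NA, 87, 95)
)

# Count non-finite heights per season.
nonfinite_by_season <- tapply(!is.finite(willow$plantht), willow$season, sum)

# Keep only plottable rows.
plot_data <- willow[is.finite(willow$plantht), ]
```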