Creation date: 2018 April 3
Updated: 2018 April 12
This document provides basic QA/QC assessments for the willow production data updated with 2016 and 2017 data entered from field sheets.
Analyses aim to characterize basic data structure and identify potential issues such as missing values and inconsistent coding of categorical fields.
There are 1122 records in the raw data set.
Questions/Issues:
1. “biggest_shoot_diameter_mm” is completely empty. Likewise for “cont_sht_len”. Do these fields need to be on the form if they’re not being used?
2. We’ve got varying levels of “missingness” for key fields (e.g., date, site, plot, spp). Need to check the field forms.
There are 14 records with a missing value for “stem_id”.
There are 7 records with a missing value for “stem_id”.
Is this the same thing as “willid”?
There are 9 records with a missing value for “plant”.
There are 14 records with a missing value for “year”.
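The per-field missing counts above could be reproduced with something like the following. This is a minimal sketch, not the code behind this report: the helper `count_missing`, the `MISSING` tuple, and the toy rows are all hypothetical, and it assumes blank strings and the literal text "NA" should count as missing alongside absent values.

```python
# Hypothetical set of values treated as "missing" (an assumption, adjust as needed).
MISSING = (None, "", "NA")

def count_missing(records, fields):
    """Return {field: number of records whose value is absent or blank}."""
    return {f: sum(1 for r in records if r.get(f) in MISSING) for f in fields}

# Toy rows for illustration only; these are NOT taken from the willow data.
toy = [
    {"stem_id": "a1", "plant": "p1", "year": 2016},
    {"stem_id": "", "plant": "p2", "year": None},
    {"plant": "NA", "year": 2017},  # stem_id key absent entirely
]

missing_by_field = count_missing(toy, ["stem_id", "plant", "year"])
print(missing_by_field)
```

Running the same tally on every key field at once makes it easier to see which records need to be checked against the field forms.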
| live | n |
|---|---|
| dead | 106 |
| DEAD | 39 |
| live | 827 |
| missing | 10 |
| nd | 106 |
| NEW | 2 |
| retag | 3 |
| NA | 29 |
Questions/Issues:
Need to standardize the encoding for the ‘live’ field. Should “NEW” be recoded to “retag”? What about “DEAD” vs. “dead”? According to LM these mean different things… And what about “nd” vs. “NA” vs. “missing”?
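One way to surface the case-only variants before deciding on a recode is sketched below. The counts are copied from the ‘live’ table above, with `None` standing in for NA; the helper `case_collisions` is hypothetical. The sketch only flags codes that differ by letter case and does not merge them, since LM indicates “DEAD” and “dead” may be distinct.

```python
from collections import defaultdict

# Counts copied from the 'live' table; None stands in for NA.
live_counts = {
    "dead": 106, "DEAD": 39, "live": 827, "missing": 10,
    "nd": 106, "NEW": 2, "retag": 3, None: 29,
}

def case_collisions(counts):
    """Group codes that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(list)
    for code in counts:
        if code is not None:
            groups[code.lower()].append(code)
    # Keep only groups with more than one spelling.
    return {key: sorted(codes) for key, codes in groups.items() if len(codes) > 1}

total = sum(live_counts.values())
collisions = case_collisions(live_counts)
print(total, collisions)
```

The total doubles as a sanity check that the table accounts for every record in the raw data set.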
| br_scheme | n |
|---|---|
| 0 | 398 |
| 1 | 52 |
| 2 | 4 |
| 3 | 2 |
| NA | 666 |
Examine the records with Browse scheme = NA
Questions/Issues: Should NA mean a dead stem? Should 0 == live, but with occurrence of browse/unbrowse (depending on what you’re recording)?
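Whether NA corresponds to dead stems could be checked by tallying the ‘live’ codes among records where the browse scheme is NA. The following is a rough sketch under stated assumptions: records are dictionaries keyed by the `live` and `br_scheme` field names from the tables above, NA is represented as `None`, and the function name and toy rows are hypothetical.

```python
from collections import Counter

def live_where_browse_missing(records):
    """Tally 'live' codes for records whose br_scheme is NA (None here)."""
    return Counter(r.get("live") for r in records if r.get("br_scheme") is None)

# Toy rows for illustration only; these are NOT taken from the willow data.
toy = [
    {"live": "dead", "br_scheme": None},
    {"live": "live", "br_scheme": 0},
    {"live": "dead", "br_scheme": None},
    {"live": "live", "br_scheme": None},
    {"live": "live", "br_scheme": 1},
]

tally = live_where_browse_missing(toy)
print(tally)
```

If most NA browse records turn out to be dead stems in the real data, that would support reading NA as "dead"; a large share of live stems would instead point to data entry gaps.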
Need to figure out how these field names map to the ones in the 2017 data entered from field forms.
Missing data
Number of unique values