Creation date: 2018 April 21
Updated: 2018 April 21
This document provides basic QA/QC assessments for the willow production data updated with 2016 data entered from field sheets.
Analyses aim to characterize basic data structure and identify potential issues such as missing values and inconsistent field encodings.
The field “Record Id” has been added to help index rows for QA/QC purposes. It’s a sequential index of rows. Use it to look up rows in the original xlsx file…
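As a rough illustration of how such an index can be built and mapped back to the spreadsheet, here is a minimal Python sketch. The names `add_record_id` and `spreadsheet_row` are hypothetical, the toy rows are made up, and the mapping assumes the index starts at 1, the xlsx file has a single header row, and rows were not reordered or filtered before indexing.

```python
def add_record_id(rows, start=1):
    """Return new row dicts with a sequential 'record_id' placed first."""
    return [{"record_id": i, **row} for i, row in enumerate(rows, start=start)]


def spreadsheet_row(record_id, header_rows=1, start=1):
    """Map a record_id back to a 1-based spreadsheet row number.

    Assumes the data rows sit directly below `header_rows` header lines.
    """
    return record_id - start + 1 + header_rows


# Made-up rows for illustration only; not taken from the real data.
rows = [
    {"spp": "beb", "live": "live"},
    {"spp": "geyer", "live": "dead"},
    {"spp": None, "live": "nd"},
]
indexed = add_record_id(rows)
```

Under those assumptions, the record with `record_id` 3 would sit on spreadsheet row 4.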
There are 1217 records in the raw data set.
Questions/Issues:
1. We’ve got varying levels of “missingness” across key fields (e.g., date, spp). Check the field forms.
There are 9 records with a missing value for “date”.
There are 15 records with a missing value for “stem_id”.
In the 2017 prod, this is “willid”. Also referenced elsewhere as “plant”. We should make these names consistent. There are 18 records with a missing value for “wildid”.
There are 9 records with a missing value for “year”.
There are 14 records with a missing value for “spp”.
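The per-field counts above can be reproduced with a small helper along these lines. This is a sketch only: `count_missing` is a hypothetical name, the toy rows and their values are made up, and treating `None`, the empty string, and the literal string "NA" as missing is an assumption about how blanks arrive from the spreadsheet.

```python
# Values treated as missing; this set is an assumption, adjust to the real import.
MISSING = {None, "", "NA"}


def count_missing(rows, fields):
    """Count, for each field, the rows whose value is absent or in MISSING."""
    return {f: sum(1 for r in rows if r.get(f) in MISSING) for f in fields}


# Made-up rows for illustration only; not taken from the real data.
rows = [
    {"date": "2016-08-01", "stem_id": "1", "spp": "beb"},
    {"date": None, "stem_id": "2", "spp": "NA"},
    {"date": "", "spp": "geyer"},  # stem_id key absent entirely
]
counts = count_missing(rows, ["date", "stem_id", "spp"])
```

A field that is absent from a row is counted as missing, the same as an explicit blank.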
Distinct spp codes
| spp | n |
|---|---|
| beb | 360 |
| boothi | 325 |
| drum | 46 |
| geyer | 429 |
| plantifolia | 7 |
| pseudo | 36 |
| NA | 14 |

Distinct live codes

| live | n |
|---|---|
| dead | 101 |
| DEAD | 69 |
| live | 862 |
| live? | 1 |
| missing | 68 |
| nd | 101 |
| NEW | 15 |
Questions/Issues:
Need to standardize the encoding for the “live” field. If “dead” and “DEAD” mean different things, we should come up with a less ambiguous way to encode this.
What about “live” vs. “live?”?
What about “nd” vs. “NA” vs. “missing”?
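One way to surface these encoding questions without deciding them is to group codes that differ only by case and to list codes outside an expected vocabulary. The sketch below uses the counts from the “live” table above; the function name `case_variants` and the expected set `{"live", "dead"}` are assumptions for illustration, not an agreed coding scheme.

```python
# Counts copied from the "live" table in this report.
live_counts = {
    "dead": 101, "DEAD": 69, "live": 862, "live?": 1,
    "missing": 68, "nd": 101, "NEW": 15,
}

# ASSUMPTION: the intended vocabulary; to be confirmed against the field forms.
EXPECTED = {"live", "dead"}


def case_variants(counts):
    """Group codes that collide once case and surrounding whitespace are ignored."""
    groups = {}
    for code, n in counts.items():
        groups.setdefault(code.strip().lower(), {})[code] = n
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


collisions = case_variants(live_counts)

# Codes that do not match the expected vocabulary even after lowercasing.
unresolved = {
    code: n for code, n in live_counts.items()
    if code.strip().lower() not in EXPECTED
}

total = sum(live_counts.values())
```

The table total comes to 1217, matching the record count of the raw data set, so every record carries some value in this field. Only “dead”/“DEAD” collide by case; the codes “live?”, “missing”, “nd”, and “NEW” cover 185 records that would need a decision before recoding.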
There are 1155 records with a missing value for “br_scheme”.
Questions/Issues: Does NA mean unbrowsed?
There are 280 records with a missing value for “ub_scheme”.
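A cross-tabulation of missingness in the two scheme fields may help with the question of what NA means. Since 1155 + 280 exceeds the 1217 records, at least 218 records must be missing both values. The sketch below is illustrative only: `missing_crosstab` is a hypothetical name, the toy rows and scheme values "A" and "B" are made up, and the definition of "missing" is an assumption.

```python
from collections import Counter

# Values treated as missing; this set is an assumption, adjust to the real import.
MISSING = {None, "", "NA"}


def missing_crosstab(rows, field_a, field_b):
    """Count rows by (field_a is missing, field_b is missing)."""
    return Counter(
        (r.get(field_a) in MISSING, r.get(field_b) in MISSING) for r in rows
    )


# Made-up rows for illustration only; not taken from the real data.
rows = [
    {"br_scheme": None, "ub_scheme": "A"},
    {"br_scheme": "B", "ub_scheme": None},
    {"br_scheme": None, "ub_scheme": None},
    {"br_scheme": None, "ub_scheme": "A"},
]
tab = missing_crosstab(rows, "br_scheme", "ub_scheme")

# Lower bound on records missing both fields, from the counts in this report.
min_both_missing = max(0, 1155 + 280 - 1217)
```

If nearly all records with a missing “br_scheme” have a “ub_scheme” value, that would be consistent with NA meaning unbrowsed, but records missing both would still need checking against the field forms.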