Creation date: 2018 April 3

Updated: 2018 April 12

Introduction

This document provides basic QA/QC assessments for the willow production data updated with 2016 and 2017 data entered from field sheets.

Analyses aim to characterize basic data structure and identify potential issues, such as missing values, empty fields, and inconsistent encoding.

2017 Data import and initial inspection

There are 1122 records in the raw data set.

QA/QC checks: 2017

Questions/Issues:
1. “biggest_shoot_diameter_mm” is completely empty, as is “cont_sht_len”. Do these fields need to be on the form if they are not being used?
2. Key fields (e.g., date, site, plot, spp) have varying levels of missingness. Need to check the field forms.
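The per-field missingness raised in these questions could be tallied along the following lines. This is a minimal sketch in Python, not the code used to produce this report. The helpers `missing_counts` and `empty_fields`, the set of strings treated as missing, and the example rows are all assumptions made for illustration.

```python
# Assumption: these strings count as "missing"; adjust to the data entry conventions.
MISSING = {"", "NA"}

def missing_counts(rows, fields):
    """Count the records with a missing value for each field."""
    counts = {f: 0 for f in fields}
    for row in rows:
        for f in fields:
            value = row.get(f)
            if value is None or str(value).strip() in MISSING:
                counts[f] += 1
    return counts

def empty_fields(rows, fields):
    """Return the fields that are missing in every record."""
    counts = missing_counts(rows, fields)
    return [f for f in fields if rows and counts[f] == len(rows)]

# Made-up rows for illustration only; not taken from the willow data.
rows = [
    {"stem_id": "1", "year": "2017", "cont_sht_len": ""},
    {"stem_id": "", "year": "2017", "cont_sht_len": "NA"},
    {"stem_id": "3", "year": None, "cont_sht_len": ""},
]
fields = ["stem_id", "year", "cont_sht_len"]

print(missing_counts(rows, fields))
print(empty_fields(rows, fields))
```

A field reported by `empty_fields` is a candidate for removal from the form; a field with a nonzero count in `missing_counts` points to records worth checking against the field sheets.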

Missing data: dates

There are 14 records with a missing value for “date”.

Missing data: Stem ID

There are 7 records with a missing value for “stem_id”.

Missing data: Plant

Is “plant” the same thing as “willid”?

There are 9 records with a missing value for “plant”.

Missing data: year

There are 14 records with a missing value for “year”.

live

live n
dead 106
DEAD 39
live 827
missing 10
nd 106
NEW 2
retag 3
NA 29

Questions/Issues:
Need to standardize the encoding for the ‘live’ field:
1. Recode “NEW” to “retag”?
2. What about “DEAD” vs. “dead”? According to LM these mean different things…
3. “nd” vs. “NA” vs. “missing”?
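One way to flag codes that may be unintended variants of each other is to group them by their lowercase form. The sketch below is in Python and uses the counts tabulated above for the ‘live’ field; the helper `case_variants` is a hypothetical name. It only flags candidates and deliberately does not merge them, since “DEAD” and “dead” are reported to mean different things.

```python
from collections import defaultdict

# Codes and counts as tabulated above for the 'live' field.
live_counts = {
    "dead": 106, "DEAD": 39, "live": 827, "missing": 10,
    "nd": 106, "NEW": 2, "retag": 3, "NA": 29,
}

def case_variants(counts):
    """Group codes that differ only by letter case (flag only, no recoding)."""
    groups = defaultdict(list)
    for code in counts:
        groups[code.lower()].append(code)
    return {key: codes for key, codes in groups.items() if len(codes) > 1}

print(case_variants(live_counts))   # codes that collide when case is ignored
print(sum(live_counts.values()))    # total should match the record count
```

The counts sum to 1122, which matches the number of records in the raw data set, so every record falls into one of these eight codes.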

Browse scheme

br_scheme n
0 398
1 52
2 4
3 2
NA 666

Examine the records with Browse scheme = NA

Questions/Issues: Should NA mean a dead stem? Should 0 == live, but occurrence of browse/unbrowse (depending on what you’re recording)?
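Whether NA in “br_scheme” corresponds to dead stems could be checked by tabulating the ‘live’ codes among the records where “br_scheme” is missing. The following Python sketch assumes records are held as dictionaries and that a missing value is encoded as `None`, an empty string, or “NA”. The function name and the example rows are made up for illustration.

```python
from collections import Counter

# Assumption: how a missing br_scheme value is encoded.
MISSING = {None, "", "NA"}

def live_status_where_browse_missing(rows):
    """Tabulate 'live' codes among records whose br_scheme is missing."""
    return Counter(
        row.get("live") for row in rows if row.get("br_scheme") in MISSING
    )

# Made-up rows for illustration only; not taken from the willow data.
rows = [
    {"live": "live", "br_scheme": "0"},
    {"live": "dead", "br_scheme": "NA"},
    {"live": "dead", "br_scheme": None},
    {"live": "live", "br_scheme": "NA"},
    {"live": "live", "br_scheme": "1"},
]

print(live_status_where_browse_missing(rows))
```

If nearly all records with a missing “br_scheme” were coded as dead, that would support reading NA as a dead stem; a sizable share of live stems would suggest that values were simply not recorded.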

Unbrowse scheme

Read in the 2001-2015 data and compare structure, field names, etc.

Need to figure out how these field names map to the ones in the 2017 data entered from field forms.
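A first pass at the mapping could compare the two sets of field names and list those that match exactly, leaving the rest to be paired by hand. The sketch below is in Python. The function `compare_fields` is a hypothetical helper, and both field lists are placeholders assembled for illustration, not the actual headers of either data set.

```python
def compare_fields(old_fields, new_fields):
    """Split field names into shared, old-only, and new-only groups."""
    old, new = set(old_fields), set(new_fields)
    return {
        "shared": sorted(old & new),
        "only_old": sorted(old - new),
        "only_new": sorted(new - old),
    }

# Placeholder field lists; the real headers would be read from each file.
fields_2001_2015 = ["year", "site", "plot", "spp", "willid"]
fields_2017 = ["year", "site", "plot", "spp", "plant", "stem_id"]

result = compare_fields(fields_2001_2015, fields_2017)
for group, names in result.items():
    print(group, names)
```

Names that land in `only_old` and `only_new` are the ones that need a manual decision, for example whether “willid” corresponds to “plant”.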

QA/QC checks: 2001-2015 data

Missing data

Number of unique values