Creation date: 2018 April 21

Updated: 2018 April 21

Introduction

This document provides basic QA/QC assessments for the willow production data updated with 2016 data entered from field sheets.

Analyses aim to characterize basic data structure and identify potential issues such as:

2016 Data import and initial inspection

The field “Record Id” has been added to help index rows for QA/QC purposes. It’s a sequential index of rows. Use it to look up rows in the original xlsx file…
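A minimal sketch of how such an index could be added, written in Python with plain dictionaries standing in for the imported rows. The import code is not shown in this report, so the field name `record_id`, the 1-based start, and the sample rows are all assumptions for illustration.

```python
def add_record_id(rows, start=1):
    """Return copies of the rows with a sequential 'record_id' field added.

    Assumes the rows are still in the same order as in the source sheet,
    so the index can be used to trace a record back to the spreadsheet.
    """
    return [{"record_id": i, **row} for i, row in enumerate(rows, start=start)]


# Hypothetical rows standing in for the imported sheet.
raw = [
    {"spp": "beb", "live": "live"},
    {"spp": "geyer", "live": "dead"},
    {"spp": None, "live": "nd"},
]
indexed = add_record_id(raw)
```

The index only maps back to the spreadsheet if it is assigned before any sorting or filtering.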

There are 1217 records in the raw data set.

QA/QC checks: 2016

Questions/Issues:
1. Levels of “missingness” vary across key fields (e.g., date, spp). Check the field forms.
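The per-field counts in the sections that follow could be produced with a check along these lines. This is a sketch only: the set of strings treated as missing and the sample rows are assumptions, not the rules actually used for this report.

```python
# Assumption: these strings (after trimming and lower-casing) count as missing.
MISSING_TOKENS = {"", "na", "n/a"}


def is_missing(value):
    """True for None, or for a string that matches one of the missing tokens."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_TOKENS


def count_missing(rows, fields):
    """Count, for each field, the rows where the value is missing or absent."""
    return {f: sum(is_missing(row.get(f)) for row in rows) for f in fields}


# Hypothetical rows for illustration.
rows = [
    {"date": "2016-08-01", "stem_id": "1", "spp": "beb"},
    {"date": None, "stem_id": "2", "spp": "NA"},
    {"date": "", "stem_id": None, "spp": "geyer"},
]
missing = count_missing(rows, ["date", "stem_id", "spp", "year"])
```

A field that is absent from a row altogether (here `year`) is counted as missing for that row.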

Missing data: date

There are 9 records with a missing value for “date”.

Missing data: Stem ID

There are 15 records with a missing value for “stem_id”.

Missing data: Wildid

In the 2017 production data, this field is “willid”. It is also referenced elsewhere as “plant”. We should make these names consistent. There are 18 records with a missing value for “wildid”.
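One possible way to make the names consistent is to map every known alias to a single field name at import. The sketch below assumes “wildid” is the name to keep, which has not been decided here; the sample value is hypothetical.

```python
# Assumption: "wildid" is the canonical name; the aliases come from the note above.
ALIASES = {"willid": "wildid", "plant": "wildid"}


def harmonize_keys(row, aliases=ALIASES):
    """Return a copy of the row with aliased field names replaced.

    Raises ValueError if two aliases in the same row carry different values,
    rather than silently keeping one of them.
    """
    out = {}
    for key, value in row.items():
        name = aliases.get(key, key)
        if name in out and out[name] != value:
            raise ValueError(f"conflicting values for {name!r}")
        out[name] = value
    return out


# Hypothetical 2017-style row.
row_2017 = {"willid": "A12", "spp": "beb"}
harmonized = harmonize_keys(row_2017)
```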

Missing data: year

There are 9 records with a missing value for “year”.

Missing data: spp

There are 14 records with a missing value for “spp”.

Distinct spp codes

spp n
beb 360
boothi 325
drum 46
geyer 429
plantifolia 7
pseudo 36
NA 14

live

live n
dead 101
DEAD 69
live 862
live? 1
missing 68
nd 101
NEW 15

Questions/Issues:
1. Need to standardize the encoding for the “live” field. If “dead” and “DEAD” mean different things, we should come up with a less ambiguous way to encode this.
2. What about “live” vs. “live?”?
3. What about “nd” vs. “NA” vs. “missing”?
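As a sketch of what standardizing might look like, the snippet below folds case on the counts from the “live” table and lists every value that would still need a decision. It assumes case carries no meaning, which is exactly the open question for “dead” vs. “DEAD”, so it should not be applied if the two are meant to differ. The set of accepted values is also an assumption.

```python
from collections import Counter

# Counts as reported in the "live" table above.
observed = {
    "dead": 101, "DEAD": 69, "live": 862, "live?": 1,
    "missing": 68, "nd": 101, "NEW": 15,
}

# Assumption: only these values are accepted without review.
AGREED = {"live", "dead"}


def normalize(value):
    """Trim whitespace and lower-case; no other recoding is attempted."""
    return value.strip().lower()


# Merge counts for values that differ only by case or whitespace.
merged = Counter()
for value, n in observed.items():
    merged[normalize(value)] += n

# Everything outside the agreed set is left for a human decision.
needs_review = {v: n for v, n in merged.items() if v not in AGREED}
```

Under these assumptions “dead” and “DEAD” merge into 170 records, and 185 records (“live?”, “missing”, “nd”, “new”) remain to be checked against the field forms.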

Browse scheme

There are 1155 records with a missing value for “br_scheme”.

Questions/Issues: Does NA mean unbrowsed?

Unbrowse scheme

There are 280 records with a missing value for “ub_scheme”.
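With 1217 records in total, 1155 missing “br_scheme” and 280 missing “ub_scheme”, at least 1155 + 280 - 1217 = 218 records must be missing both fields. A cross-tabulation of the two missingness patterns would show how the rest are distributed, and may help with the question of whether NA in “br_scheme” means unbrowsed. A minimal sketch, assuming missing values are imported as `None`; the scheme values in the sample rows are placeholders.

```python
from collections import Counter


def cross_missing(rows, a="br_scheme", b="ub_scheme"):
    """Count rows by the pair (a is missing, b is missing)."""
    return Counter((row.get(a) is None, row.get(b) is None) for row in rows)


# Hypothetical rows; "x" and "y" are placeholder scheme values.
rows = [
    {"br_scheme": None, "ub_scheme": "x"},
    {"br_scheme": "y", "ub_scheme": None},
    {"br_scheme": None, "ub_scheme": None},
    {"br_scheme": None, "ub_scheme": "x"},
]
table = cross_missing(rows)
```

In the result, the key `(True, True)` holds the records with neither scheme recorded, which are the ones most worth checking against the field forms.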