Creation date: 2018 April 20

Updated: 2018 April 21

Introduction

This document provides basic QA/QC tools for the Spring 2017 willow utilization data.

Analyses aim to characterize basic data structure and identify potential issues such as missing values, date entry errors, and inconsistent field encodings.

2017 Spring data import and initial inspection

The field “Record Id” has been added to help index rows for QA/QC purposes. It is a sequential index of rows.
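A sequential row index like this can be sketched as follows. This is a minimal illustration, not the code used to build the report: the sample rows and the helper name `add_record_id` are hypothetical, and the index is assumed to start at 1.

```python
# Hypothetical raw rows for illustration; the real data set has 1285 records.
raw_rows = [
    {"stid": "A1", "plant": "101", "live": "live"},
    {"stid": "A2", "plant": "102", "live": "dead"},
    {"stid": "A3", "plant": "103", "live": "NEW"},
]

def add_record_id(rows, field="Record Id"):
    """Return copies of the rows with a 1-based sequential index added."""
    return [{field: i, **row} for i, row in enumerate(rows, start=1)]

indexed = add_record_id(raw_rows)
for row in indexed:
    print(row["Record Id"], row["stid"])
```

Because the index is assigned before any filtering or sorting, a flagged record can always be traced back to its position in the raw file.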

There are 1285 records in the raw data set.

Date: potential missing data or entry errors

A total of 195 records have an NA or incomplete (e.g., year only) date.

## Warning: 39 failed to parse.
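One way to separate missing dates from dates that are present but fail to parse is sketched below. The date format (`%Y-%m-%d`), the sample values, and the function name `classify_date` are assumptions for illustration; the format used on the field forms is not stated here.

```python
from collections import Counter
from datetime import datetime

def classify_date(value, fmt="%Y-%m-%d"):
    """Return 'missing', 'unparsed', or 'ok' for one raw date string."""
    if value is None or value.strip() in ("", "NA"):
        return "missing"
    try:
        datetime.strptime(value.strip(), fmt)
    except ValueError:
        # Present but not a complete, valid date (e.g., year only).
        return "unparsed"
    return "ok"

# Hypothetical raw values, not taken from the data set.
dates = ["2017-05-14", "2017", None, "NA", "2017-13-40"]
flags = [classify_date(d) for d in dates]
print(Counter(flags))
```

Keeping "missing" and "unparsed" as separate categories makes it easier to tell blank field forms from entry errors.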

QA/QC checks: 2017

Questions/Issues:
1. We’ve got varying levels of “missingness” for several key fields (e.g., date, plant, stid, live). Check the field forms.

Missing data: stid

There are 6 records with a missing value for “stid”.
There are 1279 distinct stid values.
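The missing and distinct counts for a field can be tallied along these lines. This is a sketch under stated assumptions: the rows are hypothetical, the helper `summarize_field` is not part of the original analysis, and NA is assumed to be excluded from the distinct count.

```python
NA_VALUES = (None, "", "NA")

def summarize_field(rows, field):
    """Count missing and distinct non-missing values, and list the affected rows."""
    missing_ids = [row["Record Id"] for row in rows
                   if row.get(field) in NA_VALUES]
    distinct = {row[field] for row in rows
                if row.get(field) not in NA_VALUES}
    return {"n_missing": len(missing_ids),
            "n_distinct": len(distinct),
            "missing_ids": missing_ids}

# Hypothetical rows for illustration only.
rows = [
    {"Record Id": 1, "stid": "A1"},
    {"Record Id": 2, "stid": None},
    {"Record Id": 3, "stid": "A2"},
    {"Record Id": 4, "stid": "NA"},
    {"Record Id": 5, "stid": "A1"},
]

summary = summarize_field(rows, "stid")
print(summary)
```

Returning the Record Id values alongside the counts gives a direct list of rows to check against the field forms. The same helper applies unchanged to “plant” and “live”.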

Missing data: plant

There are 6 records with a missing value for “plant”. Should this be renamed to “willid”?

Missing data: live

There are 23 records with a missing value for “live”.

live n
dead 113
DEAD 68
live 947
missing 42
nd 87
new 3
NEW 2
NA 23

Questions/Issues:
Need to standardize the encoding for the ‘live’ field:
“NEW” to “retag”?
“NEW” vs “new”? “DEAD” vs “dead”?
“nd” vs “NA” vs “missing”?
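To see what case standardization alone would do, the tallies from the ‘live’ table can be folded to lower case. This sketch only merges case variants; it deliberately leaves “nd”, “missing”, and NA separate, since how those should be reconciled is still an open question. The function name `fold_case` is hypothetical.

```python
from collections import Counter

# Tallies copied from the 'live' table; None stands in for NA.
live_counts = {"dead": 113, "DEAD": 68, "live": 947, "missing": 42,
               "nd": 87, "new": 3, "NEW": 2, None: 23}

def fold_case(counts):
    """Merge tallies whose labels differ only by letter case."""
    folded = Counter()
    for value, n in counts.items():
        key = value.lower() if value is not None else None
        folded[key] += n
    return folded

folded = fold_case(live_counts)
print(folded["dead"], folded["new"], sum(folded.values()))  # 181 5 1285
```

The total stays at 1285, which matches the record count of the raw data set, so no records are lost or double counted by the merge.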

Browse scheme

br_sch n
0 433
1 451
2 31
3 11
4 5
5 4
6 3
NA 347

Examine the records with Browse scheme = NA.

Questions/Issues: Are the NA values correct?
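One possible way to examine the NA records is to list their Record Id values and tally them against another field such as “live”, to see whether the NAs cluster in a particular group. The rows below are hypothetical, and the idea that NA might relate to plant status is only an assumption to be checked, not a finding.

```python
from collections import Counter

NA_VALUES = (None, "", "NA")

# Hypothetical rows for illustration only.
rows = [
    {"Record Id": 1, "br_sch": "1", "live": "live"},
    {"Record Id": 2, "br_sch": None, "live": "dead"},
    {"Record Id": 3, "br_sch": "NA", "live": "live"},
    {"Record Id": 4, "br_sch": "0", "live": "live"},
    {"Record Id": 5, "br_sch": None, "live": "dead"},
]

def na_by_status(rows, field="br_sch", status_field="live"):
    """Tally the status of every record whose `field` is NA."""
    return Counter(row.get(status_field) for row in rows
                   if row.get(field) in NA_VALUES)

na_ids = [row["Record Id"] for row in rows if row.get("br_sch") in NA_VALUES]
na_status = na_by_status(rows)
print(na_ids, dict(na_status))
```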

Unbrowse scheme

Not necessarily something to change, but the scheme fields are named differently between the production and utilization data sets. For example: ub_scheme in prod, ubr_sch in utilization. A ubr_sch value of 120: is this correct? Some of the other scheme codes seem really big…
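Unusually large scheme codes can be pulled out for review with a simple range check. This is a sketch only: the rows are hypothetical, and the upper bound of 6 is an assumption borrowed from the largest br_sch code tabulated above, not a documented limit for ubr_sch.

```python
NA_VALUES = (None, "", "NA")

def flag_large_codes(rows, field, max_code):
    """Return (Record Id, code) pairs whose code exceeds max_code."""
    flagged = []
    for row in rows:
        value = row.get(field)
        if value in NA_VALUES:
            continue  # NA values are reviewed separately
        code = int(value)
        if code > max_code:
            flagged.append((row["Record Id"], code))
    return flagged

# Hypothetical rows for illustration only.
rows = [
    {"Record Id": 1, "ubr_sch": "0"},
    {"Record Id": 2, "ubr_sch": "2"},
    {"Record Id": 3, "ubr_sch": "120"},
    {"Record Id": 4, "ubr_sch": None},
]

# max_code=6 is an assumed threshold, to be replaced by the real code list.
flagged = flag_large_codes(rows, "ubr_sch", max_code=6)
print(flagged)
```

The flagged Record Id values can then be compared with the field forms to decide whether a code such as 120 is valid or an entry error.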