Utilization 2017 QAQC

Creation date: 2018 April 20

Updated: 2018 April 21

Introduction

This document provides basic QA/QC tools for the Spring 2017 willow utilization data.

Analyses aim to characterize basic data structure and identify potential issues such as:

Inconsistently named/typed factors
Missing values
Data values outside of expected range or showing unusual patterns

2017 Spring data import and initial inspection

The field “Record Id” has been added to help index rows for QA/QC purposes. It is a sequential index of rows.
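As an illustration of the row index described above, here is a minimal Python sketch, not the code used to produce this report. The sample rows and the helper name `add_record_id` are hypothetical; only the field name “Record Id” comes from the text.

```python
# Hypothetical rows standing in for the raw utilization records.
rows = [
    {"stid": "A1", "plant": "w01"},
    {"stid": "A2", "plant": "w02"},
    {"stid": None, "plant": "w03"},
]


def add_record_id(records, field="Record Id"):
    """Return copies of the records with a 1-based sequential row index."""
    return [{field: i, **rec} for i, rec in enumerate(records, start=1)]


indexed = add_record_id(rows)
print([r["Record Id"] for r in indexed])
```

Because the index follows row order, it should be assigned once, before any sorting or filtering, so that flagged records can be traced back to the raw file.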

There are 1285 records in the raw data set.

Date: potential missing data or entry errors

A total of 195 records have an NA or incomplete (e.g., year only) date.

## Warning: 39 failed to parse.
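One way to separate blank dates from entries that are present but do not parse is sketched below in Python. This is an assumption-laden sketch: the date format `%Y-%m-%d`, the set of NA tokens, and the sample values are all hypothetical and would need to match what is actually in the date column.

```python
from collections import Counter
from datetime import datetime

# Assumed spellings of a blank date; adjust to the real data.
NA_TOKENS = {"", "NA"}


def classify_date(value, fmt="%Y-%m-%d"):
    """Label a raw date entry as 'missing', 'unparsed', or 'ok'."""
    if value is None or str(value).strip() in NA_TOKENS:
        return "missing"
    try:
        datetime.strptime(str(value).strip(), fmt)
    except ValueError:
        # Present but incomplete or in another format (e.g., year only).
        return "unparsed"
    return "ok"


# Hypothetical entries, not taken from the data set.
sample = ["2017-05-14", "2017", None, "NA", "5/14/17"]
status = Counter(classify_date(v) for v in sample)
print(dict(status))
```

Keeping "missing" and "unparsed" as separate labels makes it easier to tell which records need a look at the field forms and which only need reformatting.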

QA/QC checks: 2017

Questions/Issues:
1. Key fields (e.g., date, plant, stid, live) show varying levels of “missingness”. Check the field forms.

Missing data: stid

There are 6 records with a missing value for “stid”.
There are 1279 distinct stid values.
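The two counts above (missing and distinct) can be computed for any key field with a small helper. The Python sketch below is illustrative only: `is_missing`, `summarize_key`, and the sample rows are hypothetical, and it assumes that NA may appear either as an empty cell or as the literal text "NA". It also lists repeated identifiers, which is an extra check not reported above.

```python
def is_missing(value):
    """Treat None, empty strings, and the literal 'NA' as missing (an assumption)."""
    return value is None or str(value).strip() in ("", "NA")


def summarize_key(records, field):
    """Count missing and distinct values of one field, and list repeated values."""
    values = [r.get(field) for r in records]
    present = [v for v in values if not is_missing(v)]
    repeated = sorted({v for v in present if present.count(v) > 1})
    return {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "repeated": repeated,
    }


# Hypothetical rows, not taken from the data set.
rows = [{"stid": "A1"}, {"stid": "A2"}, {"stid": None}, {"stid": "A2"}, {"stid": "NA"}]
summary = summarize_key(rows, "stid")
print(summary)
```

The same call works for “plant” or any other identifier column by changing the field name.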

Missing data: plant

There are 6 records with a missing value for “plant”. Should this be renamed to “willid”?

Missing data: live

There are 23 records with a missing value for “live”.

live	n
dead	113
DEAD	68
live	947
missing	42
nd	87
new	3
NEW	2
NA	23

Questions/Issues:
Need to standardize the encoding for the ‘live’ field:
“NEW” to “retag”?
“NEW” vs “new”? “DEAD” vs “dead”?
“nd” vs “NA” vs “missing”?
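To show what the case differences alone account for, the Python sketch below folds the ‘live’ codes to lower case and re-tallies them. The counts are copied from the table above, with `None` standing in for NA. It deliberately leaves “nd”, “missing”, and NA separate and does not map “NEW” to “retag”, since those are open questions rather than decisions; the helper name `fold_case` is hypothetical.

```python
from collections import Counter

# Counts copied from the 'live' table above; None stands for NA.
live_counts = {
    "dead": 113,
    "DEAD": 68,
    "live": 947,
    "missing": 42,
    "nd": 87,
    "new": 3,
    "NEW": 2,
    None: 23,
}


def fold_case(counts):
    """Merge codes that differ only by letter case or surrounding whitespace."""
    folded = Counter()
    for value, n in counts.items():
        key = value.strip().lower() if isinstance(value, str) else value
        folded[key] += n
    return dict(folded)


folded = fold_case(live_counts)
print(folded)
# dead: 113 + 68 = 181, new: 3 + 2 = 5; the total stays at 1285 records.
```

Case folding reduces the eight codes to six. Whether “nd”, “missing”, and NA should be merged further depends on what each code meant in the field.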

Browse scheme

br_sch	n
0	433
1	451
2	31
3	11
4	5
5	4
6	3
NA	347

Examine the records with Browse scheme = NA
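A possible way to pull the NA records for review is sketched below in Python. The rows are hypothetical, as are the helper names; the field names br_sch, live, and “Record Id” come from this document. Tallying the flagged records by ‘live’ is only a suggestion for spotting patterns, on the assumption that the reason for an NA may differ between plant statuses.

```python
from collections import Counter


def is_missing(value):
    """Treat None, empty strings, and the literal 'NA' as missing (an assumption)."""
    return value is None or str(value).strip() in ("", "NA")


def na_records(records, field):
    """Return the records whose value for `field` is missing."""
    return [r for r in records if is_missing(r.get(field))]


# Hypothetical rows, not taken from the data set.
rows = [
    {"Record Id": 1, "br_sch": 0, "live": "live"},
    {"Record Id": 2, "br_sch": None, "live": "dead"},
    {"Record Id": 3, "br_sch": "NA", "live": "missing"},
    {"Record Id": 4, "br_sch": 1, "live": "live"},
    {"Record Id": 5, "br_sch": None, "live": "dead"},
]

flagged = na_records(rows, "br_sch")
flagged_ids = [r["Record Id"] for r in flagged]
by_live = Counter(r["live"] for r in flagged)
print(flagged_ids, dict(by_live))
```

Note that a scheme code of 0 is kept as a valid value and is not treated as missing.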

Questions/Issues: Are the NA values correct?

Unbrowse scheme

Not necessarily something to change, but the scheme fields are named differently between the production and utilization data sets. For example: ub_scheme in prod, ubr_sch in utilization. ubr_sch of 120: Is this correct? Some of the other scheme codes seem really big…
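Unusually large codes such as the 120 noted above can be listed with a simple range check. The Python sketch below is a rough illustration: the bounds (0 to 10) are placeholders, not values from the sampling protocol, and the rows and the helper name `flag_out_of_range` are hypothetical.

```python
def flag_out_of_range(records, field, low, high):
    """Return records whose code is non-numeric or falls outside [low, high].

    Records with no value (None) are skipped here; they belong to the
    missing-data checks rather than the range check.
    """
    flagged = []
    for rec in records:
        value = rec.get(field)
        if value is None:
            continue
        try:
            number = int(value)
        except (TypeError, ValueError):
            flagged.append(rec)  # non-numeric codes are flagged for review
            continue
        if not low <= number <= high:
            flagged.append(rec)
    return flagged


# Hypothetical rows; only the value 120 echoes the report.
rows = [
    {"Record Id": 1, "ubr_sch": 0},
    {"Record Id": 2, "ubr_sch": 2},
    {"Record Id": 3, "ubr_sch": 120},
    {"Record Id": 4, "ubr_sch": None},
    {"Record Id": 5, "ubr_sch": "x"},
]

# The upper bound of 10 is a placeholder assumption.
suspect = flag_out_of_range(rows, "ubr_sch", low=0, high=10)
print([r["Record Id"] for r in suspect])
```

The appropriate bounds should come from the scheme definitions on the field forms. The check only produces a list of records to review and does not change any values.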