Datamart QA Tool

What is the Datamart?

The DV Datamart contains data elements describing an instance of DV and its movement through the criminal justice system (e.g., incident, arrest, case review, sentencing, and disposition).
It’s generated by DOTComm, a municipal agency.
We use it to help the City / Women’s Fund generate statistics for various grant reports throughout the year and for analyses more generally (e.g. the State of DV Report published through the Women’s Fund).
We receive a new iteration of the Datamart four times a year for these purposes.

What is this tool? Why is it helpful?

This tool is designed to help identify suspect characteristics of a received Datamart that may indicate something went awry during its generation.
For example, if a Datamart contains an unusual amount of missing data, this tool should flag it.
This is helpful because much of the Datamart’s generation happens behind closed doors and has historically been fraught with problems (e.g., we may receive a Datamart intended to represent one time period when it actually represents another).
The tool is meant to spot potential red flags before we devote time to working with a flawed Datamart.
It should also shed light on the specific ways in which a particular Datamart might be problematic.
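
As a rough illustration of the missing-data example above, here is a minimal sketch in Python of computing the share of missing values per column. It is not the tool's actual code: the column names, the sample rows, and the set of tokens counted as "missing" are all assumptions made up for the example.

```python
import csv
import io

# Tokens treated as "missing". This set is an assumption for illustration,
# not the rule the QA tool actually applies.
MISSING_TOKENS = {"", "NA", "NULL"}


def missing_share(rows, columns):
    """Return the fraction of missing values for each column."""
    rows = list(rows)
    shares = {}
    for col in columns:
        missing = sum(
            1 for row in rows if (row.get(col) or "").strip() in MISSING_TOKENS
        )
        shares[col] = missing / len(rows) if rows else 0.0
    return shares


# Tiny made-up extract standing in for a received Datamart
# (hypothetical column names and values).
sample = io.StringIO(
    "incident_id,arrest_date,disposition\n"
    "1,2020-01-05,Guilty\n"
    "2,,NA\n"
    "3,2020-02-11,\n"
    "4,,Dismissed\n"
)
reader = csv.DictReader(sample)
columns = reader.fieldnames
shares = missing_share(reader, columns)

for col in columns:
    print(f"{col}: {shares[col]:.0%} missing")
```

Comparing these shares against the same figures from earlier Datamarts is what would turn "an unusual amount" into something checkable.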

Check 1: Does the new Datamart have the same features as previous Datamarts?

Results:

## [1] "Newest iteration of the Datamart has expected number of features and feature names"
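
A check like this one can be sketched as a comparison of feature names between a reference Datamart and the new one. The Python below is only an illustration under assumptions: the function name and the feature names are hypothetical, and the tool's own script may do this differently.

```python
def compare_features(expected, received):
    """Compare feature (column) names of a new Datamart against a reference."""
    expected_set, received_set = set(expected), set(received)
    return {
        # Features the reference has but the new Datamart lacks.
        "missing": sorted(expected_set - received_set),
        # Features the new Datamart has that the reference does not.
        "unexpected": sorted(received_set - expected_set),
        # Whether the number of features matches.
        "same_count": len(expected) == len(received),
    }


# Hypothetical feature names, for illustration only.
previous = ["incident_id", "arrest_date", "disposition"]
newest = ["incident_id", "arrest_dt", "disposition"]

report = compare_features(previous, newest)
if report["missing"] or report["unexpected"] or not report["same_count"]:
    print("Feature mismatch:", report)
else:
    print("Newest iteration has expected number of features and feature names")
```

Reporting the missing and unexpected names separately makes a renamed feature easy to spot, since it shows up once in each list.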

Areas for Improvement

Incorporate more quantifiable consistency & variance checks.

Currently, consistency between Datamarts is primarily explored and displayed visually. Future versions should incorporate checks that look for specific quantifiable conditions (e.g., does a new Datamart deviate more than X standard deviations from previous Datamarts?).
These sorts of hard rules should be worked into the existing visualizations. For example, if a quantifiable red flag is tripped (e.g., a new Datamart value varies significantly from previous versions), the plot should clearly indicate this to the viewer. I think this would involve incorporating conditional statements into the plotting scripts, which is something I’ll need to look into.
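
One possible shape for such a rule, sketched in Python under assumptions: the threshold of 2 standard deviations, the function name, the metric (share of records missing an arrest date), and the historical values are all made up for illustration. The returned flag is the kind of value a conditional in a plotting script could use to pick a highlight color.

```python
from statistics import mean, stdev


def flag_deviation(history, new_value, threshold=2.0):
    """Flag new_value if it lies more than `threshold` sample standard
    deviations from the mean of the values seen in previous Datamarts.

    Returns (flagged, z_score). Needs at least two historical values.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: any difference at all is a deviation.
        differs = new_value != mu
        return differs, float("inf") if differs else 0.0
    z = (new_value - mu) / sigma
    return abs(z) > threshold, z


# Hypothetical metric from four previous Datamarts.
previous_shares = [0.10, 0.12, 0.11, 0.09]

flagged, z = flag_deviation(previous_shares, 0.30)
# A plotting script could branch on the flag, e.g. to color the newest point.
point_color = "red" if flagged else "grey"
print(f"flagged={flagged}, z={z:.1f}, color={point_color}")
```

With only a handful of previous Datamarts the standard deviation is itself a noisy estimate, so a threshold like this is best treated as a prompt for a closer look rather than a verdict.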

Automate the Datamart input / update process

Currently, to test a new iteration of the Datamart, I have to manually adjust the script to incorporate it.
Ideally, I could just place the new Datamart in the appropriate Dropbox folder and the script would take it from there.
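
A minimal sketch of that idea in Python, assuming the Dropbox folder is synced locally (so it behaves like any other directory) and that each Datamart arrives as a CSV file. The function name, file names, and the "most recently modified file is the newest Datamart" rule are assumptions for illustration.

```python
import os
import tempfile
from pathlib import Path


def newest_datamart(folder, pattern="*.csv"):
    """Return the most recently modified file in `folder` matching `pattern`."""
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not candidates:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Demonstration with a temporary folder standing in for the Dropbox folder.
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "datamart_q1.csv"  # hypothetical file names
    newer = Path(tmp) / "datamart_q2.csv"
    older.write_text("incident_id\n1\n")
    newer.write_text("incident_id\n1\n2\n")
    # Set modification times explicitly so the ordering is unambiguous.
    os.utime(older, (1_000_000, 1_000_000))
    os.utime(newer, (2_000_000, 2_000_000))
    picked = newest_datamart(tmp).name

print(picked)
```

Relying on modification time is the simplest rule but also the most fragile one, since re-saving an old file would make it look new. Encoding the period in the file name and sorting on that would be a sturdier alternative.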

Anticipate what a QA tool would need to look like in the proposed overhauled Datamart

Use this small project as a way to think about what a good QA tool would look like for the overhauled Datamart we are proposing.