Hopefully this document will provide you with all you need to know to write OnCore discrepancy checks. If you have any questions about this document or about the OnCore discrepancy process, please contact Nathan Byers.

The location of the special Excel file

The file we will be using for creating discrepancy reports is located on the HG-Docs drive.

On the drive, locate the “OnCore Discrepancy Lists” folder and open it up.

Inside the “OnCore Discrepancy Lists” folder you’ll find an Excel file called “OnCore_Data_Discrepancy.xlsx”.

That’s where it’s at.

HG-Docs > OnCore Discrepancy Lists > OnCore_Data_Discrepancy.xlsx

The format of the special Excel file

The Excel file has many columns in it, and it may have more in the future. However, do not change the format of this file yourself! If you would like to change the format, please check with Nathan Byers or Eric Bailey first.

To create a new discrepancy check, start a new row and fill out the Date and User columns. After that, the most important columns to fill out are “Purpose”, the-protocol-columns, and “Filter”.

Purpose The Purpose column should be a short description of the discrepancy in OnCore that you would like to check. It should be descriptive enough for you to recognize what exactly is being checked, but not an entire paragraph. The text from the Purpose column will be printed in your discrepancy report and will help you identify which discrepancy is which.
The-protocol-columns Each protocol has its own column. Type a “yes” in the cell for each protocol that should be checked for a specific discrepancy. For a single discrepancy check, more than one protocol can have a “yes”.
Filter The Filter column is the place where you will logically describe the discrepancy check. Below you will learn how to write the filter, but it is important to understand that this text will be read by a machine that will feed the text into an automated process. Please use the format and syntax we describe below without any additional comments or Excel formatting.

Why the Excel file is so special

The idea behind this process is that anyone can add a discrepancy check to their discrepancy report without requesting IT to write it for them. Once you make a change to the file (i.e., add a new row, or modify the logic of an existing row), then the next discrepancy report will incorporate your change to the special file. There should be no need to contact IT and request a new discrepancy check, and you don’t have to wait on IT to write it.

The process we have developed to check OnCore for discrepancies depends on the special Excel file to be in the same location, have the same file name, and have the same format.

Significant changes to the file may break the automated process for creating the discrepancy reports. The only changes that should be made to the file are the creation of new rows that follow the format that is described in this document. If you have questions about the formatting or about how exactly this automated process works, please contact Nathan Byers.

How to write filters for discrepancies

The Filter column of the Excel file should follow the following general format:

[field_name] logical_comparison [field_name]

and the result of the logical comparison should be TRUE for the records you would like to see in your report.

A specific example would be the Accession Date before Collection Date discrepancy:

[accession_date] < [collection_date_time]

This means that any specimen where the accession date is before the collection date will show up in your discrepancy report.

A field_name is the name of a column in a table in the OnCore Oracle database. It is not the same as the label in the OnCore website. As the example above shows, the label “Accession Date” in the OnCore website is not exactly the same as the field name in the OnCore database (accession_date).

To find the field name you need, you can do a key word search in the Searchable Table below.

Searchable Table

If you do a search for the words “Accession Date” you will see several fields in the DATABASE_FIELD column. The options are “ACCESSION_DATE”, “ACCESSION_DATE_TIME”, and “SPECIMEN_ACCESSION_DATE”. In this case, it’s not obvious that these choices will give you different values, so let’s assume you just choose the “ACCESSION_DATE” value from the DATABSE_FIELD column. The beginning of the text that you will put in the Filter column of the special Excel file would be

[ACCESSION_DATE]

You take the value from the DATABASE_FIELD column in the Searchable Table and place it inside square brackets. The case does not matter, so you could have typed [accession_date] or [Accession_Date]. All that matters is that you have the right letters and you include the underscore, _.

The next thing you need to add for the Accession Date before Collection Date discrepancy is the “before” part. If you think of dates as numbers, then the value for a date that happens first should be smaller than a date that happens second. So we could say that the first date is less than the second date. To build the filter we need to supply a symbol that means less than, which is the < character.

[ACCESSION_DATE] <

Below is a Logical Table that includes all of the symbols you will need to write filters for discrepancies.

Logical Table

Operator	Description	Example
<	less than	[accession_date] < [collection_date_time]
<=	less than or equal to	[accession_date] <= [collection_date_time]
>	greater than	[specimen_quantity] > 2.0
>=	greater than or equal to	[specimen_quantity] >= 2.0
==	exactly equal to	[specimen_status] == ‘Available’
!=	not equal to	[frozen_date] != [collection_date_time]
x \| y	x OR y	[specimen_type] == ‘Plasma’ \| [specimen_type] == ‘Serum’
x & y	x AND y	[specimen_type] == ‘Plasma’ & [specimen_quantity] > 2.0
x %in% y	x IN y	[collection_group] %in% c(‘RNA’, ‘DNA’)

The last thing you need to add for the Accession Date before Collection Date discrepancy is the Collection Date. In the Searchable Table you’ll find one field name, COLLECTION_DATE_TIME. So that completes our example:

[ACCESSION_DATE] < [COLLECTION_DATE_TIME]

Table names

When you look up a field name in the Searchable Table there might be more than one Database Table that has that field. For example, searching for “Date Frozen” returns 7 tables that have a “DATE_FROZEN” field, 3 tables that have a “FROZEN_DATE” field, and 1 table that has a “FROZEN_DATE_TIME” field. If more than one table in the database contains the same field, then simply typing the name of the field in your filter will result in using the table that is labeled “Default” in the “DEFAULT_TABLE” column of the Searchable Table.

For example, if you choose to use “DATE_FROZEN” and write it like this in your filter,

[DATE_FROZEN]

then the field will be taken from the “SV_AN_SPC_BSM_PBMC_001” table. This won’t be a good choice if you want your filter to look at all specimens. To specify which table to pull the field from, use the format [table_name.field_name] in your filter. In this case it would probably be best to use the “SV_AN_SPC_BSM_ALL_SPECIMEN_FOR” table, so it should be typed as [SV_AN_SPC_BSM_ALL_SPECIMEN_FOR.DATE_FROZEN]. A complete discrepancy filter might look like this:

[sv_an_spc_bsm_all_specimen_for.frozen_date] != [collection_date_time]

Again, the result of the logical comparison should be true for all records that you want to see in your discrepancy report. So, in this case, all specimen where the frozen date didn’t equal the collection date would show up in your discrepancy report.

Note on Annotations: If there is no table name next to a field in the Searchable Table, then that means the field is actually an annotation. It is not technically a column in a table in the Oracle database.

Not New

There is a column in the special Excel file named “Not_New”. This should be filled out as “TRUE” or “FALSE”.

A “TRUE” value means that your discrepancy check will only be checking records that do not have a Specimen Status of “New” and do not have Reason Destroyed values of “Not Filled” or “Extra Label”.

A “FALSE” value in the “Not_New” column means that there will be no pre-filtering of the records based on Specimen Status or Reason Destroyed.

Special formats and functions

The text in the Filter column of the special Excel file needs to follow formatting rules beyond field names and logical operators.

Characters As you can see in the examples of the Logical Table, a character value (letters and words) must be wrapped in single quotes, [specimen_type] == 'Plasma'.
Numbers Numbers should not have any quotes, [specimen_quantity] > 2.0.
Dates If you want to provide a specific date, you need to use the format YYYY-MM-DD in single quotes and place the date inside the parenthesis of as.Date(), which is a function (see below). For example, if you wanted to check records where the birth date of the subject is after January 1, 1998, then you would type [birth_date] > as.Date('1998-01-01').
Missing Values If you want to indicate when something doesn’t have a value in the database, use NA with no quotes. For example, [specimen_type] == 'Nucleic Acid' & [nucleic_acid_derivative] == NA will return records where there is a value of Nucleic Acid for specimen type but there is no value for nucleic acid derivative (i.e., is it DNA or RNA?). Or, you can use the is.na() function. See the Function Table below.
A Group of Values If you would like to reference a whole group of values, place them inside of the c() function (see the Function Table below). For example, if you wanted to know if the collection group was either RNA or DNA, instead of writing two == comparisons (like [collection_group] == 'RNA' | [collection_group] == 'DNA') you can use the %in% comparison with the group of values c('RNA', 'DNA'). So the filter would be [collection_group] %in% c('RNA', 'DNA').

Function Table

Function	Description	Example
as.Date(‘YYYY-MM-DD’)	As a date	`[birth_date] > as.Date('1998-01-01')`
grepl(‘text’, field_name)	‘text’ IN field_name	`grepl('90plus', [patient_id])`
is.na(field_name)	Is the value missing?	`is.na([nucleic_acid_derivative])`
c( )	A collection of values	`[collection_group] %in% c('RNA', 'DNA')`

( ) and order of operations

Logical comparisons inside of parentheses will be evaluated first. For example, suppose we had the following values for nucleic acid derivative, specimen quanity, and specimen unit of measurement.

SPECIMEN_NO	NUCLEIC_ACID_DERIVATIVE	SPECIMEN_QUANTITY	SPECIMEN_UOM
123	miRNA	1.0	ng

We might want to write a filter that checked to see, if NUCLEIC_ACID_DERIVATIVE was “miRNA” or “RNA”, then is SPECIMEN_UOM not “ng”. If we write the following filter, with the first two logical comparisons grouped together inside the ( ), we’ll get a value of FALSE.

([NUCLEIC_ACID_DERIVATIVE] == 'miRNA' | [NUCLEIC_ACID_DERIVATIVE] == 'RNA') & [SPECIMEN_UOM] != 'ng'

## [1] FALSE

This is what we would want, because this record should not show up in our discrepancy report.

However, if we grouped the last two logical comparisons in the ( ), the result would be TRUE.

[NUCLEIC_ACID_DERIVATIVE] == 'miRNA' | ([NUCLEIC_ACID_DERIVATIVE] == 'RNA' & [SPECIMEN_UOM] != 'ng')

## [1] TRUE

That’s because the last two comparisons will be evaluated first. So the grouping would result in TRUE | (FALSE & FALSE) which would reduce to TRUE | TRUE and ultimately TRUE. The proper grouping would result in (TRUE | FALSE) & FALSE which would reduce to TRUE & FALSE and ultimately FALSE. See the Truth Table below for a review.

Truth Table

P	Q	P AND Q	P OR Q
TRUE	TRUE	TRUE	TRUE
TRUE	FALSE	FALSE	TRUE
FALSE	TRUE	FALSE	TRUE
FALSE	FALSE	FALSE	FALSE

Discrepancy Report Output

The discrepancy reports will be located in the HG-Data drive. Each protocol will have it’s own folder, and inside the protocol there will be a Discrepancies folder. So for the PPMI protocol, the discrepancy reports are located here:

HG-Data > MMGE-MJFF-PPMI-BIO > Discrepancies

Log file

Located in the same directory as the special Excel file is a .csv file named “OnCore_Discrepancy_Log.csv”.

HG-Docs > OnCore Discrepancy Lists > OnCore_Discrepancy_Log.csv

When a discrepancy check is run for a particular protocol and there is a problem with it, then information will be logged in this file. Below are descriptions of each column in the log file.

Time The date and time the discrepancy check was run. Records file will only remain in the log for 30 days.
Purpose The text in the “Purpose” column in the special Excel file, identifying which discrepancy is being checked.
Status The status of the check–possible values will be “Error”, “Warning”, and “Message”. If the check is being tested, then it can also have a status of “Valid” (see the Testing section below).
Status_Description Information about why there was a problem. If the Status is “Error”, then this column will tell you if there is an issue with the fields provided or a problem with the filter.
Filter The text in the “Filter” column in the special Excel file.

Records in the log file that are older than 30 days will automatically be deleted. You can also delete records manually if you want to. Just don’t change the format of the log file. Specifically, don’t add columns or change column names.

Testing

If you don’t want to wait for your test to be run on Saturday morning before you find out if it works or not, you can have it tested sooner. Simply type something in the Test column of the special Excel file. You can type “yes” or “YES” or “test”, it doesn’t matter. If there is something in that column for your test, then it will be run nightly and the status will be recorded in the log file (described above). The actually discrepancy report is not written. Your filter is run just to see if it works.

If the status is “Error” in the log file, then you can go and change your filter and find out the next day if it worked. Once you get a status of “Valid” for your test in the log file, then you know that the test worked. However, if you don’t delete the value in the Test column of the special Excel file, your discrepancy report won’t be written on Saturday morning. So make sure to delete the value in the Test column when the status in the log file is “Valid”.

OnCore Discrepancy Documentation

Nathan Byers

February 16, 2016