DRAFT Notes: Clinical Data Science Specialization

Identifying Patient Populations

February 2021: Google BigQuery R Rstudio RMarkdown Tidyverse MIMIC-III.

library(tidyverse)
htmltools::includeHTML("ID_Patient_Pops_wk5_PhenotypingEvaluation_VY_.html")

Develop a Computational Phenotyping Algorithm to Identify Patients with Hypertension

[VY edit ver 20210128 2219]

This Peer-graded Assignment for Week 5 is the final project for the University of Colorado System course “Identifying Patient Populations” on Coursera.

Instructions
This assignment will be your opportunity to pull all the tools and techniques you have learned in this course and apply it to the real-world problem of identifying patients with hypertension.

In order to create this computational phenotyping algorithm and accomplish this task you will:
* Test two or more individual data types
* Apply two or more manipulations of individual data types
* Create two or more combinations of data types
* Provide 2x2 tables and evaluation metrics (sensitivity/specificity/PPV/NPV) for all algorithms tested.
* Choose and justify the selection of a final computational phenotyping algorithm.

Review criteria
You will be expected to submit one document: A presentation slide deck

You will be assessed on:
1. Whether you tested two or more individual data types to identify patients with hypertension.
2. Whether you applied two or more manipulations of individual data types to identify patients with hypertension.
3. Whether you created two or more combinations of data types identify patients with hypertension.
4. Whether you provided 2x2 tables and evaluation metrics (sensitivity / specificity / PPV / NPV) for all algorithms tested.
5. Whether you chose a final computational phenotyping algorithm and justified that choice based on algorithm performance, complexity, and portability.

Step-By-Step Assignment Instructionss
Training Data:
Gold standard data from manual record review is available on Google BigQuery in the “course3_data.hypertension_goldstandard” table.

Clinical Background Information:
What is Hypertension?
Systolic BP >= 140 mmHg on more than two occasions
Diastolic BP>= 90 mmHg on more than two occasions

How is Hypertension billed?
Multiple Codes for Essential Hypertension:
401.0 Malignant
401.1 Benign
401.9 Unspecified
Even more codes for other kinds of hypertension (e.g., hypertension in certain areas/vessels of the body)

How is Hypertension Treated?
In addition to diet and exercise there are a number of blood pressure lowering drugs available. Many of these drugs have uses beyond treating hypertension and in fact may be used for non-hypertensive indications more frequently than hypertension. A quasi exhaustive list of these therapies as they are referred to by MIMIC records can be found on GBQ in course3_data. D_ANTIHYPERTENSIVES.

Setup
Set up environment by loading the packages, creating the connection to Google BigQuery.

library(tidyverse)

## Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
## when loading 'dplyr'

## Warning: package 'tibble' was built under R version 4.0.3

library(magrittr)
library(bigrquery)
library(caret)
library(stringr)

con <- DBI::dbConnect(drv = bigquery(),
                      project = "learnclinicaldatascience")

The caret package in R has a function called confusionMatrix(), and here is LW’s wrapper function to simplify the 2x2 tables later.

## getStats(df, predicted, reference)
getStats <- function(df, ...){
  df %>%
    select_(.dots = lazyeval::lazy_dots(...)) %>%
    mutate_all(funs(factor(., levels = c(1,0)))) %>% 
    table() %>% 
    confusionMatrix()
}

Creating Training and Testing Populations

Let’s read in the gold standard file, and create a random 80/20 training/testing split.

hypertension <- tbl(con, "course3_data.hypertension_goldstandard")
hypertension %>%
  summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
            max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
            mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
            sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
            nd_subj_id = n_distinct(SUBJECT_ID))

set.seed(5)
training <- hypertension %>% 
  collect() %>% 
  sample_n(80)
training %>%
  summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
            max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
            mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
            sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
            pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
            nd_subj_id = n_distinct(SUBJECT_ID))

testing <- hypertension %>% 
  collect() %>% 
  filter(!SUBJECT_ID %in% training$SUBJECT_ID)
testing %>%
  summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
            max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
            mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
            sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
            pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
            nd_subj_id = n_distinct(SUBJECT_ID))

training1 <- add_column(training, test_flag =0)
testing1 <- add_column(testing, test_flag =1)
goldhyper <- union_all(training1,testing1)
rm(training1)
rm(testing1)
goldhyper %>%
  summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
            max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
            mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
            sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
            pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
            nd_subj_id = n_distinct(SUBJECT_ID),
            sum_test_flag = sum(test_flag, na.rm = TRUE))

goldhyper

Evaluate Tables and Fields

Blood Pressure

Looking into sources for Systolic and Diastolic Blood Pressure readings, we use the following query:

SELECT * FROM mimic3_demo.D_ITEMS
  WHERE LOWER(LABEL) LIKE "%systolic%" OR LOWER(LABEL) LIKE "%diastolic%"

The MIMIC sample database has 53 items, but most are geared towards arterial, noninvasive, or other methods reflecting the ICU orientation of the data from metavision and carevue sources. To enfore some standardization, I manually reviewed these items for the more “basic” types of blood pressure readings and narrowed these to 12 types (6 each):

ITEMID  LABEL   DBSOURCE
8440    Manual BP [Diastolic]   carevue
8502    BP Cuff [Diastolic] carevue
8503    BP Left Arm [Diastolic] carevue
8506    BP Right Arm [Diastolic]    carevue
224643  Manual Blood Pressure Diastolic Left    metavision
227242  Manual Blood Pressure Diastolic Right   metavision
442 Manual BP [Systolic]    carevue
3313    BP Cuff [Systolic]  carevue
3315    BP Left Arm [Systolic]  carevue
3321    BP Right Arm [Systolic] carevue
224167  Manual Blood Pressure Systolic Left metavision
227243  Manual Blood Pressure Systolic Right    metavision

Using this specific subset of BP measurement codes against the CHARTEVENTS table, only 14 rows came up (7 of each) – for 3 unique patients.

SELECT *, charte.VALUENUM AS  DIASTOLIC_NUM
  FROM mimic3_demo.D_ITEMS ditem
  JOIN mimic3_demo.CHARTEVENTS charte
  ON ditem.ITEMID = charte.ITEMID
    WHERE ditem.ITEMID IN (8440,8502,8503,8506,224643,227242) AND NOT charte.VALUENUM IS NULL
UNION ALL
SELECT *, charte.VALUENUM AS  SYSTOLIC_NUM
  FROM mimic3_demo.D_ITEMS ditem
  JOIN mimic3_demo.CHARTEVENTS charte
  ON ditem.ITEMID = charte.ITEMID
    WHERE ditem.ITEMID IN (442,3313,3315,3321,224167,227243) AND NOT charte.VALUENUM IS NULL

That we do not see a lot of traditional BP measurements reflects the ICU / Hospital Admission flavor of the data. Blood pressure is not useful for this exercise.

chartevents <- tbl(con, "mimic3_demo.CHARTEVENTS")
hyper_bp <- chartevents %>% 
  filter(ITEMID %in% c(442,3313,3315,3321,224167,227243,
                          8440,8502,8503,8506,224643,227242)) %>% 
  mutate(hyper_bp_systolic_cnt  = case_when(ITEMID %in%
                          c(442,3313,3315,3321,224167,227243)
                          && VALUENUM >= 140 ~ 1, TRUE ~ 0),                                    hyper_bp_diastolic_cnt = case_when(ITEMID %in%
                          c(8440,8502,8503,8506,224643,227242)
                          && VALUENUM >=  90 ~ 1, TRUE ~ 0)) %>% 
  group_by(SUBJECT_ID) %>% 
  summarise(hyper_bp_systolic_sum  = sum(hyper_bp_systolic_cnt,  na.rm = TRUE),
            hyper_bp_diastolic_sum = sum(hyper_bp_diastolic_cnt, na.rm = TRUE )) %>% 
  mutate(hyper_bp_systolic_min3  = case_when(hyper_bp_systolic_sum  >= 3 ~ 1,
                                    TRUE ~ 0),
         hyper_bp_diastolic_min3 = case_when(hyper_bp_diastolic_sum >= 3 ~ 1,
                                    TRUE ~ 0),
         hyper_bp                = case_when((hyper_bp_systolic_min3 >0) |
                                              (hyper_bp_diastolic_min3 >0) ~ 1,
                                    TRUE ~ 0)) %>% 
  select(SUBJECT_ID, hyper_bp)
hyper_bp

count(hyper_bp)

str(training)

## tibble [80 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID  : int [1:80] 42302 40503 10059 44212 10027 40177 43870 10043 10045 10013 ...
##  $ HYPERTENSION: int [1:80] 1 1 1 1 1 1 1 0 0 1 ...

str(hyper_bp)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "select"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "summarise"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "group_by"
##   .. .. .. ..$ x   :List of 4
##   .. .. .. .. ..$ name: chr "select"
##   .. .. .. .. ..$ x   :List of 4
##   .. .. .. .. .. ..$ name: chr "filter"
##   .. .. .. .. .. ..$ x   :List of 2
##   .. .. .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.CHARTEVENTS"
##   .. .. .. .. .. .. ..$ vars: chr [1:15] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. .. ..$ dots:List of 1
##   .. .. .. .. .. .. ..$ : language ~ITEMID %in% c(442, 3313, 3315, 3321, 224167, 227243, 8440, 8502, 8503,      8506, 224643, 227242)
##   .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000022925770> 
##   .. .. .. .. .. ..$ args: list()
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. .. .. ..$ dots: list()
##   .. .. .. .. ..$ args:List of 1
##   .. .. .. .. .. ..$ vars:List of 17
##   .. .. .. .. .. .. ..$ ROW_ID                : symbol ROW_ID
##   .. .. .. .. .. .. ..$ SUBJECT_ID            : symbol SUBJECT_ID
##   .. .. .. .. .. .. ..$ HADM_ID               : symbol HADM_ID
##   .. .. .. .. .. .. ..$ ICUSTAY_ID            : symbol ICUSTAY_ID
##   .. .. .. .. .. .. ..$ ITEMID                : symbol ITEMID
##   .. .. .. .. .. .. ..$ CHARTTIME             : symbol CHARTTIME
##   .. .. .. .. .. .. ..$ STORETIME             : symbol STORETIME
##   .. .. .. .. .. .. ..$ CGID                  : symbol CGID
##   .. .. .. .. .. .. ..$ VALUE                 : symbol VALUE
##   .. .. .. .. .. .. ..$ VALUENUM              : symbol VALUENUM
##   .. .. .. .. .. .. ..$ VALUEUOM              : symbol VALUEUOM
##   .. .. .. .. .. .. ..$ WARNING               : symbol WARNING
##   .. .. .. .. .. .. ..$ ERROR                 : symbol ERROR
##   .. .. .. .. .. .. ..$ RESULTSTATUS          : symbol RESULTSTATUS
##   .. .. .. .. .. .. ..$ STOPPED               : symbol STOPPED
##   .. .. .. .. .. .. ..$ hyper_bp_systolic_cnt : language ~case_when(ITEMID %in% c(442, 3313, 3315, 3321, 224167, 227243) && VALUENUM >=      140 ~ 1, TRUE ~ 0)
##   .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000228d1290> 
##   .. .. .. .. .. .. ..$ hyper_bp_diastolic_cnt: language ~case_when(ITEMID %in% c(8440, 8502, 8503, 8506, 224643, 227242) && VALUENUM >=      90 ~ 1, TRUE ~ 0)
##   .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000228d1290> 
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
##   .. .. .. ..$ args:List of 1
##   .. .. .. .. ..$ add: logi FALSE
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
##   .. .. ..$ dots:List of 2
##   .. .. .. ..$ hyper_bp_systolic_sum : language ~sum(hyper_bp_systolic_cnt, na.rm = TRUE)
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227cccb0> 
##   .. .. .. ..$ hyper_bp_diastolic_sum: language ~sum(hyper_bp_diastolic_cnt, na.rm = TRUE)
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227cccb0> 
##   .. .. ..$ args: list()
##   .. .. ..- attr(*, "class")= chr [1:3] "op_summarise" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args:List of 1
##   .. .. ..$ vars:List of 5
##   .. .. .. ..$ SUBJECT_ID             : symbol SUBJECT_ID
##   .. .. .. ..$ hyper_bp_systolic_sum  : symbol hyper_bp_systolic_sum
##   .. .. .. ..$ hyper_bp_diastolic_sum : symbol hyper_bp_diastolic_sum
##   .. .. .. ..$ hyper_bp_systolic_min3 : language ~case_when(hyper_bp_systolic_sum >= 3 ~ 1, TRUE ~ 0)
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738> 
##   .. .. .. ..$ hyper_bp_diastolic_min3: language ~case_when(hyper_bp_diastolic_sum >= 3 ~ 1, TRUE ~ 0)
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738> 
##   .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
##   .. .. ..$ hyper_bp  : language ~case_when((hyper_bp_systolic_min3 > 0) | (hyper_bp_diastolic_min3 > 0) ~      1, TRUE ~ 0)
##   .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyper_bp <- as_tibble(hyper_bp)
str(hyper_bp)

## tibble [3 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID: int [1:3] 42066 40601 10045
##  $ hyper_bp  : num [1:3] 0 0 0

hyper_bp

hyper_bp_cf <- training %>% 
  left_join(hyper_bp) %>% 
  mutate(hyper_bp = coalesce(hyper_bp, 0)) %>% 
  collect() %>% 
  getStats(hyper_bp, HYPERTENSION)

## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

hyper_bp_cf

## Confusion Matrix and Statistics
## 
##         HYPERTENSION
## hyper_bp  1  0
##        1  0  0
##        0 48 32
##                                          
##                Accuracy : 0.4            
##                  95% CI : (0.292, 0.5156)
##     No Information Rate : 0.6            
##     P-Value [Acc > NIR] : 0.9999         
##                                          
##                   Kappa : 0              
##                                          
##  Mcnemar's Test P-Value : 1.17e-11       
##                                          
##             Sensitivity : 0.0            
##             Specificity : 1.0            
##          Pos Pred Value : NaN            
##          Neg Pred Value : 0.4            
##              Prevalence : 0.6            
##          Detection Rate : 0.0            
##    Detection Prevalence : 0.0            
##       Balanced Accuracy : 0.5            
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyper_bp_cf$table)

## Warning in sqrt(odds(tab)$or): NaNs produced

#export ConfusionMatrix numbers output to tibble row
str(hyper_bp_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 0 48 0 32
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyper_bp    : chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4 0 0.292 0.516 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0 1 NaN 0.4 NA 0 NA 0.6 0 0 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyper_bp_cfa <- as.data.frame(as.matrix(hyper_bp_cf, what = "overall"))
hyper_bp_cfa$row_names <- row.names(as.matrix(hyper_bp_cf, what = "overall"))
hyper_bp_cfa[nrow(hyper_bp_cfa) + 1,] = c("hyper_bp","Data_Type")
row.names(hyper_bp_cfa) <- str_replace_all(hyper_bp_cfa$row_names," ","_")
hyper_bp_cfa <- as.data.frame(t(hyper_bp_cfa))
hyper_bp_cfa <- hyper_bp_cfa[1,c(8,1:7)]
row.names(hyper_bp_cfa) <- NULL
str(hyper_bp_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyper_bp"
##  $ Accuracy      : chr "0.4"
##  $ Kappa         : chr "0"
##  $ AccuracyLower : chr "0.29200935625531"
##  $ AccuracyUpper : chr "0.51562305950451"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.999899723274404"
##  $ McnemarPValue : chr "1.17002131170662e-11"

hyper_bp_cfa

hyper_bp_cfs <- as.data.frame(as.matrix(hyper_bp_cf, what = "classes"))
hyper_bp_cfs$row_names <- row.names(as.matrix(hyper_bp_cf, what = "classes"))
hyper_bp_cfs[nrow(hyper_bp_cfs) + 1,] = c("hyper_bp","Data_Type")
row.names(hyper_bp_cfs) <- str_replace_all(hyper_bp_cfs$row_names," ","_")
hyper_bp_cfs <- as.data.frame(t(hyper_bp_cfs))
hyper_bp_cfs <- hyper_bp_cfs[1,c(12,1:11)]
row.names(hyper_bp_cfs) <- NULL
str(hyper_bp_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyper_bp"
##  $ Sensitivity         : chr "0"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "NaN"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr NA
##  $ Recall              : chr "0"
##  $ F1                  : chr NA
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0"
##  $ Detection_Prevalence: chr "0"
##  $ Balanced_Accuracy   : chr "0.5"

hyper_bp_cfs

hyper_bp_cfa <- hyper_bp_cfa %>% 
  inner_join(hyper_bp_cfs, by = "Data_Type") 
rm(hyper_bp_cfs)
str(hyper_bp_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyper_bp"
##  $ Accuracy            : chr "0.4"
##  $ Kappa               : chr "0"
##  $ AccuracyLower       : chr "0.29200935625531"
##  $ AccuracyUpper       : chr "0.51562305950451"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.999899723274404"
##  $ McnemarPValue       : chr "1.17002131170662e-11"
##  $ Sensitivity         : chr "0"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "NaN"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr NA
##  $ Recall              : chr "0"
##  $ F1                  : chr NA
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0"
##  $ Detection_Prevalence: chr "0"
##  $ Balanced_Accuracy   : chr "0.5"

hyper_bp_cfa

knitr::kable(t(hyper_bp_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyper_bp
Accuracy	0.4
Sensitivity	0
Specificity	1
Pos_Pred_Value	NaN
Neg_Pred_Value	0.4

phenotypes_cf <- hyper_bp_cfa

ICD9 Codes

Checking to see if there were more ICD9 codes that should be included beyond the 3 provided for essentail hypertension:

SELECT * FROM mimic3_demo.D_ICD_DIAGNOSES
  WHERE LOWER(long_title) LIKE "%hypertension%"

We see 61 ICD9 codes, but the other descriptions refer to other types of hypertension (pulmonary, renovascular, venous, pregnancy-related, without –, etc. ). Let’s go with the 3 originally stated:

ROW_ID  ICD9_CODE   SHORT_TITLE LONG_TITLE
ICD9_CODE   SHORT_TITLE LONG_TITLE
4010    Malignant hypertension  Malignant essential hypertension
4011    Benign hypertension Benign essential hypertension
4019    Hypertension NOS    Unspecified essential hypertension

ICD9_CODE will be one of our data types for this exercise. Any of the 3 ICD9 codes: 401.0 Malignant 401.1 Benign 401.9 Unspecified

Any of 3 ICD codes

diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_any <- diagnoses_icd %>% 
  filter(ICD9_CODE %in% c("4010","4011","4019")) %>% 
  distinct(SUBJECT_ID) %>% 
  mutate(hyp_icd_any = 1)
str(hyp_icd_any)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "distinct"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "select"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "filter"
##   .. .. .. ..$ x   :List of 2
##   .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
##   .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4010", "4011", "4019")
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000280835a0> 
##   .. .. .. ..$ args: list()
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. ..$ dots: list()
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ vars:List of 1
##   .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000028090f48> 
##   .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args: list()
##   .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
##   .. .. ..$ hyp_icd_any: language ~1
##   .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_icd_any

hyp_icd_any <- as_tibble(hyp_icd_any)
str(hyp_icd_any)

## tibble [38 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID : int [1:38] 10026 10089 10106 10017 10019 10027 10033 43827 44083 10038 ...
##  $ hyp_icd_any: num [1:38] 1 1 1 1 1 1 1 1 1 1 ...

hyp_icd_any

hyp_icd_any_cf <- training %>% 
  left_join(hyp_icd_any) %>% 
  mutate(hyp_icd_any = coalesce(hyp_icd_any, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_any, HYPERTENSION)
hyp_icd_any_cf

## Confusion Matrix and Statistics
## 
##            HYPERTENSION
## hyp_icd_any  1  0
##           1 29  2
##           0 19 30
##                                           
##                Accuracy : 0.7375          
##                  95% CI : (0.6271, 0.8296)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.0071799       
##                                           
##                   Kappa : 0.4976          
##                                           
##  Mcnemar's Test P-Value : 0.0004803       
##                                           
##             Sensitivity : 0.6042          
##             Specificity : 0.9375          
##          Pos Pred Value : 0.9355          
##          Neg Pred Value : 0.6122          
##              Prevalence : 0.6000          
##          Detection Rate : 0.3625          
##    Detection Prevalence : 0.3875          
##       Balanced Accuracy : 0.7708          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd_any_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_any_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 29 19 2 30
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_any : chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.738 0.498 0.627 0.83 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.604 0.938 0.935 0.612 0.935 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_any_cfa <- as.data.frame(as.matrix(hyp_icd_any_cf, what = "overall"))
hyp_icd_any_cfa$row_names <- row.names(as.matrix(hyp_icd_any_cf, what = "overall"))
hyp_icd_any_cfa[nrow(hyp_icd_any_cfa) + 1,] = c("hyp_icd_any","Data_Type")
row.names(hyp_icd_any_cfa) <- str_replace_all(hyp_icd_any_cfa$row_names," ","_")
hyp_icd_any_cfa <- as.data.frame(t(hyp_icd_any_cfa))
hyp_icd_any_cfa <- hyp_icd_any_cfa[1,c(8,1:7)]
row.names(hyp_icd_any_cfa) <- NULL
str(hyp_icd_any_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_any"
##  $ Accuracy      : chr "0.7375"
##  $ Kappa         : chr "0.497607655502392"
##  $ AccuracyLower : chr "0.627149169787568"
##  $ AccuracyUpper : chr "0.829590748899592"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.00717992545660734"
##  $ McnemarPValue : chr "0.000480341199981871"

hyp_icd_any_cfa

hyp_icd_any_cfs <- as.data.frame(as.matrix(hyp_icd_any_cf, what = "classes"))
hyp_icd_any_cfs$row_names <- row.names(as.matrix(hyp_icd_any_cf, what = "classes"))
hyp_icd_any_cfs[nrow(hyp_icd_any_cfs) + 1,] = c("hyp_icd_any","Data_Type")
row.names(hyp_icd_any_cfs) <- str_replace_all(hyp_icd_any_cfs$row_names," ","_")
hyp_icd_any_cfs <- as.data.frame(t(hyp_icd_any_cfs))
hyp_icd_any_cfs <- hyp_icd_any_cfs[1,c(12,1:11)]
row.names(hyp_icd_any_cfs) <- NULL
str(hyp_icd_any_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_any"
##  $ Sensitivity         : chr "0.604166666666667"
##  $ Specificity         : chr "0.9375"
##  $ Pos_Pred_Value      : chr "0.935483870967742"
##  $ Neg_Pred_Value      : chr "0.612244897959184"
##  $ Precision           : chr "0.935483870967742"
##  $ Recall              : chr "0.604166666666667"
##  $ F1                  : chr "0.734177215189873"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.3625"
##  $ Detection_Prevalence: chr "0.3875"
##  $ Balanced_Accuracy   : chr "0.770833333333333"

hyp_icd_any_cfs

hyp_icd_any_cfa <- hyp_icd_any_cfa %>% 
  inner_join(hyp_icd_any_cfs, by = "Data_Type") 
rm(hyp_icd_any_cfs)
str(hyp_icd_any_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_any"
##  $ Accuracy            : chr "0.7375"
##  $ Kappa               : chr "0.497607655502392"
##  $ AccuracyLower       : chr "0.627149169787568"
##  $ AccuracyUpper       : chr "0.829590748899592"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.00717992545660734"
##  $ McnemarPValue       : chr "0.000480341199981871"
##  $ Sensitivity         : chr "0.604166666666667"
##  $ Specificity         : chr "0.9375"
##  $ Pos_Pred_Value      : chr "0.935483870967742"
##  $ Neg_Pred_Value      : chr "0.612244897959184"
##  $ Precision           : chr "0.935483870967742"
##  $ Recall              : chr "0.604166666666667"
##  $ F1                  : chr "0.734177215189873"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.3625"
##  $ Detection_Prevalence: chr "0.3875"
##  $ Balanced_Accuracy   : chr "0.770833333333333"

hyp_icd_any_cfa

knitr::kable(t(hyp_icd_any_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_any
Accuracy	0.7375
Sensitivity	0.604166666666667
Specificity	0.9375
Pos_Pred_Value	0.935483870967742
Neg_Pred_Value	0.612244897959184

phenotypes_cf <- union(phenotypes_cf,hyp_icd_any_cfa)

ICD9_CODE 401.0 Malignant

diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4010 <- diagnoses_icd %>% 
  filter(ICD9_CODE %in% c("4010")) %>% 
  distinct(SUBJECT_ID) %>% 
  mutate(hyp_icd_4010 = 1)
str(hyp_icd_4010)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "distinct"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "select"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "filter"
##   .. .. .. ..$ x   :List of 2
##   .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
##   .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4010")
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000211500b8> 
##   .. .. .. ..$ args: list()
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. ..$ dots: list()
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ vars:List of 1
##   .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000021142550> 
##   .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args: list()
##   .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID  : symbol SUBJECT_ID
##   .. .. ..$ hyp_icd_4010: language ~1
##   .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_icd_4010

hyp_icd_4010 <- as_tibble(hyp_icd_4010)
str(hyp_icd_4010)

## tibble [1 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID  : int 10026
##  $ hyp_icd_4010: num 1

hyp_icd_4010

hyp_icd_4010_cf <- training %>% 
  left_join(hyp_icd_4010) %>% 
  mutate(hyp_icd_4010 = coalesce(hyp_icd_4010, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_4010, HYPERTENSION)
hyp_icd_4010_cf

## Confusion Matrix and Statistics
## 
##             HYPERTENSION
## hyp_icd_4010  1  0
##            1  0  0
##            0 48 32
##                                          
##                Accuracy : 0.4            
##                  95% CI : (0.292, 0.5156)
##     No Information Rate : 0.6            
##     P-Value [Acc > NIR] : 0.9999         
##                                          
##                   Kappa : 0              
##                                          
##  Mcnemar's Test P-Value : 1.17e-11       
##                                          
##             Sensitivity : 0.0            
##             Specificity : 1.0            
##          Pos Pred Value : NaN            
##          Neg Pred Value : 0.4            
##              Prevalence : 0.6            
##          Detection Rate : 0.0            
##    Detection Prevalence : 0.0            
##       Balanced Accuracy : 0.5            
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyp_icd_4010_cf$table)

## Warning in sqrt(odds(tab)$or): NaNs produced

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4010_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 0 48 0 32
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_4010: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4 0 0.292 0.516 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0 1 NaN 0.4 NA 0 NA 0.6 0 0 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_4010_cfa <- as.data.frame(as.matrix(hyp_icd_4010_cf, what = "overall"))
hyp_icd_4010_cfa$row_names <- row.names(as.matrix(hyp_icd_4010_cf, what = "overall"))
hyp_icd_4010_cfa[nrow(hyp_icd_4010_cfa) + 1,] = c("hyp_icd_4010","Data_Type")
row.names(hyp_icd_4010_cfa) <- str_replace_all(hyp_icd_4010_cfa$row_names," ","_")
hyp_icd_4010_cfa <- as.data.frame(t(hyp_icd_4010_cfa))
hyp_icd_4010_cfa <- hyp_icd_4010_cfa[1,c(8,1:7)]
row.names(hyp_icd_4010_cfa) <- NULL
str(hyp_icd_4010_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_4010"
##  $ Accuracy      : chr "0.4"
##  $ Kappa         : chr "0"
##  $ AccuracyLower : chr "0.29200935625531"
##  $ AccuracyUpper : chr "0.51562305950451"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.999899723274404"
##  $ McnemarPValue : chr "1.17002131170662e-11"

hyp_icd_4010_cfa

hyp_icd_4010_cfs <- as.data.frame(as.matrix(hyp_icd_4010_cf, what = "classes"))
hyp_icd_4010_cfs$row_names <- row.names(as.matrix(hyp_icd_4010_cf, what = "classes"))
hyp_icd_4010_cfs[nrow(hyp_icd_4010_cfs) + 1,] = c("hyp_icd_4010","Data_Type")
row.names(hyp_icd_4010_cfs) <- str_replace_all(hyp_icd_4010_cfs$row_names," ","_")
hyp_icd_4010_cfs <- as.data.frame(t(hyp_icd_4010_cfs))
hyp_icd_4010_cfs <- hyp_icd_4010_cfs[1,c(12,1:11)]
row.names(hyp_icd_4010_cfs) <- NULL
str(hyp_icd_4010_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_4010"
##  $ Sensitivity         : chr "0"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "NaN"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr NA
##  $ Recall              : chr "0"
##  $ F1                  : chr NA
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0"
##  $ Detection_Prevalence: chr "0"
##  $ Balanced_Accuracy   : chr "0.5"

hyp_icd_4010_cfs

hyp_icd_4010_cfa <- hyp_icd_4010_cfa %>% 
  inner_join(hyp_icd_4010_cfs, by = "Data_Type") 
rm(hyp_icd_4010_cfs)
str(hyp_icd_4010_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_4010"
##  $ Accuracy            : chr "0.4"
##  $ Kappa               : chr "0"
##  $ AccuracyLower       : chr "0.29200935625531"
##  $ AccuracyUpper       : chr "0.51562305950451"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.999899723274404"
##  $ McnemarPValue       : chr "1.17002131170662e-11"
##  $ Sensitivity         : chr "0"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "NaN"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr NA
##  $ Recall              : chr "0"
##  $ F1                  : chr NA
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0"
##  $ Detection_Prevalence: chr "0"
##  $ Balanced_Accuracy   : chr "0.5"

hyp_icd_4010_cfa

knitr::kable(t(hyp_icd_4010_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_4010
Accuracy	0.4
Sensitivity	0
Specificity	1
Pos_Pred_Value	NaN
Neg_Pred_Value	0.4

phenotypes_cf <- union(phenotypes_cf,hyp_icd_4010_cfa)

ICD9_CODE 401.1 Benign

diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4011 <- diagnoses_icd %>% 
  filter(ICD9_CODE %in% c("4011")) %>% 
  distinct(SUBJECT_ID) %>% 
  mutate(hyp_icd_4011 = 1)
str(hyp_icd_4011)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "distinct"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "select"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "filter"
##   .. .. .. ..$ x   :List of 2
##   .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
##   .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4011")
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000254164d0> 
##   .. .. .. ..$ args: list()
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. ..$ dots: list()
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ vars:List of 1
##   .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000025426850> 
##   .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args: list()
##   .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID  : symbol SUBJECT_ID
##   .. .. ..$ hyp_icd_4011: language ~1
##   .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_icd_4011

hyp_icd_4011 <- as_tibble(hyp_icd_4011)
str(hyp_icd_4011)

## tibble [1 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID  : int 40595
##  $ hyp_icd_4011: num 1

hyp_icd_4011

hyp_icd_4011_cf <- training %>% 
  left_join(hyp_icd_4011) %>% 
  mutate(hyp_icd_4011 = coalesce(hyp_icd_4011, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_4011, HYPERTENSION)
hyp_icd_4011_cf

## Confusion Matrix and Statistics
## 
##             HYPERTENSION
## hyp_icd_4011  1  0
##            1  1  0
##            0 47 32
##                                           
##                Accuracy : 0.4125          
##                  95% CI : (0.3035, 0.5282)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.9998          
##                                           
##                   Kappa : 0.0167          
##                                           
##  Mcnemar's Test P-Value : 1.949e-11       
##                                           
##             Sensitivity : 0.02083         
##             Specificity : 1.00000         
##          Pos Pred Value : 1.00000         
##          Neg Pred Value : 0.40506         
##              Prevalence : 0.60000         
##          Detection Rate : 0.01250         
##    Detection Prevalence : 0.01250         
##       Balanced Accuracy : 0.51042         
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd_4011_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4011_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 1 47 0 32
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_4011: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4125 0.0167 0.3035 0.5282 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.0208 1 1 0.4051 1 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_4011_cfa <- as.data.frame(as.matrix(hyp_icd_4011_cf, what = "overall"))
hyp_icd_4011_cfa$row_names <- row.names(as.matrix(hyp_icd_4011_cf, what = "overall"))
hyp_icd_4011_cfa[nrow(hyp_icd_4011_cfa) + 1,] = c("hyp_icd_4011","Data_Type")
row.names(hyp_icd_4011_cfa) <- str_replace_all(hyp_icd_4011_cfa$row_names," ","_")
hyp_icd_4011_cfa <- as.data.frame(t(hyp_icd_4011_cfa))
hyp_icd_4011_cfa <- hyp_icd_4011_cfa[1,c(8,1:7)]
row.names(hyp_icd_4011_cfa) <- NULL
str(hyp_icd_4011_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_4011"
##  $ Accuracy      : chr "0.4125"
##  $ Kappa         : chr "0.0167364016736401"
##  $ AccuracyLower : chr "0.303524592937594"
##  $ AccuracyUpper : chr "0.528151659574796"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.999761568218371"
##  $ McnemarPValue : chr "1.94905225611693e-11"

hyp_icd_4011_cfa

hyp_icd_4011_cfs <- as.data.frame(as.matrix(hyp_icd_4011_cf, what = "classes"))
hyp_icd_4011_cfs$row_names <- row.names(as.matrix(hyp_icd_4011_cf, what = "classes"))
hyp_icd_4011_cfs[nrow(hyp_icd_4011_cfs) + 1,] = c("hyp_icd_4011","Data_Type")
row.names(hyp_icd_4011_cfs) <- str_replace_all(hyp_icd_4011_cfs$row_names," ","_")
hyp_icd_4011_cfs <- as.data.frame(t(hyp_icd_4011_cfs))
hyp_icd_4011_cfs <- hyp_icd_4011_cfs[1,c(12,1:11)]
row.names(hyp_icd_4011_cfs) <- NULL
str(hyp_icd_4011_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_4011"
##  $ Sensitivity         : chr "0.0208333333333333"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "1"
##  $ Neg_Pred_Value      : chr "0.405063291139241"
##  $ Precision           : chr "1"
##  $ Recall              : chr "0.0208333333333333"
##  $ F1                  : chr "0.0408163265306122"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.0125"
##  $ Detection_Prevalence: chr "0.0125"
##  $ Balanced_Accuracy   : chr "0.510416666666667"

hyp_icd_4011_cfs

hyp_icd_4011_cfa <- hyp_icd_4011_cfa %>% 
  inner_join(hyp_icd_4011_cfs, by = "Data_Type") 
rm(hyp_icd_4011_cfs)
str(hyp_icd_4011_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_4011"
##  $ Accuracy            : chr "0.4125"
##  $ Kappa               : chr "0.0167364016736401"
##  $ AccuracyLower       : chr "0.303524592937594"
##  $ AccuracyUpper       : chr "0.528151659574796"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.999761568218371"
##  $ McnemarPValue       : chr "1.94905225611693e-11"
##  $ Sensitivity         : chr "0.0208333333333333"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "1"
##  $ Neg_Pred_Value      : chr "0.405063291139241"
##  $ Precision           : chr "1"
##  $ Recall              : chr "0.0208333333333333"
##  $ F1                  : chr "0.0408163265306122"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.0125"
##  $ Detection_Prevalence: chr "0.0125"
##  $ Balanced_Accuracy   : chr "0.510416666666667"

hyp_icd_4011_cfa

knitr::kable(t(hyp_icd_4011_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_4011
Accuracy	0.4125
Sensitivity	0.0208333333333333
Specificity	1
Pos_Pred_Value	1
Neg_Pred_Value	0.405063291139241

phenotypes_cf <- union(phenotypes_cf,hyp_icd_4011_cfa)

ICD9_CODE 401.9 Unspecified

diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4019 <- diagnoses_icd %>% 
  filter(ICD9_CODE %in% c("4019")) %>% 
  distinct(SUBJECT_ID) %>% 
  mutate(hyp_icd_4019 = 1)
str(hyp_icd_4019)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "distinct"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "select"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "filter"
##   .. .. .. ..$ x   :List of 2
##   .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
##   .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4019")
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000278f54a0> 
##   .. .. .. ..$ args: list()
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. ..$ dots: list()
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ vars:List of 1
##   .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000279039a8> 
##   .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args: list()
##   .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID  : symbol SUBJECT_ID
##   .. .. ..$ hyp_icd_4019: language ~1
##   .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_icd_4019

hyp_icd_4019 <- as_tibble(hyp_icd_4019)
str(hyp_icd_4019)

## tibble [36 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID  : int [1:36] 10089 10106 10017 10019 10027 10033 43827 44083 10038 40503 ...
##  $ hyp_icd_4019: num [1:36] 1 1 1 1 1 1 1 1 1 1 ...

hyp_icd_4019

hyp_icd_4019_cf <- training %>% 
  left_join(hyp_icd_4019) %>% 
  mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_4019, HYPERTENSION)
hyp_icd_4019_cf

## Confusion Matrix and Statistics
## 
##             HYPERTENSION
## hyp_icd_4019  1  0
##            1 28  2
##            0 20 30
##                                          
##                Accuracy : 0.725          
##                  95% CI : (0.6138, 0.819)
##     No Information Rate : 0.6            
##     P-Value [Acc > NIR] : 0.0136496      
##                                          
##                   Kappa : 0.4762         
##                                          
##  Mcnemar's Test P-Value : 0.0002896      
##                                          
##             Sensitivity : 0.5833         
##             Specificity : 0.9375         
##          Pos Pred Value : 0.9333         
##          Neg Pred Value : 0.6000         
##              Prevalence : 0.6000         
##          Detection Rate : 0.3500         
##    Detection Prevalence : 0.3750         
##       Balanced Accuracy : 0.7604         
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyp_icd_4019_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 28 20 2 30
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_4019: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.725 0.476 0.614 0.819 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.583 0.938 0.933 0.6 0.933 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_4019_cfa <- as.data.frame(as.matrix(hyp_icd_4019_cf, what = "overall"))
hyp_icd_4019_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_cf, what = "overall"))
hyp_icd_4019_cfa[nrow(hyp_icd_4019_cfa) + 1,] = c("hyp_icd_4019","Data_Type")
row.names(hyp_icd_4019_cfa) <- str_replace_all(hyp_icd_4019_cfa$row_names," ","_")
hyp_icd_4019_cfa <- as.data.frame(t(hyp_icd_4019_cfa))
hyp_icd_4019_cfa <- hyp_icd_4019_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_cfa) <- NULL
str(hyp_icd_4019_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_4019"
##  $ Accuracy      : chr "0.725"
##  $ Kappa         : chr "0.476190476190476"
##  $ AccuracyLower : chr "0.613756579155591"
##  $ AccuracyUpper : chr "0.818962361442961"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.013649591659318"
##  $ McnemarPValue : chr "0.000289619409755552"

hyp_icd_4019_cfa

hyp_icd_4019_cfs <- as.data.frame(as.matrix(hyp_icd_4019_cf, what = "classes"))
hyp_icd_4019_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_cf, what = "classes"))
hyp_icd_4019_cfs[nrow(hyp_icd_4019_cfs) + 1,] = c("hyp_icd_4019","Data_Type")
row.names(hyp_icd_4019_cfs) <- str_replace_all(hyp_icd_4019_cfs$row_names," ","_")
hyp_icd_4019_cfs <- as.data.frame(t(hyp_icd_4019_cfs))
hyp_icd_4019_cfs <- hyp_icd_4019_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_cfs) <- NULL
str(hyp_icd_4019_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_4019"
##  $ Sensitivity         : chr "0.583333333333333"
##  $ Specificity         : chr "0.9375"
##  $ Pos_Pred_Value      : chr "0.933333333333333"
##  $ Neg_Pred_Value      : chr "0.6"
##  $ Precision           : chr "0.933333333333333"
##  $ Recall              : chr "0.583333333333333"
##  $ F1                  : chr "0.717948717948718"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.35"
##  $ Detection_Prevalence: chr "0.375"
##  $ Balanced_Accuracy   : chr "0.760416666666667"

hyp_icd_4019_cfs

hyp_icd_4019_cfa <- hyp_icd_4019_cfa %>% 
  inner_join(hyp_icd_4019_cfs, by = "Data_Type") 
rm(hyp_icd_4019_cfs)
str(hyp_icd_4019_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_4019"
##  $ Accuracy            : chr "0.725"
##  $ Kappa               : chr "0.476190476190476"
##  $ AccuracyLower       : chr "0.613756579155591"
##  $ AccuracyUpper       : chr "0.818962361442961"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.013649591659318"
##  $ McnemarPValue       : chr "0.000289619409755552"
##  $ Sensitivity         : chr "0.583333333333333"
##  $ Specificity         : chr "0.9375"
##  $ Pos_Pred_Value      : chr "0.933333333333333"
##  $ Neg_Pred_Value      : chr "0.6"
##  $ Precision           : chr "0.933333333333333"
##  $ Recall              : chr "0.583333333333333"
##  $ F1                  : chr "0.717948717948718"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.35"
##  $ Detection_Prevalence: chr "0.375"
##  $ Balanced_Accuracy   : chr "0.760416666666667"

hyp_icd_4019_cfa

knitr::kable(t(hyp_icd_4019_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_4019
Accuracy	0.725
Sensitivity	0.583333333333333
Specificity	0.9375
Pos_Pred_Value	0.933333333333333
Neg_Pred_Value	0.6

phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_cfa)

ICD9_CODE 401.9 Unspecified, minimum 2 instances

diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4019_min2 <- diagnoses_icd %>% 
  filter(ICD9_CODE %in% c("4019")) %>% 
#  distinct(SUBJECT_ID) %>% 
  mutate(hyp_icd_4019 = 1) %>% 
  group_by(SUBJECT_ID) %>% 
  summarise(hyp_icd_4019_count = sum(hyp_icd_4019, na.rm = TRUE)) %>% 
  mutate(hyp_icd_4019_min2 = case_when(hyp_icd_4019_count >= 2 ~ 1,
                                    TRUE ~ 0)) %>% 
  select(SUBJECT_ID, hyp_icd_4019_min2)

str(hyp_icd_4019_min2)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "summarise"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "group_by"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "select"
##   .. .. .. ..$ x   :List of 4
##   .. .. .. .. ..$ name: chr "filter"
##   .. .. .. .. ..$ x   :List of 2
##   .. .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
##   .. .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. ..$ dots:List of 1
##   .. .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4019")
##   .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x000000002275a8d8> 
##   .. .. .. .. ..$ args: list()
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. .. ..$ dots: list()
##   .. .. .. ..$ args:List of 1
##   .. .. .. .. ..$ vars:List of 6
##   .. .. .. .. .. ..$ ROW_ID      : symbol ROW_ID
##   .. .. .. .. .. ..$ SUBJECT_ID  : symbol SUBJECT_ID
##   .. .. .. .. .. ..$ HADM_ID     : symbol HADM_ID
##   .. .. .. .. .. ..$ SEQ_NUM     : symbol SEQ_NUM
##   .. .. .. .. .. ..$ ICD9_CODE   : symbol ICD9_CODE
##   .. .. .. .. .. ..$ hyp_icd_4019: language ~1
##   .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. .. ..$ dots:List of 1
##   .. .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ add: logi FALSE
##   .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
##   .. ..$ dots:List of 1
##   .. .. ..$ hyp_icd_4019_count: language ~sum(hyp_icd_4019, na.rm = TRUE)
##   .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000224ba0e8> 
##   .. ..$ args: list()
##   .. ..- attr(*, "class")= chr [1:3] "op_summarise" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 2
##   .. .. ..$ SUBJECT_ID       : symbol SUBJECT_ID
##   .. .. ..$ hyp_icd_4019_min2: language ~case_when(hyp_icd_4019_count >= 2 ~ 1, TRUE ~ 0)
##   .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000224a9ba8> 
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_icd_4019_min2

hyp_icd_4019_min2 <- as_tibble(hyp_icd_4019_min2)
str(hyp_icd_4019_min2)

## tibble [36 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID       : int [1:36] 10089 10106 10017 10019 10027 10033 43827 44083 10038 40503 ...
##  $ hyp_icd_4019_min2: num [1:36] 0 0 0 0 0 0 0 1 0 0 ...

hyp_icd_4019_min2

hyp_icd_4019_min2_cf <- training %>% 
  left_join(hyp_icd_4019_min2) %>% 
  mutate(hyp_icd_4019_min2 = coalesce(hyp_icd_4019_min2, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_4019_min2, HYPERTENSION)
hyp_icd_4019_min2_cf

## Confusion Matrix and Statistics
## 
##                  HYPERTENSION
## hyp_icd_4019_min2  1  0
##                 1  5  0
##                 0 43 32
##                                           
##                Accuracy : 0.4625          
##                  95% CI : (0.3503, 0.5776)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.9953          
##                                           
##                   Kappa : 0.0851          
##                                           
##  Mcnemar's Test P-Value : 1.504e-10       
##                                           
##             Sensitivity : 0.1042          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.4267          
##              Prevalence : 0.6000          
##          Detection Rate : 0.0625          
##    Detection Prevalence : 0.0625          
##       Balanced Accuracy : 0.5521          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd_4019_min2_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_min2_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 5 43 0 32
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_4019_min2: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION     : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4625 0.0851 0.3503 0.5776 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.104 1 1 0.427 1 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_4019_min2_cfa <- as.data.frame(as.matrix(hyp_icd_4019_min2_cf, what = "overall"))
hyp_icd_4019_min2_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_min2_cf, what = "overall"))
hyp_icd_4019_min2_cfa[nrow(hyp_icd_4019_min2_cfa) + 1,] = c("hyp_icd_4019_min2","Data_Type")
row.names(hyp_icd_4019_min2_cfa) <- str_replace_all(hyp_icd_4019_min2_cfa$row_names," ","_")
hyp_icd_4019_min2_cfa <- as.data.frame(t(hyp_icd_4019_min2_cfa))
hyp_icd_4019_min2_cfa <- hyp_icd_4019_min2_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_min2_cfa) <- NULL
str(hyp_icd_4019_min2_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_4019_min2"
##  $ Accuracy      : chr "0.4625"
##  $ Kappa         : chr "0.0851063829787235"
##  $ AccuracyLower : chr "0.350276426538398"
##  $ AccuracyUpper : chr "0.577582471995166"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.995292579337281"
##  $ McnemarPValue : chr "1.50428571598827e-10"

hyp_icd_4019_min2_cfa

hyp_icd_4019_min2_cfs <- as.data.frame(as.matrix(hyp_icd_4019_min2_cf, what = "classes"))
hyp_icd_4019_min2_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_min2_cf, what = "classes"))
hyp_icd_4019_min2_cfs[nrow(hyp_icd_4019_min2_cfs) + 1,] = c("hyp_icd_4019_min2","Data_Type")
row.names(hyp_icd_4019_min2_cfs) <- str_replace_all(hyp_icd_4019_min2_cfs$row_names," ","_")
hyp_icd_4019_min2_cfs <- as.data.frame(t(hyp_icd_4019_min2_cfs))
hyp_icd_4019_min2_cfs <- hyp_icd_4019_min2_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_min2_cfs) <- NULL
str(hyp_icd_4019_min2_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_4019_min2"
##  $ Sensitivity         : chr "0.104166666666667"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "1"
##  $ Neg_Pred_Value      : chr "0.426666666666667"
##  $ Precision           : chr "1"
##  $ Recall              : chr "0.104166666666667"
##  $ F1                  : chr "0.188679245283019"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.0625"
##  $ Detection_Prevalence: chr "0.0625"
##  $ Balanced_Accuracy   : chr "0.552083333333333"

hyp_icd_4019_min2_cfs

hyp_icd_4019_min2_cfa <- hyp_icd_4019_min2_cfa %>% 
  inner_join(hyp_icd_4019_min2_cfs, by = "Data_Type") 
rm(hyp_icd_4019_min2_cfs)
str(hyp_icd_4019_min2_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_4019_min2"
##  $ Accuracy            : chr "0.4625"
##  $ Kappa               : chr "0.0851063829787235"
##  $ AccuracyLower       : chr "0.350276426538398"
##  $ AccuracyUpper       : chr "0.577582471995166"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.995292579337281"
##  $ McnemarPValue       : chr "1.50428571598827e-10"
##  $ Sensitivity         : chr "0.104166666666667"
##  $ Specificity         : chr "1"
##  $ Pos_Pred_Value      : chr "1"
##  $ Neg_Pred_Value      : chr "0.426666666666667"
##  $ Precision           : chr "1"
##  $ Recall              : chr "0.104166666666667"
##  $ F1                  : chr "0.188679245283019"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.0625"
##  $ Detection_Prevalence: chr "0.0625"
##  $ Balanced_Accuracy   : chr "0.552083333333333"

hyp_icd_4019_min2_cfa

knitr::kable(t(hyp_icd_4019_min2_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_4019_min2
Accuracy	0.4625
Sensitivity	0.104166666666667
Specificity	1
Pos_Pred_Value	1
Neg_Pred_Value	0.426666666666667

phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_min2_cfa)

Medications

A list of 167 text values representing antihypertensive drugs were provided. Best efforts to match this drug list against the PRESCRIPTIONS table results in 108 names left over (107 if we performed a secondary match).

Matches

  SELECT DISTINCT *
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  JOIN mimic3_demo.PRESCRIPTIONS presc
  ON antihyp.DRUG = presc.DRUG
=709 rows

    SELECT DISTINCT *
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  JOIN mimic3_demo.PRESCRIPTIONS presc
  ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
  ORDER BY 1, 2, 3, 4, 5, 6, 10, 11
=1395 rows

WITH temp1 AS
(  SELECT antihyp.DRUG
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
  ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
    WHERE ROW_ID IS NULL)
 SELECT * FROM temp1 
  LEFT JOIN mimic3_demo.PRESCRIPTIONS presc2
  ON TRIM(LOWER(temp1.DRUG)) = TRIM(LOWER(presc2.DRUG_NAME_GENERIC))
  WHERE NOT ROW_ID IS NULL
UNION ALL 
  SELECT DISTINCT *
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  JOIN mimic3_demo.PRESCRIPTIONS presc
  ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
  ORDER BY 1, 2, 3, 4, 5, 6, 10, 11
=1396 rows

Leftovers

SELECT antihyp.DRUG
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
  ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
  WHERE ROW_ID IS NULL
=108 rows

WITH temp1 AS
(  SELECT antihyp.DRUG
  FROM course3_data.D_ANTIHYPERTENSIVES antihyp
  LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
  ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
    WHERE ROW_ID IS NULL)
 SELECT * FROM temp1 
  LEFT JOIN mimic3_demo.PRESCRIPTIONS presc2
  ON TRIM(LOWER(temp1.DRUG)) = TRIM(LOWER(presc2.DRUG_NAME_GENERIC))
  WHERE ROW_ID IS NULL
=107 rows

List of DRUG names that were not matched in the PRESCIPTIONS table, using as-is text match then using cleaned-up versions of DRUG field for text match:

prescriptions <- tbl(con, "mimic3_demo.PRESCRIPTIONS")
antihypdrgs <- tbl(con, "course3_data.D_ANTIHYPERTENSIVES")

hyp_rxmiss <- antihypdrgs %>% 
  left_join(prescriptions) %>% 
  filter(is.null(ROW_ID)) %>% 
  group_by(DRUG) %>%  
  mutate(drug1 = str_trim(str_to_lower(DRUG))) %>% 
  select(drug1) %>% 
  arrange(drug1)
str(hyp_rxmiss)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "arrange"
##   ..$ x   :List of 4
##   .. ..$ name: chr "select"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "group_by"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "filter"
##   .. .. .. ..$ x   :List of 4
##   .. .. .. .. ..$ name: chr "join"
##   .. .. .. .. ..$ x   :List of 2
##   .. .. .. .. .. ..$ src:List of 2
##   .. .. .. .. .. .. ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. .. .. .. .. .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. .. ..@ dataset       : NULL
##   .. .. .. .. .. .. .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. .. .. .. .. .. .. ..@ page_size     : int 10000
##   .. .. .. .. .. .. .. .. ..@ quiet         : logi NA
##   .. .. .. .. .. .. .. .. ..@ bigint        : chr "integer"
##   .. .. .. .. .. .. ..$ disco: NULL
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##   .. .. .. .. .. ..$ ops:List of 2
##   .. .. .. .. .. .. ..$ x   : 'ident' chr "course3_data.D_ANTIHYPERTENSIVES"
##   .. .. .. .. .. .. ..$ vars: chr "DRUG"
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
##   .. .. .. .. ..$ y   :List of 2
##   .. .. .. .. .. ..$ src:List of 2
##   .. .. .. .. .. .. ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. .. .. .. .. .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. .. ..@ dataset       : NULL
##   .. .. .. .. .. .. .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. .. .. .. .. .. .. ..@ page_size     : int 10000
##   .. .. .. .. .. .. .. .. ..@ quiet         : logi NA
##   .. .. .. .. .. .. .. .. ..@ bigint        : chr "integer"
##   .. .. .. .. .. .. ..$ disco: NULL
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##   .. .. .. .. .. ..$ ops:List of 2
##   .. .. .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.PRESCRIPTIONS"
##   .. .. .. .. .. .. ..$ vars: chr [1:19] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
##   .. .. .. .. ..$ args:List of 4
##   .. .. .. .. .. ..$ vars  :List of 3
##   .. .. .. .. .. .. ..$ alias: chr [1:19] "DRUG" "ROW_ID" "SUBJECT_ID" "HADM_ID" ...
##   .. .. .. .. .. .. ..$ x    : chr [1:19] "DRUG" NA NA NA ...
##   .. .. .. .. .. .. ..$ y    : chr [1:19] NA "ROW_ID" "SUBJECT_ID" "HADM_ID" ...
##   .. .. .. .. .. ..$ type  : chr "left"
##   .. .. .. .. .. ..$ by    :List of 2
##   .. .. .. .. .. .. ..$ x: chr "DRUG"
##   .. .. .. .. .. .. ..$ y: chr "DRUG"
##   .. .. .. .. .. ..$ suffix: chr [1:2] ".x" ".y"
##   .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_join" "op_double" "op"
##   .. .. .. ..$ dots:List of 1
##   .. .. .. .. ..$ : language ~is.null(ROW_ID)
##   .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026a1cd90> 
##   .. .. .. ..$ args: list()
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. ..$ dots:List of 1
##   .. .. .. ..$ DRUG: symbol DRUG
##   .. .. ..$ args:List of 1
##   .. .. .. ..$ add: logi FALSE
##   .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
##   .. ..$ dots: list()
##   .. ..$ args:List of 1
##   .. .. ..$ vars:List of 2
##   .. .. .. ..$ DRUG : symbol DRUG
##   .. .. .. ..$ drug1: language ~str_trim(str_to_lower(DRUG))
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026a4f058> 
##   .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   ..$ dots:List of 1
##   .. ..$ : language ~drug1
##   .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026d20e20> 
##   ..$ args:List of 1
##   .. ..$ .by_group: logi FALSE
##   ..- attr(*, "class")= chr [1:3] "op_arrange" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_rxmiss

prescriptions_clean <- prescriptions %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG)))
prescriptions_clean

antihypdrgs_clean <- antihypdrgs %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG)))
antihypdrgs_clean

hyp_rxmiss <- antihypdrgs_clean %>% 
  left_join(prescriptions_clean, by = "drug1") %>% 
  filter(is.null(ROW_ID)) %>% 
  group_by(drug1) %>%  
  select(drug1)
str(hyp_rxmiss)

## List of 2
##  $ src:List of 2
##   ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. ..@ dataset       : NULL
##   .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. ..@ page_size     : int 10000
##   .. .. ..@ quiet         : logi NA
##   .. .. ..@ bigint        : chr "integer"
##   ..$ disco: NULL
##   ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##  $ ops:List of 4
##   ..$ name: chr "select"
##   ..$ x   :List of 4
##   .. ..$ name: chr "group_by"
##   .. ..$ x   :List of 4
##   .. .. ..$ name: chr "filter"
##   .. .. ..$ x   :List of 4
##   .. .. .. ..$ name: chr "join"
##   .. .. .. ..$ x   :List of 2
##   .. .. .. .. ..$ src:List of 2
##   .. .. .. .. .. ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. .. .. .. .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. ..@ dataset       : NULL
##   .. .. .. .. .. .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. .. .. .. .. .. ..@ page_size     : int 10000
##   .. .. .. .. .. .. .. ..@ quiet         : logi NA
##   .. .. .. .. .. .. .. ..@ bigint        : chr "integer"
##   .. .. .. .. .. ..$ disco: NULL
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##   .. .. .. .. ..$ ops:List of 4
##   .. .. .. .. .. ..$ name: chr "select"
##   .. .. .. .. .. ..$ x   :List of 4
##   .. .. .. .. .. .. ..$ name: chr "filter"
##   .. .. .. .. .. .. ..$ x   :List of 2
##   .. .. .. .. .. .. .. ..$ x   : 'ident' chr "course3_data.D_ANTIHYPERTENSIVES"
##   .. .. .. .. .. .. .. ..$ vars: chr "DRUG"
##   .. .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. .. .. ..$ dots:List of 1
##   .. .. .. .. .. .. .. ..$ : language ~!is.null(str_trim(str_to_lower(DRUG)))
##   .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x000000002693de60> 
##   .. .. .. .. .. .. ..$ args: list()
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. .. .. .. ..$ dots: list()
##   .. .. .. .. .. ..$ args:List of 1
##   .. .. .. .. .. .. ..$ vars:List of 2
##   .. .. .. .. .. .. .. ..$ DRUG : symbol DRUG
##   .. .. .. .. .. .. .. ..$ drug1: language ~str_trim(str_to_lower(DRUG))
##   .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026953bc0> 
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
##   .. .. .. ..$ y   :List of 2
##   .. .. .. .. ..$ src:List of 2
##   .. .. .. .. .. ..$ con  :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
##   .. .. .. .. .. .. .. ..@ project       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. ..@ dataset       : NULL
##   .. .. .. .. .. .. .. ..@ billing       : chr "learnclinicaldatascience"
##   .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
##   .. .. .. .. .. .. .. ..@ page_size     : int 10000
##   .. .. .. .. .. .. .. ..@ quiet         : logi NA
##   .. .. .. .. .. .. .. ..@ bigint        : chr "integer"
##   .. .. .. .. .. ..$ disco: NULL
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
##   .. .. .. .. ..$ ops:List of 4
##   .. .. .. .. .. ..$ name: chr "select"
##   .. .. .. .. .. ..$ x   :List of 4
##   .. .. .. .. .. .. ..$ name: chr "filter"
##   .. .. .. .. .. .. ..$ x   :List of 2
##   .. .. .. .. .. .. .. ..$ x   : 'ident' chr "mimic3_demo.PRESCRIPTIONS"
##   .. .. .. .. .. .. .. ..$ vars: chr [1:19] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
##   .. .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
##   .. .. .. .. .. .. ..$ dots:List of 1
##   .. .. .. .. .. .. .. ..$ : language ~!is.null(str_trim(str_to_lower(DRUG)))
##   .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000024191268> 
##   .. .. .. .. .. .. ..$ args: list()
##   .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. .. .. .. .. ..$ dots: list()
##   .. .. .. .. .. ..$ args:List of 1
##   .. .. .. .. .. .. ..$ vars:List of 20
##   .. .. .. .. .. .. .. ..$ ROW_ID           : symbol ROW_ID
##   .. .. .. .. .. .. .. ..$ SUBJECT_ID       : symbol SUBJECT_ID
##   .. .. .. .. .. .. .. ..$ HADM_ID          : symbol HADM_ID
##   .. .. .. .. .. .. .. ..$ ICUSTAY_ID       : symbol ICUSTAY_ID
##   .. .. .. .. .. .. .. ..$ STARTDATE        : symbol STARTDATE
##   .. .. .. .. .. .. .. ..$ ENDDATE          : symbol ENDDATE
##   .. .. .. .. .. .. .. ..$ DRUG_TYPE        : symbol DRUG_TYPE
##   .. .. .. .. .. .. .. ..$ DRUG             : symbol DRUG
##   .. .. .. .. .. .. .. ..$ DRUG_NAME_POE    : symbol DRUG_NAME_POE
##   .. .. .. .. .. .. .. ..$ DRUG_NAME_GENERIC: symbol DRUG_NAME_GENERIC
##   .. .. .. .. .. .. .. ..$ FORMULARY_DRUG_CD: symbol FORMULARY_DRUG_CD
##   .. .. .. .. .. .. .. ..$ GSN              : symbol GSN
##   .. .. .. .. .. .. .. ..$ NDC              : symbol NDC
##   .. .. .. .. .. .. .. ..$ PROD_STRENGTH    : symbol PROD_STRENGTH
##   .. .. .. .. .. .. .. ..$ DOSE_VAL_RX      : symbol DOSE_VAL_RX
##   .. .. .. .. .. .. .. ..$ DOSE_UNIT_RX     : symbol DOSE_UNIT_RX
##   .. .. .. .. .. .. .. ..$ FORM_VAL_DISP    : symbol FORM_VAL_DISP
##   .. .. .. .. .. .. .. ..$ FORM_UNIT_DISP   : symbol FORM_UNIT_DISP
##   .. .. .. .. .. .. .. ..$ ROUTE            : symbol ROUTE
##   .. .. .. .. .. .. .. ..$ drug1            : language ~str_trim(str_to_lower(DRUG))
##   .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000241b16c8> 
##   .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##   .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
##   .. .. .. ..$ args:List of 4
##   .. .. .. .. ..$ vars  :List of 3
##   .. .. .. .. .. ..$ alias: chr [1:21] "DRUG.x" "drug1" "ROW_ID" "SUBJECT_ID" ...
##   .. .. .. .. .. ..$ x    : chr [1:21] "DRUG" "drug1" NA NA ...
##   .. .. .. .. .. ..$ y    : chr [1:21] NA NA "ROW_ID" "SUBJECT_ID" ...
##   .. .. .. .. ..$ type  : chr "left"
##   .. .. .. .. ..$ by    :List of 2
##   .. .. .. .. .. ..$ x: chr "drug1"
##   .. .. .. .. .. ..$ y: chr "drug1"
##   .. .. .. .. ..$ suffix: chr [1:2] ".x" ".y"
##   .. .. .. ..- attr(*, "class")= chr [1:3] "op_join" "op_double" "op"
##   .. .. ..$ dots:List of 1
##   .. .. .. ..$ : language ~is.null(ROW_ID)
##   .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000294ead90> 
##   .. .. ..$ args: list()
##   .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
##   .. ..$ dots:List of 1
##   .. .. ..$ drug1: symbol drug1
##   .. ..$ args:List of 1
##   .. .. ..$ add: logi FALSE
##   .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
##   ..$ dots: list()
##   ..$ args:List of 1
##   .. ..$ vars:List of 1
##   .. .. ..$ drug1: symbol drug1
##   ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
##  - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...

hyp_rxmiss

Create a table of patients that have PRESCRIPTIONS matched to the list of hypertension DRUG names (using cleaned-up versions of DRUG field for text match):

prescriptions_clean <- prescriptions %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG))) %>% 
  collect()
prescriptions_clean

antihypdrgs_clean <- antihypdrgs %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG))) %>% 
  collect()
antihypdrgs_clean

hyp_rx_any <- prescriptions_clean %>% 
  inner_join(antihypdrgs_clean, by = "drug1") %>% 
  distinct(SUBJECT_ID) %>% 
  mutate(hyp_rx_any = 1L)
str(hyp_rx_any)

## tibble [77 x 2] (S3: tbl_df/tbl/data.frame)
##  $ SUBJECT_ID: int [1:77] 42281 10059 10124 42346 41983 42199 44083 10088 10061 40601 ...
##  $ hyp_rx_any: int [1:77] 1 1 1 1 1 1 1 1 1 1 ...

hyp_rx_any

hyp_rx_any_cf <- training %>% 
  left_join(hyp_rx_any) %>% 
  mutate(hyp_rx_any = coalesce(hyp_rx_any, 0)) %>% 
  collect() %>% 
  getStats(hyp_rx_any, HYPERTENSION) 
hyp_rx_any_cf

## Confusion Matrix and Statistics
## 
##           HYPERTENSION
## hyp_rx_any  1  0
##          1 39 22
##          0  9 10
##                                          
##                Accuracy : 0.6125         
##                  95% CI : (0.497, 0.7194)
##     No Information Rate : 0.6            
##     P-Value [Acc > NIR] : 0.45762        
##                                          
##                   Kappa : 0.1341         
##                                          
##  Mcnemar's Test P-Value : 0.03114        
##                                          
##             Sensitivity : 0.8125         
##             Specificity : 0.3125         
##          Pos Pred Value : 0.6393         
##          Neg Pred Value : 0.5263         
##              Prevalence : 0.6000         
##          Detection Rate : 0.4875         
##    Detection Prevalence : 0.7625         
##       Balanced Accuracy : 0.5625         
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyp_rx_any_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_any_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 39 9 22 10
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_rx_any  : chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.613 0.134 0.497 0.719 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.812 0.312 0.639 0.526 0.639 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_rx_any_cfa <- as.data.frame(as.matrix(hyp_rx_any_cf, what = "overall"))
hyp_rx_any_cfa$row_names <- row.names(as.matrix(hyp_rx_any_cf, what = "overall"))
hyp_rx_any_cfa[nrow(hyp_rx_any_cfa) + 1,] = c("hyp_rx_any","Data_Type")
row.names(hyp_rx_any_cfa) <- str_replace_all(hyp_rx_any_cfa$row_names," ","_")
hyp_rx_any_cfa <- as.data.frame(t(hyp_rx_any_cfa))
hyp_rx_any_cfa <- hyp_rx_any_cfa[1,c(8,1:7)]
row.names(hyp_rx_any_cfa) <- NULL
str(hyp_rx_any_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_rx_any"
##  $ Accuracy      : chr "0.6125"
##  $ Kappa         : chr "0.134078212290503"
##  $ AccuracyLower : chr "0.496975542501899"
##  $ AccuracyUpper : chr "0.719434866875218"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.457620673916443"
##  $ McnemarPValue : chr "0.0311412105957967"

hyp_rx_any_cfa

hyp_rx_any_cfs <- as.data.frame(as.matrix(hyp_rx_any_cf, what = "classes"))
hyp_rx_any_cfs$row_names <- row.names(as.matrix(hyp_rx_any_cf, what = "classes"))
hyp_rx_any_cfs[nrow(hyp_rx_any_cfs) + 1,] = c("hyp_rx_any","Data_Type")
row.names(hyp_rx_any_cfs) <- str_replace_all(hyp_rx_any_cfs$row_names," ","_")
hyp_rx_any_cfs <- as.data.frame(t(hyp_rx_any_cfs))
hyp_rx_any_cfs <- hyp_rx_any_cfs[1,c(12,1:11)]
row.names(hyp_rx_any_cfs) <- NULL
str(hyp_rx_any_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_rx_any"
##  $ Sensitivity         : chr "0.8125"
##  $ Specificity         : chr "0.3125"
##  $ Pos_Pred_Value      : chr "0.639344262295082"
##  $ Neg_Pred_Value      : chr "0.526315789473684"
##  $ Precision           : chr "0.639344262295082"
##  $ Recall              : chr "0.8125"
##  $ F1                  : chr "0.71559633027523"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.4875"
##  $ Detection_Prevalence: chr "0.7625"
##  $ Balanced_Accuracy   : chr "0.5625"

hyp_rx_any_cfs

hyp_rx_any_cfa <- hyp_rx_any_cfa %>% 
  inner_join(hyp_rx_any_cfs, by = "Data_Type") 
rm(hyp_rx_any_cfs)
str(hyp_rx_any_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_rx_any"
##  $ Accuracy            : chr "0.6125"
##  $ Kappa               : chr "0.134078212290503"
##  $ AccuracyLower       : chr "0.496975542501899"
##  $ AccuracyUpper       : chr "0.719434866875218"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.457620673916443"
##  $ McnemarPValue       : chr "0.0311412105957967"
##  $ Sensitivity         : chr "0.8125"
##  $ Specificity         : chr "0.3125"
##  $ Pos_Pred_Value      : chr "0.639344262295082"
##  $ Neg_Pred_Value      : chr "0.526315789473684"
##  $ Precision           : chr "0.639344262295082"
##  $ Recall              : chr "0.8125"
##  $ F1                  : chr "0.71559633027523"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.4875"
##  $ Detection_Prevalence: chr "0.7625"
##  $ Balanced_Accuracy   : chr "0.5625"

hyp_rx_any_cfa

knitr::kable(t(hyp_rx_any_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_rx_any
Accuracy	0.6125
Sensitivity	0.8125
Specificity	0.3125
Pos_Pred_Value	0.639344262295082
Neg_Pred_Value	0.526315789473684

phenotypes_cf <- union(phenotypes_cf,hyp_rx_any_cfa)
#hyp_rx_any_cft <- as_tibble(as.matrix(hyp_rx_any_cf, what = "overall"))
#as.table(hyp_rx_any_cf)
#as.matrix(hyp_rx_any_cf)
#as.matrix(hyp_rx_any_cf, what = "overall")
#as.matrix(hyp_rx_any_cf, what = "classes")

At least 2 instances of medications listed

prescriptions_clean <- prescriptions %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG))) %>% 
  collect()
prescriptions_clean

antihypdrgs_clean <- antihypdrgs %>% 
  filter(!is.null(str_trim(str_to_lower(DRUG)))) %>% 
  mutate(drug1 = str_trim(str_to_lower(DRUG))) %>% 
  collect()
antihypdrgs_clean

hyp_rx_min2 <- prescriptions_clean %>% 
  inner_join(antihypdrgs_clean, by = "drug1") %>% 
  mutate(hyp_rx_mark = 1) %>% 
  group_by(SUBJECT_ID) %>% 
  summarise(hyp_rx_count = sum(hyp_rx_mark, na.rm = TRUE)) %>% 
  mutate(hyp_rx_min2 = case_when(hyp_rx_count >= 2 ~ 1,
                                    TRUE ~ 0)) %>% 
  select(SUBJECT_ID, hyp_rx_count, hyp_rx_min2)
hyp_rx_min2

hyp_rx_min2_cf <- training %>% 
  left_join(hyp_rx_min2) %>% 
  mutate(hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>% 
  collect() %>% 
  getStats(hyp_rx_min2, HYPERTENSION)
hyp_rx_min2_cf

## Confusion Matrix and Statistics
## 
##            HYPERTENSION
## hyp_rx_min2  1  0
##           1 39 21
##           0  9 11
##                                           
##                Accuracy : 0.625           
##                  95% CI : (0.5096, 0.7308)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.36873         
##                                           
##                   Kappa : 0.1667          
##                                           
##  Mcnemar's Test P-Value : 0.04461         
##                                           
##             Sensitivity : 0.8125          
##             Specificity : 0.3438          
##          Pos Pred Value : 0.6500          
##          Neg Pred Value : 0.5500          
##              Prevalence : 0.6000          
##          Detection Rate : 0.4875          
##    Detection Prevalence : 0.7500          
##       Balanced Accuracy : 0.5781          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_rx_min2_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_min2_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 39 9 21 11
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_rx_min2 : chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION: chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.625 0.167 0.51 0.731 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.812 0.344 0.65 0.55 0.65 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_rx_min2_cfa <- as.data.frame(as.matrix(hyp_rx_min2_cf, what = "overall"))
hyp_rx_min2_cfa$row_names <- row.names(as.matrix(hyp_rx_min2_cf, what = "overall"))
hyp_rx_min2_cfa[nrow(hyp_rx_min2_cfa) + 1,] = c("hyp_rx_min2","Data_Type")
row.names(hyp_rx_min2_cfa) <- str_replace_all(hyp_rx_min2_cfa$row_names," ","_")
hyp_rx_min2_cfa <- as.data.frame(t(hyp_rx_min2_cfa))
hyp_rx_min2_cfa <- hyp_rx_min2_cfa[1,c(8,1:7)]
row.names(hyp_rx_min2_cfa) <- NULL
str(hyp_rx_min2_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_rx_min2"
##  $ Accuracy      : chr "0.625"
##  $ Kappa         : chr "0.166666666666667"
##  $ AccuracyLower : chr "0.509645235592888"
##  $ AccuracyUpper : chr "0.730806830053195"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.368726748258052"
##  $ McnemarPValue : chr "0.0446097180249397"

hyp_rx_min2_cfa

hyp_rx_min2_cfs <- as.data.frame(as.matrix(hyp_rx_min2_cf, what = "classes"))
hyp_rx_min2_cfs$row_names <- row.names(as.matrix(hyp_rx_min2_cf, what = "classes"))
hyp_rx_min2_cfs[nrow(hyp_rx_min2_cfs) + 1,] = c("hyp_rx_min2","Data_Type")
row.names(hyp_rx_min2_cfs) <- str_replace_all(hyp_rx_min2_cfs$row_names," ","_")
hyp_rx_min2_cfs <- as.data.frame(t(hyp_rx_min2_cfs))
hyp_rx_min2_cfs <- hyp_rx_min2_cfs[1,c(12,1:11)]
row.names(hyp_rx_min2_cfs) <- NULL
str(hyp_rx_min2_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_rx_min2"
##  $ Sensitivity         : chr "0.8125"
##  $ Specificity         : chr "0.34375"
##  $ Pos_Pred_Value      : chr "0.65"
##  $ Neg_Pred_Value      : chr "0.55"
##  $ Precision           : chr "0.65"
##  $ Recall              : chr "0.8125"
##  $ F1                  : chr "0.722222222222222"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.4875"
##  $ Detection_Prevalence: chr "0.75"
##  $ Balanced_Accuracy   : chr "0.578125"

hyp_rx_min2_cfs

hyp_rx_min2_cfa <- hyp_rx_min2_cfa %>% 
  inner_join(hyp_rx_min2_cfs, by = "Data_Type") 
rm(hyp_rx_min2_cfs)
str(hyp_rx_min2_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_rx_min2"
##  $ Accuracy            : chr "0.625"
##  $ Kappa               : chr "0.166666666666667"
##  $ AccuracyLower       : chr "0.509645235592888"
##  $ AccuracyUpper       : chr "0.730806830053195"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.368726748258052"
##  $ McnemarPValue       : chr "0.0446097180249397"
##  $ Sensitivity         : chr "0.8125"
##  $ Specificity         : chr "0.34375"
##  $ Pos_Pred_Value      : chr "0.65"
##  $ Neg_Pred_Value      : chr "0.55"
##  $ Precision           : chr "0.65"
##  $ Recall              : chr "0.8125"
##  $ F1                  : chr "0.722222222222222"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.4875"
##  $ Detection_Prevalence: chr "0.75"
##  $ Balanced_Accuracy   : chr "0.578125"

hyp_rx_min2_cfa

knitr::kable(t(hyp_rx_min2_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_rx_min2
Accuracy	0.625
Sensitivity	0.8125
Specificity	0.34375
Pos_Pred_Value	0.65
Neg_Pred_Value	0.55

phenotypes_cf <- union(phenotypes_cf,hyp_rx_min2_cfa)

Data Combination: ICD 401.9 or any instances of hypertension medications

hyp_icd4019_or_rx_any_cf <- training %>% 
  left_join(hyp_icd_4019) %>% 
  left_join(hyp_rx_any) %>% 
  mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
         hyp_rx_any = coalesce(hyp_rx_any, 0)) %>% 
  mutate(hyp_icd4019_or_rx_any = case_when(hyp_icd_4019 == 1 |
                                                 hyp_rx_any == 1  ~ 1,
                                               TRUE ~ 0)) %>% 
  collect() %>% 
  getStats(hyp_icd4019_or_rx_any, HYPERTENSION)
hyp_icd4019_or_rx_any_cf

## Confusion Matrix and Statistics
## 
##                      HYPERTENSION
## hyp_icd4019_or_rx_any  1  0
##                     1 41 22
##                     0  7 10
##                                           
##                Accuracy : 0.6375          
##                  95% CI : (0.5224, 0.7421)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.28606         
##                                           
##                   Kappa : 0.1808          
##                                           
##  Mcnemar's Test P-Value : 0.00933         
##                                           
##             Sensitivity : 0.8542          
##             Specificity : 0.3125          
##          Pos Pred Value : 0.6508          
##          Neg Pred Value : 0.5882          
##              Prevalence : 0.6000          
##          Detection Rate : 0.5125          
##    Detection Prevalence : 0.7875          
##       Balanced Accuracy : 0.5833          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd4019_or_rx_any_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_any_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 41 7 22 10
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd4019_or_rx_any: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION         : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.637 0.181 0.522 0.742 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.854 0.312 0.651 0.588 0.651 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd4019_or_rx_any_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_cf, what = "overall"))
hyp_icd4019_or_rx_any_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_cf, what = "overall"))
hyp_icd4019_or_rx_any_cfa[nrow(hyp_icd4019_or_rx_any_cfa) + 1,] = c("hyp_icd4019_or_rx_any","Data_Type")
row.names(hyp_icd4019_or_rx_any_cfa) <- str_replace_all(hyp_icd4019_or_rx_any_cfa$row_names," ","_")
hyp_icd4019_or_rx_any_cfa <- as.data.frame(t(hyp_icd4019_or_rx_any_cfa))
hyp_icd4019_or_rx_any_cfa <- hyp_icd4019_or_rx_any_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_any_cfa) <- NULL
str(hyp_icd4019_or_rx_any_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd4019_or_rx_any"
##  $ Accuracy      : chr "0.6375"
##  $ Kappa         : chr "0.180790960451977"
##  $ AccuracyLower : chr "0.522387278735191"
##  $ AccuracyUpper : chr "0.742105104628382"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.286055397395747"
##  $ McnemarPValue : chr "0.00932958471977156"

hyp_icd4019_or_rx_any_cfa

hyp_icd4019_or_rx_any_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_cf, what = "classes"))
hyp_icd4019_or_rx_any_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_cf, what = "classes"))
hyp_icd4019_or_rx_any_cfs[nrow(hyp_icd4019_or_rx_any_cfs) + 1,] = c("hyp_icd4019_or_rx_any","Data_Type")
row.names(hyp_icd4019_or_rx_any_cfs) <- str_replace_all(hyp_icd4019_or_rx_any_cfs$row_names," ","_")
hyp_icd4019_or_rx_any_cfs <- as.data.frame(t(hyp_icd4019_or_rx_any_cfs))
hyp_icd4019_or_rx_any_cfs <- hyp_icd4019_or_rx_any_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_any_cfs) <- NULL
str(hyp_icd4019_or_rx_any_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_any"
##  $ Sensitivity         : chr "0.854166666666667"
##  $ Specificity         : chr "0.3125"
##  $ Pos_Pred_Value      : chr "0.650793650793651"
##  $ Neg_Pred_Value      : chr "0.588235294117647"
##  $ Precision           : chr "0.650793650793651"
##  $ Recall              : chr "0.854166666666667"
##  $ F1                  : chr "0.738738738738739"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.5125"
##  $ Detection_Prevalence: chr "0.7875"
##  $ Balanced_Accuracy   : chr "0.583333333333333"

hyp_icd4019_or_rx_any_cfs

hyp_icd4019_or_rx_any_cfa <- hyp_icd4019_or_rx_any_cfa %>% 
  inner_join(hyp_icd4019_or_rx_any_cfs, by = "Data_Type") 
rm(hyp_icd4019_or_rx_any_cfs)
str(hyp_icd4019_or_rx_any_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_any"
##  $ Accuracy            : chr "0.6375"
##  $ Kappa               : chr "0.180790960451977"
##  $ AccuracyLower       : chr "0.522387278735191"
##  $ AccuracyUpper       : chr "0.742105104628382"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.286055397395747"
##  $ McnemarPValue       : chr "0.00932958471977156"
##  $ Sensitivity         : chr "0.854166666666667"
##  $ Specificity         : chr "0.3125"
##  $ Pos_Pred_Value      : chr "0.650793650793651"
##  $ Neg_Pred_Value      : chr "0.588235294117647"
##  $ Precision           : chr "0.650793650793651"
##  $ Recall              : chr "0.854166666666667"
##  $ F1                  : chr "0.738738738738739"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.5125"
##  $ Detection_Prevalence: chr "0.7875"
##  $ Balanced_Accuracy   : chr "0.583333333333333"

hyp_icd4019_or_rx_any_cfa

knitr::kable(t(hyp_icd4019_or_rx_any_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd4019_or_rx_any
Accuracy	0.6375
Sensitivity	0.854166666666667
Specificity	0.3125
Pos_Pred_Value	0.650793650793651
Neg_Pred_Value	0.588235294117647

phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_any_cfa)

Data Combination: ICD 401.9 or at least two instances of hypertension medications

hyp_icd4019_or_rx_min2_cf <- training %>% 
  left_join(hyp_icd_4019) %>% 
  left_join(hyp_rx_min2) %>% 
  mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
         hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>% 
  mutate(hyp_icd4019_or_rx_min2 = case_when(hyp_icd_4019 == 1 |
                                                 hyp_rx_min2 == 1  ~ 1,
                                               TRUE ~ 0)) %>% 
  collect() %>% 
  getStats(hyp_icd4019_or_rx_min2, HYPERTENSION)
hyp_icd4019_or_rx_min2_cf

## Confusion Matrix and Statistics
## 
##                       HYPERTENSION
## hyp_icd4019_or_rx_min2  1  0
##                      1 41 21
##                      0  7 11
##                                           
##                Accuracy : 0.65            
##                  95% CI : (0.5352, 0.7533)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.21311         
##                                           
##                   Kappa : 0.2135          
##                                           
##  Mcnemar's Test P-Value : 0.01402         
##                                           
##             Sensitivity : 0.8542          
##             Specificity : 0.3438          
##          Pos Pred Value : 0.6613          
##          Neg Pred Value : 0.6111          
##              Prevalence : 0.6000          
##          Detection Rate : 0.5125          
##    Detection Prevalence : 0.7750          
##       Balanced Accuracy : 0.5990          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd4019_or_rx_min2_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_min2_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 41 7 21 11
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd4019_or_rx_min2: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION          : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.65 0.213 0.535 0.753 0.6 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.854 0.344 0.661 0.611 0.661 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd4019_or_rx_min2_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "overall"))
hyp_icd4019_or_rx_min2_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "overall"))
hyp_icd4019_or_rx_min2_cfa[nrow(hyp_icd4019_or_rx_min2_cfa) + 1,] = c("hyp_icd4019_or_rx_min2","Data_Type")
row.names(hyp_icd4019_or_rx_min2_cfa) <- str_replace_all(hyp_icd4019_or_rx_min2_cfa$row_names," ","_")
hyp_icd4019_or_rx_min2_cfa <- as.data.frame(t(hyp_icd4019_or_rx_min2_cfa))
hyp_icd4019_or_rx_min2_cfa <- hyp_icd4019_or_rx_min2_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_min2_cfa) <- NULL
str(hyp_icd4019_or_rx_min2_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd4019_or_rx_min2"
##  $ Accuracy      : chr "0.65"
##  $ Kappa         : chr "0.213483146067416"
##  $ AccuracyLower : chr "0.535203114520664"
##  $ AccuracyUpper : chr "0.753328063527465"
##  $ AccuracyNull  : chr "0.6"
##  $ AccuracyPValue: chr "0.213110087811361"
##  $ McnemarPValue : chr "0.01401927711396"

hyp_icd4019_or_rx_min2_cfa

hyp_icd4019_or_rx_min2_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "classes"))
hyp_icd4019_or_rx_min2_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "classes"))
hyp_icd4019_or_rx_min2_cfs[nrow(hyp_icd4019_or_rx_min2_cfs) + 1,] = c("hyp_icd4019_or_rx_min2","Data_Type")
row.names(hyp_icd4019_or_rx_min2_cfs) <- str_replace_all(hyp_icd4019_or_rx_min2_cfs$row_names," ","_")
hyp_icd4019_or_rx_min2_cfs <- as.data.frame(t(hyp_icd4019_or_rx_min2_cfs))
hyp_icd4019_or_rx_min2_cfs <- hyp_icd4019_or_rx_min2_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_min2_cfs) <- NULL
str(hyp_icd4019_or_rx_min2_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_min2"
##  $ Sensitivity         : chr "0.854166666666667"
##  $ Specificity         : chr "0.34375"
##  $ Pos_Pred_Value      : chr "0.661290322580645"
##  $ Neg_Pred_Value      : chr "0.611111111111111"
##  $ Precision           : chr "0.661290322580645"
##  $ Recall              : chr "0.854166666666667"
##  $ F1                  : chr "0.745454545454545"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.5125"
##  $ Detection_Prevalence: chr "0.775"
##  $ Balanced_Accuracy   : chr "0.598958333333333"

hyp_icd4019_or_rx_min2_cfs

hyp_icd4019_or_rx_min2_cfa <- hyp_icd4019_or_rx_min2_cfa %>% 
  inner_join(hyp_icd4019_or_rx_min2_cfs, by = "Data_Type") 
rm(hyp_icd4019_or_rx_min2_cfs)
str(hyp_icd4019_or_rx_min2_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_min2"
##  $ Accuracy            : chr "0.65"
##  $ Kappa               : chr "0.213483146067416"
##  $ AccuracyLower       : chr "0.535203114520664"
##  $ AccuracyUpper       : chr "0.753328063527465"
##  $ AccuracyNull        : chr "0.6"
##  $ AccuracyPValue      : chr "0.213110087811361"
##  $ McnemarPValue       : chr "0.01401927711396"
##  $ Sensitivity         : chr "0.854166666666667"
##  $ Specificity         : chr "0.34375"
##  $ Pos_Pred_Value      : chr "0.661290322580645"
##  $ Neg_Pred_Value      : chr "0.611111111111111"
##  $ Precision           : chr "0.661290322580645"
##  $ Recall              : chr "0.854166666666667"
##  $ F1                  : chr "0.745454545454545"
##  $ Prevalence          : chr "0.6"
##  $ Detection_Rate      : chr "0.5125"
##  $ Detection_Prevalence: chr "0.775"
##  $ Balanced_Accuracy   : chr "0.598958333333333"

hyp_icd4019_or_rx_min2_cfa

knitr::kable(t(hyp_icd4019_or_rx_min2_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd4019_or_rx_min2
Accuracy	0.65
Sensitivity	0.854166666666667
Specificity	0.34375
Pos_Pred_Value	0.661290322580645
Neg_Pred_Value	0.611111111111111

phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_min2_cfa)

Validation: hyp_icd_any_test using testing data

hyp_icd_any_test_cf <- testing %>% 
  left_join(hyp_icd_any) %>% 
  mutate(hyp_icd_any_test = coalesce(hyp_icd_any, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_any_test, HYPERTENSION)
hyp_icd_any_test_cf

## Confusion Matrix and Statistics
## 
##                 HYPERTENSION
## hyp_icd_any_test 1 0
##                1 6 1
##                0 9 3
##                                           
##                Accuracy : 0.4737          
##                  95% CI : (0.2445, 0.7114)
##     No Information Rate : 0.7895          
##     P-Value [Acc > NIR] : 0.99950         
##                                           
##                   Kappa : 0.0865          
##                                           
##  Mcnemar's Test P-Value : 0.02686         
##                                           
##             Sensitivity : 0.4000          
##             Specificity : 0.7500          
##          Pos Pred Value : 0.8571          
##          Neg Pred Value : 0.2500          
##              Prevalence : 0.7895          
##          Detection Rate : 0.3158          
##    Detection Prevalence : 0.3684          
##       Balanced Accuracy : 0.5750          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd_any_test_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_any_test_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 6 9 1 3
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_any_test: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION    : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4737 0.0865 0.2445 0.7114 0.7895 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.4 0.75 0.857 0.25 0.857 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_any_test_cfa <- as.data.frame(as.matrix(hyp_icd_any_test_cf, what = "overall"))
hyp_icd_any_test_cfa$row_names <- row.names(as.matrix(hyp_icd_any_test_cf, what = "overall"))
hyp_icd_any_test_cfa[nrow(hyp_icd_any_test_cfa) + 1,] = c("hyp_icd_any_test","Data_Type")
row.names(hyp_icd_any_test_cfa) <- str_replace_all(hyp_icd_any_test_cfa$row_names," ","_")
hyp_icd_any_test_cfa <- as.data.frame(t(hyp_icd_any_test_cfa))
hyp_icd_any_test_cfa <- hyp_icd_any_test_cfa[1,c(8,1:7)]
row.names(hyp_icd_any_test_cfa) <- NULL
str(hyp_icd_any_test_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_any_test"
##  $ Accuracy      : chr "0.473684210526316"
##  $ Kappa         : chr "0.0865384615384616"
##  $ AccuracyLower : chr "0.244474689466196"
##  $ AccuracyUpper : chr "0.711356752083001"
##  $ AccuracyNull  : chr "0.789473684210526"
##  $ AccuracyPValue: chr "0.999504530247808"
##  $ McnemarPValue : chr "0.0268566955075243"

hyp_icd_any_test_cfa

hyp_icd_any_test_cfs <- as.data.frame(as.matrix(hyp_icd_any_test_cf, what = "classes"))
hyp_icd_any_test_cfs$row_names <- row.names(as.matrix(hyp_icd_any_test_cf, what = "classes"))
hyp_icd_any_test_cfs[nrow(hyp_icd_any_test_cfs) + 1,] = c("hyp_icd_any_test","Data_Type")
row.names(hyp_icd_any_test_cfs) <- str_replace_all(hyp_icd_any_test_cfs$row_names," ","_")
hyp_icd_any_test_cfs <- as.data.frame(t(hyp_icd_any_test_cfs))
hyp_icd_any_test_cfs <- hyp_icd_any_test_cfs[1,c(12,1:11)]
row.names(hyp_icd_any_test_cfs) <- NULL
str(hyp_icd_any_test_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_any_test"
##  $ Sensitivity         : chr "0.4"
##  $ Specificity         : chr "0.75"
##  $ Pos_Pred_Value      : chr "0.857142857142857"
##  $ Neg_Pred_Value      : chr "0.25"
##  $ Precision           : chr "0.857142857142857"
##  $ Recall              : chr "0.4"
##  $ F1                  : chr "0.545454545454546"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.315789473684211"
##  $ Detection_Prevalence: chr "0.368421052631579"
##  $ Balanced_Accuracy   : chr "0.575"

hyp_icd_any_test_cfs

hyp_icd_any_test_cfa <- hyp_icd_any_test_cfa %>% 
  inner_join(hyp_icd_any_test_cfs, by = "Data_Type") 
rm(hyp_icd_any_test_cfs)
str(hyp_icd_any_test_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_any_test"
##  $ Accuracy            : chr "0.473684210526316"
##  $ Kappa               : chr "0.0865384615384616"
##  $ AccuracyLower       : chr "0.244474689466196"
##  $ AccuracyUpper       : chr "0.711356752083001"
##  $ AccuracyNull        : chr "0.789473684210526"
##  $ AccuracyPValue      : chr "0.999504530247808"
##  $ McnemarPValue       : chr "0.0268566955075243"
##  $ Sensitivity         : chr "0.4"
##  $ Specificity         : chr "0.75"
##  $ Pos_Pred_Value      : chr "0.857142857142857"
##  $ Neg_Pred_Value      : chr "0.25"
##  $ Precision           : chr "0.857142857142857"
##  $ Recall              : chr "0.4"
##  $ F1                  : chr "0.545454545454546"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.315789473684211"
##  $ Detection_Prevalence: chr "0.368421052631579"
##  $ Balanced_Accuracy   : chr "0.575"

hyp_icd_any_test_cfa

knitr::kable(t(hyp_icd_any_test_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_any_test
Accuracy	0.473684210526316
Sensitivity	0.4
Specificity	0.75
Pos_Pred_Value	0.857142857142857
Neg_Pred_Value	0.25

phenotypes_cf <- union(phenotypes_cf,hyp_icd_any_test_cfa)

Validation: hyp_icd_4019_test using testing data

hyp_icd_4019_test_cf <- testing %>% 
  left_join(hyp_icd_4019) %>% 
  mutate(hyp_icd_4019_test = coalesce(hyp_icd_4019, 0)) %>% 
  collect() %>% 
  getStats(hyp_icd_4019_test, HYPERTENSION)
hyp_icd_4019_test_cf

## Confusion Matrix and Statistics
## 
##                  HYPERTENSION
## hyp_icd_4019_test  1  0
##                 1  5  1
##                 0 10  3
##                                          
##                Accuracy : 0.4211         
##                  95% CI : (0.2025, 0.665)
##     No Information Rate : 0.7895         
##     P-Value [Acc > NIR] : 0.99992        
##                                          
##                   Kappa : 0.0457         
##                                          
##  Mcnemar's Test P-Value : 0.01586        
##                                          
##             Sensitivity : 0.3333         
##             Specificity : 0.7500         
##          Pos Pred Value : 0.8333         
##          Neg Pred Value : 0.2308         
##              Prevalence : 0.7895         
##          Detection Rate : 0.2632         
##    Detection Prevalence : 0.3158         
##       Balanced Accuracy : 0.5417         
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyp_icd_4019_test_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_test_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 5 10 1 3
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd_4019_test: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION     : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.4211 0.0457 0.2025 0.665 0.7895 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.333 0.75 0.833 0.231 0.833 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd_4019_test_cfa <- as.data.frame(as.matrix(hyp_icd_4019_test_cf, what = "overall"))
hyp_icd_4019_test_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_test_cf, what = "overall"))
hyp_icd_4019_test_cfa[nrow(hyp_icd_4019_test_cfa) + 1,] = c("hyp_icd_4019_test","Data_Type")
row.names(hyp_icd_4019_test_cfa) <- str_replace_all(hyp_icd_4019_test_cfa$row_names," ","_")
hyp_icd_4019_test_cfa <- as.data.frame(t(hyp_icd_4019_test_cfa))
hyp_icd_4019_test_cfa <- hyp_icd_4019_test_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_test_cfa) <- NULL
str(hyp_icd_4019_test_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd_4019_test"
##  $ Accuracy      : chr "0.421052631578947"
##  $ Kappa         : chr "0.045662100456621"
##  $ AccuracyLower : chr "0.202521438977163"
##  $ AccuracyUpper : chr "0.665002155988265"
##  $ AccuracyNull  : chr "0.789473684210526"
##  $ AccuracyPValue: chr "0.999915197373638"
##  $ McnemarPValue : chr "0.015861332739773"

hyp_icd_4019_test_cfa

hyp_icd_4019_test_cfs <- as.data.frame(as.matrix(hyp_icd_4019_test_cf, what = "classes"))
hyp_icd_4019_test_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_test_cf, what = "classes"))
hyp_icd_4019_test_cfs[nrow(hyp_icd_4019_test_cfs) + 1,] = c("hyp_icd_4019_test","Data_Type")
row.names(hyp_icd_4019_test_cfs) <- str_replace_all(hyp_icd_4019_test_cfs$row_names," ","_")
hyp_icd_4019_test_cfs <- as.data.frame(t(hyp_icd_4019_test_cfs))
hyp_icd_4019_test_cfs <- hyp_icd_4019_test_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_test_cfs) <- NULL
str(hyp_icd_4019_test_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd_4019_test"
##  $ Sensitivity         : chr "0.333333333333333"
##  $ Specificity         : chr "0.75"
##  $ Pos_Pred_Value      : chr "0.833333333333333"
##  $ Neg_Pred_Value      : chr "0.230769230769231"
##  $ Precision           : chr "0.833333333333333"
##  $ Recall              : chr "0.333333333333333"
##  $ F1                  : chr "0.476190476190476"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.263157894736842"
##  $ Detection_Prevalence: chr "0.315789473684211"
##  $ Balanced_Accuracy   : chr "0.541666666666667"

hyp_icd_4019_test_cfs

hyp_icd_4019_test_cfa <- hyp_icd_4019_test_cfa %>% 
  inner_join(hyp_icd_4019_test_cfs, by = "Data_Type") 
rm(hyp_icd_4019_test_cfs)
str(hyp_icd_4019_test_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd_4019_test"
##  $ Accuracy            : chr "0.421052631578947"
##  $ Kappa               : chr "0.045662100456621"
##  $ AccuracyLower       : chr "0.202521438977163"
##  $ AccuracyUpper       : chr "0.665002155988265"
##  $ AccuracyNull        : chr "0.789473684210526"
##  $ AccuracyPValue      : chr "0.999915197373638"
##  $ McnemarPValue       : chr "0.015861332739773"
##  $ Sensitivity         : chr "0.333333333333333"
##  $ Specificity         : chr "0.75"
##  $ Pos_Pred_Value      : chr "0.833333333333333"
##  $ Neg_Pred_Value      : chr "0.230769230769231"
##  $ Precision           : chr "0.833333333333333"
##  $ Recall              : chr "0.333333333333333"
##  $ F1                  : chr "0.476190476190476"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.263157894736842"
##  $ Detection_Prevalence: chr "0.315789473684211"
##  $ Balanced_Accuracy   : chr "0.541666666666667"

hyp_icd_4019_test_cfa

knitr::kable(t(hyp_icd_4019_test_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd_4019_test
Accuracy	0.421052631578947
Sensitivity	0.333333333333333
Specificity	0.75
Pos_Pred_Value	0.833333333333333
Neg_Pred_Value	0.230769230769231

phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_test_cfa)

Validation: hyp_rx_min2_test using testing data

hyp_rx_min2_test_cf <- testing %>% 
  left_join(hyp_rx_min2) %>% 
  mutate(hyp_rx_min2_test = coalesce(hyp_rx_min2, 0)) %>% 
  collect() %>% 
  getStats(hyp_rx_min2_test, HYPERTENSION)
hyp_rx_min2_test_cf

## Confusion Matrix and Statistics
## 
##                 HYPERTENSION
## hyp_rx_min2_test  1  0
##                1 12  2
##                0  3  2
##                                          
##                Accuracy : 0.7368         
##                  95% CI : (0.488, 0.9085)
##     No Information Rate : 0.7895         
##     P-Value [Acc > NIR] : 0.8054         
##                                          
##                   Kappa : 0.2748         
##                                          
##  Mcnemar's Test P-Value : 1.0000         
##                                          
##             Sensitivity : 0.8000         
##             Specificity : 0.5000         
##          Pos Pred Value : 0.8571         
##          Neg Pred Value : 0.4000         
##              Prevalence : 0.7895         
##          Detection Rate : 0.6316         
##    Detection Prevalence : 0.7368         
##       Balanced Accuracy : 0.6500         
##                                          
##        'Positive' Class : 1              
##

fourfoldplot(hyp_rx_min2_test_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_min2_test_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 12 3 2 2
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_rx_min2_test: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION    : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.737 0.275 0.488 0.909 0.789 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.8 0.5 0.857 0.4 0.857 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_rx_min2_test_cfa <- as.data.frame(as.matrix(hyp_rx_min2_test_cf, what = "overall"))
hyp_rx_min2_test_cfa$row_names <- row.names(as.matrix(hyp_rx_min2_test_cf, what = "overall"))
hyp_rx_min2_test_cfa[nrow(hyp_rx_min2_test_cfa) + 1,] = c("hyp_rx_min2_test","Data_Type")
row.names(hyp_rx_min2_test_cfa) <- str_replace_all(hyp_rx_min2_test_cfa$row_names," ","_")
hyp_rx_min2_test_cfa <- as.data.frame(t(hyp_rx_min2_test_cfa))
hyp_rx_min2_test_cfa <- hyp_rx_min2_test_cfa[1,c(8,1:7)]
row.names(hyp_rx_min2_test_cfa) <- NULL
str(hyp_rx_min2_test_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_rx_min2_test"
##  $ Accuracy      : chr "0.736842105263158"
##  $ Kappa         : chr "0.274809160305343"
##  $ AccuracyLower : chr "0.487970654654127"
##  $ AccuracyUpper : chr "0.908534215092333"
##  $ AccuracyNull  : chr "0.789473684210526"
##  $ AccuracyPValue: chr "0.805444466849716"
##  $ McnemarPValue : chr "1"

hyp_rx_min2_test_cfa

hyp_rx_min2_test_cfs <- as.data.frame(as.matrix(hyp_rx_min2_test_cf, what = "classes"))
hyp_rx_min2_test_cfs$row_names <- row.names(as.matrix(hyp_rx_min2_test_cf, what = "classes"))
hyp_rx_min2_test_cfs[nrow(hyp_rx_min2_test_cfs) + 1,] = c("hyp_rx_min2_test","Data_Type")
row.names(hyp_rx_min2_test_cfs) <- str_replace_all(hyp_rx_min2_test_cfs$row_names," ","_")
hyp_rx_min2_test_cfs <- as.data.frame(t(hyp_rx_min2_test_cfs))
hyp_rx_min2_test_cfs <- hyp_rx_min2_test_cfs[1,c(12,1:11)]
row.names(hyp_rx_min2_test_cfs) <- NULL
str(hyp_rx_min2_test_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_rx_min2_test"
##  $ Sensitivity         : chr "0.8"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.857142857142857"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr "0.857142857142857"
##  $ Recall              : chr "0.8"
##  $ F1                  : chr "0.827586206896552"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.631578947368421"
##  $ Detection_Prevalence: chr "0.736842105263158"
##  $ Balanced_Accuracy   : chr "0.65"

hyp_rx_min2_test_cfs

hyp_rx_min2_test_cfa <- hyp_rx_min2_test_cfa %>% 
  inner_join(hyp_rx_min2_test_cfs, by = "Data_Type") 
rm(hyp_rx_min2_test_cfs)
str(hyp_rx_min2_test_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_rx_min2_test"
##  $ Accuracy            : chr "0.736842105263158"
##  $ Kappa               : chr "0.274809160305343"
##  $ AccuracyLower       : chr "0.487970654654127"
##  $ AccuracyUpper       : chr "0.908534215092333"
##  $ AccuracyNull        : chr "0.789473684210526"
##  $ AccuracyPValue      : chr "0.805444466849716"
##  $ McnemarPValue       : chr "1"
##  $ Sensitivity         : chr "0.8"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.857142857142857"
##  $ Neg_Pred_Value      : chr "0.4"
##  $ Precision           : chr "0.857142857142857"
##  $ Recall              : chr "0.8"
##  $ F1                  : chr "0.827586206896552"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.631578947368421"
##  $ Detection_Prevalence: chr "0.736842105263158"
##  $ Balanced_Accuracy   : chr "0.65"

hyp_rx_min2_test_cfa

knitr::kable(t(hyp_rx_min2_test_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_rx_min2_test
Accuracy	0.736842105263158
Sensitivity	0.8
Specificity	0.5
Pos_Pred_Value	0.857142857142857
Neg_Pred_Value	0.4

phenotypes_cf <- union(phenotypes_cf,hyp_rx_min2_test_cfa)

Validation: hyp_icd4019_or_rx_any_test using testing data

hyp_icd4019_or_rx_any_test_cf <- testing %>% 
  left_join(hyp_icd_4019) %>% 
  left_join(hyp_rx_any) %>% 
  mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
         hyp_rx_any = coalesce(hyp_rx_any, 0)) %>% 
  mutate(hyp_icd4019_or_rx_any = case_when(hyp_icd_4019 == 1 |
                                                 hyp_rx_any == 1  ~ 1,
                                               TRUE ~ 0)) %>% 
  collect() %>% 
  getStats(hyp_icd4019_or_rx_any, HYPERTENSION)
hyp_icd4019_or_rx_any_test_cf

## Confusion Matrix and Statistics
## 
##                      HYPERTENSION
## hyp_icd4019_or_rx_any  1  0
##                     1 14  2
##                     0  1  2
##                                           
##                Accuracy : 0.8421          
##                  95% CI : (0.6042, 0.9662)
##     No Information Rate : 0.7895          
##     P-Value [Acc > NIR] : 0.4101          
##                                           
##                   Kappa : 0.4771          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.9333          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.8750          
##          Neg Pred Value : 0.6667          
##              Prevalence : 0.7895          
##          Detection Rate : 0.7368          
##    Detection Prevalence : 0.8421          
##       Balanced Accuracy : 0.7167          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd4019_or_rx_any_test_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_any_test_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 14 1 2 2
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd4019_or_rx_any: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION         : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.842 0.477 0.604 0.966 0.789 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.933 0.5 0.875 0.667 0.875 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd4019_or_rx_any_test_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "overall"))
hyp_icd4019_or_rx_any_test_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "overall"))
hyp_icd4019_or_rx_any_test_cfa[nrow(hyp_icd4019_or_rx_any_test_cfa) + 1,] = c("hyp_icd4019_or_rx_any_test","Data_Type")
row.names(hyp_icd4019_or_rx_any_test_cfa) <- str_replace_all(hyp_icd4019_or_rx_any_test_cfa$row_names," ","_")
hyp_icd4019_or_rx_any_test_cfa <- as.data.frame(t(hyp_icd4019_or_rx_any_test_cfa))
hyp_icd4019_or_rx_any_test_cfa <- hyp_icd4019_or_rx_any_test_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_any_test_cfa) <- NULL
str(hyp_icd4019_or_rx_any_test_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd4019_or_rx_any_test"
##  $ Accuracy      : chr "0.842105263157895"
##  $ Kappa         : chr "0.477064220183486"
##  $ AccuracyLower : chr "0.604215448733326"
##  $ AccuracyUpper : chr "0.966173750998235"
##  $ AccuracyNull  : chr "0.789473684210526"
##  $ AccuracyPValue: chr "0.410126360334181"
##  $ McnemarPValue : chr "1"

hyp_icd4019_or_rx_any_test_cfa

hyp_icd4019_or_rx_any_test_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "classes"))
hyp_icd4019_or_rx_any_test_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "classes"))
hyp_icd4019_or_rx_any_test_cfs[nrow(hyp_icd4019_or_rx_any_test_cfs) + 1,] = c("hyp_icd4019_or_rx_any_test","Data_Type")
row.names(hyp_icd4019_or_rx_any_test_cfs) <- str_replace_all(hyp_icd4019_or_rx_any_test_cfs$row_names," ","_")
hyp_icd4019_or_rx_any_test_cfs <- as.data.frame(t(hyp_icd4019_or_rx_any_test_cfs))
hyp_icd4019_or_rx_any_test_cfs <- hyp_icd4019_or_rx_any_test_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_any_test_cfs) <- NULL
str(hyp_icd4019_or_rx_any_test_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_any_test"
##  $ Sensitivity         : chr "0.933333333333333"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.875"
##  $ Neg_Pred_Value      : chr "0.666666666666667"
##  $ Precision           : chr "0.875"
##  $ Recall              : chr "0.933333333333333"
##  $ F1                  : chr "0.903225806451613"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.736842105263158"
##  $ Detection_Prevalence: chr "0.842105263157895"
##  $ Balanced_Accuracy   : chr "0.716666666666667"

hyp_icd4019_or_rx_any_test_cfs

hyp_icd4019_or_rx_any_test_cfa <- hyp_icd4019_or_rx_any_test_cfa %>% 
  inner_join(hyp_icd4019_or_rx_any_test_cfs, by = "Data_Type") 
rm(hyp_icd4019_or_rx_any_test_cfs)
str(hyp_icd4019_or_rx_any_test_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_any_test"
##  $ Accuracy            : chr "0.842105263157895"
##  $ Kappa               : chr "0.477064220183486"
##  $ AccuracyLower       : chr "0.604215448733326"
##  $ AccuracyUpper       : chr "0.966173750998235"
##  $ AccuracyNull        : chr "0.789473684210526"
##  $ AccuracyPValue      : chr "0.410126360334181"
##  $ McnemarPValue       : chr "1"
##  $ Sensitivity         : chr "0.933333333333333"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.875"
##  $ Neg_Pred_Value      : chr "0.666666666666667"
##  $ Precision           : chr "0.875"
##  $ Recall              : chr "0.933333333333333"
##  $ F1                  : chr "0.903225806451613"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.736842105263158"
##  $ Detection_Prevalence: chr "0.842105263157895"
##  $ Balanced_Accuracy   : chr "0.716666666666667"

hyp_icd4019_or_rx_any_test_cfa

knitr::kable(t(hyp_icd4019_or_rx_any_test_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd4019_or_rx_any_test
Accuracy	0.842105263157895
Sensitivity	0.933333333333333
Specificity	0.5
Pos_Pred_Value	0.875
Neg_Pred_Value	0.666666666666667

phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_any_test_cfa)

Validation: hyp_icd4019_or_rx_min2_test using testing data

hyp_icd4019_or_rx_min2_test_cf <- testing %>% 
  left_join(hyp_icd_4019) %>% 
  left_join(hyp_rx_min2) %>% 
  mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
         hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>% 
  mutate(hyp_icd4019_or_rx_min2 = case_when(hyp_icd_4019 == 1 |
                                                 hyp_rx_min2 == 1  ~ 1,
                                               TRUE ~ 0)) %>% 
  collect() %>% 
  getStats(hyp_icd4019_or_rx_min2, HYPERTENSION)
hyp_icd4019_or_rx_min2_test_cf

## Confusion Matrix and Statistics
## 
##                       HYPERTENSION
## hyp_icd4019_or_rx_min2  1  0
##                      1 13  2
##                      0  2  2
##                                           
##                Accuracy : 0.7895          
##                  95% CI : (0.5443, 0.9395)
##     No Information Rate : 0.7895          
##     P-Value [Acc > NIR] : 0.6297          
##                                           
##                   Kappa : 0.3667          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.8667          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.8667          
##          Neg Pred Value : 0.5000          
##              Prevalence : 0.7895          
##          Detection Rate : 0.6842          
##    Detection Prevalence : 0.7895          
##       Balanced Accuracy : 0.6833          
##                                           
##        'Positive' Class : 1               
##

fourfoldplot(hyp_icd4019_or_rx_min2_test_cf$table)

#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_min2_test_cf)

## List of 6
##  $ positive: chr "1"
##  $ table   : 'table' int [1:2, 1:2] 13 2 2 2
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hyp_icd4019_or_rx_min2: chr [1:2] "1" "0"
##   .. ..$ HYPERTENSION          : chr [1:2] "1" "0"
##  $ overall : Named num [1:7] 0.789 0.367 0.544 0.939 0.789 ...
##   ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
##  $ byClass : Named num [1:11] 0.867 0.5 0.867 0.5 0.867 ...
##   ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
##  $ mode    : chr "sens_spec"
##  $ dots    : list()
##  - attr(*, "class")= chr "confusionMatrix"

hyp_icd4019_or_rx_min2_test_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "overall"))
hyp_icd4019_or_rx_min2_test_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "overall"))
hyp_icd4019_or_rx_min2_test_cfa[nrow(hyp_icd4019_or_rx_min2_test_cfa) + 1,] = c("hyp_icd4019_or_rx_min2_test","Data_Type")
row.names(hyp_icd4019_or_rx_min2_test_cfa) <- str_replace_all(hyp_icd4019_or_rx_min2_test_cfa$row_names," ","_")
hyp_icd4019_or_rx_min2_test_cfa <- as.data.frame(t(hyp_icd4019_or_rx_min2_test_cfa))
hyp_icd4019_or_rx_min2_test_cfa <- hyp_icd4019_or_rx_min2_test_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_min2_test_cfa) <- NULL
str(hyp_icd4019_or_rx_min2_test_cfa)

## 'data.frame':    1 obs. of  8 variables:
##  $ Data_Type     : chr "hyp_icd4019_or_rx_min2_test"
##  $ Accuracy      : chr "0.789473684210526"
##  $ Kappa         : chr "0.366666666666667"
##  $ AccuracyLower : chr "0.544346918108494"
##  $ AccuracyUpper : chr "0.93947546229071"
##  $ AccuracyNull  : chr "0.789473684210526"
##  $ AccuracyPValue: chr "0.629747530620589"
##  $ McnemarPValue : chr "1"

hyp_icd4019_or_rx_min2_test_cfa

hyp_icd4019_or_rx_min2_test_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "classes"))
hyp_icd4019_or_rx_min2_test_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "classes"))
hyp_icd4019_or_rx_min2_test_cfs[nrow(hyp_icd4019_or_rx_min2_test_cfs) + 1,] = c("hyp_icd4019_or_rx_min2_test","Data_Type")
row.names(hyp_icd4019_or_rx_min2_test_cfs) <- str_replace_all(hyp_icd4019_or_rx_min2_test_cfs$row_names," ","_")
hyp_icd4019_or_rx_min2_test_cfs <- as.data.frame(t(hyp_icd4019_or_rx_min2_test_cfs))
hyp_icd4019_or_rx_min2_test_cfs <- hyp_icd4019_or_rx_min2_test_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_min2_test_cfs) <- NULL
str(hyp_icd4019_or_rx_min2_test_cfs)

## 'data.frame':    1 obs. of  12 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_min2_test"
##  $ Sensitivity         : chr "0.866666666666667"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.866666666666667"
##  $ Neg_Pred_Value      : chr "0.5"
##  $ Precision           : chr "0.866666666666667"
##  $ Recall              : chr "0.866666666666667"
##  $ F1                  : chr "0.866666666666667"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.684210526315789"
##  $ Detection_Prevalence: chr "0.789473684210526"
##  $ Balanced_Accuracy   : chr "0.683333333333333"

hyp_icd4019_or_rx_min2_test_cfs

hyp_icd4019_or_rx_min2_test_cfa <- hyp_icd4019_or_rx_min2_test_cfa %>% 
  inner_join(hyp_icd4019_or_rx_min2_test_cfs, by = "Data_Type") 
rm(hyp_icd4019_or_rx_min2_test_cfs)
str(hyp_icd4019_or_rx_min2_test_cfa)

## 'data.frame':    1 obs. of  19 variables:
##  $ Data_Type           : chr "hyp_icd4019_or_rx_min2_test"
##  $ Accuracy            : chr "0.789473684210526"
##  $ Kappa               : chr "0.366666666666667"
##  $ AccuracyLower       : chr "0.544346918108494"
##  $ AccuracyUpper       : chr "0.93947546229071"
##  $ AccuracyNull        : chr "0.789473684210526"
##  $ AccuracyPValue      : chr "0.629747530620589"
##  $ McnemarPValue       : chr "1"
##  $ Sensitivity         : chr "0.866666666666667"
##  $ Specificity         : chr "0.5"
##  $ Pos_Pred_Value      : chr "0.866666666666667"
##  $ Neg_Pred_Value      : chr "0.5"
##  $ Precision           : chr "0.866666666666667"
##  $ Recall              : chr "0.866666666666667"
##  $ F1                  : chr "0.866666666666667"
##  $ Prevalence          : chr "0.789473684210526"
##  $ Detection_Rate      : chr "0.684210526315789"
##  $ Detection_Prevalence: chr "0.789473684210526"
##  $ Balanced_Accuracy   : chr "0.683333333333333"

hyp_icd4019_or_rx_min2_test_cfa

knitr::kable(t(hyp_icd4019_or_rx_min2_test_cfa[,c(1,2,9,10,11,12)]), "simple")

Data_Type	hyp_icd4019_or_rx_min2_test
Accuracy	0.789473684210526
Sensitivity	0.866666666666667
Specificity	0.5
Pos_Pred_Value	0.866666666666667
Neg_Pred_Value	0.5

phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_min2_test_cfa)

List out metrics:

count(phenotypes_cf)

str(phenotypes_cf)

## 'data.frame':    15 obs. of  19 variables:
##  $ Data_Type           : chr  "hyper_bp" "hyp_icd_any" "hyp_icd_4010" "hyp_icd_4011" ...
##  $ Accuracy            : chr  "0.4" "0.7375" "0.4" "0.4125" ...
##  $ Kappa               : chr  "0" "0.497607655502392" "0" "0.0167364016736401" ...
##  $ AccuracyLower       : chr  "0.29200935625531" "0.627149169787568" "0.29200935625531" "0.303524592937594" ...
##  $ AccuracyUpper       : chr  "0.51562305950451" "0.829590748899592" "0.51562305950451" "0.528151659574796" ...
##  $ AccuracyNull        : chr  "0.6" "0.6" "0.6" "0.6" ...
##  $ AccuracyPValue      : chr  "0.999899723274404" "0.00717992545660734" "0.999899723274404" "0.999761568218371" ...
##  $ McnemarPValue       : chr  "1.17002131170662e-11" "0.000480341199981871" "1.17002131170662e-11" "1.94905225611693e-11" ...
##  $ Sensitivity         : chr  "0" "0.604166666666667" "0" "0.0208333333333333" ...
##  $ Specificity         : chr  "1" "0.9375" "1" "1" ...
##  $ Pos_Pred_Value      : chr  "NaN" "0.935483870967742" "NaN" "1" ...
##  $ Neg_Pred_Value      : chr  "0.4" "0.612244897959184" "0.4" "0.405063291139241" ...
##  $ Precision           : chr  NA "0.935483870967742" NA "1" ...
##  $ Recall              : chr  "0" "0.604166666666667" "0" "0.0208333333333333" ...
##  $ F1                  : chr  NA "0.734177215189873" NA "0.0408163265306122" ...
##  $ Prevalence          : chr  "0.6" "0.6" "0.6" "0.6" ...
##  $ Detection_Rate      : chr  "0" "0.3625" "0" "0.0125" ...
##  $ Detection_Prevalence: chr  "0" "0.3875" "0" "0.0125" ...
##  $ Balanced_Accuracy   : chr  "0.5" "0.770833333333333" "0.5" "0.510416666666667" ...

phenotypes_cf[,c(1,2,9,10,11,12)]

knitr::kable(phenotypes_cf[,c(1,2,9,10,11,12)], "simple")

Data_Type	Accuracy	Sensitivity	Specificity	Pos_Pred_Value	Neg_Pred_Value
hyper_bp	0.4	0	1	NaN	0.4
hyp_icd_any	0.7375	0.604166666666667	0.9375	0.935483870967742	0.612244897959184
hyp_icd_4010	0.4	0	1	NaN	0.4
hyp_icd_4011	0.4125	0.0208333333333333	1	1	0.405063291139241
hyp_icd_4019	0.725	0.583333333333333	0.9375	0.933333333333333	0.6
hyp_icd_4019_min2	0.4625	0.104166666666667	1	1	0.426666666666667
hyp_rx_any	0.6125	0.8125	0.3125	0.639344262295082	0.526315789473684
hyp_rx_min2	0.625	0.8125	0.34375	0.65	0.55
hyp_icd4019_or_rx_any	0.6375	0.854166666666667	0.3125	0.650793650793651	0.588235294117647
hyp_icd4019_or_rx_min2	0.65	0.854166666666667	0.34375	0.661290322580645	0.611111111111111
hyp_icd_any_test	0.473684210526316	0.4	0.75	0.857142857142857	0.25
hyp_icd_4019_test	0.421052631578947	0.333333333333333	0.75	0.833333333333333	0.230769230769231
hyp_rx_min2_test	0.736842105263158	0.8	0.5	0.857142857142857	0.4
hyp_icd4019_or_rx_any_test	0.842105263157895	0.933333333333333	0.5	0.875	0.666666666666667
hyp_icd4019_or_rx_min2_test	0.789473684210526	0.866666666666667	0.5	0.866666666666667	0.5

write.csv(phenotypes_cf,'phenotypes_cf.csv')

Save pngs of plots:

png('hyper_bp_%d.png')
fourfoldplot(hyper_bp_cf$table)

## Warning in sqrt(odds(tab)$or): NaNs produced

fourfoldplot(hyp_icd_any_cf$table)
fourfoldplot(hyp_icd_4010_cf$table)

## Warning in sqrt(odds(tab)$or): NaNs produced

fourfoldplot(hyp_icd_4011_cf$table)
fourfoldplot(hyp_icd_4019_cf$table)
fourfoldplot(hyp_icd_4019_min2_cf$table)
fourfoldplot(hyp_rx_any_cf$table)
fourfoldplot(hyp_rx_min2_cf$table)
fourfoldplot(hyp_icd4019_or_rx_any_cf$table)
fourfoldplot(hyp_icd4019_or_rx_min2_cf$table)
fourfoldplot(hyp_icd_any_test_cf$table)
fourfoldplot(hyp_icd_4019_test_cf$table)
fourfoldplot(hyp_rx_min2_test_cf$table)
fourfoldplot(hyp_icd4019_or_rx_any_test_cf$table)
fourfoldplot(hyp_icd4019_or_rx_min2_test_cf$table)
dev.off()

## png 
##   2

DRAFT Notes: Clinical Data Science Specialization

The University of Colorado

analysisinsightdata

Tuesday, June 21, 2022

Holding pen for some submissions I made while working through courses in the Clinical Data Science Specialization offered by The University of Colorado. My plan is to revisit these projects and notes at a later date and recompile and reformat.