library(tidyverse)
htmltools::includeHTML("ID_Patient_Pops_wk5_PhenotypingEvaluation_VY_.html")
[VY edit ver 20210128 2219]
This Peer-graded Assignment for Week 5 is the final project for the University of Colorado System course “Identifying Patient Populations” on Coursera.
Instructions
This assignment will be your opportunity to pull all the tools and techniques you have learned in this course and apply it to the real-world problem of identifying patients with hypertension.
In order to create this computational phenotyping algorithm and accomplish this task you will:
* Test two or more individual data types
* Apply two or more manipulations of individual data types
* Create two or more combinations of data types
* Provide 2x2 tables and evaluation metrics (sensitivity/specificity/PPV/NPV) for all algorithms tested.
* Choose and justify the selection of a final computational phenotyping algorithm.
Review criteria
You will be expected to submit one document: A presentation slide deck
You will be assessed on:
1. Whether you tested two or more individual data types to identify patients with hypertension.
2. Whether you applied two or more manipulations of individual data types to identify patients with hypertension.
3. Whether you created two or more combinations of data types identify patients with hypertension.
4. Whether you provided 2x2 tables and evaluation metrics (sensitivity / specificity / PPV / NPV) for all algorithms tested.
5. Whether you chose a final computational phenotyping algorithm and justified that choice based on algorithm performance, complexity, and portability.
Step-By-Step Assignment Instructionss
Training Data:
Gold standard data from manual record review is available on Google BigQuery in the “course3_data.hypertension_goldstandard” table.
Clinical Background Information:
What is Hypertension?
Systolic BP >= 140 mmHg on more than two occasions
Diastolic BP>= 90 mmHg on more than two occasions
How is Hypertension billed?
Multiple Codes for Essential Hypertension:
401.0 Malignant
401.1 Benign
401.9 Unspecified
Even more codes for other kinds of hypertension (e.g., hypertension in certain areas/vessels of the body)
How is Hypertension Treated?
In addition to diet and exercise there are a number of blood pressure lowering drugs available. Many of these drugs have uses beyond treating hypertension and in fact may be used for non-hypertensive indications more frequently than hypertension. A quasi exhaustive list of these therapies as they are referred to by MIMIC records can be found on GBQ in course3_data. D_ANTIHYPERTENSIVES.
Setup
Set up environment by loading the packages, creating the connection to Google BigQuery.
library(tidyverse)
## Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
## when loading 'dplyr'
## Warning: package 'tibble' was built under R version 4.0.3
library(magrittr)
library(bigrquery)
library(caret)
library(stringr)
con <- DBI::dbConnect(drv = bigquery(),
project = "learnclinicaldatascience")
The caret package in R has a function called confusionMatrix(), and here is LW’s wrapper function to simplify the 2x2 tables later.
## getStats(df, predicted, reference)
getStats <- function(df, ...){
df %>%
select_(.dots = lazyeval::lazy_dots(...)) %>%
mutate_all(funs(factor(., levels = c(1,0)))) %>%
table() %>%
confusionMatrix()
}
Creating Training and Testing Populations
Let’s read in the gold standard file, and create a random 80/20 training/testing split.
hypertension <- tbl(con, "course3_data.hypertension_goldstandard")
hypertension %>%
summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
nd_subj_id = n_distinct(SUBJECT_ID))
set.seed(5)
training <- hypertension %>%
collect() %>%
sample_n(80)
training %>%
summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
nd_subj_id = n_distinct(SUBJECT_ID))
testing <- hypertension %>%
collect() %>%
filter(!SUBJECT_ID %in% training$SUBJECT_ID)
testing %>%
summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
nd_subj_id = n_distinct(SUBJECT_ID))
training1 <- add_column(training, test_flag =0)
testing1 <- add_column(testing, test_flag =1)
goldhyper <- union_all(training1,testing1)
rm(training1)
rm(testing1)
goldhyper %>%
summarise(min_subj_id = min(SUBJECT_ID, na.rm = TRUE),
max_subj_id = max(SUBJECT_ID, na.rm = TRUE),
mean_subj_id = mean(SUBJECT_ID, na.rm = TRUE),
sum_hyper = sum(HYPERTENSION, na.rm = TRUE),
pct_hyper = sum(HYPERTENSION, na.rm = TRUE)/n_distinct(SUBJECT_ID),
nd_subj_id = n_distinct(SUBJECT_ID),
sum_test_flag = sum(test_flag, na.rm = TRUE))
goldhyper
Evaluate Tables and Fields
Looking into sources for Systolic and Diastolic Blood Pressure readings, we use the following query:
SELECT * FROM mimic3_demo.D_ITEMS
WHERE LOWER(LABEL) LIKE "%systolic%" OR LOWER(LABEL) LIKE "%diastolic%"
The MIMIC sample database has 53 items, but most are geared towards arterial, noninvasive, or other methods reflecting the ICU orientation of the data from metavision and carevue sources. To enfore some standardization, I manually reviewed these items for the more “basic” types of blood pressure readings and narrowed these to 12 types (6 each):
ITEMID LABEL DBSOURCE
8440 Manual BP [Diastolic] carevue
8502 BP Cuff [Diastolic] carevue
8503 BP Left Arm [Diastolic] carevue
8506 BP Right Arm [Diastolic] carevue
224643 Manual Blood Pressure Diastolic Left metavision
227242 Manual Blood Pressure Diastolic Right metavision
442 Manual BP [Systolic] carevue
3313 BP Cuff [Systolic] carevue
3315 BP Left Arm [Systolic] carevue
3321 BP Right Arm [Systolic] carevue
224167 Manual Blood Pressure Systolic Left metavision
227243 Manual Blood Pressure Systolic Right metavision
Using this specific subset of BP measurement codes against the CHARTEVENTS table, only 14 rows came up (7 of each) – for 3 unique patients.
SELECT *, charte.VALUENUM AS DIASTOLIC_NUM
FROM mimic3_demo.D_ITEMS ditem
JOIN mimic3_demo.CHARTEVENTS charte
ON ditem.ITEMID = charte.ITEMID
WHERE ditem.ITEMID IN (8440,8502,8503,8506,224643,227242) AND NOT charte.VALUENUM IS NULL
UNION ALL
SELECT *, charte.VALUENUM AS SYSTOLIC_NUM
FROM mimic3_demo.D_ITEMS ditem
JOIN mimic3_demo.CHARTEVENTS charte
ON ditem.ITEMID = charte.ITEMID
WHERE ditem.ITEMID IN (442,3313,3315,3321,224167,227243) AND NOT charte.VALUENUM IS NULL
That we do not see a lot of traditional BP measurements reflects the ICU / Hospital Admission flavor of the data. Blood pressure is not useful for this exercise.
chartevents <- tbl(con, "mimic3_demo.CHARTEVENTS")
hyper_bp <- chartevents %>%
filter(ITEMID %in% c(442,3313,3315,3321,224167,227243,
8440,8502,8503,8506,224643,227242)) %>%
mutate(hyper_bp_systolic_cnt = case_when(ITEMID %in%
c(442,3313,3315,3321,224167,227243)
&& VALUENUM >= 140 ~ 1, TRUE ~ 0), hyper_bp_diastolic_cnt = case_when(ITEMID %in%
c(8440,8502,8503,8506,224643,227242)
&& VALUENUM >= 90 ~ 1, TRUE ~ 0)) %>%
group_by(SUBJECT_ID) %>%
summarise(hyper_bp_systolic_sum = sum(hyper_bp_systolic_cnt, na.rm = TRUE),
hyper_bp_diastolic_sum = sum(hyper_bp_diastolic_cnt, na.rm = TRUE )) %>%
mutate(hyper_bp_systolic_min3 = case_when(hyper_bp_systolic_sum >= 3 ~ 1,
TRUE ~ 0),
hyper_bp_diastolic_min3 = case_when(hyper_bp_diastolic_sum >= 3 ~ 1,
TRUE ~ 0),
hyper_bp = case_when((hyper_bp_systolic_min3 >0) |
(hyper_bp_diastolic_min3 >0) ~ 1,
TRUE ~ 0)) %>%
select(SUBJECT_ID, hyper_bp)
hyper_bp
count(hyper_bp)
str(training)
## tibble [80 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int [1:80] 42302 40503 10059 44212 10027 40177 43870 10043 10045 10013 ...
## $ HYPERTENSION: int [1:80] 1 1 1 1 1 1 1 0 0 1 ...
str(hyper_bp)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "select"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "summarise"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "group_by"
## .. .. .. ..$ x :List of 4
## .. .. .. .. ..$ name: chr "select"
## .. .. .. .. ..$ x :List of 4
## .. .. .. .. .. ..$ name: chr "filter"
## .. .. .. .. .. ..$ x :List of 2
## .. .. .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.CHARTEVENTS"
## .. .. .. .. .. .. ..$ vars: chr [1:15] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. .. ..$ dots:List of 1
## .. .. .. .. .. .. ..$ : language ~ITEMID %in% c(442, 3313, 3315, 3321, 224167, 227243, 8440, 8502, 8503, 8506, 224643, 227242)
## .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000022925770>
## .. .. .. .. .. ..$ args: list()
## .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. .. .. ..$ dots: list()
## .. .. .. .. ..$ args:List of 1
## .. .. .. .. .. ..$ vars:List of 17
## .. .. .. .. .. .. ..$ ROW_ID : symbol ROW_ID
## .. .. .. .. .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. .. .. .. .. ..$ HADM_ID : symbol HADM_ID
## .. .. .. .. .. .. ..$ ICUSTAY_ID : symbol ICUSTAY_ID
## .. .. .. .. .. .. ..$ ITEMID : symbol ITEMID
## .. .. .. .. .. .. ..$ CHARTTIME : symbol CHARTTIME
## .. .. .. .. .. .. ..$ STORETIME : symbol STORETIME
## .. .. .. .. .. .. ..$ CGID : symbol CGID
## .. .. .. .. .. .. ..$ VALUE : symbol VALUE
## .. .. .. .. .. .. ..$ VALUENUM : symbol VALUENUM
## .. .. .. .. .. .. ..$ VALUEUOM : symbol VALUEUOM
## .. .. .. .. .. .. ..$ WARNING : symbol WARNING
## .. .. .. .. .. .. ..$ ERROR : symbol ERROR
## .. .. .. .. .. .. ..$ RESULTSTATUS : symbol RESULTSTATUS
## .. .. .. .. .. .. ..$ STOPPED : symbol STOPPED
## .. .. .. .. .. .. ..$ hyper_bp_systolic_cnt : language ~case_when(ITEMID %in% c(442, 3313, 3315, 3321, 224167, 227243) && VALUENUM >= 140 ~ 1, TRUE ~ 0)
## .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000228d1290>
## .. .. .. .. .. .. ..$ hyper_bp_diastolic_cnt: language ~case_when(ITEMID %in% c(8440, 8502, 8503, 8506, 224643, 227242) && VALUENUM >= 90 ~ 1, TRUE ~ 0)
## .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000228d1290>
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
## .. .. .. ..$ args:List of 1
## .. .. .. .. ..$ add: logi FALSE
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
## .. .. ..$ dots:List of 2
## .. .. .. ..$ hyper_bp_systolic_sum : language ~sum(hyper_bp_systolic_cnt, na.rm = TRUE)
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227cccb0>
## .. .. .. ..$ hyper_bp_diastolic_sum: language ~sum(hyper_bp_diastolic_cnt, na.rm = TRUE)
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227cccb0>
## .. .. ..$ args: list()
## .. .. ..- attr(*, "class")= chr [1:3] "op_summarise" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args:List of 1
## .. .. ..$ vars:List of 5
## .. .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. .. ..$ hyper_bp_systolic_sum : symbol hyper_bp_systolic_sum
## .. .. .. ..$ hyper_bp_diastolic_sum : symbol hyper_bp_diastolic_sum
## .. .. .. ..$ hyper_bp_systolic_min3 : language ~case_when(hyper_bp_systolic_sum >= 3 ~ 1, TRUE ~ 0)
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738>
## .. .. .. ..$ hyper_bp_diastolic_min3: language ~case_when(hyper_bp_diastolic_sum >= 3 ~ 1, TRUE ~ 0)
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738>
## .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
## .. .. ..$ hyper_bp : language ~case_when((hyper_bp_systolic_min3 > 0) | (hyper_bp_diastolic_min3 > 0) ~ 1, TRUE ~ 0)
## .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000227b4738>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyper_bp <- as_tibble(hyper_bp)
str(hyper_bp)
## tibble [3 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID: int [1:3] 42066 40601 10045
## $ hyper_bp : num [1:3] 0 0 0
hyper_bp
hyper_bp_cf <- training %>%
left_join(hyper_bp) %>%
mutate(hyper_bp = coalesce(hyper_bp, 0)) %>%
collect() %>%
getStats(hyper_bp, HYPERTENSION)
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
hyper_bp_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyper_bp 1 0
## 1 0 0
## 0 48 32
##
## Accuracy : 0.4
## 95% CI : (0.292, 0.5156)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.9999
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 1.17e-11
##
## Sensitivity : 0.0
## Specificity : 1.0
## Pos Pred Value : NaN
## Neg Pred Value : 0.4
## Prevalence : 0.6
## Detection Rate : 0.0
## Detection Prevalence : 0.0
## Balanced Accuracy : 0.5
##
## 'Positive' Class : 1
##
fourfoldplot(hyper_bp_cf$table)
## Warning in sqrt(odds(tab)$or): NaNs produced
#export ConfusionMatrix numbers output to tibble row
str(hyper_bp_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 0 48 0 32
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyper_bp : chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4 0 0.292 0.516 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0 1 NaN 0.4 NA 0 NA 0.6 0 0 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyper_bp_cfa <- as.data.frame(as.matrix(hyper_bp_cf, what = "overall"))
hyper_bp_cfa$row_names <- row.names(as.matrix(hyper_bp_cf, what = "overall"))
hyper_bp_cfa[nrow(hyper_bp_cfa) + 1,] = c("hyper_bp","Data_Type")
row.names(hyper_bp_cfa) <- str_replace_all(hyper_bp_cfa$row_names," ","_")
hyper_bp_cfa <- as.data.frame(t(hyper_bp_cfa))
hyper_bp_cfa <- hyper_bp_cfa[1,c(8,1:7)]
row.names(hyper_bp_cfa) <- NULL
str(hyper_bp_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyper_bp"
## $ Accuracy : chr "0.4"
## $ Kappa : chr "0"
## $ AccuracyLower : chr "0.29200935625531"
## $ AccuracyUpper : chr "0.51562305950451"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.999899723274404"
## $ McnemarPValue : chr "1.17002131170662e-11"
hyper_bp_cfa
hyper_bp_cfs <- as.data.frame(as.matrix(hyper_bp_cf, what = "classes"))
hyper_bp_cfs$row_names <- row.names(as.matrix(hyper_bp_cf, what = "classes"))
hyper_bp_cfs[nrow(hyper_bp_cfs) + 1,] = c("hyper_bp","Data_Type")
row.names(hyper_bp_cfs) <- str_replace_all(hyper_bp_cfs$row_names," ","_")
hyper_bp_cfs <- as.data.frame(t(hyper_bp_cfs))
hyper_bp_cfs <- hyper_bp_cfs[1,c(12,1:11)]
row.names(hyper_bp_cfs) <- NULL
str(hyper_bp_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyper_bp"
## $ Sensitivity : chr "0"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "NaN"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr NA
## $ Recall : chr "0"
## $ F1 : chr NA
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0"
## $ Detection_Prevalence: chr "0"
## $ Balanced_Accuracy : chr "0.5"
hyper_bp_cfs
hyper_bp_cfa <- hyper_bp_cfa %>%
inner_join(hyper_bp_cfs, by = "Data_Type")
rm(hyper_bp_cfs)
str(hyper_bp_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyper_bp"
## $ Accuracy : chr "0.4"
## $ Kappa : chr "0"
## $ AccuracyLower : chr "0.29200935625531"
## $ AccuracyUpper : chr "0.51562305950451"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.999899723274404"
## $ McnemarPValue : chr "1.17002131170662e-11"
## $ Sensitivity : chr "0"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "NaN"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr NA
## $ Recall : chr "0"
## $ F1 : chr NA
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0"
## $ Detection_Prevalence: chr "0"
## $ Balanced_Accuracy : chr "0.5"
hyper_bp_cfa
knitr::kable(t(hyper_bp_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyper_bp |
| Accuracy | 0.4 |
| Sensitivity | 0 |
| Specificity | 1 |
| Pos_Pred_Value | NaN |
| Neg_Pred_Value | 0.4 |
phenotypes_cf <- hyper_bp_cfa
Checking to see if there were more ICD9 codes that should be included beyond the 3 provided for essentail hypertension:
SELECT * FROM mimic3_demo.D_ICD_DIAGNOSES
WHERE LOWER(long_title) LIKE "%hypertension%"
We see 61 ICD9 codes, but the other descriptions refer to other types of hypertension (pulmonary, renovascular, venous, pregnancy-related, without –, etc. ). Let’s go with the 3 originally stated:
ROW_ID ICD9_CODE SHORT_TITLE LONG_TITLE
ICD9_CODE SHORT_TITLE LONG_TITLE
4010 Malignant hypertension Malignant essential hypertension
4011 Benign hypertension Benign essential hypertension
4019 Hypertension NOS Unspecified essential hypertension
ICD9_CODE will be one of our data types for this exercise. Any of the 3 ICD9 codes: 401.0 Malignant 401.1 Benign 401.9 Unspecified
Any of 3 ICD codes
diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_any <- diagnoses_icd %>%
filter(ICD9_CODE %in% c("4010","4011","4019")) %>%
distinct(SUBJECT_ID) %>%
mutate(hyp_icd_any = 1)
str(hyp_icd_any)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "distinct"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "select"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "filter"
## .. .. .. ..$ x :List of 2
## .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
## .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4010", "4011", "4019")
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000280835a0>
## .. .. .. ..$ args: list()
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. ..$ dots: list()
## .. .. ..$ args:List of 1
## .. .. .. ..$ vars:List of 1
## .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000028090f48>
## .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args: list()
## .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. ..$ hyp_icd_any: language ~1
## .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_icd_any
hyp_icd_any <- as_tibble(hyp_icd_any)
str(hyp_icd_any)
## tibble [38 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int [1:38] 10026 10089 10106 10017 10019 10027 10033 43827 44083 10038 ...
## $ hyp_icd_any: num [1:38] 1 1 1 1 1 1 1 1 1 1 ...
hyp_icd_any
hyp_icd_any_cf <- training %>%
left_join(hyp_icd_any) %>%
mutate(hyp_icd_any = coalesce(hyp_icd_any, 0)) %>%
collect() %>%
getStats(hyp_icd_any, HYPERTENSION)
hyp_icd_any_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_any 1 0
## 1 29 2
## 0 19 30
##
## Accuracy : 0.7375
## 95% CI : (0.6271, 0.8296)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.0071799
##
## Kappa : 0.4976
##
## Mcnemar's Test P-Value : 0.0004803
##
## Sensitivity : 0.6042
## Specificity : 0.9375
## Pos Pred Value : 0.9355
## Neg Pred Value : 0.6122
## Prevalence : 0.6000
## Detection Rate : 0.3625
## Detection Prevalence : 0.3875
## Balanced Accuracy : 0.7708
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_any_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_any_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 29 19 2 30
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_any : chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.738 0.498 0.627 0.83 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.604 0.938 0.935 0.612 0.935 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_any_cfa <- as.data.frame(as.matrix(hyp_icd_any_cf, what = "overall"))
hyp_icd_any_cfa$row_names <- row.names(as.matrix(hyp_icd_any_cf, what = "overall"))
hyp_icd_any_cfa[nrow(hyp_icd_any_cfa) + 1,] = c("hyp_icd_any","Data_Type")
row.names(hyp_icd_any_cfa) <- str_replace_all(hyp_icd_any_cfa$row_names," ","_")
hyp_icd_any_cfa <- as.data.frame(t(hyp_icd_any_cfa))
hyp_icd_any_cfa <- hyp_icd_any_cfa[1,c(8,1:7)]
row.names(hyp_icd_any_cfa) <- NULL
str(hyp_icd_any_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_any"
## $ Accuracy : chr "0.7375"
## $ Kappa : chr "0.497607655502392"
## $ AccuracyLower : chr "0.627149169787568"
## $ AccuracyUpper : chr "0.829590748899592"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.00717992545660734"
## $ McnemarPValue : chr "0.000480341199981871"
hyp_icd_any_cfa
hyp_icd_any_cfs <- as.data.frame(as.matrix(hyp_icd_any_cf, what = "classes"))
hyp_icd_any_cfs$row_names <- row.names(as.matrix(hyp_icd_any_cf, what = "classes"))
hyp_icd_any_cfs[nrow(hyp_icd_any_cfs) + 1,] = c("hyp_icd_any","Data_Type")
row.names(hyp_icd_any_cfs) <- str_replace_all(hyp_icd_any_cfs$row_names," ","_")
hyp_icd_any_cfs <- as.data.frame(t(hyp_icd_any_cfs))
hyp_icd_any_cfs <- hyp_icd_any_cfs[1,c(12,1:11)]
row.names(hyp_icd_any_cfs) <- NULL
str(hyp_icd_any_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_any"
## $ Sensitivity : chr "0.604166666666667"
## $ Specificity : chr "0.9375"
## $ Pos_Pred_Value : chr "0.935483870967742"
## $ Neg_Pred_Value : chr "0.612244897959184"
## $ Precision : chr "0.935483870967742"
## $ Recall : chr "0.604166666666667"
## $ F1 : chr "0.734177215189873"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.3625"
## $ Detection_Prevalence: chr "0.3875"
## $ Balanced_Accuracy : chr "0.770833333333333"
hyp_icd_any_cfs
hyp_icd_any_cfa <- hyp_icd_any_cfa %>%
inner_join(hyp_icd_any_cfs, by = "Data_Type")
rm(hyp_icd_any_cfs)
str(hyp_icd_any_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_any"
## $ Accuracy : chr "0.7375"
## $ Kappa : chr "0.497607655502392"
## $ AccuracyLower : chr "0.627149169787568"
## $ AccuracyUpper : chr "0.829590748899592"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.00717992545660734"
## $ McnemarPValue : chr "0.000480341199981871"
## $ Sensitivity : chr "0.604166666666667"
## $ Specificity : chr "0.9375"
## $ Pos_Pred_Value : chr "0.935483870967742"
## $ Neg_Pred_Value : chr "0.612244897959184"
## $ Precision : chr "0.935483870967742"
## $ Recall : chr "0.604166666666667"
## $ F1 : chr "0.734177215189873"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.3625"
## $ Detection_Prevalence: chr "0.3875"
## $ Balanced_Accuracy : chr "0.770833333333333"
hyp_icd_any_cfa
knitr::kable(t(hyp_icd_any_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_any |
| Accuracy | 0.7375 |
| Sensitivity | 0.604166666666667 |
| Specificity | 0.9375 |
| Pos_Pred_Value | 0.935483870967742 |
| Neg_Pred_Value | 0.612244897959184 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_any_cfa)
ICD9_CODE 401.0 Malignant
diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4010 <- diagnoses_icd %>%
filter(ICD9_CODE %in% c("4010")) %>%
distinct(SUBJECT_ID) %>%
mutate(hyp_icd_4010 = 1)
str(hyp_icd_4010)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "distinct"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "select"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "filter"
## .. .. .. ..$ x :List of 2
## .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
## .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4010")
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000211500b8>
## .. .. .. ..$ args: list()
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. ..$ dots: list()
## .. .. ..$ args:List of 1
## .. .. .. ..$ vars:List of 1
## .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000021142550>
## .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args: list()
## .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. ..$ hyp_icd_4010: language ~1
## .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_icd_4010
hyp_icd_4010 <- as_tibble(hyp_icd_4010)
str(hyp_icd_4010)
## tibble [1 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int 10026
## $ hyp_icd_4010: num 1
hyp_icd_4010
hyp_icd_4010_cf <- training %>%
left_join(hyp_icd_4010) %>%
mutate(hyp_icd_4010 = coalesce(hyp_icd_4010, 0)) %>%
collect() %>%
getStats(hyp_icd_4010, HYPERTENSION)
hyp_icd_4010_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_4010 1 0
## 1 0 0
## 0 48 32
##
## Accuracy : 0.4
## 95% CI : (0.292, 0.5156)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.9999
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 1.17e-11
##
## Sensitivity : 0.0
## Specificity : 1.0
## Pos Pred Value : NaN
## Neg Pred Value : 0.4
## Prevalence : 0.6
## Detection Rate : 0.0
## Detection Prevalence : 0.0
## Balanced Accuracy : 0.5
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_4010_cf$table)
## Warning in sqrt(odds(tab)$or): NaNs produced
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4010_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 0 48 0 32
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_4010: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4 0 0.292 0.516 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0 1 NaN 0.4 NA 0 NA 0.6 0 0 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_4010_cfa <- as.data.frame(as.matrix(hyp_icd_4010_cf, what = "overall"))
hyp_icd_4010_cfa$row_names <- row.names(as.matrix(hyp_icd_4010_cf, what = "overall"))
hyp_icd_4010_cfa[nrow(hyp_icd_4010_cfa) + 1,] = c("hyp_icd_4010","Data_Type")
row.names(hyp_icd_4010_cfa) <- str_replace_all(hyp_icd_4010_cfa$row_names," ","_")
hyp_icd_4010_cfa <- as.data.frame(t(hyp_icd_4010_cfa))
hyp_icd_4010_cfa <- hyp_icd_4010_cfa[1,c(8,1:7)]
row.names(hyp_icd_4010_cfa) <- NULL
str(hyp_icd_4010_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_4010"
## $ Accuracy : chr "0.4"
## $ Kappa : chr "0"
## $ AccuracyLower : chr "0.29200935625531"
## $ AccuracyUpper : chr "0.51562305950451"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.999899723274404"
## $ McnemarPValue : chr "1.17002131170662e-11"
hyp_icd_4010_cfa
hyp_icd_4010_cfs <- as.data.frame(as.matrix(hyp_icd_4010_cf, what = "classes"))
hyp_icd_4010_cfs$row_names <- row.names(as.matrix(hyp_icd_4010_cf, what = "classes"))
hyp_icd_4010_cfs[nrow(hyp_icd_4010_cfs) + 1,] = c("hyp_icd_4010","Data_Type")
row.names(hyp_icd_4010_cfs) <- str_replace_all(hyp_icd_4010_cfs$row_names," ","_")
hyp_icd_4010_cfs <- as.data.frame(t(hyp_icd_4010_cfs))
hyp_icd_4010_cfs <- hyp_icd_4010_cfs[1,c(12,1:11)]
row.names(hyp_icd_4010_cfs) <- NULL
str(hyp_icd_4010_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_4010"
## $ Sensitivity : chr "0"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "NaN"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr NA
## $ Recall : chr "0"
## $ F1 : chr NA
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0"
## $ Detection_Prevalence: chr "0"
## $ Balanced_Accuracy : chr "0.5"
hyp_icd_4010_cfs
hyp_icd_4010_cfa <- hyp_icd_4010_cfa %>%
inner_join(hyp_icd_4010_cfs, by = "Data_Type")
rm(hyp_icd_4010_cfs)
str(hyp_icd_4010_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_4010"
## $ Accuracy : chr "0.4"
## $ Kappa : chr "0"
## $ AccuracyLower : chr "0.29200935625531"
## $ AccuracyUpper : chr "0.51562305950451"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.999899723274404"
## $ McnemarPValue : chr "1.17002131170662e-11"
## $ Sensitivity : chr "0"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "NaN"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr NA
## $ Recall : chr "0"
## $ F1 : chr NA
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0"
## $ Detection_Prevalence: chr "0"
## $ Balanced_Accuracy : chr "0.5"
hyp_icd_4010_cfa
knitr::kable(t(hyp_icd_4010_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_4010 |
| Accuracy | 0.4 |
| Sensitivity | 0 |
| Specificity | 1 |
| Pos_Pred_Value | NaN |
| Neg_Pred_Value | 0.4 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_4010_cfa)
ICD9_CODE 401.1 Benign
diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4011 <- diagnoses_icd %>%
filter(ICD9_CODE %in% c("4011")) %>%
distinct(SUBJECT_ID) %>%
mutate(hyp_icd_4011 = 1)
str(hyp_icd_4011)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "distinct"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "select"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "filter"
## .. .. .. ..$ x :List of 2
## .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
## .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4011")
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000254164d0>
## .. .. .. ..$ args: list()
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. ..$ dots: list()
## .. .. ..$ args:List of 1
## .. .. .. ..$ vars:List of 1
## .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000025426850>
## .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args: list()
## .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. ..$ hyp_icd_4011: language ~1
## .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_icd_4011
hyp_icd_4011 <- as_tibble(hyp_icd_4011)
str(hyp_icd_4011)
## tibble [1 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int 40595
## $ hyp_icd_4011: num 1
hyp_icd_4011
hyp_icd_4011_cf <- training %>%
left_join(hyp_icd_4011) %>%
mutate(hyp_icd_4011 = coalesce(hyp_icd_4011, 0)) %>%
collect() %>%
getStats(hyp_icd_4011, HYPERTENSION)
hyp_icd_4011_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_4011 1 0
## 1 1 0
## 0 47 32
##
## Accuracy : 0.4125
## 95% CI : (0.3035, 0.5282)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.9998
##
## Kappa : 0.0167
##
## Mcnemar's Test P-Value : 1.949e-11
##
## Sensitivity : 0.02083
## Specificity : 1.00000
## Pos Pred Value : 1.00000
## Neg Pred Value : 0.40506
## Prevalence : 0.60000
## Detection Rate : 0.01250
## Detection Prevalence : 0.01250
## Balanced Accuracy : 0.51042
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_4011_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4011_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 1 47 0 32
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_4011: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4125 0.0167 0.3035 0.5282 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.0208 1 1 0.4051 1 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_4011_cfa <- as.data.frame(as.matrix(hyp_icd_4011_cf, what = "overall"))
hyp_icd_4011_cfa$row_names <- row.names(as.matrix(hyp_icd_4011_cf, what = "overall"))
hyp_icd_4011_cfa[nrow(hyp_icd_4011_cfa) + 1,] = c("hyp_icd_4011","Data_Type")
row.names(hyp_icd_4011_cfa) <- str_replace_all(hyp_icd_4011_cfa$row_names," ","_")
hyp_icd_4011_cfa <- as.data.frame(t(hyp_icd_4011_cfa))
hyp_icd_4011_cfa <- hyp_icd_4011_cfa[1,c(8,1:7)]
row.names(hyp_icd_4011_cfa) <- NULL
str(hyp_icd_4011_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_4011"
## $ Accuracy : chr "0.4125"
## $ Kappa : chr "0.0167364016736401"
## $ AccuracyLower : chr "0.303524592937594"
## $ AccuracyUpper : chr "0.528151659574796"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.999761568218371"
## $ McnemarPValue : chr "1.94905225611693e-11"
hyp_icd_4011_cfa
hyp_icd_4011_cfs <- as.data.frame(as.matrix(hyp_icd_4011_cf, what = "classes"))
hyp_icd_4011_cfs$row_names <- row.names(as.matrix(hyp_icd_4011_cf, what = "classes"))
hyp_icd_4011_cfs[nrow(hyp_icd_4011_cfs) + 1,] = c("hyp_icd_4011","Data_Type")
row.names(hyp_icd_4011_cfs) <- str_replace_all(hyp_icd_4011_cfs$row_names," ","_")
hyp_icd_4011_cfs <- as.data.frame(t(hyp_icd_4011_cfs))
hyp_icd_4011_cfs <- hyp_icd_4011_cfs[1,c(12,1:11)]
row.names(hyp_icd_4011_cfs) <- NULL
str(hyp_icd_4011_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_4011"
## $ Sensitivity : chr "0.0208333333333333"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "1"
## $ Neg_Pred_Value : chr "0.405063291139241"
## $ Precision : chr "1"
## $ Recall : chr "0.0208333333333333"
## $ F1 : chr "0.0408163265306122"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.0125"
## $ Detection_Prevalence: chr "0.0125"
## $ Balanced_Accuracy : chr "0.510416666666667"
hyp_icd_4011_cfs
hyp_icd_4011_cfa <- hyp_icd_4011_cfa %>%
inner_join(hyp_icd_4011_cfs, by = "Data_Type")
rm(hyp_icd_4011_cfs)
str(hyp_icd_4011_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_4011"
## $ Accuracy : chr "0.4125"
## $ Kappa : chr "0.0167364016736401"
## $ AccuracyLower : chr "0.303524592937594"
## $ AccuracyUpper : chr "0.528151659574796"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.999761568218371"
## $ McnemarPValue : chr "1.94905225611693e-11"
## $ Sensitivity : chr "0.0208333333333333"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "1"
## $ Neg_Pred_Value : chr "0.405063291139241"
## $ Precision : chr "1"
## $ Recall : chr "0.0208333333333333"
## $ F1 : chr "0.0408163265306122"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.0125"
## $ Detection_Prevalence: chr "0.0125"
## $ Balanced_Accuracy : chr "0.510416666666667"
hyp_icd_4011_cfa
knitr::kable(t(hyp_icd_4011_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_4011 |
| Accuracy | 0.4125 |
| Sensitivity | 0.0208333333333333 |
| Specificity | 1 |
| Pos_Pred_Value | 1 |
| Neg_Pred_Value | 0.405063291139241 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_4011_cfa)
ICD9_CODE 401.9 Unspecified
diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4019 <- diagnoses_icd %>%
filter(ICD9_CODE %in% c("4019")) %>%
distinct(SUBJECT_ID) %>%
mutate(hyp_icd_4019 = 1)
str(hyp_icd_4019)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "distinct"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "select"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "filter"
## .. .. .. ..$ x :List of 2
## .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
## .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4019")
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000278f54a0>
## .. .. .. ..$ args: list()
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. ..$ dots: list()
## .. .. ..$ args:List of 1
## .. .. .. ..$ vars:List of 1
## .. .. .. .. ..$ SUBJECT_ID: language ~SUBJECT_ID
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000279039a8>
## .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args: list()
## .. ..- attr(*, "class")= chr [1:3] "op_distinct" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. ..$ hyp_icd_4019: language ~1
## .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_icd_4019
hyp_icd_4019 <- as_tibble(hyp_icd_4019)
str(hyp_icd_4019)
## tibble [36 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int [1:36] 10089 10106 10017 10019 10027 10033 43827 44083 10038 40503 ...
## $ hyp_icd_4019: num [1:36] 1 1 1 1 1 1 1 1 1 1 ...
hyp_icd_4019
hyp_icd_4019_cf <- training %>%
left_join(hyp_icd_4019) %>%
mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0)) %>%
collect() %>%
getStats(hyp_icd_4019, HYPERTENSION)
hyp_icd_4019_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_4019 1 0
## 1 28 2
## 0 20 30
##
## Accuracy : 0.725
## 95% CI : (0.6138, 0.819)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.0136496
##
## Kappa : 0.4762
##
## Mcnemar's Test P-Value : 0.0002896
##
## Sensitivity : 0.5833
## Specificity : 0.9375
## Pos Pred Value : 0.9333
## Neg Pred Value : 0.6000
## Prevalence : 0.6000
## Detection Rate : 0.3500
## Detection Prevalence : 0.3750
## Balanced Accuracy : 0.7604
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_4019_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 28 20 2 30
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_4019: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.725 0.476 0.614 0.819 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.583 0.938 0.933 0.6 0.933 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_4019_cfa <- as.data.frame(as.matrix(hyp_icd_4019_cf, what = "overall"))
hyp_icd_4019_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_cf, what = "overall"))
hyp_icd_4019_cfa[nrow(hyp_icd_4019_cfa) + 1,] = c("hyp_icd_4019","Data_Type")
row.names(hyp_icd_4019_cfa) <- str_replace_all(hyp_icd_4019_cfa$row_names," ","_")
hyp_icd_4019_cfa <- as.data.frame(t(hyp_icd_4019_cfa))
hyp_icd_4019_cfa <- hyp_icd_4019_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_cfa) <- NULL
str(hyp_icd_4019_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_4019"
## $ Accuracy : chr "0.725"
## $ Kappa : chr "0.476190476190476"
## $ AccuracyLower : chr "0.613756579155591"
## $ AccuracyUpper : chr "0.818962361442961"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.013649591659318"
## $ McnemarPValue : chr "0.000289619409755552"
hyp_icd_4019_cfa
hyp_icd_4019_cfs <- as.data.frame(as.matrix(hyp_icd_4019_cf, what = "classes"))
hyp_icd_4019_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_cf, what = "classes"))
hyp_icd_4019_cfs[nrow(hyp_icd_4019_cfs) + 1,] = c("hyp_icd_4019","Data_Type")
row.names(hyp_icd_4019_cfs) <- str_replace_all(hyp_icd_4019_cfs$row_names," ","_")
hyp_icd_4019_cfs <- as.data.frame(t(hyp_icd_4019_cfs))
hyp_icd_4019_cfs <- hyp_icd_4019_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_cfs) <- NULL
str(hyp_icd_4019_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_4019"
## $ Sensitivity : chr "0.583333333333333"
## $ Specificity : chr "0.9375"
## $ Pos_Pred_Value : chr "0.933333333333333"
## $ Neg_Pred_Value : chr "0.6"
## $ Precision : chr "0.933333333333333"
## $ Recall : chr "0.583333333333333"
## $ F1 : chr "0.717948717948718"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.35"
## $ Detection_Prevalence: chr "0.375"
## $ Balanced_Accuracy : chr "0.760416666666667"
hyp_icd_4019_cfs
hyp_icd_4019_cfa <- hyp_icd_4019_cfa %>%
inner_join(hyp_icd_4019_cfs, by = "Data_Type")
rm(hyp_icd_4019_cfs)
str(hyp_icd_4019_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_4019"
## $ Accuracy : chr "0.725"
## $ Kappa : chr "0.476190476190476"
## $ AccuracyLower : chr "0.613756579155591"
## $ AccuracyUpper : chr "0.818962361442961"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.013649591659318"
## $ McnemarPValue : chr "0.000289619409755552"
## $ Sensitivity : chr "0.583333333333333"
## $ Specificity : chr "0.9375"
## $ Pos_Pred_Value : chr "0.933333333333333"
## $ Neg_Pred_Value : chr "0.6"
## $ Precision : chr "0.933333333333333"
## $ Recall : chr "0.583333333333333"
## $ F1 : chr "0.717948717948718"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.35"
## $ Detection_Prevalence: chr "0.375"
## $ Balanced_Accuracy : chr "0.760416666666667"
hyp_icd_4019_cfa
knitr::kable(t(hyp_icd_4019_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_4019 |
| Accuracy | 0.725 |
| Sensitivity | 0.583333333333333 |
| Specificity | 0.9375 |
| Pos_Pred_Value | 0.933333333333333 |
| Neg_Pred_Value | 0.6 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_cfa)
ICD9_CODE 401.9 Unspecified, minimum 2 instances
diagnoses_icd <- tbl(con, "mimic3_demo.DIAGNOSES_ICD")
hyp_icd_4019_min2 <- diagnoses_icd %>%
filter(ICD9_CODE %in% c("4019")) %>%
# distinct(SUBJECT_ID) %>%
mutate(hyp_icd_4019 = 1) %>%
group_by(SUBJECT_ID) %>%
summarise(hyp_icd_4019_count = sum(hyp_icd_4019, na.rm = TRUE)) %>%
mutate(hyp_icd_4019_min2 = case_when(hyp_icd_4019_count >= 2 ~ 1,
TRUE ~ 0)) %>%
select(SUBJECT_ID, hyp_icd_4019_min2)
str(hyp_icd_4019_min2)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "summarise"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "group_by"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "select"
## .. .. .. ..$ x :List of 4
## .. .. .. .. ..$ name: chr "filter"
## .. .. .. .. ..$ x :List of 2
## .. .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.DIAGNOSES_ICD"
## .. .. .. .. .. ..$ vars: chr [1:5] "ROW_ID" "SUBJECT_ID" "HADM_ID" "SEQ_NUM" ...
## .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. ..$ dots:List of 1
## .. .. .. .. .. ..$ : language ~ICD9_CODE %in% c("4019")
## .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x000000002275a8d8>
## .. .. .. .. ..$ args: list()
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. .. ..$ dots: list()
## .. .. .. ..$ args:List of 1
## .. .. .. .. ..$ vars:List of 6
## .. .. .. .. .. ..$ ROW_ID : symbol ROW_ID
## .. .. .. .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. .. .. .. ..$ HADM_ID : symbol HADM_ID
## .. .. .. .. .. ..$ SEQ_NUM : symbol SEQ_NUM
## .. .. .. .. .. ..$ ICD9_CODE : symbol ICD9_CODE
## .. .. .. .. .. ..$ hyp_icd_4019: language ~1
## .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. .. ..$ dots:List of 1
## .. .. .. ..$ SUBJECT_ID: symbol SUBJECT_ID
## .. .. ..$ args:List of 1
## .. .. .. ..$ add: logi FALSE
## .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
## .. ..$ dots:List of 1
## .. .. ..$ hyp_icd_4019_count: language ~sum(hyp_icd_4019, na.rm = TRUE)
## .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000224ba0e8>
## .. ..$ args: list()
## .. ..- attr(*, "class")= chr [1:3] "op_summarise" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 2
## .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. ..$ hyp_icd_4019_min2: language ~case_when(hyp_icd_4019_count >= 2 ~ 1, TRUE ~ 0)
## .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000224a9ba8>
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_icd_4019_min2
hyp_icd_4019_min2 <- as_tibble(hyp_icd_4019_min2)
str(hyp_icd_4019_min2)
## tibble [36 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID : int [1:36] 10089 10106 10017 10019 10027 10033 43827 44083 10038 40503 ...
## $ hyp_icd_4019_min2: num [1:36] 0 0 0 0 0 0 0 1 0 0 ...
hyp_icd_4019_min2
hyp_icd_4019_min2_cf <- training %>%
left_join(hyp_icd_4019_min2) %>%
mutate(hyp_icd_4019_min2 = coalesce(hyp_icd_4019_min2, 0)) %>%
collect() %>%
getStats(hyp_icd_4019_min2, HYPERTENSION)
hyp_icd_4019_min2_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_4019_min2 1 0
## 1 5 0
## 0 43 32
##
## Accuracy : 0.4625
## 95% CI : (0.3503, 0.5776)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.9953
##
## Kappa : 0.0851
##
## Mcnemar's Test P-Value : 1.504e-10
##
## Sensitivity : 0.1042
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.4267
## Prevalence : 0.6000
## Detection Rate : 0.0625
## Detection Prevalence : 0.0625
## Balanced Accuracy : 0.5521
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_4019_min2_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_min2_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 5 43 0 32
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_4019_min2: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4625 0.0851 0.3503 0.5776 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.104 1 1 0.427 1 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_4019_min2_cfa <- as.data.frame(as.matrix(hyp_icd_4019_min2_cf, what = "overall"))
hyp_icd_4019_min2_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_min2_cf, what = "overall"))
hyp_icd_4019_min2_cfa[nrow(hyp_icd_4019_min2_cfa) + 1,] = c("hyp_icd_4019_min2","Data_Type")
row.names(hyp_icd_4019_min2_cfa) <- str_replace_all(hyp_icd_4019_min2_cfa$row_names," ","_")
hyp_icd_4019_min2_cfa <- as.data.frame(t(hyp_icd_4019_min2_cfa))
hyp_icd_4019_min2_cfa <- hyp_icd_4019_min2_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_min2_cfa) <- NULL
str(hyp_icd_4019_min2_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_4019_min2"
## $ Accuracy : chr "0.4625"
## $ Kappa : chr "0.0851063829787235"
## $ AccuracyLower : chr "0.350276426538398"
## $ AccuracyUpper : chr "0.577582471995166"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.995292579337281"
## $ McnemarPValue : chr "1.50428571598827e-10"
hyp_icd_4019_min2_cfa
hyp_icd_4019_min2_cfs <- as.data.frame(as.matrix(hyp_icd_4019_min2_cf, what = "classes"))
hyp_icd_4019_min2_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_min2_cf, what = "classes"))
hyp_icd_4019_min2_cfs[nrow(hyp_icd_4019_min2_cfs) + 1,] = c("hyp_icd_4019_min2","Data_Type")
row.names(hyp_icd_4019_min2_cfs) <- str_replace_all(hyp_icd_4019_min2_cfs$row_names," ","_")
hyp_icd_4019_min2_cfs <- as.data.frame(t(hyp_icd_4019_min2_cfs))
hyp_icd_4019_min2_cfs <- hyp_icd_4019_min2_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_min2_cfs) <- NULL
str(hyp_icd_4019_min2_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_4019_min2"
## $ Sensitivity : chr "0.104166666666667"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "1"
## $ Neg_Pred_Value : chr "0.426666666666667"
## $ Precision : chr "1"
## $ Recall : chr "0.104166666666667"
## $ F1 : chr "0.188679245283019"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.0625"
## $ Detection_Prevalence: chr "0.0625"
## $ Balanced_Accuracy : chr "0.552083333333333"
hyp_icd_4019_min2_cfs
hyp_icd_4019_min2_cfa <- hyp_icd_4019_min2_cfa %>%
inner_join(hyp_icd_4019_min2_cfs, by = "Data_Type")
rm(hyp_icd_4019_min2_cfs)
str(hyp_icd_4019_min2_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_4019_min2"
## $ Accuracy : chr "0.4625"
## $ Kappa : chr "0.0851063829787235"
## $ AccuracyLower : chr "0.350276426538398"
## $ AccuracyUpper : chr "0.577582471995166"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.995292579337281"
## $ McnemarPValue : chr "1.50428571598827e-10"
## $ Sensitivity : chr "0.104166666666667"
## $ Specificity : chr "1"
## $ Pos_Pred_Value : chr "1"
## $ Neg_Pred_Value : chr "0.426666666666667"
## $ Precision : chr "1"
## $ Recall : chr "0.104166666666667"
## $ F1 : chr "0.188679245283019"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.0625"
## $ Detection_Prevalence: chr "0.0625"
## $ Balanced_Accuracy : chr "0.552083333333333"
hyp_icd_4019_min2_cfa
knitr::kable(t(hyp_icd_4019_min2_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_4019_min2 |
| Accuracy | 0.4625 |
| Sensitivity | 0.104166666666667 |
| Specificity | 1 |
| Pos_Pred_Value | 1 |
| Neg_Pred_Value | 0.426666666666667 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_min2_cfa)
A list of 167 text values representing antihypertensive drugs were provided. Best efforts to match this drug list against the PRESCRIPTIONS table results in 108 names left over (107 if we performed a secondary match).
Matches
SELECT DISTINCT *
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
JOIN mimic3_demo.PRESCRIPTIONS presc
ON antihyp.DRUG = presc.DRUG
=709 rows
SELECT DISTINCT *
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
JOIN mimic3_demo.PRESCRIPTIONS presc
ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
ORDER BY 1, 2, 3, 4, 5, 6, 10, 11
=1395 rows
WITH temp1 AS
( SELECT antihyp.DRUG
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
WHERE ROW_ID IS NULL)
SELECT * FROM temp1
LEFT JOIN mimic3_demo.PRESCRIPTIONS presc2
ON TRIM(LOWER(temp1.DRUG)) = TRIM(LOWER(presc2.DRUG_NAME_GENERIC))
WHERE NOT ROW_ID IS NULL
UNION ALL
SELECT DISTINCT *
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
JOIN mimic3_demo.PRESCRIPTIONS presc
ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
ORDER BY 1, 2, 3, 4, 5, 6, 10, 11
=1396 rows
Leftovers
SELECT antihyp.DRUG
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
WHERE ROW_ID IS NULL
=108 rows
WITH temp1 AS
( SELECT antihyp.DRUG
FROM course3_data.D_ANTIHYPERTENSIVES antihyp
LEFT JOIN mimic3_demo.PRESCRIPTIONS presc
ON TRIM(LOWER(antihyp.DRUG)) = TRIM(LOWER(presc.DRUG))
WHERE ROW_ID IS NULL)
SELECT * FROM temp1
LEFT JOIN mimic3_demo.PRESCRIPTIONS presc2
ON TRIM(LOWER(temp1.DRUG)) = TRIM(LOWER(presc2.DRUG_NAME_GENERIC))
WHERE ROW_ID IS NULL
=107 rows
List of DRUG names that were not matched in the PRESCIPTIONS table, using as-is text match then using cleaned-up versions of DRUG field for text match:
prescriptions <- tbl(con, "mimic3_demo.PRESCRIPTIONS")
antihypdrgs <- tbl(con, "course3_data.D_ANTIHYPERTENSIVES")
hyp_rxmiss <- antihypdrgs %>%
left_join(prescriptions) %>%
filter(is.null(ROW_ID)) %>%
group_by(DRUG) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG))) %>%
select(drug1) %>%
arrange(drug1)
str(hyp_rxmiss)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "arrange"
## ..$ x :List of 4
## .. ..$ name: chr "select"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "group_by"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "filter"
## .. .. .. ..$ x :List of 4
## .. .. .. .. ..$ name: chr "join"
## .. .. .. .. ..$ x :List of 2
## .. .. .. .. .. ..$ src:List of 2
## .. .. .. .. .. .. ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. .. .. .. .. .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. .. ..@ dataset : NULL
## .. .. .. .. .. .. .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
## .. .. .. .. .. .. .. .. ..@ page_size : int 10000
## .. .. .. .. .. .. .. .. ..@ quiet : logi NA
## .. .. .. .. .. .. .. .. ..@ bigint : chr "integer"
## .. .. .. .. .. .. ..$ disco: NULL
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## .. .. .. .. .. ..$ ops:List of 2
## .. .. .. .. .. .. ..$ x : 'ident' chr "course3_data.D_ANTIHYPERTENSIVES"
## .. .. .. .. .. .. ..$ vars: chr "DRUG"
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
## .. .. .. .. ..$ y :List of 2
## .. .. .. .. .. ..$ src:List of 2
## .. .. .. .. .. .. ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. .. .. .. .. .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. .. ..@ dataset : NULL
## .. .. .. .. .. .. .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
## .. .. .. .. .. .. .. .. ..@ page_size : int 10000
## .. .. .. .. .. .. .. .. ..@ quiet : logi NA
## .. .. .. .. .. .. .. .. ..@ bigint : chr "integer"
## .. .. .. .. .. .. ..$ disco: NULL
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## .. .. .. .. .. ..$ ops:List of 2
## .. .. .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.PRESCRIPTIONS"
## .. .. .. .. .. .. ..$ vars: chr [1:19] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
## .. .. .. .. ..$ args:List of 4
## .. .. .. .. .. ..$ vars :List of 3
## .. .. .. .. .. .. ..$ alias: chr [1:19] "DRUG" "ROW_ID" "SUBJECT_ID" "HADM_ID" ...
## .. .. .. .. .. .. ..$ x : chr [1:19] "DRUG" NA NA NA ...
## .. .. .. .. .. .. ..$ y : chr [1:19] NA "ROW_ID" "SUBJECT_ID" "HADM_ID" ...
## .. .. .. .. .. ..$ type : chr "left"
## .. .. .. .. .. ..$ by :List of 2
## .. .. .. .. .. .. ..$ x: chr "DRUG"
## .. .. .. .. .. .. ..$ y: chr "DRUG"
## .. .. .. .. .. ..$ suffix: chr [1:2] ".x" ".y"
## .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_join" "op_double" "op"
## .. .. .. ..$ dots:List of 1
## .. .. .. .. ..$ : language ~is.null(ROW_ID)
## .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026a1cd90>
## .. .. .. ..$ args: list()
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. ..$ dots:List of 1
## .. .. .. ..$ DRUG: symbol DRUG
## .. .. ..$ args:List of 1
## .. .. .. ..$ add: logi FALSE
## .. .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
## .. ..$ dots: list()
## .. ..$ args:List of 1
## .. .. ..$ vars:List of 2
## .. .. .. ..$ DRUG : symbol DRUG
## .. .. .. ..$ drug1: language ~str_trim(str_to_lower(DRUG))
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026a4f058>
## .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## ..$ dots:List of 1
## .. ..$ : language ~drug1
## .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026d20e20>
## ..$ args:List of 1
## .. ..$ .by_group: logi FALSE
## ..- attr(*, "class")= chr [1:3] "op_arrange" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_rxmiss
prescriptions_clean <- prescriptions %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG)))
prescriptions_clean
antihypdrgs_clean <- antihypdrgs %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG)))
antihypdrgs_clean
hyp_rxmiss <- antihypdrgs_clean %>%
left_join(prescriptions_clean, by = "drug1") %>%
filter(is.null(ROW_ID)) %>%
group_by(drug1) %>%
select(drug1)
str(hyp_rxmiss)
## List of 2
## $ src:List of 2
## ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. ..@ dataset : NULL
## .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. ..@ use_legacy_sql: logi FALSE
## .. .. ..@ page_size : int 10000
## .. .. ..@ quiet : logi NA
## .. .. ..@ bigint : chr "integer"
## ..$ disco: NULL
## ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## $ ops:List of 4
## ..$ name: chr "select"
## ..$ x :List of 4
## .. ..$ name: chr "group_by"
## .. ..$ x :List of 4
## .. .. ..$ name: chr "filter"
## .. .. ..$ x :List of 4
## .. .. .. ..$ name: chr "join"
## .. .. .. ..$ x :List of 2
## .. .. .. .. ..$ src:List of 2
## .. .. .. .. .. ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. .. .. .. .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. ..@ dataset : NULL
## .. .. .. .. .. .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
## .. .. .. .. .. .. .. ..@ page_size : int 10000
## .. .. .. .. .. .. .. ..@ quiet : logi NA
## .. .. .. .. .. .. .. ..@ bigint : chr "integer"
## .. .. .. .. .. ..$ disco: NULL
## .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## .. .. .. .. ..$ ops:List of 4
## .. .. .. .. .. ..$ name: chr "select"
## .. .. .. .. .. ..$ x :List of 4
## .. .. .. .. .. .. ..$ name: chr "filter"
## .. .. .. .. .. .. ..$ x :List of 2
## .. .. .. .. .. .. .. ..$ x : 'ident' chr "course3_data.D_ANTIHYPERTENSIVES"
## .. .. .. .. .. .. .. ..$ vars: chr "DRUG"
## .. .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. .. .. ..$ dots:List of 1
## .. .. .. .. .. .. .. ..$ : language ~!is.null(str_trim(str_to_lower(DRUG)))
## .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x000000002693de60>
## .. .. .. .. .. .. ..$ args: list()
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. .. .. .. ..$ dots: list()
## .. .. .. .. .. ..$ args:List of 1
## .. .. .. .. .. .. ..$ vars:List of 2
## .. .. .. .. .. .. .. ..$ DRUG : symbol DRUG
## .. .. .. .. .. .. .. ..$ drug1: language ~str_trim(str_to_lower(DRUG))
## .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000026953bc0>
## .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
## .. .. .. ..$ y :List of 2
## .. .. .. .. ..$ src:List of 2
## .. .. .. .. .. ..$ con :Formal class 'BigQueryConnection' [package "bigrquery"] with 7 slots
## .. .. .. .. .. .. .. ..@ project : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. ..@ dataset : NULL
## .. .. .. .. .. .. .. ..@ billing : chr "learnclinicaldatascience"
## .. .. .. .. .. .. .. ..@ use_legacy_sql: logi FALSE
## .. .. .. .. .. .. .. ..@ page_size : int 10000
## .. .. .. .. .. .. .. ..@ quiet : logi NA
## .. .. .. .. .. .. .. ..@ bigint : chr "integer"
## .. .. .. .. .. ..$ disco: NULL
## .. .. .. .. .. ..- attr(*, "class")= chr [1:4] "src_BigQueryConnection" "src_dbi" "src_sql" "src"
## .. .. .. .. ..$ ops:List of 4
## .. .. .. .. .. ..$ name: chr "select"
## .. .. .. .. .. ..$ x :List of 4
## .. .. .. .. .. .. ..$ name: chr "filter"
## .. .. .. .. .. .. ..$ x :List of 2
## .. .. .. .. .. .. .. ..$ x : 'ident' chr "mimic3_demo.PRESCRIPTIONS"
## .. .. .. .. .. .. .. ..$ vars: chr [1:19] "ROW_ID" "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID" ...
## .. .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
## .. .. .. .. .. .. ..$ dots:List of 1
## .. .. .. .. .. .. .. ..$ : language ~!is.null(str_trim(str_to_lower(DRUG)))
## .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x0000000024191268>
## .. .. .. .. .. .. ..$ args: list()
## .. .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. .. .. .. .. ..$ dots: list()
## .. .. .. .. .. ..$ args:List of 1
## .. .. .. .. .. .. ..$ vars:List of 20
## .. .. .. .. .. .. .. ..$ ROW_ID : symbol ROW_ID
## .. .. .. .. .. .. .. ..$ SUBJECT_ID : symbol SUBJECT_ID
## .. .. .. .. .. .. .. ..$ HADM_ID : symbol HADM_ID
## .. .. .. .. .. .. .. ..$ ICUSTAY_ID : symbol ICUSTAY_ID
## .. .. .. .. .. .. .. ..$ STARTDATE : symbol STARTDATE
## .. .. .. .. .. .. .. ..$ ENDDATE : symbol ENDDATE
## .. .. .. .. .. .. .. ..$ DRUG_TYPE : symbol DRUG_TYPE
## .. .. .. .. .. .. .. ..$ DRUG : symbol DRUG
## .. .. .. .. .. .. .. ..$ DRUG_NAME_POE : symbol DRUG_NAME_POE
## .. .. .. .. .. .. .. ..$ DRUG_NAME_GENERIC: symbol DRUG_NAME_GENERIC
## .. .. .. .. .. .. .. ..$ FORMULARY_DRUG_CD: symbol FORMULARY_DRUG_CD
## .. .. .. .. .. .. .. ..$ GSN : symbol GSN
## .. .. .. .. .. .. .. ..$ NDC : symbol NDC
## .. .. .. .. .. .. .. ..$ PROD_STRENGTH : symbol PROD_STRENGTH
## .. .. .. .. .. .. .. ..$ DOSE_VAL_RX : symbol DOSE_VAL_RX
## .. .. .. .. .. .. .. ..$ DOSE_UNIT_RX : symbol DOSE_UNIT_RX
## .. .. .. .. .. .. .. ..$ FORM_VAL_DISP : symbol FORM_VAL_DISP
## .. .. .. .. .. .. .. ..$ FORM_UNIT_DISP : symbol FORM_UNIT_DISP
## .. .. .. .. .. .. .. ..$ ROUTE : symbol ROUTE
## .. .. .. .. .. .. .. ..$ drug1 : language ~str_trim(str_to_lower(DRUG))
## .. .. .. .. .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000241b16c8>
## .. .. .. .. .. ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## .. .. .. .. ..- attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
## .. .. .. ..$ args:List of 4
## .. .. .. .. ..$ vars :List of 3
## .. .. .. .. .. ..$ alias: chr [1:21] "DRUG.x" "drug1" "ROW_ID" "SUBJECT_ID" ...
## .. .. .. .. .. ..$ x : chr [1:21] "DRUG" "drug1" NA NA ...
## .. .. .. .. .. ..$ y : chr [1:21] NA NA "ROW_ID" "SUBJECT_ID" ...
## .. .. .. .. ..$ type : chr "left"
## .. .. .. .. ..$ by :List of 2
## .. .. .. .. .. ..$ x: chr "drug1"
## .. .. .. .. .. ..$ y: chr "drug1"
## .. .. .. .. ..$ suffix: chr [1:2] ".x" ".y"
## .. .. .. ..- attr(*, "class")= chr [1:3] "op_join" "op_double" "op"
## .. .. ..$ dots:List of 1
## .. .. .. ..$ : language ~is.null(ROW_ID)
## .. .. .. .. ..- attr(*, ".Environment")=<environment: 0x00000000294ead90>
## .. .. ..$ args: list()
## .. .. ..- attr(*, "class")= chr [1:3] "op_filter" "op_single" "op"
## .. ..$ dots:List of 1
## .. .. ..$ drug1: symbol drug1
## .. ..$ args:List of 1
## .. .. ..$ add: logi FALSE
## .. ..- attr(*, "class")= chr [1:3] "op_group_by" "op_single" "op"
## ..$ dots: list()
## ..$ args:List of 1
## .. ..$ vars:List of 1
## .. .. ..$ drug1: symbol drug1
## ..- attr(*, "class")= chr [1:3] "op_select" "op_single" "op"
## - attr(*, "class")= chr [1:5] "tbl_BigQueryConnection" "tbl_dbi" "tbl_sql" "tbl_lazy" ...
hyp_rxmiss
Create a table of patients that have PRESCRIPTIONS matched to the list of hypertension DRUG names (using cleaned-up versions of DRUG field for text match):
prescriptions_clean <- prescriptions %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG))) %>%
collect()
prescriptions_clean
antihypdrgs_clean <- antihypdrgs %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG))) %>%
collect()
antihypdrgs_clean
hyp_rx_any <- prescriptions_clean %>%
inner_join(antihypdrgs_clean, by = "drug1") %>%
distinct(SUBJECT_ID) %>%
mutate(hyp_rx_any = 1L)
str(hyp_rx_any)
## tibble [77 x 2] (S3: tbl_df/tbl/data.frame)
## $ SUBJECT_ID: int [1:77] 42281 10059 10124 42346 41983 42199 44083 10088 10061 40601 ...
## $ hyp_rx_any: int [1:77] 1 1 1 1 1 1 1 1 1 1 ...
hyp_rx_any
hyp_rx_any_cf <- training %>%
left_join(hyp_rx_any) %>%
mutate(hyp_rx_any = coalesce(hyp_rx_any, 0)) %>%
collect() %>%
getStats(hyp_rx_any, HYPERTENSION)
hyp_rx_any_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_rx_any 1 0
## 1 39 22
## 0 9 10
##
## Accuracy : 0.6125
## 95% CI : (0.497, 0.7194)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.45762
##
## Kappa : 0.1341
##
## Mcnemar's Test P-Value : 0.03114
##
## Sensitivity : 0.8125
## Specificity : 0.3125
## Pos Pred Value : 0.6393
## Neg Pred Value : 0.5263
## Prevalence : 0.6000
## Detection Rate : 0.4875
## Detection Prevalence : 0.7625
## Balanced Accuracy : 0.5625
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_rx_any_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_any_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 39 9 22 10
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_rx_any : chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.613 0.134 0.497 0.719 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.812 0.312 0.639 0.526 0.639 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_rx_any_cfa <- as.data.frame(as.matrix(hyp_rx_any_cf, what = "overall"))
hyp_rx_any_cfa$row_names <- row.names(as.matrix(hyp_rx_any_cf, what = "overall"))
hyp_rx_any_cfa[nrow(hyp_rx_any_cfa) + 1,] = c("hyp_rx_any","Data_Type")
row.names(hyp_rx_any_cfa) <- str_replace_all(hyp_rx_any_cfa$row_names," ","_")
hyp_rx_any_cfa <- as.data.frame(t(hyp_rx_any_cfa))
hyp_rx_any_cfa <- hyp_rx_any_cfa[1,c(8,1:7)]
row.names(hyp_rx_any_cfa) <- NULL
str(hyp_rx_any_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_rx_any"
## $ Accuracy : chr "0.6125"
## $ Kappa : chr "0.134078212290503"
## $ AccuracyLower : chr "0.496975542501899"
## $ AccuracyUpper : chr "0.719434866875218"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.457620673916443"
## $ McnemarPValue : chr "0.0311412105957967"
hyp_rx_any_cfa
hyp_rx_any_cfs <- as.data.frame(as.matrix(hyp_rx_any_cf, what = "classes"))
hyp_rx_any_cfs$row_names <- row.names(as.matrix(hyp_rx_any_cf, what = "classes"))
hyp_rx_any_cfs[nrow(hyp_rx_any_cfs) + 1,] = c("hyp_rx_any","Data_Type")
row.names(hyp_rx_any_cfs) <- str_replace_all(hyp_rx_any_cfs$row_names," ","_")
hyp_rx_any_cfs <- as.data.frame(t(hyp_rx_any_cfs))
hyp_rx_any_cfs <- hyp_rx_any_cfs[1,c(12,1:11)]
row.names(hyp_rx_any_cfs) <- NULL
str(hyp_rx_any_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_rx_any"
## $ Sensitivity : chr "0.8125"
## $ Specificity : chr "0.3125"
## $ Pos_Pred_Value : chr "0.639344262295082"
## $ Neg_Pred_Value : chr "0.526315789473684"
## $ Precision : chr "0.639344262295082"
## $ Recall : chr "0.8125"
## $ F1 : chr "0.71559633027523"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.4875"
## $ Detection_Prevalence: chr "0.7625"
## $ Balanced_Accuracy : chr "0.5625"
hyp_rx_any_cfs
hyp_rx_any_cfa <- hyp_rx_any_cfa %>%
inner_join(hyp_rx_any_cfs, by = "Data_Type")
rm(hyp_rx_any_cfs)
str(hyp_rx_any_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_rx_any"
## $ Accuracy : chr "0.6125"
## $ Kappa : chr "0.134078212290503"
## $ AccuracyLower : chr "0.496975542501899"
## $ AccuracyUpper : chr "0.719434866875218"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.457620673916443"
## $ McnemarPValue : chr "0.0311412105957967"
## $ Sensitivity : chr "0.8125"
## $ Specificity : chr "0.3125"
## $ Pos_Pred_Value : chr "0.639344262295082"
## $ Neg_Pred_Value : chr "0.526315789473684"
## $ Precision : chr "0.639344262295082"
## $ Recall : chr "0.8125"
## $ F1 : chr "0.71559633027523"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.4875"
## $ Detection_Prevalence: chr "0.7625"
## $ Balanced_Accuracy : chr "0.5625"
hyp_rx_any_cfa
knitr::kable(t(hyp_rx_any_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_rx_any |
| Accuracy | 0.6125 |
| Sensitivity | 0.8125 |
| Specificity | 0.3125 |
| Pos_Pred_Value | 0.639344262295082 |
| Neg_Pred_Value | 0.526315789473684 |
phenotypes_cf <- union(phenotypes_cf,hyp_rx_any_cfa)
#hyp_rx_any_cft <- as_tibble(as.matrix(hyp_rx_any_cf, what = "overall"))
#as.table(hyp_rx_any_cf)
#as.matrix(hyp_rx_any_cf)
#as.matrix(hyp_rx_any_cf, what = "overall")
#as.matrix(hyp_rx_any_cf, what = "classes")
At least 2 instances of medications listed
prescriptions_clean <- prescriptions %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG))) %>%
collect()
prescriptions_clean
antihypdrgs_clean <- antihypdrgs %>%
filter(!is.null(str_trim(str_to_lower(DRUG)))) %>%
mutate(drug1 = str_trim(str_to_lower(DRUG))) %>%
collect()
antihypdrgs_clean
hyp_rx_min2 <- prescriptions_clean %>%
inner_join(antihypdrgs_clean, by = "drug1") %>%
mutate(hyp_rx_mark = 1) %>%
group_by(SUBJECT_ID) %>%
summarise(hyp_rx_count = sum(hyp_rx_mark, na.rm = TRUE)) %>%
mutate(hyp_rx_min2 = case_when(hyp_rx_count >= 2 ~ 1,
TRUE ~ 0)) %>%
select(SUBJECT_ID, hyp_rx_count, hyp_rx_min2)
hyp_rx_min2
hyp_rx_min2_cf <- training %>%
left_join(hyp_rx_min2) %>%
mutate(hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>%
collect() %>%
getStats(hyp_rx_min2, HYPERTENSION)
hyp_rx_min2_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_rx_min2 1 0
## 1 39 21
## 0 9 11
##
## Accuracy : 0.625
## 95% CI : (0.5096, 0.7308)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.36873
##
## Kappa : 0.1667
##
## Mcnemar's Test P-Value : 0.04461
##
## Sensitivity : 0.8125
## Specificity : 0.3438
## Pos Pred Value : 0.6500
## Neg Pred Value : 0.5500
## Prevalence : 0.6000
## Detection Rate : 0.4875
## Detection Prevalence : 0.7500
## Balanced Accuracy : 0.5781
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_rx_min2_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_min2_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 39 9 21 11
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_rx_min2 : chr [1:2] "1" "0"
## .. ..$ HYPERTENSION: chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.625 0.167 0.51 0.731 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.812 0.344 0.65 0.55 0.65 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_rx_min2_cfa <- as.data.frame(as.matrix(hyp_rx_min2_cf, what = "overall"))
hyp_rx_min2_cfa$row_names <- row.names(as.matrix(hyp_rx_min2_cf, what = "overall"))
hyp_rx_min2_cfa[nrow(hyp_rx_min2_cfa) + 1,] = c("hyp_rx_min2","Data_Type")
row.names(hyp_rx_min2_cfa) <- str_replace_all(hyp_rx_min2_cfa$row_names," ","_")
hyp_rx_min2_cfa <- as.data.frame(t(hyp_rx_min2_cfa))
hyp_rx_min2_cfa <- hyp_rx_min2_cfa[1,c(8,1:7)]
row.names(hyp_rx_min2_cfa) <- NULL
str(hyp_rx_min2_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_rx_min2"
## $ Accuracy : chr "0.625"
## $ Kappa : chr "0.166666666666667"
## $ AccuracyLower : chr "0.509645235592888"
## $ AccuracyUpper : chr "0.730806830053195"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.368726748258052"
## $ McnemarPValue : chr "0.0446097180249397"
hyp_rx_min2_cfa
hyp_rx_min2_cfs <- as.data.frame(as.matrix(hyp_rx_min2_cf, what = "classes"))
hyp_rx_min2_cfs$row_names <- row.names(as.matrix(hyp_rx_min2_cf, what = "classes"))
hyp_rx_min2_cfs[nrow(hyp_rx_min2_cfs) + 1,] = c("hyp_rx_min2","Data_Type")
row.names(hyp_rx_min2_cfs) <- str_replace_all(hyp_rx_min2_cfs$row_names," ","_")
hyp_rx_min2_cfs <- as.data.frame(t(hyp_rx_min2_cfs))
hyp_rx_min2_cfs <- hyp_rx_min2_cfs[1,c(12,1:11)]
row.names(hyp_rx_min2_cfs) <- NULL
str(hyp_rx_min2_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_rx_min2"
## $ Sensitivity : chr "0.8125"
## $ Specificity : chr "0.34375"
## $ Pos_Pred_Value : chr "0.65"
## $ Neg_Pred_Value : chr "0.55"
## $ Precision : chr "0.65"
## $ Recall : chr "0.8125"
## $ F1 : chr "0.722222222222222"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.4875"
## $ Detection_Prevalence: chr "0.75"
## $ Balanced_Accuracy : chr "0.578125"
hyp_rx_min2_cfs
hyp_rx_min2_cfa <- hyp_rx_min2_cfa %>%
inner_join(hyp_rx_min2_cfs, by = "Data_Type")
rm(hyp_rx_min2_cfs)
str(hyp_rx_min2_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_rx_min2"
## $ Accuracy : chr "0.625"
## $ Kappa : chr "0.166666666666667"
## $ AccuracyLower : chr "0.509645235592888"
## $ AccuracyUpper : chr "0.730806830053195"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.368726748258052"
## $ McnemarPValue : chr "0.0446097180249397"
## $ Sensitivity : chr "0.8125"
## $ Specificity : chr "0.34375"
## $ Pos_Pred_Value : chr "0.65"
## $ Neg_Pred_Value : chr "0.55"
## $ Precision : chr "0.65"
## $ Recall : chr "0.8125"
## $ F1 : chr "0.722222222222222"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.4875"
## $ Detection_Prevalence: chr "0.75"
## $ Balanced_Accuracy : chr "0.578125"
hyp_rx_min2_cfa
knitr::kable(t(hyp_rx_min2_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_rx_min2 |
| Accuracy | 0.625 |
| Sensitivity | 0.8125 |
| Specificity | 0.34375 |
| Pos_Pred_Value | 0.65 |
| Neg_Pred_Value | 0.55 |
phenotypes_cf <- union(phenotypes_cf,hyp_rx_min2_cfa)
Data Combination: ICD 401.9 or any instances of hypertension medications
hyp_icd4019_or_rx_any_cf <- training %>%
left_join(hyp_icd_4019) %>%
left_join(hyp_rx_any) %>%
mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
hyp_rx_any = coalesce(hyp_rx_any, 0)) %>%
mutate(hyp_icd4019_or_rx_any = case_when(hyp_icd_4019 == 1 |
hyp_rx_any == 1 ~ 1,
TRUE ~ 0)) %>%
collect() %>%
getStats(hyp_icd4019_or_rx_any, HYPERTENSION)
hyp_icd4019_or_rx_any_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd4019_or_rx_any 1 0
## 1 41 22
## 0 7 10
##
## Accuracy : 0.6375
## 95% CI : (0.5224, 0.7421)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.28606
##
## Kappa : 0.1808
##
## Mcnemar's Test P-Value : 0.00933
##
## Sensitivity : 0.8542
## Specificity : 0.3125
## Pos Pred Value : 0.6508
## Neg Pred Value : 0.5882
## Prevalence : 0.6000
## Detection Rate : 0.5125
## Detection Prevalence : 0.7875
## Balanced Accuracy : 0.5833
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd4019_or_rx_any_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_any_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 41 7 22 10
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd4019_or_rx_any: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.637 0.181 0.522 0.742 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.854 0.312 0.651 0.588 0.651 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd4019_or_rx_any_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_cf, what = "overall"))
hyp_icd4019_or_rx_any_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_cf, what = "overall"))
hyp_icd4019_or_rx_any_cfa[nrow(hyp_icd4019_or_rx_any_cfa) + 1,] = c("hyp_icd4019_or_rx_any","Data_Type")
row.names(hyp_icd4019_or_rx_any_cfa) <- str_replace_all(hyp_icd4019_or_rx_any_cfa$row_names," ","_")
hyp_icd4019_or_rx_any_cfa <- as.data.frame(t(hyp_icd4019_or_rx_any_cfa))
hyp_icd4019_or_rx_any_cfa <- hyp_icd4019_or_rx_any_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_any_cfa) <- NULL
str(hyp_icd4019_or_rx_any_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any"
## $ Accuracy : chr "0.6375"
## $ Kappa : chr "0.180790960451977"
## $ AccuracyLower : chr "0.522387278735191"
## $ AccuracyUpper : chr "0.742105104628382"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.286055397395747"
## $ McnemarPValue : chr "0.00932958471977156"
hyp_icd4019_or_rx_any_cfa
hyp_icd4019_or_rx_any_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_cf, what = "classes"))
hyp_icd4019_or_rx_any_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_cf, what = "classes"))
hyp_icd4019_or_rx_any_cfs[nrow(hyp_icd4019_or_rx_any_cfs) + 1,] = c("hyp_icd4019_or_rx_any","Data_Type")
row.names(hyp_icd4019_or_rx_any_cfs) <- str_replace_all(hyp_icd4019_or_rx_any_cfs$row_names," ","_")
hyp_icd4019_or_rx_any_cfs <- as.data.frame(t(hyp_icd4019_or_rx_any_cfs))
hyp_icd4019_or_rx_any_cfs <- hyp_icd4019_or_rx_any_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_any_cfs) <- NULL
str(hyp_icd4019_or_rx_any_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any"
## $ Sensitivity : chr "0.854166666666667"
## $ Specificity : chr "0.3125"
## $ Pos_Pred_Value : chr "0.650793650793651"
## $ Neg_Pred_Value : chr "0.588235294117647"
## $ Precision : chr "0.650793650793651"
## $ Recall : chr "0.854166666666667"
## $ F1 : chr "0.738738738738739"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.5125"
## $ Detection_Prevalence: chr "0.7875"
## $ Balanced_Accuracy : chr "0.583333333333333"
hyp_icd4019_or_rx_any_cfs
hyp_icd4019_or_rx_any_cfa <- hyp_icd4019_or_rx_any_cfa %>%
inner_join(hyp_icd4019_or_rx_any_cfs, by = "Data_Type")
rm(hyp_icd4019_or_rx_any_cfs)
str(hyp_icd4019_or_rx_any_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any"
## $ Accuracy : chr "0.6375"
## $ Kappa : chr "0.180790960451977"
## $ AccuracyLower : chr "0.522387278735191"
## $ AccuracyUpper : chr "0.742105104628382"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.286055397395747"
## $ McnemarPValue : chr "0.00932958471977156"
## $ Sensitivity : chr "0.854166666666667"
## $ Specificity : chr "0.3125"
## $ Pos_Pred_Value : chr "0.650793650793651"
## $ Neg_Pred_Value : chr "0.588235294117647"
## $ Precision : chr "0.650793650793651"
## $ Recall : chr "0.854166666666667"
## $ F1 : chr "0.738738738738739"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.5125"
## $ Detection_Prevalence: chr "0.7875"
## $ Balanced_Accuracy : chr "0.583333333333333"
hyp_icd4019_or_rx_any_cfa
knitr::kable(t(hyp_icd4019_or_rx_any_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd4019_or_rx_any |
| Accuracy | 0.6375 |
| Sensitivity | 0.854166666666667 |
| Specificity | 0.3125 |
| Pos_Pred_Value | 0.650793650793651 |
| Neg_Pred_Value | 0.588235294117647 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_any_cfa)
Data Combination: ICD 401.9 or at least two instances of hypertension medications
hyp_icd4019_or_rx_min2_cf <- training %>%
left_join(hyp_icd_4019) %>%
left_join(hyp_rx_min2) %>%
mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>%
mutate(hyp_icd4019_or_rx_min2 = case_when(hyp_icd_4019 == 1 |
hyp_rx_min2 == 1 ~ 1,
TRUE ~ 0)) %>%
collect() %>%
getStats(hyp_icd4019_or_rx_min2, HYPERTENSION)
hyp_icd4019_or_rx_min2_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd4019_or_rx_min2 1 0
## 1 41 21
## 0 7 11
##
## Accuracy : 0.65
## 95% CI : (0.5352, 0.7533)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.21311
##
## Kappa : 0.2135
##
## Mcnemar's Test P-Value : 0.01402
##
## Sensitivity : 0.8542
## Specificity : 0.3438
## Pos Pred Value : 0.6613
## Neg Pred Value : 0.6111
## Prevalence : 0.6000
## Detection Rate : 0.5125
## Detection Prevalence : 0.7750
## Balanced Accuracy : 0.5990
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd4019_or_rx_min2_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_min2_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 41 7 21 11
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd4019_or_rx_min2: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.65 0.213 0.535 0.753 0.6 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.854 0.344 0.661 0.611 0.661 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd4019_or_rx_min2_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "overall"))
hyp_icd4019_or_rx_min2_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "overall"))
hyp_icd4019_or_rx_min2_cfa[nrow(hyp_icd4019_or_rx_min2_cfa) + 1,] = c("hyp_icd4019_or_rx_min2","Data_Type")
row.names(hyp_icd4019_or_rx_min2_cfa) <- str_replace_all(hyp_icd4019_or_rx_min2_cfa$row_names," ","_")
hyp_icd4019_or_rx_min2_cfa <- as.data.frame(t(hyp_icd4019_or_rx_min2_cfa))
hyp_icd4019_or_rx_min2_cfa <- hyp_icd4019_or_rx_min2_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_min2_cfa) <- NULL
str(hyp_icd4019_or_rx_min2_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2"
## $ Accuracy : chr "0.65"
## $ Kappa : chr "0.213483146067416"
## $ AccuracyLower : chr "0.535203114520664"
## $ AccuracyUpper : chr "0.753328063527465"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue: chr "0.213110087811361"
## $ McnemarPValue : chr "0.01401927711396"
hyp_icd4019_or_rx_min2_cfa
hyp_icd4019_or_rx_min2_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "classes"))
hyp_icd4019_or_rx_min2_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_cf, what = "classes"))
hyp_icd4019_or_rx_min2_cfs[nrow(hyp_icd4019_or_rx_min2_cfs) + 1,] = c("hyp_icd4019_or_rx_min2","Data_Type")
row.names(hyp_icd4019_or_rx_min2_cfs) <- str_replace_all(hyp_icd4019_or_rx_min2_cfs$row_names," ","_")
hyp_icd4019_or_rx_min2_cfs <- as.data.frame(t(hyp_icd4019_or_rx_min2_cfs))
hyp_icd4019_or_rx_min2_cfs <- hyp_icd4019_or_rx_min2_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_min2_cfs) <- NULL
str(hyp_icd4019_or_rx_min2_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2"
## $ Sensitivity : chr "0.854166666666667"
## $ Specificity : chr "0.34375"
## $ Pos_Pred_Value : chr "0.661290322580645"
## $ Neg_Pred_Value : chr "0.611111111111111"
## $ Precision : chr "0.661290322580645"
## $ Recall : chr "0.854166666666667"
## $ F1 : chr "0.745454545454545"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.5125"
## $ Detection_Prevalence: chr "0.775"
## $ Balanced_Accuracy : chr "0.598958333333333"
hyp_icd4019_or_rx_min2_cfs
hyp_icd4019_or_rx_min2_cfa <- hyp_icd4019_or_rx_min2_cfa %>%
inner_join(hyp_icd4019_or_rx_min2_cfs, by = "Data_Type")
rm(hyp_icd4019_or_rx_min2_cfs)
str(hyp_icd4019_or_rx_min2_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2"
## $ Accuracy : chr "0.65"
## $ Kappa : chr "0.213483146067416"
## $ AccuracyLower : chr "0.535203114520664"
## $ AccuracyUpper : chr "0.753328063527465"
## $ AccuracyNull : chr "0.6"
## $ AccuracyPValue : chr "0.213110087811361"
## $ McnemarPValue : chr "0.01401927711396"
## $ Sensitivity : chr "0.854166666666667"
## $ Specificity : chr "0.34375"
## $ Pos_Pred_Value : chr "0.661290322580645"
## $ Neg_Pred_Value : chr "0.611111111111111"
## $ Precision : chr "0.661290322580645"
## $ Recall : chr "0.854166666666667"
## $ F1 : chr "0.745454545454545"
## $ Prevalence : chr "0.6"
## $ Detection_Rate : chr "0.5125"
## $ Detection_Prevalence: chr "0.775"
## $ Balanced_Accuracy : chr "0.598958333333333"
hyp_icd4019_or_rx_min2_cfa
knitr::kable(t(hyp_icd4019_or_rx_min2_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd4019_or_rx_min2 |
| Accuracy | 0.65 |
| Sensitivity | 0.854166666666667 |
| Specificity | 0.34375 |
| Pos_Pred_Value | 0.661290322580645 |
| Neg_Pred_Value | 0.611111111111111 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_min2_cfa)
Validation: hyp_icd_any_test using testing data
hyp_icd_any_test_cf <- testing %>%
left_join(hyp_icd_any) %>%
mutate(hyp_icd_any_test = coalesce(hyp_icd_any, 0)) %>%
collect() %>%
getStats(hyp_icd_any_test, HYPERTENSION)
hyp_icd_any_test_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_any_test 1 0
## 1 6 1
## 0 9 3
##
## Accuracy : 0.4737
## 95% CI : (0.2445, 0.7114)
## No Information Rate : 0.7895
## P-Value [Acc > NIR] : 0.99950
##
## Kappa : 0.0865
##
## Mcnemar's Test P-Value : 0.02686
##
## Sensitivity : 0.4000
## Specificity : 0.7500
## Pos Pred Value : 0.8571
## Neg Pred Value : 0.2500
## Prevalence : 0.7895
## Detection Rate : 0.3158
## Detection Prevalence : 0.3684
## Balanced Accuracy : 0.5750
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_any_test_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_any_test_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 6 9 1 3
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_any_test: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4737 0.0865 0.2445 0.7114 0.7895 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.4 0.75 0.857 0.25 0.857 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_any_test_cfa <- as.data.frame(as.matrix(hyp_icd_any_test_cf, what = "overall"))
hyp_icd_any_test_cfa$row_names <- row.names(as.matrix(hyp_icd_any_test_cf, what = "overall"))
hyp_icd_any_test_cfa[nrow(hyp_icd_any_test_cfa) + 1,] = c("hyp_icd_any_test","Data_Type")
row.names(hyp_icd_any_test_cfa) <- str_replace_all(hyp_icd_any_test_cfa$row_names," ","_")
hyp_icd_any_test_cfa <- as.data.frame(t(hyp_icd_any_test_cfa))
hyp_icd_any_test_cfa <- hyp_icd_any_test_cfa[1,c(8,1:7)]
row.names(hyp_icd_any_test_cfa) <- NULL
str(hyp_icd_any_test_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_any_test"
## $ Accuracy : chr "0.473684210526316"
## $ Kappa : chr "0.0865384615384616"
## $ AccuracyLower : chr "0.244474689466196"
## $ AccuracyUpper : chr "0.711356752083001"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue: chr "0.999504530247808"
## $ McnemarPValue : chr "0.0268566955075243"
hyp_icd_any_test_cfa
hyp_icd_any_test_cfs <- as.data.frame(as.matrix(hyp_icd_any_test_cf, what = "classes"))
hyp_icd_any_test_cfs$row_names <- row.names(as.matrix(hyp_icd_any_test_cf, what = "classes"))
hyp_icd_any_test_cfs[nrow(hyp_icd_any_test_cfs) + 1,] = c("hyp_icd_any_test","Data_Type")
row.names(hyp_icd_any_test_cfs) <- str_replace_all(hyp_icd_any_test_cfs$row_names," ","_")
hyp_icd_any_test_cfs <- as.data.frame(t(hyp_icd_any_test_cfs))
hyp_icd_any_test_cfs <- hyp_icd_any_test_cfs[1,c(12,1:11)]
row.names(hyp_icd_any_test_cfs) <- NULL
str(hyp_icd_any_test_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_any_test"
## $ Sensitivity : chr "0.4"
## $ Specificity : chr "0.75"
## $ Pos_Pred_Value : chr "0.857142857142857"
## $ Neg_Pred_Value : chr "0.25"
## $ Precision : chr "0.857142857142857"
## $ Recall : chr "0.4"
## $ F1 : chr "0.545454545454546"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.315789473684211"
## $ Detection_Prevalence: chr "0.368421052631579"
## $ Balanced_Accuracy : chr "0.575"
hyp_icd_any_test_cfs
hyp_icd_any_test_cfa <- hyp_icd_any_test_cfa %>%
inner_join(hyp_icd_any_test_cfs, by = "Data_Type")
rm(hyp_icd_any_test_cfs)
str(hyp_icd_any_test_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_any_test"
## $ Accuracy : chr "0.473684210526316"
## $ Kappa : chr "0.0865384615384616"
## $ AccuracyLower : chr "0.244474689466196"
## $ AccuracyUpper : chr "0.711356752083001"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue : chr "0.999504530247808"
## $ McnemarPValue : chr "0.0268566955075243"
## $ Sensitivity : chr "0.4"
## $ Specificity : chr "0.75"
## $ Pos_Pred_Value : chr "0.857142857142857"
## $ Neg_Pred_Value : chr "0.25"
## $ Precision : chr "0.857142857142857"
## $ Recall : chr "0.4"
## $ F1 : chr "0.545454545454546"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.315789473684211"
## $ Detection_Prevalence: chr "0.368421052631579"
## $ Balanced_Accuracy : chr "0.575"
hyp_icd_any_test_cfa
knitr::kable(t(hyp_icd_any_test_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_any_test |
| Accuracy | 0.473684210526316 |
| Sensitivity | 0.4 |
| Specificity | 0.75 |
| Pos_Pred_Value | 0.857142857142857 |
| Neg_Pred_Value | 0.25 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_any_test_cfa)
Validation: hyp_icd_4019_test using testing data
hyp_icd_4019_test_cf <- testing %>%
left_join(hyp_icd_4019) %>%
mutate(hyp_icd_4019_test = coalesce(hyp_icd_4019, 0)) %>%
collect() %>%
getStats(hyp_icd_4019_test, HYPERTENSION)
hyp_icd_4019_test_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd_4019_test 1 0
## 1 5 1
## 0 10 3
##
## Accuracy : 0.4211
## 95% CI : (0.2025, 0.665)
## No Information Rate : 0.7895
## P-Value [Acc > NIR] : 0.99992
##
## Kappa : 0.0457
##
## Mcnemar's Test P-Value : 0.01586
##
## Sensitivity : 0.3333
## Specificity : 0.7500
## Pos Pred Value : 0.8333
## Neg Pred Value : 0.2308
## Prevalence : 0.7895
## Detection Rate : 0.2632
## Detection Prevalence : 0.3158
## Balanced Accuracy : 0.5417
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd_4019_test_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd_4019_test_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 5 10 1 3
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd_4019_test: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.4211 0.0457 0.2025 0.665 0.7895 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.333 0.75 0.833 0.231 0.833 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd_4019_test_cfa <- as.data.frame(as.matrix(hyp_icd_4019_test_cf, what = "overall"))
hyp_icd_4019_test_cfa$row_names <- row.names(as.matrix(hyp_icd_4019_test_cf, what = "overall"))
hyp_icd_4019_test_cfa[nrow(hyp_icd_4019_test_cfa) + 1,] = c("hyp_icd_4019_test","Data_Type")
row.names(hyp_icd_4019_test_cfa) <- str_replace_all(hyp_icd_4019_test_cfa$row_names," ","_")
hyp_icd_4019_test_cfa <- as.data.frame(t(hyp_icd_4019_test_cfa))
hyp_icd_4019_test_cfa <- hyp_icd_4019_test_cfa[1,c(8,1:7)]
row.names(hyp_icd_4019_test_cfa) <- NULL
str(hyp_icd_4019_test_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd_4019_test"
## $ Accuracy : chr "0.421052631578947"
## $ Kappa : chr "0.045662100456621"
## $ AccuracyLower : chr "0.202521438977163"
## $ AccuracyUpper : chr "0.665002155988265"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue: chr "0.999915197373638"
## $ McnemarPValue : chr "0.015861332739773"
hyp_icd_4019_test_cfa
hyp_icd_4019_test_cfs <- as.data.frame(as.matrix(hyp_icd_4019_test_cf, what = "classes"))
hyp_icd_4019_test_cfs$row_names <- row.names(as.matrix(hyp_icd_4019_test_cf, what = "classes"))
hyp_icd_4019_test_cfs[nrow(hyp_icd_4019_test_cfs) + 1,] = c("hyp_icd_4019_test","Data_Type")
row.names(hyp_icd_4019_test_cfs) <- str_replace_all(hyp_icd_4019_test_cfs$row_names," ","_")
hyp_icd_4019_test_cfs <- as.data.frame(t(hyp_icd_4019_test_cfs))
hyp_icd_4019_test_cfs <- hyp_icd_4019_test_cfs[1,c(12,1:11)]
row.names(hyp_icd_4019_test_cfs) <- NULL
str(hyp_icd_4019_test_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd_4019_test"
## $ Sensitivity : chr "0.333333333333333"
## $ Specificity : chr "0.75"
## $ Pos_Pred_Value : chr "0.833333333333333"
## $ Neg_Pred_Value : chr "0.230769230769231"
## $ Precision : chr "0.833333333333333"
## $ Recall : chr "0.333333333333333"
## $ F1 : chr "0.476190476190476"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.263157894736842"
## $ Detection_Prevalence: chr "0.315789473684211"
## $ Balanced_Accuracy : chr "0.541666666666667"
hyp_icd_4019_test_cfs
hyp_icd_4019_test_cfa <- hyp_icd_4019_test_cfa %>%
inner_join(hyp_icd_4019_test_cfs, by = "Data_Type")
rm(hyp_icd_4019_test_cfs)
str(hyp_icd_4019_test_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd_4019_test"
## $ Accuracy : chr "0.421052631578947"
## $ Kappa : chr "0.045662100456621"
## $ AccuracyLower : chr "0.202521438977163"
## $ AccuracyUpper : chr "0.665002155988265"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue : chr "0.999915197373638"
## $ McnemarPValue : chr "0.015861332739773"
## $ Sensitivity : chr "0.333333333333333"
## $ Specificity : chr "0.75"
## $ Pos_Pred_Value : chr "0.833333333333333"
## $ Neg_Pred_Value : chr "0.230769230769231"
## $ Precision : chr "0.833333333333333"
## $ Recall : chr "0.333333333333333"
## $ F1 : chr "0.476190476190476"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.263157894736842"
## $ Detection_Prevalence: chr "0.315789473684211"
## $ Balanced_Accuracy : chr "0.541666666666667"
hyp_icd_4019_test_cfa
knitr::kable(t(hyp_icd_4019_test_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd_4019_test |
| Accuracy | 0.421052631578947 |
| Sensitivity | 0.333333333333333 |
| Specificity | 0.75 |
| Pos_Pred_Value | 0.833333333333333 |
| Neg_Pred_Value | 0.230769230769231 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd_4019_test_cfa)
Validation: hyp_rx_min2_test using testing data
hyp_rx_min2_test_cf <- testing %>%
left_join(hyp_rx_min2) %>%
mutate(hyp_rx_min2_test = coalesce(hyp_rx_min2, 0)) %>%
collect() %>%
getStats(hyp_rx_min2_test, HYPERTENSION)
hyp_rx_min2_test_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_rx_min2_test 1 0
## 1 12 2
## 0 3 2
##
## Accuracy : 0.7368
## 95% CI : (0.488, 0.9085)
## No Information Rate : 0.7895
## P-Value [Acc > NIR] : 0.8054
##
## Kappa : 0.2748
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.8000
## Specificity : 0.5000
## Pos Pred Value : 0.8571
## Neg Pred Value : 0.4000
## Prevalence : 0.7895
## Detection Rate : 0.6316
## Detection Prevalence : 0.7368
## Balanced Accuracy : 0.6500
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_rx_min2_test_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_rx_min2_test_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 12 3 2 2
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_rx_min2_test: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.737 0.275 0.488 0.909 0.789 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.8 0.5 0.857 0.4 0.857 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_rx_min2_test_cfa <- as.data.frame(as.matrix(hyp_rx_min2_test_cf, what = "overall"))
hyp_rx_min2_test_cfa$row_names <- row.names(as.matrix(hyp_rx_min2_test_cf, what = "overall"))
hyp_rx_min2_test_cfa[nrow(hyp_rx_min2_test_cfa) + 1,] = c("hyp_rx_min2_test","Data_Type")
row.names(hyp_rx_min2_test_cfa) <- str_replace_all(hyp_rx_min2_test_cfa$row_names," ","_")
hyp_rx_min2_test_cfa <- as.data.frame(t(hyp_rx_min2_test_cfa))
hyp_rx_min2_test_cfa <- hyp_rx_min2_test_cfa[1,c(8,1:7)]
row.names(hyp_rx_min2_test_cfa) <- NULL
str(hyp_rx_min2_test_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_rx_min2_test"
## $ Accuracy : chr "0.736842105263158"
## $ Kappa : chr "0.274809160305343"
## $ AccuracyLower : chr "0.487970654654127"
## $ AccuracyUpper : chr "0.908534215092333"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue: chr "0.805444466849716"
## $ McnemarPValue : chr "1"
hyp_rx_min2_test_cfa
hyp_rx_min2_test_cfs <- as.data.frame(as.matrix(hyp_rx_min2_test_cf, what = "classes"))
hyp_rx_min2_test_cfs$row_names <- row.names(as.matrix(hyp_rx_min2_test_cf, what = "classes"))
hyp_rx_min2_test_cfs[nrow(hyp_rx_min2_test_cfs) + 1,] = c("hyp_rx_min2_test","Data_Type")
row.names(hyp_rx_min2_test_cfs) <- str_replace_all(hyp_rx_min2_test_cfs$row_names," ","_")
hyp_rx_min2_test_cfs <- as.data.frame(t(hyp_rx_min2_test_cfs))
hyp_rx_min2_test_cfs <- hyp_rx_min2_test_cfs[1,c(12,1:11)]
row.names(hyp_rx_min2_test_cfs) <- NULL
str(hyp_rx_min2_test_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_rx_min2_test"
## $ Sensitivity : chr "0.8"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.857142857142857"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr "0.857142857142857"
## $ Recall : chr "0.8"
## $ F1 : chr "0.827586206896552"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.631578947368421"
## $ Detection_Prevalence: chr "0.736842105263158"
## $ Balanced_Accuracy : chr "0.65"
hyp_rx_min2_test_cfs
hyp_rx_min2_test_cfa <- hyp_rx_min2_test_cfa %>%
inner_join(hyp_rx_min2_test_cfs, by = "Data_Type")
rm(hyp_rx_min2_test_cfs)
str(hyp_rx_min2_test_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_rx_min2_test"
## $ Accuracy : chr "0.736842105263158"
## $ Kappa : chr "0.274809160305343"
## $ AccuracyLower : chr "0.487970654654127"
## $ AccuracyUpper : chr "0.908534215092333"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue : chr "0.805444466849716"
## $ McnemarPValue : chr "1"
## $ Sensitivity : chr "0.8"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.857142857142857"
## $ Neg_Pred_Value : chr "0.4"
## $ Precision : chr "0.857142857142857"
## $ Recall : chr "0.8"
## $ F1 : chr "0.827586206896552"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.631578947368421"
## $ Detection_Prevalence: chr "0.736842105263158"
## $ Balanced_Accuracy : chr "0.65"
hyp_rx_min2_test_cfa
knitr::kable(t(hyp_rx_min2_test_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_rx_min2_test |
| Accuracy | 0.736842105263158 |
| Sensitivity | 0.8 |
| Specificity | 0.5 |
| Pos_Pred_Value | 0.857142857142857 |
| Neg_Pred_Value | 0.4 |
phenotypes_cf <- union(phenotypes_cf,hyp_rx_min2_test_cfa)
Validation: hyp_icd4019_or_rx_any_test using testing data
hyp_icd4019_or_rx_any_test_cf <- testing %>%
left_join(hyp_icd_4019) %>%
left_join(hyp_rx_any) %>%
mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
hyp_rx_any = coalesce(hyp_rx_any, 0)) %>%
mutate(hyp_icd4019_or_rx_any = case_when(hyp_icd_4019 == 1 |
hyp_rx_any == 1 ~ 1,
TRUE ~ 0)) %>%
collect() %>%
getStats(hyp_icd4019_or_rx_any, HYPERTENSION)
hyp_icd4019_or_rx_any_test_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd4019_or_rx_any 1 0
## 1 14 2
## 0 1 2
##
## Accuracy : 0.8421
## 95% CI : (0.6042, 0.9662)
## No Information Rate : 0.7895
## P-Value [Acc > NIR] : 0.4101
##
## Kappa : 0.4771
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.9333
## Specificity : 0.5000
## Pos Pred Value : 0.8750
## Neg Pred Value : 0.6667
## Prevalence : 0.7895
## Detection Rate : 0.7368
## Detection Prevalence : 0.8421
## Balanced Accuracy : 0.7167
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd4019_or_rx_any_test_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_any_test_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 14 1 2 2
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd4019_or_rx_any: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.842 0.477 0.604 0.966 0.789 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.933 0.5 0.875 0.667 0.875 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd4019_or_rx_any_test_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "overall"))
hyp_icd4019_or_rx_any_test_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "overall"))
hyp_icd4019_or_rx_any_test_cfa[nrow(hyp_icd4019_or_rx_any_test_cfa) + 1,] = c("hyp_icd4019_or_rx_any_test","Data_Type")
row.names(hyp_icd4019_or_rx_any_test_cfa) <- str_replace_all(hyp_icd4019_or_rx_any_test_cfa$row_names," ","_")
hyp_icd4019_or_rx_any_test_cfa <- as.data.frame(t(hyp_icd4019_or_rx_any_test_cfa))
hyp_icd4019_or_rx_any_test_cfa <- hyp_icd4019_or_rx_any_test_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_any_test_cfa) <- NULL
str(hyp_icd4019_or_rx_any_test_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any_test"
## $ Accuracy : chr "0.842105263157895"
## $ Kappa : chr "0.477064220183486"
## $ AccuracyLower : chr "0.604215448733326"
## $ AccuracyUpper : chr "0.966173750998235"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue: chr "0.410126360334181"
## $ McnemarPValue : chr "1"
hyp_icd4019_or_rx_any_test_cfa
hyp_icd4019_or_rx_any_test_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "classes"))
hyp_icd4019_or_rx_any_test_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_any_test_cf, what = "classes"))
hyp_icd4019_or_rx_any_test_cfs[nrow(hyp_icd4019_or_rx_any_test_cfs) + 1,] = c("hyp_icd4019_or_rx_any_test","Data_Type")
row.names(hyp_icd4019_or_rx_any_test_cfs) <- str_replace_all(hyp_icd4019_or_rx_any_test_cfs$row_names," ","_")
hyp_icd4019_or_rx_any_test_cfs <- as.data.frame(t(hyp_icd4019_or_rx_any_test_cfs))
hyp_icd4019_or_rx_any_test_cfs <- hyp_icd4019_or_rx_any_test_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_any_test_cfs) <- NULL
str(hyp_icd4019_or_rx_any_test_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any_test"
## $ Sensitivity : chr "0.933333333333333"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.875"
## $ Neg_Pred_Value : chr "0.666666666666667"
## $ Precision : chr "0.875"
## $ Recall : chr "0.933333333333333"
## $ F1 : chr "0.903225806451613"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.736842105263158"
## $ Detection_Prevalence: chr "0.842105263157895"
## $ Balanced_Accuracy : chr "0.716666666666667"
hyp_icd4019_or_rx_any_test_cfs
hyp_icd4019_or_rx_any_test_cfa <- hyp_icd4019_or_rx_any_test_cfa %>%
inner_join(hyp_icd4019_or_rx_any_test_cfs, by = "Data_Type")
rm(hyp_icd4019_or_rx_any_test_cfs)
str(hyp_icd4019_or_rx_any_test_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_any_test"
## $ Accuracy : chr "0.842105263157895"
## $ Kappa : chr "0.477064220183486"
## $ AccuracyLower : chr "0.604215448733326"
## $ AccuracyUpper : chr "0.966173750998235"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue : chr "0.410126360334181"
## $ McnemarPValue : chr "1"
## $ Sensitivity : chr "0.933333333333333"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.875"
## $ Neg_Pred_Value : chr "0.666666666666667"
## $ Precision : chr "0.875"
## $ Recall : chr "0.933333333333333"
## $ F1 : chr "0.903225806451613"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.736842105263158"
## $ Detection_Prevalence: chr "0.842105263157895"
## $ Balanced_Accuracy : chr "0.716666666666667"
hyp_icd4019_or_rx_any_test_cfa
knitr::kable(t(hyp_icd4019_or_rx_any_test_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd4019_or_rx_any_test |
| Accuracy | 0.842105263157895 |
| Sensitivity | 0.933333333333333 |
| Specificity | 0.5 |
| Pos_Pred_Value | 0.875 |
| Neg_Pred_Value | 0.666666666666667 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_any_test_cfa)
Validation: hyp_icd4019_or_rx_min2_test using testing data
hyp_icd4019_or_rx_min2_test_cf <- testing %>%
left_join(hyp_icd_4019) %>%
left_join(hyp_rx_min2) %>%
mutate(hyp_icd_4019 = coalesce(hyp_icd_4019, 0),
hyp_rx_min2 = coalesce(hyp_rx_min2, 0)) %>%
mutate(hyp_icd4019_or_rx_min2 = case_when(hyp_icd_4019 == 1 |
hyp_rx_min2 == 1 ~ 1,
TRUE ~ 0)) %>%
collect() %>%
getStats(hyp_icd4019_or_rx_min2, HYPERTENSION)
hyp_icd4019_or_rx_min2_test_cf
## Confusion Matrix and Statistics
##
## HYPERTENSION
## hyp_icd4019_or_rx_min2 1 0
## 1 13 2
## 0 2 2
##
## Accuracy : 0.7895
## 95% CI : (0.5443, 0.9395)
## No Information Rate : 0.7895
## P-Value [Acc > NIR] : 0.6297
##
## Kappa : 0.3667
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.8667
## Specificity : 0.5000
## Pos Pred Value : 0.8667
## Neg Pred Value : 0.5000
## Prevalence : 0.7895
## Detection Rate : 0.6842
## Detection Prevalence : 0.7895
## Balanced Accuracy : 0.6833
##
## 'Positive' Class : 1
##
fourfoldplot(hyp_icd4019_or_rx_min2_test_cf$table)
#export ConfusionMatrix numbers output to tibble row
str(hyp_icd4019_or_rx_min2_test_cf)
## List of 6
## $ positive: chr "1"
## $ table : 'table' int [1:2, 1:2] 13 2 2 2
## ..- attr(*, "dimnames")=List of 2
## .. ..$ hyp_icd4019_or_rx_min2: chr [1:2] "1" "0"
## .. ..$ HYPERTENSION : chr [1:2] "1" "0"
## $ overall : Named num [1:7] 0.789 0.367 0.544 0.939 0.789 ...
## ..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
## $ byClass : Named num [1:11] 0.867 0.5 0.867 0.5 0.867 ...
## ..- attr(*, "names")= chr [1:11] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
## $ mode : chr "sens_spec"
## $ dots : list()
## - attr(*, "class")= chr "confusionMatrix"
hyp_icd4019_or_rx_min2_test_cfa <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "overall"))
hyp_icd4019_or_rx_min2_test_cfa$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "overall"))
hyp_icd4019_or_rx_min2_test_cfa[nrow(hyp_icd4019_or_rx_min2_test_cfa) + 1,] = c("hyp_icd4019_or_rx_min2_test","Data_Type")
row.names(hyp_icd4019_or_rx_min2_test_cfa) <- str_replace_all(hyp_icd4019_or_rx_min2_test_cfa$row_names," ","_")
hyp_icd4019_or_rx_min2_test_cfa <- as.data.frame(t(hyp_icd4019_or_rx_min2_test_cfa))
hyp_icd4019_or_rx_min2_test_cfa <- hyp_icd4019_or_rx_min2_test_cfa[1,c(8,1:7)]
row.names(hyp_icd4019_or_rx_min2_test_cfa) <- NULL
str(hyp_icd4019_or_rx_min2_test_cfa)
## 'data.frame': 1 obs. of 8 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2_test"
## $ Accuracy : chr "0.789473684210526"
## $ Kappa : chr "0.366666666666667"
## $ AccuracyLower : chr "0.544346918108494"
## $ AccuracyUpper : chr "0.93947546229071"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue: chr "0.629747530620589"
## $ McnemarPValue : chr "1"
hyp_icd4019_or_rx_min2_test_cfa
hyp_icd4019_or_rx_min2_test_cfs <- as.data.frame(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "classes"))
hyp_icd4019_or_rx_min2_test_cfs$row_names <- row.names(as.matrix(hyp_icd4019_or_rx_min2_test_cf, what = "classes"))
hyp_icd4019_or_rx_min2_test_cfs[nrow(hyp_icd4019_or_rx_min2_test_cfs) + 1,] = c("hyp_icd4019_or_rx_min2_test","Data_Type")
row.names(hyp_icd4019_or_rx_min2_test_cfs) <- str_replace_all(hyp_icd4019_or_rx_min2_test_cfs$row_names," ","_")
hyp_icd4019_or_rx_min2_test_cfs <- as.data.frame(t(hyp_icd4019_or_rx_min2_test_cfs))
hyp_icd4019_or_rx_min2_test_cfs <- hyp_icd4019_or_rx_min2_test_cfs[1,c(12,1:11)]
row.names(hyp_icd4019_or_rx_min2_test_cfs) <- NULL
str(hyp_icd4019_or_rx_min2_test_cfs)
## 'data.frame': 1 obs. of 12 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2_test"
## $ Sensitivity : chr "0.866666666666667"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.866666666666667"
## $ Neg_Pred_Value : chr "0.5"
## $ Precision : chr "0.866666666666667"
## $ Recall : chr "0.866666666666667"
## $ F1 : chr "0.866666666666667"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.684210526315789"
## $ Detection_Prevalence: chr "0.789473684210526"
## $ Balanced_Accuracy : chr "0.683333333333333"
hyp_icd4019_or_rx_min2_test_cfs
hyp_icd4019_or_rx_min2_test_cfa <- hyp_icd4019_or_rx_min2_test_cfa %>%
inner_join(hyp_icd4019_or_rx_min2_test_cfs, by = "Data_Type")
rm(hyp_icd4019_or_rx_min2_test_cfs)
str(hyp_icd4019_or_rx_min2_test_cfa)
## 'data.frame': 1 obs. of 19 variables:
## $ Data_Type : chr "hyp_icd4019_or_rx_min2_test"
## $ Accuracy : chr "0.789473684210526"
## $ Kappa : chr "0.366666666666667"
## $ AccuracyLower : chr "0.544346918108494"
## $ AccuracyUpper : chr "0.93947546229071"
## $ AccuracyNull : chr "0.789473684210526"
## $ AccuracyPValue : chr "0.629747530620589"
## $ McnemarPValue : chr "1"
## $ Sensitivity : chr "0.866666666666667"
## $ Specificity : chr "0.5"
## $ Pos_Pred_Value : chr "0.866666666666667"
## $ Neg_Pred_Value : chr "0.5"
## $ Precision : chr "0.866666666666667"
## $ Recall : chr "0.866666666666667"
## $ F1 : chr "0.866666666666667"
## $ Prevalence : chr "0.789473684210526"
## $ Detection_Rate : chr "0.684210526315789"
## $ Detection_Prevalence: chr "0.789473684210526"
## $ Balanced_Accuracy : chr "0.683333333333333"
hyp_icd4019_or_rx_min2_test_cfa
knitr::kable(t(hyp_icd4019_or_rx_min2_test_cfa[,c(1,2,9,10,11,12)]), "simple")
| Data_Type | hyp_icd4019_or_rx_min2_test |
| Accuracy | 0.789473684210526 |
| Sensitivity | 0.866666666666667 |
| Specificity | 0.5 |
| Pos_Pred_Value | 0.866666666666667 |
| Neg_Pred_Value | 0.5 |
phenotypes_cf <- union(phenotypes_cf,hyp_icd4019_or_rx_min2_test_cfa)
List out metrics:
count(phenotypes_cf)
str(phenotypes_cf)
## 'data.frame': 15 obs. of 19 variables:
## $ Data_Type : chr "hyper_bp" "hyp_icd_any" "hyp_icd_4010" "hyp_icd_4011" ...
## $ Accuracy : chr "0.4" "0.7375" "0.4" "0.4125" ...
## $ Kappa : chr "0" "0.497607655502392" "0" "0.0167364016736401" ...
## $ AccuracyLower : chr "0.29200935625531" "0.627149169787568" "0.29200935625531" "0.303524592937594" ...
## $ AccuracyUpper : chr "0.51562305950451" "0.829590748899592" "0.51562305950451" "0.528151659574796" ...
## $ AccuracyNull : chr "0.6" "0.6" "0.6" "0.6" ...
## $ AccuracyPValue : chr "0.999899723274404" "0.00717992545660734" "0.999899723274404" "0.999761568218371" ...
## $ McnemarPValue : chr "1.17002131170662e-11" "0.000480341199981871" "1.17002131170662e-11" "1.94905225611693e-11" ...
## $ Sensitivity : chr "0" "0.604166666666667" "0" "0.0208333333333333" ...
## $ Specificity : chr "1" "0.9375" "1" "1" ...
## $ Pos_Pred_Value : chr "NaN" "0.935483870967742" "NaN" "1" ...
## $ Neg_Pred_Value : chr "0.4" "0.612244897959184" "0.4" "0.405063291139241" ...
## $ Precision : chr NA "0.935483870967742" NA "1" ...
## $ Recall : chr "0" "0.604166666666667" "0" "0.0208333333333333" ...
## $ F1 : chr NA "0.734177215189873" NA "0.0408163265306122" ...
## $ Prevalence : chr "0.6" "0.6" "0.6" "0.6" ...
## $ Detection_Rate : chr "0" "0.3625" "0" "0.0125" ...
## $ Detection_Prevalence: chr "0" "0.3875" "0" "0.0125" ...
## $ Balanced_Accuracy : chr "0.5" "0.770833333333333" "0.5" "0.510416666666667" ...
phenotypes_cf[,c(1,2,9,10,11,12)]
knitr::kable(phenotypes_cf[,c(1,2,9,10,11,12)], "simple")
| Data_Type | Accuracy | Sensitivity | Specificity | Pos_Pred_Value | Neg_Pred_Value |
|---|---|---|---|---|---|
| hyper_bp | 0.4 | 0 | 1 | NaN | 0.4 |
| hyp_icd_any | 0.7375 | 0.604166666666667 | 0.9375 | 0.935483870967742 | 0.612244897959184 |
| hyp_icd_4010 | 0.4 | 0 | 1 | NaN | 0.4 |
| hyp_icd_4011 | 0.4125 | 0.0208333333333333 | 1 | 1 | 0.405063291139241 |
| hyp_icd_4019 | 0.725 | 0.583333333333333 | 0.9375 | 0.933333333333333 | 0.6 |
| hyp_icd_4019_min2 | 0.4625 | 0.104166666666667 | 1 | 1 | 0.426666666666667 |
| hyp_rx_any | 0.6125 | 0.8125 | 0.3125 | 0.639344262295082 | 0.526315789473684 |
| hyp_rx_min2 | 0.625 | 0.8125 | 0.34375 | 0.65 | 0.55 |
| hyp_icd4019_or_rx_any | 0.6375 | 0.854166666666667 | 0.3125 | 0.650793650793651 | 0.588235294117647 |
| hyp_icd4019_or_rx_min2 | 0.65 | 0.854166666666667 | 0.34375 | 0.661290322580645 | 0.611111111111111 |
| hyp_icd_any_test | 0.473684210526316 | 0.4 | 0.75 | 0.857142857142857 | 0.25 |
| hyp_icd_4019_test | 0.421052631578947 | 0.333333333333333 | 0.75 | 0.833333333333333 | 0.230769230769231 |
| hyp_rx_min2_test | 0.736842105263158 | 0.8 | 0.5 | 0.857142857142857 | 0.4 |
| hyp_icd4019_or_rx_any_test | 0.842105263157895 | 0.933333333333333 | 0.5 | 0.875 | 0.666666666666667 |
| hyp_icd4019_or_rx_min2_test | 0.789473684210526 | 0.866666666666667 | 0.5 | 0.866666666666667 | 0.5 |
write.csv(phenotypes_cf,'phenotypes_cf.csv')
Save pngs of plots:
png('hyper_bp_%d.png')
fourfoldplot(hyper_bp_cf$table)
## Warning in sqrt(odds(tab)$or): NaNs produced
fourfoldplot(hyp_icd_any_cf$table)
fourfoldplot(hyp_icd_4010_cf$table)
## Warning in sqrt(odds(tab)$or): NaNs produced
fourfoldplot(hyp_icd_4011_cf$table)
fourfoldplot(hyp_icd_4019_cf$table)
fourfoldplot(hyp_icd_4019_min2_cf$table)
fourfoldplot(hyp_rx_any_cf$table)
fourfoldplot(hyp_rx_min2_cf$table)
fourfoldplot(hyp_icd4019_or_rx_any_cf$table)
fourfoldplot(hyp_icd4019_or_rx_min2_cf$table)
fourfoldplot(hyp_icd_any_test_cf$table)
fourfoldplot(hyp_icd_4019_test_cf$table)
fourfoldplot(hyp_rx_min2_test_cf$table)
fourfoldplot(hyp_icd4019_or_rx_any_test_cf$table)
fourfoldplot(hyp_icd4019_or_rx_min2_test_cf$table)
dev.off()
## png
## 2