May 5, 2017

Establish ODBC Connection within R

require(RODBC)
## Loading required package: RODBC
## Warning: package 'RODBC' was built under R version 3.3.1
## Open connections to SPARCS databases
my_channel = odbcConnect(dsn = "SPARCS")

SQL query

## develop a sql query as a text string
my_query = "
SELECT 
nycres2017, PCOUNTY, AGE_N, SEX, ETHNICITY, PRINDIAG, ODIAG2, ODIAG3 
FROM IPSPC10_0615 
WHERE HSA=7 
AND PCOUNTY IN (58, 59, 60, 61, 62) 
AND AGE_N >= 18
"
## FYI "IPSPC10_0615" IP = inpatient, SPC = SPARCS, 
## 10 = year of data, _0615 = date of data update

store the sql query as a dataframe

my_dataframe = sqlQuery(channel = my_channel, query = my_query)

## subsetting dataframe to return top 5 rows
my_dataframe = head(my_dataframe, 5)

optional: to protect patient cofidentiality, overwrite data with dummy data

my_dataframe$PCOUNTY = 58:62 # random county
my_dataframe$ETHNICITY = c(rep("Latino",3),rep("Black",2)) # random race
my_dataframe$AGE_N = round(runif(5,20,50)) # random age
my_dataframe$SEX = c(rep("F",2),rep("M",3)) # random sex

how do we query strings?

  • regular expressions (regex) were orinally develop in Perl
  • regex is like "wildcards on steroids"
  • different characters allow us flexibility in text string queries source

examples of regex

  • what if we want to query PrinDiag codes that begin with "29" and the third digit is 1–7
## use grep to return the matches
grep(pattern = "^29[1-7]", 
     x = my_dataframe$PRINDIAG, 
     perl = TRUE, value  = T)
## [1] "29530" "29590" "29532" "29534"
## use grepl to return the logical vector 
my_logical = grepl(pattern = "^29[1-7]", 
                   x = my_dataframe$PRINDIAG, 
                   perl = TRUE)

examples of regex

## use logical vector to index dataframe
my_dataframe[my_logical , ]
##   nycres2017 PCOUNTY AGE_N SEX ETHNICITY PRINDIAG ODIAG2 ODIAG3
## 2          1      59    23   F    Latino    29530   7840   7804
## 3          1      60    22   M    Latino    29590  29653  30431
## 4          1      61    29   M     Black    29532  25000   4019
## 5          1      62    28   M     Black    29534  V1581   <NA>

examples of regex with functions and apply

  • what if we want to repeat this analysis for ODIAG2 and ODIAG 3?
  • specifically we want to return hospitalizations where any diagnosis matches this pattern
  1. develop a function
  2. apply the user created function to the other vectors

develop a function that looks for hospital codes and returns a logical vector

my_hospital_codes = function(my_vec){
  grepl(pattern = "^29[1-7]", x = my_vec, perl = TRUE)
  }

apply the user created function to the other diagnosis vectors

  • sapply() allows us to repeat a function
  • "s" represent simple — returning the simplest data object
## returns logical matrix
my_log_mat = sapply(X = my_dataframe[,c("PRINDIAG","ODIAG2","ODIAG3")]
                    , FUN = my_hospital_codes)
my_log_mat
##      PRINDIAG ODIAG2 ODIAG3
## [1,]    FALSE  FALSE  FALSE
## [2,]     TRUE  FALSE  FALSE
## [3,]     TRUE   TRUE  FALSE
## [4,]     TRUE  FALSE  FALSE
## [5,]     TRUE  FALSE  FALSE

return original dataframe

## determines when a row sum > 0 (when any row is TRUE)
rowSums(my_log_mat) > 0
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
## subset original data frame
my_dataframe[rowSums(my_log_mat) > 0,]
##   nycres2017 PCOUNTY AGE_N SEX ETHNICITY PRINDIAG ODIAG2 ODIAG3
## 2          1      59    23   F    Latino    29530   7840   7804
## 3          1      60    22   M    Latino    29590  29653  30431
## 4          1      61    29   M     Black    29532  25000   4019
## 5          1      62    28   M     Black    29534  V1581   <NA>