Introduction

The purpose of this exploration is to illustrate use of the R ecosystem for healthcare data analysis, using the Medical Expenditure Panel Survey’s Prescribed Medicines data set (2014 file).

R Libraries

This exploration loads two R packages: tidyverse and SASxport.

library(SASxport)
library(tidyverse)

Data Description

69 Variables in Prescribed Medicines SAS file
documentation & codebook

x <- lookup.xport("h168a.ssp")
mat <- cbind(x[1]$H168A$name, x[1]$H168A$label, x[1]$H168A$type )
colnames(mat) <- c("VARIABLE", "LABEL", "TYPE")  # rownames(mat) <- NULL
knitr::kable(mat)
VARIABLE LABEL TYPE
DUID DWELLING UNIT ID numeric
PID PERSON NUMBER numeric
DUPERSID PERSON ID (DUID + PID) character
DRUGIDX DRUG ID (DUPERSID + COUNTER) character
RXRECIDX UNIQUE RX/PRESCRIBED MEDICINE IDENTIFIER character
LINKIDX ID FOR LINKAGE TO COND/OTH EVENT FILES character
PANEL PANEL NUMBER numeric
PURCHRD ROUND RX/PRESCR MED OBTAINED/PURCHASED numeric
RXBEGMM MONTH PERSON STARTED TAKING MEDICINE numeric
RXBEGYRX YEAR PERSON STARTED TAKING MEDICINE numeric
RXNAME MEDICINE NAME (IMPUTED) character
RXDRGNAM MULTUM MEDICINE NAME (IMPUTED) character
RXNDC NDC (IMPUTED) character
RXQUANTY QUANTITY OF RX/PRESCR MED (IMPUTED) numeric
RXFORM DOSAGE FORM (IMPUTED) character
RXFRMUNT QUANTITY UNIT OF MEDICATION (IMPUTED) character
RXSTRENG STRENGTH OF MEDICATION (IMPUTED) character
RXSTRUNT UNIT OF MEDICATION (IMPUTED) character
RXDAYSUP DAYS SUPPLIED OF PRESCRIBED MED(IMPUTED) numeric
PHARTP1 TYPE OF PHARMACY PROV - 1ST numeric
PHARTP2 TYPE OF PHARMACY PROV - 2ND numeric
PHARTP3 TYPE OF PHARMACY PROV - 3RD numeric
PHARTP4 TYPE OF PHARMACY PROV - 4TH numeric
PHARTP5 TYPE OF PHARMACY PROV - 5TH numeric
PHARTP6 TYPE OF PHARMACY PROV - 6TH numeric
PHARTP7 TYPE OF PHARMACY PROV - 7TH numeric
PHARTP8 TYPE OF PHARMACY PROV - 8TH numeric
RXFLG NDC IMPUTATION SOURCE ON PC DONOR REC numeric
IMPFLAG METHOD OF EXPENDITURE DATA CREATION numeric
PCIMPFLG TYPE OF HC TO PC PRESCRIPTION MATCH numeric
OMTYPE OTHER MEDICAL EXPENSE TYPE numeric
INPCFLG PID HAS AT LEAST 1 RECORD IN PC numeric
SAMPLE HOUSEHLD RCVD FREE SAMPLE OF RX IN ROUND numeric
RXCCC1X MODIFIED CLINICAL CLASS CODE character
RXCCC2X MODIFIED CLINICAL CLASS CODE character
RXCCC3X MODIFIED CLINICAL CLASS CODE character
PREGCAT MULTUM PREGNANCY CATEGORY character
TC1 MULTUM THERAPEUTIC CLASS #1 numeric
TC1S1 MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC1 numeric
TC1S1_1 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S1 numeric
TC1S1_2 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S1 numeric
TC1S2 MULTUM THERAPEUTIC SUB-CLASS #2 FOR TC1 numeric
TC1S2_1 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S2 numeric
TC1S3 MULTUM THERAPEUTIC SUB-CLASS #3 FOR TC1 numeric
TC1S3_1 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S3 numeric
TC2 MULTUM THERAPEUTIC CLASS #2 numeric
TC2S1 MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC2 numeric
TC2S1_1 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC2S1 numeric
TC2S1_2 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC2S1 numeric
TC2S2 MULTUM THERAPEUTIC SUB-CLASS #2 FOR TC2 numeric
TC3 MULTUM THERAPEUTIC CLASS #3 numeric
TC3S1 MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC3 numeric
TC3S1_1 MULTUM THERAPEUT SUB-SUB-CLASS FOR TC3S1 numeric
RXSF14X AMOUNT PAID, SELF OR FAMILY (IMPUTED) numeric
RXMR14X AMOUNT PAID, MEDICARE (IMPUTED) numeric
RXMD14X AMOUNT PAID, MEDICAID (IMPUTED) numeric
RXPV14X AMOUNT PAID, PRIVATE INSURANCE (IMPUTED) numeric
RXVA14X AMOUNT PAID, VETERANS/CHAMPVA (IMPUTED) numeric
RXTR14X AMOUNT PAID, TRICARE (IMPUTED) numeric
RXOF14X AMOUNT PAID, OTHER FEDERAL (IMPUTED) numeric
RXSL14X AMOUNT PAID, STATE & LOCAL GOV (IMPUTED) numeric
RXWC14X AMOUNT PAID, WORKERS COMP (IMPUTED) numeric
RXOT14X AMOUNT PAID, OTHER INSURANCE (IMPUTED) numeric
RXOR14X AMOUNT PAID, OTHER PRIVATE (IMPUTED) numeric
RXOU14X AMOUNT PAID, OTHER PUBLIC (IMPUTED) numeric
RXXP14X SUM OF PAYMENTS RXSF14X-RXOU14X(IMPUTED) numeric
PERWT14F EXPENDITURE FILE PERSON WEIGHT, 2014 numeric
VARSTR VARIANCE ESTIMATION STRATUM, 2014 numeric
VARPSU VARIANCE ESTIMATION PSU, 2014 numeric

SAS File

Abbreviated representation of R data frame containing Prescribed Medicines SAS file content:

h168a <- tbl_df(read.xport("h168a.ssp"))
h168a
# A tibble: 319,467 × 69
             DUID            PID DUPERSID     DRUGIDX        RXRECIDX
   <S3: labelled> <S3: labelled>   <fctr>      <fctr>  <S3: labelled>
1           40001            101 40001101 40001101008 400011010161001
2           40001            101 40001101 40001101003 400011010171001
3           40001            101 40001101 40001101005 400011010181001
4           40001            101 40001101 40001101008 400011010191001
5           40001            101 40001101 40001101009 400011010201001
6           40001            102 40001102 40001102001 400011020121001
7           40001            102 40001102 40001102002 400011020131001
8           40001            102 40001102 40001102003 400011020141001
9           40001            102 40001102 40001102001 400011020151001
10          40001            102 40001102 40001102003 400011020161001
# ... with 319,457 more rows, and 64 more variables: LINKIDX <fctr>,
#   PANEL <S3: labelled>, PURCHRD <S3: labelled>, RXBEGMM <S3: labelled>,
#   RXBEGYRX <S3: labelled>, RXNAME <fctr>, RXDRGNAM <fctr>, RXNDC <fctr>,
#   RXQUANTY <S3: labelled>, RXFORM <fctr>, RXFRMUNT <fctr>,
#   RXSTRENG <fctr>, RXSTRUNT <fctr>, RXDAYSUP <S3: labelled>,
#   PHARTP1 <S3: labelled>, PHARTP2 <S3: labelled>, PHARTP3 <S3:
#   labelled>, PHARTP4 <S3: labelled>, PHARTP5 <S3: labelled>,
#   PHARTP6 <S3: labelled>, PHARTP7 <S3: labelled>, PHARTP8 <S3:
#   labelled>, RXFLG <S3: labelled>, IMPFLAG <S3: labelled>, PCIMPFLG <S3:
#   labelled>, OMTYPE <S3: labelled>, INPCFLG <S3: labelled>, SAMPLE <S3:
#   labelled>, RXCCC1X <fctr>, RXCCC2X <fctr>, RXCCC3X <fctr>,
#   PREGCAT <fctr>, TC1 <S3: labelled>, TC1S1 <S3: labelled>, TC1S1.1 <S3:
#   labelled>, TC1S1.2 <S3: labelled>, TC1S2 <S3: labelled>, TC1S2.1 <S3:
#   labelled>, TC1S3 <S3: labelled>, TC1S3.1 <S3: labelled>, TC2 <S3:
#   labelled>, TC2S1 <S3: labelled>, TC2S1.1 <S3: labelled>, TC2S1.2 <S3:
#   labelled>, TC2S2 <S3: labelled>, TC3 <S3: labelled>, TC3S1 <S3:
#   labelled>, TC3S1.1 <S3: labelled>, RXSF14X <S3: labelled>,
#   RXMR14X <S3: labelled>, RXMD14X <S3: labelled>, RXPV14X <S3:
#   labelled>, RXVA14X <S3: labelled>, RXTR14X <S3: labelled>,
#   RXOF14X <S3: labelled>, RXSL14X <S3: labelled>, RXWC14X <S3:
#   labelled>, RXOT14X <S3: labelled>, RXOR14X <S3: labelled>,
#   RXOU14X <S3: labelled>, RXXP14X <S3: labelled>, PERWT14F <S3:
#   labelled>, VARSTR <S3: labelled>, VARPSU <S3: labelled>

Visuals

RXNAME Density

x1 <- h168a %>% 
  group_by(RXNAME) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count)) %>% 
  head(300)

x1 %>% ggplot(aes(x = reorder(RXNAME, count) , y = count,
  fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  scale_x_discrete(limits = as.character(x1$RXNAME)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())

Top RXNAMES

Top Prescribed Medicines

h168a %>% 
  group_by(RXNAME) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count)) %>% 
  head(40) %>% 
  ggplot(aes(x = reorder(RXNAME, count) , y = count,
    fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
    geom_bar(stat = 'identity') +
    theme(legend.position = "none") +
    labs(x = "RXNAME",  y = "Occurences in Prescribed Medicines 2014") +
    coord_flip()

DUPERSID Density

x2 <- h168a %>% 
  group_by(DUPERSID) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count)) %>%  # max = 18963 
  head(9000)

x2 %>% ggplot(aes(x = reorder(DUPERSID, count) , y = count,
  fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  scale_x_discrete(limits = as.character(x2$DUPERSID)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())

Top DUPERSIDs

Top Patients using Prescribed Medicines

h168a %>% 
  group_by(DUPERSID) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count)) %>% 
  head(50) %>% 
  ggplot(aes(x = reorder(DUPERSID, count) , y = count,
    fontface = "bold", size = 2, fill = 0.05, alpha = 0.01)) +
    geom_bar(stat = 'identity') +
    theme(legend.position = "none") +
    labs(x = "DUPERSID",  y = "Occurences in Prescribed Medicines 2014") +
    coord_flip()

Further Analysis

To request further analysis, please contact john.b.williams@gmail.com

R Session

devtools::session_info()
Session info --------------------------------------------------------------
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, mingw32             
 ui       RTerm                       
 language (EN)                        
 collate  English_United States.1252  
 tz       America/New_York            
 date     2016-09-27                  
Packages ------------------------------------------------------------------
 package      * version  date       source                            
 acepack        1.3-3.3  2014-11-24 CRAN (R 3.3.1)                    
 assertthat     0.1      2013-12-06 CRAN (R 3.3.0)                    
 chron          2.3-47   2015-06-24 CRAN (R 3.3.1)                    
 cluster        2.0.4    2016-04-18 CRAN (R 3.3.1)                    
 colorspace     1.2-6    2015-03-11 CRAN (R 3.3.0)                    
 data.table     1.9.6    2015-09-19 CRAN (R 3.3.1)                    
 DBI            0.5-1    2016-09-10 CRAN (R 3.3.1)                    
 devtools       1.12.0   2016-06-24 CRAN (R 3.3.1)                    
 digest         0.6.10   2016-08-02 CRAN (R 3.3.1)                    
 dplyr        * 0.5.0    2016-06-24 CRAN (R 3.3.1)                    
 evaluate       0.9      2016-04-29 CRAN (R 3.3.0)                    
 foreign        0.8-66   2015-08-19 CRAN (R 3.3.1)                    
 formatR        1.4      2016-05-09 CRAN (R 3.3.0)                    
 Formula        1.2-1    2015-04-07 CRAN (R 3.3.1)                    
 ggplot2      * 2.1.0    2016-03-01 CRAN (R 3.3.0)                    
 gridExtra      2.2.1    2016-02-29 CRAN (R 3.3.0)                    
 gtable         0.2.0    2016-02-26 CRAN (R 3.3.0)                    
 Hmisc          3.17-4   2016-05-02 CRAN (R 3.3.1)                    
 htmltools      0.3.5    2016-03-21 CRAN (R 3.3.0)                    
 knitr          1.14     2016-08-13 CRAN (R 3.3.1)                    
 lattice        0.20-34  2016-09-06 CRAN (R 3.3.1)                    
 latticeExtra   0.6-28   2016-02-09 CRAN (R 3.3.1)                    
 magrittr       1.5      2014-11-22 CRAN (R 3.3.0)                    
 Matrix         1.2-6    2016-05-02 CRAN (R 3.3.1)                    
 memoise        1.0.0    2016-01-29 CRAN (R 3.3.0)                    
 munsell        0.4.3    2016-02-13 CRAN (R 3.3.0)                    
 nnet           7.3-12   2016-02-02 CRAN (R 3.3.1)                    
 plyr           1.8.4    2016-06-08 CRAN (R 3.3.0)                    
 purrr        * 0.2.2    2016-06-18 CRAN (R 3.3.1)                    
 R6             2.1.3    2016-08-19 CRAN (R 3.3.1)                    
 RColorBrewer   1.1-2    2014-12-07 CRAN (R 3.3.0)                    
 Rcpp           0.12.7   2016-09-05 CRAN (R 3.3.1)                    
 readr        * 1.0.0    2016-08-03 CRAN (R 3.3.1)                    
 rmarkdown      1.0.9014 2016-09-27 Github (rstudio/rmarkdown@81c2092)
 rpart          4.1-10   2015-06-29 CRAN (R 3.3.1)                    
 SASxport     * 1.5.3    2016-03-11 CRAN (R 3.3.1)                    
 scales         0.4.0    2016-02-26 CRAN (R 3.3.0)                    
 stringi        1.1.1    2016-05-27 CRAN (R 3.3.0)                    
 stringr        1.1.0    2016-08-19 CRAN (R 3.3.1)                    
 survival       2.39-4   2016-05-11 CRAN (R 3.3.1)                    
 tibble       * 1.2      2016-08-26 CRAN (R 3.3.1)                    
 tidyr        * 0.6.0    2016-08-12 CRAN (R 3.3.1)                    
 tidyverse    * 1.0.0    2016-09-09 CRAN (R 3.3.1)                    
 withr          1.0.2    2016-06-20 CRAN (R 3.3.1)                    
 yaml           2.1.13   2014-06-12 CRAN (R 3.3.0)