The purpose of this exploration is to illustrate use of the R ecosystem for healthcare data analysis, using the Medical Expenditure Panel Survey’s Prescribed Medicines data set (2014 file).
This exploration loads two R packages: tidyverse and SASxport.
library(SASxport)
library(tidyverse)69 Variables in Prescribed Medicines SAS file
documentation & codebook
x <- lookup.xport("h168a.ssp")
mat <- cbind(x[1]$H168A$name, x[1]$H168A$label, x[1]$H168A$type )
colnames(mat) <- c("VARIABLE", "LABEL", "TYPE") # rownames(mat) <- NULL
knitr::kable(mat)| VARIABLE | LABEL | TYPE |
|---|---|---|
| DUID | DWELLING UNIT ID | numeric |
| PID | PERSON NUMBER | numeric |
| DUPERSID | PERSON ID (DUID + PID) | character |
| DRUGIDX | DRUG ID (DUPERSID + COUNTER) | character |
| RXRECIDX | UNIQUE RX/PRESCRIBED MEDICINE IDENTIFIER | character |
| LINKIDX | ID FOR LINKAGE TO COND/OTH EVENT FILES | character |
| PANEL | PANEL NUMBER | numeric |
| PURCHRD | ROUND RX/PRESCR MED OBTAINED/PURCHASED | numeric |
| RXBEGMM | MONTH PERSON STARTED TAKING MEDICINE | numeric |
| RXBEGYRX | YEAR PERSON STARTED TAKING MEDICINE | numeric |
| RXNAME | MEDICINE NAME (IMPUTED) | character |
| RXDRGNAM | MULTUM MEDICINE NAME (IMPUTED) | character |
| RXNDC | NDC (IMPUTED) | character |
| RXQUANTY | QUANTITY OF RX/PRESCR MED (IMPUTED) | numeric |
| RXFORM | DOSAGE FORM (IMPUTED) | character |
| RXFRMUNT | QUANTITY UNIT OF MEDICATION (IMPUTED) | character |
| RXSTRENG | STRENGTH OF MEDICATION (IMPUTED) | character |
| RXSTRUNT | UNIT OF MEDICATION (IMPUTED) | character |
| RXDAYSUP | DAYS SUPPLIED OF PRESCRIBED MED(IMPUTED) | numeric |
| PHARTP1 | TYPE OF PHARMACY PROV - 1ST | numeric |
| PHARTP2 | TYPE OF PHARMACY PROV - 2ND | numeric |
| PHARTP3 | TYPE OF PHARMACY PROV - 3RD | numeric |
| PHARTP4 | TYPE OF PHARMACY PROV - 4TH | numeric |
| PHARTP5 | TYPE OF PHARMACY PROV - 5TH | numeric |
| PHARTP6 | TYPE OF PHARMACY PROV - 6TH | numeric |
| PHARTP7 | TYPE OF PHARMACY PROV - 7TH | numeric |
| PHARTP8 | TYPE OF PHARMACY PROV - 8TH | numeric |
| RXFLG | NDC IMPUTATION SOURCE ON PC DONOR REC | numeric |
| IMPFLAG | METHOD OF EXPENDITURE DATA CREATION | numeric |
| PCIMPFLG | TYPE OF HC TO PC PRESCRIPTION MATCH | numeric |
| OMTYPE | OTHER MEDICAL EXPENSE TYPE | numeric |
| INPCFLG | PID HAS AT LEAST 1 RECORD IN PC | numeric |
| SAMPLE | HOUSEHLD RCVD FREE SAMPLE OF RX IN ROUND | numeric |
| RXCCC1X | MODIFIED CLINICAL CLASS CODE | character |
| RXCCC2X | MODIFIED CLINICAL CLASS CODE | character |
| RXCCC3X | MODIFIED CLINICAL CLASS CODE | character |
| PREGCAT | MULTUM PREGNANCY CATEGORY | character |
| TC1 | MULTUM THERAPEUTIC CLASS #1 | numeric |
| TC1S1 | MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC1 | numeric |
| TC1S1_1 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S1 | numeric |
| TC1S1_2 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S1 | numeric |
| TC1S2 | MULTUM THERAPEUTIC SUB-CLASS #2 FOR TC1 | numeric |
| TC1S2_1 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S2 | numeric |
| TC1S3 | MULTUM THERAPEUTIC SUB-CLASS #3 FOR TC1 | numeric |
| TC1S3_1 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC1S3 | numeric |
| TC2 | MULTUM THERAPEUTIC CLASS #2 | numeric |
| TC2S1 | MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC2 | numeric |
| TC2S1_1 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC2S1 | numeric |
| TC2S1_2 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC2S1 | numeric |
| TC2S2 | MULTUM THERAPEUTIC SUB-CLASS #2 FOR TC2 | numeric |
| TC3 | MULTUM THERAPEUTIC CLASS #3 | numeric |
| TC3S1 | MULTUM THERAPEUTIC SUB-CLASS #1 FOR TC3 | numeric |
| TC3S1_1 | MULTUM THERAPEUT SUB-SUB-CLASS FOR TC3S1 | numeric |
| RXSF14X | AMOUNT PAID, SELF OR FAMILY (IMPUTED) | numeric |
| RXMR14X | AMOUNT PAID, MEDICARE (IMPUTED) | numeric |
| RXMD14X | AMOUNT PAID, MEDICAID (IMPUTED) | numeric |
| RXPV14X | AMOUNT PAID, PRIVATE INSURANCE (IMPUTED) | numeric |
| RXVA14X | AMOUNT PAID, VETERANS/CHAMPVA (IMPUTED) | numeric |
| RXTR14X | AMOUNT PAID, TRICARE (IMPUTED) | numeric |
| RXOF14X | AMOUNT PAID, OTHER FEDERAL (IMPUTED) | numeric |
| RXSL14X | AMOUNT PAID, STATE & LOCAL GOV (IMPUTED) | numeric |
| RXWC14X | AMOUNT PAID, WORKERS COMP (IMPUTED) | numeric |
| RXOT14X | AMOUNT PAID, OTHER INSURANCE (IMPUTED) | numeric |
| RXOR14X | AMOUNT PAID, OTHER PRIVATE (IMPUTED) | numeric |
| RXOU14X | AMOUNT PAID, OTHER PUBLIC (IMPUTED) | numeric |
| RXXP14X | SUM OF PAYMENTS RXSF14X-RXOU14X(IMPUTED) | numeric |
| PERWT14F | EXPENDITURE FILE PERSON WEIGHT, 2014 | numeric |
| VARSTR | VARIANCE ESTIMATION STRATUM, 2014 | numeric |
| VARPSU | VARIANCE ESTIMATION PSU, 2014 | numeric |
Abbreviated representation of R data frame containing Prescribed Medicines SAS file content:
h168a <- tbl_df(read.xport("h168a.ssp"))
h168a# A tibble: 319,467 × 69
DUID PID DUPERSID DRUGIDX RXRECIDX
<S3: labelled> <S3: labelled> <fctr> <fctr> <S3: labelled>
1 40001 101 40001101 40001101008 400011010161001
2 40001 101 40001101 40001101003 400011010171001
3 40001 101 40001101 40001101005 400011010181001
4 40001 101 40001101 40001101008 400011010191001
5 40001 101 40001101 40001101009 400011010201001
6 40001 102 40001102 40001102001 400011020121001
7 40001 102 40001102 40001102002 400011020131001
8 40001 102 40001102 40001102003 400011020141001
9 40001 102 40001102 40001102001 400011020151001
10 40001 102 40001102 40001102003 400011020161001
# ... with 319,457 more rows, and 64 more variables: LINKIDX <fctr>,
# PANEL <S3: labelled>, PURCHRD <S3: labelled>, RXBEGMM <S3: labelled>,
# RXBEGYRX <S3: labelled>, RXNAME <fctr>, RXDRGNAM <fctr>, RXNDC <fctr>,
# RXQUANTY <S3: labelled>, RXFORM <fctr>, RXFRMUNT <fctr>,
# RXSTRENG <fctr>, RXSTRUNT <fctr>, RXDAYSUP <S3: labelled>,
# PHARTP1 <S3: labelled>, PHARTP2 <S3: labelled>, PHARTP3 <S3:
# labelled>, PHARTP4 <S3: labelled>, PHARTP5 <S3: labelled>,
# PHARTP6 <S3: labelled>, PHARTP7 <S3: labelled>, PHARTP8 <S3:
# labelled>, RXFLG <S3: labelled>, IMPFLAG <S3: labelled>, PCIMPFLG <S3:
# labelled>, OMTYPE <S3: labelled>, INPCFLG <S3: labelled>, SAMPLE <S3:
# labelled>, RXCCC1X <fctr>, RXCCC2X <fctr>, RXCCC3X <fctr>,
# PREGCAT <fctr>, TC1 <S3: labelled>, TC1S1 <S3: labelled>, TC1S1.1 <S3:
# labelled>, TC1S1.2 <S3: labelled>, TC1S2 <S3: labelled>, TC1S2.1 <S3:
# labelled>, TC1S3 <S3: labelled>, TC1S3.1 <S3: labelled>, TC2 <S3:
# labelled>, TC2S1 <S3: labelled>, TC2S1.1 <S3: labelled>, TC2S1.2 <S3:
# labelled>, TC2S2 <S3: labelled>, TC3 <S3: labelled>, TC3S1 <S3:
# labelled>, TC3S1.1 <S3: labelled>, RXSF14X <S3: labelled>,
# RXMR14X <S3: labelled>, RXMD14X <S3: labelled>, RXPV14X <S3:
# labelled>, RXVA14X <S3: labelled>, RXTR14X <S3: labelled>,
# RXOF14X <S3: labelled>, RXSL14X <S3: labelled>, RXWC14X <S3:
# labelled>, RXOT14X <S3: labelled>, RXOR14X <S3: labelled>,
# RXOU14X <S3: labelled>, RXXP14X <S3: labelled>, PERWT14F <S3:
# labelled>, VARSTR <S3: labelled>, VARPSU <S3: labelled>
x1 <- h168a %>%
group_by(RXNAME) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
head(300)
x1 %>% ggplot(aes(x = reorder(RXNAME, count) , y = count,
fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
scale_x_discrete(limits = as.character(x1$RXNAME)) +
theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())Top Prescribed Medicines
h168a %>%
group_by(RXNAME) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
head(40) %>%
ggplot(aes(x = reorder(RXNAME, count) , y = count,
fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "RXNAME", y = "Occurences in Prescribed Medicines 2014") +
coord_flip()x2 <- h168a %>%
group_by(DUPERSID) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>% # max = 18963
head(9000)
x2 %>% ggplot(aes(x = reorder(DUPERSID, count) , y = count,
fontface = "bold", size = 2, fill = 0.05, alpha = 0.01 )) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
scale_x_discrete(limits = as.character(x2$DUPERSID)) +
theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())Top Patients using Prescribed Medicines
h168a %>%
group_by(DUPERSID) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
head(50) %>%
ggplot(aes(x = reorder(DUPERSID, count) , y = count,
fontface = "bold", size = 2, fill = 0.05, alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "DUPERSID", y = "Occurences in Prescribed Medicines 2014") +
coord_flip()To request further analysis, please contact john.b.williams@gmail.com
devtools::session_info()Session info --------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.1252
tz America/New_York
date 2016-09-27
Packages ------------------------------------------------------------------
package * version date source
acepack 1.3-3.3 2014-11-24 CRAN (R 3.3.1)
assertthat 0.1 2013-12-06 CRAN (R 3.3.0)
chron 2.3-47 2015-06-24 CRAN (R 3.3.1)
cluster 2.0.4 2016-04-18 CRAN (R 3.3.1)
colorspace 1.2-6 2015-03-11 CRAN (R 3.3.0)
data.table 1.9.6 2015-09-19 CRAN (R 3.3.1)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.1)
devtools 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.10 2016-08-02 CRAN (R 3.3.1)
dplyr * 0.5.0 2016-06-24 CRAN (R 3.3.1)
evaluate 0.9 2016-04-29 CRAN (R 3.3.0)
foreign 0.8-66 2015-08-19 CRAN (R 3.3.1)
formatR 1.4 2016-05-09 CRAN (R 3.3.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.1)
ggplot2 * 2.1.0 2016-03-01 CRAN (R 3.3.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
Hmisc 3.17-4 2016-05-02 CRAN (R 3.3.1)
htmltools 0.3.5 2016-03-21 CRAN (R 3.3.0)
knitr 1.14 2016-08-13 CRAN (R 3.3.1)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.1)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.1)
magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
Matrix 1.2-6 2016-05-02 CRAN (R 3.3.1)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.0)
purrr * 0.2.2 2016-06-18 CRAN (R 3.3.1)
R6 2.1.3 2016-08-19 CRAN (R 3.3.1)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.0)
Rcpp 0.12.7 2016-09-05 CRAN (R 3.3.1)
readr * 1.0.0 2016-08-03 CRAN (R 3.3.1)
rmarkdown 1.0.9014 2016-09-27 Github (rstudio/rmarkdown@81c2092)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.1)
SASxport * 1.5.3 2016-03-11 CRAN (R 3.3.1)
scales 0.4.0 2016-02-26 CRAN (R 3.3.0)
stringi 1.1.1 2016-05-27 CRAN (R 3.3.0)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.1)
survival 2.39-4 2016-05-11 CRAN (R 3.3.1)
tibble * 1.2 2016-08-26 CRAN (R 3.3.1)
tidyr * 0.6.0 2016-08-12 CRAN (R 3.3.1)
tidyverse * 1.0.0 2016-09-09 CRAN (R 3.3.1)
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
yaml 2.1.13 2014-06-12 CRAN (R 3.3.0)