December 26, 2017

R Markdown

What is RxNorm?

  • Drug terminology produced by the National Library of Medicine (NLM).
  • Provides normalized names and unique identifiers for drugs.
  • Derived from other drug terminologies.
  • Represents drugs from the prescriber's point of view.

Main Format

  • RXCUI (concept unique identifier; e.g., 705610)
  • Normalized name (e.g., Ranitidine 15MG/ML Oral Solution)

Download RxNorm mapping index

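library(knitr)  # provides kable()

# Download and unpack the mapping file only if it is not already present
if (!file.exists('rxnorm_mappings.zip')) {
  file_url <- 'ftp://public.nlm.nih.gov/nlmdata/.dailymed/rxnorm_mappings.zip'
  download.file(file_url, destfile = 'rxnorm_mappings.zip', mode = "wb")
  unzip('rxnorm_mappings.zip', exdir = 'rxnorm')
}
# Pipe-delimited file; keep text columns as factors (the default before R 4.0)
data <- read.table('rxnorm/rxnorm_mappings.txt', header = TRUE, sep = '|',
                   fill = TRUE, stringsAsFactors = TRUE)
kable(head(data, n = 3))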
|SETID                                | SPL_VERSION|  RXCUI|RXSTRING                                 |RXTTY |
|:------------------------------------|-----------:|------:|:----------------------------------------|:-----|
|000155a8-709c-44e5-a75f-cd890f3a7caf |           3| 198014|Naproxen 500 MG Oral Tablet              |SCD   |
|000155a8-709c-44e5-a75f-cd890f3a7caf |           3| 198014|naproxen 500 MG Oral Tablet              |PSN   |
|0001eaa9-e890-4e94-9d44-47a0f3086a02 |           2| 251577|Salicylic Acid 20 MG/ML Topical Solution |SCD   |
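
The preview shows that one SETID can occupy several rows, one per RxNorm term type, while pointing at the same RXCUI. A minimal sketch of checking this, using the three preview rows as inline pipe-delimited text (the names `sample_lines`, `sample_map`, `rows_per_label`, and `rxcui_per_label` are illustrative and not part of the original analysis):

```r
# Inline sample mirroring the three preview rows (pipe-delimited, with header).
sample_lines <- c(
  "SETID|SPL_VERSION|RXCUI|RXSTRING|RXTTY",
  "000155a8-709c-44e5-a75f-cd890f3a7caf|3|198014|Naproxen 500 MG Oral Tablet|SCD",
  "000155a8-709c-44e5-a75f-cd890f3a7caf|3|198014|naproxen 500 MG Oral Tablet|PSN",
  "0001eaa9-e890-4e94-9d44-47a0f3086a02|2|251577|Salicylic Acid 20 MG/ML Topical Solution|SCD"
)

# quote = "" and comment.char = "" stop apostrophes or '#' in drug names
# from being treated as quoting or comments (an assumption about the real file).
sample_map <- read.table(text = sample_lines, header = TRUE, sep = "|",
                         quote = "", comment.char = "",
                         stringsAsFactors = FALSE)

# Rows per label, and distinct RXCUIs per label.
rows_per_label  <- table(sample_map$SETID)
rxcui_per_label <- tapply(sample_map$RXCUI, sample_map$SETID,
                          function(x) length(unique(x)))
print(rows_per_label)
print(rxcui_per_label)
```

If most labels show more rows than distinct RXCUIs, counting rows will overstate the number of drug concepts, so it may be safer to deduplicate on SETID and RXCUI before counting.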

Explanation of RXTTY

table(data[,5])
## 
##        BPCK  GPCK   PSN   SBD   SCD    SY 
##     4   314   681 62529  8190 54545 71909
  • BPCK: Brand Name Pack
  • GPCK: Generic Pack
  • PSN: Prescribable Name
  • SBD: Semantic Branded Drug
  • SCD: Semantic Clinical Drug
  • SY: Synonym
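
The unlabeled first column in the output above appears to count rows whose RXTTY is empty. A short sketch of attaching the descriptions listed here to the codes with a named vector, so that such blank entries show up explicitly as `NA` (the `rxtty` values below are made up for illustration, and `rxtty_labels` and `described` are hypothetical names):

```r
# Lookup from RXTTY code to its description, taken from the list above.
rxtty_labels <- c(
  BPCK = "Brand Name Pack",
  GPCK = "Generic Pack",
  PSN  = "Prescribable Name",
  SBD  = "Semantic Branded Drug",
  SCD  = "Semantic Clinical Drug",
  SY   = "Synonym"
)

# Hypothetical RXTTY values; the empty string stands in for a blank entry.
rxtty <- c("SCD", "PSN", "SCD", "SY", "", "SBD")

# Codes missing from the lookup (such as "") become NA instead of vanishing.
described <- unname(rxtty_labels[rxtty])
print(table(described, useNA = "ifany"))
```

In the real data, `as.character(data$RXTTY)` would take the place of `rxtty`.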

Overlaps with the newest FDALabel drugs

Export date: 12/26/2017
All drugs, including animal drugs, are included in the analysis.

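# FDALabel export, tab-delimited; keep text columns as factors (the default before R 4.0)
label_data <- read.table('setid_export.tsv', sep = "\t", header = TRUE, stringsAsFactors = TRUE)
kable(head(label_data[, c(1, 2, 7)]))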
|SET_ID                               |DOCUMENT_TYPE                 |MARKET_CATEGORIES |
|:------------------------------------|:-----------------------------|:-----------------|
|a9a13367-b5f0-4255-ae6a-8f558d90cd07 |HUMAN PRESCRIPTION DRUG LABEL |NDA               |
|b34cb984-313c-4be3-b34e-90ab2797a29e |HUMAN PRESCRIPTION DRUG LABEL |NDA               |
|2275ed71-a964-42b0-b9cb-46ab88da8687 |HUMAN PRESCRIPTION DRUG LABEL |ANDA              |
|155e8576-c28d-4ec5-bc03-baeb3e65878b |HUMAN PRESCRIPTION DRUG LABEL |NDA               |
|4ab52b14-12af-4706-a14d-b91a983608d1 |HUMAN PRESCRIPTION DRUG LABEL |ANDA              |
|d33c4293-347c-49a1-9d98-f291c3dab6da |HUMAN PRESCRIPTION DRUG LABEL |ANDA              |

Overlaps

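# SETIDs present in both the RxNorm mapping and the FDALabel export
rxnorm_setids <- unique(as.character(data$SETID))
common_setid <- intersect(rxnorm_setids, unique(as.character(label_data$SET_ID)))
RX_norm_data <- label_data[label_data$SET_ID %in% common_setid, ]
length(common_setid)
## [1] 38493
# Distinct SETIDs in the RxNorm mapping
length(rxnorm_setids)
## [1] 38501
# Rows in the FDALabel export
nrow(label_data)
## [1] 98032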
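
The counts above imply that 38501 - 38493 = 8 SETIDs in the RxNorm mapping have no match in the FDALabel export. A minimal sketch of listing such unmatched identifiers and computing coverage, using toy IDs in place of the real columns (all names and values below are illustrative):

```r
# Toy identifiers standing in for data$SETID and label_data$SET_ID.
rxnorm_ids <- c("a1", "a1", "b2", "c3", "d4")
label_ids  <- c("b2", "c3", "e5", "f6")

rxnorm_unique <- unique(rxnorm_ids)
common        <- intersect(rxnorm_unique, label_ids)
rxnorm_only   <- setdiff(rxnorm_unique, label_ids)  # in the mapping, absent from the export

# Share of RxNorm-mapped labels that also appear in the export.
coverage <- length(common) / length(rxnorm_unique)
cat(length(common), "shared;", length(rxnorm_only), "RxNorm-only;",
    sprintf("%.1f%% coverage\n", 100 * coverage))
```

Inspecting the `setdiff()` result on the real data would show whether the unmatched SETIDs are recently retired labels or something else; the source does not say which.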

# Top 10 document types
tmp_table <- sort(table(RX_norm_data$DOCUMENT_TYPE), decreasing = TRUE)
tmp_table[1:10]
## 
##     HUMAN PRESCRIPTION DRUG LABEL              HUMAN OTC DRUG LABEL 
##                             18974                             17523 
##                    MEDICAL DEVICE             OTC ANIMAL DRUG LABEL 
##                               648                               638 
##    PRESCRIPTION ANIMAL DRUG LABEL                 PLASMA DERIVATIVE 
##                               468                                70 
##                DIETARY SUPPLEMENT                     VACCINE LABEL 
##                                48                                47 
## NON-STANDARDIZED ALLERGENIC LABEL           STANDARDIZED ALLERGENIC 
##                                34                                16

# Top 10 market categories
tmp_table <- sort(table(RX_norm_data$MARKET_CATEGORIES), decreasing = TRUE)
tmp_table[1:10]
## 
##                    ANDA     OTC Monograph Final OTC Monograph Not Final 
##                   15887                    7687                    7034 
##                     NDA   Unapproved drug other  Premarket Notification 
##                    3501                    1647                     617 
##  NDA authorized generic  Unapproved medical gas                     BLA 
##                     471                     384                     346 
##                    NADA 
##                     335
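
Both top-10 listings follow the same sort-then-slice pattern, so a small helper could produce them together with each category's share of the total. This is a sketch only: `top_counts` is a hypothetical function and the `categories` vector is made-up data, not taken from the export.

```r
# Return the n most frequent values of x with counts and percentage of all values.
top_counts <- function(x, n = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  # Cap n at the number of distinct values so no NA rows are produced.
  top <- counts[seq_len(min(n, length(counts)))]
  data.frame(
    value   = names(top),
    count   = as.integer(top),
    percent = round(100 * as.integer(top) / sum(counts), 1),
    stringsAsFactors = FALSE
  )
}

# Hypothetical market categories for illustration only.
categories <- c("ANDA", "ANDA", "NDA", "ANDA", "BLA", "NDA")
print(top_counts(categories, n = 2))
```

On the real data this would be called as `top_counts(RX_norm_data$MARKET_CATEGORIES)` or `top_counts(RX_norm_data$DOCUMENT_TYPE)`. Unlike indexing with `[1:10]`, the capped slice does not return `NA` entries when fewer than `n` distinct values exist.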