First, download the data from DailyMed:
Mapping table (Pharma_Mapping): http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-mapping-files.cfm
Index table (Pharm_Index) http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-indexing-files.cfm
NDFRT Tables:
NCFRT Public FTP ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/
Running Following script in Bash (Linux):
The script is also saved in file run.sh
folder=data_1108
Pharm_index=pharmacologic_class_indexing_spl_files.zip
Pharma_mapping=pharmacologic_class_mappings.txt
TDE_file=NDFRT_Public_2017.11.06_TDE.xml
NUI_file=NDFRT_Public_2017.11.06_NUI.txt
scala-2.12.2/bin/scalac PharmClassesExtractor.scala
scala-2.12.2/bin/scala PharmClassesExtractor \$folder"/"\$Pharm_index unii_classes_.txt
sort unii_classes_.txt | uniq > f1.pharm_index.tsv
export JAVA_OPTS="-Xmx3g"
scala-2.12.2/bin/scalac -feature NDFRT_extractor.scala
scala-2.12.2/bin/scala NDFRT_extractor \$folder"/"\$TDE_file NDFRT_output_.txt NDFRT_uni.txt
sort NDFRT_output_.txt | uniq > f3.NDFRT_concept_property.tsv
rm *.class
perl PharmClass_Processing.pl \$folder"/"$NUI_file \$folder"/"\$Pharma_mapping
rm unii_classes_.txt
rm NDFRT_output_.txt
rm NDFRT_uni.txt
Four output table files could be found in the root folder:
* f1.pharm_index.tsv
* f2.Pharma_mapping.tsv
* f3.NDFRT_concept_property.tsv
* f4.NDFRT_conceptinf.tsv
Where the first two files (f1, f2) are used for Pharmacologic Class updates.
The other two (f3, f4) are used for NDFRT updates.
Manually updates:
- Pharm_Index
- Pharma_Mapping
- NDFRT_CONCEPT_PROPERTIES
- NDFRT_CONCEPTINF
truncate table NDFRT_concept;
insert into NDFRT_CONCEPT
select distinct nc.nui, nc.name, nc.TLEVEL from pharm_index pi
join pharma_mapping pm on pm.pharma_setid = pi.pharma_setid
join NDFRT_CONCEPTINF nc on nc.nui = pi.nui
where nc.nui is not null;
stats <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 1)
stats_new <- stats %>% spread(key=Version, value=Counts) %>% mutate(Diff = After_Updated - Before_Updated)
stats_new
## Table After_Updated Before_Updated Diff
## 1 NDFRT_CONCEPT 1149 1174 -25
## 2 NDFRT_CONCEPT_PROPERTIES 468292 443827 24465
## 3 NDFRT_CONCEPTINF 45435 43313 2122
## 4 PHARM_INDEX 9895 9649 246
## 5 PHARMA_MAPPING 67389 59032 8357
run the SQL query to find newly added and removed Pharmacologic Classes:
select count(NUI) from NDFRT_CONCEPT where NUI not in (select NUI from NDFRT_CONCEPT_1512);
> 59
select count(NUI) from NDFRT_CONCEPT_1512 where NUI not in (select NUI from NDFRT_CONCEPT);
> 84
newly_added <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 2)
head(newly_added[,c(2,4)], n=15)
## NAME Curr_Counts
## 1 Imidazolines [Chemical/Ingredient] 215
## 2 Vitamin B 6 [Chemical/Ingredient] 381
## 3 Vascular Endothelial Growth Factor Inhibitor [EPC] 3
## 4 Interleukin-23 Antagonist [EPC] 2
## 5 Parathyroid Hormone-Related Protein [Chemical/Ingredient] 1
## 6 Vasoconstrictor [EPC] 215
## 7 Parathyroid Hormone-Related Peptide Analog [EPC] 1
## 8 Urate Transporter 1 Inhibitors [MoA] 3
## 9 Interleukin-23 Antagonists [MoA] 2
## 10 Tyrosine Kinase Inhibitors [MoA] 2
## 11 Ion Exchange Activity [MoA] 1
## 12 Plasma Kallikrein Inhibitor [EPC] 1
## 13 Programmed Death Ligand-1 Antagonists [MoA] 2
## 14 Vitamin B6 Analog [EPC] 381
## 15 UDP Glucuronosyltransferases Inducers [MoA] 19
newly_removed <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 3)
head(newly_removed, n=15)
## NUI NAME CAT
## 1 N0000008313 Bone Resorption Inhibition PE
## 2 N0000175567 Calcium Binder EPC
## 3 N0000180856 Calcium Resorption Inhibitor EPC
## 4 N0000182963 Bradykinin Receptor Antagonists MoA
## 5 N0000175640 Catecholamine-depleting Sympatholytic EPC
## 6 N0000175856 CD11a-directed Antibody Interactions MoA
## 7 N0000175858 CD11a-directed Humanized IgG1 Antibody EPC
## 8 N0000175857 CD2-directed Antibody Interactions MoA
## 9 N0000175859 CD2-directed LFA-3/Fc Fusion Protein EPC
## 10 N0000177912 CD25-directed Cytotoxin EPC
## 11 N0000170905 Chickenpox Vaccine CI
## 12 N0000175544 Cysteine Depleting Agent EPC
## 13 N0000170883 Haemophilus Vaccines CI
## 14 N0000183896 Inactivated Bordetella Pertussis Vaccine EPC
## 15 N0000183900 Inactivated Clostridium Tetani Vaccine EPC
Note that, old term in previous NDFRT_INDEX table is not removed so many of them already cannot be searched even in the last version of FDALabel (that is, the current production version).
Therefore, here is the list of “truly removed” items which still can be searched in old version:
Only 1 are removed as:
truly_removed <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 4)
truly_removed
## NUI NAME CAT Counts
## 1 N0000009945 Antidiuretic Hormone Antagonists MoA 1
All raw data for test purpose are also stored in Updating_Stats.xlsx in the root folder.