(1) Updating Pharmacologic Class Tables in FDALabel

First, download the data from DailyMed:

Mapping table (Pharma_Mapping): http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-mapping-files.cfm

Current Version

Index table (Pharm_Index) http://dailymed.nlm.nih.gov/dailymed/spl-resources-all-indexing-files.cfm

Current Version

NDFRT Tables:
NCFRT Public FTP ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/

(1.1) Data preprocessing

  • Download above links of current version files into the root folder;
  • Unzip and generate following files:
    • pharmacologic_class_mapping.txt
    • NDFRT_Public_2017.11.06_NUI.txt
    • NDFRT_Public_2017.11.06_TDE.xml
  • All input data are stored in:
    • data_1108/

(1.2) Data processing

Running Following script in Bash (Linux):
The script is also saved in file run.sh

folder=data_1108  
Pharm_index=pharmacologic_class_indexing_spl_files.zip  
Pharma_mapping=pharmacologic_class_mappings.txt  
TDE_file=NDFRT_Public_2017.11.06_TDE.xml  
NUI_file=NDFRT_Public_2017.11.06_NUI.txt  
scala-2.12.2/bin/scalac PharmClassesExtractor.scala  
scala-2.12.2/bin/scala PharmClassesExtractor \$folder"/"\$Pharm_index unii_classes_.txt  
sort unii_classes_.txt | uniq > f1.pharm_index.tsv  
export JAVA_OPTS="-Xmx3g"  
scala-2.12.2/bin/scalac -feature NDFRT_extractor.scala  
scala-2.12.2/bin/scala NDFRT_extractor \$folder"/"\$TDE_file NDFRT_output_.txt NDFRT_uni.txt   
sort NDFRT_output_.txt | uniq > f3.NDFRT_concept_property.tsv  
rm *.class  
perl PharmClass_Processing.pl \$folder"/"$NUI_file \$folder"/"\$Pharma_mapping  
rm unii_classes_.txt  
rm NDFRT_output_.txt  
rm NDFRT_uni.txt   

(1.3) Data outputs

Four output table files could be found in the root folder:
* f1.pharm_index.tsv
* f2.Pharma_mapping.tsv
* f3.NDFRT_concept_property.tsv
* f4.NDFRT_conceptinf.tsv

Where the first two files (f1, f2) are used for Pharmacologic Class updates.
The other two (f3, f4) are used for NDFRT updates.

(2) Updating Tables in SQL Developer

Manually updates:  
- Pharm_Index
- Pharma_Mapping
- NDFRT_CONCEPT_PROPERTIES
- NDFRT_CONCEPTINF

(2.1) Regenerate NDFRT_CONCEPT table with following SQL script:

truncate table NDFRT_concept;
  insert into NDFRT_CONCEPT
  select distinct nc.nui, nc.name, nc.TLEVEL from pharm_index pi
join pharma_mapping pm on pm.pharma_setid = pi.pharma_setid
join NDFRT_CONCEPTINF nc on nc.nui = pi.nui
  where nc.nui is not null; 

(3) Updating Statistics

stats <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 1)
stats_new <- stats %>% spread(key=Version, value=Counts) %>% mutate(Diff = After_Updated - Before_Updated)
stats_new
##                      Table After_Updated Before_Updated  Diff
## 1            NDFRT_CONCEPT          1149           1174   -25
## 2 NDFRT_CONCEPT_PROPERTIES        468292         443827 24465
## 3         NDFRT_CONCEPTINF         45435          43313  2122
## 4              PHARM_INDEX          9895           9649   246
## 5           PHARMA_MAPPING         67389          59032  8357

(3.1) Statistics of changed Terms

run the SQL query to find newly added and removed Pharmacologic Classes:

select count(NUI) from NDFRT_CONCEPT where NUI not in (select NUI from NDFRT_CONCEPT_1512);  
> 59
select count(NUI) from NDFRT_CONCEPT_1512 where NUI not in (select NUI from NDFRT_CONCEPT);  
> 84

(3.2) Example list of Newly Added terms

newly_added <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 2)
head(newly_added[,c(2,4)], n=15)
##                                                         NAME Curr_Counts
## 1                         Imidazolines [Chemical/Ingredient]         215
## 2                          Vitamin B 6 [Chemical/Ingredient]         381
## 3         Vascular Endothelial Growth Factor Inhibitor [EPC]           3
## 4                            Interleukin-23 Antagonist [EPC]           2
## 5  Parathyroid Hormone-Related Protein [Chemical/Ingredient]           1
## 6                                      Vasoconstrictor [EPC]         215
## 7           Parathyroid Hormone-Related Peptide Analog [EPC]           1
## 8                       Urate Transporter 1 Inhibitors [MoA]           3
## 9                           Interleukin-23 Antagonists [MoA]           2
## 10                          Tyrosine Kinase Inhibitors [MoA]           2
## 11                               Ion Exchange Activity [MoA]           1
## 12                         Plasma Kallikrein Inhibitor [EPC]           1
## 13               Programmed Death Ligand-1 Antagonists [MoA]           2
## 14                                   Vitamin B6 Analog [EPC]         381
## 15               UDP Glucuronosyltransferases Inducers [MoA]          19

(3.2) Example list of Removed terms

newly_removed <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 3)
head(newly_removed, n=15)
##            NUI                                     NAME CAT
## 1  N0000008313               Bone Resorption Inhibition  PE
## 2  N0000175567                           Calcium Binder EPC
## 3  N0000180856             Calcium Resorption Inhibitor EPC
## 4  N0000182963          Bradykinin Receptor Antagonists MoA
## 5  N0000175640    Catecholamine-depleting Sympatholytic EPC
## 6  N0000175856     CD11a-directed Antibody Interactions MoA
## 7  N0000175858   CD11a-directed Humanized IgG1 Antibody EPC
## 8  N0000175857       CD2-directed Antibody Interactions MoA
## 9  N0000175859     CD2-directed LFA-3/Fc Fusion Protein EPC
## 10 N0000177912                  CD25-directed Cytotoxin EPC
## 11 N0000170905                       Chickenpox Vaccine  CI
## 12 N0000175544                 Cysteine Depleting Agent EPC
## 13 N0000170883                     Haemophilus Vaccines  CI
## 14 N0000183896 Inactivated Bordetella Pertussis Vaccine EPC
## 15 N0000183900   Inactivated Clostridium Tetani Vaccine EPC

(3.3) real removed terms

Note that, old term in previous NDFRT_INDEX table is not removed so many of them already cannot be searched even in the last version of FDALabel (that is, the current production version).

Therefore, here is the list of “truly removed” items which still can be searched in old version:
Only 1 are removed as:

truly_removed <- read.xlsx("Updating_Stats.xlsx",sheetIndex = 4)
truly_removed
##           NUI                             NAME CAT Counts
## 1 N0000009945 Antidiuretic Hormone Antagonists MoA      1

(3.4) Summary

All raw data for test purpose are also stored in Updating_Stats.xlsx in the root folder.