Dans ce Blog, nous verrons comment faire des analyses avec des données JSON (JavaScript Object Notation) qui est le format de données le plus couramment utilisé pour la communication asynchrone navigateur/serveur. Nous travaillerons avec les données d’étiquetage des médicaments de la Food and Drug Administration de OpenFDA avec lequelle travaille souvent les Data scientist de Astrazeneca: https://open.fda.gov/

Chargeons les librairies

# Téléchargement de fichier http
library(downloader)

## Warning: package 'downloader' was built under R version 3.6.1

# Travailler avec Json
library(jsonlite)

## Warning: package 'jsonlite' was built under R version 3.6.3

# Tidyverse
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 3.6.3

## -- Attaching packages ----------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.0
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## Warning: package 'ggplot2' was built under R version 3.6.3

## Warning: package 'tibble' was built under R version 3.6.3

## Warning: package 'tidyr' was built under R version 3.6.3

## Warning: package 'purrr' was built under R version 3.6.3

## Warning: package 'dplyr' was built under R version 3.6.3

## Warning: package 'forcats' was built under R version 3.6.3

## -- Conflicts -------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks jsonlite::flatten()
## x dplyr::lag()     masks stats::lag()

# Stringi
library(stringi)

## Warning: package 'stringi' was built under R version 3.6.2

Téléchargeons les fichiers JSON sur OPEN FDA

#for(i in 1:5){
  #Récupération des URLs
  #url = paste0("http://download.open.fda.gov/drug/label/drug-label-000",i,"-of-0009.json.zip")
  
  #Téléchargeons les fichiers
  #download(url,dest = "fichier_json.zip")
  
  #Dézippons les fichiers
  #unzip("fichier_json.zip")
#}

Vérifions si nous avons télécharger les 5 fichiers .json

dir()

##  [1] "drug-label-0001-of-0009.json"  "drug-label-0002-of-0009.json" 
##  [3] "drug-label-0003-of-0009.json"  "drug-label-0004-of-0009.json" 
##  [5] "drug-label-0005-of-0009.json"  "fichier_json.zip"             
##  [7] "images.png"                    "Json_R_image.jpg"             
##  [9] "Json_R_image.psd"              "Json_R_notebook_image.jpg"    
## [11] "manip_Json_avec_R.psd"         "Manipulation_JSON_avec_R.html"
## [13] "Manipulation_JSON_avec_R.Rmd"

Apparement oui nous avons télécharger et dézipper les fichiers .Json

Créons une variable qui contiendra les noms des fichiers Json (list.files())

liste_fichier_json <- list.files(pattern = "*.json", full.names = T)
liste_fichier_json

## [1] "./drug-label-0001-of-0009.json" "./drug-label-0002-of-0009.json"
## [3] "./drug-label-0003-of-0009.json" "./drug-label-0004-of-0009.json"
## [5] "./drug-label-0005-of-0009.json" "./fichier_json.zip"

Comprendre les données en analysant un des fichiers Json

# Lecture du fichier JSON avec fromJSON
fichier_drug_1 <- fromJSON(liste_fichier_json[1])

Composition de notre Json

names(fichier_drug_1)

## [1] "meta"    "results"

Voyons ce que contient la variable meta

Nous ne voyons pas de dataframe ou tableau à l’intérieur de cette variable meta

fichier_drug_1$meta

## $last_updated
## [1] "2020-08-12"
## 
## $terms
## [1] "https://open.fda.gov/terms/"
## 
## $results
## $results$skip
## [1] 0
## 
## $results$total
## [1] 169793
## 
## $results$limit
## [1] 20000
## 
## 
## $license
## [1] "https://open.fda.gov/license/"
## 
## $disclaimer
## [1] "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service."

Nous ne voyons pas de dataframe ou tableau à l’intérieur de cette variable meta. Nous voyons que il ne contient que des informations sur les données comme sa date de mise à jour (la dernière), le nombre total de ligne, la limite, la licence et les avertissements

Voyons ce que contient la variable Results

fichier_drug1_results <- fichier_drug_1$results
# head(fichier_drug1_results)

Nous voyons que cela ressemble à un dataframe, mais les données sont pas comme on a l’habitude de les voir. On peut aussi confirmer que le nombre de lignes est identique à celui indiqué dans le méta (Si on utilise Rstudio bien sûre)

Confirmons le type de la variable results

class(fichier_drug1_results)

## [1] "data.frame"

Aperçu de la variable Results

glimpse(fichier_drug1_results)

## Rows: 20,000
## Columns: 141
## $ active_ingredient                                                <list> [...
## $ storage_and_handling                                             <list> [...
## $ warnings                                                         <list> [...
## $ inactive_ingredient                                              <list> [...
## $ do_not_use                                                       <list> [...
## $ openfda                                                          <df[,21]> ...
## $ keep_out_of_reach_of_children                                    <list> [...
## $ spl_product_data_elements                                        <list> [...
## $ package_label_principal_display_panel                            <list> [...
## $ set_id                                                           <chr> "a...
## $ dosage_and_administration                                        <list> [...
## $ version                                                          <chr> "1...
## $ stop_use                                                         <list> [...
## $ purpose                                                          <list> [...
## $ when_using                                                       <list> [...
## $ effective_time                                                   <chr> "2...
## $ id                                                               <chr> "a...
## $ indications_and_usage                                            <list> [...
## $ ask_doctor                                                       <list> [...
## $ other_safety_information                                         <list> [...
## $ questions                                                        <list> [...
## $ warnings_and_cautions                                            <list> [...
## $ instructions_for_use                                             <list> [...
## $ description                                                      <list> [...
## $ clinical_pharmacology                                            <list> [...
## $ adverse_reactions                                                <list> [...
## $ how_supplied                                                     <list> [...
## $ precautions                                                      <list> [...
## $ general_precautions                                              <list> [...
## $ pregnancy_or_breast_feeding                                      <list> [...
## $ spl_unclassified_section                                         <list> [...
## $ ask_doctor_or_pharmacist                                         <list> [...
## $ boxed_warning                                                    <list> [...
## $ boxed_warning_table                                              <list> [...
## $ description_table                                                <list> [...
## $ drug_abuse_and_dependence                                        <list> [...
## $ contraindications                                                <list> [...
## $ spl_medguide                                                     <list> [...
## $ overdosage                                                       <list> [...
## $ references                                                       <list> [...
## $ dosage_and_administration_table                                  <list> [...
## $ keep_out_of_reach_of_children_table                              <list> [...
## $ warnings_table                                                   <list> [...
## $ active_ingredient_table                                          <list> [...
## $ nonclinical_toxicology                                           <list> [...
## $ patient_medication_information                                   <list> [...
## $ clinical_studies                                                 <list> [...
## $ use_in_specific_populations                                      <list> [...
## $ drug_interactions                                                <list> [...
## $ carcinogenesis_and_mutagenesis_and_impairment_of_fertility       <list> [...
## $ pregnancy                                                        <list> [...
## $ pediatric_use                                                    <list> [...
## $ information_for_patients                                         <list> [...
## $ clinical_pharmacology_table                                      <list> [...
## $ geriatric_use                                                    <list> [...
## $ nursing_mothers                                                  <list> [...
## $ teratogenic_effects                                              <list> [...
## $ purpose_table                                                    <list> [...
## $ laboratory_tests                                                 <list> [...
## $ pharmacokinetics                                                 <list> [...
## $ nonteratogenic_effects                                           <list> [...
## $ how_supplied_table                                               <list> [...
## $ adverse_reactions_table                                          <list> [...
## $ information_for_owners_or_caregivers                             <list> [...
## $ indications_and_usage_table                                      <list> [...
## $ safe_handling_warning                                            <list> [...
## $ spl_unclassified_section_table                                   <list> [...
## $ labor_and_delivery                                               <list> [...
## $ controlled_substance                                             <list> [...
## $ abuse                                                            <list> [...
## $ drug_and_or_laboratory_test_interactions                         <list> [...
## $ dependence                                                       <list> [...
## $ pharmacodynamics                                                 <list> [...
## $ pharmacokinetics_table                                           <list> [...
## $ microbiology                                                     <list> [...
## $ dosage_forms_and_strengths                                       <list> [...
## $ clinical_studies_table                                           <list> [...
## $ drug_interactions_table                                          <list> [...
## $ mechanism_of_action                                              <list> [...
## $ animal_pharmacology_and_or_toxicology                            <list> [...
## $ spl_patient_package_insert                                       <list> [...
## $ recent_major_changes                                             <list> [...
## $ recent_major_changes_table                                       <list> [...
## $ spl_patient_package_insert_table                                 <list> [...
## $ pharmacodynamics_table                                           <list> [...
## $ warnings_and_cautions_table                                      <list> [...
## $ precautions_table                                                <list> [...
## $ user_safety_warnings                                             <list> [...
## $ microbiology_table                                               <list> [...
## $ dosage_forms_and_strengths_table                                 <list> [...
## $ information_for_patients_table                                   <list> [...
## $ package_label_principal_display_panel_table                      <list> [...
## $ intended_use_of_the_device                                       <list> [...
## $ troubleshooting                                                  <list> [...
## $ statement_of_identity                                            <list> [...
## $ route                                                            <list> [...
## $ summary_of_safety_and_effectiveness                              <list> [...
## $ cleaning                                                         <list> [...
## $ components                                                       <list> [...
## $ spl_medguide_table                                               <list> [...
## $ accessories                                                      <list> [...
## $ stop_use_table                                                   <list> [...
## $ inactive_ingredient_table                                        <list> [...
## $ pediatric_use_table                                              <list> [...
## $ disposal_and_waste_handling                                      <list> [...
## $ contraindications_table                                          <list> [...
## $ spl_indexing_data_elements                                       <list> [...
## $ storage_and_handling_table                                       <list> [...
## $ guaranteed_analysis_of_feed                                      <list> [...
## $ diagram_of_device                                                <list> [...
## $ animal_pharmacology_and_or_toxicology_table                      <list> [...
## $ pregnancy_table                                                  <list> [...
## $ use_in_specific_populations_table                                <list> [...
## $ instructions_for_use_table                                       <list> [...
## $ geriatric_use_table                                              <list> [...
## $ drug_and_or_laboratory_test_interactions_table                   <list> [...
## $ laboratory_tests_table                                           <list> [...
## $ references_table                                                 <list> [...
## $ overdosage_table                                                 <list> [...
## $ pharmacogenomics                                                 <list> [...
## $ risks                                                            <list> [...
## $ general_precautions_table                                        <list> [...
## $ drug_abuse_and_dependence_table                                  <list> [...
## $ abuse_table                                                      <list> [...
## $ teratogenic_effects_table                                        <list> [...
## $ patient_medication_information_table                             <list> [...
## $ assembly_or_installation_instructions                            <list> [...
## $ questions_table                                                  <list> [...
## $ nonclinical_toxicology_table                                     <list> [...
## $ carcinogenesis_and_mutagenesis_and_impairment_of_fertility_table <list> [...
## $ nonteratogenic_effects_table                                     <list> [...
## $ when_using_table                                                 <list> [...
## $ do_not_use_table                                                 <list> [...
## $ health_care_provider_letter                                      <list> [...
## $ health_care_provider_letter_table                                <list> [...
## $ veterinary_indications                                           <list> [...
## $ pregnancy_or_breast_feeding_table                                <list> [...
## $ ask_doctor_or_pharmacist_table                                   <list> [...
## $ ask_doctor_table                                                 <list> [...
## $ mechanism_of_action_table                                        <list> [...
## $ dependence_table                                                 <list> [...

On peut se rendre compte que notre variable possède, 20.000 observations et 141 colonnes. On peut également se rendre compte que ce que nous cherchons c’est un dataframe et nous le voyons dans la variable Openfda inclu dans results

Contenu de la variable Openfda contenu dans results

glimpse(fichier_drug1_results$openfda)

## Rows: 20,000
## Columns: 21
## $ manufacturer_name             <list> ["Tonic Bath & Products Shenzhen Ltd...
## $ unii                          <list> ["3K9958V90M", <"G63QQF2NOX", "95OOS...
## $ product_type                  <list> ["HUMAN OTC DRUG", "HUMAN OTC DRUG",...
## $ rxcui                         <list> ["2050977", NULL, NULL, NULL, "90528...
## $ spl_set_id                    <list> ["a517190f-2a21-3016-e053-2a95a90abd...
## $ route                         <list> ["TOPICAL", "TOPICAL", "TOPICAL", NU...
## $ generic_name                  <list> ["ETHYL ALCOHOL", "AVOBENZONE, HOMOS...
## $ brand_name                    <list> ["Hand Sanitizer", "KROGER SPORT SPF...
## $ product_ndc                   <list> ["43333-484", "30142-808", "76841-48...
## $ substance_name                <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZO...
## $ spl_id                        <list> ["a51710ae-6bcc-a672-e053-2995a90aae...
## $ application_number            <list> ["part333A", "part352", "part333A", ...
## $ is_original_packager          <list> [TRUE, TRUE, TRUE, NULL, NULL, TRUE,...
## $ package_ndc                   <list> ["43333-484-08", "30142-808-37", "76...
## $ upc                           <list> [NULL, "0041260383328", NULL, NULL, ...
## $ original_packager_product_ndc <list> [NULL, NULL, NULL, NULL, "70954-211"...
## $ nui                           <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_epc               <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_pe                <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_cs                <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_moa               <list> [NULL, NULL, NULL, NULL, NULL, NULL,...

Nous pouvons observer 21 colonnes et 20.000 observations, ce qui est déjà mieux.

Fusionnons tous les fichiers pour nos analyses.

Nous allons inclure 2 autres variables, “effective_time” et “boxed_warning”

# Affectation des 2 colonnes dans la variable "effective_time" et "boxed_warning" 
result_1 <- fromJSON(liste_fichier_json[1])$results %>% select(effective_time,boxed_warning)
result_2 <- fromJSON(liste_fichier_json[2])$results %>% select(effective_time,boxed_warning)
result_3 <- fromJSON(liste_fichier_json[3])$results %>% select(effective_time,boxed_warning)
result_4 <- fromJSON(liste_fichier_json[4])$results %>% select(effective_time,boxed_warning)
result_5 <- fromJSON(liste_fichier_json[5])$results %>% select(effective_time,boxed_warning)

# Concatenons tous ces résultats
spl <- rbind(result_1,result_2,result_3,result_4,result_5)

Dimension de la variable spl?

cat("Notre variable spl contient",nrow(spl),"lignes et",ncol(spl),"colonnes")

## Notre variable spl contient 100000 lignes et 2 colonnes

Structure de ces 2 colonnes que nous venons de concatener

glimpse(spl)

## Rows: 100,000
## Columns: 2
## $ effective_time <chr> "20200507", "20200508", "20200510", "20181108", "201...
## $ boxed_warning  <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...

Un bémole ici, boxed_warning est une liste et effective_time un string

Comprendre la donnée “effective_time”

Nombre max de caractère

cat('Nombre maximal de caractère dans la variable effective_time :',max(nchar(spl$effective_time)))

## Nombre maximal de caractère dans la variable effective_time : 10

Nombre min de caractère

cat('Nombre minimal de caractère dans la variable effective_time :',min(nchar(spl$effective_time)))

## Nombre minimal de caractère dans la variable effective_time : 8

Nombre de ligne par caractéristique de la variable “effective_time”

spl %>% group_by(nchar(effective_time)) %>% summarise(nombre = n())

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 3 x 2
##   `nchar(effective_time)` nombre
##                     <int>  <int>
## 1                       8  99998
## 2                       9      1
## 3                      10      1

Le format de l’heure peut poser un problème compte tenue du fait que nous avons des nombres de caractères différents. une valeur de 9 et 10 caractères pour 99998 pour 8 caractères, il se peut qu’il y’ait une erreur dans ces 2 lignes ayant 8 et 9 caractères.

Regardons la valeur de “effective_time” qui possède 10 caractères.

spl %>% filter(nchar(effective_time) == 10) %>% select(effective_time)

##   effective_time
## 1     2009103009

Regardons la valeur de “effective_time” qui possède 9 caractères.

spl %>% filter(nchar(effective_time) == 9) %>% select(effective_time)

##   effective_time
## 1      201108178

Regardons la valeur de “effective_time” qui possède 9 caractères.

# Affichons juste les 6 premières lignes
head(spl %>% filter(nchar(effective_time) == 8)) %>%  select(effective_time)

##   effective_time
## 1       20200507
## 2       20200508
## 3       20200510
## 4       20181108
## 5       20191110
## 6       20150925

Extraction des 8 premiers caractères de “effective_time” ayant 9 et 10 caractères

spl <- spl %>% mutate(effective_time = stri_sub(effective_time,1,8))

Vérifions si toutes nos lignes ont désormais 8 caractères (nous devons avoir maintenant 100.000)

spl %>% group_by(nchar(effective_time)) %>% summarise(nombre = n())

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 1 x 2
##   `nchar(effective_time)` nombre
##                     <int>  <int>
## 1                       8 100000

C’est parfait, toutes nos lignes de “effective_time” ont désormais le nombre de caractères requis

Convertissons maintenant “effective_time” en format dmy

# Convertissons en format date
spl <- spl %>% mutate(effective_time = lubridate::ymd(effective_time))

# Vérifions les si les changements ont eu lieu
glimpse(spl)

## Rows: 100,000
## Columns: 2
## $ effective_time <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, 201...
## $ boxed_warning  <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...

Nombre de soumissions par jour

# Requête
soumission_par_jour <- spl %>% group_by(effective_time) %>% summarise(nombre_de_valeur = n())

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_1 <- soumission_par_jour %>% ggplot(aes(effective_time, nombre_de_valeur)) +
                                    geom_line() +
                                    ggtitle("Présentation journalière d'étiquetage des médicaments") +
                                    xlab("Effective time") +
                                    ylab("Total") +
                                    theme_minimal() +
                                    theme(axis.title.x = element_text(color = "red"),
                                          axis.title.y = element_text(color = "red"),
                                          axis.text = element_text(color = "blue"),
                                          plot.title = element_text(size = 15, hjust = 0.5, color = "Orange"))
graphe_1

On peut se rendre compte qu’il y’a peu de données avant 2009

Enlevons les données avant 2009

# Requête
req_1 <- spl %>% group_by(effective_time) %>% filter(effective_time > "2008-12-31") %>% summarise(nombre_de_valeur = n())

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_2 <- req_1 %>% ggplot(aes(effective_time, nombre_de_valeur)) +
                      geom_line() +
                      ggtitle("Présentation journalière d'étiquetage des médicaments") +
                      xlab("Effective time") +
                      ylab("Total") +
                      theme_minimal() +
                      theme(axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.text = element_text(color = "blue"),
                            plot.title = element_text(size = 15, hjust = 0.5, color = "Orange"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"),
                            legend.text = element_text(colour = "blue", face = "bold"),
                            legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_2

ce graphe peut nous informer sur la tendance mais il est toujours illisible et ne nous aide pas énormément. Essayons de trouver le nombre de rapport par année

présentation annuelle d’étiquetage des médicaments

# Requête
req_2 <- spl %>% mutate(annee = lubridate::year(effective_time)) %>% group_by(annee) %>% filter(annee > "2008") %>% summarise(nombre_de_valeur = n())

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_3 <- req_2 %>% ggplot(aes(annee, nombre_de_valeur, fill = nombre_de_valeur)) +
                      geom_bar(stat = "identity", color = "grey") +
                      ggtitle("Présentation annuelle d'étiquetage des médicaments") +
                      xlab("Année") +
                      ylab("Total") +
                      theme_minimal() +
                      labs(fill = "Total") +
                      scale_fill_gradient(low = "yellow", high = "red") +
                      scale_x_continuous(breaks=seq(2008, 2020, by = 1))+
                      theme(axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.text = element_text(color = "blue"),
                            plot.title = element_text(size = 20, hjust = 0.5, color = "Orange"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"),
                            legend.text = element_text(colour = "blue", face = "bold"),
                            legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_3

L’avertissement le plus fort exigé par la FDA:

Qui indique que le médicament comporte un risque important d’effets indésirables graves, voire mortels.

# Ici j'ai dû tester plein d'intervalle par contre séquencer par pas de 100
for(i in 16:20){
  print(select(spl,boxed_warning)[i,])
}

## [[1]]
## NULL
## 
## [[1]]
## NULL
## 
## [[1]]
## [1] "WARNING: ADDICTION, ABUSE, AND MISUSE: RISK EVALUATION AND MITIGATION STRATEGY (REMS); LIFE-THREATENING RESPIRATORY DEPRESSION; ACCIDENTAL INGESTION; NEONATAL OPIOID; WITHDRAWAL SYNDROME; CYTOCHROME P450 3A4 INTERACTION; HEPATOTOXICITY AND RISKS FROM CONCOMITANT USE WITH BENZODIAZEPINES OR OTHER CNS DEPRESSANTS Addiction, Abuse, and Misuse Hydrocodone bitartrate and acetaminophen tablets exposes patients and other users to the risks of opioid addiction, abuse, and misuse, which can lead to overdose and death. Assess each patient’s risk prior to prescribing hydrocodone bitartrate and acetaminophen tablets, and monitor all patients regularly for the development of these behaviors or conditions [see WARNINGS]. Opioid Analgesic Risk Evaluation and Mitigation Strategy (REMS): To ensure that the benefits of opioid analgesics outweigh the risks of addiction, abuse, and misuse, the Food and Drug Administration (FDA) has required a REMS for these products [see WARNINGS]. Under the requirements of the REMS, drug companies with approved opioid analgesic products must make REMS-compliant education programs available to healthcare providers. Healthcare providers are strongly encouraged to complete a REMS-compliant education program, counsel patients and/or their caregivers, with every prescription, on safe use, serious risks, storage, and disposal of these products, emphasize to patients and their caregivers the importance of reading the Medication Guide every time it is provided by their pharmacist, and Consider other tools to improve patient, household, and community safety. Life-Threatening Respiratory Depression Serious, life-threatening, or fatal respiratory depression may occur with use of hydrocodone bitartrate and acetaminophen tablets. Monitor for respiratory depression, especially during initiation of hydrocodone bitartrate and acetaminophen tablets or following a dose increase [seeWARNINGS]. Accidental Ingestion Accidental ingestion of hydrocodone bitartrate and acetaminophen tablets, especially by children, can result in a fatal overdose of Hydrocodone bitartrate and acetaminophen tablets [seeWARNINGS]. Neonatal Opioid Withdrawal Syndrome Prolonged use of hydrocodone bitartrate and acetaminophen tablets during pregnancy can result in neonatal opioid withdrawal syndrome, which may be life-threatening if not recognized and treated, and requires management according to protocols developed by neonatology experts. If opioid use is required for a prolonged period in a pregnant woman, advise the patient of the risk of neonatal opioid withdrawal syndrome and ensure that appropriate treatment will be available [seeWARNINGS]. Cytochrome P450 3A4 Interaction The concomitant use of Hydrocodone Bitartrate and Acetaminophen Tablets with all Cytochrome P450 3A4 inhibitors may result in an increase in hydrocodone plasma concentrations, which could increase or prolong adverse reactions and may cause potentially fatal respiratory depression. In addition, discontinuation of a concomitantly used Cytochrome P450 3A4 inducer may result in an increase in hydrocodone plasma concentrations. Monitor patients receiving Hydrocodone Bitartrate and Acetaminophen Tablets and any Cytochrome P450 3A4 inhibitor or inducer for signs of respiratory depression or sedation [see CLINICAL PHARMACOLOGY, WARNINGS, PRECAUTIONS;DRUG INTERACTIONS]. Hepatotoxicity Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4,000 milligrams per day, and often involve more than one acetaminophen-containing product (see WARNINGS). Risks from concomitant use with Benzodiazepines or other CNS Depressants Concomitant use of opiods with benzodiazepines or other central nervous system (CNS) depressants including alcohol, may result in profound sedation, respiratory depression, coma, and death. [seeWARNINGS,PRECAUTIONS; DRUG INTERACTIONS] • Reserve concomitant prescribing of Hydrocodone bitartrate and acetaminophen tablets and benzodiazepines or other CNS depressants for use in patients for whom alternative treatment options are inadequate. • Limit dosages and durations to the minimum required. • Follow patients for signs and symptoms of respiratory depression and sedation."
## 
## [[1]]
## NULL
## 
## [[1]]
## NULL

Nous pouvons voir quele warning correspond à notre warning que nous recherchons

Changeons les “NULL” en “NA”

# changeons les "NULL" en "NA"
spl$boxed_warning[spl$boxed_warning == "NULL"] = NA

# Vérifions si les changements ont été opéré
head(spl$boxed_warning)

## [[1]]
## [1] NA
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] NA
## 
## [[4]]
## [1] NA
## 
## [[5]]
## [1] NA
## 
## [[6]]
## [1] NA

Présentation annuelle d’étiquetage des médicaments avec avertissement encadré

# Requête
req_3 <- spl %>% mutate(annee = lubridate::year(effective_time)) %>% group_by(annee) %>% filter(annee > "2008") %>% summarise(nombre_precaution = sum(!is.na(boxed_warning)))

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_4 <- req_3 %>% ggplot(aes(annee, nombre_precaution, fill = nombre_precaution)) +
                      geom_bar(stat = "identity", color = "grey") +
                      ggtitle("Présentation annuelle d'étiquetage des médicaments avec avertissement encadré") +
                      xlab("Année") +
                      ylab("Total") +
                      theme_minimal() +
                      labs(fill = "Total") +
                      scale_fill_gradient(low = "green", high = "red") +
                      scale_x_continuous(breaks=seq(2008, 2020, by = 1))+
                      theme(axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.text = element_text(color = "blue"),
                            plot.title = element_text(size = 13, hjust = 0.5, color = "Orange"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"),
                            legend.text = element_text(colour = "blue", face = "bold"),
                            legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_4

Aperçu des uniques précautions

head(unique(spl$boxed_warning))

## [[1]]
## [1] NA
## 
## [[2]]
## [1] "WARNING: ADDICTION, ABUSE, AND MISUSE: RISK EVALUATION AND MITIGATION STRATEGY (REMS); LIFE-THREATENING RESPIRATORY DEPRESSION; ACCIDENTAL INGESTION; NEONATAL OPIOID; WITHDRAWAL SYNDROME; CYTOCHROME P450 3A4 INTERACTION; HEPATOTOXICITY AND RISKS FROM CONCOMITANT USE WITH BENZODIAZEPINES OR OTHER CNS DEPRESSANTS Addiction, Abuse, and Misuse Hydrocodone bitartrate and acetaminophen tablets exposes patients and other users to the risks of opioid addiction, abuse, and misuse, which can lead to overdose and death. Assess each patient’s risk prior to prescribing hydrocodone bitartrate and acetaminophen tablets, and monitor all patients regularly for the development of these behaviors or conditions [see WARNINGS]. Opioid Analgesic Risk Evaluation and Mitigation Strategy (REMS): To ensure that the benefits of opioid analgesics outweigh the risks of addiction, abuse, and misuse, the Food and Drug Administration (FDA) has required a REMS for these products [see WARNINGS]. Under the requirements of the REMS, drug companies with approved opioid analgesic products must make REMS-compliant education programs available to healthcare providers. Healthcare providers are strongly encouraged to complete a REMS-compliant education program, counsel patients and/or their caregivers, with every prescription, on safe use, serious risks, storage, and disposal of these products, emphasize to patients and their caregivers the importance of reading the Medication Guide every time it is provided by their pharmacist, and Consider other tools to improve patient, household, and community safety. Life-Threatening Respiratory Depression Serious, life-threatening, or fatal respiratory depression may occur with use of hydrocodone bitartrate and acetaminophen tablets. Monitor for respiratory depression, especially during initiation of hydrocodone bitartrate and acetaminophen tablets or following a dose increase [seeWARNINGS]. Accidental Ingestion Accidental ingestion of hydrocodone bitartrate and acetaminophen tablets, especially by children, can result in a fatal overdose of Hydrocodone bitartrate and acetaminophen tablets [seeWARNINGS]. Neonatal Opioid Withdrawal Syndrome Prolonged use of hydrocodone bitartrate and acetaminophen tablets during pregnancy can result in neonatal opioid withdrawal syndrome, which may be life-threatening if not recognized and treated, and requires management according to protocols developed by neonatology experts. If opioid use is required for a prolonged period in a pregnant woman, advise the patient of the risk of neonatal opioid withdrawal syndrome and ensure that appropriate treatment will be available [seeWARNINGS]. Cytochrome P450 3A4 Interaction The concomitant use of Hydrocodone Bitartrate and Acetaminophen Tablets with all Cytochrome P450 3A4 inhibitors may result in an increase in hydrocodone plasma concentrations, which could increase or prolong adverse reactions and may cause potentially fatal respiratory depression. In addition, discontinuation of a concomitantly used Cytochrome P450 3A4 inducer may result in an increase in hydrocodone plasma concentrations. Monitor patients receiving Hydrocodone Bitartrate and Acetaminophen Tablets and any Cytochrome P450 3A4 inhibitor or inducer for signs of respiratory depression or sedation [see CLINICAL PHARMACOLOGY, WARNINGS, PRECAUTIONS;DRUG INTERACTIONS]. Hepatotoxicity Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4,000 milligrams per day, and often involve more than one acetaminophen-containing product (see WARNINGS). Risks from concomitant use with Benzodiazepines or other CNS Depressants Concomitant use of opiods with benzodiazepines or other central nervous system (CNS) depressants including alcohol, may result in profound sedation, respiratory depression, coma, and death. [seeWARNINGS,PRECAUTIONS; DRUG INTERACTIONS] • Reserve concomitant prescribing of Hydrocodone bitartrate and acetaminophen tablets and benzodiazepines or other CNS depressants for use in patients for whom alternative treatment options are inadequate. • Limit dosages and durations to the minimum required. • Follow patients for signs and symptoms of respiratory depression and sedation."
## 
## [[3]]
## [1] "BOXED WARNING Cessation of Therapy with TENORMINPatients with coronary artery disease, who are being treated with TENORMIN, should be advised against abrupt discontinuation of therapy. Severe exacerbation of angina and the occurrence of myocardial infarction and ventricular arrhythmias have been reported in angina patients following the abrupt discontinuation of therapy with beta blockers. The last two complications may occur with or without preceding exacerbation of the angina pectoris. As with other beta blockers, when discontinuation of TENORMIN is planned, the patients should be carefully observed and advised to limit physical activity to a minimum. If the angina worsens or acute coronary insufficiency develops, it is recommended that TENORMIN be promptly reinstituted, at least temporarily. Because coronary artery disease is common and may be unrecognized, it may be prudent not to discontinue TENORMIN therapy abruptly even in patients treated only for hypertension. (SeeDOSAGE AND ADMINISTRATION.)"
## 
## [[4]]
## [1] "*Based on a review of this drug (isometheptene mucate) by the National Academy of Sciences - National Research Council and/or other information, FDA has classified the other indication as \"possibly\" effective in the treatment of migraine headache. Final classification of the less-than-effective indication requires further investigation."
## 
## [[5]]
## [1] "BOXED WARNING Hepatotoxicity: Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4000 milligrams per day, and often involve more than one acetaminophen-containing product."
## 
## [[6]]
## [1] "SUICIDALITY AND ANTIDEPRESSANT DRUGS Antidepressants increased the risk compared to placebo of suicidal thinking and behavior (suicidality) in children, adolescents, and young adults in short-term studies of major depressive disorder (MDD) and other psychiatric disorders. Anyone considering the use of nortriptyline hydrochloride or any other antidepressant in a child, adolescent, or young adult must balance this risk with the clinical need. Short-term studies did not show an increase in the risk of suicidality with antidepressants compared to placebo in adults beyond age 24; there was a reduction in risk with antidepressants compared to placebo in adults aged 65 and older. Depression and certain other psychiatric disorders are themselves associated with increases in the risk of suicide. Patients of all ages who are started on antidepressant therapy should be monitored appropriately and observed closely for clinical worsening, suicidality, or unusual changes in behavior. Families and caregivers should be advised of the need for close observation and communication with the prescriber. Nortriptyline hydrochloride is not approved for use in pediatric patients. (See WARNINGS: Clinical Worsening and Suicide Risk , PRECAUTIONS: Information for Patients ; and PRECAUTIONS, Pediatric Use )."

Changeons la boxed_warning qui est une liste en caractère (“unlist”)

# Transformons les [NA,NA] en ["NA","NA"]
for(i in 1:nrow(spl)){
  spl$boxed_warning[i] = paste0(unlist(spl$boxed_warning[i]),collapse = ",")
}

# Délistons la colonne entière, nous partirons de ["NA","NA"] à "NA","NA"
spl$boxed_warning = unlist(spl$boxed_warning)

Voyons si les changements ont été opéré

glimpse(spl)

## Rows: 100,000
## Columns: 2
## $ effective_time <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, 201...
## $ boxed_warning  <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"...

Ajoutons des variables additionnelles

# Récupérons les dataframes nommés OpenFDA du Json et Sélectionnons les colonnes qui nous interessent
openfda_1 <- fromJSON(liste_fichier_json[1])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)

openfda_2 <- fromJSON(liste_fichier_json[2])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
  
openfda_3 <- fromJSON(liste_fichier_json[3])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)

openfda_4 <- fromJSON(liste_fichier_json[4])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)

openfda_5 <- fromJSON(liste_fichier_json[5])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)

# Concatenons les openfda par ligne
spl_openfda <- rbind(openfda_1,openfda_2,openfda_3,openfda_4,openfda_5)

Aperçu de notre variable “openfda_spl”

glimpse(spl_openfda)

## Rows: 100,000
## Columns: 6
## $ manufacturer_name <list> ["Tonic Bath & Products Shenzhen Ltd", "THE KROG...
## $ product_type      <list> ["HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC ...
## $ route             <list> ["TOPICAL", "TOPICAL", "TOPICAL", NULL, "ORAL", ...
## $ generic_name      <list> ["ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTIS...
## $ brand_name        <list> ["Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREE...
## $ substance_name    <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...

On peut observer que nous avons dans notre dataframe spl_openfda a 100.000 lignes et 6 colonnes

Colonne “product_type”

Vérifions si nous avons des “NULL” dans la colonne “product_type”

"NULL" %in% (spl_openfda$product_type)

## [1] TRUE

# On peut aussi passer par l'unicité de la colonne pour trouver les 3 valeurs uniques: "HUMAN OTC DRUG", "HUMAN PRESCRIPTION DRUG" et "NULL"
# unique(spl_openfda$product_type)

Changeons les “NULL” en “NA”

spl_openfda$product_type[spl_openfda$product_type == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu
unique(spl_openfda$product_type)

## [[1]]
## [1] "HUMAN OTC DRUG"
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] "HUMAN PRESCRIPTION DRUG"

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “product_type”

Délistons la colonne “product_type”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$product_type[i]=paste0(unlist(spl_openfda$product_type[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(product_type = unlist(product_type))

Colonne “route”

Vérifions si nous avons des “NULL” dans la colonne “route”

"NULL" %in% (spl_openfda$route)

## [1] TRUE

Changeons les “NULL” en “NA”

spl_openfda$route[spl_openfda$route == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)
head(unique(spl_openfda$route), 20)

## [[1]]
## [1] "TOPICAL"
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] "ORAL"
## 
## [[4]]
## [1] "BUCCAL"
## 
## [[5]]
## [1] "EXTRACORPOREAL"
## 
## [[6]]
## [1] "DENTAL"
## 
## [[7]]
## [1] "OPHTHALMIC"
## 
## [[8]]
## [1] "RESPIRATORY (INHALATION)"
## 
## [[9]]
## [1] "AURICULAR (OTIC)"
## 
## [[10]]
## [1] "CUTANEOUS"
## 
## [[11]]
## [1] "VAGINAL"
## 
## [[12]]
## [1] "TOPICAL" "DENTAL" 
## 
## [[13]]
## [1] "INTRAMUSCULAR" "PARENTERAL"    "INTRAVENOUS"   "SUBCUTANEOUS" 
## 
## [[14]]
## [1] "INTRAVENOUS"
## 
## [[15]]
## [1] "INTRAMUSCULAR" "INTRAVENOUS"  
## 
## [[16]]
## [1] "PERIODONTAL" "DENTAL"      "ORAL"       
## 
## [[17]]
## [1] "PARENTERAL"
## 
## [[18]]
## [1] "DENTAL" "ORAL"  
## 
## [[19]]
## [1] "SUBLINGUAL"
## 
## [[20]]
## [1] "TRANSDERMAL"

# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$route)

## [1] FALSE

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “route”

Délistons la colonne “product_type”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$route[i]=paste0(unlist(spl_openfda$route[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(route=unlist(route))

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

head(unique(spl_openfda$route),20)

##  [1] "TOPICAL"                                          
##  [2] "NA"                                               
##  [3] "ORAL"                                             
##  [4] "BUCCAL"                                           
##  [5] "EXTRACORPOREAL"                                   
##  [6] "DENTAL"                                           
##  [7] "OPHTHALMIC"                                       
##  [8] "RESPIRATORY (INHALATION)"                         
##  [9] "AURICULAR (OTIC)"                                 
## [10] "CUTANEOUS"                                        
## [11] "VAGINAL"                                          
## [12] "TOPICAL,DENTAL"                                   
## [13] "INTRAMUSCULAR,PARENTERAL,INTRAVENOUS,SUBCUTANEOUS"
## [14] "INTRAVENOUS"                                      
## [15] "INTRAMUSCULAR,INTRAVENOUS"                        
## [16] "PERIODONTAL,DENTAL,ORAL"                          
## [17] "PARENTERAL"                                       
## [18] "DENTAL,ORAL"                                      
## [19] "SUBLINGUAL"                                       
## [20] "TRANSDERMAL"

Colonne “manufacturer_name”

Vérifions si nous avons des “NULL” dans la colonne “manufacturer_name”

"NULL" %in% (spl_openfda$manufacturer_name)

## [1] TRUE

Changeons les “NULL” en “NA”

spl_openfda$manufacturer_name[spl_openfda$manufacturer_name == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$manufacturer_name), 5)

## [[1]]
## [1] "Tonic Bath & Products Shenzhen Ltd"
## 
## [[2]]
## [1] "THE KROGER COMPANY"
## 
## [[3]]
## [1] "Earth Edition Inc."
## 
## [[4]]
## [1] NA
## 
## [[5]]
## [1] "REMEDYREPACK INC."

# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$manufacturer_name)

## [1] FALSE

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “manufacturer_name”

Délistons la colonne “manufacturer_name”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$manufacturer_name[i]=paste0(unlist(spl_openfda$manufacturer_name[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(manufacturer_name=unlist(manufacturer_name))

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)

head(unique(spl_openfda$manufacturer_name),5)

## [1] "Tonic Bath & Products Shenzhen Ltd" "THE KROGER COMPANY"                
## [3] "Earth Edition Inc."                 "NA"                                
## [5] "REMEDYREPACK INC."

Colonne “generic_name”

Vérifions si nous avons des “NULL” dans la colonne “generic_name”

"NULL" %in% (spl_openfda$generic_name)

## [1] TRUE

Changeons les “NULL” en “NA”

spl_openfda$generic_name[spl_openfda$generic_name == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$generic_name), 5)

## [[1]]
## [1] "ETHYL ALCOHOL"
## 
## [[2]]
## [1] "AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE"
## 
## [[3]]
## [1] "ETHANOL"
## 
## [[4]]
## [1] NA
## 
## [[5]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"

# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$generic_name)

## [1] FALSE

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “route”

Délistons la colonne “generic_name”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$generic_name[i]=paste0(unlist(spl_openfda$generic_name[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(generic_name=unlist(generic_name))

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

head(unique(spl_openfda$generic_name),5)

## [1] "ETHYL ALCOHOL"                                 
## [2] "AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE"
## [3] "ETHANOL"                                       
## [4] "NA"                                            
## [5] "TRIHEXYPHENIDYL HYDROCHLORIDE"

glimpse(spl_openfda)

## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type      <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route             <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name      <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name        <list> ["Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREE...
## $ substance_name    <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...

Colonne “brand_name”

Vérifions si nous avons des “NULL” dans la colonne “brand_name”

"NULL" %in% (spl_openfda$brand_name)

## [1] TRUE

Changeons les “NULL” en “NA”

spl_openfda$brand_name[spl_openfda$brand_name == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$brand_name), 5)

## [[1]]
## [1] "Hand Sanitizer"
## 
## [[2]]
## [1] "KROGER SPORT SPF 15 SUNSCREEN"
## 
## [[3]]
## [1] "EARTH EDITION HAND SANITIZER Gel"
## 
## [[4]]
## [1] NA
## 
## [[5]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"

# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$brand_name)

## [1] FALSE

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “brand_name”

Délistons la colonne “brand_name”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$brand_name[i]=paste0(unlist(spl_openfda$brand_name[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(brand_name=unlist(brand_name))

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

head(unique(spl_openfda$brand_name),5)

## [1] "Hand Sanitizer"                   "KROGER SPORT SPF 15 SUNSCREEN"   
## [3] "EARTH EDITION HAND SANITIZER Gel" "NA"                              
## [5] "TRIHEXYPHENIDYL HYDROCHLORIDE"

glimpse(spl_openfda)

## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type      <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route             <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name      <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name        <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name    <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...

Colonne “substance_name”

Vérifions si nous avons des “NULL” dans la colonne “substance_name”

"NULL" %in% (spl_openfda$substance_name)

## [1] TRUE

Changeons les “NULL” en “NA”

spl_openfda$substance_name[spl_openfda$substance_name == "NULL"] = NA

# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$substance_name), 5)

## [[1]]
## [1] "ALCOHOL"
## 
## [[2]]
## [1] "OCTISALATE" "OXYBENZONE" "HOMOSALATE" "AVOBENZONE"
## 
## [[3]]
## [1] NA
## 
## [[4]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"
## 
## [[5]]
## [1] "ASPIRIN"

# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$substance_name)

## [1] FALSE

Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “substance_name”

Délistons la colonne “substance_name”

for(i in 1:nrow(spl_openfda)){
      spl_openfda$substance_name[i]=paste0(unlist(spl_openfda$substance_name[i]),collapse=",")
}

spl_openfda <- spl_openfda %>% mutate(substance_name=unlist(substance_name))

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

head(unique(spl_openfda$substance_name),5)

## [1] "ALCOHOL"                                    
## [2] "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOBENZONE"
## [3] "NA"                                         
## [4] "TRIHEXYPHENIDYL HYDROCHLORIDE"              
## [5] "ASPIRIN"

glimpse(spl_openfda)

## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type      <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route             <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name      <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name        <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name    <chr> "ALCOHOL", "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOB...

Requêtes:

Le nombre de soumissions des plus fréquemment signalées par type de produit et par route.

# Les 20 premières lignes par ordre décroissante
nb_submission <- spl_openfda %>% group_by(product_type, route) %>% summarise(total = n()) %>% arrange(desc(total)) %>% head(20)

## `summarise()` regrouping output by 'product_type' (override with `.groups` argument)

nb_submission

## # A tibble: 20 x 3
## # Groups:   product_type [3]
##    product_type            route                     total
##    <chr>                   <chr>                     <int>
##  1 NA                      NA                        47107
##  2 HUMAN OTC DRUG          TOPICAL                   19209
##  3 HUMAN OTC DRUG          ORAL                      13370
##  4 HUMAN PRESCRIPTION DRUG ORAL                      11882
##  5 HUMAN PRESCRIPTION DRUG INTRAVENOUS                1385
##  6 HUMAN PRESCRIPTION DRUG TOPICAL                    1123
##  7 HUMAN PRESCRIPTION DRUG RESPIRATORY (INHALATION)    683
##  8 HUMAN OTC DRUG          NA                          616
##  9 HUMAN OTC DRUG          DENTAL                      516
## 10 HUMAN PRESCRIPTION DRUG INTRAMUSCULAR,INTRAVENOUS   423
## 11 HUMAN OTC DRUG          EXTRACORPOREAL              382
## 12 HUMAN PRESCRIPTION DRUG NA                          342
## 13 HUMAN OTC DRUG          OPHTHALMIC                  335
## 14 HUMAN PRESCRIPTION DRUG OPHTHALMIC                  312
## 15 HUMAN OTC DRUG          NASAL                       258
## 16 HUMAN OTC DRUG          SUBLINGUAL                  183
## 17 HUMAN PRESCRIPTION DRUG SUBCUTANEOUS                161
## 18 HUMAN PRESCRIPTION DRUG DENTAL                      138
## 19 HUMAN PRESCRIPTION DRUG INTRAMUSCULAR               138
## 20 HUMAN OTC DRUG          RECTAL                      124

Nombre de rapport par type de produit

# Requête
req_4 <- spl_openfda %>% group_by(product_type) %>% summarise(total = n())

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_5 <- req_4 %>% ggplot(aes(product_type, total, fill = total)) +
                      geom_bar(stat = "identity", color = "orange") +
                      theme_light() + 
                      xlab("Type de produit") +
                      scale_fill_gradient(low = "yellow", high = "green") +
                      ggtitle("Nombre de rapport par type de produit") +
                      theme(plot.title = element_text(color = "orange", hjust = 0.5),
                            axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
                            axis.text = element_text(color = "blue"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"))
graphe_5

Nous pouvons voir que nous avons énormément de données sur “Human OTC Drug” et une très grande quantité de soumissions sont inconnus

Les voies les plus fréquentes pour prendre les médicaments

# Requête
req_5 <- spl_openfda %>% group_by(route) %>% 
                                     summarise(total = n()) %>% 
                                     arrange(desc(total)) %>% 
                                     mutate(route = factor(route,levels = route[order(total,decreasing = FALSE)])) %>% 
                                     head(21)

## `summarise()` ungrouping output (override with `.groups` argument)

# Graphe
graphe_6 <- req_5 %>% ggplot(aes(route, total, fill = total)) +
                      geom_bar(stat = "identity", color = "orange") +
                      theme_light() + 
                      xlab("Total") +
                      ylab("Route") + 
                      scale_fill_gradient(low = "yellow", high = "orange") +
                      ggtitle("Routes les plus fréquentes") +
                      theme(plot.title = element_text(color = "orange", hjust = 0.5),
                            axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
                            axis.text = element_text(color = "blue"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red")) +
                      coord_flip()
graphe_6

La voie la plus utilisé pour prendre les médicaments est la voie Oral puis la voie Topical

Concatenons les deux variables “spl” et “spl_openfda”

spl_complet <- cbind(spl,spl_openfda)
head(spl_complet,10)

##    effective_time boxed_warning                  manufacturer_name
## 1      2020-05-07            NA Tonic Bath & Products Shenzhen Ltd
## 2      2020-05-08            NA                 THE KROGER COMPANY
## 3      2020-05-10            NA                 Earth Edition Inc.
## 4      2018-11-08            NA                                 NA
## 5      2019-11-10            NA                  REMEDYREPACK INC.
## 6      2015-09-25            NA            UniGroup Wholesale Inc.
## 7      2015-08-31            NA                                 NA
## 8      2013-02-19            NA                                 NA
## 9      2018-11-02            NA                                 NA
## 10     2019-12-17            NA                 Crane Safety, LLC.
##               product_type   route
## 1           HUMAN OTC DRUG TOPICAL
## 2           HUMAN OTC DRUG TOPICAL
## 3           HUMAN OTC DRUG TOPICAL
## 4                       NA      NA
## 5  HUMAN PRESCRIPTION DRUG    ORAL
## 6           HUMAN OTC DRUG TOPICAL
## 7                       NA      NA
## 8                       NA      NA
## 9                       NA      NA
## 10          HUMAN OTC DRUG    ORAL
##                                      generic_name
## 1                                   ETHYL ALCOHOL
## 2  AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE
## 3                                         ETHANOL
## 4                                              NA
## 5                   TRIHEXYPHENIDYL HYDROCHLORIDE
## 6                                   ETHYL ALCOHOL
## 7                                              NA
## 8                                              NA
## 9                                              NA
## 10                                        ASPIRIN
##                          brand_name                              substance_name
## 1                    Hand Sanitizer                                     ALCOHOL
## 2     KROGER SPORT SPF 15 SUNSCREEN OCTISALATE,OXYBENZONE,HOMOSALATE,AVOBENZONE
## 3  EARTH EDITION HAND SANITIZER Gel                                     ALCOHOL
## 4                                NA                                          NA
## 5     TRIHEXYPHENIDYL HYDROCHLORIDE               TRIHEXYPHENIDYL HYDROCHLORIDE
## 6           anti-bacterial hand gel                                     ALCOHOL
## 7                                NA                                          NA
## 8                                NA                                          NA
## 9                                NA                                          NA
## 10             Crane Safety Aspirin                                     ASPIRIN

glimpse(spl_complet)

## Rows: 100,000
## Columns: 8
## $ effective_time    <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, ...
## $ boxed_warning     <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "...
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type      <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route             <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name      <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name        <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name    <chr> "ALCOHOL", "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOB...

Le nombre de rapports par an par type de produit.

req_6 <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>% 
                                                  group_by(annee,product_type) %>% 
                                                  filter(annee > "2008") %>% 
                                                  summarise(total = n())

## `summarise()` regrouping output by 'annee' (override with `.groups` argument)

# Graphe
graphe_7 <- req_6 %>% ggplot(aes(annee,total, fill = product_type)) +
                      geom_bar(stat = "identity", color = "white") + 
                      ggtitle("Le nombre de rapports par an par type de produit") + 
                      xlab("Année") +
                      ylab("Nombre de rapport") +
                      scale_x_continuous(breaks=seq(2008, 2020, by = 1)) +
                      theme(plot.title = element_text(color = "orange", hjust = 0.5),
                            axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"),
                            legend.title = element_text(colour = "red", size = 10,face = "bold"),
                            axis.text = element_text(color = "blue"),
                            axis.text.x = element_text(angle = 65, vjust = 0.5)) 

  
graphe_7

Quels sont les fabricants ayant le plus grand nombre de rapport chaque année?.

req_7 <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>% 
                                                  group_by(annee,manufacturer_name) %>% 
                                                  filter(annee > "2008", nchar(manufacturer_name) > 2) %>% 
                                                  summarise(total = n()) %>% 
                                                  top_n(1,wt = total)

## `summarise()` regrouping output by 'annee' (override with `.groups` argument)

# Graphe
graphe_8 <- req_7 %>% ggplot(aes(annee,total, fill = manufacturer_name)) +
                      geom_bar(stat = "identity", color = "white") + 
                      ggtitle("Les fabricants ayant le plus grand nombre de rapport chaque année") + 
                      xlab("Année") +
                      ylab("Nombre de rapport") +
                      scale_x_continuous(breaks = seq(2008, 2020, by = 1)) +
                      theme(plot.title = element_text(color = "orange", hjust = 0.5, size = 12),
                            axis.title.x = element_text(color = "red", face = "bold"),
                            axis.title.y = element_text(color = "red", face = "bold"),
                            axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
                            legend.background = element_rect(fill = "lightblue", 
                            size=0.5, linetype="solid", colour = "red"),
                            legend.title = element_text(colour = "red", size = 10,face = "bold"),
                            axis.text = element_text(color = "blue"),
                            axis.text.x = element_text(angle = 65, vjust = 0.5))
                                    
graphe_8

Quels sont les noms des marques les plus courants chaque année?.

marques_les_plus_courants <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>% 
                                                  group_by(annee, brand_name) %>% 
                                                  filter(annee > "2008",nchar(brand_name) > 2) %>% 
                                                  summarise(total = n()) %>% 
                                                  arrange(desc(total)) %>% 
                                                  top_n(1,wt = total) %>% head(12)

## `summarise()` regrouping output by 'annee' (override with `.groups` argument)

marques_les_plus_courants

## # A tibble: 12 x 3
## # Groups:   annee [10]
##    annee brand_name                                        total
##    <dbl> <chr>                                             <int>
##  1  2020 Hand Sanitizer                                     2020
##  2  2019 Oxygen                                              248
##  3  2018 Ibuprofen                                            31
##  4  2017 Childrens Loratadine Sugar Free                      10
##  5  2016 Hand Sanitizer                                        7
##  6  2015 anti-bacterial hand gel                               5
##  7  2013 Instant Hand Sanitizer                                3
##  8  2014 Sodium Citrate Blood-Pack Units, (PL 146 Plastic)     3
##  9  2012 Fluocinolone Acetonide                                2
## 10  2012 Hydrogen Peroxide                                     2
## 11  2012 Icy Hot Advanced Relief                               2
## 12  2009 Arnica Nettle                                         1

Conclusion:

Dans ce post, nous avons télécharger depuis le web, des données au format JSON à partir de l’API OpenFDA sur les produits pharmaceutique, qui procure des données pour l’étiquetage des médicaments sur ordonnace et en vente libre(OTC). Nous avons ensuite effectuer quelques analyses sur le contenue de nos données à l’aide de différents packages R pour comprendre les données. Lors de mon entretien d’embauche pour intégrer l’entreprise Astrazeneca sis à Cambridge, j’ai eu comme test un “coding style” sur cet API OpenFDA, que j’ai passé avec succès et qui m’a permis de passeraux étapes suivantes et que j’ai effectuer de 2 manières différentes. Ce coding style, je l’ai effectué en python, vous pouvez le retrouver sur la rubrique “python”, avec quelques applications du machine learning qui m’avait été demandé en option dans ce test, pour prédire le nombre d’ingrédients qu’il faudrait dans les années futures, etc…

Manipulation de Fichier JSON avec R

Hurgland-Nick KELIET

Chargeons les librairies

Téléchargeons les fichiers JSON sur OPEN FDA

Vérifions si nous avons télécharger les 5 fichiers .json

Créons une variable qui contiendra les noms des fichiers Json (list.files())

Comprendre les données en analysant un des fichiers Json

Composition de notre Json

Voyons ce que contient la variable meta

Voyons ce que contient la variable Results

Confirmons le type de la variable results

Aperçu de la variable Results

Contenu de la variable Openfda contenu dans results

Fusionnons tous les fichiers pour nos analyses.

Dimension de la variable spl?

Structure de ces 2 colonnes que nous venons de concatener

Comprendre la donnée “effective_time”

Nombre max de caractère

Nombre min de caractère

Nombre de ligne par caractéristique de la variable “effective_time”

Regardons la valeur de “effective_time” qui possède 10 caractères.

Regardons la valeur de “effective_time” qui possède 9 caractères.

Regardons la valeur de “effective_time” qui possède 9 caractères.

Extraction des 8 premiers caractères de “effective_time” ayant 9 et 10 caractères

Convertissons maintenant “effective_time” en format dmy

Nombre de soumissions par jour

Enlevons les données avant 2009

présentation annuelle d’étiquetage des médicaments

L’avertissement le plus fort exigé par la FDA:

Changeons les “NULL” en “NA”

Présentation annuelle d’étiquetage des médicaments avec avertissement encadré

Aperçu des uniques précautions

Changeons la boxed_warning qui est une liste en caractère (“unlist”)

Voyons si les changements ont été opéré

Ajoutons des variables additionnelles

Aperçu de notre variable “openfda_spl”

Colonne “product_type”

Vérifions si nous avons des “NULL” dans la colonne “product_type”

Changeons les “NULL” en “NA”

Délistons la colonne “product_type”

Colonne “route”

Vérifions si nous avons des “NULL” dans la colonne “route”

Changeons les “NULL” en “NA”

Délistons la colonne “product_type”

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

Colonne “manufacturer_name”

Vérifions si nous avons des “NULL” dans la colonne “manufacturer_name”

Changeons les “NULL” en “NA”

Délistons la colonne “manufacturer_name”

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)

Colonne “generic_name”

Vérifions si nous avons des “NULL” dans la colonne “generic_name”

Changeons les “NULL” en “NA”

Délistons la colonne “generic_name”

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

Colonne “brand_name”

Vérifions si nous avons des “NULL” dans la colonne “brand_name”

Changeons les “NULL” en “NA”

Délistons la colonne “brand_name”

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

Colonne “substance_name”

Vérifions si nous avons des “NULL” dans la colonne “substance_name”

Changeons les “NULL” en “NA”

Délistons la colonne “substance_name”

Vérifions l’unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)

Requêtes:

Le nombre de soumissions des plus fréquemment signalées par type de produit et par route.

Nombre de rapport par type de produit

Les voies les plus fréquentes pour prendre les médicaments

Concatenons les deux variables “spl” et “spl_openfda”

Le nombre de rapports par an par type de produit.

Quels sont les fabricants ayant le plus grand nombre de rapport chaque année?.

Quels sont les noms des marques les plus courants chaque année?.

Conclusion: