Dans ce Blog, nous verrons comment faire des analyses avec des données JSON (JavaScript Object Notation) qui est le format de données le plus couramment utilisé pour la communication asynchrone navigateur/serveur. Nous travaillerons avec les données d’étiquetage des médicaments de la Food and Drug Administration de OpenFDA avec lequelle travaille souvent les Data scientist de Astrazeneca: https://open.fda.gov/
# Téléchargement de fichier http
library(downloader)
## Warning: package 'downloader' was built under R version 3.6.1
# Travailler avec Json
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.6.3
# Tidyverse
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts -------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x purrr::flatten() masks jsonlite::flatten()
## x dplyr::lag() masks stats::lag()
# Stringi
library(stringi)
## Warning: package 'stringi' was built under R version 3.6.2
#for(i in 1:5){
#Récupération des URLs
#url = paste0("http://download.open.fda.gov/drug/label/drug-label-000",i,"-of-0009.json.zip")
#Téléchargeons les fichiers
#download(url,dest = "fichier_json.zip")
#Dézippons les fichiers
#unzip("fichier_json.zip")
#}
dir()
## [1] "drug-label-0001-of-0009.json" "drug-label-0002-of-0009.json"
## [3] "drug-label-0003-of-0009.json" "drug-label-0004-of-0009.json"
## [5] "drug-label-0005-of-0009.json" "fichier_json.zip"
## [7] "images.png" "Json_R_image.jpg"
## [9] "Json_R_image.psd" "Json_R_notebook_image.jpg"
## [11] "manip_Json_avec_R.psd" "Manipulation_JSON_avec_R.html"
## [13] "Manipulation_JSON_avec_R.Rmd"
Apparement oui nous avons télécharger et dézipper les fichiers .Json
liste_fichier_json <- list.files(pattern = "*.json", full.names = T)
liste_fichier_json
## [1] "./drug-label-0001-of-0009.json" "./drug-label-0002-of-0009.json"
## [3] "./drug-label-0003-of-0009.json" "./drug-label-0004-of-0009.json"
## [5] "./drug-label-0005-of-0009.json" "./fichier_json.zip"
# Lecture du fichier JSON avec fromJSON
fichier_drug_1 <- fromJSON(liste_fichier_json[1])
names(fichier_drug_1)
## [1] "meta" "results"
Nous ne voyons pas de dataframe ou tableau à l’intérieur de cette variable meta
fichier_drug_1$meta
## $last_updated
## [1] "2020-08-12"
##
## $terms
## [1] "https://open.fda.gov/terms/"
##
## $results
## $results$skip
## [1] 0
##
## $results$total
## [1] 169793
##
## $results$limit
## [1] 20000
##
##
## $license
## [1] "https://open.fda.gov/license/"
##
## $disclaimer
## [1] "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service."
Nous ne voyons pas de dataframe ou tableau à l’intérieur de cette variable meta. Nous voyons que il ne contient que des informations sur les données comme sa date de mise à jour (la dernière), le nombre total de ligne, la limite, la licence et les avertissements
fichier_drug1_results <- fichier_drug_1$results
# head(fichier_drug1_results)
Nous voyons que cela ressemble à un dataframe, mais les données sont pas comme on a l’habitude de les voir. On peut aussi confirmer que le nombre de lignes est identique à celui indiqué dans le méta (Si on utilise Rstudio bien sûre)
class(fichier_drug1_results)
## [1] "data.frame"
glimpse(fichier_drug1_results)
## Rows: 20,000
## Columns: 141
## $ active_ingredient <list> [...
## $ storage_and_handling <list> [...
## $ warnings <list> [...
## $ inactive_ingredient <list> [...
## $ do_not_use <list> [...
## $ openfda <df[,21]> ...
## $ keep_out_of_reach_of_children <list> [...
## $ spl_product_data_elements <list> [...
## $ package_label_principal_display_panel <list> [...
## $ set_id <chr> "a...
## $ dosage_and_administration <list> [...
## $ version <chr> "1...
## $ stop_use <list> [...
## $ purpose <list> [...
## $ when_using <list> [...
## $ effective_time <chr> "2...
## $ id <chr> "a...
## $ indications_and_usage <list> [...
## $ ask_doctor <list> [...
## $ other_safety_information <list> [...
## $ questions <list> [...
## $ warnings_and_cautions <list> [...
## $ instructions_for_use <list> [...
## $ description <list> [...
## $ clinical_pharmacology <list> [...
## $ adverse_reactions <list> [...
## $ how_supplied <list> [...
## $ precautions <list> [...
## $ general_precautions <list> [...
## $ pregnancy_or_breast_feeding <list> [...
## $ spl_unclassified_section <list> [...
## $ ask_doctor_or_pharmacist <list> [...
## $ boxed_warning <list> [...
## $ boxed_warning_table <list> [...
## $ description_table <list> [...
## $ drug_abuse_and_dependence <list> [...
## $ contraindications <list> [...
## $ spl_medguide <list> [...
## $ overdosage <list> [...
## $ references <list> [...
## $ dosage_and_administration_table <list> [...
## $ keep_out_of_reach_of_children_table <list> [...
## $ warnings_table <list> [...
## $ active_ingredient_table <list> [...
## $ nonclinical_toxicology <list> [...
## $ patient_medication_information <list> [...
## $ clinical_studies <list> [...
## $ use_in_specific_populations <list> [...
## $ drug_interactions <list> [...
## $ carcinogenesis_and_mutagenesis_and_impairment_of_fertility <list> [...
## $ pregnancy <list> [...
## $ pediatric_use <list> [...
## $ information_for_patients <list> [...
## $ clinical_pharmacology_table <list> [...
## $ geriatric_use <list> [...
## $ nursing_mothers <list> [...
## $ teratogenic_effects <list> [...
## $ purpose_table <list> [...
## $ laboratory_tests <list> [...
## $ pharmacokinetics <list> [...
## $ nonteratogenic_effects <list> [...
## $ how_supplied_table <list> [...
## $ adverse_reactions_table <list> [...
## $ information_for_owners_or_caregivers <list> [...
## $ indications_and_usage_table <list> [...
## $ safe_handling_warning <list> [...
## $ spl_unclassified_section_table <list> [...
## $ labor_and_delivery <list> [...
## $ controlled_substance <list> [...
## $ abuse <list> [...
## $ drug_and_or_laboratory_test_interactions <list> [...
## $ dependence <list> [...
## $ pharmacodynamics <list> [...
## $ pharmacokinetics_table <list> [...
## $ microbiology <list> [...
## $ dosage_forms_and_strengths <list> [...
## $ clinical_studies_table <list> [...
## $ drug_interactions_table <list> [...
## $ mechanism_of_action <list> [...
## $ animal_pharmacology_and_or_toxicology <list> [...
## $ spl_patient_package_insert <list> [...
## $ recent_major_changes <list> [...
## $ recent_major_changes_table <list> [...
## $ spl_patient_package_insert_table <list> [...
## $ pharmacodynamics_table <list> [...
## $ warnings_and_cautions_table <list> [...
## $ precautions_table <list> [...
## $ user_safety_warnings <list> [...
## $ microbiology_table <list> [...
## $ dosage_forms_and_strengths_table <list> [...
## $ information_for_patients_table <list> [...
## $ package_label_principal_display_panel_table <list> [...
## $ intended_use_of_the_device <list> [...
## $ troubleshooting <list> [...
## $ statement_of_identity <list> [...
## $ route <list> [...
## $ summary_of_safety_and_effectiveness <list> [...
## $ cleaning <list> [...
## $ components <list> [...
## $ spl_medguide_table <list> [...
## $ accessories <list> [...
## $ stop_use_table <list> [...
## $ inactive_ingredient_table <list> [...
## $ pediatric_use_table <list> [...
## $ disposal_and_waste_handling <list> [...
## $ contraindications_table <list> [...
## $ spl_indexing_data_elements <list> [...
## $ storage_and_handling_table <list> [...
## $ guaranteed_analysis_of_feed <list> [...
## $ diagram_of_device <list> [...
## $ animal_pharmacology_and_or_toxicology_table <list> [...
## $ pregnancy_table <list> [...
## $ use_in_specific_populations_table <list> [...
## $ instructions_for_use_table <list> [...
## $ geriatric_use_table <list> [...
## $ drug_and_or_laboratory_test_interactions_table <list> [...
## $ laboratory_tests_table <list> [...
## $ references_table <list> [...
## $ overdosage_table <list> [...
## $ pharmacogenomics <list> [...
## $ risks <list> [...
## $ general_precautions_table <list> [...
## $ drug_abuse_and_dependence_table <list> [...
## $ abuse_table <list> [...
## $ teratogenic_effects_table <list> [...
## $ patient_medication_information_table <list> [...
## $ assembly_or_installation_instructions <list> [...
## $ questions_table <list> [...
## $ nonclinical_toxicology_table <list> [...
## $ carcinogenesis_and_mutagenesis_and_impairment_of_fertility_table <list> [...
## $ nonteratogenic_effects_table <list> [...
## $ when_using_table <list> [...
## $ do_not_use_table <list> [...
## $ health_care_provider_letter <list> [...
## $ health_care_provider_letter_table <list> [...
## $ veterinary_indications <list> [...
## $ pregnancy_or_breast_feeding_table <list> [...
## $ ask_doctor_or_pharmacist_table <list> [...
## $ ask_doctor_table <list> [...
## $ mechanism_of_action_table <list> [...
## $ dependence_table <list> [...
On peut se rendre compte que notre variable possède, 20.000 observations et 141 colonnes. On peut également se rendre compte que ce que nous cherchons c’est un dataframe et nous le voyons dans la variable Openfda inclu dans results
glimpse(fichier_drug1_results$openfda)
## Rows: 20,000
## Columns: 21
## $ manufacturer_name <list> ["Tonic Bath & Products Shenzhen Ltd...
## $ unii <list> ["3K9958V90M", <"G63QQF2NOX", "95OOS...
## $ product_type <list> ["HUMAN OTC DRUG", "HUMAN OTC DRUG",...
## $ rxcui <list> ["2050977", NULL, NULL, NULL, "90528...
## $ spl_set_id <list> ["a517190f-2a21-3016-e053-2a95a90abd...
## $ route <list> ["TOPICAL", "TOPICAL", "TOPICAL", NU...
## $ generic_name <list> ["ETHYL ALCOHOL", "AVOBENZONE, HOMOS...
## $ brand_name <list> ["Hand Sanitizer", "KROGER SPORT SPF...
## $ product_ndc <list> ["43333-484", "30142-808", "76841-48...
## $ substance_name <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZO...
## $ spl_id <list> ["a51710ae-6bcc-a672-e053-2995a90aae...
## $ application_number <list> ["part333A", "part352", "part333A", ...
## $ is_original_packager <list> [TRUE, TRUE, TRUE, NULL, NULL, TRUE,...
## $ package_ndc <list> ["43333-484-08", "30142-808-37", "76...
## $ upc <list> [NULL, "0041260383328", NULL, NULL, ...
## $ original_packager_product_ndc <list> [NULL, NULL, NULL, NULL, "70954-211"...
## $ nui <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_epc <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_pe <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_cs <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
## $ pharm_class_moa <list> [NULL, NULL, NULL, NULL, NULL, NULL,...
Nous pouvons observer 21 colonnes et 20.000 observations, ce qui est déjà mieux.
Nous allons inclure 2 autres variables, “effective_time” et “boxed_warning”
# Affectation des 2 colonnes dans la variable "effective_time" et "boxed_warning"
result_1 <- fromJSON(liste_fichier_json[1])$results %>% select(effective_time,boxed_warning)
result_2 <- fromJSON(liste_fichier_json[2])$results %>% select(effective_time,boxed_warning)
result_3 <- fromJSON(liste_fichier_json[3])$results %>% select(effective_time,boxed_warning)
result_4 <- fromJSON(liste_fichier_json[4])$results %>% select(effective_time,boxed_warning)
result_5 <- fromJSON(liste_fichier_json[5])$results %>% select(effective_time,boxed_warning)
# Concatenons tous ces résultats
spl <- rbind(result_1,result_2,result_3,result_4,result_5)
cat("Notre variable spl contient",nrow(spl),"lignes et",ncol(spl),"colonnes")
## Notre variable spl contient 100000 lignes et 2 colonnes
glimpse(spl)
## Rows: 100,000
## Columns: 2
## $ effective_time <chr> "20200507", "20200508", "20200510", "20181108", "201...
## $ boxed_warning <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...
Un bémole ici, boxed_warning est une liste et effective_time un string
cat('Nombre maximal de caractère dans la variable effective_time :',max(nchar(spl$effective_time)))
## Nombre maximal de caractère dans la variable effective_time : 10
cat('Nombre minimal de caractère dans la variable effective_time :',min(nchar(spl$effective_time)))
## Nombre minimal de caractère dans la variable effective_time : 8
spl %>% group_by(nchar(effective_time)) %>% summarise(nombre = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
## `nchar(effective_time)` nombre
## <int> <int>
## 1 8 99998
## 2 9 1
## 3 10 1
Le format de l’heure peut poser un problème compte tenue du fait que nous avons des nombres de caractères différents. une valeur de 9 et 10 caractères pour 99998 pour 8 caractères, il se peut qu’il y’ait une erreur dans ces 2 lignes ayant 8 et 9 caractères.
spl %>% filter(nchar(effective_time) == 10) %>% select(effective_time)
## effective_time
## 1 2009103009
spl %>% filter(nchar(effective_time) == 9) %>% select(effective_time)
## effective_time
## 1 201108178
# Affichons juste les 6 premières lignes
head(spl %>% filter(nchar(effective_time) == 8)) %>% select(effective_time)
## effective_time
## 1 20200507
## 2 20200508
## 3 20200510
## 4 20181108
## 5 20191110
## 6 20150925
spl <- spl %>% mutate(effective_time = stri_sub(effective_time,1,8))
Vérifions si toutes nos lignes ont désormais 8 caractères (nous devons avoir maintenant 100.000)
spl %>% group_by(nchar(effective_time)) %>% summarise(nombre = n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 1 x 2
## `nchar(effective_time)` nombre
## <int> <int>
## 1 8 100000
C’est parfait, toutes nos lignes de “effective_time” ont désormais le nombre de caractères requis
# Convertissons en format date
spl <- spl %>% mutate(effective_time = lubridate::ymd(effective_time))
# Vérifions les si les changements ont eu lieu
glimpse(spl)
## Rows: 100,000
## Columns: 2
## $ effective_time <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, 201...
## $ boxed_warning <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...
# Requête
soumission_par_jour <- spl %>% group_by(effective_time) %>% summarise(nombre_de_valeur = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_1 <- soumission_par_jour %>% ggplot(aes(effective_time, nombre_de_valeur)) +
geom_line() +
ggtitle("Présentation journalière d'étiquetage des médicaments") +
xlab("Effective time") +
ylab("Total") +
theme_minimal() +
theme(axis.title.x = element_text(color = "red"),
axis.title.y = element_text(color = "red"),
axis.text = element_text(color = "blue"),
plot.title = element_text(size = 15, hjust = 0.5, color = "Orange"))
graphe_1
On peut se rendre compte qu’il y’a peu de données avant 2009
# Requête
req_1 <- spl %>% group_by(effective_time) %>% filter(effective_time > "2008-12-31") %>% summarise(nombre_de_valeur = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_2 <- req_1 %>% ggplot(aes(effective_time, nombre_de_valeur)) +
geom_line() +
ggtitle("Présentation journalière d'étiquetage des médicaments") +
xlab("Effective time") +
ylab("Total") +
theme_minimal() +
theme(axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.text = element_text(color = "blue"),
plot.title = element_text(size = 15, hjust = 0.5, color = "Orange"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"),
legend.text = element_text(colour = "blue", face = "bold"),
legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_2
ce graphe peut nous informer sur la tendance mais il est toujours illisible et ne nous aide pas énormément. Essayons de trouver le nombre de rapport par année
# Requête
req_2 <- spl %>% mutate(annee = lubridate::year(effective_time)) %>% group_by(annee) %>% filter(annee > "2008") %>% summarise(nombre_de_valeur = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_3 <- req_2 %>% ggplot(aes(annee, nombre_de_valeur, fill = nombre_de_valeur)) +
geom_bar(stat = "identity", color = "grey") +
ggtitle("Présentation annuelle d'étiquetage des médicaments") +
xlab("Année") +
ylab("Total") +
theme_minimal() +
labs(fill = "Total") +
scale_fill_gradient(low = "yellow", high = "red") +
scale_x_continuous(breaks=seq(2008, 2020, by = 1))+
theme(axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.text = element_text(color = "blue"),
plot.title = element_text(size = 20, hjust = 0.5, color = "Orange"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"),
legend.text = element_text(colour = "blue", face = "bold"),
legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_3
Qui indique que le médicament comporte un risque important d’effets indésirables graves, voire mortels.
# Ici j'ai dû tester plein d'intervalle par contre séquencer par pas de 100
for(i in 16:20){
print(select(spl,boxed_warning)[i,])
}
## [[1]]
## NULL
##
## [[1]]
## NULL
##
## [[1]]
## [1] "WARNING: ADDICTION, ABUSE, AND MISUSE: RISK EVALUATION AND MITIGATION STRATEGY (REMS); LIFE-THREATENING RESPIRATORY DEPRESSION; ACCIDENTAL INGESTION; NEONATAL OPIOID; WITHDRAWAL SYNDROME; CYTOCHROME P450 3A4 INTERACTION; HEPATOTOXICITY AND RISKS FROM CONCOMITANT USE WITH BENZODIAZEPINES OR OTHER CNS DEPRESSANTS Addiction, Abuse, and Misuse Hydrocodone bitartrate and acetaminophen tablets exposes patients and other users to the risks of opioid addiction, abuse, and misuse, which can lead to overdose and death. Assess each patient’s risk prior to prescribing hydrocodone bitartrate and acetaminophen tablets, and monitor all patients regularly for the development of these behaviors or conditions [see WARNINGS]. Opioid Analgesic Risk Evaluation and Mitigation Strategy (REMS): To ensure that the benefits of opioid analgesics outweigh the risks of addiction, abuse, and misuse, the Food and Drug Administration (FDA) has required a REMS for these products [see WARNINGS]. Under the requirements of the REMS, drug companies with approved opioid analgesic products must make REMS-compliant education programs available to healthcare providers. Healthcare providers are strongly encouraged to complete a REMS-compliant education program, counsel patients and/or their caregivers, with every prescription, on safe use, serious risks, storage, and disposal of these products, emphasize to patients and their caregivers the importance of reading the Medication Guide every time it is provided by their pharmacist, and Consider other tools to improve patient, household, and community safety. Life-Threatening Respiratory Depression Serious, life-threatening, or fatal respiratory depression may occur with use of hydrocodone bitartrate and acetaminophen tablets. Monitor for respiratory depression, especially during initiation of hydrocodone bitartrate and acetaminophen tablets or following a dose increase [seeWARNINGS]. Accidental Ingestion Accidental ingestion of hydrocodone bitartrate and acetaminophen tablets, especially by children, can result in a fatal overdose of Hydrocodone bitartrate and acetaminophen tablets [seeWARNINGS]. Neonatal Opioid Withdrawal Syndrome Prolonged use of hydrocodone bitartrate and acetaminophen tablets during pregnancy can result in neonatal opioid withdrawal syndrome, which may be life-threatening if not recognized and treated, and requires management according to protocols developed by neonatology experts. If opioid use is required for a prolonged period in a pregnant woman, advise the patient of the risk of neonatal opioid withdrawal syndrome and ensure that appropriate treatment will be available [seeWARNINGS]. Cytochrome P450 3A4 Interaction The concomitant use of Hydrocodone Bitartrate and Acetaminophen Tablets with all Cytochrome P450 3A4 inhibitors may result in an increase in hydrocodone plasma concentrations, which could increase or prolong adverse reactions and may cause potentially fatal respiratory depression. In addition, discontinuation of a concomitantly used Cytochrome P450 3A4 inducer may result in an increase in hydrocodone plasma concentrations. Monitor patients receiving Hydrocodone Bitartrate and Acetaminophen Tablets and any Cytochrome P450 3A4 inhibitor or inducer for signs of respiratory depression or sedation [see CLINICAL PHARMACOLOGY, WARNINGS, PRECAUTIONS;DRUG INTERACTIONS]. Hepatotoxicity Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4,000 milligrams per day, and often involve more than one acetaminophen-containing product (see WARNINGS). Risks from concomitant use with Benzodiazepines or other CNS Depressants Concomitant use of opiods with benzodiazepines or other central nervous system (CNS) depressants including alcohol, may result in profound sedation, respiratory depression, coma, and death. [seeWARNINGS,PRECAUTIONS; DRUG INTERACTIONS] • Reserve concomitant prescribing of Hydrocodone bitartrate and acetaminophen tablets and benzodiazepines or other CNS depressants for use in patients for whom alternative treatment options are inadequate. • Limit dosages and durations to the minimum required. • Follow patients for signs and symptoms of respiratory depression and sedation."
##
## [[1]]
## NULL
##
## [[1]]
## NULL
Nous pouvons voir quele warning correspond à notre warning que nous recherchons
# changeons les "NULL" en "NA"
spl$boxed_warning[spl$boxed_warning == "NULL"] = NA
# Vérifions si les changements ont été opéré
head(spl$boxed_warning)
## [[1]]
## [1] NA
##
## [[2]]
## [1] NA
##
## [[3]]
## [1] NA
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] NA
##
## [[6]]
## [1] NA
# Requête
req_3 <- spl %>% mutate(annee = lubridate::year(effective_time)) %>% group_by(annee) %>% filter(annee > "2008") %>% summarise(nombre_precaution = sum(!is.na(boxed_warning)))
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_4 <- req_3 %>% ggplot(aes(annee, nombre_precaution, fill = nombre_precaution)) +
geom_bar(stat = "identity", color = "grey") +
ggtitle("Présentation annuelle d'étiquetage des médicaments avec avertissement encadré") +
xlab("Année") +
ylab("Total") +
theme_minimal() +
labs(fill = "Total") +
scale_fill_gradient(low = "green", high = "red") +
scale_x_continuous(breaks=seq(2008, 2020, by = 1))+
theme(axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.text = element_text(color = "blue"),
plot.title = element_text(size = 13, hjust = 0.5, color = "Orange"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"),
legend.text = element_text(colour = "blue", face = "bold"),
legend.title = element_text(colour = "red", size = 10,face = "bold", hjust = 0.5),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"))
graphe_4
head(unique(spl$boxed_warning))
## [[1]]
## [1] NA
##
## [[2]]
## [1] "WARNING: ADDICTION, ABUSE, AND MISUSE: RISK EVALUATION AND MITIGATION STRATEGY (REMS); LIFE-THREATENING RESPIRATORY DEPRESSION; ACCIDENTAL INGESTION; NEONATAL OPIOID; WITHDRAWAL SYNDROME; CYTOCHROME P450 3A4 INTERACTION; HEPATOTOXICITY AND RISKS FROM CONCOMITANT USE WITH BENZODIAZEPINES OR OTHER CNS DEPRESSANTS Addiction, Abuse, and Misuse Hydrocodone bitartrate and acetaminophen tablets exposes patients and other users to the risks of opioid addiction, abuse, and misuse, which can lead to overdose and death. Assess each patient’s risk prior to prescribing hydrocodone bitartrate and acetaminophen tablets, and monitor all patients regularly for the development of these behaviors or conditions [see WARNINGS]. Opioid Analgesic Risk Evaluation and Mitigation Strategy (REMS): To ensure that the benefits of opioid analgesics outweigh the risks of addiction, abuse, and misuse, the Food and Drug Administration (FDA) has required a REMS for these products [see WARNINGS]. Under the requirements of the REMS, drug companies with approved opioid analgesic products must make REMS-compliant education programs available to healthcare providers. Healthcare providers are strongly encouraged to complete a REMS-compliant education program, counsel patients and/or their caregivers, with every prescription, on safe use, serious risks, storage, and disposal of these products, emphasize to patients and their caregivers the importance of reading the Medication Guide every time it is provided by their pharmacist, and Consider other tools to improve patient, household, and community safety. Life-Threatening Respiratory Depression Serious, life-threatening, or fatal respiratory depression may occur with use of hydrocodone bitartrate and acetaminophen tablets. Monitor for respiratory depression, especially during initiation of hydrocodone bitartrate and acetaminophen tablets or following a dose increase [seeWARNINGS]. Accidental Ingestion Accidental ingestion of hydrocodone bitartrate and acetaminophen tablets, especially by children, can result in a fatal overdose of Hydrocodone bitartrate and acetaminophen tablets [seeWARNINGS]. Neonatal Opioid Withdrawal Syndrome Prolonged use of hydrocodone bitartrate and acetaminophen tablets during pregnancy can result in neonatal opioid withdrawal syndrome, which may be life-threatening if not recognized and treated, and requires management according to protocols developed by neonatology experts. If opioid use is required for a prolonged period in a pregnant woman, advise the patient of the risk of neonatal opioid withdrawal syndrome and ensure that appropriate treatment will be available [seeWARNINGS]. Cytochrome P450 3A4 Interaction The concomitant use of Hydrocodone Bitartrate and Acetaminophen Tablets with all Cytochrome P450 3A4 inhibitors may result in an increase in hydrocodone plasma concentrations, which could increase or prolong adverse reactions and may cause potentially fatal respiratory depression. In addition, discontinuation of a concomitantly used Cytochrome P450 3A4 inducer may result in an increase in hydrocodone plasma concentrations. Monitor patients receiving Hydrocodone Bitartrate and Acetaminophen Tablets and any Cytochrome P450 3A4 inhibitor or inducer for signs of respiratory depression or sedation [see CLINICAL PHARMACOLOGY, WARNINGS, PRECAUTIONS;DRUG INTERACTIONS]. Hepatotoxicity Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4,000 milligrams per day, and often involve more than one acetaminophen-containing product (see WARNINGS). Risks from concomitant use with Benzodiazepines or other CNS Depressants Concomitant use of opiods with benzodiazepines or other central nervous system (CNS) depressants including alcohol, may result in profound sedation, respiratory depression, coma, and death. [seeWARNINGS,PRECAUTIONS; DRUG INTERACTIONS] • Reserve concomitant prescribing of Hydrocodone bitartrate and acetaminophen tablets and benzodiazepines or other CNS depressants for use in patients for whom alternative treatment options are inadequate. • Limit dosages and durations to the minimum required. • Follow patients for signs and symptoms of respiratory depression and sedation."
##
## [[3]]
## [1] "BOXED WARNING Cessation of Therapy with TENORMINPatients with coronary artery disease, who are being treated with TENORMIN, should be advised against abrupt discontinuation of therapy. Severe exacerbation of angina and the occurrence of myocardial infarction and ventricular arrhythmias have been reported in angina patients following the abrupt discontinuation of therapy with beta blockers. The last two complications may occur with or without preceding exacerbation of the angina pectoris. As with other beta blockers, when discontinuation of TENORMIN is planned, the patients should be carefully observed and advised to limit physical activity to a minimum. If the angina worsens or acute coronary insufficiency develops, it is recommended that TENORMIN be promptly reinstituted, at least temporarily. Because coronary artery disease is common and may be unrecognized, it may be prudent not to discontinue TENORMIN therapy abruptly even in patients treated only for hypertension. (SeeDOSAGE AND ADMINISTRATION.)"
##
## [[4]]
## [1] "*Based on a review of this drug (isometheptene mucate) by the National Academy of Sciences - National Research Council and/or other information, FDA has classified the other indication as \"possibly\" effective in the treatment of migraine headache. Final classification of the less-than-effective indication requires further investigation."
##
## [[5]]
## [1] "BOXED WARNING Hepatotoxicity: Acetaminophen has been associated with cases of acute liver failure, at times resulting in liver transplant and death. Most of the cases of liver injury are associated with the use of acetaminophen at doses that exceed 4000 milligrams per day, and often involve more than one acetaminophen-containing product."
##
## [[6]]
## [1] "SUICIDALITY AND ANTIDEPRESSANT DRUGS Antidepressants increased the risk compared to placebo of suicidal thinking and behavior (suicidality) in children, adolescents, and young adults in short-term studies of major depressive disorder (MDD) and other psychiatric disorders. Anyone considering the use of nortriptyline hydrochloride or any other antidepressant in a child, adolescent, or young adult must balance this risk with the clinical need. Short-term studies did not show an increase in the risk of suicidality with antidepressants compared to placebo in adults beyond age 24; there was a reduction in risk with antidepressants compared to placebo in adults aged 65 and older. Depression and certain other psychiatric disorders are themselves associated with increases in the risk of suicide. Patients of all ages who are started on antidepressant therapy should be monitored appropriately and observed closely for clinical worsening, suicidality, or unusual changes in behavior. Families and caregivers should be advised of the need for close observation and communication with the prescriber. Nortriptyline hydrochloride is not approved for use in pediatric patients. (See WARNINGS: Clinical Worsening and Suicide Risk , PRECAUTIONS: Information for Patients ; and PRECAUTIONS, Pediatric Use )."
# Transformons les [NA,NA] en ["NA","NA"]
for(i in 1:nrow(spl)){
spl$boxed_warning[i] = paste0(unlist(spl$boxed_warning[i]),collapse = ",")
}
# Délistons la colonne entière, nous partirons de ["NA","NA"] à "NA","NA"
spl$boxed_warning = unlist(spl$boxed_warning)
glimpse(spl)
## Rows: 100,000
## Columns: 2
## $ effective_time <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, 201...
## $ boxed_warning <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"...
# Récupérons les dataframes nommés OpenFDA du Json et Sélectionnons les colonnes qui nous interessent
openfda_1 <- fromJSON(liste_fichier_json[1])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
openfda_2 <- fromJSON(liste_fichier_json[2])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
openfda_3 <- fromJSON(liste_fichier_json[3])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
openfda_4 <- fromJSON(liste_fichier_json[4])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
openfda_5 <- fromJSON(liste_fichier_json[5])$results$openfda %>% select(manufacturer_name, product_type, route, generic_name, brand_name, substance_name)
# Concatenons les openfda par ligne
spl_openfda <- rbind(openfda_1,openfda_2,openfda_3,openfda_4,openfda_5)
glimpse(spl_openfda)
## Rows: 100,000
## Columns: 6
## $ manufacturer_name <list> ["Tonic Bath & Products Shenzhen Ltd", "THE KROG...
## $ product_type <list> ["HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC ...
## $ route <list> ["TOPICAL", "TOPICAL", "TOPICAL", NULL, "ORAL", ...
## $ generic_name <list> ["ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTIS...
## $ brand_name <list> ["Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREE...
## $ substance_name <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...
On peut observer que nous avons dans notre dataframe spl_openfda a 100.000 lignes et 6 colonnes
"NULL" %in% (spl_openfda$product_type)
## [1] TRUE
# On peut aussi passer par l'unicité de la colonne pour trouver les 3 valeurs uniques: "HUMAN OTC DRUG", "HUMAN PRESCRIPTION DRUG" et "NULL"
# unique(spl_openfda$product_type)
spl_openfda$product_type[spl_openfda$product_type == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu
unique(spl_openfda$product_type)
## [[1]]
## [1] "HUMAN OTC DRUG"
##
## [[2]]
## [1] NA
##
## [[3]]
## [1] "HUMAN PRESCRIPTION DRUG"
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “product_type”
for(i in 1:nrow(spl_openfda)){
spl_openfda$product_type[i]=paste0(unlist(spl_openfda$product_type[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(product_type = unlist(product_type))
"NULL" %in% (spl_openfda$route)
## [1] TRUE
spl_openfda$route[spl_openfda$route == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 20 premières lignes)
head(unique(spl_openfda$route), 20)
## [[1]]
## [1] "TOPICAL"
##
## [[2]]
## [1] NA
##
## [[3]]
## [1] "ORAL"
##
## [[4]]
## [1] "BUCCAL"
##
## [[5]]
## [1] "EXTRACORPOREAL"
##
## [[6]]
## [1] "DENTAL"
##
## [[7]]
## [1] "OPHTHALMIC"
##
## [[8]]
## [1] "RESPIRATORY (INHALATION)"
##
## [[9]]
## [1] "AURICULAR (OTIC)"
##
## [[10]]
## [1] "CUTANEOUS"
##
## [[11]]
## [1] "VAGINAL"
##
## [[12]]
## [1] "TOPICAL" "DENTAL"
##
## [[13]]
## [1] "INTRAMUSCULAR" "PARENTERAL" "INTRAVENOUS" "SUBCUTANEOUS"
##
## [[14]]
## [1] "INTRAVENOUS"
##
## [[15]]
## [1] "INTRAMUSCULAR" "INTRAVENOUS"
##
## [[16]]
## [1] "PERIODONTAL" "DENTAL" "ORAL"
##
## [[17]]
## [1] "PARENTERAL"
##
## [[18]]
## [1] "DENTAL" "ORAL"
##
## [[19]]
## [1] "SUBLINGUAL"
##
## [[20]]
## [1] "TRANSDERMAL"
# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$route)
## [1] FALSE
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “route”
for(i in 1:nrow(spl_openfda)){
spl_openfda$route[i]=paste0(unlist(spl_openfda$route[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(route=unlist(route))
head(unique(spl_openfda$route),20)
## [1] "TOPICAL"
## [2] "NA"
## [3] "ORAL"
## [4] "BUCCAL"
## [5] "EXTRACORPOREAL"
## [6] "DENTAL"
## [7] "OPHTHALMIC"
## [8] "RESPIRATORY (INHALATION)"
## [9] "AURICULAR (OTIC)"
## [10] "CUTANEOUS"
## [11] "VAGINAL"
## [12] "TOPICAL,DENTAL"
## [13] "INTRAMUSCULAR,PARENTERAL,INTRAVENOUS,SUBCUTANEOUS"
## [14] "INTRAVENOUS"
## [15] "INTRAMUSCULAR,INTRAVENOUS"
## [16] "PERIODONTAL,DENTAL,ORAL"
## [17] "PARENTERAL"
## [18] "DENTAL,ORAL"
## [19] "SUBLINGUAL"
## [20] "TRANSDERMAL"
"NULL" %in% (spl_openfda$manufacturer_name)
## [1] TRUE
spl_openfda$manufacturer_name[spl_openfda$manufacturer_name == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$manufacturer_name), 5)
## [[1]]
## [1] "Tonic Bath & Products Shenzhen Ltd"
##
## [[2]]
## [1] "THE KROGER COMPANY"
##
## [[3]]
## [1] "Earth Edition Inc."
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] "REMEDYREPACK INC."
# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$manufacturer_name)
## [1] FALSE
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “manufacturer_name”
for(i in 1:nrow(spl_openfda)){
spl_openfda$manufacturer_name[i]=paste0(unlist(spl_openfda$manufacturer_name[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(manufacturer_name=unlist(manufacturer_name))
head(unique(spl_openfda$manufacturer_name),5)
## [1] "Tonic Bath & Products Shenzhen Ltd" "THE KROGER COMPANY"
## [3] "Earth Edition Inc." "NA"
## [5] "REMEDYREPACK INC."
"NULL" %in% (spl_openfda$generic_name)
## [1] TRUE
spl_openfda$generic_name[spl_openfda$generic_name == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$generic_name), 5)
## [[1]]
## [1] "ETHYL ALCOHOL"
##
## [[2]]
## [1] "AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE"
##
## [[3]]
## [1] "ETHANOL"
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"
# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$generic_name)
## [1] FALSE
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “route”
for(i in 1:nrow(spl_openfda)){
spl_openfda$generic_name[i]=paste0(unlist(spl_openfda$generic_name[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(generic_name=unlist(generic_name))
head(unique(spl_openfda$generic_name),5)
## [1] "ETHYL ALCOHOL"
## [2] "AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE"
## [3] "ETHANOL"
## [4] "NA"
## [5] "TRIHEXYPHENIDYL HYDROCHLORIDE"
glimpse(spl_openfda)
## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name <list> ["Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREE...
## $ substance_name <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...
"NULL" %in% (spl_openfda$brand_name)
## [1] TRUE
spl_openfda$brand_name[spl_openfda$brand_name == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$brand_name), 5)
## [[1]]
## [1] "Hand Sanitizer"
##
## [[2]]
## [1] "KROGER SPORT SPF 15 SUNSCREEN"
##
## [[3]]
## [1] "EARTH EDITION HAND SANITIZER Gel"
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"
# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$brand_name)
## [1] FALSE
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “brand_name”
for(i in 1:nrow(spl_openfda)){
spl_openfda$brand_name[i]=paste0(unlist(spl_openfda$brand_name[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(brand_name=unlist(brand_name))
head(unique(spl_openfda$brand_name),5)
## [1] "Hand Sanitizer" "KROGER SPORT SPF 15 SUNSCREEN"
## [3] "EARTH EDITION HAND SANITIZER Gel" "NA"
## [5] "TRIHEXYPHENIDYL HYDROCHLORIDE"
glimpse(spl_openfda)
## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name <list> ["ALCOHOL", <"OCTISALATE", "OXYBENZONE", "HOMOSA...
"NULL" %in% (spl_openfda$substance_name)
## [1] TRUE
spl_openfda$substance_name[spl_openfda$substance_name == "NULL"] = NA
# Vérifions l'unicité pour voir si les changemants ont bien eu lieu (aperçu des 5 premières lignes)
head(unique(spl_openfda$substance_name), 5)
## [[1]]
## [1] "ALCOHOL"
##
## [[2]]
## [1] "OCTISALATE" "OXYBENZONE" "HOMOSALATE" "AVOBENZONE"
##
## [[3]]
## [1] NA
##
## [[4]]
## [1] "TRIHEXYPHENIDYL HYDROCHLORIDE"
##
## [[5]]
## [1] "ASPIRIN"
# Y'a t-il encore des "NULL"?
"NULL" %in% (spl_openfda$substance_name)
## [1] FALSE
Nous pouvons voir qu’il n’existe plus de “NULL” dans la colonne “substance_name”
for(i in 1:nrow(spl_openfda)){
spl_openfda$substance_name[i]=paste0(unlist(spl_openfda$substance_name[i]),collapse=",")
}
spl_openfda <- spl_openfda %>% mutate(substance_name=unlist(substance_name))
head(unique(spl_openfda$substance_name),5)
## [1] "ALCOHOL"
## [2] "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOBENZONE"
## [3] "NA"
## [4] "TRIHEXYPHENIDYL HYDROCHLORIDE"
## [5] "ASPIRIN"
glimpse(spl_openfda)
## Rows: 100,000
## Columns: 6
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name <chr> "ALCOHOL", "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOB...
# Les 20 premières lignes par ordre décroissante
nb_submission <- spl_openfda %>% group_by(product_type, route) %>% summarise(total = n()) %>% arrange(desc(total)) %>% head(20)
## `summarise()` regrouping output by 'product_type' (override with `.groups` argument)
nb_submission
## # A tibble: 20 x 3
## # Groups: product_type [3]
## product_type route total
## <chr> <chr> <int>
## 1 NA NA 47107
## 2 HUMAN OTC DRUG TOPICAL 19209
## 3 HUMAN OTC DRUG ORAL 13370
## 4 HUMAN PRESCRIPTION DRUG ORAL 11882
## 5 HUMAN PRESCRIPTION DRUG INTRAVENOUS 1385
## 6 HUMAN PRESCRIPTION DRUG TOPICAL 1123
## 7 HUMAN PRESCRIPTION DRUG RESPIRATORY (INHALATION) 683
## 8 HUMAN OTC DRUG NA 616
## 9 HUMAN OTC DRUG DENTAL 516
## 10 HUMAN PRESCRIPTION DRUG INTRAMUSCULAR,INTRAVENOUS 423
## 11 HUMAN OTC DRUG EXTRACORPOREAL 382
## 12 HUMAN PRESCRIPTION DRUG NA 342
## 13 HUMAN OTC DRUG OPHTHALMIC 335
## 14 HUMAN PRESCRIPTION DRUG OPHTHALMIC 312
## 15 HUMAN OTC DRUG NASAL 258
## 16 HUMAN OTC DRUG SUBLINGUAL 183
## 17 HUMAN PRESCRIPTION DRUG SUBCUTANEOUS 161
## 18 HUMAN PRESCRIPTION DRUG DENTAL 138
## 19 HUMAN PRESCRIPTION DRUG INTRAMUSCULAR 138
## 20 HUMAN OTC DRUG RECTAL 124
# Requête
req_4 <- spl_openfda %>% group_by(product_type) %>% summarise(total = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_5 <- req_4 %>% ggplot(aes(product_type, total, fill = total)) +
geom_bar(stat = "identity", color = "orange") +
theme_light() +
xlab("Type de produit") +
scale_fill_gradient(low = "yellow", high = "green") +
ggtitle("Nombre de rapport par type de produit") +
theme(plot.title = element_text(color = "orange", hjust = 0.5),
axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
axis.text = element_text(color = "blue"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"))
graphe_5
Nous pouvons voir que nous avons énormément de données sur “Human OTC Drug” et une très grande quantité de soumissions sont inconnus
# Requête
req_5 <- spl_openfda %>% group_by(route) %>%
summarise(total = n()) %>%
arrange(desc(total)) %>%
mutate(route = factor(route,levels = route[order(total,decreasing = FALSE)])) %>%
head(21)
## `summarise()` ungrouping output (override with `.groups` argument)
# Graphe
graphe_6 <- req_5 %>% ggplot(aes(route, total, fill = total)) +
geom_bar(stat = "identity", color = "orange") +
theme_light() +
xlab("Total") +
ylab("Route") +
scale_fill_gradient(low = "yellow", high = "orange") +
ggtitle("Routes les plus fréquentes") +
theme(plot.title = element_text(color = "orange", hjust = 0.5),
axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
axis.text = element_text(color = "blue"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red")) +
coord_flip()
graphe_6
La voie la plus utilisé pour prendre les médicaments est la voie Oral puis la voie Topical
spl_complet <- cbind(spl,spl_openfda)
head(spl_complet,10)
## effective_time boxed_warning manufacturer_name
## 1 2020-05-07 NA Tonic Bath & Products Shenzhen Ltd
## 2 2020-05-08 NA THE KROGER COMPANY
## 3 2020-05-10 NA Earth Edition Inc.
## 4 2018-11-08 NA NA
## 5 2019-11-10 NA REMEDYREPACK INC.
## 6 2015-09-25 NA UniGroup Wholesale Inc.
## 7 2015-08-31 NA NA
## 8 2013-02-19 NA NA
## 9 2018-11-02 NA NA
## 10 2019-12-17 NA Crane Safety, LLC.
## product_type route
## 1 HUMAN OTC DRUG TOPICAL
## 2 HUMAN OTC DRUG TOPICAL
## 3 HUMAN OTC DRUG TOPICAL
## 4 NA NA
## 5 HUMAN PRESCRIPTION DRUG ORAL
## 6 HUMAN OTC DRUG TOPICAL
## 7 NA NA
## 8 NA NA
## 9 NA NA
## 10 HUMAN OTC DRUG ORAL
## generic_name
## 1 ETHYL ALCOHOL
## 2 AVOBENZONE, HOMOSALATE, OCTISALATE, OXYBENZONE
## 3 ETHANOL
## 4 NA
## 5 TRIHEXYPHENIDYL HYDROCHLORIDE
## 6 ETHYL ALCOHOL
## 7 NA
## 8 NA
## 9 NA
## 10 ASPIRIN
## brand_name substance_name
## 1 Hand Sanitizer ALCOHOL
## 2 KROGER SPORT SPF 15 SUNSCREEN OCTISALATE,OXYBENZONE,HOMOSALATE,AVOBENZONE
## 3 EARTH EDITION HAND SANITIZER Gel ALCOHOL
## 4 NA NA
## 5 TRIHEXYPHENIDYL HYDROCHLORIDE TRIHEXYPHENIDYL HYDROCHLORIDE
## 6 anti-bacterial hand gel ALCOHOL
## 7 NA NA
## 8 NA NA
## 9 NA NA
## 10 Crane Safety Aspirin ASPIRIN
glimpse(spl_complet)
## Rows: 100,000
## Columns: 8
## $ effective_time <date> 2020-05-07, 2020-05-08, 2020-05-10, 2018-11-08, ...
## $ boxed_warning <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "...
## $ manufacturer_name <chr> "Tonic Bath & Products Shenzhen Ltd", "THE KROGER...
## $ product_type <chr> "HUMAN OTC DRUG", "HUMAN OTC DRUG", "HUMAN OTC DR...
## $ route <chr> "TOPICAL", "TOPICAL", "TOPICAL", "NA", "ORAL", "T...
## $ generic_name <chr> "ETHYL ALCOHOL", "AVOBENZONE, HOMOSALATE, OCTISAL...
## $ brand_name <chr> "Hand Sanitizer", "KROGER SPORT SPF 15 SUNSCREEN"...
## $ substance_name <chr> "ALCOHOL", "OCTISALATE,OXYBENZONE,HOMOSALATE,AVOB...
req_6 <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>%
group_by(annee,product_type) %>%
filter(annee > "2008") %>%
summarise(total = n())
## `summarise()` regrouping output by 'annee' (override with `.groups` argument)
# Graphe
graphe_7 <- req_6 %>% ggplot(aes(annee,total, fill = product_type)) +
geom_bar(stat = "identity", color = "white") +
ggtitle("Le nombre de rapports par an par type de produit") +
xlab("Année") +
ylab("Nombre de rapport") +
scale_x_continuous(breaks=seq(2008, 2020, by = 1)) +
theme(plot.title = element_text(color = "orange", hjust = 0.5),
axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"),
legend.title = element_text(colour = "red", size = 10,face = "bold"),
axis.text = element_text(color = "blue"),
axis.text.x = element_text(angle = 65, vjust = 0.5))
graphe_7
req_7 <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>%
group_by(annee,manufacturer_name) %>%
filter(annee > "2008", nchar(manufacturer_name) > 2) %>%
summarise(total = n()) %>%
top_n(1,wt = total)
## `summarise()` regrouping output by 'annee' (override with `.groups` argument)
# Graphe
graphe_8 <- req_7 %>% ggplot(aes(annee,total, fill = manufacturer_name)) +
geom_bar(stat = "identity", color = "white") +
ggtitle("Les fabricants ayant le plus grand nombre de rapport chaque année") +
xlab("Année") +
ylab("Nombre de rapport") +
scale_x_continuous(breaks = seq(2008, 2020, by = 1)) +
theme(plot.title = element_text(color = "orange", hjust = 0.5, size = 12),
axis.title.x = element_text(color = "red", face = "bold"),
axis.title.y = element_text(color = "red", face = "bold"),
axis.line = element_line(colour = "darkblue", size = 1, linetype = "solid"),
legend.background = element_rect(fill = "lightblue",
size=0.5, linetype="solid", colour = "red"),
legend.title = element_text(colour = "red", size = 10,face = "bold"),
axis.text = element_text(color = "blue"),
axis.text.x = element_text(angle = 65, vjust = 0.5))
graphe_8
marques_les_plus_courants <- spl_complet %>% mutate(annee = lubridate::year(effective_time)) %>%
group_by(annee, brand_name) %>%
filter(annee > "2008",nchar(brand_name) > 2) %>%
summarise(total = n()) %>%
arrange(desc(total)) %>%
top_n(1,wt = total) %>% head(12)
## `summarise()` regrouping output by 'annee' (override with `.groups` argument)
marques_les_plus_courants
## # A tibble: 12 x 3
## # Groups: annee [10]
## annee brand_name total
## <dbl> <chr> <int>
## 1 2020 Hand Sanitizer 2020
## 2 2019 Oxygen 248
## 3 2018 Ibuprofen 31
## 4 2017 Childrens Loratadine Sugar Free 10
## 5 2016 Hand Sanitizer 7
## 6 2015 anti-bacterial hand gel 5
## 7 2013 Instant Hand Sanitizer 3
## 8 2014 Sodium Citrate Blood-Pack Units, (PL 146 Plastic) 3
## 9 2012 Fluocinolone Acetonide 2
## 10 2012 Hydrogen Peroxide 2
## 11 2012 Icy Hot Advanced Relief 2
## 12 2009 Arnica Nettle 1
Dans ce post, nous avons télécharger depuis le web, des données au format JSON à partir de l’API OpenFDA sur les produits pharmaceutique, qui procure des données pour l’étiquetage des médicaments sur ordonnace et en vente libre(OTC). Nous avons ensuite effectuer quelques analyses sur le contenue de nos données à l’aide de différents packages R pour comprendre les données. Lors de mon entretien d’embauche pour intégrer l’entreprise Astrazeneca sis à Cambridge, j’ai eu comme test un “coding style” sur cet API OpenFDA, que j’ai passé avec succès et qui m’a permis de passeraux étapes suivantes et que j’ai effectuer de 2 manières différentes. Ce coding style, je l’ai effectué en python, vous pouvez le retrouver sur la rubrique “python”, avec quelques applications du machine learning qui m’avait été demandé en option dans ce test, pour prédire le nombre d’ingrédients qu’il faudrait dans les années futures, etc…