Final-Project

Introduction

The purpose of this project is to query the FDA's openFDA API for adverse drug event data. We then perform some basic descriptive analytics, along with a statistical test to look for meaningful differences between groups. For these calls, we use the jsonlite and httr packages.
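Both requests in this project use the same openFDA count query and differ only in the seriousness code. As a minimal sketch, the query string can be built with a small helper. The function name fda_count_url and its arguments are hypothetical; the defaults reproduce the date range and count field used in this project.

```r
# Hypothetical helper: builds an openFDA drug-event count URL in the same
# format as the queries used in this project.
fda_count_url <- function(serious_code,
                          from = "20040101",
                          to = "20230513",
                          field = "patient.drug.openfda.pharm_class_epc.exact") {
  # serious_code: 1 = serious report, 2 = non-serious report (openFDA coding)
  paste0(
    "https://api.fda.gov/drug/event.json?search=(receivedate:[",
    from, "+TO+", to, "])+AND+serious:", serious_code,
    "&count=", field
  )
}

serious_url <- fda_count_url(1)
less_serious_url <- fda_count_url(2)
# The request itself would then be sent with, e.g., httr::GET(serious_url)
```

Keeping the query in one place means the two groups can only differ in the seriousness code, not by an accidental typo in the date range.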

Packages used: jsonlite 1.8.4, httr 1.4.6, tidyverse 2.0.0, MASS 7.3-60, reshape2 1.4.4 and reshape 0.8.9 (the installed ggplot2 and MASS packages were built under R 4.2.3).

We build two data frames from the API: one with counts of ‘Less Serious’ (non-serious, serious:2) reports by drug class and one with counts of ‘Serious’ (serious:1) reports by drug class. For each group, we keep the 20 drug classes with the most reports.
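The "top 20 per group, labelled by Event" step can be written once and reused for both groups. This is a minimal sketch on made-up data; the helper name top_classes is hypothetical, and it assumes the API results arrive as a data frame with term and count columns, as in the openFDA count response. It sorts explicitly rather than relying on the API's ordering.

```r
# Hypothetical helper: keeps the n most-reported drug classes and labels them
top_classes <- function(results, n = 20, event) {
  # openFDA count results have columns `term` and `count`
  results <- results[order(results$count, decreasing = TRUE), ]
  top <- head(results, n)
  top$Event <- event
  top
}

# Toy data standing in for an API response (made-up classes and counts)
toy <- data.frame(term = c("A [EPC]", "B [EPC]", "C [EPC]"),
                  count = c(5L, 30L, 12L),
                  stringsAsFactors = FALSE)
top_toy <- top_classes(toy, n = 2, event = "Serious")
top_toy
```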

library(httr)
library(jsonlite)
library(tidyverse)

# Serious reports (serious:1), counted by pharmacologic class (EPC)
Serious <- GET('https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20230513])+AND+serious:1&count=patient.drug.openfda.pharm_class_epc.exact')
Serious <- fromJSON(rawToChar(Serious$content))$results
# Take top 20 rows from the data frame
Serious <- Serious[1:20, ]

# Non-serious reports (serious:2), same count
Less.Serious <- GET('https://api.fda.gov/drug/event.json?search=(receivedate:[20040101+TO+20230513])+AND+serious:2&count=patient.drug.openfda.pharm_class_epc.exact')
Less.Serious <- fromJSON(rawToChar(Less.Serious$content))$results
# Take top 20 rows from the data frame
Less.Serious <- Less.Serious[1:20, ]

# Label each group and stack them into one data frame
Less.Serious$Event <- "Less Serious"
Serious$Event <- "Serious"
df <- rbind(Less.Serious, Serious)

# Reorder columns to term, Event, count and rename term to Drug
df <- df[, c(1, 3, 2)]
names(df)[1] <- "Drug"

# Strip the literal "[EPC]" suffix; fixed = TRUE stops "[EPC]" from being
# read as a regex character class that would delete every E, P and C
df$Drug <- trimws(gsub("[EPC]", "", df$Drug, fixed = TRUE))

df$Event <- as.factor(df$Event)

Descriptive Analysis

We start by comparing the total number of reports in our two Event groups: Less Serious and Serious.

# Total reports per Event (each bar stacks the 20 drug-class counts in that group)
p <- ggplot(df, aes(x = Event, y = count)) +
  geom_bar(stat = "identity", fill = "grey") +
  theme_classic()

p

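This visual shows that the two groups are not the same size: the top 20 classes account for roughly twice as many Serious reports as Less Serious ones. To compare drug classes across groups fairly, we therefore work with each class's percentage of its own group.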

# Calculate each drug class's share of its Event group
df <- transform(df,
                perc = ave(count, Event, FUN = prop.table))

ggplot(df, aes(fill = Event, y = perc, x = Drug)) +
  geom_bar(position = "stack", stat = "identity") +
  coord_flip() +
  theme_dark()

This graph shows how much each drug class contributes to each group, as a percentage of that group's reports. Interestingly, some drug classes appear in the top 20 of only one group, while others are split fairly evenly between the two; nonsteroidal anti-inflammatory drugs (NSAIDs) are a good example of the latter. Classes that appear in one group but not the other can give us insight into the safety profile of those drugs.
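Which classes appear in only one group's top 20, and which appear in both, can be listed with set operations on the Drug column split by Event. A minimal sketch on a made-up stand-in for df (same column names; the class labels and counts are placeholders, not results):

```r
# Made-up stand-in for `df` with the same columns
toy_df <- data.frame(
  Drug  = c("Class A", "Class B", "Class C", "Class D", "Class B", "Class C"),
  Event = factor(c("Less Serious", "Less Serious", "Less Serious",
                   "Serious", "Serious", "Serious")),
  count = c(10L, 20L, 5L, 40L, 30L, 8L),
  stringsAsFactors = FALSE
)

# Drug classes in each group's top list
by_group <- split(toy_df$Drug, toy_df$Event)

shared       <- intersect(by_group[["Less Serious"]], by_group[["Serious"]])
only_less    <- setdiff(by_group[["Less Serious"]], by_group[["Serious"]])
only_serious <- setdiff(by_group[["Serious"]], by_group[["Less Serious"]])
```

Running the same lines with df in place of toy_df would list the classes behind the chart.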

For example, progestins such as progesterone (used for contraception) appear to have a safer profile than opioid agonists, which lines up with domain knowledge. For other classes, such as corticosteroids, the picture is less clear. A chi-square test of equal proportions can tell us whether a drug class makes up a different share of the Less Serious and Serious groups. We do this for two drug classes: Nonsteroidal Anti-inflammatory Drugs (NSAID) and Tumor Necrosis Factor Blockers (TNFB).
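prop.test with two groups is a chi-square test: it gives the same statistic as chisq.test on the 2 x 2 table of reports inside versus outside the drug class in each group (both apply Yates' continuity correction by default). A sketch using the NSAID counts and group totals from the prop.test call in this section:

```r
# NSAID reports and group totals (Less Serious, Serious)
x <- c(299383, 662824)    # NSAID reports
n <- c(3188033, 6405123)  # all top-20 reports in each group

# Two-sample test of equal proportions
pt <- prop.test(x = x, n = n)

# Same comparison as a 2 x 2 contingency table: NSAID vs. other classes
tab <- cbind(NSAID = x, Other = n - x)
rownames(tab) <- c("Less Serious", "Serious")
ct <- chisq.test(tab)  # Yates' correction is applied to 2 x 2 tables

unname(pt$statistic)
unname(ct$statistic)
```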

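# Reports for each drug class (rows) in each Event group (columns)
header <- c("Less Serious", "Serious")
Drugs <- c("Nonsteroidal Anti-inflammatory Drug", "Tumor Necrosis Factor Blocker")

Counts <- matrix(c(299383, 662824, 429787, 401571),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Drugs, header))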

df |> group_by(Event) |> summarise(total = sum(count))
# A tibble: 2 × 2
  Event          total
  <fct>          <int>
1 Less Serious 3188289
2 Serious      6405174
# NSAID reports (x) out of all top-20 reports (n) in the Less Serious and Serious groups
res <- prop.test(x = c(299383, 662824), n = c(3188033, 6405123))
# Printing the results
res 

    2-sample test for equality of proportions with continuity correction

data:  c(299383, 662824) out of c(3188033, 6405123)
X-squared = 2162.4, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.009972981 -0.009177097
sample estimates:
    prop 1     prop 2 
0.09390838 0.10348341 

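The test rejects the hypothesis that NSAIDs make up the same share of both groups (X-squared = 2162.4, p < 2.2e-16): NSAIDs account for about 9.4% of Less Serious reports and 10.3% of Serious reports. With counts this large, though, even this gap of about one percentage point (95% CI roughly -1.0 to -0.9 points) is easily significant, so significance alone says little about how large the difference is.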
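The same test can be run for TNF blockers, and an effect size can be added to separate "significant" from "large". The sketch below assumes, as the order of the Counts values suggests, that 429787 and 401571 are the TNF blocker counts for Less Serious and Serious, and it reuses the group totals from the NSAID test. Cohen's h is a swapped-in measure, not part of the original analysis; it is one standard effect size for comparing two proportions.

```r
# Group totals used in the NSAID test (Less Serious, Serious)
n     <- c(3188033, 6405123)
nsaid <- c(299383, 662824)
tnf   <- c(429787, 401571)   # assumed TNF blocker counts

# Same two-sample proportion test for TNF blockers
res_tnf <- prop.test(x = tnf, n = n)
res_tnf

# Cohen's h: difference of arcsine-square-root transformed proportions
# (|h| of about 0.2 is conventionally "small", 0.5 "medium", 0.8 "large")
cohens_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

h_nsaid <- cohens_h(nsaid[1] / n[1], nsaid[2] / n[2])
h_tnf   <- cohens_h(tnf[1] / n[1], tnf[2] / n[2])
round(c(NSAID = h_nsaid, TNF = h_tnf), 3)
```

Comparing h across classes gives a scale-free sense of which drug class's share shifts more between the two groups, which the p-values alone cannot show at these sample sizes.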

Conclusion

We got a clearer picture of the adverse drug events reported to the FDA between January 2004 and May 2023, and of how those reports were classified by seriousness. We were also able to see which drug classes are more associated with serious adverse drug events and which are more associated with less serious ones.