Project 2

For children at risk of developing a peanut allergy, does consuming peanuts early cause fewer allergic reactions later on than avoiding them?

I am testing, at the 5% significance level, for a difference between two proportions: the proportion of children who ate peanuts and had an allergic reaction at the end of the study, and the proportion of children who avoided peanuts and had an allergic reaction at the end of the study.

My hypothesis is that those who eat peanuts will have fewer allergic reactions than those who avoid them.

\(H_0\): \(p_1\) = \(p_2\)

\(H_a\): \(p_1\) < \(p_2\)

where \(p_1\) is the proportion of patients who have an allergic reaction after 5 years of consuming peanuts and \(p_2\) is the proportion of patients who have an allergic reaction after 5 years of avoiding peanuts.

Introduction

I will be using the Learning Early about Peanut Allergy data set from the openintro.org website at https://www.openintro.org/data/index.php?data=LEAP

Here is some more information about the study directly from the Openintro.org website.

“The study team enrolled children in the United Kingdom between 2006 and 2009, selecting 640 infants with eczema, egg allergy, or both. Each child was randomly assigned to a treatment group (peanut consumption) or the control group (peanut avoidance); children in the treatment group were fed at least 6 grams of peanut protein daily until 5 years of age, while children in the control group were to avoid consuming peanut protein until 5 years of age.

At 5 years of age, each child was tested for peanut allergy using an oral food challenge (OFC): 5 grams of peanut protein in a single dose.

This dataset only contains the patients in the primary ITT analysis in the New England Journal of Medicine paper. This means it only includes the children eligible for the study because they are positive for an egg allergy and/or eczema and negative for skin test of peanut allergy.”

There are 530 observations and 6 variables in the dataset, including the child's age, sex, and primary ethnicity. I will be using two variables to answer my question: treatment.group (did the patient consume peanuts or not) and overall.V60.outcome (did the patient have an allergic reaction).

I chose it because I was interested to know if the proportion of patients who have an allergic reaction after 5 years of consuming peanuts would be less than those who did not have exposure to peanuts.

Data Analysis

I am going to look at the structure and head of the data and check for any missing values. I will then use group_by() and summarise() to count the allergic reactions at 5 years of age for those who ate peanuts and those who did not.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(corrplot)

## corrplot 0.95 loaded

library(lubridate)
library(dplyr)


setwd("~/Downloads/Data 101 Course materials/Data Sets")
LEAP <- read.csv("LEAP.csv")

str(LEAP)

## 'data.frame':    530 obs. of  6 variables:
##  $ participant.ID     : chr  "LEAP_100522" "LEAP_103358" "LEAP_105069" "LEAP_105328" ...
##  $ treatment.group    : chr  "Peanut Consumption" "Peanut Consumption" "Peanut Avoidance" "Peanut Consumption" ...
##  $ age.months         : num  6.08 7.59 5.98 7.03 6.41 ...
##  $ sex                : chr  "Female" "Female" "Male" "Female" ...
##  $ primary.ethnicity  : chr  "Black" "White" "White" "White" ...
##  $ overall.V60.outcome: chr  "PASS OFC" "PASS OFC" "PASS OFC" "PASS OFC" ...

head(LEAP)

##   participant.ID    treatment.group age.months    sex primary.ethnicity
## 1    LEAP_100522 Peanut Consumption     6.0780 Female             Black
## 2    LEAP_103358 Peanut Consumption     7.5893 Female             White
## 3    LEAP_105069   Peanut Avoidance     5.9795   Male             White
## 4    LEAP_105328 Peanut Consumption     7.0308 Female             White
## 5    LEAP_106377   Peanut Avoidance     6.4066   Male             White
## 6    LEAP_107031 Peanut Consumption     6.0452 Female             White
##   overall.V60.outcome
## 1            PASS OFC
## 2            PASS OFC
## 3            PASS OFC
## 4            PASS OFC
## 5            PASS OFC
## 6            PASS OFC

colSums(is.na(LEAP))

##      participant.ID     treatment.group          age.months                 sex 
##                   0                   0                   0                   0 
##   primary.ethnicity overall.V60.outcome 
##                   0                   0

There are no missing data.

df <- LEAP |>
  group_by(overall.V60.outcome, treatment.group) |>
  summarise(outcomes = n())

## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by overall.V60.outcome and treatment.group.
## ℹ Output is grouped by overall.V60.outcome.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(overall.V60.outcome, treatment.group))` for
##   per-operation grouping (`?dplyr::dplyr_by`) instead.

df

## # A tibble: 4 × 3
## # Groups:   overall.V60.outcome [2]
##   overall.V60.outcome treatment.group    outcomes
##   <chr>               <chr>                 <int>
## 1 FAIL OFC            Peanut Avoidance         36
## 2 FAIL OFC            Peanut Consumption        5
## 3 PASS OFC            Peanut Avoidance        227
## 4 PASS OFC            Peanut Consumption      262

Only 5 children ate peanuts and had an allergic reaction. This count is right on the border of the minimum commonly required for the difference in proportions test. If it were less than 5, an alternative test would need to be used.
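One common version of this condition looks at the expected counts under the pooled proportion rather than the observed counts. The sketch below checks that version with the counts from the table above. The threshold of 10 is an assumption (some texts use 5), and `x`, `n`, `p_pool`, and `expected` are illustrative names. Fisher's exact test is shown as one possible alternative for small counts. It is not the test used in this report.

```r
# Observed allergic reactions (FAIL OFC) and group sizes, from the table above
x <- c(consumed = 5, avoided = 36)
n <- c(consumed = 5 + 262, avoided = 36 + 227)

# Pooled proportion of allergic reactions under H0: p1 = p2
p_pool <- sum(x) / sum(n)

# Expected counts in each cell if H0 were true
expected <- rbind(reaction    = n * p_pool,
                  no_reaction = n * (1 - p_pool))
round(expected, 2)

# Assumed rule of thumb: every expected count is at least 10
all(expected >= 10)

# Possible alternative when counts are small: Fisher's exact test.
# Columns are the two groups, rows are reaction / no reaction, so
# alternative = "less" asks whether the odds of a reaction are lower
# in the consumption group.
fisher <- fisher.test(matrix(c(5, 262, 36, 227), nrow = 2),
                      alternative = "less")
fisher$p.value
```

If every expected count clears the threshold, the normal approximation behind the difference in proportions test is usually considered reasonable.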

names(df) <- gsub("\\.", "_", names(df))
names(df)

## [1] "overall_V60_outcome" "treatment_group"     "outcomes"

# Rows are outcomes, columns are treatment groups (counts from the summary table)
data <- matrix(c(5, 36, 262, 227), nrow = 2, byrow = TRUE)
colnames(data) <- c("Consumed Peanuts", "Avoided Peanuts")
rownames(data) <- c("Had Allergic Reaction", "No Allergic Reaction")
data

##                       Consumed Peanuts Avoided Peanuts
## Had Allergic Reaction                5              36
## No Allergic Reaction               262             227
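The counts above can be converted to within-group proportions, which are the quantities the hypotheses compare. This is a minimal sketch using base R's `prop.table()`. The names `counts` and `col_props` are illustrative, and the matrix is rebuilt here so the block runs on its own.

```r
# Same 2x2 table of counts as above, rebuilt so this block is self-contained
counts <- matrix(c(5, 36, 262, 227), nrow = 2, byrow = TRUE,
                 dimnames = list(
                   Outcome = c("Had Allergic Reaction", "No Allergic Reaction"),
                   Group   = c("Consumed Peanuts", "Avoided Peanuts")))

# margin = 2 divides each column by its total, giving proportions within each group
col_props <- prop.table(counts, margin = 2)
round(col_props, 4)
```

The first row of `col_props` holds the sample estimates of \(p_1\) and \(p_2\).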

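# Stacked bars: each bar is one treatment group, split by OFC outcome
barplot(data,
        beside = FALSE,
        col = c("skyblue", "orange"),
        legend.text = TRUE,
        main = "Allergic Reactions at Age 5 by Treatment Group",
        xlab = "Treatment Group",
        ylab = "Number of Children")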

Statistical Analysis

My hypothesis is that those who eat peanuts will have fewer allergic reactions than those who avoid them.

\(H_0\): \(p_1\) = \(p_2\)

\(H_a\): \(p_1\) < \(p_2\)

prop.test(c(5, 36), c(267, 263), alternative = "less")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(5, 36) out of c(267, 263)
## X-squared = 24.286, df = 1, p-value = 4.151e-07
## alternative hypothesis: less
## 95 percent confidence interval:
##  -1.00000000 -0.07694385
## sample estimates:
##     prop 1     prop 2 
## 0.01872659 0.13688213

Conclusion

The p-value is 4.151e-07, which is less than the significance level of 0.05.

Therefore, the results are statistically significant. We reject the null hypothesis that there is no difference between the two groups. Because the children were randomly assigned to the treatment and control groups, we have evidence that consuming peanuts early causes fewer instances of allergic reactions than avoiding peanuts in those at risk of developing a peanut allergy.

This is helpful information for parents: introducing peanuts to at-risk children at an early age may help prevent them from developing an allergic reaction to peanuts later on.

Future Steps

It may be interesting to look at whether allergic reactions differed between males and females within each group: those who ate peanuts and those who avoided them. It would also be worthwhile to test the children for an allergic reaction at a younger age to see whether the length of time spent consuming peanuts plays a significant role.

Sources

https://www.openintro.org/data/index.php?data=LEAP

Du Toit, George, et al. Randomized trial of peanut consumption in infants at risk for peanut allergy. New England Journal of Medicine 372.9 (2015): 803-813.