Use the oswego.csv dataset to find the bad food served at the community lunch.

The first step is to load the data into a data frame, convert the character columns to factors, and then draw a fourfold plot for each food.

library(stats)  # attached by default in R; listed here for explicitness
library(dplyr)  # provides select(), mutate_if(), and %>%

oswego <- read.csv("homework_1_oswego.csv")

# only use data about patrons and food (ignore time data)
lunchData <- select(oswego, -X, -age, -meal.time, -onset.date, -onset.time)

# set chr data to factor 
lunchData <- lunchData %>% mutate_if(is.character, as.factor)

# make plots for each food item
fourfoldplot(with(lunchData, table(ill, baked.ham)))

fourfoldplot(with(lunchData, table(ill, spinach)))

fourfoldplot(with(lunchData, table(ill, mashed.potato)))

fourfoldplot(with(lunchData, table(ill, cabbage.salad)))

fourfoldplot(with(lunchData, table(ill, jello)))

fourfoldplot(with(lunchData, table(ill, rolls)))

fourfoldplot(with(lunchData, table(ill, brown.bread)))

fourfoldplot(with(lunchData, table(ill, milk)))

fourfoldplot(with(lunchData, table(ill, coffee)))

fourfoldplot(with(lunchData, table(ill, water)))

fourfoldplot(with(lunchData, table(ill, cakes)))

fourfoldplot(with(lunchData, table(ill, vanilla.ice.cream)))

fourfoldplot(with(lunchData, table(ill, chocolate.ice.cream)))

fourfoldplot(with(lunchData, table(ill, fruit.salad)))
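
The fourteen calls above differ only in the food column, so they could also be written as a loop. The following is a minimal sketch of that idea, not the code used for this analysis: `toy` is a hypothetical stand-in data frame with random Y/N answers, and only the column names `ill`, `jello`, and `milk` are borrowed from the dataset.

```r
# Hypothetical toy data standing in for lunchData (random Y/N answers,
# not values from the Oswego file)
set.seed(1)
n <- 40
toy <- data.frame(
  ill   = factor(sample(c("N", "Y"), n, replace = TRUE)),
  jello = factor(sample(c("N", "Y"), n, replace = TRUE)),
  milk  = factor(sample(c("N", "Y"), n, replace = TRUE))
)

# Every column except the outcome is treated as a food item
foods <- setdiff(names(toy), "ill")

# Draw one fourfold plot per food, titled with the food's name
for (food in foods) {
  tab <- table(toy$ill, toy[[food]], dnn = c("ill", food))
  fourfoldplot(tab, main = food)
}
```

With the real data, replacing `toy` with `lunchData` should produce the same plots as the individual calls, with the added benefit that each plot carries its food name as a title.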

Fourfold plots with a darker color and a visibly larger radius in the lower-right quadrant are suspicious: they indicate that a higher proportion of the ill group than of the non-ill group ate that food.
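
The proportions that the plots show visually can also be read off as numbers. The sketch below uses `prop.table()` on a made-up 2x2 table. The counts are hypothetical and chosen only to illustrate the calculation; they are not taken from the Oswego data.

```r
# Hypothetical 2x2 counts (rows: ill, columns: ate the food);
# these numbers are illustrative only, not from the dataset
tab <- matrix(c(20,  5,
                10, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(ill = c("N", "Y"), ate = c("N", "Y")))

# Proportion of each illness group that ate the food (each row sums to 1)
by_ill <- prop.table(tab, margin = 1)
print(round(by_ill, 2))

# Share of the ill group that ate the food vs. share of the non-ill group
ill_ate     <- by_ill["Y", "Y"]
not_ill_ate <- by_ill["N", "Y"]
```

A food looks suspicious when `ill_ate` is clearly larger than `not_ill_ate`, which is the pattern the fourfold plot highlights.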

Suspicious foods are: jello, brown bread, cakes, and fruit salad. The most highly suspicious food is vanilla ice cream.

But are the associations for these suspicious foods statistically significant? A chi-squared test will tell us.

Starting with the merely suspicious foods:

chisq.test(lunchData$ill, lunchData$fruit.salad)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  lunchData$ill and lunchData$fruit.salad
## X-squared = 1.2052e-32, df = 1, p-value = 1
chisq.test(lunchData$ill, lunchData$cakes)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  lunchData$ill and lunchData$cakes
## X-squared = 0.8737, df = 1, p-value = 0.3499
chisq.test(lunchData$ill, lunchData$brown.bread)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  lunchData$ill and lunchData$brown.bread
## X-squared = 0.21561, df = 1, p-value = 0.6424
chisq.test(lunchData$ill, lunchData$jello)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  lunchData$ill and lunchData$jello
## X-squared = 0.51334, df = 1, p-value = 0.4737

So far, no food has a significant p-value (p < 0.05). These results indicate that there is no significant difference between the ill and non-ill groups in who ate these foods.
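
Rather than calling `chisq.test()` once per food, the p-values can be collected in a single pass. This is a minimal sketch under the assumption that every food column is a two-level factor; `toy` is a hypothetical data frame of random answers, so its p-values mean nothing for the outbreak.

```r
# Hypothetical toy data (random Y/N answers, not from the Oswego file)
set.seed(42)
n <- 80
toy <- data.frame(
  ill   = factor(sample(c("N", "Y"), n, replace = TRUE)),
  jello = factor(sample(c("N", "Y"), n, replace = TRUE)),
  cakes = factor(sample(c("N", "Y"), n, replace = TRUE)),
  milk  = factor(sample(c("N", "Y"), n, replace = TRUE))
)

# Every column except the outcome is treated as a food item
foods <- setdiff(names(toy), "ill")

# Run the same chi-squared test for each food and keep only the p-value
p_values <- sapply(foods, function(food) {
  chisq.test(toy$ill, toy[[food]])$p.value
})

# Smallest p-values first, so the strongest associations appear at the top
print(sort(p_values))
```

Substituting `lunchData` for `toy` would give one p-value per food in the same order of suspicion used above.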

Let’s look at the highly suspicious vanilla ice cream:

chisq.test(lunchData$ill, lunchData$vanilla.ice.cream)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  lunchData$ill and lunchData$vanilla.ice.cream
## X-squared = 24.537, df = 1, p-value = 7.29e-07

The p-value for vanilla ice cream is very small (7.29e-07, or about 0.0000007), which is far below 0.05. This indicates that there is a significant difference between the ill and non-ill groups for this food.
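
Scientific notation is easy to misread, so it can help to print the p-value in fixed notation as well. The sketch below rebuilds the test from a 2x2 table of counts. The counts are an assumption: they are not read from the CSV here, and were chosen because they reproduce the X-squared value reported above.

```r
# ASSUMED counts (not read from the CSV): rows are ill (N, Y),
# columns are vanilla.ice.cream (N, Y)
vanilla <- matrix(c(18, 11,
                     3, 43),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(ill = c("N", "Y"),
                                  vanilla.ice.cream = c("N", "Y")))

# For a 2x2 table, chisq.test() applies Yates' continuity correction by default
res <- chisq.test(vanilla)

# Test statistic, then the p-value in scientific and in fixed notation
print(unname(res$statistic))
print(res$p.value)
print(format(signif(res$p.value, 3), scientific = FALSE))
```

Passing a table of counts to `chisq.test()` is equivalent to passing the two factor columns, which makes it convenient for checking a result by hand.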

Looks like the vanilla ice cream is what made people sick!