According to the Centers for Disease Control and Prevention, every year 48 million people in the United States develop a foodborne illness, 128,000 become hospitalized, and 3,000 will die. Certain factors are known for increasing the likelihood a person becomes severely ill, such as the age and prior health status of the afflicted person. Researchers have identified more than 250 pathogens, toxins, and chemicals that can lead to a foodborne illness. Public health agencies are urging both citizens and state governments to adopt a proactive approach to reduce the number of food-borne illnesses. One of the first steps to improve and change current health practices is to conduct an analysis that helps determine where and when illnesses are more likely to occur and if there are any trends that can predict outbreaks.
I am using the “Foodborne Disease Outbreaks, 1998 - 2015” dataset from Kaggle to determine the correlation between total number of illnesses and various, independent variables. I will be analyzing how the U.S. state, year, location of food preparation, etiology, and status (whether the etiology is confirmed or suspected) impacts total number of illnesses. I assume that the total number of foodborne illnesses are increasing by the year and that foodborne illnesses are more prominent in coastal areas.
I will use the Poisson Regression Analysis because the dependent variable (foodborne illnesses) is a count variable as it measures the amount of times illnesses occur. The Poisson distribution is both discrete and positive.
The dataset “Foodborne Disease Outbreaks, 1998 - 2015” is licensed by the Centers for Disease Control and Prevention. It contains 12 variables (Year. Month, State, Location, Food, Ingredient, Species, Serotype/Genotype, Status, Illnesses, Hospitalizations, Fatalities).
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(texreg)
## Version: 1.36.23
## Date: 2017-03-03
## Author: Philip Leifeld (University of Glasgow)
##
## Please cite the JSS article in your publications -- see citation("texreg").
library(Zelig)
## Loading required package: survival
##
## Attaching package: 'Zelig'
## The following object is masked from 'package:ggplot2':
##
## stat
library(ZeligChoice)
library(readr)
foodborne_disease<-read_csv("C:/Users/Skippz/Downloads/foodborne-diseases.zip")
## Parsed with column specification:
## cols(
## Year = col_double(),
## Month = col_character(),
## State = col_character(),
## Location = col_character(),
## Food = col_character(),
## Ingredient = col_character(),
## Species = col_character(),
## `Serotype/Genotype` = col_character(),
## Status = col_character(),
## Illnesses = col_double(),
## Hospitalizations = col_double(),
## Fatalities = col_double()
## )
fb_diseases<-select(foodborne_disease, "Year", "Month", "State", "Location", "Species", "Status", "Illnesses", "Hospitalizations", "Fatalities")
head(fb_diseases)
## # A tibble: 6 x 9
## Year Month State Location Species Status Illnesses Hospitalizations
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1998 Janu~ Cali~ Restaur~ <NA> <NA> 20 0
## 2 1998 Janu~ Cali~ <NA> <NA> <NA> 112 0
## 3 1998 Janu~ Cali~ Restaur~ <NA> <NA> 35 0
## 4 1998 Janu~ Cali~ Restaur~ Scombr~ Confi~ 4 0
## 5 1998 Janu~ Cali~ Private~ Salmon~ Confi~ 26 3
## 6 1998 Janu~ Cali~ Restaur~ Shigel~ Confi~ 25 3
## # ... with 1 more variable: Fatalities <dbl>
fb_diseases$Month<-as.factor(fb_diseases$Month)
fb_diseases$State<-as.factor(fb_diseases$State)
fb_diseases$Location<-as.factor(fb_diseases$Location)
fb_diseases$Species<-as.factor(fb_diseases$Species)
fb_diseases$Status<-as.factor(fb_diseases$Status)
fb_diseases$Year<-as.integer(fb_diseases$Year)
fb_diseases$Illnesses<-as.integer(fb_diseases$Illnesses)
fb_diseases$Hospitalizations<-as.integer(fb_diseases$Hospitalizations)
fb_diseases$Fatalities<-as.integer(fb_diseases$Fatalities)
fb_diseases$Location[fb_diseases$Location=='All']<- NA
fb_diseases$Species[fb_diseases$Species=='All']<- NA
fb_diseases$Status[fb_diseases$Status=='All']<- NA
fb_diseases$Illnesses[fb_diseases$Illnesses=='All']<- NA
attr(fb_diseases,'spec')<- NULL
str(fb_diseases)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 19119 obs. of 9 variables:
## $ Year : int 1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
## $ Month : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ State : Factor w/ 55 levels "Alabama","Alaska",..: 5 5 5 5 5 5 5 5 6 6 ...
## $ Location : Factor w/ 161 levels "Banquet Facility",..: 99 NA 99 99 66 99 99 99 99 99 ...
## $ Species : Factor w/ 201 levels "Amnesic shellfish poison",..: NA NA NA 163 134 165 NA 134 199 199 ...
## $ Status : Factor w/ 22 levels "Confirmed","Confirmed; Confirmed",..: NA NA NA 1 1 1 NA 1 14 14 ...
## $ Illnesses : int 20 112 35 4 26 25 8 4 21 3 ...
## $ Hospitalizations: int 0 0 0 0 3 3 0 3 NA NA ...
## $ Fatalities : int 0 0 0 0 0 0 0 0 NA 0 ...
dim(fb_diseases)
## [1] 19119 9
P1<- zelig(Illnesses ~ Year, model = "poisson", data = fb_diseases, cite = F)
P2<- zelig(Illnesses ~ Year + Hospitalizations, model = "poisson", data = fb_diseases, cite = F)
P3<- zelig(Illnesses ~ Year + Hospitalizations + Fatalities, model = "poisson", data = fb_diseases, cite = F)
P4<- zelig(Illnesses ~ Year + Hospitalizations + Fatalities + State, model = "poisson", data = fb_diseases, cite = F)
htmlreg(list(P1,P2,P3, P4))
| Model 1 | Model 2 | Model 3 | Model 4 | ||
|---|---|---|---|---|---|
| (Intercept) | 20.84*** | 37.97*** | 38.26*** | 61.60*** | |
| (0.64) | (0.71) | (0.72) | (0.76) | ||
| Year | -0.01*** | -0.02*** | -0.02*** | -0.03*** | |
| (0.00) | (0.00) | (0.00) | (0.00) | ||
| Hospitalizations | 0.02*** | 0.02*** | 0.02*** | ||
| (0.00) | (0.00) | (0.00) | |||
| Fatalities | 0.01*** | -0.01*** | |||
| (0.00) | (0.00) | ||||
| StateAlaska | 0.46*** | ||||
| (0.04) | |||||
| StateArizona | 0.77*** | ||||
| (0.03) | |||||
| StateArkansas | 1.73*** | ||||
| (0.03) | |||||
| StateCalifornia | 0.73*** | ||||
| (0.03) | |||||
| StateColorado | 0.97*** | ||||
| (0.03) | |||||
| StateConnecticut | 0.81*** | ||||
| (0.03) | |||||
| StateDelaware | 0.75*** | ||||
| (0.07) | |||||
| StateFlorida | 0.23*** | ||||
| (0.03) | |||||
| StateGeorgia | 1.12*** | ||||
| (0.03) | |||||
| StateGuam | 0.51*** | ||||
| (0.11) | |||||
| StateHawaii | 0.08* | ||||
| (0.03) | |||||
| StateIdaho | 1.13*** | ||||
| (0.04) | |||||
| StateIllinois | 0.98*** | ||||
| (0.03) | |||||
| StateIndiana | 1.09*** | ||||
| (0.03) | |||||
| StateIowa | 1.30*** | ||||
| (0.03) | |||||
| StateKansas | 0.67*** | ||||
| (0.03) | |||||
| StateKentucky | 0.87*** | ||||
| (0.05) | |||||
| StateLouisiana | 1.95*** | ||||
| (0.03) | |||||
| StateMaine | 1.13*** | ||||
| (0.03) | |||||
| StateMaryland | 0.41*** | ||||
| (0.03) | |||||
| StateMassachusetts | 1.29*** | ||||
| (0.03) | |||||
| StateMichigan | 0.76*** | ||||
| (0.03) | |||||
| StateMinnesota | 0.70*** | ||||
| (0.03) | |||||
| StateMississippi | 1.52*** | ||||
| (0.04) | |||||
| StateMissouri | 1.23*** | ||||
| (0.03) | |||||
| StateMontana | 1.04*** | ||||
| (0.06) | |||||
| StateMultistate | 1.55*** | ||||
| (0.03) | |||||
| StateNebraska | 1.17*** | ||||
| (0.04) | |||||
| StateNevada | 1.54*** | ||||
| (0.03) | |||||
| StateNew Hampshire | 1.20*** | ||||
| (0.04) | |||||
| StateNew Jersey | 1.06*** | ||||
| (0.03) | |||||
| StateNew Mexico | 1.19*** | ||||
| (0.04) | |||||
| StateNew York | 0.78*** | ||||
| (0.03) | |||||
| StateNorth Carolina | 1.47*** | ||||
| (0.03) | |||||
| StateNorth Dakota | 1.48*** | ||||
| (0.03) | |||||
| StateOhio | 0.68*** | ||||
| (0.03) | |||||
| StateOklahoma | 1.51*** | ||||
| (0.03) | |||||
| StateOregon | 0.78*** | ||||
| (0.03) | |||||
| StatePennsylvania | 0.99*** | ||||
| (0.03) | |||||
| StatePuerto Rico | 0.50*** | ||||
| (0.04) | |||||
| StateRepublic of Palau | -0.62 | ||||
| (0.33) | |||||
| StateRhode Island | 0.84*** | ||||
| (0.04) | |||||
| StateSouth Carolina | 1.44*** | ||||
| (0.03) | |||||
| StateSouth Dakota | 1.08*** | ||||
| (0.05) | |||||
| StateTennessee | 1.08*** | ||||
| (0.03) | |||||
| StateTexas | 1.39*** | ||||
| (0.03) | |||||
| StateUtah | 1.09*** | ||||
| (0.03) | |||||
| StateVermont | 0.32*** | ||||
| (0.05) | |||||
| StateVirginia | 1.22*** | ||||
| (0.03) | |||||
| StateWashington | 0.31*** | ||||
| (0.03) | |||||
| StateWashington DC | 1.68*** | ||||
| (0.04) | |||||
| StateWest Virginia | 1.12*** | ||||
| (0.04) | |||||
| StateWisconsin | 1.10*** | ||||
| (0.03) | |||||
| StateWyoming | 1.21*** | ||||
| (0.04) | |||||
| AIC | 781020.08 | 567059.42 | 549715.12 | 511710.75 | |
| BIC | 781035.80 | 567082.37 | 549745.63 | 512153.03 | |
| Log Likelihood | -390508.04 | -283526.71 | -274853.56 | -255797.38 | |
| Deviance | 703798.68 | 503952.67 | 488038.56 | 449926.19 | |
| Num. obs. | 19119 | 15494 | 15145 | 15145 | |
| p < 0.001, p < 0.01, p < 0.05 | |||||
Model 5 is the best model as it contains majority of variables (establishing its complexity) and it has the lowest AIC and BIC scores among the models.
P5<- zelig(Illnesses ~ Year+ Hospitalizations + Fatalities + State, model = "poisson", data = fb_diseases, cite = F)
summary(P5)
## Model:
##
## Call:
## z5$zelig(formula = Illnesses ~ Year + Hospitalizations + Fatalities +
## State, data = fb_diseases)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -41.264 -3.792 -2.477 0.046 107.127
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.160e+01 7.621e-01 80.837 < 2e-16
## Year -2.967e-02 3.804e-04 -77.999 < 2e-16
## Hospitalizations 1.510e-02 6.333e-05 238.462 < 2e-16
## Fatalities -8.533e-03 1.719e-03 -4.964 6.90e-07
## StateAlaska 4.605e-01 4.135e-02 11.137 < 2e-16
## StateArizona 7.691e-01 2.891e-02 26.605 < 2e-16
## StateArkansas 1.734e+00 3.285e-02 52.779 < 2e-16
## StateCalifornia 7.299e-01 2.555e-02 28.561 < 2e-16
## StateColorado 9.661e-01 2.720e-02 35.520 < 2e-16
## StateConnecticut 8.065e-01 2.949e-02 27.350 < 2e-16
## StateDelaware 7.454e-01 7.323e-02 10.178 < 2e-16
## StateFlorida 2.312e-01 2.642e-02 8.750 < 2e-16
## StateGeorgia 1.115e+00 2.745e-02 40.626 < 2e-16
## StateGuam 5.067e-01 1.061e-01 4.775 1.80e-06
## StateHawaii 7.592e-02 3.057e-02 2.484 0.0130
## StateIdaho 1.127e+00 3.589e-02 31.407 < 2e-16
## StateIllinois 9.779e-01 2.583e-02 37.856 < 2e-16
## StateIndiana 1.088e+00 3.078e-02 35.358 < 2e-16
## StateIowa 1.297e+00 2.978e-02 43.569 < 2e-16
## StateKansas 6.726e-01 2.874e-02 23.404 < 2e-16
## StateKentucky 8.704e-01 4.817e-02 18.071 < 2e-16
## StateLouisiana 1.949e+00 2.965e-02 65.739 < 2e-16
## StateMaine 1.129e+00 3.003e-02 37.585 < 2e-16
## StateMaryland 4.091e-01 2.742e-02 14.921 < 2e-16
## StateMassachusetts 1.288e+00 2.878e-02 44.745 < 2e-16
## StateMichigan 7.631e-01 2.666e-02 28.626 < 2e-16
## StateMinnesota 6.986e-01 2.643e-02 26.427 < 2e-16
## StateMississippi 1.524e+00 3.724e-02 40.923 < 2e-16
## StateMissouri 1.226e+00 3.120e-02 39.304 < 2e-16
## StateMontana 1.045e+00 5.713e-02 18.288 < 2e-16
## StateMultistate 1.553e+00 2.695e-02 57.627 < 2e-16
## StateNebraska 1.169e+00 4.499e-02 25.987 < 2e-16
## StateNevada 1.540e+00 3.290e-02 46.807 < 2e-16
## StateNew Hampshire 1.200e+00 3.555e-02 33.755 < 2e-16
## StateNew Jersey 1.055e+00 3.151e-02 33.484 < 2e-16
## StateNew Mexico 1.193e+00 3.796e-02 31.442 < 2e-16
## StateNew York 7.760e-01 2.682e-02 28.936 < 2e-16
## StateNorth Carolina 1.469e+00 2.903e-02 50.589 < 2e-16
## StateNorth Dakota 1.483e+00 3.247e-02 45.685 < 2e-16
## StateOhio 6.790e-01 2.615e-02 25.959 < 2e-16
## StateOklahoma 1.512e+00 3.304e-02 45.758 < 2e-16
## StateOregon 7.847e-01 2.731e-02 28.731 < 2e-16
## StatePennsylvania 9.913e-01 2.694e-02 36.804 < 2e-16
## StatePuerto Rico 4.997e-01 4.177e-02 11.964 < 2e-16
## StateRepublic of Palau -6.159e-01 3.343e-01 -1.843 0.0654
## StateRhode Island 8.427e-01 4.013e-02 21.001 < 2e-16
## StateSouth Carolina 1.442e+00 2.928e-02 49.269 < 2e-16
## StateSouth Dakota 1.084e+00 4.905e-02 22.103 < 2e-16
## StateTennessee 1.075e+00 2.762e-02 38.932 < 2e-16
## StateTexas 1.387e+00 2.739e-02 50.617 < 2e-16
## StateUtah 1.088e+00 3.467e-02 31.375 < 2e-16
## StateVermont 3.215e-01 5.376e-02 5.980 2.23e-09
## StateVirginia 1.219e+00 2.848e-02 42.790 < 2e-16
## StateWashington 3.130e-01 2.763e-02 11.331 < 2e-16
## StateWashington DC 1.685e+00 4.130e-02 40.797 < 2e-16
## StateWest Virginia 1.118e+00 4.187e-02 26.706 < 2e-16
## StateWisconsin 1.099e+00 2.702e-02 40.675 < 2e-16
## StateWyoming 1.211e+00 3.703e-02 32.706 < 2e-16
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 537812 on 15144 degrees of freedom
## Residual deviance: 449926 on 15087 degrees of freedom
## (3974 observations deleted due to missingness)
## AIC: 511711
##
## Number of Fisher Scoring iterations: 6
##
## Next step: Use 'setx' method
x<- setx(P5, Year= mean(fb_diseases$Year))
x1<-setx(P5, Year=mean(fb_diseases$Year)+(sd(fb_diseases$Year)))
s<-sim(P5,x=x,x1=x1)
summary(s)
| sim x : |
|---|
| v |
| mean sd 50% 2.5% 97.5% |
| 1,] 17.23495 0.09388558 17.23708 17.05665 17.42531 |
| v |
| mean sd 50% 2.5% 97.5% |
| 1,] 17.088 4.001033 17 10 25 |
| sim x1 : |
|---|
| v |
| mean sd 50% 2.5% 97.5% |
| 1,] 14.78924 0.08542617 14.788 14.6291 14.95117 |
| v |
| mean sd 50% 2.5% 97.5% |
| 1,] 14.601 3.77174 14 8 23 |
| d |
| mean sd 50% 2.5% 97.5% |
| 1,] -2.445713 0.03201976 -2.445905 -2.510489 -2.381397 |
AAccording to the predicated values, an increase in the year indicates a decrease by -2.45 in the total number of illnesses suggesting that illnesses from foodborne diseases are actually decreasing.