1. Open an R Markdown document. Load the packages tidyverse, lubridate, forcats, and tibble. Read in the outbreaks dataset and clean the names. Hide the code and messages (none of this should be visible in the knitted document).
Going forward, try to use either tabs or headers for each question. At a minimum, each question should be completed in its own chunk. For all (or most) future homework assignments, please show the code (don’t hide it as we did here).
The class of data for ‘Hospitalizations’ is numeric.
class(outbreaks$Hospitalizations) #find class of 'Hospitalizations'
## [1] "numeric"
table(outbreaks$`Primary Mode`, outbreaks$Hospitalizations) #Create a table of Hospitalizations by Primary Mode of transmission
##
## 0 1 2 3
## Animal Contact 228 84 48 24
## Environmental contamination other than food/water 71 20 6 1
## Food 14331 2298 1041 534
## Indeterminate/Other/Unknown 3291 597 261 90
## Person-to-person 17904 3019 1283 574
## Water 937 230 302 130
##
## 4 5 6 7
## Animal Contact 11 11 10 8
## Environmental contamination other than food/water 2 3 1 1
## Food 294 197 131 102
## Indeterminate/Other/Unknown 41 30 19 9
## Person-to-person 255 181 111 56
## Water 88 38 28 14
##
## 8 9 10 11
## Animal Contact 11 1 5 7
## Environmental contamination other than food/water 0 0 0 1
## Food 69 51 55 52
## Indeterminate/Other/Unknown 8 4 5 1
## Person-to-person 37 22 27 15
## Water 11 8 6 11
##
## 12 13 14 15
## Animal Contact 5 4 3 1
## Environmental contamination other than food/water 0 0 0 0
## Food 24 21 23 15
## Indeterminate/Other/Unknown 3 2 1 2
## Person-to-person 13 5 4 4
## Water 5 7 3 7
##
## 16 17 18 19
## Animal Contact 3 1 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 11 10 10 9
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 4 6 3 2
## Water 2 3 3 4
##
## 20 21 22 23
## Animal Contact 2 2 1 0
## Environmental contamination other than food/water 0 0 0 0
## Food 6 5 8 10
## Indeterminate/Other/Unknown 1 0 0 0
## Person-to-person 2 3 2 1
## Water 0 2 2 2
##
## 24 25 26 27
## Animal Contact 2 2 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 7 5 1 3
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 3 0 3 2
## Water 2 1 1 0
##
## 28 29 30 31
## Animal Contact 1 3 2 0
## Environmental contamination other than food/water 0 0 0 0
## Food 6 6 5 7
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 1 1 1 0
##
## 32 33 34 35
## Animal Contact 1 0 1 1
## Environmental contamination other than food/water 0 0 0 0
## Food 2 2 4 5
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 2 0 0
## Water 2 0 0 1
##
## 36 37 38 39
## Animal Contact 1 0 1 0
## Environmental contamination other than food/water 0 0 0 0
## Food 5 1 0 2
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 3 0 1 0
## Water 1 0 1 1
##
## 40 41 43 44
## Animal Contact 1 0 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 1 1 2 0
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 1 1 0
## Water 0 0 1 0
##
## 45 47 48 49
## Animal Contact 2 0 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 3 1 1 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 1 0 1 0
##
## 50 52 53 54
## Animal Contact 0 2 0 0
## Environmental contamination other than food/water 0 0 0 0
## Food 2 3 0 2
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 2 0
## Water 0 1 0 0
##
## 55 56 57 58
## Animal Contact 0 0 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 2 1 1 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 1 0 0
## Water 0 1 0 0
##
## 60 62 64 68
## Animal Contact 1 1 1 0
## Environmental contamination other than food/water 0 0 0 0
## Food 1 0 0 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 0 1 0
##
## 70 71 72 76
## Animal Contact 0 0 0 1
## Environmental contamination other than food/water 0 0 0 0
## Food 1 2 1 0
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 1 0 0
##
## 88 94 101 103
## Animal Contact 0 0 0 0
## Environmental contamination other than food/water 0 0 0 0
## Food 1 2 1 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 0 0 0
##
## 104 108 109 124
## Animal Contact 0 0 1 0
## Environmental contamination other than food/water 0 0 0 0
## Food 1 1 0 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 0 0 0
##
## 129 133 143 145
## Animal Contact 0 0 0 0
## Environmental contamination other than food/water 0 0 0 0
## Food 2 1 1 0
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 0 0 1
##
## 166 167 179 200
## Animal Contact 0 0 1 0
## Environmental contamination other than food/water 0 0 0 0
## Food 1 1 0 1
## Indeterminate/Other/Unknown 0 0 0 0
## Person-to-person 0 0 0 0
## Water 0 0 0 0
##
## 204 308
## Animal Contact 0 0
## Environmental contamination other than food/water 0 0
## Food 1 1
## Indeterminate/Other/Unknown 0 0
## Person-to-person 0 0
## Water 0 0
Notes:
The table counts outbreaks for each combination of primary mode of
transmission and number of hospitalizations, so every distinct
hospitalization count gets its own column and the output is hard to read.
It would be more helpful to summarize hospitalizations by mode, optionally
broken down by month and year, and then present the result in a compact
table or a ggplot chart.
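One way to get a more compact summary is to total hospitalizations per mode instead of cross-tabulating every count. The following is a minimal base R sketch on made-up toy data; the data frame `toy` and its column names are hypothetical stand-ins, not the real outbreaks columns, and the values are invented for illustration.

```r
# Toy data: invented values, NOT taken from the outbreaks dataset.
toy <- data.frame(
  primary_mode     = c("Food", "Food", "Water", "Water", "Animal Contact"),
  hospitalizations = c(2, 0, 5, NA, 1)
)

# Total hospitalizations per mode; na.rm = TRUE skips missing counts.
totals <- tapply(toy$hospitalizations, toy$primary_mode, sum, na.rm = TRUE)
totals
```

The same idea could be expressed with `group_by()` and `summarise()`, as in the later chunks, and the result passed to ggplot for a bar chart.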
Etiology is missing for 14,647 of the 57,649 outbreaks, or 25.4% of the
dataset.
outbreaks %>%
  summarize( #build a one-row tibble with the total count, the missing count, and the percentage missing
    total_etiology = n(), #total number of outbreaks (rows)
    missing_etiology = sum(is.na(Etiology) | Etiology == ""), #count rows where Etiology is NA or an empty string
    pct_missing = 100 * missing_etiology / total_etiology #calculate the percentage of rows missing Etiology
  ) %>%
  print() #print the resulting tibble
## # A tibble: 1 × 3
## total_etiology missing_etiology pct_missing
## <int> <int> <dbl>
## 1 57649 14647 25.4
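The missing-value test combines `is.na()` with a comparison to the empty string. A small base R sketch on an invented vector (the values below are illustrative, not from the dataset) shows why the combined condition never returns `NA`, so `sum()` can count it directly.

```r
# Invented example vector: one NA and one empty string count as missing.
etiology <- c("Norovirus", NA, "", "Salmonella")

# NA == "" is NA, but TRUE | NA evaluates to TRUE, so no NA survives.
is_missing <- is.na(etiology) | etiology == ""

n_missing   <- sum(is_missing)           # number of missing entries
pct_missing <- 100 * mean(is_missing)    # percentage of missing entries
```

Checking `is.na()` first matters: with only `etiology == ""`, the `NA` entry would produce `NA` and `sum()` would return `NA` unless `na.rm = TRUE` were set.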
There are 2,680 observations, or outbreaks, in Minnesota.
minnesota <- outbreaks %>%
  filter(State %in% "Minnesota") #keep only outbreaks reported in Minnesota
minnesota #print the tibble; the header shows the number of rows
## # A tibble: 2,680 × 21
## Year Month State `Primary Mode` Etiology `Serotype or Genotype`
## <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 2009 1 Minnesota Person-to-person Norovirus Geno… unknown
## 2 2009 1 Minnesota Food Norovirus <NA>
## 3 2009 2 Minnesota Person-to-person Norovirus <NA>
## 4 2009 1 Minnesota Person-to-person Norovirus unkn… <NA>
## 5 2009 1 Minnesota Food Norovirus <NA>
## 6 2009 1 Minnesota Food Norovirus <NA>
## 7 2009 1 Minnesota Food Norovirus <NA>
## 8 2009 1 Minnesota Food Norovirus <NA>
## 9 2009 1 Minnesota Food Norovirus <NA>
## 10 2009 2 Minnesota Food Norovirus <NA>
## # ℹ 2,670 more rows
## # ℹ 15 more variables: `Etiology Status` <chr>, Setting <chr>, Illnesses <dbl>,
## # Hospitalizations <dbl>, `Info on Hospitalizations` <dbl>, Deaths <dbl>,
## # `Info on Deaths` <dbl>, `Food Vehicle` <chr>,
## # `Food Contaminated Ingredient` <chr>, `IFSAC Category` <chr>,
## # `Water Exposure` <chr>, `Water Type` <chr>, `Animal Type` <chr>,
## # `Animal Type Specify` <chr>, `Water Status` <chr>
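The filter above uses `%in%` rather than `==`. As a small base R illustration on an invented vector, the two operators differ only in how they treat `NA`:

```r
# Invented example vector with one missing state.
state <- c("Minnesota", "Iowa", NA)

eq_result <- state == "Minnesota"    # TRUE FALSE NA
in_result <- state %in% "Minnesota"  # TRUE FALSE FALSE
```

Because `filter()` drops rows where the condition is `NA`, both forms keep the same rows here; `%in%` simply makes the handling of missing states explicit.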
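Among Minnesota foodborne outbreaks with a confirmed etiology and at least
one hospitalization, the year with the highest mean illnesses in the past
10 years was 2018 (19 per outbreak).
The year with the highest mean hospitalizations in the past 10 years was
2017 (2.6 per outbreak).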
#Filter and subset the Minnesota data into a 'limit' dataset
limit <- minnesota %>%
  filter(
    `Primary Mode` == "Food", #Limit to primary mode 'Food'
    `Etiology Status` == "Confirmed", #Limit to confirmed etiology status
    Hospitalizations >= 1, #Limit to outbreaks with at least one hospitalization
    Year >= (as.numeric(format(Sys.Date(), "%Y")) - 10) #Limit to the past 10 years, counted from the date the document is knitted
  ) %>%
  select(Year, Month, Etiology, Illnesses, Hospitalizations)
#Calculate mean Illnesses and Hospitalizations by year
year_summary <- limit %>% #Summarize the 'limit' dataset into 'year_summary'
  group_by(Year) %>% #Group the outbreaks by year
  summarise(
    mean_illnesses = mean(Illnesses, na.rm = TRUE), #mean illnesses per outbreak in each year
    mean_hospitalizations = mean(Hospitalizations, na.rm = TRUE) #mean hospitalizations per outbreak in each year
  )
year_summary #show the results in a tibble.
## # A tibble: 5 × 3
## Year mean_illnesses mean_hospitalizations
## <dbl> <dbl> <dbl>
## 1 2015 6 1.4
## 2 2016 11.8 1
## 3 2017 13.4 2.6
## 4 2018 19 2.5
## 5 2019 7.14 1.71
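The top years can also be picked out programmatically rather than read off the table. This base R sketch re-enters the five rows printed above as a plain data frame (so it runs without the original data) and uses `which.max()`; it is an illustration, not part of the original analysis.

```r
# Values copied from the year_summary output above.
year_summary <- data.frame(
  Year                  = 2015:2019,
  mean_illnesses        = c(6, 11.8, 13.4, 19, 7.14),
  mean_hospitalizations = c(1.4, 1, 2.6, 2.5, 1.71)
)

# which.max() returns the row index of the largest value.
top_illness_year <- year_summary$Year[which.max(year_summary$mean_illnesses)]
top_hosp_year    <- year_summary$Year[which.max(year_summary$mean_hospitalizations)]

top_illness_year  # 2018
top_hosp_year     # 2017
```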