1. Open an R Markdown document. Load the packages tidyverse, lubridate, forcats, and tibble. Read in the outbreaks dataset and clean the names. Hide the code and messages (in the knitted document, you shouldn’t see any of this being run).

Moving forward, try to use either tabs or headers for each question. At minimum, each question should be completed in its own chunk. For all (or most) future homework assignments, please show the code (don’t hide it as we did here).


2. Check the data type/class of hospitalizations. Generate a table or histogram of the hospitalizations. Leave a brief note about what you see. Hint: table() or hist()


The class of ‘Hospitalizations’ is numeric.

class(outbreaks$Hospitalizations) #find class of 'Hospitalizations'
## [1] "numeric"
table(outbreaks$`Primary Mode`, outbreaks$Hospitalizations) #Cross-tabulate outbreaks: rows = Primary Mode of transmission, columns = number of hospitalizations, cells = number of outbreaks
##                                                    
##                                                         0     1     2     3
##   Animal Contact                                      228    84    48    24
##   Environmental contamination other than food/water    71    20     6     1
##   Food                                              14331  2298  1041   534
##   Indeterminate/Other/Unknown                        3291   597   261    90
##   Person-to-person                                  17904  3019  1283   574
##   Water                                               937   230   302   130
##                                                    
##                                                         4     5     6     7
##   Animal Contact                                       11    11    10     8
##   Environmental contamination other than food/water     2     3     1     1
##   Food                                                294   197   131   102
##   Indeterminate/Other/Unknown                          41    30    19     9
##   Person-to-person                                    255   181   111    56
##   Water                                                88    38    28    14
##                                                    
##                                                         8     9    10    11
##   Animal Contact                                       11     1     5     7
##   Environmental contamination other than food/water     0     0     0     1
##   Food                                                 69    51    55    52
##   Indeterminate/Other/Unknown                           8     4     5     1
##   Person-to-person                                     37    22    27    15
##   Water                                                11     8     6    11
##                                                    
##                                                        12    13    14    15
##   Animal Contact                                        5     4     3     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                 24    21    23    15
##   Indeterminate/Other/Unknown                           3     2     1     2
##   Person-to-person                                     13     5     4     4
##   Water                                                 5     7     3     7
##                                                    
##                                                        16    17    18    19
##   Animal Contact                                        3     1     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                 11    10    10     9
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      4     6     3     2
##   Water                                                 2     3     3     4
##                                                    
##                                                        20    21    22    23
##   Animal Contact                                        2     2     1     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  6     5     8    10
##   Indeterminate/Other/Unknown                           1     0     0     0
##   Person-to-person                                      2     3     2     1
##   Water                                                 0     2     2     2
##                                                    
##                                                        24    25    26    27
##   Animal Contact                                        2     2     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  7     5     1     3
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      3     0     3     2
##   Water                                                 2     1     1     0
##                                                    
##                                                        28    29    30    31
##   Animal Contact                                        1     3     2     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  6     6     5     7
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 1     1     1     0
##                                                    
##                                                        32    33    34    35
##   Animal Contact                                        1     0     1     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  2     2     4     5
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     2     0     0
##   Water                                                 2     0     0     1
##                                                    
##                                                        36    37    38    39
##   Animal Contact                                        1     0     1     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  5     1     0     2
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      3     0     1     0
##   Water                                                 1     0     1     1
##                                                    
##                                                        40    41    43    44
##   Animal Contact                                        1     0     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     1     2     0
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     1     1     0
##   Water                                                 0     0     1     0
##                                                    
##                                                        45    47    48    49
##   Animal Contact                                        2     0     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  3     1     1     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 1     0     1     0
##                                                    
##                                                        50    52    53    54
##   Animal Contact                                        0     2     0     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  2     3     0     2
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     2     0
##   Water                                                 0     1     0     0
##                                                    
##                                                        55    56    57    58
##   Animal Contact                                        0     0     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  2     1     1     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     1     0     0
##   Water                                                 0     1     0     0
##                                                    
##                                                        60    62    64    68
##   Animal Contact                                        1     1     1     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     0     0     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     0     1     0
##                                                    
##                                                        70    71    72    76
##   Animal Contact                                        0     0     0     1
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     2     1     0
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     1     0     0
##                                                    
##                                                        88    94   101   103
##   Animal Contact                                        0     0     0     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     2     1     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     0     0     0
##                                                    
##                                                       104   108   109   124
##   Animal Contact                                        0     0     1     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     1     0     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     0     0     0
##                                                    
##                                                       129   133   143   145
##   Animal Contact                                        0     0     0     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  2     1     1     0
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     0     0     1
##                                                    
##                                                       166   167   179   200
##   Animal Contact                                        0     0     1     0
##   Environmental contamination other than food/water     0     0     0     0
##   Food                                                  1     1     0     1
##   Indeterminate/Other/Unknown                           0     0     0     0
##   Person-to-person                                      0     0     0     0
##   Water                                                 0     0     0     0
##                                                    
##                                                       204   308
##   Animal Contact                                        0     0
##   Environmental contamination other than food/water     0     0
##   Food                                                  1     1
##   Indeterminate/Other/Unknown                           0     0
##   Person-to-person                                      0     0
##   Water                                                 0     0

Notes:
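The table counts outbreaks by primary mode of transmission (rows) and by the number of hospitalizations per outbreak (columns, from 0 up to 308); it does not separate anything by month or year. The distribution is heavily right-skewed: most outbreaks had no hospitalizations (for example, 17,904 person-to-person and 14,331 food outbreaks had 0), and the counts drop off quickly as the number of hospitalizations grows, with only a handful of outbreaks above 100. Because this cross-tabulation is long and hard to read, a one-way table or a histogram of Hospitalizations (as the hint suggests), or a ggplot histogram faceted by primary mode, would summarize the distribution more clearly.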
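For a single-variable view, a one-way table() or a hist() call is simpler than the cross-tabulation above. The sketch below is a minimal base-R illustration only: the vector `hosp` is made up and stands in for `outbreaks$Hospitalizations`, so its counts are not the real data.

```r
# Toy stand-in for outbreaks$Hospitalizations (made-up values, not the real data)
hosp <- c(0, 0, 0, 0, 1, 1, 2, 3, 0, 5, NA, 12)

# One-way frequency table: how many outbreaks had 0, 1, 2, ... hospitalizations
# useNA = "ifany" adds a count for missing values when any are present
hosp_table <- table(hosp, useNA = "ifany")
print(hosp_table)

# Histogram of the same distribution; hist() silently drops NA values
hist(hosp,
     breaks = 20,
     main = "Hospitalizations per outbreak",
     xlab = "Number of hospitalizations")
```

With the real data, the same calls on `outbreaks$Hospitalizations` would show the long right tail directly, without splitting by primary mode.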


3. Check how many missing values are in etiology and calculate what percent of the dataset that is. Bonus point if you can write the percentage with in-text code.


There are 14,647 missing values (NA or blank) in Etiology, which is 25.4% of the 57,649 outbreaks in the dataset.

outbreaks %>% 
  summarize( #Return a one-row tibble with the total count, the missing count, and the percent missing
    total_etiology = n(), #Total number of outbreaks (rows), not only those with a recorded etiology
    missing_etiology = sum(is.na(Etiology) | Etiology == ""), #Count Etiology values that are NA or empty strings
    pct_missing = 100 * missing_etiology / total_etiology #calculate the percentage of missing Etiology
  ) %>% 
  print() #print the tibble by piping everything prior through print 
## # A tibble: 1 × 3
##   total_etiology missing_etiology pct_missing
##            <int>            <int>       <dbl>
## 1          57649            14647        25.4
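For the bonus, the percentage can be computed in a chunk and then referenced from the text with inline R code (for example, `r pct_label`), which knitr replaces with its value when the document is knitted. The base-R sketch below assumes a small made-up `etiology` vector in place of `outbreaks$Etiology` and treats both NA and empty strings as missing, mirroring the tidyverse code above.

```r
# Toy stand-in for outbreaks$Etiology (made-up values, not the real data)
etiology <- c("Norovirus", NA, "Salmonella", "", NA, "Norovirus", "E. coli", NA)

# Treat both NA and empty strings as missing
is_missing <- is.na(etiology) | etiology == ""
n_missing <- sum(is_missing)
pct_missing <- 100 * n_missing / length(etiology)

# Format for prose: one decimal place followed by a percent sign
pct_label <- sprintf("%.1f%%", pct_missing)
cat("Missing etiology:", n_missing, "of", length(etiology), "=", pct_label, "\n")
```

Rounding inside sprintf() keeps the in-text number consistent with the one-decimal value printed in the tibble.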


4. Create a new dataset by filtering/subsetting the outbreaks dataset to only include outbreaks from Minnesota. Name it appropriately. How many outbreaks are in this new dataset?


There are 2,680 observations, or outbreaks, in Minnesota.

minnesota <- outbreaks %>%
  filter(State %in% "Minnesota")
minnesota
## # A tibble: 2,680 × 21
##     Year Month State     `Primary Mode`   Etiology        `Serotype or Genotype`
##    <dbl> <dbl> <chr>     <chr>            <chr>           <chr>                 
##  1  2009     1 Minnesota Person-to-person Norovirus Geno… unknown               
##  2  2009     1 Minnesota Food             Norovirus       <NA>                  
##  3  2009     2 Minnesota Person-to-person Norovirus       <NA>                  
##  4  2009     1 Minnesota Person-to-person Norovirus unkn… <NA>                  
##  5  2009     1 Minnesota Food             Norovirus       <NA>                  
##  6  2009     1 Minnesota Food             Norovirus       <NA>                  
##  7  2009     1 Minnesota Food             Norovirus       <NA>                  
##  8  2009     1 Minnesota Food             Norovirus       <NA>                  
##  9  2009     1 Minnesota Food             Norovirus       <NA>                  
## 10  2009     2 Minnesota Food             Norovirus       <NA>                  
## # ℹ 2,670 more rows
## # ℹ 15 more variables: `Etiology Status` <chr>, Setting <chr>, Illnesses <dbl>,
## #   Hospitalizations <dbl>, `Info on Hospitalizations` <dbl>, Deaths <dbl>,
## #   `Info on Deaths` <dbl>, `Food Vehicle` <chr>,
## #   `Food Contaminated Ingredient` <chr>, `IFSAC Category` <chr>,
## #   `Water Exposure` <chr>, `Water Type` <chr>, `Animal Type` <chr>,
## #   `Animal Type Specify` <chr>, `Water Status` <chr>
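Rather than reading the count off the printed tibble header, nrow() returns it as a number that can be reused, for example in in-text code; with the real data that would be `nrow(minnesota)`. The base-R sketch below uses a few made-up rows (`toy_outbreaks` is hypothetical) to show the subset-then-count pattern.

```r
# Toy stand-in for the outbreaks data (made-up rows, not the real counts)
toy_outbreaks <- data.frame(
  State = c("Minnesota", "Ohio", "Minnesota", NA, "Texas", "Minnesota"),
  Year  = c(2009, 2009, 2010, 2011, 2012, 2013)
)

# Base-R subset; which() drops NA comparisons, as dplyr::filter() does
minnesota_toy <- toy_outbreaks[which(toy_outbreaks$State == "Minnesota"), ]

# Number of outbreaks (rows) in the subset
n_mn <- nrow(minnesota_toy)
n_mn
```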


5. Filter and subset the Minnesota-only dataset to include only outbreaks whose primary mode of transmission was food, with a confirmed etiology status, with at least one hospitalization, and that occurred in the past 10 years. Reduce the dataset to only the variables year, month, etiology, illnesses, and hospitalizations. What year had the highest mean illnesses and what year had the highest mean hospitalizations? Hint: tapply(outcome, year, summary)


Within the past 10 years (2015 to 2019 in the filtered data), the year with the highest mean illnesses was 2018 (19 per outbreak).
The year with the highest mean hospitalizations was 2017 (2.6 per outbreak).

#Filter and subset the Minnesota data into a 'limit' dataset
limit <- minnesota %>%
  filter(
    `Primary Mode` == "Food", #Limit to primary mode as 'food'
    `Etiology Status` == "Confirmed", #Limit to confirmed etiology status
    Hospitalizations >= 1, #Keep outbreaks with at least one hospitalization
    Year >= (as.numeric(format(Sys.Date(), "%Y")) - 10)  #Limit to the past 10 years (relative to the date the file is knit)
  ) %>%
  select(Year, Month, Etiology, Illnesses, Hospitalizations) #Keep only the requested variables

#Calculate mean Illnesses and Hospitalizations by year
year_summary <- limit %>% #Create a summary of illnesses and hospitalizations by year called 'year_summary' by piping the limit dataset
  group_by(Year) %>% #Group each by year
  summarise(
    mean_illnesses = mean(Illnesses, na.rm = TRUE), #summarize the mean of the illnesses by year
    mean_hospitalizations = mean(Hospitalizations, na.rm = TRUE) #summarize the mean of hospitalizations by year
  )

year_summary #show the results in a tibble.
## # A tibble: 5 × 3
##    Year mean_illnesses mean_hospitalizations
##   <dbl>          <dbl>                 <dbl>
## 1  2015           6                     1.4 
## 2  2016          11.8                   1   
## 3  2017          13.4                   2.6 
## 4  2018          19                     2.5 
## 5  2019           7.14                  1.71
Notes:
I did not use tapply() here; instead, I grouped by year with group_by() and calculated the mean values with summarise().
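The hint's tapply() approach gives the same per-year means in base R. The sketch below assumes a made-up `limit_toy` data frame standing in for the filtered `limit` dataset, so the years and values are illustrative only; which.max() then picks out the year with the largest mean.

```r
# Toy stand-in for the filtered 'limit' data (made-up values, not the real data)
limit_toy <- data.frame(
  Year             = c(2015, 2015, 2016, 2016, 2017, 2017),
  Illnesses        = c(4, 8, 10, 20, 12, 14),
  Hospitalizations = c(1, 2, 1, 1, 3, 2)
)

# tapply() splits each outcome by Year and applies mean() to every group;
# na.rm = TRUE is passed through to mean()
mean_ill  <- tapply(limit_toy$Illnesses, limit_toy$Year, mean, na.rm = TRUE)
mean_hosp <- tapply(limit_toy$Hospitalizations, limit_toy$Year, mean, na.rm = TRUE)

# The hint's tapply(outcome, year, summary) returns a full summary per year instead
tapply(limit_toy$Illnesses, limit_toy$Year, summary)

# which.max() returns the position of the largest mean; its name is the year
top_ill_year  <- names(which.max(mean_ill))
top_hosp_year <- names(which.max(mean_hosp))
```

Using summary instead of mean shows the minimum, quartiles, and maximum alongside the mean, which helps flag years whose mean is driven by a single large outbreak.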


6. Finally, please knit the Markdown to an HTML file. Turn in both the Markdown file and the knitted results document.

Lab3_MRoybal.html