2024-04-03

Introduction

  • I currently have the flu

  • I’m in a lot of pain, and am struggling with things like making food and classwork

  • Because of this, I want to know more about the flu

  • Even though it’s morbid, I want to do a project on the flu and how many people die from it each year

  • I know that’s probably unwise to do while sick with the flu, but I’m committed

Data

  • Data is from the CDC

  • Each entry has a year, a state, a death count, and a rate

  • Data ranges from 2014 to 2021, with a value for every state in every year

  • I have a few questions about this data

  1. How have flu death totals changed between 2014 and 2021?

  2. Make a choropleth map to show which state has the most average deaths per person

  • With both questions asked, let’s get to it

Data Manipulation

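library(ggplot2)  # ggplot() and map_data(), used for the map later on
library(dplyr)    # arrange(), assumed here to be dplyr's
Flu_Data <- read.csv("flu-table.csv")
State_Data <- read.csv("U.S. states population data.csv")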
  • Now, I need to fix up the data, to make it more usable
# Look up each state's full name from its two-letter abbreviation
Flu_Data$state_name <- state.name[match(Flu_Data$STATE, state.abb)]
# Strip thousands separators so the death counts parse as numbers
Flu_Data$DEATHS <- as.numeric(gsub(",", "", Flu_Data$DEATHS))
# Drop any rows from 2005, which fall outside the 2014 to 2021 range
Flu_Data <- Flu_Data[Flu_Data$YEAR != 2005, ]
# Give the population column a shorter name, sort by state, and strip commas
State_Data$population <- State_Data$POPULATION..2023.
State_Data <- State_Data[order(State_Data$STATE), ]
State_Data$population <- as.numeric(gsub(",", "", State_Data$population))
  • With that, let’s start solving some of these problems
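
To make the cleanup steps above concrete, here is a minimal sketch on a made-up three-row table (the `toy` name and its values are invented for illustration, not taken from the CDC file). It shows how `gsub()` strips the thousands separators so `as.numeric()` can parse the counts, and how `match()` against `state.abb` looks up the full state names.

```r
# Toy stand-in for the CDC table; these values are made up.
toy <- data.frame(
  STATE  = c("AL", "AK", "AZ"),
  DEATHS = c("1,234", "56", "987"),
  stringsAsFactors = FALSE
)

# Remove commas, then convert the text to numbers.
toy$DEATHS <- as.numeric(gsub(",", "", toy$DEATHS))

# state.abb and state.name are built-in vectors in the same order,
# so the position of an abbreviation gives its full name.
toy$state_name <- state.name[match(toy$STATE, state.abb)]

print(toy)
```

Calling `as.numeric("1,234")` directly would give `NA` with a warning, which is why the commas are removed first.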

Problem 1

  1. How have flu death totals changed between 2014 and 2021?
  • To start, let’s aggregate this data

  • I need the total deaths for each year in their own data frame

aggregate_data <- aggregate(Flu_Data$DEATHS, list(Flu_Data$YEAR), FUN = sum)
aggregate_data <- aggregate_data[!is.na(aggregate_data$x), ]
  • From there, we’re already in a position to create a graph of each year
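
For anyone unsure what the `aggregate()` call above produces, here is a minimal sketch on invented numbers (the `toy` and `totals` names are made up for this example). It also shows why the `is.na()` filter matters: one missing count makes the whole year's sum `NA`.

```r
# Made-up yearly records for two states (not real CDC numbers).
toy <- data.frame(
  YEAR   = c(2014, 2014, 2015, 2015),
  STATE  = c("AL", "AK", "AL", "AK"),
  DEATHS = c(100, 20, 90, NA)
)

# Sum deaths within each year; the result has columns Group.1 and x.
totals <- aggregate(toy$DEATHS, list(toy$YEAR), FUN = sum)

# A single NA makes that year's sum NA, so the year is dropped here.
totals <- totals[!is.na(totals$x), ]
print(totals)
```

Passing `na.rm = TRUE` through to `sum()` would instead keep the year and total only the states that reported, which is a different choice from the one made on this slide.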

The Graph

barplot(aggregate_data$x,
        names.arg = aggregate_data$Group.1,
        col = c("lightcoral", "lightsalmon", "lightgoldenrod", "lightgreen",
                "lightcyan", "lightblue", "lightpink", "plum2"),
        main = "Total Flu Deaths by Year",
        xlab = "Year", ylab = "Deaths")

Analysis

  • It’s a bit stagnant, but it’s going down

  • We can especially see that 2021 is pretty solid

  • This gives me hope I won’t terribly perish from the flu
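
One way to back up a claim like "it's going down" is to look at the change from each year to the next. Here is a minimal sketch with invented totals (the `yearly` table is hypothetical; in this project, `aggregate_data` would play that role).

```r
# Hypothetical yearly totals; not the real figures.
yearly <- data.frame(
  year   = 2014:2017,
  deaths = c(1000, 1050, 980, 900)
)

# diff() gives each year's total minus the previous year's.
change <- diff(yearly$deaths)
names(change) <- paste(yearly$year[-nrow(yearly)], yearly$year[-1], sep = "-")
print(change)

# Overall percent change from the first year to the last.
overall <- (tail(yearly$deaths, 1) - yearly$deaths[1]) / yearly$deaths[1] * 100
print(overall)
```

Negative values in `change` mean fewer deaths than the year before, so a mostly negative vector supports a downward trend.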

Problem 2

  2. Make a choropleth map to show which state has the most average deaths per person

Importing the data

  • To start, let’s get the average deaths per state

  • I’ll also do some formatting to make our lives easier later

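# Average yearly deaths per state, skipping rows with missing counts
aggregate_data <- aggregate(Flu_Data$DEATHS[!is.na(Flu_Data$DEATHS)],
                            list(Flu_Data$state_name[!is.na(Flu_Data$DEATHS)]),
                            FUN = mean)
# Note: the "Cases" column holds average deaths, not case counts
names(aggregate_data) <- c("State", "Cases")
# tolower() is already vectorised; lapply() would turn the column into a list
aggregate_data$State <- tolower(aggregate_data$State)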

Accounting for Population

  • Next, we need to take population size into account

  • We can use the State_Data frame to get population size, and then divide.

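# Look up population by state name (assumes State_Data$STATE holds full state names) instead of relying on row order
aggregate_data$CasesPerPerson <- aggregate_data$Cases / State_Data$population[match(aggregate_data$State, tolower(State_Data$STATE))]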
  • Now we can use map_data("state") to get the outline coordinates of each state

  • This is required to make the map

states_map <- map_data("state")
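
For the division step earlier on this slide, deaths per person come out as very small decimals, so rates like this are often rescaled to deaths per 100,000 people. Here is a minimal sketch with made-up numbers (the `toy` table and the `per_100k` column are names invented for this example).

```r
# Made-up averages and populations for two states.
toy <- data.frame(
  State      = c("alabama", "arizona"),
  Cases      = c(1000, 700),         # average yearly deaths (invented)
  population = c(5000000, 7000000)   # invented
)

# Deaths per person, as on the slide.
toy$CasesPerPerson <- toy$Cases / toy$population

# The same rate expressed per 100,000 people.
toy$per_100k <- toy$CasesPerPerson * 100000
print(toy)
```

Rescaling does not change which states rank higher or lower; it only makes the numbers in a legend easier to read.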

Making the map

  • With that, we have all the information for the map
flu_combined <- arrange(merge(states_map, aggregate_data, by.x = "region", by.y = "State"), group, order)
ggplot(flu_combined, aes(x = long, y = lat, group = group, fill = CasesPerPerson)) + geom_polygon(colour = "black") + coord_map("polyconic") + scale_fill_gradient2(low = "lightblue", high = "darkblue", midpoint = median(flu_combined$CasesPerPerson))

Showing off the code

  • The code goes off the screen, so here’s a better look
flu_combined <- arrange(merge(states_map, aggregate_data,
                              by.x = "region", by.y = "State"),
                        group, order)
ggplot(flu_combined, aes(x = long, y = lat, group = group,
                         fill = CasesPerPerson)) +
  geom_polygon(colour = "black") +
  coord_map("polyconic") +
  scale_fill_gradient2(low = "lightblue",
                       high = "darkblue",
                       midpoint = median(flu_combined$CasesPerPerson))
  • Pretty elegant, huh
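
One thing worth checking with a `merge()` like the one above: by default it keeps only rows whose keys appear in both tables, so any state whose name has no matching map region silently drops out. Here is a minimal sketch of that check on toy tables (`map_regions`, `rates`, and `missing_states` are invented names, and the values are made up).

```r
# Toy versions of the two tables; names and values are made up.
map_regions <- data.frame(region = c("alabama", "arizona"),
                          long   = c(-86.8, -111.9))
rates <- data.frame(State          = c("alabama", "alaska", "arizona"),
                    CasesPerPerson = c(2e-4, 1e-4, 1.5e-4))

# merge() defaults to an inner join: only keys present in both tables survive.
combined <- merge(map_regions, rates, by.x = "region", by.y = "State")

# States in the rate table that have no matching map region.
missing_states <- setdiff(rates$State, map_regions$region)
print(missing_states)
```

With the real data, running `setdiff()` against `unique(states_map$region)` would list the states left off the map. The `state` map is understood to cover the contiguous United States, so Alaska and Hawaii would likely appear there.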

Analysis

  • As we can see, West Virginia and Mississippi have the highest mortality rates

  • I’d imagine age demographics come into play

  • On the other hand, Arizona is doing pretty well for itself

  • More reason to think I won’t perish and turn into dust

Thank You For Watching / Reading

  • I know this is slightly scuffed but again, I’m really sick

  • I hope you enjoyed