In this report, I summarize the storm types that caused the most harm to public health and the economy in the United States between 1950 and 2011. Tornadoes were the most harmful, with over 91,000 injuries and 5,600 deaths. Excessive heat ranks second, with about 6,500 injuries and 1,900 deaths. In terms of property and crop damage, floods were the most costly, with $144 billion in property damage and $5.6 billion in crop damage, followed by hurricanes/typhoons and tornadoes.

Data Processing

Here I read in the NOAA storm dataset from the internet and perform basic cleaning and processing steps to answer a few questions about storm damage in the United States. I first converted all event types (EVTYPE) to upper case so that event types listed twice (once capitalized and once in lower case) are combined.
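The capitalization step can be illustrated on a small made-up data frame. This is a minimal sketch, not the report's actual loading code: the `toy` data frame and its values are hypothetical stand-ins for the real NOAA data, and only the `EVTYPE` and `FATALITIES` column names come from the report.

```r
# Hypothetical stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("Flood", "FLOOD", "flood", "TORNADO"),
  FATALITIES = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Upper-casing collapses spelling variants that differ only by case.
toy$EVTYPE <- toupper(toy$EVTYPE)

# After normalizing, the three flood rows fall into a single group.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```

Without the `toupper()` call, the three flood spellings would be summed as three separate event types.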

To find which storm types had the largest effect on public health, I looked at the number of fatalities and injuries for each storm type. I first grouped the data by event type and summed fatalities and injuries within each group. I then added the two sums to get a combined total and sorted the event types by it.

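# load the packages used below (%>% and ggplot() need them attached)
library(dplyr)
library(ggplot2)

# what events have the largest effect on public health?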
hazards <- storm_data %>%
    dplyr::select(EVTYPE, FATALITIES, INJURIES) %>%
    dplyr::group_by(EVTYPE) %>%
    dplyr::summarise(total_fatalities = sum(FATALITIES),
                     total_injuries = sum(INJURIES)) %>%
    dplyr::mutate(total = total_fatalities + total_injuries) %>%
    dplyr::arrange(desc(total))
# set evtype as a factor
hazards$EVTYPE <- factor(hazards$EVTYPE, levels = unique(hazards$EVTYPE))

To assess which storm types had the largest effect on the US economy, I explored the amount of damage to property and to crops. I first had to restructure the data so that property and crop damages were both reported in dollars. The original data store each amount as a number plus a magnitude code for hundreds, thousands, millions, or billions (e.g. 1.5M and 1.5K). I then summed crop and property damages for each event type, as with injuries and fatalities above.

#make sure all PROPDMGEXP and CROPDMGEXP are capitalized
storm_data$PROPDMGEXP <- toupper(storm_data$PROPDMGEXP)
storm_data$CROPDMGEXP <- toupper(storm_data$CROPDMGEXP)

#what events have the greatest economic consequence?
economy <- storm_data %>%
    dplyr::select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
    dplyr::mutate(CROPDMGEXP = ifelse(CROPDMGEXP == "M", 6, 
                                      ifelse(CROPDMGEXP == "K", 3,
                                             ifelse(CROPDMGEXP == "B", 9,
                                                    ifelse(CROPDMGEXP == "?", NA, CROPDMGEXP))))) %>%
    dplyr::mutate(PROPDMGEXP = ifelse(PROPDMGEXP == "M", 6, 
                                      ifelse(PROPDMGEXP == "K", 3,
                                             ifelse(PROPDMGEXP == "B", 9,
                                                    ifelse(PROPDMGEXP == "H", 2,
                                                           ifelse(PROPDMGEXP %in% c("+", "?", "-"), NA, PROPDMGEXP)))))) %>%
    dplyr::mutate(crop_damage = as.numeric(CROPDMG) * 10^(as.numeric(CROPDMGEXP)),
                  property_damage = as.numeric(PROPDMG) * 10^(as.numeric(PROPDMGEXP))) %>%
    dplyr::group_by(EVTYPE) %>%
    # totals are converted to billions of USD here, so they are not divided again below
    dplyr::summarise(total_prop = sum(property_damage, na.rm = TRUE) / 1000000000,
                     total_crop = sum(crop_damage, na.rm = TRUE) / 1000000000) %>%
    dplyr::mutate(total = total_prop + total_crop) %>%
    dplyr::arrange(desc(total))

#factor evtype
economy$EVTYPE <- factor(economy$EVTYPE, levels = economy$EVTYPE)
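The conversion from a magnitude code to dollars can also be written with a lookup vector instead of nested `ifelse()` calls. The following is only a sketch of that alternative, not the code used in this report: `exp_lookup` and `to_dollars()` are hypothetical names introduced here, and the inputs are made up. As in the pipeline above, digit codes are treated as powers of ten and unrecognized codes (including an empty string) produce `NA`.

```r
# Hypothetical lookup: magnitude code -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert a value and its magnitude code to dollars.
to_dollars <- function(value, code) {
  code <- toupper(as.character(code))
  # Letter codes come from the lookup; unmatched names give NA.
  power <- unname(exp_lookup[code])
  # Digit codes such as "5" are used directly as the exponent.
  digit <- suppressWarnings(as.numeric(code))
  power <- ifelse(is.na(power), digit, power)
  value * 10^power
}

# Made-up inputs: 1.5M, 1.5k, 2B, and an unrecognized "?" code.
examples <- to_dollars(c(1.5, 1.5, 2, 7), c("M", "k", "B", "?"))
print(examples)
```

Keeping the codes in one named vector makes it easy to see which codes are handled and to apply the same rule to both the property and crop columns.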

Results

Here we can visualize the top 5 storm event types that pose a danger to public health in the United States. Harm to public health is measured as the total number of injuries and fatalities caused by an event type. Tornadoes are by far the most devastating, followed by excessive heat.

#plot the top 5 harmful events to public health as a bar chart (fatalities + injuries combined)
ggplot(data = hazards[1:5,]) +
    geom_bar(stat = "identity", aes(x = EVTYPE, y = total), fill = "navyblue") +
    labs(x = "Event Type", y = "Total Injuries + Fatalities") +
    theme_bw()

Here we can visualize the 5 storm types that cause the most damage in US dollars. This is calculated from both reported crop damage and property damage. Flood is the most harmful at about $150 billion, followed by hurricanes/typhoons and tornadoes.

#plot the top 5 events based on economic damage (crop + property damage)
ggplot(data = economy[1:5,]) +
    geom_bar(stat = "identity", aes(x = EVTYPE, y = total), fill = "navyblue") +
    labs(x = "Event Type", y = "Total Amount of Damages (Billions USD)") +
    theme_bw()