Measuring impact of Storms and other severe weather events on Public Health and Economy in the United States

Synopsis

This project addresses two questions: (1) which types of events are most harmful to population health, and (2) which types of events have the greatest economic consequences? The analysis uses an extract from the National Oceanic and Atmospheric Administration (NOAA) Storm Database. It finds that tornadoes caused the highest number of both fatalities and injuries, and that floods caused the highest total value of property and crop damage.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded from the following link: Storm Data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

  1. National Weather Service Storm Data Documentation
  2. National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Questions that need to be answered

  1. Which types of events are most harmful to population health?
  2. Which types of events have the greatest economic consequences?

Libraries used in the analysis

        library(ggplot2)
        library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
        library(RColorBrewer)

Data Processing into R

The following code reads the data into R, selects the required columns, and then converts the character exponent codes for property and crop damage (PROPDMGEXP and CROPDMGEXP) into the corresponding numeric multipliers.

    # Load the data with read.csv, which reads the bzip2-compressed file directly.
    stormdata <- read.csv("repdata_data_StormData.csv.bz2")
    # Keep only the columns needed for the health and economic analyses.
    storm_selected_data <- stormdata[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "K"] <-   1000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "M"] <- 1000000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "" ] <- 1
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "B"] <- 1000000000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "m"] <- 1000000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "+"] <- 0
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "0"] <- 1
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "5"] <- 100000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "6"] <- 1000000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "?"] <- 0
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "4"] <- 10000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "2"] <- 100
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "3"] <- 1000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "h"] <- 100
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "7"] <- 10000000
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "H"] <- 100
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "-"] <- 0
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "1"] <- 10
    storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "8"] <- 100000000


    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP ==  ""] <- 1
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "M"] <- 1000000
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "K"] <- 1000
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "m"] <- 1000000
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "B"] <- 1000000000
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "?"] <- 0
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "0"] <- 1
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "k"] <- 1000
    storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "2"] <- 100

Analysis of 1st Question - 1. Which types of events are most harmful to population health?

The following code finds the number of fatalities and injuries according to each EVTYPE in the selected data and arranges them in decreasing order selecting the top 5.

    # Total fatalities and injuries per event type
    fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm_selected_data, FUN = sum)
    injury <- aggregate(INJURIES ~ EVTYPE, data = storm_selected_data, FUN = sum)

    # Sort in decreasing order and keep the top 5 event types
    fatal10 <- fatal[order(-fatal$FATALITIES), ][1:5, ]
    injury10 <- injury[order(-injury$INJURIES), ][1:5, ]
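
To make the aggregate-then-sort step concrete, here is a small sketch on made-up data. The data frame `toy` and its values are invented for illustration and are not taken from the NOAA extract.

```r
# Toy data (made up for illustration; not from the storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities within each event type
toy_fatal <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order of fatalities and keep the top 2 event types
toy_top <- head(toy_fatal[order(-toy_fatal$FATALITIES), ], 2)
print(toy_top)
```

Here `head(x, n)` is used instead of indexing with `[1:n, ]`. The two agree when there are at least `n` groups, but `head()` does not pad the result with NA rows when there are fewer.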

Result - Plot for 1st Question

The following code draws two barplots for the 1st question, showing the five event types with the most fatalities and the five with the most injuries.

    par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8)
    barplot(fatal10$FATALITIES, names.arg = fatal10$EVTYPE, ylim = c(0, 8000),
            col = brewer.pal(5, "Purples"), ylab = "Number of Fatalities",
            main = "Top 5 Events with Highest Fatalities")
    # The y-axis limit is derived from the data so the tallest bar fits within the axis.
    barplot(injury10$INJURIES, names.arg = injury10$EVTYPE,
            ylim = c(0, max(injury10$INJURIES) * 1.1),
            col = brewer.pal(5, "YlGn"), ylab = "Number of Injuries",
            main = "Top 5 Events with Highest Injuries")

Analysis of 2nd Question - 2. Which types of events have the greatest economic consequences?

The following code finds the total property/crop damage according to each EVTYPE in the selected data and arranges them in decreasing order selecting the top 10.

    storm_selected_data$PROPDMGVAL <- storm_selected_data$PROPDMG * storm_selected_data$PROPEXP
    storm_selected_data$CROPDMGVAL <- storm_selected_data$CROPDMG * storm_selected_data$CROPEXP

    storm_selected_data$ALLDMGVAL <- storm_selected_data$PROPDMGVAL + storm_selected_data$CROPDMGVAL
    
    propcropdmg <- aggregate(ALLDMGVAL ~ EVTYPE, data = storm_selected_data, FUN = sum)
    propcropdmg10<-propcropdmg[order(-propcropdmg$ALLDMGVAL), ][1:10,]
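
As a worked example of the damage arithmetic, the sketch below applies the same steps to three made-up records. The data frame `toy` and its values are invented for illustration, and the multiplier columns are assumed to have been filled in already.

```r
# Toy records (made up for illustration; not from the storm database)
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG = c(2.5, 10, 50),
  PROPEXP = c(1e6, 1e3, 1e3),  # multipliers for codes "M", "K", "K"
  CROPDMG = c(1, 0, 5),
  CROPEXP = c(1e6, 1, 1e3)     # multipliers for codes "M", "", "K"
)

# Dollar value per record: amount times multiplier, property plus crop
toy$ALLDMGVAL <- toy$PROPDMG * toy$PROPEXP + toy$CROPDMG * toy$CROPEXP

# Total damage per event type, plus the same total in billions of dollars
toy_total <- aggregate(ALLDMGVAL ~ EVTYPE, data = toy, FUN = sum)
toy_total$BILLIONS <- toy_total$ALLDMGVAL / 1e9
print(toy_total)
```

The first FLOOD record contributes 2.5 million in property damage plus 1 million in crop damage, and the second adds 10 thousand, giving 3,510,000 for FLOOD and 55,000 for HAIL. One caveat: the formula interface of `aggregate()` drops rows containing NA by default, so a record whose multiplier was never assigned would be left out of the totals without any warning.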

Result - Plot for 2nd Question

The following code draws a barplot for the 2nd question, showing the ten event types with the highest combined property and crop damage.

    par(mfrow = c(1, 1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8, cex.main = 0.9)
    
    # Convert dollars to billions of dollars for a readable y-axis.
    barplot(propcropdmg10$ALLDMGVAL / 1e9,
            names.arg = propcropdmg10$EVTYPE,
            col = brewer.pal(10, "Set3"),
            ylab = "Property and Crop Damage ($ billions)",
            main = "Top 10 Events Causing Highest Property/Crop Damage Value")

Conclusion

The analysis found that tornadoes caused the highest number of both fatalities and injuries, and that floods caused the highest total value of property and crop damage.