Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data for this assignment come from the National Weather Service Storm Data. The events in the database start in the year 1950 and end in November 2011.

Data Processing

Load the R packages needed for the analysis.

library(dplyr)
library(ggplot2)
library(knitr)
library(reshape2)

Load the CSV file and keep only the variables needed for the analysis. We also keep only the rows that record at least one fatality or injury.

stormData <- read.csv("repdata-data-StormData.csv.bz2")
damage <- subset(stormData,FATALITIES > 0 | INJURIES > 0, select = c(BGN_DATE,STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,CROPDMG,CROPDMGEXP))
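To see what this filter keeps, here is a minimal sketch that applies the same subset() condition to a small made-up data frame. The toy values are invented for illustration and are not taken from the NOAA file. Note that a row with property damage but no casualties is dropped, so any damage totals computed from the filtered data cover casualty-causing events only.

```r
# Toy data, invented for illustration only (not from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 0, 1),
  INJURIES   = c(10, 0, 3, 0),
  PROPDMG    = c(5, 250, 1, 0),
  stringsAsFactors = FALSE
)

# Same condition as above: keep rows with at least one fatality or injury
kept <- subset(toy, FATALITIES > 0 | INJURIES > 0,
               select = c(EVTYPE, FATALITIES, INJURIES, PROPDMG))

# HAIL is dropped even though it has the largest PROPDMG value
print(kept$EVTYPE)
```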

Note that the event categories are not perfectly clean: some are misspelled and some are duplicated. Cleaning the data in more detail would be a worthwhile extension. Given the time constraint, we performed only basic processing steps: converting all characters to upper case and trimming leading and trailing spaces from the category names.

damage$EVTYPE <- toupper(damage$EVTYPE)
damage$EVTYPE <- trimws(damage$EVTYPE)
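As a rough illustration of what these two steps achieve, the sketch below runs them on a few made-up labels. The last part, which merges synonyms through a lookup vector, is a hypothetical further cleaning step and is not part of this analysis; the synonyms mapping is an assumption chosen for the example.

```r
# Toy event labels, invented for illustration only
raw <- c(" Tornado", "TORNADO ", "tstm wind", "TSTM WIND", "Thunderstorm Wind")

# Same two steps as above: upper-case, then trim surrounding whitespace
clean <- trimws(toupper(raw))
print(unique(clean))

# Hypothetical next step (not done in this analysis): merge known synonyms
synonyms <- c("TSTM WIND" = "THUNDERSTORM WIND")
hit <- clean %in% names(synonyms)
clean[hit] <- unname(synonyms[clean[hit]])
print(unique(clean))
```

Upper-casing and trimming alone reduce the five raw labels to three; the lookup then folds the abbreviated label into its long form.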

The public health data are summarized to show the total number of injuries and fatalities by event type. The top 10 events by fatalities and the top 10 events by injuries are then selected.

# sums of injuries and fatalities
sumhealth <- damage %>% group_by(EVTYPE) %>% summarise(fatalities = sum(FATALITIES),injuries = sum(INJURIES))

## Select ten most harmful events
topfatalities <- head(sumhealth[order(sumhealth$fatalities, decreasing = TRUE), ], n = 10)[, c(1, 2)]
topinjuries <- head(sumhealth[order(sumhealth$injuries, decreasing = TRUE), ], n = 10)[, c(1, 3)]

## Prepare data for plotting
topfatalitiesPlot <- melt(topfatalities)
topinjuriesPlot <- melt(topinjuries)
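The summarise-then-rank pattern used above can be sketched on toy data with base R alone. This is an illustrative counterpart, not the code used in the analysis: aggregate() stands in for group_by() plus summarise(), and the toy values are invented.

```r
# Toy records, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type (base R counterpart of group_by + summarise)
totals <- aggregate(toy["FATALITIES"], by = list(EVTYPE = toy$EVTYPE), FUN = sum)

# Sort in decreasing order and keep the top n rows
n <- 2
top <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], n = n)
print(top)
```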

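The economic damage data are stored as a base value and an exponent code. We therefore add a new column, Prop_mult, that holds the multiplier corresponding to each code, and multiply the base value by it; crop damage is handled the same way with Crop_mult. Codes not matched below keep the default multiplier of 0.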

# Property damage multiplier: prepare and use to multiply the damage
damage$Prop_mult <- 0
damage$Prop_mult[damage$PROPDMGEXP == ""] <- 1
damage$Prop_mult[grep("[-+?]", damage$PROPDMGEXP)] <- 1
damage$Prop_mult[grep("[Hh]", damage$PROPDMGEXP)] <- 100
damage$Prop_mult[grep("[Kk]", damage$PROPDMGEXP)] <- 1000
damage$Prop_mult[grep("[Mm]", damage$PROPDMGEXP)] <- 1e+06
damage$Prop_mult[grep("[Bb]", damage$PROPDMGEXP)] <- 1e+09
damage$Prop_mult <- as.numeric(damage$Prop_mult)
damage$PROPDMG <- damage$Prop_mult * damage$PROPDMG

# Crop damage multiplier: prepare and use to multiply the damage
damage$Crop_mult <- 0
damage$Crop_mult[damage$CROPDMGEXP == ""] <- 1
damage$Crop_mult[grep("[-+?]", damage$CROPDMGEXP)] <- 1
damage$Crop_mult[grep("[Hh]", damage$CROPDMGEXP)] <- 100
damage$Crop_mult[grep("[Kk]", damage$CROPDMGEXP)] <- 1000
damage$Crop_mult[grep("[Mm]", damage$CROPDMGEXP)] <- 1e+06
damage$Crop_mult[grep("[Bb]", damage$CROPDMGEXP)] <- 1e+09
damage$Crop_mult <- as.numeric(damage$Crop_mult)
damage$CROPDMG <- damage$Crop_mult * damage$CROPDMG
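The base-times-multiplier arithmetic can be checked on a few made-up values. The helper exp_to_multiplier() below is hypothetical and written only for this illustration. It assumes single-character exponent codes and uses exact matching, whereas the code above uses grep() pattern matching.

```r
# Hypothetical helper: map exponent codes to multipliers, mirroring the steps above
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- rep(0, length(code))               # unrecognised codes stay at 0
  mult[code %in% c("", "-", "+", "?")] <- 1
  mult[code == "H"] <- 100
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  mult
}

# Toy values, invented for illustration only
base  <- c(2.5, 10, 1.2, 7, 3)
codes <- c("M", "k", "B", "", "5")

# 2.5 with "M" becomes 2,500,000; the unrecognised code "5" yields 0
print(base * exp_to_multiplier(codes))
```

Keeping the mapping in one function would let the property and crop columns share the same logic instead of repeating it.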

As with the health data, the economic damage figures are first summarized by event type. The top 10 events with the highest economic impact (defined as damage to crops plus damage to property) are then selected. Because the damage data frame retains only events with at least one fatality or injury, these totals exclude damage from events that caused no casualties.

# Make sums of crop damage and property damage
sumecon <- damage %>% group_by(EVTYPE) %>% summarise(cropdmg = sum(CROPDMG), propdmg = sum(PROPDMG))
sumecon$totaldamage <- sumecon$cropdmg + sumecon$propdmg

## Select top 10
topecon <- head(sumecon[order(sumecon$totaldamage, decreasing = T), ], n = 10)

## Prepare data for plotting
topeconPlot <- topecon %>% select(-totaldamage) %>% melt

Results

Question 1: Public health

The following table and chart present the 10 most damaging events from the perspective of fatalities.

topfatalities
## Source: local data frame [10 x 2]
## 
##            EVTYPE fatalities
##             (chr)      (dbl)
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
## Make the barchart
ggplot(topfatalitiesPlot, aes(x = factor(EVTYPE), y = value, fill = variable)) + 
    geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = -270), 
    plot.title = element_text(face = "bold")) + labs(x = "Weather event", y = "Number of fatalities") + 
    theme(legend.position = "none") + ggtitle("Fatalities")

Tornadoes caused the most fatalities between 1950 and 2011 (5,633), followed by excessive heat (1,903).

The following table and chart present the 10 most damaging events from the perspective of injuries.

topinjuries
## Source: local data frame [10 x 2]
## 
##               EVTYPE injuries
##                (chr)    (dbl)
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
## Make the barchart
ggplot(topinjuriesPlot, aes(x = factor(EVTYPE), y = value, fill = variable)) + 
    geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = -270), 
    plot.title = element_text(face = "bold")) + labs(x = "Weather event", y = "Number of injuries") + 
    theme(legend.position = "none") + ggtitle("Injuries")

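Tornadoes also caused by far the most injuries between 1950 and 2011: 91,346, about 13 times the 6,957 attributed to the second-ranked category, TSTM WIND. Tornadoes are therefore the most harmful event type for public health in this analysis, on both measures. Note that TSTM WIND and THUNDERSTORM WIND, as well as HEAT and EXCESSIVE HEAT, appear as separate categories in the tables, so the category duplication mentioned under Data Processing affects these rankings.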

Question 2: Economic damage

The following table and chart present the 10 most damaging events from the perspective of economic damage.

topecon
## Source: local data frame [10 x 4]
## 
##               EVTYPE    cropdmg     propdmg totaldamage
##                (chr)      (dbl)       (dbl)       (dbl)
## 1            TORNADO  141477000 41885103080 42026580080
## 2  HURRICANE/TYPHOON 2273120800 32747770000 35020890800
## 3              FLOOD  127602500  6632851640  6760454140
## 4     TROPICAL STORM  157265000  6560156000  6717421000
## 5       WINTER STORM   10283000  5221162500  5231445500
## 6          HURRICANE 1406720000  2812660000  4219380000
## 7   STORM SURGE/TIDE          0  4004000000  4004000000
## 8        FLASH FLOOD   47966750  3901610800  3949577550
## 9               HAIL  166300000  3505785700  3672085700
## 10          WILDFIRE  182087000  3484359200  3666446200
## Make the barchart
ggplot(topeconPlot, aes(x = factor(EVTYPE), y = value, fill = variable)) + 
    geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = -270), 
    plot.title = element_text(face = "bold")) + labs(x = "Weather event", y = "Economic damage") + 
    scale_fill_discrete(name = "Type of damage", labels = c("Crop", "Property")) + 
    theme(legend.position = "top") + ggtitle("Economic impact")

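Among the events retained for this analysis (those with at least one fatality or injury), tornadoes had the highest economic impact between 1950 and 2011, with about $42.0 billion in combined crop and property damage, followed by hurricanes/typhoons with about $35.0 billion.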