Impact of severe weather events in USA 1950-2011


Synopsis

The goal of this analysis is to identify which severe weather events across the USA (1950-2011) cause the most damage to population health and the economy. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The author cleaned the data and calculated the sum of deaths, injuries, and economic (property and crop) damage for each weather event type, then plotted the results. Among severe weather events in 1950-2011, tornadoes caused the most harm to population health, in terms of both injuries and fatalities. The most economic damage was caused by floods, followed by hurricanes/typhoons.


Data Processing

The script performs the following data processing steps:

-download and unpack the data

-read the data into R

-clean the event type variable (evtype)

-recode the crop and property exponent variables

-calculate the sum of health and economic damages for each event type

NB! Because the data are not very clean (many misspellings, typos, misclassifications, misplaced values, etc.), cleaning them completely is too time consuming and sometimes impossible. The final results should therefore be interpreted with some caution: on a relative scale they should be quite accurate, but the same does not apply to the absolute values. For more information about the data, read the Storm Data Document and the National Climatic Data Center Storm Events FAQ.


Downloading and reading in data that is needed:

#Download the compressed file in binary mode (read.csv decompresses .bz2 files itself)
download.file( "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "repdata_data_StormData.csv.bz2", mode = "wb" )
#Read the data and keep only the columns needed for the analysis
data <- read.csv( "repdata_data_StormData.csv.bz2" )[ ,c("EVTYPE", 
                                          "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG","CROPDMGEXP")]

Clean data (reduce the number of distinct weather event types that arise from typos, upper/lower case differences, etc.):

#make all column names lower case
names(data)<-tolower(names(data))
#convert event type (evtype) to lower case
data$evtype<- tolower(data$evtype)
#remove digits
data$evtype<-gsub("\\d","",data$evtype)
#remove periods
data$evtype<-gsub("[.]","",data$evtype)
#remove parentheses (and any remaining periods)
data$evtype<-gsub("[(.)]","",data$evtype)
#replace / with a space
data$evtype<-gsub("/"," ",data$evtype)
#collapse repeated spaces into one space
data$evtype<-gsub(" +"," ",data$evtype)
#remove whitespace before and after the text
data$evtype<-gsub("^\\s+|\\s+$","",data$evtype)
#restore / in hurricane/typhoon; it is needed for displaying results
data$evtype<-gsub("hurricane typhoon","hurricane/typhoon",data$evtype)
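The effect of these cleaning steps is easiest to see on a handful of labels. The following is a minimal sketch, not part of the original script: `clean_evtype` is a hypothetical helper that bundles the steps above, and the sample labels are made up in the style of the raw EVTYPE column.

```r
# Hypothetical helper (not in the original script) bundling the cleaning steps
clean_evtype <- function(x) {
  x <- tolower(x)                  # unify case
  x <- gsub("\\d", "", x)          # drop digits
  x <- gsub("[(.)]", "", x)        # drop periods and parentheses
  x <- gsub("/", " ", x)           # slashes become spaces
  x <- gsub(" +", " ", x)          # collapse runs of spaces
  x <- gsub("^\\s+|\\s+$", "", x)  # trim both ends
  # put the slash back for the one label that should keep it
  gsub("hurricane typhoon", "hurricane/typhoon", x, fixed = TRUE)
}

# Made-up sample labels, not taken from the data set
raw <- c("HEAVY RAIN/SNOW", " Hurricane/Typhoon", "HIGH  WINDS 58", "Flash Flood.")
cleaned <- clean_evtype(raw)
print(cleaned)
```

Variants that differ only in case, digits, punctuation, or spacing collapse to the same label this way, while genuine misspellings remain and would need manual recoding.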

Recode crop and property exponent variables (not all values of the exponent variables are clear; unclear values are assumed to be 1):

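library("plyr")
# Map letter codes to powers of ten (h = hundred, k = thousand, m = million, b = billion).
# as.character() keeps as.numeric() from returning factor level codes instead of the values.
data$propdmgexp <- revalue(as.character(data$propdmgexp), c("B"="9", "K"="3", "m"="6", "M"="6", 
                                              "h"="2", "H"="2", "-"="1", "+"="1", "?"="1"))
data$cropdmgexp <- revalue(as.character(data$cropdmgexp), c("B"="9", "K"="3", "k"="3", "m"="6", "M"="6", "?"="1"))
data$cropdmgexp<-as.numeric(data$cropdmgexp)
data$propdmgexp<-as.numeric(data$propdmgexp)
# blank exponents become NA above; treat them like the other unclear values (1)
data$cropdmgexp[is.na(data$cropdmgexp)] <- 1
data$propdmgexp[is.na(data$propdmgexp)] <- 1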
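As an illustration of the recoding, the sketch below decodes a few exponent codes with a base R lookup vector and applies them to made-up damage figures. `exp_lookup` and `decode_exp` are hypothetical names, and the default of 1 for unclear codes follows the assumption stated above rather than any official NOAA rule.

```r
# Lookup from letter code to power of ten
# (h = hundred, k = thousand, m = million, b = billion)
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical helper: turn exponent codes into numeric powers of ten
decode_exp <- function(code) {
  code <- tolower(as.character(code))
  # digit codes such as "5" convert directly; everything else becomes NA
  out <- suppressWarnings(as.numeric(code))
  is_letter <- code %in% names(exp_lookup)
  out[is_letter] <- unname(exp_lookup[code[is_letter]])
  # assumption from the text: unclear codes ("", "?", "+", "-") count as 1
  out[is.na(out)] <- 1
  out
}

# Made-up codes and damage values for illustration only
codes  <- c("K", "m", "B", "", "?", "5")
powers <- decode_exp(codes)
damage <- c(25, 2.5, 1, 10, 10, 3) * 10^powers
print(data.frame(codes, powers, damage))
```

Converting through character values rather than factors avoids the common pitfall where `as.numeric()` on a factor returns internal level codes instead of the labels.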

Sum of health and economic damages are calculated for each event:

# calculate total property and crop damage
data$propdmg_total<-(data$propdmg *10^data$propdmgexp)
data$cropdmg_total<-(data$cropdmg *10^data$cropdmgexp)
#calculate total economic damage
data$dmg_total<-data$cropdmg_total+data$propdmg_total
#calculate sum of fatalities, injuries and economic damages
library(plyr)
data_summary <- ddply(data, c("evtype"), summarise,
               fatalities   = sum(fatalities),
               injuries=sum(injuries),
               dmg_total=sum(dmg_total))
#make subset for each variable and order them decreasing
injuries <- data_summary[order(-data_summary$injuries), c("evtype", "injuries")] [1:5,]
fatalities <- data_summary[order(-data_summary$fatalities), c("evtype", "fatalities")] [1:5,]
dmg_total <- data_summary[order(-data_summary$dmg_total), c("evtype", "dmg_total")] [1:5,]
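The summarise-then-rank pattern can be checked on a toy data frame. This is a minimal sketch with made-up numbers; it uses base R `aggregate()` instead of `ddply()` so that it runs without extra packages, and the name `toy` is hypothetical.

```r
# Toy data with made-up values, mimicking the columns used above
toy <- data.frame(
  evtype     = c("tornado", "flood", "tornado", "flood", "hail"),
  fatalities = c(3, 1, 2, 0, 0),
  injuries   = c(10, 0, 5, 2, 1),
  dmg_total  = c(1e6, 5e7, 2e6, 1e7, 5e4)
)

# Sum every measure within each event type
toy_summary <- aggregate(cbind(fatalities, injuries, dmg_total) ~ evtype,
                         data = toy, FUN = sum)

# Order by total damage, largest first, and keep the top two rows
top_dmg <- toy_summary[order(-toy_summary$dmg_total), c("evtype", "dmg_total")]
top_dmg <- head(top_dmg, 2)
print(top_dmg)
```

Ranking each measure separately, as the script does, keeps injuries, fatalities, and economic damage from being mixed into a single score.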


Results

The following graphs show which severe weather events cause the most damage to population health.

#calculate variables to be shown in text after graph
options("scipen" = 10)
one_in<-injuries$evtype[1]
two_in<-injuries$evtype[2]
one_fa<-fatalities$evtype[1]
two_fa<-fatalities$evtype[2]
one_in_nr<-format(injuries$injuries[1], nsmall = 0)
two_in_nr<-format(injuries$injuries[2], nsmall = 0)
one_fa_nr<-format(fatalities$fatalities[1], nsmall = 0)
two_fa_nr<-format(fatalities$fatalities[2], nsmall = 0)
par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))
par(mar = c(10, 4, 3, 2))
# both panels share one y-axis range; it must cover the largest injury count
barplot(injuries$injuries, names.arg =injuries$evtype, 
        main="Injuries",ylab="Number of injuries", 
        cex.axis = 0.8, cex.names = 0.9,las=2, cex.lab=0.8,ylim=c(0, 100000),col="blue")
barplot(fatalities$fatalities, names.arg =fatalities$evtype, 
        main="Fatalities", ylab="Number of fatalities", 
        cex.axis = 0.8, cex.names = 0.9,las=2,cex.lab=0.8, ylim=c(0, 100000),col="blue")
title("Population health damages caused in USA by \nextreme weather events 1950-2011", 
      outer = TRUE)

[Figure: Population health damages caused in USA by extreme weather events 1950-2011; panels: Injuries, Fatalities]

Among severe weather events in 1950-2011, the most injuries were caused by tornado (91346 injuries) and tstm wind (6957 injuries). The most fatalities were caused by tornado (5633 fatalities) and excessive heat (1903 fatalities). Assessing which event is the second most damaging to health requires further analysis, because we have no information on how to compare injuries with fatalities (for example, whether injuries must be weighted before they can be compared with fatalities).

The following graph shows which severe weather events cause the most damage to the economy (property and crop damage combined).

#calculate variables to be shown in text after graph
options("scipen" = 10)
dmg1<- dmg_total$evtype[1]
dmg2<- dmg_total$evtype[2]
dmg_one<-format(dmg_total$dmg_total[1]/1000000000, nsmall = 0)
dmg_two<-format(dmg_total$dmg_total[2]/1000000000, nsmall = 0)
par(mar = c(10, 4, 3, 10))
barplot(dmg_total$dmg_total/1000000000, names.arg =dmg_total$evtype, 
        ylab="Damage (billion $)", main="Economic damages caused in USA by \nextreme weather events 1950-2011",
        cex.axis = 0.8, cex.names = 1,las=2, cex.lab=0.8,col="blue")

[Figure: Economic damages caused in USA by extreme weather events 1950-2011; y-axis: Damage (billion $)]

Among severe weather events in 1950-2011, the most economic damage was caused by flood (12508 billion dollars) and hurricane/typhoon (6555 billion dollars).