“Major effects of storm events in USA between 1950 - 2011”

Synopsis

In this report we aim to describe the major effects of different strom events in USA between 1950 - 2011. The report checks top ten effects on population health and the economy. On population health we focus on the total number of injuries and/or fatalities caused by the event of interest. On economy we focus on total damage cost in terms of cash. We observed that Tornado and Heat events had the greatest effect on population health. Floods and Typhoons had the major economic consequences. We also observed that the event with high number of injuries had also higher number of fatalities. The events differed highly on property and crop damage, the ones with high property damage had lower crop damage and vice versa.

Loading and Processing the Raw Data

We obtained the Storm data for years 1950 - 2011.

Reading in the data

We first create the neccessary folders to hold the data and then we downloaded the data. The data is in a CSV file and we load it as Storm Data.

library(data.table)
library(ggplot2)
library(reshape2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:data.table':
## 
##     hour, mday, month, quarter, wday, week, yday, year
#creatind the data folder if it doesnt exist
if (!file.exists("data")) {
  dir.create("data")
}

#downloadTheData
if (!file.exists("data/stormData.csv")){
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = "data/stormData.csv")
 }else {message("Data Already downloaded")}
## Data Already downloaded
#load the data to R
stormData <- read.table("data/stormData.csv", header = T , sep = ",")
dimensions <- dim(stormData)

After creating the neccesary directories and loading the data with 902297 and 37 variables, we check the events with most occurences.

#convert all characters to small letter
stormData$EVTYPE <- tolower(stormData$EVTYPE)
stormData$ndate <- as.Date(stormData$BGN_DATE, "%m/%d/%Y")
table(year(stormData$ndate))
## 
##  1950  1951  1952  1953  1954  1955  1956  1957  1958  1959  1960  1961 
##   223   269   272   492   609  1413  1703  2184  2213  1813  1945  2246 
##  1962  1963  1964  1965  1966  1967  1968  1969  1970  1971  1972  1973 
##  2389  1968  2348  2855  2388  2688  3312  2926  3215  3471  2168  4463 
##  1974  1975  1976  1977  1978  1979  1980  1981  1982  1983  1984  1985 
##  5386  4975  3768  3728  3657  4279  6146  4517  7132  8322  7335  7979 
##  1986  1987  1988  1989  1990  1991  1992  1993  1994  1995  1996  1997 
##  8726  7367  7257 10410 10946 12522 13534 12607 20631 27970 32270 28680 
##  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009 
## 38128 31289 34471 34962 36293 39752 39363 39184 44034 43289 55663 45817 
##  2010  2011 
## 48161 62174
#Across the United States, which types of events are most harmful with respect to population health?
#the major variables of interest are FATALITIES and Injuries
#storm event types that cause the most economic and health impacts
evtype <- data.frame(table(stormData$EVTYPE))
evtype <- evtype[order(-evtype$Freq),] 
names(evtype) <- c("EVTYPE" , "FREQUENCY")
EVTYPE FREQUENCY
212 hail 288661
779 tstm wind 219942
685 thunderstorm wind 82564
758 tornado 60652
138 flash flood 54277
154 flood 25327
711 thunderstorm winds 20843
320 high wind 20214
418 lightning 15754
274 heavy snow 15708

Results

1: Across the United States, which types of events are most harmful with respect to population health?

To answer the above question we focus on top 10 events with highest number of fatalities and injuries. We also combine all the floods, winds, hails and heats into one event of interest.

#categorise the major events
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=stormData, sum)
harm$grp_EVTYPE <- harm$EVTYPE
harm <- harm[order(-harm$FATALITIES , -harm$INJURIES),] 
harm$grp_EVTYPE[grepl("flood"  , harm$EVTYPE)] <- "Flood"
harm$grp_EVTYPE[grepl("tornado"  , harm$EVTYPE)] <- "Tornado"
harm$grp_EVTYPE[grepl("wind"  , harm$EVTYPE)] <- "Wind"
harm$grp_EVTYPE[grepl("hail"  , harm$EVTYPE)] <- "Hail"
harm$grp_EVTYPE[grepl("heat"  , harm$EVTYPE)] <- "Heat"

#the dataset with combined events
harmFinal <- aggregate(cbind(FATALITIES, INJURIES) ~ grp_EVTYPE, data=harm, sum)
harmFinal <- harmFinal[order(-harmFinal$FATALITIES , -harmFinal$INJURIES),] 
harmFinal$sumHazards <- harmFinal$FATALITIES + harmFinal$INJURIES 
names(harmFinal) <- c("EVENT_TYPE" ,  "FATALITIES" ,"INJURIES" , "TOTAL_HAZARDS")

The table below shows the summary of the top 10 events

EVENT_TYPE FATALITIES INJURIES TOTAL_HAZARDS
452 Tornado 5636 91407 97043
131 Heat 3138 9224 12362
97 Flood 1525 8604 10129
524 Wind 1421 11402 12823
235 lightning 816 5230 6046
315 rip current 368 232 600
14 avalanche 224 170 394
526 winter storm 206 1321 1527
316 rip currents 204 297 501
90 extreme cold 162 231 393

To summarize the major 10 events we did a plot to show the population health effects. The y axis has the total hazard values represented in log to base 10, and a text on top of each bar to show the actual count of hazards.

#plot by hazards
harmFinal_Sub <- head(harmFinal, n=10)
data.m <- melt(harmFinal_Sub, id.vars='EVENT_TYPE')
data.m <- data.m[data.m$variable!="TOTAL_HAZARDS",] 
g <-ggplot(data.m, aes(grp_EVTYPE ,log10(value))) + 
  geom_bar(aes(x=EVENT_TYPE ,y=log10(value) , fill=variable),  stat="identity",position="dodge") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Log10 Hazards") + xlab("Event Type") +
  geom_text(data = data.m, aes(x=EVENT_TYPE ,fill=variable, label=value, vjust=-1) ,size=3 , position= position_dodge(width=1, height=0.1)) + ylim(0, 6)

print(g)
## ymax not defined: adjusting position using y instead

plot of chunk unnamed-chunk-4

We also did a summary of the top ten popupalation health effects by state shown in the table below:

#by State
harmState <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + STATE, data=stormData, sum)
harmState$grp_EVTYPE[grepl("flood"  , harmState$EVTYPE)] <- "Flood"
harmState$grp_EVTYPE[grepl("tornado"  , harmState$EVTYPE)] <- "Tornado"
harmState$grp_EVTYPE[grepl("wind"  , harmState$EVTYPE)] <- "Wind"
harmState$grp_EVTYPE[grepl("hail"  , harmState$EVTYPE)] <- "Hail"
harmState$grp_EVTYPE[grepl("heat"  , harmState$EVTYPE)] <- "Heat"
harmState <- harmState[order(-harmState$FATALITIES , -harmState$INJURIES),] 
names(harmState) <- c("EVTYPE" , "STATE" , "FATALITIES" ,"INJURIES" , "EVENT_TYPE")
STATE FATALITIES INJURIES EVENT_TYPE
1126 IL 653 241 Heat
152 AL 617 7929 Tornado
3575 TX 538 8207 Tornado
1991 MS 450 6244 Tornado
1928 MO 388 4330 Tornado
214 AR 379 5116 Tornado
3459 TN 368 4748 Tornado
2992 PA 359 320 Heat
1103 IL 330 352 Heat
2900 OK 296 4829 Tornado

2: Across the United States, which types of events have the greatest economic consequences?

To show the greatest economic consequences we did a calculation on total economic effect per event. The calculation is a multiplication of crop/property damage and the exponential. We show the top 10 events with the highest combined consequenses. ie property and crop damage costs.

stormData$multiplier[stormData$PROPDMGEXP=="K"] <- 10^3
stormData$multiplier[stormData$PROPDMGEXP=="M"] <- 10^6
stormData$multiplier[stormData$PROPDMGEXP=="B"] <- 10^9

stormData$multiplier2[stormData$CROPDMGEXP=="K"] <- 10^3
stormData$multiplier2[stormData$CROPDMGEXP=="M"] <- 10^6
stormData$multiplier2[stormData$CROPDMGEXP=="B"] <- 10^9

stormData$multiplier[stormData$PROPDMGEXP==""] <- 0

# multiply the values to get total dollars
stormData$PROPDMG_new <- stormData$PROPDMG * stormData$multiplier
stormData$CROPDMG_new <- stormData$CROPDMG * stormData$multiplier2

#categorise the major events
economicHarm <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ EVTYPE, data=stormData, sum)

economicHarm$grp_EVTYPE <- economicHarm$EVTYPE
economicHarm <- economicHarm[order(-economicHarm$PROPDMG_new , -economicHarm$CROPDMG_new),] 
economicHarm$grp_EVTYPE[grepl("flood"  , economicHarm$EVTYPE)] <- "Flood"
economicHarm$grp_EVTYPE[grepl("tornado"  , economicHarm$EVTYPE)] <- "Tornado"
economicHarm$grp_EVTYPE[grepl("wind"  , economicHarm$EVTYPE)] <- "Wind"
economicHarm$grp_EVTYPE[grepl("hail"  , economicHarm$EVTYPE)] <- "Hail"
economicHarm$grp_EVTYPE[grepl("heat"  , economicHarm$EVTYPE)] <- "Heat"

economicHarmFinal <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ grp_EVTYPE, data=economicHarm, sum)
economicHarmFinal <- economicHarmFinal[order(-economicHarm$CROPDMG_new , -economicHarm$PROPDMG_new),] 

economicHarmFinal$COMBO <- economicHarmFinal$PROPDMG_new + economicHarmFinal$CROPDMG_new
economicHarmCombo <- economicHarmFinal[order(-economicHarmFinal$COMBO),]
names(economicHarmCombo) <- c( "EVENT_TYPE" , "PROPERTY_DAMAGE" , "CROP_DAMAGE" ,"TOTAL_DAMAGE" )
economicHarmCombo <- head(economicHarmCombo, n=10)

The table below shows the top ten events with the highest combined economic consequenses.

EVENT_TYPE PROPERTY_DAMAGE CROP_DAMAGE TOTAL_DAMAGE
18 Flood 1.460e+11 1.227e+10 1.583e+11
41 hurricane/typhoon 2.674e+10 2.608e+09 2.935e+10
60 Tornado 1.617e+10 3.650e+08 1.653e+10
11 drought 2.337e+08 1.397e+10 1.421e+10
28 Hail 9.597e+09 3.089e+09 1.269e+10
37 hurricane 9.716e+09 2.742e+09 1.246e+10
79 Wind 7.267e+09 2.092e+09 9.359e+09
43 ice storm 9.030e+08 5.022e+09 5.925e+09
57 storm surge/tide 4.641e+09 8.500e+05 4.641e+09
77 wildfire 3.498e+09 2.955e+08 3.794e+09

The graph below shows a comparison of the top ten events economic consequences.The y axis are the total economic effect expressed in log to base 10. Flood is observed to have the highest economic effect.

names(economicHarmCombo) <- c( "EVENT_TYPE" , "PROPERTY_DAMAGE($)" , "CROP_DAMAGE($)" ,"COMBO" )
g <-ggplot(economicHarmCombo, aes(log10(COMBO))) + 
  geom_bar(aes(x=EVENT_TYPE, y=log10(COMBO) ),  stat="identity", fill="blue",position="dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Log_10 Combined Economic") 
print(g)

plot of chunk unnamed-chunk-7

The states with that were affected majorly are shown on the table below

#by State
economicHarmState <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ EVTYPE + STATE, data=stormData, sum)
economicHarmState$grp_EVTYPE <- economicHarmState$EVTYPE

economicHarmState$grp_EVTYPE[grepl("flood"  , economicHarmState$grp_EVTYPE)] <- "Flood"
economicHarmState$grp_EVTYPE[grepl("tornado"  , economicHarmState$grp_EVTYPE)] <- "Tornado"
economicHarmState$grp_EVTYPE[grepl("wind"  , economicHarmState$grp_EVTYPE)] <- "Wind"
economicHarmState$grp_EVTYPE[grepl("hail"  , economicHarmState$grp_EVTYPE)] <- "Hail"
economicHarmState$grp_EVTYPE[grepl("heat"  , economicHarmState$grp_EVTYPE)] <- "Heat"
harmStateFinal <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ grp_EVTYPE + STATE, data=economicHarmState, sum)
harmStateFinal$COMBO <- harmStateFinal$PROPDMG_new + harmStateFinal$CROPDMG_new

harmStateFinal <- harmStateFinal[order(-harmStateFinal$COMBO),] 
names(harmStateFinal) <- c( "EVENT_TYPE" ,"STATE", "PROPERTY_DAMAGE" , "CROP_DAMAGE" ,"TOTAL_DAMAGE" )
EVENT_TYPE STATE PROPERTY_DAMAGE CROP_DAMAGE TOTAL_DAMAGE
126 Flood CA 1.160e+11 8.124e+08 1.168e+11
244 hurricane/typhoon FL 1.717e+10 9.552e+08 1.813e+10
370 Flood IL 5.717e+09 5.070e+09 1.079e+10
639 hurricane/typhoon MS 5.885e+09 1.515e+09 7.400e+09
1038 drought TX 1.988e+08 6.373e+09 6.572e+09
684 hurricane NC 4.147e+09 1.425e+09 5.572e+09
44 Tornado AL 4.983e+09 5.680e+07 5.039e+09
640 ice storm MS 2.595e+06 5.000e+09 5.003e+09
1056 storm surge/tide TX 4.500e+09 0.000e+00 4.500e+09
1017 Flood TN 4.379e+09 5.868e+06 4.385e+09