In this report we aim to describe the major effects of different strom events in USA between 1950 - 2011. The report checks top ten effects on population health and the economy. On population health we focus on the total number of injuries and/or fatalities caused by the event of interest. On economy we focus on total damage cost in terms of cash. We observed that Tornado and Heat events had the greatest effect on population health. Floods and Typhoons had the major economic consequences. We also observed that the event with high number of injuries had also higher number of fatalities. The events differed highly on property and crop damage, the ones with high property damage had lower crop damage and vice versa.
We obtained the Storm data for years 1950 - 2011.
We first create the neccessary folders to hold the data and then we downloaded the data. The data is in a CSV file and we load it as Storm Data.
library(data.table)
library(ggplot2)
library(reshape2)
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:data.table':
##
## hour, mday, month, quarter, wday, week, yday, year
#creatind the data folder if it doesnt exist
if (!file.exists("data")) {
dir.create("data")
}
#downloadTheData
if (!file.exists("data/stormData.csv")){
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "data/stormData.csv")
}else {message("Data Already downloaded")}
## Data Already downloaded
#load the data to R
stormData <- read.table("data/stormData.csv", header = T , sep = ",")
dimensions <- dim(stormData)
After creating the neccesary directories and loading the data with 902297 and 37 variables, we check the events with most occurences.
#convert all characters to small letter
stormData$EVTYPE <- tolower(stormData$EVTYPE)
stormData$ndate <- as.Date(stormData$BGN_DATE, "%m/%d/%Y")
table(year(stormData$ndate))
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961
## 223 269 272 492 609 1413 1703 2184 2213 1813 1945 2246
## 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973
## 2389 1968 2348 2855 2388 2688 3312 2926 3215 3471 2168 4463
## 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985
## 5386 4975 3768 3728 3657 4279 6146 4517 7132 8322 7335 7979
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
## 8726 7367 7257 10410 10946 12522 13534 12607 20631 27970 32270 28680
## 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 38128 31289 34471 34962 36293 39752 39363 39184 44034 43289 55663 45817
## 2010 2011
## 48161 62174
#Across the United States, which types of events are most harmful with respect to population health?
#the major variables of interest are FATALITIES and Injuries
#storm event types that cause the most economic and health impacts
evtype <- data.frame(table(stormData$EVTYPE))
evtype <- evtype[order(-evtype$Freq),]
names(evtype) <- c("EVTYPE" , "FREQUENCY")
| EVTYPE | FREQUENCY | |
|---|---|---|
| 212 | hail | 288661 |
| 779 | tstm wind | 219942 |
| 685 | thunderstorm wind | 82564 |
| 758 | tornado | 60652 |
| 138 | flash flood | 54277 |
| 154 | flood | 25327 |
| 711 | thunderstorm winds | 20843 |
| 320 | high wind | 20214 |
| 418 | lightning | 15754 |
| 274 | heavy snow | 15708 |
To answer the above question we focus on top 10 events with highest number of fatalities and injuries. We also combine all the floods, winds, hails and heats into one event of interest.
#categorise the major events
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=stormData, sum)
harm$grp_EVTYPE <- harm$EVTYPE
harm <- harm[order(-harm$FATALITIES , -harm$INJURIES),]
harm$grp_EVTYPE[grepl("flood" , harm$EVTYPE)] <- "Flood"
harm$grp_EVTYPE[grepl("tornado" , harm$EVTYPE)] <- "Tornado"
harm$grp_EVTYPE[grepl("wind" , harm$EVTYPE)] <- "Wind"
harm$grp_EVTYPE[grepl("hail" , harm$EVTYPE)] <- "Hail"
harm$grp_EVTYPE[grepl("heat" , harm$EVTYPE)] <- "Heat"
#the dataset with combined events
harmFinal <- aggregate(cbind(FATALITIES, INJURIES) ~ grp_EVTYPE, data=harm, sum)
harmFinal <- harmFinal[order(-harmFinal$FATALITIES , -harmFinal$INJURIES),]
harmFinal$sumHazards <- harmFinal$FATALITIES + harmFinal$INJURIES
names(harmFinal) <- c("EVENT_TYPE" , "FATALITIES" ,"INJURIES" , "TOTAL_HAZARDS")
The table below shows the summary of the top 10 events
| EVENT_TYPE | FATALITIES | INJURIES | TOTAL_HAZARDS | |
|---|---|---|---|---|
| 452 | Tornado | 5636 | 91407 | 97043 |
| 131 | Heat | 3138 | 9224 | 12362 |
| 97 | Flood | 1525 | 8604 | 10129 |
| 524 | Wind | 1421 | 11402 | 12823 |
| 235 | lightning | 816 | 5230 | 6046 |
| 315 | rip current | 368 | 232 | 600 |
| 14 | avalanche | 224 | 170 | 394 |
| 526 | winter storm | 206 | 1321 | 1527 |
| 316 | rip currents | 204 | 297 | 501 |
| 90 | extreme cold | 162 | 231 | 393 |
To summarize the major 10 events we did a plot to show the population health effects. The y axis has the total hazard values represented in log to base 10, and a text on top of each bar to show the actual count of hazards.
#plot by hazards
harmFinal_Sub <- head(harmFinal, n=10)
data.m <- melt(harmFinal_Sub, id.vars='EVENT_TYPE')
data.m <- data.m[data.m$variable!="TOTAL_HAZARDS",]
g <-ggplot(data.m, aes(grp_EVTYPE ,log10(value))) +
geom_bar(aes(x=EVENT_TYPE ,y=log10(value) , fill=variable), stat="identity",position="dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Log10 Hazards") + xlab("Event Type") +
geom_text(data = data.m, aes(x=EVENT_TYPE ,fill=variable, label=value, vjust=-1) ,size=3 , position= position_dodge(width=1, height=0.1)) + ylim(0, 6)
print(g)
## ymax not defined: adjusting position using y instead
We also did a summary of the top ten popupalation health effects by state shown in the table below:
#by State
harmState <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + STATE, data=stormData, sum)
harmState$grp_EVTYPE[grepl("flood" , harmState$EVTYPE)] <- "Flood"
harmState$grp_EVTYPE[grepl("tornado" , harmState$EVTYPE)] <- "Tornado"
harmState$grp_EVTYPE[grepl("wind" , harmState$EVTYPE)] <- "Wind"
harmState$grp_EVTYPE[grepl("hail" , harmState$EVTYPE)] <- "Hail"
harmState$grp_EVTYPE[grepl("heat" , harmState$EVTYPE)] <- "Heat"
harmState <- harmState[order(-harmState$FATALITIES , -harmState$INJURIES),]
names(harmState) <- c("EVTYPE" , "STATE" , "FATALITIES" ,"INJURIES" , "EVENT_TYPE")
| STATE | FATALITIES | INJURIES | EVENT_TYPE | |
|---|---|---|---|---|
| 1126 | IL | 653 | 241 | Heat |
| 152 | AL | 617 | 7929 | Tornado |
| 3575 | TX | 538 | 8207 | Tornado |
| 1991 | MS | 450 | 6244 | Tornado |
| 1928 | MO | 388 | 4330 | Tornado |
| 214 | AR | 379 | 5116 | Tornado |
| 3459 | TN | 368 | 4748 | Tornado |
| 2992 | PA | 359 | 320 | Heat |
| 1103 | IL | 330 | 352 | Heat |
| 2900 | OK | 296 | 4829 | Tornado |
To show the greatest economic consequences we did a calculation on total economic effect per event. The calculation is a multiplication of crop/property damage and the exponential. We show the top 10 events with the highest combined consequenses. ie property and crop damage costs.
stormData$multiplier[stormData$PROPDMGEXP=="K"] <- 10^3
stormData$multiplier[stormData$PROPDMGEXP=="M"] <- 10^6
stormData$multiplier[stormData$PROPDMGEXP=="B"] <- 10^9
stormData$multiplier2[stormData$CROPDMGEXP=="K"] <- 10^3
stormData$multiplier2[stormData$CROPDMGEXP=="M"] <- 10^6
stormData$multiplier2[stormData$CROPDMGEXP=="B"] <- 10^9
stormData$multiplier[stormData$PROPDMGEXP==""] <- 0
# multiply the values to get total dollars
stormData$PROPDMG_new <- stormData$PROPDMG * stormData$multiplier
stormData$CROPDMG_new <- stormData$CROPDMG * stormData$multiplier2
#categorise the major events
economicHarm <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ EVTYPE, data=stormData, sum)
economicHarm$grp_EVTYPE <- economicHarm$EVTYPE
economicHarm <- economicHarm[order(-economicHarm$PROPDMG_new , -economicHarm$CROPDMG_new),]
economicHarm$grp_EVTYPE[grepl("flood" , economicHarm$EVTYPE)] <- "Flood"
economicHarm$grp_EVTYPE[grepl("tornado" , economicHarm$EVTYPE)] <- "Tornado"
economicHarm$grp_EVTYPE[grepl("wind" , economicHarm$EVTYPE)] <- "Wind"
economicHarm$grp_EVTYPE[grepl("hail" , economicHarm$EVTYPE)] <- "Hail"
economicHarm$grp_EVTYPE[grepl("heat" , economicHarm$EVTYPE)] <- "Heat"
economicHarmFinal <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ grp_EVTYPE, data=economicHarm, sum)
economicHarmFinal <- economicHarmFinal[order(-economicHarm$CROPDMG_new , -economicHarm$PROPDMG_new),]
economicHarmFinal$COMBO <- economicHarmFinal$PROPDMG_new + economicHarmFinal$CROPDMG_new
economicHarmCombo <- economicHarmFinal[order(-economicHarmFinal$COMBO),]
names(economicHarmCombo) <- c( "EVENT_TYPE" , "PROPERTY_DAMAGE" , "CROP_DAMAGE" ,"TOTAL_DAMAGE" )
economicHarmCombo <- head(economicHarmCombo, n=10)
The table below shows the top ten events with the highest combined economic consequenses.
| EVENT_TYPE | PROPERTY_DAMAGE | CROP_DAMAGE | TOTAL_DAMAGE | |
|---|---|---|---|---|
| 18 | Flood | 1.460e+11 | 1.227e+10 | 1.583e+11 |
| 41 | hurricane/typhoon | 2.674e+10 | 2.608e+09 | 2.935e+10 |
| 60 | Tornado | 1.617e+10 | 3.650e+08 | 1.653e+10 |
| 11 | drought | 2.337e+08 | 1.397e+10 | 1.421e+10 |
| 28 | Hail | 9.597e+09 | 3.089e+09 | 1.269e+10 |
| 37 | hurricane | 9.716e+09 | 2.742e+09 | 1.246e+10 |
| 79 | Wind | 7.267e+09 | 2.092e+09 | 9.359e+09 |
| 43 | ice storm | 9.030e+08 | 5.022e+09 | 5.925e+09 |
| 57 | storm surge/tide | 4.641e+09 | 8.500e+05 | 4.641e+09 |
| 77 | wildfire | 3.498e+09 | 2.955e+08 | 3.794e+09 |
The graph below shows a comparison of the top ten events economic consequences.The y axis are the total economic effect expressed in log to base 10. Flood is observed to have the highest economic effect.
names(economicHarmCombo) <- c( "EVENT_TYPE" , "PROPERTY_DAMAGE($)" , "CROP_DAMAGE($)" ,"COMBO" )
g <-ggplot(economicHarmCombo, aes(log10(COMBO))) +
geom_bar(aes(x=EVENT_TYPE, y=log10(COMBO) ), stat="identity", fill="blue",position="dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Log_10 Combined Economic")
print(g)
The states with that were affected majorly are shown on the table below
#by State
economicHarmState <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ EVTYPE + STATE, data=stormData, sum)
economicHarmState$grp_EVTYPE <- economicHarmState$EVTYPE
economicHarmState$grp_EVTYPE[grepl("flood" , economicHarmState$grp_EVTYPE)] <- "Flood"
economicHarmState$grp_EVTYPE[grepl("tornado" , economicHarmState$grp_EVTYPE)] <- "Tornado"
economicHarmState$grp_EVTYPE[grepl("wind" , economicHarmState$grp_EVTYPE)] <- "Wind"
economicHarmState$grp_EVTYPE[grepl("hail" , economicHarmState$grp_EVTYPE)] <- "Hail"
economicHarmState$grp_EVTYPE[grepl("heat" , economicHarmState$grp_EVTYPE)] <- "Heat"
harmStateFinal <- aggregate(cbind(PROPDMG_new, CROPDMG_new) ~ grp_EVTYPE + STATE, data=economicHarmState, sum)
harmStateFinal$COMBO <- harmStateFinal$PROPDMG_new + harmStateFinal$CROPDMG_new
harmStateFinal <- harmStateFinal[order(-harmStateFinal$COMBO),]
names(harmStateFinal) <- c( "EVENT_TYPE" ,"STATE", "PROPERTY_DAMAGE" , "CROP_DAMAGE" ,"TOTAL_DAMAGE" )
| EVENT_TYPE | STATE | PROPERTY_DAMAGE | CROP_DAMAGE | TOTAL_DAMAGE | |
|---|---|---|---|---|---|
| 126 | Flood | CA | 1.160e+11 | 8.124e+08 | 1.168e+11 |
| 244 | hurricane/typhoon | FL | 1.717e+10 | 9.552e+08 | 1.813e+10 |
| 370 | Flood | IL | 5.717e+09 | 5.070e+09 | 1.079e+10 |
| 639 | hurricane/typhoon | MS | 5.885e+09 | 1.515e+09 | 7.400e+09 |
| 1038 | drought | TX | 1.988e+08 | 6.373e+09 | 6.572e+09 |
| 684 | hurricane | NC | 4.147e+09 | 1.425e+09 | 5.572e+09 |
| 44 | Tornado | AL | 4.983e+09 | 5.680e+07 | 5.039e+09 |
| 640 | ice storm | MS | 2.595e+06 | 5.000e+09 | 5.003e+09 |
| 1056 | storm surge/tide | TX | 4.500e+09 | 0.000e+00 | 4.500e+09 |
| 1017 | Flood | TN | 4.379e+09 | 5.868e+06 | 4.385e+09 |