This document describes the procedure I used to analyze the National Weather Service's Storm Data and determine which event types caused the greatest loss of life and economic damage.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
# Required Libraries:
library(data.table)
library(reshape2)
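The data can be downloaded from the URL used in the code below.
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  # destfile is the argument name for the local path; mode = "wb" keeps the binary archive intact
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2",
                mode = "wb")
}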
# Once downloaded, the file must be decompressed (bunzip2() comes from the R.utils package).
if (!file.exists("repdata-data-StormData.csv")) {
  R.utils::bunzip2("repdata-data-StormData.csv.bz2",
                   destname = "repdata-data-StormData.csv",
                   overwrite = TRUE,
                   remove = FALSE)
}
# The last step is loading the data into a data.table.
filePath <- file.path(getwd(), "repdata-data-StormData.csv")
dataFrame <- read.csv(filePath)
dataTable <- data.table(dataFrame)
rm(dataFrame)
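As a side note on the loading step, base R can read a bzip2-compressed CSV without a separate decompression step, because `read.csv()` opens the path through `file()`, which detects the compression. The sketch below demonstrates this on a small made-up file rather than the storm data; the toy columns and values are assumptions for illustration only.

```r
# Toy stand-in for the storm file: write a small bzip2-compressed CSV.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)

# Remove the temporary file.
unlink(tmp)
```

Applied to this analysis, that would mean passing the `.bz2` file name straight to `read.csv()`, at the cost of decompressing on every read.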
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
To answer this question, I process the fields EVTYPE, FATALITIES, and INJURIES. The results are stored in the healthData data.frame.
healthData <- aggregate (cbind(FATALITIES, INJURIES) ~ EVTYPE, dataTable, sum)
healthData <- healthData[healthData$FATALITIES > 0 | healthData$INJURIES > 0,]
total <- healthData$FATALITIES + healthData$INJURIES
healthData <- healthData[with(cbind(healthData,total), order(-total)),]
Keep the 10 event types most harmful to population health.
healthData <- head(healthData,10)
healthData
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 856 TSTM WIND 504 6957
## 170 FLOOD 470 6789
## 464 LIGHTNING 816 5230
## 275 HEAT 937 2100
## 153 FLASH FLOOD 978 1777
## 427 ICE STORM 89 1975
## 760 THUNDERSTORM WIND 133 1488
## 972 WINTER STORM 206 1321
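Since the data is already held in a data.table, the same sum-filter-sort steps could also be written in data.table syntax. This is only a sketch on made-up values, not the code that produced the table above; `toy` and `toyHealth` are hypothetical names.

```r
library(data.table)

# Toy data with the same column names as the storm table (values are made up).
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0, 0),
  INJURIES   = c(10, 4, 20, 0, 1)
)

# Sum both columns per event type, then add a combined total.
toyHealth <- toy[, .(FATALITIES = sum(FATALITIES),
                     INJURIES   = sum(INJURIES)),
                 by = EVTYPE]
toyHealth[, total := FATALITIES + INJURIES]

# Keep event types with at least one casualty, largest total first.
toyHealth <- toyHealth[total > 0][order(-total)]
print(head(toyHealth, 10))
```

Keeping `total` as a column avoids the separate `total` vector and the `cbind()` call used for ordering.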
Across the United States, which types of events have the greatest economic consequences?
To answer this question, I process the fields EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. In this case the data must be preprocessed, because the amounts and the units of the damage are stored in different columns. The function damageValue calculates the combined value.
damageValue <- function(argValue, argExp) {
  # Exponent codes: h -> hundred, k -> thousand, m -> million, b -> billion
  if (argExp %in% c('h', 'H'))
    return (argValue * 10^2)
  else if (argExp %in% c('k', 'K'))
    return (argValue * 10^3)
  else if (argExp %in% c('m', 'M'))
    return (argValue * 10^6)
  else if (argExp %in% c('b', 'B'))
    return (argValue * 10^9)
  # A numeric code is returned as-is (argValue is not used in this branch).
  # Caution: as.numeric() on a factor returns its level code, not NA.
  else if (!is.na(as.numeric(argExp)))
    return (as.numeric(argExp))
  # Any other code counts as zero damage
  else return (0)
}
Add the results as two new columns, propertyDamage and cropDamage.
dataTable$propertyDamage <- mapply(damageValue, dataTable$PROPDMG, dataTable$PROPDMGEXP)
dataTable$cropDamage <- mapply(damageValue, dataTable$CROPDMG, dataTable$CROPDMGEXP)
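Calling a scalar function once per row through `mapply` can be slow on a table of this size. One possible alternative is a vectorized lookup of the multiplier, sketched below. `damageValueVec` and `multiplier` are hypothetical names, and the sketch deliberately differs from `damageValue` in one respect: every code other than h, k, m, or b (including digits) is treated as zero.

```r
# Multipliers for the letter codes handled by damageValue().
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: vectorized over both arguments.
damageValueVec <- function(value, exp) {
  # as.character() guards against factor input; unmatched codes give NA.
  m <- multiplier[toupper(as.character(exp))]
  # Assumption of this sketch: any unrecognized code counts as zero damage.
  m[is.na(m)] <- 0
  unname(value * m)
}

print(damageValueVec(c(25, 2.5, 1, 7), c("K", "m", "B", "")))
```

Because the whole column is processed in one call, the result can be assigned directly, for example from `dataTable$PROPDMG` and `dataTable$PROPDMGEXP`.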
Now I can process dataTable. The results are stored in the economicData data.frame.
economicData <- aggregate (cbind(propertyDamage, cropDamage) ~ EVTYPE, dataTable, sum)
economicData <- economicData[economicData$propertyDamage > 0 | economicData$cropDamage > 0,]
total <- economicData$propertyDamage + economicData$cropDamage
economicData <- economicData[with(cbind(economicData,total), order(-total)),]
Keep the 10 event types with the greatest economic consequences.
economicData <- head(economicData,10)
economicData
## EVTYPE propertyDamage cropDamage
## 170 FLOOD 144657717736 5661980154
## 411 HURRICANE/TYPHOON 69305840018 2607872855
## 834 TORNADO 56937169417 415004177
## 670 STORM SURGE 43323536086 5254
## 244 HAIL 15732464102 3026160812
## 153 FLASH FLOOD 16140832926 1421349698
## 95 DROUGHT 1046107081 13972566971
## 402 HURRICANE 11868319058 2741910104
## 590 RIVER FLOOD 5118945570 5029459153
## 427 ICE STORM 3944928501 5022114545
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
According to this analysis, TORNADO is the event type with the greatest impact on population health.
healthData
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 856 TSTM WIND 504 6957
## 170 FLOOD 470 6789
## 464 LIGHTNING 816 5230
## 275 HEAT 937 2100
## 153 FLASH FLOOD 978 1777
## 427 ICE STORM 89 1975
## 760 THUNDERSTORM WIND 133 1488
## 972 WINTER STORM 206 1321
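Reshape the data for plotting: melt it with the reshape2 library and build the numeric matrices that barplot expects.
healthDataPlot <- melt(healthData, id.vars="EVTYPE")
# Select only the numeric columns so that the matrix is not coerced to character
healthDataBarPlot1 <- t(as.matrix(healthData[, c("FATALITIES", "INJURIES")]))
healthDataBarPlot2 <- healthDataBarPlot1[,2:10]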
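To illustrate the matrix layout that `barplot` expects for stacked bars, here is a minimal sketch on made-up values. The names `toy` and `m` are hypothetical; each row of the matrix becomes one stacked segment and each column becomes one bar.

```r
# Toy summary table (made-up values, same column layout as healthData).
toy <- data.frame(EVTYPE     = c("A", "B", "C"),
                  FATALITIES = c(5, 2, 1),
                  INJURIES   = c(30, 8, 4))

# Select only the numeric columns before as.matrix(); including EVTYPE
# would coerce the whole matrix to character.
m <- t(as.matrix(toy[, c("FATALITIES", "INJURIES")]))
colnames(m) <- toy$EVTYPE

# Rows become the stacked segments, columns become the bars.
barplot(m, legend.text = rownames(m), col = c("darkblue", "red"))
```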
The next graphic shows the 10 event types with the highest impact on human health. Since the TORNADO total is much greater than the rest, a second graph with the same data excluding TORNADO is shown so that the remaining events can be seen in detail.
par(mfrow=c(2,1))
par(font.axis=1)
par(las=2) # make label text perpendicular to axis
par(mar=c(5,11,3,3)) # margins: bottom, left, top, right (wide left margin)
barplot(healthDataBarPlot1,
        main="10 most harmful event types with respect to population health",
        col=c("darkblue","red"),
        legend = rownames(healthDataBarPlot1),
        names.arg = healthData$EVTYPE,
        horiz = FALSE)
mtext("Health damage (fatalities / injuries)", side = 1, line = 8, las = 1)
barplot(healthDataBarPlot2,
        main="Zoomed graph (data without tornado event)",
        col=c("darkblue","red"),
        legend = rownames(healthDataBarPlot2),
        names.arg = healthData$EVTYPE[2:10],
        horiz = FALSE)
mtext("Health damage (fatalities / injuries)", side = 1, line = 4, las = 1)
According to this analysis, FLOOD is the event type with the greatest economic impact.
economicData
## EVTYPE propertyDamage cropDamage
## 170 FLOOD 144657717736 5661980154
## 411 HURRICANE/TYPHOON 69305840018 2607872855
## 834 TORNADO 56937169417 415004177
## 670 STORM SURGE 43323536086 5254
## 244 HAIL 15732464102 3026160812
## 153 FLASH FLOOD 16140832926 1421349698
## 95 DROUGHT 1046107081 13972566971
## 402 HURRICANE 11868319058 2741910104
## 590 RIVER FLOOD 5118945570 5029459153
## 427 ICE STORM 3944928501 5022114545
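Reshape the data for plotting: melt it with the reshape2 library and build the numeric matrix that barplot expects.
economicDataPlot <- melt(economicData, id.vars="EVTYPE")
# Select only the numeric columns so that the matrix is not coerced to character
economicDataBarPlot1 <- t(as.matrix(economicData[, c("propertyDamage", "cropDamage")]))
The next graphic shows the 10 event types with the highest economic impact.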
par(mfrow=c(1,1))
par(font.axis=1)
par(las=2) # make label text perpendicular to axis
par(mar=c(5,11,3,3)) # increase y-axis margin.
barplot(economicDataBarPlot1,
        main="10 most harmful event types with respect to economic damage",
        col=c("darkblue","red"),
        legend = rownames(economicDataBarPlot1),
        names.arg = economicData$EVTYPE, # label bars with the economic ranking, not healthData
        horiz = TRUE)
mtext("Economic damage (dollars)", side = 1, line = 4, las = 1)
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] reshape2_1.4 reshape_0.8.5 knitr_1.7 ggplot2_1.0.0
## [5] data.table_1.9.4
##
## loaded via a namespace (and not attached):
## [1] chron_2.3-45 codetools_0.2-8 colorspace_1.2-4 digest_0.6.4
## [5] evaluate_0.5.5 formatR_1.0 grid_3.1.1 gtable_0.1.2
## [9] htmltools_0.2.6 MASS_7.3-33 munsell_0.4.2 plyr_1.8.1
## [13] proto_0.3-10 Rcpp_0.11.2 rmarkdown_0.2.64 scales_0.2.4
## [17] stringr_0.6.2 tools_3.1.1 yaml_2.1.13