Storm Data Analysis

This document contains the procedures I used to analyze the National Weather Service's Storm Data and determine which event types caused the greatest loss of life or economic damage.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

# Required Libraries:
library(data.table)
library(reshape2)

Load data

The data can be downloaded from the URL used in the code below.

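if (!file.exists("repdata-data-StormData.csv.bz2")) {
        # destfile names the local copy; mode="wb" keeps the binary archive intact on Windows
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
        destfile="repdata-data-StormData.csv.bz2", mode="wb")
}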

# Once downloaded, the file must be decompressed (bunzip2 comes from the R.utils package).
if (!file.exists("repdata-data-StormData.csv")) {
        R.utils::bunzip2("repdata-data-StormData.csv.bz2", 
        destname="repdata-data-StormData.csv", 
        overwrite=TRUE, 
        remove=FALSE)
}
# The last step is loading the data into a data.table.
filePath <- paste (getwd(), '/', 'repdata-data-StormData.csv', sep='')
dataFrame <- read.csv (filePath)
dataTable <- data.table(dataFrame)
rm (dataFrame)
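
As a side note, read.csv can also read a bzip2-compressed file directly, because R's file connections detect the compression when a file is opened for reading, so the explicit decompression step is optional. The following minimal sketch demonstrates this on a small temporary file; the file and its contents (toy, tmpFile, toyRead) are made up for illustration and are not part of the Storm Data set.

```r
# Toy data written to a temporary bzip2-compressed CSV (hypothetical content).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path through file(), which detects the compression.
toyRead <- read.csv(tmpFile, stringsAsFactors = FALSE)
print(toyRead)
unlink(tmpFile)
```

Applied to this analysis, the same idea would mean passing the .csv.bz2 file name to read.csv instead of the decompressed file.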

Processing Data

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

In order to answer this question, I process the fields EVTYPE, FATALITIES, and INJURIES. The results are stored in the healthData data frame.

  • Calculate the aggregate value of the fields FATALITIES and INJURIES for each EVTYPE.
  • Filter the results to keep only the event types with at least one fatality or injury.
  • Order the results by the combined total of FATALITIES and INJURIES, largest first.
healthData <- aggregate (cbind(FATALITIES, INJURIES) ~ EVTYPE, dataTable, sum)
healthData <- healthData[healthData$FATALITIES > 0 | healthData$INJURIES > 0,]
total <- healthData$FATALITIES + healthData$INJURIES
healthData <- healthData[with(cbind(healthData,total), order(-total)),]
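
The aggregate, filter, and order steps can be traced on a small example. This is a minimal sketch that assumes a hypothetical five-row data frame (toy) in place of dataTable; the values are invented purely to show how each step changes the result.

```r
# Hypothetical toy records; the real analysis uses dataTable.
toy <- data.frame(
        EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
        FATALITIES = c(2, 0, 1, 0, 1),
        INJURIES   = c(10, 3, 5, 0, 0)
)

# 1. Sum both columns per event type.
toyHealth <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)
# 2. Drop event types with no recorded harm (HAIL in this toy data).
toyHealth <- toyHealth[toyHealth$FATALITIES > 0 | toyHealth$INJURIES > 0, ]
# 3. Sort by combined harm, largest first.
toyTotal <- toyHealth$FATALITIES + toyHealth$INJURIES
toyHealth <- toyHealth[order(-toyTotal), ]
print(toyHealth)
```

In the toy result TORNADO (3 fatalities, 15 injuries) comes first and FLOOD (1 fatality, 3 injuries) second, while HAIL is removed by the filter.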

Select the 10 most harmful event types with respect to population health

healthData <- head(healthData,10)
healthData
##                EVTYPE FATALITIES INJURIES
## 834           TORNADO       5633    91346
## 130    EXCESSIVE HEAT       1903     6525
## 856         TSTM WIND        504     6957
## 170             FLOOD        470     6789
## 464         LIGHTNING        816     5230
## 275              HEAT        937     2100
## 153       FLASH FLOOD        978     1777
## 427         ICE STORM         89     1975
## 760 THUNDERSTORM WIND        133     1488
## 972      WINTER STORM        206     1321

Across the United States, which types of events have the greatest economic consequences?

In order to answer this question, I process the fields EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. In this case, the data need to be preprocessed, because the amounts and the units of the damage are stored in different columns. The function damageValue calculates the combined value.

damageValue <- function(argValue, argExp) {
        # Combine a damage amount with its unit code:
        # h -> hundred, k -> thousand, m -> million, b -> billion
        if (argExp %in% c('h', 'H'))
                return (argValue * 10^2)
        else if (argExp %in% c('k', 'K'))
                return (argValue * 10^3)
        else if (argExp %in% c('m', 'M'))
                return (argValue * 10^6)
        else if (argExp %in% c('b', 'B'))
                return (argValue * 10^9)
        else if (!is.na(as.numeric(argExp)))
                return (as.numeric(argExp)) # converted code is returned as-is; argValue is not applied
        else return (0) # any other code counts as no damage
}

Add the results in two new columns, propertyDamage and cropDamage.

dataTable$propertyDamage <- mapply(damageValue, dataTable$PROPDMG, dataTable$PROPDMGEXP)
dataTable$cropDamage <- mapply(damageValue, dataTable$CROPDMG, dataTable$CROPDMGEXP)
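
Calling damageValue once per row through mapply can be slow on a large table. One possible alternative is a vectorized lookup, sketched below. The names multipliers and damageValueVec are hypothetical and not part of the original analysis. The sketch also makes a simplifying assumption: only the letter codes h, k, m, and b are recognized, and every other code (including digits) counts as zero, which differs from damageValue.

```r
# Multipliers for the letter codes handled by damageValue (lower-cased keys).
multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: vectorized conversion for letter exponents only.
damageValueVec <- function(values, exps) {
        mult <- multipliers[tolower(as.character(exps))]
        mult[is.na(mult)] <- 0   # assumption: unrecognized codes count as 0
        unname(values * mult)
}

# Invented sample amounts and unit codes.
toyValues <- c(25, 2.5, 1, 7, 3)
toyExps   <- c("K", "m", "B", "", "?")
toyDamage <- damageValueVec(toyValues, toyExps)
print(toyDamage)
```

Because the lookup indexes a named vector with the whole column at once, no per-row function call is needed; whether the zero-for-unknown rule is appropriate depends on how the non-letter codes should be interpreted.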

Now dataTable can be processed. The results are stored in the economicData data frame.

  • Calculate the aggregate value of the fields propertyDamage and cropDamage for each EVTYPE.
  • Filter the results to keep only the event types with some property or crop damage.
  • Order the results by the combined total of propertyDamage and cropDamage, largest first.
economicData <- aggregate (cbind(propertyDamage, cropDamage) ~ EVTYPE, dataTable, sum)
economicData <- economicData[economicData$propertyDamage > 0 | economicData$cropDamage > 0,]
total <- economicData$propertyDamage + economicData$cropDamage
economicData <- economicData[with(cbind(economicData,total), order(-total)),]

Select the 10 most harmful event types with respect to economic consequences

economicData <- head(economicData,10)
economicData
##                EVTYPE propertyDamage  cropDamage
## 170             FLOOD   144657717736  5661980154
## 411 HURRICANE/TYPHOON    69305840018  2607872855
## 834           TORNADO    56937169417   415004177
## 670       STORM SURGE    43323536086        5254
## 244              HAIL    15732464102  3026160812
## 153       FLASH FLOOD    16140832926  1421349698
## 95            DROUGHT     1046107081 13972566971
## 402         HURRICANE    11868319058  2741910104
## 590       RIVER FLOOD     5118945570  5029459153
## 427         ICE STORM     3944928501  5022114545

Result

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

According to this analysis, TORNADO is the event type with the greatest impact on population health.

healthData
##                EVTYPE FATALITIES INJURIES
## 834           TORNADO       5633    91346
## 130    EXCESSIVE HEAT       1903     6525
## 856         TSTM WIND        504     6957
## 170             FLOOD        470     6789
## 464         LIGHTNING        816     5230
## 275              HEAT        937     2100
## 153       FLASH FLOOD        978     1777
## 427         ICE STORM         89     1975
## 760 THUNDERSTORM WIND        133     1488
## 972      WINTER STORM        206     1321

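Reshaping the data in order to plot the results (melt comes from the reshape2 library; the bar plots use a transposed matrix of the two numeric columns)

healthDataPlot <- melt(healthData, id.vars="EVTYPE")
# Use only the numeric columns so the matrix passed to barplot is numeric
healthDataBarPlot1 <- t(as.matrix(healthData[, c("FATALITIES", "INJURIES")]))
healthDataBarPlot2 <- healthDataBarPlot1[,2:10]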
The next graphic shows the top 10 event types with the greatest impact on population health. Since the TORNADO total is much greater than the rest, a second graph with the same data, excluding TORNADO, is shown so that the remaining event types can be seen in detail.

par(mfrow=c(2,1))
par(font.axis=1)
par(las=2) # make label text perpendicular to axis
par(mar=c(5,11,3,3)) # increase y-axis margin.

barplot(healthDataBarPlot1, 
        main="10 most harmful event types with respect to population health",
        col=c("darkblue","red"), 
        legend = rownames(healthDataBarPlot1),
        names.arg = healthData$EVTYPE,
        horiz = FALSE)

#legend ("topright", legend=rownames(healthDataBarPlot1),col=c("darkblue","red"), lwd="1", pt.cex=1, cex=0.3)

mtext(1, text = "Health damage (fatalities / injuries)", line = 8, las = 1)

barplot(healthDataBarPlot2, 
        main="Zoomed graph (data without tornado event)",
        col=c("darkblue","red"), 
        legend = rownames(healthDataBarPlot2),
        names.arg = healthData$EVTYPE[2:10],
        horiz = FALSE)

#legend ("topright",legend=rownames(healthDataBarPlot1),col=c("darkblue","red"), lwd="1", pt.cex=1, cex=0.3)

mtext(1, text = "Health damage (fatalities / injuries)", line = 4, las = 1)

Across the United States, which types of events have the greatest economic consequences?

According to this analysis, FLOOD is the event type with the greatest economic impact.

economicData
##                EVTYPE propertyDamage  cropDamage
## 170             FLOOD   144657717736  5661980154
## 411 HURRICANE/TYPHOON    69305840018  2607872855
## 834           TORNADO    56937169417   415004177
## 670       STORM SURGE    43323536086        5254
## 244              HAIL    15732464102  3026160812
## 153       FLASH FLOOD    16140832926  1421349698
## 95            DROUGHT     1046107081 13972566971
## 402         HURRICANE    11868319058  2741910104
## 590       RIVER FLOOD     5118945570  5029459153
## 427         ICE STORM     3944928501  5022114545

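Reshaping the data in order to plot the results (melt comes from the reshape2 library; the bar plot uses a transposed matrix of the two numeric columns)

economicDataPlot <- melt(economicData, id.vars="EVTYPE")
# Use only the numeric columns so the matrix passed to barplot is numeric
economicDataBarPlot1 <- t(as.matrix(economicData[, c("propertyDamage", "cropDamage")]))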

The next graphic shows the top 10 event types with the greatest economic impact.

par(mfrow=c(1,1))
par(font.axis=1)
par(las=2) # make label text perpendicular to axis
par(mar=c(5,11,3,3)) # increase y-axis margin.

barplot(economicDataBarPlot1, 
        main="10 event types with the greatest economic consequences",
        col=c("darkblue","red"), 
        legend = rownames(economicDataBarPlot1),
        names.arg = economicData$EVTYPE, # label the bars with the economic event types
        horiz = TRUE)

mtext(1, text = "Economic damage (dollars)", line = 4, las = 1)

System Information

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] reshape2_1.4     reshape_0.8.5    knitr_1.7        ggplot2_1.0.0   
## [5] data.table_1.9.4
## 
## loaded via a namespace (and not attached):
##  [1] chron_2.3-45     codetools_0.2-8  colorspace_1.2-4 digest_0.6.4    
##  [5] evaluate_0.5.5   formatR_1.0      grid_3.1.1       gtable_0.1.2    
##  [9] htmltools_0.2.6  MASS_7.3-33      munsell_0.4.2    plyr_1.8.1      
## [13] proto_0.3-10     Rcpp_0.11.2      rmarkdown_0.2.64 scales_0.2.4    
## [17] stringr_0.6.2    tools_3.1.1      yaml_2.1.13