Synopsis

In this report we investigate the impact of severe weather events in the USA on both public health and the economy. The analyzed period is 1950 to 2011. The goal is to identify the event types that had the highest impact on people and on the economy. To focus on the worst events, we selected the 10 worst event types for each of the following indicators: fatalities, injuries, and economic damage (property plus crop damage).

Data Processing

The data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The archive is uncompressed and only the relevant columns are loaded from the CSV file:

Archive <- "repdata-data-StormData.csv.bz2"
if (!file.exists(Archive)) {
        # stop() builds the message itself; cat() would print it and return NULL
        stop("data file ", Archive, " not available")
}
## library(R.utils)
## bunzip2(Archive)
## source("http://bioconductor.org/biocLite.R")
## biocLite("limma")
library(limma)
Harmful_Data <- read.columns("repdata-data-StormData.csv", c("EVTYPE", "FATALITIES", "INJURIES"),
                             sep=",")
Damage_Data <- read.columns("repdata-data-StormData.csv", 
                            c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"),
                            sep =",")
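
As an alternative to uncompressing the archive and using `read.columns`, base R can read a bzip2-compressed CSV through a `bzfile` connection and skip unwanted columns with `colClasses`. The following is a minimal sketch, not the method used in this report: the temporary file, the toy rows and the names `toy`, `keep` and `harmful_toy` are all hypothetical stand-ins for the real archive and data.

```r
# Sketch: read only selected columns straight from a bzip2-compressed CSV.
# The tiny file written here is a made-up stand-in for the real archive.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(STATE = c("AL", "TX"),
                  EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0),
                  INJURIES = c(15, 1))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read one row first to learn the column names.
header <- names(read.csv(bzfile(tmp), nrows = 1))
keep <- c("EVTYPE", "FATALITIES", "INJURIES")

# "NULL" tells read.csv to skip a column; NA lets it guess the type.
col_classes <- ifelse(header %in% keep, NA_character_, "NULL")
harmful_toy <- read.csv(bzfile(tmp), colClasses = col_classes,
                        stringsAsFactors = FALSE)
```

With the real archive, `tmp` would be replaced by the path held in `Archive`; whether this is faster than uncompressing first has not been measured here.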

In the following summary of the data related to economic damage, we can see that spurious values of PROPDMGEXP and CROPDMGEXP, such as “k”, “0” and others, are negligible in number, so we treat those spurious values as equivalent to empty cells.

summary(Damage_Data)
##                EVTYPE          PROPDMG          PROPDMGEXP    
##  HAIL             :288661   Min.   :   0.00          :465934  
##  TSTM WIND        :219940   1st Qu.:   0.00   K      :424665  
##  THUNDERSTORM WIND: 82563   Median :   0.00   M      : 11330  
##  TORNADO          : 60652   Mean   :  12.06   0      :   216  
##  FLASH FLOOD      : 54277   3rd Qu.:   0.50   B      :    40  
##  FLOOD            : 25326   Max.   :5000.00   5      :    28  
##  (Other)          :170878                     (Other):    84  
##     CROPDMG          CROPDMGEXP    
##  Min.   :  0.000          :618413  
##  1st Qu.:  0.000   K      :281832  
##  Median :  0.000   M      :  1994  
##  Mean   :  1.527   k      :    21  
##  3rd Qu.:  0.000   0      :    19  
##  Max.   :990.000   B      :     9  
##                    (Other):     9
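
The claim that the spurious exponent codes are negligible can be checked with a little arithmetic on the counts printed in the summary. The sketch below is illustrative only: the vector names and the helper `spurious_share` are hypothetical, the “(Other)” bucket is assumed to be entirely spurious, and the share is measured in rows, not in dollars.

```r
# Counts copied from the summary() output; "(Other)" is kept as one bucket.
prop_exp <- c(blank = 465934, K = 424665, M = 11330, "0" = 216, B = 40,
              "5" = 28, other = 84)
crop_exp <- c(blank = 618413, K = 281832, M = 1994, k = 21, "0" = 19,
              B = 9, other = 9)

# Codes the report interprets; everything else is treated as "no multiplier".
handled <- c("blank", "K", "M", "B")

# Fraction of rows whose exponent code is not one of the handled codes.
spurious_share <- function(counts) {
  sum(counts[!names(counts) %in% handled]) / sum(counts)
}

# Shares expressed as percentages of all rows.
round(100 * c(PROPDMGEXP = spurious_share(prop_exp),
              CROPDMGEXP = spurious_share(crop_exp)), 4)
```

Both shares come out well below 0.1% of the rows. This says nothing about the dollar amounts attached to those rows, which would need the full data set to assess.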

We therefore convert PROPDMG and CROPDMG to US dollars by applying the multiplier encoded in the corresponding EXP column (K = 1e3, M = 1e6, B = 1e9).

Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "K"] <- 
        Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "K"] * 1e3
Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "M"] <- 
        Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "M"] * 1e6
Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "B"] <- 
        Damage_Data$PROPDMG[Damage_Data$PROPDMGEXP == "B"] * 1e9

Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "K"] <- 
        Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "K"] * 1e3
Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "M"] <- 
        Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "M"] * 1e6
Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "B"] <- 
        Damage_Data$CROPDMG[Damage_Data$CROPDMGEXP == "B"] * 1e9
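
The six assignments above all follow the same pattern, so the conversion could also be written once with a lookup table. This is a minimal sketch under the same assumption as the report (any code other than K, M or B means no multiplier); `exp_mult`, `to_dollars` and the toy rows are hypothetical names and data, not part of the original analysis.

```r
# Toy rows standing in for Damage_Data; the values are made up.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "M", "B", "", "k"),
                  stringsAsFactors = FALSE)

# Multipliers for the codes the report handles.
exp_mult <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(value, exp_code) {
  # Look up each code by name; unknown codes (including "") give NA.
  mult <- unname(exp_mult[as.character(exp_code)])
  mult[is.na(mult)] <- 1   # blank or spurious code: leave the value as is
  value * mult
}

toy$PROPDMG_USD <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
```

The same function would apply unchanged to CROPDMG and CROPDMGEXP, and adding a further code (for example lowercase “k”) would only require one more entry in `exp_mult`.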

Results

In this section we present the results obtained from the loaded datasets. More precisely, we identify the most harmful types of event as well as the types of event that caused the greatest economic consequences.

We focus on the 10 worst event types for each of the following indicators: fatalities, injuries, and economic damage (property plus crop damage).

For all indicators, we consider the sum of the indicator values over all recorded events. In other words, the worst event type is the one with the highest cumulative impact over the observation period. Other aggregations (such as the average or the maximum) are not considered in this study.

Question 1: Across the United States, which types of events are most harmful with respect to population health?

Although the question refers to the “most harmful” event types, we keep fatalities and injuries as independent indicators and therefore report the 10 worst event types for each of them. As described above, we sum each indicator over all events of a given type.

Q1_Data <- aggregate(Harmful_Data[,c("FATALITIES", "INJURIES")], 
                     by=list(Harmful_Data[,"EVTYPE"]), FUN=sum)
names(Q1_Data)[1] <- c("EVTYPE")
iFatal <- order(Q1_Data$FATALITIES, decreasing = TRUE)
iInj <- order(Q1_Data$INJURIES, decreasing = TRUE)
Q1_Fatal <- Q1_Data[iFatal[1:10],c("EVTYPE", "FATALITIES")]
Q1_Inj <- Q1_Data[iInj[1:10],c("EVTYPE","INJURIES")]

The following plot shows the 10 worst event types in terms of fatalities.

library(ggplot2)
g <- ggplot(Q1_Fatal, aes(EVTYPE, FATALITIES)) + geom_bar(stat="identity") +
        coord_flip()
g <- g + labs(title = "Fatalities: Worst 10 events", 
              y = "Number of Fatalities", 
              x = "Event Type")
print(g)
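
By default ggplot2 lays out a discrete axis in factor-level order (alphabetical for character data), so the bars are not necessarily drawn in ranking order. One possible refinement, sketched here on made-up totals and not part of the original analysis, is to set the levels with `reorder()` before plotting.

```r
# Toy totals; the values are made up for illustration only.
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(30, 100, 60),
                  stringsAsFactors = FALSE)

# reorder() sets the factor levels by ascending FATALITIES, so after
# coord_flip() the largest bar ends up at the top of the plot.
toy$EVTYPE <- reorder(toy$EVTYPE, toy$FATALITIES)
levels(toy$EVTYPE)

# Build the plot only if ggplot2 is installed; print(g) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(toy, aes(EVTYPE, FATALITIES)) +
    geom_bar(stat = "identity") +
    coord_flip()
}
```

The same call could be applied to `Q1_Fatal$EVTYPE` with `Q1_Fatal$FATALITIES` as the ordering variable, and likewise for the other plots in this report.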

A similar plot is provided for the number of injuries.

library(ggplot2)
g <- ggplot(Q1_Inj, aes(EVTYPE, INJURIES)) + geom_bar(stat="identity") +
        coord_flip()
g <- g + labs(title = "Injuries: Worst 10 events", 
              y = "Number of Injuries", 
              x = "Event Type")
print(g)

Question 2: Across the United States, which types of events have the greatest economic consequences?

Economic consequences are calculated as the sum of property and crop damage. The total damage for each event type is the sum of the damage estimated for each individual event of that type.

Q2_Data <- aggregate(Damage_Data[, c("PROPDMG", "CROPDMG")], 
                     by = list(Damage_Data$EVTYPE), FUN=sum)
names(Q2_Data)[1] <- "EVTYPE"
Q2_Data_Tot <- cbind(Q2_Data, Q2_Data$PROPDMG + Q2_Data$CROPDMG)
names(Q2_Data_Tot)[4] <- "TOTDMG"
iDamage <- order(Q2_Data_Tot$TOTDMG, decreasing = TRUE)
Q2_Damage <- Q2_Data_Tot[iDamage[1:10],c("EVTYPE", "TOTDMG")]

The following plot shows the 10 worst event types in terms of economic damage.

library(ggplot2)
g <- ggplot(Q2_Damage, aes(EVTYPE, TOTDMG))+ geom_bar(stat="identity") +
        coord_flip()
g <- g + labs(title = "Economic Damages: Worst 10 events", 
              y = "Damage Value ($)", 
              x = "Event Type")
print(g)