Date: March 25, 2015

Synopsis

This report is the end result of a sequence of preprocessing and analysis steps applied to Storm Data, an official publication of the National Oceanic and Atmospheric Administration (NOAA). We relied on the Storm Data Documentation to guide our processing and analysis. Following that documentation, we converted the alphabetical characters that signify the magnitude of the dollar amounts into numeric multipliers and extracted the variables relevant to the analysis. We used the list of Permitted Storm Data Events to find patterns and replace event types with their correct spelling. We found that Tornado caused the most fatalities and injuries and is therefore the event most harmful to human health. We also estimated the monetary loss in billions of US dollars. This economic damage, evaluated from crop and property damage, was caused mostly by Flood, which reached roughly 200 billion US dollars over the period from 1950 to November 2011.

Data processing

The data for this assignment come in the form of a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. We downloaded the file from the course web site and used cache=TRUE so that the time-consuming download and read steps are not repeated.

# Download and read the raw data
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
# datedownloaded<-date()
storm<- read.csv(bzfile("StormData.csv.bz2"), stringsAsFactors = FALSE)
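The download call above is commented out so that it is not repeated on every run. As a minimal sketch of an alternative, the hypothetical helper `fetch_once` (a name introduced here for illustration, not part of the original analysis) downloads the archive only when no local copy exists:

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
# Returns TRUE when a download was attempted, FALSE when the local copy is reused.
fetch_once <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  download.file(url, dest, mode = "wb")
  TRUE
}

# Intended usage with the course file (left commented so the sketch runs offline):
# fetch_once("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#            "StormData.csv.bz2")
```

Checking for the file on disk complements cache=TRUE, since it also avoids a new download when the cache is cleared.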

We need to look at the data before proceeding with our analysis. This will help us narrow the analysis and focus on specific variables.

As per the Storm Data Documentation, we will process the following variables, which are relevant to the analysis we are undertaking.

  1. EVTYPE: Type of event
  2. FATALITIES: Number of fatalities
  3. INJURIES: Number of injuries
  4. PROPDMG: The amount of property damage caused by each event
  5. PROPDMGEXP: Magnitude code of the property damage estimate, in US dollars
  6. CROPDMG: The amount of crop damage caused by each event
  7. CROPDMGEXP: Magnitude code of the crop damage estimate, in US dollars

# Select variables relevant to harmfulness
harmful<- storm[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Two variables, PROPDMGEXP and CROPDMGEXP, hold the magnitude of the damage estimates as alphabetical characters. Since we are going to replace these characters with their numeric value, we make sure both variables are of type character so that the new numbers can be inserted. The alphabetical characters used to signify magnitude include “H” for hundreds, “K” for thousands, “M” for millions and “B” for billions.

# Convert to character to allow replacement (each column from its own values)
harmful$CROPDMGEXP<-as.character(harmful$CROPDMGEXP)
harmful$PROPDMGEXP<-as.character(harmful$PROPDMGEXP)

Then we can replace the alphabetical characters with their appropriate values.

# Convert letters to integers (page 12)
harmful$PROPDMGEXP<-replace(harmful$PROPDMGEXP,harmful$PROPDMGEXP=='K',10^3)
harmful$PROPDMGEXP<-replace(harmful$PROPDMGEXP,(harmful$PROPDMGEXP == "M" | harmful$PROPDMGEXP == "m"),10^6)
harmful$PROPDMGEXP<-replace(harmful$PROPDMGEXP,harmful$PROPDMGEXP=='B',10^9)
harmful$PROPDMGEXP<-replace(harmful$PROPDMGEXP,(harmful$PROPDMGEXP == "H" | harmful$PROPDMGEXP == "h"),10^2)
harmful$PROPDMGEXP<-replace(harmful$PROPDMGEXP,(harmful$PROPDMGEXP == "" | harmful$PROPDMGEXP == "+" | harmful$PROPDMGEXP == "?"| harmful$PROPDMGEXP == "-"),1)
harmful$CROPDMGEXP<-replace(harmful$CROPDMGEXP,(harmful$CROPDMGEXP == "K" | harmful$CROPDMGEXP == "k"),10^3)
harmful$CROPDMGEXP<-replace(harmful$CROPDMGEXP,(harmful$CROPDMGEXP == "M" | harmful$CROPDMGEXP == "m"),10^6)
harmful$CROPDMGEXP<-replace(harmful$CROPDMGEXP,harmful$CROPDMGEXP == "B",10^9)
harmful$CROPDMGEXP<-replace(harmful$CROPDMGEXP,(harmful$CROPDMGEXP == "H" | harmful$CROPDMGEXP == "h"),10^2)
harmful$CROPDMGEXP<-replace(harmful$CROPDMGEXP,(harmful$CROPDMGEXP == "" | harmful$CROPDMGEXP == "+"| harmful$CROPDMGEXP == "?"| harmful$CROPDMGEXP == "-"),1)
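The letter-to-multiplier conversion can also be written as a single lookup. The following is only a sketch on toy codes, with the vector names `codes` and `multiplier` chosen for illustration. Unlike the replacement calls above, it maps every code without an entry (including digit codes) to 1, which is an assumption of this sketch rather than a rule from the documentation.

```r
# Toy exponent codes of the kind found in PROPDMGEXP / CROPDMGEXP
codes <- c("K", "m", "B", "", "h", "?", "M")

# Named lookup vector: magnitude letter (upper case) -> multiplier
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each code by name; codes without an entry ("", "?", ...) give NA
mult <- unname(multiplier[toupper(codes)])

# Assumption of this sketch: unknown codes count as a multiplier of 1
mult[is.na(mult)] <- 1

print(mult)
```

Because the result is numeric from the start, the damage amount can be multiplied by it directly, without a later call to as.numeric.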

We have created a new dataset with the same number of rows and 7 variables. We now need to look more closely at the EVTYPE variable before proceeding with the analysis.

We can see that identical events appear with different spellings. We need to find a pattern that gives each event a unique spelling. We will rely on the event table from the Storm Data Documentation and on regular expressions to fix this issue.

# Creation of a vector based on the data event table in the documentation
event <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", 
           "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", 
           "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", 
           "Flash Flood", "Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", "Hail", 
           "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", 
           "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", 
           "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", 
           "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", 
           "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", 
           "Wildfire", "Winter Storm", "Winter Weather")

Note that a few event names are contained within others. For example, Thunderstorm Wind is within Marine Thunderstorm Wind, Flood is within Flash Flood, etc. We need to be wary of those events and anchor the regular expressions where necessary (for example, ^Flood) so that the same row is not matched twice. A new vector is created.

# Determine regular expressions to find pattern
ev_reg <- c("Astronomical Low Tide", "Avalanc[he]", "Blizzard", 
            "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", 
            "Drought", "Dust Dev[ei]l", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", 
            "Flash Flood", "^Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", 
            "Hail", "^Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", 
            "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lig[hn]tning", "Marine Hail", 
            "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind|Marine TSTM Wind", 
            "Rip Current", "Seiche", "Sleet", "Storm Tide", "^Strong Wind", "^Thunderstorm Wind|^TSTM WIND", 
            "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", 
            "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

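Whether a set of patterns still matches some raw value more than once can be checked before building the tidy dataset. This is a minimal sketch on a toy sample, with a subset of the patterns and illustrative variable names (`evtype_sample`, `patterns`, `hits`). It is not a check that was run in the original analysis.

```r
# Toy sample of raw EVTYPE values and a subset of the patterns
evtype_sample <- c("HAIL", "MARINE HAIL", "HIGH WIND",
                   "MARINE HIGH WIND", "FLOOD", "FLASH FLOOD")
patterns <- c("Flash Flood", "^Flood", "Hail", "High Wind",
              "Marine Hail", "Marine High Wind")

# Logical matrix: one row per EVTYPE value, one column per pattern
hits <- sapply(patterns, function(p) grepl(p, evtype_sample, ignore.case = TRUE))
rownames(hits) <- evtype_sample

# Number of patterns matched by each value; a count above 1 means the row
# would be copied once per matching pattern
n_match <- rowSums(hits)
print(n_match[n_match > 1])
```

In this toy sample the anchored ^Flood keeps FLASH FLOOD from being counted twice, while the unanchored Hail and High Wind patterns also match their Marine counterparts.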
Now we can create a new tidy dataset. The grep function is used to find the rows matching each pattern, and the matched rows are labelled with the documented event name.

# Creation of a new tidy dataset relevant to harmfulness
harmful_tidy<-NULL
for (i in seq_along(ev_reg)) {
      # Indices of the rows whose EVTYPE matches the i-th pattern
      indices<-grep(ev_reg[i], harmful$EVTYPE, ignore.case = TRUE)
      # Subset the rows in the harmful dataset that match the data event table
      rows <- harmful[indices, ]
      # Create a new variable holding the documented spelling of the event
      EVTYPE_clean <- rep(event[i], nrow(rows))
      # Add the new tidy EVTYPE variable
      rows_EVTYPE <- cbind(rows, EVTYPE_clean)
      # Append the labelled rows to the tidy dataset
      harmful_tidy <- rbind(harmful_tidy, rows_EVTYPE)
}
str(harmful_tidy)
## 'data.frame':    893293 obs. of  8 variables:
##  $ EVTYPE      : chr  "ASTRONOMICAL LOW TIDE" "ASTRONOMICAL LOW TIDE" "ASTRONOMICAL LOW TIDE" "ASTRONOMICAL LOW TIDE" ...
##  $ FATALITIES  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMG     : num  0 0 0 0 0 120 0 0 0 0 ...
##  $ PROPDMGEXP  : chr  "1000" "1000" "1000" "1000" ...
##  $ CROPDMG     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP  : chr  "1000" "1000" "1000" "1000" ...
##  $ EVTYPE_clean: Factor w/ 46 levels "Astronomical Low Tide",..: 1 1 1 1 1 1 1 1 1 1 ...

Natural disasters typically set in motion a complex chain of events that can disrupt the local economy and, in severe cases, the national economy. Calculating the economic damage of such events can be useful to a manager who prepares for severe weather and needs to prioritize resources for different types of events.

# Compute economic damages in billion dollars
ECON_DAMAGE<-round((harmful_tidy$PROPDMG*as.numeric(harmful_tidy$PROPDMGEXP) + harmful_tidy$CROPDMG*as.numeric(harmful_tidy$CROPDMGEXP))/10^9,digits=2)
harmful_tidy<-cbind(harmful_tidy,ECON_DAMAGE)
# Aggregate economic damage by EVTYPE
econ_damage <- aggregate(ECON_DAMAGE ~ EVTYPE_clean, harmful_tidy, sum)
econ_damage <- econ_damage[order(econ_damage$ECON_DAMAGE, decreasing = TRUE), ]
# Select the top eight events by economic damage
Topecon_damage<- econ_damage[1:8, ]
print(Topecon_damage)
##         EVTYPE_clean ECON_DAMAGE
## 14             Flood      196.40
## 38           Tornado       73.11
## 24 Hurricane/Typhoon       72.36
## 13       Flash Flood       53.86
## 18              Hail       30.81
## 37 Thunderstorm Wind       22.48
## 23         High Wind       14.54
## 40    Tropical Storm        9.49
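The order of rounding and summing matters for totals like these. The sketch below uses invented toy amounts, not Storm Data values, to compare rounding each row to 0.01 billion before summing (as in the computation above) with rounding once after summing.

```r
# Toy data: damage in US dollars for three events of the same type
toy <- data.frame(
  EVTYPE_clean = c("Flood", "Flood", "Flood"),
  DAMAGE_USD   = c(4e6, 4e6, 2e9)
)

# (a) Round each row to 0.01 billion first, then sum
per_row <- sum(round(toy$DAMAGE_USD / 1e9, digits = 2))

# (b) Sum the unrounded values first, then round the total
after_sum <- round(sum(toy$DAMAGE_USD) / 1e9, digits = 2)

print(c(per_row = per_row, after_sum = after_sum))
```

With per-row rounding, any single event below 5 million dollars contributes nothing to the total, so rounding only after aggregation may give a more faithful estimate when many events are small.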

A fatality is a death resulting from a storm event. It can be direct or indirect; we do not make this distinction in this report. It is important for the disaster manager to know the fatal impact on the community.

# Aggregate FATALITIES by EVTYPE
fatalities <- aggregate(FATALITIES ~ EVTYPE_clean, harmful_tidy, sum)
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
# Select the top eight causes of fatalities
Topfatalities<- fatalities[1:8, ]
print(Topfatalities)
##         EVTYPE_clean FATALITIES
## 38           Tornado       5661
## 11    Excessive Heat       1922
## 19              Heat       1118
## 13       Flash Flood       1035
## 28         Lightning        817
## 37 Thunderstorm Wind        709
## 33       Rip Current        577
## 14             Flood        495

An injury is physical harm to a person caused by a storm event. It can be direct or indirect. It is important for the disaster manager to be informed about injuries in order to plan the logistics team.

# Aggregate INJURIES by EVTYPE
injuries <- aggregate(INJURIES ~ EVTYPE_clean, harmful_tidy, sum)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
# Select the top eight causes of injuries
Topinjuries<- injuries[1:8, ]
print(Topinjuries)
##         EVTYPE_clean INJURIES
## 38           Tornado    91407
## 37 Thunderstorm Wind     9458
## 14             Flood     6806
## 11    Excessive Heat     6525
## 28         Lightning     5232
## 19              Heat     2494
## 25         Ice Storm     1992
## 13       Flash Flood     1802
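The three rankings above repeat the same steps: aggregate by event, sort in decreasing order, keep the first eight rows. As a sketch only, these steps could be wrapped in a helper. The name `top_events` and the toy data frame are assumptions made for illustration and do not appear in the original analysis.

```r
# Hypothetical helper: total `var` per event type and keep the n largest totals
top_events <- function(data, var, n = 8) {
  totals <- aggregate(data[[var]],
                      by = list(EVTYPE_clean = data$EVTYPE_clean),
                      FUN = sum)
  names(totals)[2] <- var
  totals <- totals[order(totals[[var]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data standing in for harmful_tidy
toy <- data.frame(
  EVTYPE_clean = c("Tornado", "Flood", "Tornado", "Heat"),
  FATALITIES   = c(3, 1, 2, 4),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, the same call would be made once per variable, for example `top_events(harmful_tidy, "INJURIES")`.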

Results

Which types of events are most harmful to population health?

# plot Top fatalities
library(ggplot2)
ggplot(Topfatalities, aes(EVTYPE_clean, FATALITIES,fill=EVTYPE_clean)) + 
      geom_bar(stat = "identity") + xlab("Event Type") + theme(legend.position ="none") +
      ylab("Number of fatalities") + ggtitle("Top eight event types by number of fatalities") + 
      theme(axis.text.x = element_text(size = 12,angle = 90, hjust = 1,face="bold")) + 
      geom_text(aes(label=FATALITIES), size=4)

# plot Top injuries
ggplot(Topinjuries, aes(EVTYPE_clean, INJURIES,fill=EVTYPE_clean)) + 
      geom_bar(stat = "identity") + xlab("Event Type") + theme(legend.position ="none") +
      ylab("Number of injuries") + ggtitle("Top eight event types by number of injuries") + 
      theme(axis.text.x = element_text(size = 12,angle = 90, hjust = 1,face="bold")) + 
      geom_text(aes(label=INJURIES), size=4)

Conclusion 1

Tornadoes differ from other natural disasters, such as hurricanes, because they are confined to a relatively small area (typically a few hundred meters wide). Though hurricanes have more total energy, the energy density within a tornado can be much higher. This helps explain why Tornado causes the most fatalities and injuries among the forty-eight (48) events listed in the event table. It caused 91,407 injuries and 5,661 fatalities from 1950 to November 2011. From these results, we deduce that tornadoes kill on average about 92 people per year and injure about 1,498. Most deaths come from flying or falling debris, and occur in the most violent tornadoes.
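The yearly figures quoted in this conclusion follow from dividing the totals by the length of the record. Here is a short sketch of that arithmetic, assuming the period is counted as 2011 - 1950 = 61 years and the quotients are truncated to whole persons:

```r
# Totals reported above for Tornado
tornado_fatalities <- 5661
tornado_injuries   <- 91407

# Assumption: the record is counted as 61 years (2011 - 1950)
years <- 2011 - 1950

fatalities_per_year <- tornado_fatalities / years
injuries_per_year   <- tornado_injuries / years

# Truncate to whole persons
print(floor(c(fatalities = fatalities_per_year, injuries = injuries_per_year)))
```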



Which types of events have the greatest economic consequences?

# plot Top economic damage
ggplot(Topecon_damage, aes(EVTYPE_clean, ECON_DAMAGE,fill=EVTYPE_clean)) + 
      geom_bar(stat = "identity") + xlab("Event Type") + theme(legend.position ="none") +
      ylab("Economic damage in billion $") + ggtitle("Event types with the greatest economic consequences, 1950 to 2011") + 
      theme(axis.text.x = element_text(size = 12,angle = 90, hjust = 1,face="bold")) + 
      geom_text(aes(label=ECON_DAMAGE), size=4)

Conclusion 2

Most of the United States is susceptible to some kind of storm event. As a result, storm events will continue to exact their toll on local and regional economies. Because their influence runs through many economic sectors, affects many individuals and is intertwined in innumerable and unseen ways, estimating which events have the greatest economic consequences is very important to the municipal manager.
Flood is estimated to have the greatest economic consequences. It caused a loss of roughly 200 billion dollars from 1950 to 2011. This amount is more than two and a half times the impact of Tornado, which ranks second among the events with the greatest economic consequences. From these results, we deduce that Flood caused on average a loss of about 3.2 billion dollars per year.
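The comparison between Flood and Tornado and the yearly average can be reproduced from the printed totals. This is only a sketch of the arithmetic, assuming the same 61-year count (2011 - 1950) for the period:

```r
# Totals in billion US dollars, taken from the table of top economic damage
flood_damage   <- 196.40
tornado_damage <- 73.11

# Assumption: the record is counted as 61 years (2011 - 1950)
years <- 2011 - 1950

# Average yearly loss attributed to Flood, in billion US dollars
flood_per_year <- round(flood_damage / years, 1)

# How many times larger the Flood total is than the Tornado total
ratio <- round(flood_damage / tornado_damage, 1)

print(c(flood_per_year = flood_per_year, ratio = ratio))
```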

Ideally, the best approach to flood risk mitigation is simply not to build in a flood zone, but this option is not available to the municipal manager or to people who currently live in a flood area. Providing a technical mitigation plan would require an extensive explanation that is beyond the scope of this course, because a mitigation plan is preventive. However, we can emphasize the type of resources to prioritize. Because flood water contains dissolved chemicals and hazardous materials and exerts tremendous pressure on equipment, storage tanks and all kinds of mechanical equipment, I would mobilize a team composed of chemists, hydrologists and rescuers.