Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The aim of this report is to explore the NOAA storm database, which contains data on extreme natural events. The events in the database start in the year 1950 and end in November 2011. The purpose of this analysis is to answer the following two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Main conclusions of the study:
1. Tornadoes are the most harmful event type, with more than 5600 deaths and 91400 injuries.
2. Floods cause the greatest economic damage, with more than 157 billion USD.

Data Processing

Load the data

The data for this report are available here.
Some documentation of the variables is available here.

First, let’s download the compressed data file if it is not already in the working directory.

if (!"repdata_data_StormData.csv.bz2" %in% dir("./")) {
    print("Downloading File.....")
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata_data_StormData.csv.bz2")
}

Then, let’s read the CSV file directly from the compressed archive and save the data in a data frame called storm.

if (!"storm" %in% ls()) {
    storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), sep = ",", header = TRUE, stringsAsFactors = FALSE)
}
dim(storm)
## [1] 902297     37
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Extract the relevant information

Following the documentation, there are 48 types of events which we will save in the variable “events”.

events <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")  

In addition, many raw EVTYPE labels are combined events or variants of the official names. Regular expressions are therefore needed to match each raw label to an official event type.

events_regex <- c("Astronomical Low Tide|Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill|Extreme Cold|Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze|Frost|Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon|Hurricane|Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind|Marine tstm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind|tstm wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")  
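
Because grep matches substrings, some of these patterns overlap: a raw label such as FLASH FLOOD matches both the "Flash Flood" and the "Flood" patterns, so the extraction loop assigns its rows to both categories. The sketch below illustrates this on a few hypothetical labels (sample_evtypes is made up for illustration and is not taken from the data set).

```r
# Hypothetical raw labels, for illustration only (not taken from the data set)
sample_evtypes <- c("FLASH FLOOD", "FLOOD", "EXCESSIVE HEAT", "TSTM WIND")

# Two of the patterns used in events_regex
patterns <- c(Flood = "Flood", Heat = "Heat")

# Logical matrix: one row per label, one column per pattern
overlap <- sapply(patterns, grepl, x = sample_evtypes, ignore.case = TRUE)
rownames(overlap) <- sample_evtypes
print(overlap)
```

Rows that are TRUE for a short pattern and would also match a longer, more specific one (for example "Flash Flood" or "Excessive Heat") are the ones counted in more than one category.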

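Let’s extract the relevant columns from “storm” for our analysis: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. Each matching row is also tagged with its official event name in a new column, CLEANNAME.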

newdata <- data.frame(EVTYPE = character(0), FATALITIES = numeric(0), INJURIES = numeric(0), PROPDMG = numeric(0), PROPDMGEXP = character(0), CROPDMG = numeric(0), CROPDMGEXP = character(0))  
for (i in 1:length(events)) {
    rows <- storm[grep(events_regex[i], ignore.case = TRUE, storm$EVTYPE), ]
    rows <- rows[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
    CLEANNAME <- c(rep(events[i], nrow(rows)))
    rows <- cbind(rows, CLEANNAME)
    newdata <- rbind(newdata, rows)
}
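
Growing a data frame with rbind inside a loop copies it on every iteration, which can be slow on roughly 900,000 rows. As a possible alternative, the same matching can be sketched with lapply and a single rbind at the end. The toy objects below (toy_storm, toy_events, toy_regex) are hypothetical stand-ins for storm, events and events_regex.

```r
# Toy stand-ins for storm, events and events_regex (illustration only)
toy_storm <- data.frame(EVTYPE = c("TORNADO", "TSTM WIND", "HAIL"),
                        FATALITIES = c(1, 0, 0),
                        stringsAsFactors = FALSE)
toy_events <- c("Tornado", "Thunderstorm Wind")
toy_regex <- c("Tornado", "Thunderstorm Wind|tstm wind")

# Build one data frame per event type, then bind them all in one call
pieces <- lapply(seq_along(toy_events), function(i) {
    hit <- grepl(toy_regex[i], toy_storm$EVTYPE, ignore.case = TRUE)
    rows <- toy_storm[hit, , drop = FALSE]
    rows$CLEANNAME <- rep(toy_events[i], nrow(rows))
    rows
})
toy_newdata <- do.call(rbind, pieces)
print(toy_newdata)
```

The HAIL row is dropped here only because the toy pattern list does not include it; the logic is otherwise the same as in the loop.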

Conversion of the order of magnitude

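The orders of magnitude for property and crop damage are encoded with letters such as H, K, M and B. We convert K, M and B (in either case) into the corresponding powers of ten (3, 6 and 9), as shown below. Other codes, including H, are left unchanged by this step.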

newdata[(newdata$PROPDMGEXP == "K" | newdata$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
newdata[(newdata$PROPDMGEXP == "M" | newdata$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
newdata[(newdata$PROPDMGEXP == "B" | newdata$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9
newdata[(newdata$CROPDMGEXP == "K" | newdata$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
newdata[(newdata$CROPDMGEXP == "M" | newdata$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
newdata[(newdata$CROPDMGEXP == "B" | newdata$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9
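
The same conversion could also be written as a single lookup, which makes it easy to include H as well. This is only a sketch: exp_to_power is a hypothetical helper that is not part of the analysis, and treating unknown or empty codes as a power of 0 is an assumption that would change the totals reported in this document.

```r
# Hypothetical helper, not part of the original analysis
exp_to_power <- function(code) {
    # Letter codes and the power of ten they stand for
    lookup <- c(H = 2, K = 3, M = 6, B = 9)
    power <- unname(lookup[toupper(code)])
    # Assumption: unknown or empty codes mean "no multiplier"
    power[is.na(power)] <- 0
    power
}

exp_to_power(c("K", "m", "B", "", "h"))
```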

Next, we compute the property and crop damage in USD by multiplying each value by 10 raised to its exponent.

suppressWarnings(newdata$PROPDMG <- newdata$PROPDMG * 10^as.numeric(newdata$PROPDMGEXP))  
suppressWarnings(newdata$CROPDMG <- newdata$CROPDMG * 10^as.numeric(newdata$CROPDMGEXP))  

The total economic damage is the sum of the property and crop damage.

suppressWarnings(TOTECODMG <- newdata$PROPDMG + newdata$CROPDMG)
newdata <- cbind(newdata, TOTECODMG)
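
One consequence of this approach is worth keeping in mind: an exponent that is not numeric after the conversion (for example an empty string) becomes NA under as.numeric, and the NA propagates to the damage value and to the total. Since the formula interface of aggregate omits NA rows by default, such rows do not contribute to the sums. A toy illustration, using made-up rows:

```r
# Toy rows, for illustration only: the second row has empty exponents
toy <- data.frame(PROPDMG = c(25, 2.5), PROPDMGEXP = c("3", ""),
                  CROPDMG = c(1, 0),    CROPDMGEXP = c("3", ""),
                  stringsAsFactors = FALSE)

prop <- toy$PROPDMG * 10^as.numeric(toy$PROPDMGEXP)  # second value is NA
crop <- toy$CROPDMG * 10^as.numeric(toy$CROPDMGEXP)  # second value is NA
total <- prop + crop                                 # NA if either part is NA
print(total)
```

This explains why a total for an event type can be smaller than its property damage alone: a row only counts toward TOTECODMG when both exponents are usable.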

Let’s delete the columns ‘PROPDMGEXP’ and ‘CROPDMGEXP’ which are not needed now that we have the total damage variable.

newdata <- newdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "CLEANNAME", "TOTECODMG")]

We are now ready to start answering the two questions.

Results

Question 01 : Across the United States, which types of events are most harmful with respect to population health?

Let’s aggregate the data for fatalities.

fatalities <- aggregate(FATALITIES ~ CLEANNAME, data = newdata, FUN = sum)
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
# 10 most harmful causes of fatalities
MaxFatalities <- fatalities[1:10, ]
print(MaxFatalities)  
##                  CLEANNAME FATALITIES
## 38                 Tornado       5661
## 19                    Heat       3138
## 11          Excessive Heat       1922
## 14                   Flood       1525
## 13             Flash Flood       1035
## 28               Lightning        817
## 37       Thunderstorm Wind        753
## 33             Rip Current        577
## 12 Extreme cold/Wind Chill        382
## 23               High Wind        299

Do the same for Injuries

injuries <- aggregate(INJURIES ~ CLEANNAME, data = newdata, FUN = sum)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
# 10 most harmful causes of injuries
MaxInjuries <- injuries[1:10, ]
print(MaxInjuries)
##            CLEANNAME INJURIES
## 38           Tornado    91407
## 37 Thunderstorm Wind     9493
## 19              Heat     9224
## 14             Flood     8604
## 11    Excessive Heat     6525
## 28         Lightning     5232
## 25         Ice Storm     1992
## 13       Flash Flood     1802
## 23         High Wind     1523
## 18              Hail     1467

Let’s plot a pair of graphs of Total Fatalities and Total Injuries caused by these natural events.

par(mfrow = c(1, 2), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(MaxFatalities$FATALITIES, las = 3, names.arg = MaxFatalities$CLEANNAME, main = "Events with\n The Top 10 Highest Fatalities", ylab = "Number of Fatalities", col = "blue")
barplot(MaxInjuries$INJURIES, las = 3, names.arg = MaxInjuries$CLEANNAME, main = "Events with\n The Top 10 Highest Injuries", ylab = "Number of Injuries", col = "blue")

Based on the bar plots above, Tornado and Heat caused the most fatalities. Tornado also caused by far the most injuries across the United States between 1950 and 2011.

Question 02 : Across the United States, which types of events have the greatest economic consequences?

As for the impact on public health, we aggregate the damage data by event type and sort the results in decreasing order.

First, let’s aggregate the data for Property Damage.

propdmg <- aggregate(PROPDMG ~ CLEANNAME, data = newdata, FUN = sum)
propdmg <- propdmg[order(propdmg$PROPDMG, decreasing = TRUE), ]
# 10 event types with the greatest property damage
propdmgMax <- propdmg[1:10, ]
print(propdmgMax)
##            CLEANNAME      PROPDMG
## 14             Flood 168212215589
## 24 Hurricane/Typhoon  85356410010
## 38           Tornado  58603317864
## 18              Hail  17622990956
## 13       Flash Flood  17588791879
## 37 Thunderstorm Wind  11575228673
## 40    Tropical Storm   7714390550
## 45      Winter Storm   6749997251
## 23         High Wind   6166300000
## 44          Wildfire   4865614000

Do the same with the data for Crop Damage

cropdmg <- aggregate(CROPDMG ~ CLEANNAME, data = newdata, FUN = sum)
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
# 10 event types with the greatest crop damage
cropdmgMax <- cropdmg[1:10, ]
print(cropdmgMax)
##                  CLEANNAME     CROPDMG
## 8                  Drought 13972621780
## 14                   Flood 12380109100
## 24       Hurricane/Typhoon  5516117800
## 25               Ice Storm  5022113500
## 18                    Hail  3114212870
## 16            Frost/Freeze  1997061000
## 13             Flash Flood  1532197150
## 12 Extreme cold/Wind Chill  1313623000
## 37       Thunderstorm Wind  1255947980
## 19                    Heat   904469280

Finally, we aggregate Total Economic Damage

ecodmg <- aggregate(TOTECODMG ~ CLEANNAME, data = newdata, FUN = sum)
ecodmg <- ecodmg[order(ecodmg$TOTECODMG, decreasing = TRUE), ]

The 10 event types with the greatest total economic damage are:

ecodmgMax <- ecodmg[1:10, ]
print(ecodmgMax)
##            CLEANNAME    TOTECODMG
## 14             Flood 157764680787
## 24 Hurricane/Typhoon  44330000800
## 38           Tornado  18172843863
## 18              Hail  11681050140
## 13       Flash Flood   9224527227
## 37 Thunderstorm Wind   7098296330
## 25         Ice Storm   5925150850
## 44          Wildfire   3685468370
## 23         High Wind   3472442200
## 8            Drought   1886667000

Let’s plot the graphs of total property damages, total crop damages and total economic damages caused by these natural events.

par(mfrow = c(1, 3), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmgMax$PROPDMG/(10^9), las = 3, names.arg = propdmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Property Damages", ylab = "Cost of damages (in $ billions)", col = "blue")
barplot(cropdmgMax$CROPDMG/(10^9), las = 3, names.arg = cropdmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Crop Damages", ylab = "Cost of damages (in $ billions)", col = "blue")
barplot(ecodmgMax$TOTECODMG/(10^9), las = 3, names.arg = ecodmgMax$CLEANNAME, main = "Top 10 Events with\nGreatest Economic Damages", ylab = "Cost of damages (in $ billions)", col = "blue")

The events with the greatest total economic consequences are Flood, Hurricane/Typhoon, Tornado and Hail.
Across the United States, Flood, Hurricane/Typhoon and Tornado have caused the greatest damage to property.
Drought and Flood have caused the greatest damage to crops.

Conclusion

Main conclusions of the study:
1. Tornadoes are the most harmful event type, with more than 5600 deaths and 91400 injuries.
2. Floods cause the greatest economic damage, with more than 157 billion USD.