1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The analysis uses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and identifies the five severe weather event types responsible for the most deaths, injuries, property damage, and crop damage.
This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The project addresses two main questions:
1. Which types of events are most harmful with respect to population health across the United States?
2. Which types of events have the greatest economic consequences across the United States?

2. Data Processing

2.1. Loading the required packages for the assessment

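library(ggplot2)    # plotting
library(dplyr)      # data manipulation
library(R.utils)    # bunzip2() for the compressed download
library(gridExtra)  # arranging grid-based plots
library(cowplot)    # plot_grid() for combining several ggplots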

2.2. Loading, unzipping and reading the required data

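# Download the compressed data only if it is not already present
fileName <- "stormData.csv.bz2"
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(fileName)) {
    download.file(fileUrl, destfile = fileName, mode = "wb")
}
# Decompress to stormData.csv, keeping the archive so the chunk can be re-run
if (!file.exists("stormData.csv")) bunzip2(fileName, remove = FALSE)
# stringsAsFactors = TRUE reproduces the factor columns in str() below (R >= 4.0 defaults to FALSE)
Storm_Data <- read.csv("stormData.csv", stringsAsFactors = TRUE)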
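Because R's `file()` connection detects bzip2 compression when opening a file for reading, `read.csv()` can also read the archive directly, which would make the `bunzip2()` step optional. The sketch below demonstrates this on a small temporary file instead of the NOAA download; the file contents are made up.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path through file(), which recognizes the bzip2 header
demo <- read.csv(tmp)
print(demo)
```

For the storm data this would amount to `read.csv("stormData.csv.bz2", stringsAsFactors = TRUE)`, trading some read speed for not keeping an uncompressed copy on disk.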

2.3. Exploring the Data and Getting an Overview

names(Storm_Data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(Storm_Data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

As shown above, the data set contains 902,297 observations of 37 variables.

The histogram below shows the number of recorded events per year. The records start in 1950 and end in November 2011, and fewer events were recorded in the earlier years of the database.

# Extract the four-digit year from BGN_DATE (format "m/d/Y H:M:S")
formatedDate <- as.numeric(format(as.Date(Storm_Data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
hist(formatedDate, xlab = "Year", col = "lightblue", breaks = 60,
     main = "Frequency of Storm Events from 1950 to 2011")
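The year extraction relies on BGN_DATE having the form "m/d/Y H:M:S" (for example "1/1/1966 0:00:00" in the `str()` output). The sketch below applies the same parsing to a few hand-written date strings, which are illustrative rather than rows from the data, and then uses `table()` to get exact per-year counts, which can be easier to read than bar heights.

```r
# Sample strings in the same "m/d/Y H:M:S" layout as BGN_DATE (made up)
bgn <- c("4/18/1950 0:00:00", "1/1/1966 0:00:00", "11/30/2011 0:00:00", "1/1/1966 0:00:00")

# Parse to Date, then keep only the four-digit year as a number
event_year <- as.numeric(format(as.Date(bgn, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Exact number of events per year
year_counts <- table(event_year)
print(year_counts)
```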

2.4. Extracting Variables of Interest for the Analysis

As seen in the previous section, the data set has 37 variables in total. Of these, the following are of interest (STATE is also kept in the selection below):
1. Health variables: FATALITIES and INJURIES
2. Economic variables: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP
3. Events variable: EVTYPE

SelectedData <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Sel_StormData <- Storm_Data[SelectedData]
dim(Sel_StormData)
## [1] 902297      8

The following code answers the two questions posed in the synopsis. For each measure, the five event types with the largest totals are reported and then plotted.

2.4.1. Which types of events are most harmful with respect to population health across the United States?

Health Impact
1. Total fatalities by event type, and the five event types with the most fatalities

T_Fatality <- aggregate(FATALITIES ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5Fatalities <- T_Fatality[order(-T_Fatality$FATALITIES) , ][1:5, ]
Top5Fatalities
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
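Each top-five table here follows the same pattern: sum by EVTYPE with `aggregate()`, sort in decreasing order, and keep the first rows. The sketch below runs that pattern on made-up data and uses `head()` instead of `[1:5, ]`; when fewer groups exist than requested, `[1:5, ]` pads the result with NA rows, while `head()` returns only the rows that exist.

```r
# Made-up per-event records
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "LIGHTNING"),
                  FATALITIES = c(3, 2, 4, 0, 1, 5))

# Sum fatalities per event type (groups come back sorted alphabetically)
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order and keep the first n rows without NA padding
top_n <- head(totals[order(-totals$FATALITIES), ], 3)
print(top_n)
```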

2. Total injuries by event type, and the five event types with the most injuries

T_Injury <- aggregate(INJURIES ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5Injuries <- T_Injury[order(-T_Injury$INJURIES) , ][1:5, ]
Top5Injuries
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
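The `str()` output reports 985 EVTYPE levels, at least one of which carries leading spaces ("   HIGH SURF ADVISORY"), so the same kind of event may be split across several labels in these totals. One possible refinement, not applied in this analysis, is to normalize whitespace and case before aggregating. The sketch below uses made-up labels; merging abbreviations or other spelling variants would need a hand-built lookup and is not shown.

```r
# Made-up event labels with case and whitespace variants
toy <- data.frame(EVTYPE = c("   HIGH SURF ADVISORY", "High Surf Advisory",
                             "TORNADO", "tornado ", "FLOOD"),
                  INJURIES = c(1, 2, 10, 5, 3),
                  stringsAsFactors = FALSE)

# Trim surrounding whitespace and upper-case so variants share one label
toy$EVTYPE_CLEAN <- toupper(trimws(toy$EVTYPE))

# Aggregate on the cleaned label
totals <- aggregate(INJURIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
print(totals)
```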

2.4.2. Which types of events have the greatest economic consequences across the United States?

Economic Impact
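The selected economic variables are:
- PROPDMG: property damage
- PROPDMGEXP: property damage exponent
- CROPDMG: crop damage
- CROPDMGEXP: crop damage exponent

Each exponent is a code for the magnitude of the corresponding damage value, so the damage in dollars is the value multiplied by the multiplier that its code stands for.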

The distinct exponent codes present in the data are listed with:

unique(Sel_StormData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(Sel_StormData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

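Cleaning and preparing the property and crop damage exponent variables (PROPDMGEXP and CROPDMGEXP). Each exponent code is mapped to a numeric multiplier as follows:

+ -> 1
0 to 8 -> 10
H, h -> 100
K, k -> 1,000
M, m -> 1,000,000
B, b -> 1,000,000,000
-, ?, blank, and any other code -> 0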
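# Convert an exponent code (PROPDMGEXP or CROPDMGEXP) to a numeric multiplier
exp_multiplier <- function(code) {
    code <- as.character(code)                               # works for factor input too
    ifelse(code == '+', 1,                                   # "+" -> 1
        ifelse(code %in% as.character(0:8), 10^1,            # digits 0-8 -> 10
            ifelse(code %in% c('H', 'h'), 10^2,              # hundreds
                ifelse(code %in% c('K', 'k'), 10^3,          # thousands
                    ifelse(code %in% c('M', 'm'), 10^6,      # millions
                        ifelse(code %in% c('B', 'b'), 10^9,  # billions
                               0))))))                       # "", "-", "?" and others -> 0
}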
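The nested `ifelse()` works, but the same mapping can also be written as a named lookup vector, which keeps the code-to-multiplier table in one place. This is a sketch assuming the same mapping as `exp_multiplier()`; `exp_lookup` and `exp_multiplier_lookup` are hypothetical names, not part of the original analysis.

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_lookup <- c("+" = 1,
                setNames(rep(10, 9), as.character(0:8)),  # digits 0-8 -> 10
                "H" = 1e2, "h" = 1e2,
                "K" = 1e3, "k" = 1e3,
                "M" = 1e6, "m" = 1e6,
                "B" = 1e9, "b" = 1e9)

exp_multiplier_lookup <- function(code) {
    # Index by name; "" and unlisted codes such as "-" or "?" give NA
    mult <- unname(exp_lookup[as.character(code)])
    mult[is.na(mult)] <- 0
    mult
}
```

For example, a record with PROPDMG = 25 and PROPDMGEXP = "K" contributes 25 × 10^3 = 25,000 dollars.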

Sel_StormData$PropDmgMult <- exp_multiplier(Sel_StormData$PROPDMGEXP)
Sel_StormData$CropDmgMult <- exp_multiplier(Sel_StormData$CROPDMGEXP)

Two new variables, TotalPropDMG and TotalCropDMG, are obtained by multiplying each damage value by its multiplier, and their sum gives the total economic damage (TotalDamage).
The overall health impact, T_HealthImpct, is the sum of FATALITIES and INJURIES.

Sel_StormData$TotalCropDMG<- Sel_StormData$CROPDMG * Sel_StormData$CropDmgMult
Sel_StormData$TotalPropDMG<- Sel_StormData$PROPDMG * Sel_StormData$PropDmgMult
Sel_StormData$TotalDamage<- Sel_StormData$TotalCropDMG + Sel_StormData$TotalPropDMG
Sel_StormData$T_HealthImpct<- Sel_StormData$FATALITIES + Sel_StormData$INJURIES
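To make the arithmetic concrete, the sketch below applies the same calculations to two made-up records whose multiplier columns are already filled in. It uses `transform()` instead of `$<-` assignments; because `transform()` cannot refer to a column created in the same call, the totals are added in a second call.

```r
# Two made-up records that already carry multiplier columns
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL"),
                  FATALITIES = c(2, 0), INJURIES = c(5, 1),
                  PROPDMG = c(25, 2.5), PropDmgMult = c(1e3, 1e6),
                  CROPDMG = c(1.5, 0), CropDmgMult = c(1e6, 0))

# Dollar amounts: damage value times its multiplier
toy <- transform(toy,
                 TotalPropDMG = PROPDMG * PropDmgMult,
                 TotalCropDMG = CROPDMG * CropDmgMult)

# Totals need the columns created above, hence a second transform()
toy <- transform(toy,
                 TotalDamage = TotalPropDMG + TotalCropDMG,
                 T_HealthImpct = FATALITIES + INJURIES)
print(toy)
```

For the first record, 25 × 1,000 = 25,000 dollars of property damage plus 1.5 × 1,000,000 = 1,500,000 dollars of crop damage gives a TotalDamage of 1,525,000.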

Crop Damage:

CropDamageByEvent <- aggregate(TotalCropDMG ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5EventsCROPDMG <- CropDamageByEvent[order(-CropDamageByEvent$TotalCropDMG) , ][1:5, ]
Top5EventsCROPDMG
##          EVTYPE TotalCropDMG
## 95      DROUGHT  13972566000
## 170       FLOOD   5661968450
## 590 RIVER FLOOD   5029459000
## 427   ICE STORM   5022113500
## 244        HAIL   3025954650

Property Damage:

PropDamageByEvent <- aggregate(TotalPropDMG ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5EventsPropDMG <- PropDamageByEvent[order(-PropDamageByEvent$TotalPropDMG) , ][1:5, ]
Top5EventsPropDMG
##                EVTYPE TotalPropDMG
## 170             FLOOD 144657709800
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937162897
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140815011

Total Damage:

TotalDamageByEvent <- aggregate(TotalDamage ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5EventsTotalDMG <- TotalDamageByEvent[order(-TotalDamageByEvent$TotalDamage) , ][1:5, ]
Top5EventsTotalDMG
##                EVTYPE  TotalDamage
## 170             FLOOD 150319678250
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57352117607
## 670       STORM SURGE  43323541000
## 244              HAIL  18758224527
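The separate `aggregate()` calls for crop, property, and total damage could be collapsed into one by putting several columns on the left-hand side of the formula with `cbind()`. A sketch on made-up data; `by_event` is a hypothetical name.

```r
# Made-up records with damage already converted to dollars
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "FLOOD"),
                  TotalPropDMG = c(1e6, 2e5, 3e6),
                  TotalCropDMG = c(5e5, 0, 1e5))
toy$TotalDamage <- toy$TotalPropDMG + toy$TotalCropDMG

# cbind() on the left-hand side sums several columns per event type at once
by_event <- aggregate(cbind(TotalPropDMG, TotalCropDMG, TotalDamage) ~ EVTYPE,
                      data = toy, FUN = sum)
print(by_event)
```

One caveat: with the default `na.action`, a row with NA in any listed column is dropped from every sum, whereas separate calls drop it only where it is missing.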

Health Impact:

HealthDamageByEvent <- aggregate(T_HealthImpct ~ EVTYPE, data = Sel_StormData,  FUN="sum")
Top5EventsHealthImpact <- HealthDamageByEvent[order(-HealthDamageByEvent$T_HealthImpct) , ][1:5, ]
Top5EventsHealthImpact
##             EVTYPE T_HealthImpct
## 834        TORNADO         96979
## 130 EXCESSIVE HEAT          8428
## 856      TSTM WIND          7461
## 170          FLOOD          7259
## 464      LIGHTNING          6046
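As a consistency check, the combined totals for event types that appear in both the fatality and the injury top-five tables should equal the sum of the two counts. The sketch below re-enters those printed values by hand and does not touch the full data.

```r
# Values copied from the fatality and injury top-five tables (events in both)
fat <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "LIGHTNING"),
                  FATALITIES = c(5633, 1903, 816))
inj <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "LIGHTNING"),
                  INJURIES = c(91346, 6525, 5230))

# Join on event type and add the two counts
check <- merge(fat, inj, by = "EVTYPE")
check$T_HealthImpct <- check$FATALITIES + check$INJURIES
print(check)
```

TORNADO, EXCESSIVE HEAT and LIGHTNING reproduce the 96979, 8428 and 6046 shown in the health impact table.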

3. Results

3.1. Health Impact

The top five event types by total fatalities and by total injuries are plotted below:

Top5FatalityPlot<-ggplot(Top5Fatalities, aes(x = reorder(EVTYPE,-FATALITIES), y = FATALITIES)) + geom_bar(stat = "identity", fill = "red") +
    theme(axis.text.x = element_text(angle = 45,hjust = 1), text = element_text(size = 10)) +
    xlab("Events Type") + ylab("Total Number of Fatalities") + 
    ggtitle("Top five Severe Weather Events\n causing Fatalities in USA")
Top5InjuryPlot<- ggplot(Top5Injuries, aes(x = reorder(EVTYPE,-INJURIES), y = INJURIES)) + geom_bar(stat = "identity", fill = "green") +
    theme(axis.text.x = element_text(angle = 45,hjust = 1), text = element_text(size = 10)) +
    xlab("Events Type") + ylab("Total Number of Injuries")+
    ggtitle("Top five Severe Weather Events\n causing Injuries in USA")
plot_grid(Top5FatalityPlot,Top5InjuryPlot, ncol = 2)
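The plotting code repeats one pattern: bars sorted from tallest to shortest, rotated axis labels, and optional rescaling of the y values. A small helper could remove that repetition. This is a sketch assuming ggplot2 3.0 or later (for the `.data` pronoun); `top_bar_plot` is a hypothetical name and the toy table is made up.

```r
library(ggplot2)

# Hypothetical helper: bar chart of a "top N" table, tallest bar first.
# xvar and yvar are column names; scale divides y (e.g. 1e9 for billions).
top_bar_plot <- function(df, xvar, yvar, fill, ylab, title, scale = 1) {
    ggplot(df, aes(x = reorder(.data[[xvar]], -.data[[yvar]]),
                   y = .data[[yvar]] / scale)) +
        geom_col(fill = fill) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1),
              text = element_text(size = 10)) +
        labs(x = "Event Type", y = ylab, title = title)
}

# Usage with a small made-up table
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                  TotalDamage = c(5e10, 1.5e11, 1.8e10))
p <- top_bar_plot(toy, "EVTYPE", "TotalDamage", fill = "red",
                  ylab = "Economic Damage (Billion Dollars)",
                  title = "Top Events by Economic Damage", scale = 1e9)
```

The resulting plots can still be combined with `plot_grid()` as above.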

3.2. Economic Impact

The following plots show the top five event types by property damage, crop damage, total economic damage, and combined health impact (fatalities plus injuries):

p1<- ggplot(Top5EventsPropDMG, aes(x=reorder(EVTYPE, -TotalPropDMG), y=TotalPropDMG/10^9))+geom_bar(stat="identity", fill="blue")+theme(text = element_text(size = 8),axis.text.x = element_text(angle=35, hjust = 1))+labs(x="Events Type", y="Property Damage(Billion Dollars)")+ggtitle("Top five Events with Highest property damage") 
p2<- ggplot(Top5EventsCROPDMG, aes(x=reorder(EVTYPE, -TotalCropDMG), y=TotalCropDMG/10^9))+geom_bar(stat="identity", fill="green")+theme(text = element_text(size = 8), axis.text.x = element_text(angle=35, hjust = 1)) +labs(x="Events Type", y="Crop Damage(Billion Dollars)")+ggtitle("Top five Events with Highest Crop damage")
p3<- ggplot(Top5EventsTotalDMG, aes(x=reorder(EVTYPE, -TotalDamage), y=TotalDamage/10^9))+geom_bar(stat="identity", fill="red")+theme(text =element_text(size=8), axis.text.x = element_text(angle=35, hjust = 1))+labs(x="Events Type", y="Economic Damage(Billion Dollars)")+ggtitle("Top five Events with Highest Economic Impact") 
p4<- ggplot(Top5EventsHealthImpact, aes(x=reorder(EVTYPE, -T_HealthImpct), y=T_HealthImpct))+geom_bar(stat="identity", fill="purple") + theme(text = element_text(size = 8), axis.text.x = element_text(angle=35, hjust = 1))+labs(x="Events Type", y="Health Impact")+ggtitle("Top five Events with Highest Health Impact\n(Fatalities & Injuries)")  
plot_grid(p1, p2, p3, p4, nrow =2, ncol = 2, align = 'vh')

4. Conclusions