Synopsis

Severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, or significant property damage, and preventing such outcomes to the extent possible is a key concern. This study therefore examines which severe weather events have caused the most fatalities, injuries, and property damage, using the Storm Database maintained by the U.S. National Oceanic and Atmospheric Administration’s (NOAA’s) National Weather Service (NWS). The file analysed here covers events from 1950 to November 2011. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The goal of this analysis is to explore the NOAA Storm Database and answer two questions concerning severe weather events:

• Which severe weather events (EVTYPE) are most harmful with respect to population health?

• Which severe weather events have the greatest economic consequences?

Data Processing

The data for this assignment can be downloaded from the course web site:

• Dataset: Weather Data (URL: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)

• Definitions are available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf as published in the following document: NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007, Operations and Services Performance, NWSPD 10-16, STORM DATA PREPARATION

Attaching Required packages

library(data.table)
library(dplyr)
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

*1. Download Data

File.URL<- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(File.URL,destfile = "./StormData.csv.bz2",mode = "wb")
# read.csv() reads the bzip2-compressed file directly; unzip() only handles zip archives
StormData<-read.csv("./StormData.csv.bz2",header = TRUE,sep = ",")
str(StormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(StormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
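
Because read.csv() can read a bzip2-compressed file directly, no separate decompression step is needed. The sketch below additionally wraps the download in a file.exists() check so that repeated runs do not fetch the file again. The helper name fetch_once and its dest argument are illustrative and not part of the original analysis; the small temporary file only demonstrates that compressed input is read correctly.

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")  # binary mode keeps the archive intact
  }
  invisible(dest)
}

# Usage with the URL from this report (commented out to avoid a large download):
# fetch_once(File.URL, "StormData.csv.bz2")
# StormData <- read.csv("StormData.csv.bz2")

# Small self-contained demonstration that read.csv() handles .bz2 input:
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = "TORNADO", FATALITIES = 1), con, row.names = FALSE)
close(con)
demo <- read.csv(tmp)
```

With this pattern the report can be re-knitted without downloading the archive each time.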

*2. Data Preparation

names(StormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The variables that we need for this analysis include:

• EVTYPE - Event Type

• FATALITIES - Number of reported fatalities caused by the event.

• INJURIES - Number of reported injuries caused by the event.

• PROPDMG/PROPDMGEXP - The dollar (USD) amount of property damage caused by the event.

• CROPDMG/CROPDMGEXP - The dollar (USD) amount of crop damage caused by the event.

# make a new subset of variables according to analysis requirement
new_storm<- StormData[c("EVTYPE","FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(new_storm)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

*3. Data transformation

The damage amounts are stored in two parts: a numeric value (PROPDMG, CROPDMG) and an exponent symbol (PROPDMGEXP, CROPDMGEXP). To obtain the actual dollar amounts, each exponent symbol is converted into a numeric multiplier, which is then multiplied by the corresponding value in PROPDMG or CROPDMG.

The distinct exponent symbols are identified so that they may be quantified individually.

unique(new_storm$PROPDMGEXP)# For Property damage
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Assign values for the property exponent data per the prior function
new_storm$PROP_new[new_storm$PROPDMGEXP == "K"] <- 10^3
new_storm$PROP_new[new_storm$PROPDMGEXP == "M"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == ""] <- 1
new_storm$PROP_new[new_storm$PROPDMGEXP == "B"] <- 10^9
new_storm$PROP_new[new_storm$PROPDMGEXP == "m"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == "0"] <- 1
new_storm$PROP_new[new_storm$PROPDMGEXP == "5"] <- 10^5
new_storm$PROP_new[new_storm$PROPDMGEXP == "6"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == "4"] <- 10^4
new_storm$PROP_new[new_storm$PROPDMGEXP == "2"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "3"] <- 10^3
new_storm$PROP_new[new_storm$PROPDMGEXP == "h"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "7"] <- 10^7
new_storm$PROP_new[new_storm$PROPDMGEXP == "H"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "1"] <- 10
new_storm$PROP_new[new_storm$PROPDMGEXP == "8"] <- 10^8
new_storm$PROP_new[new_storm$PROPDMGEXP == "+"] <- 0
new_storm$PROP_new[new_storm$PROPDMGEXP == "-"] <- 0
new_storm$PROP_new[new_storm$PROPDMGEXP == "?"] <- 0

# Calculate the Total property damage value
new_storm$PROP_value<-new_storm$PROPDMG*new_storm$PROP_new
head(new_storm$PROP_value)
## [1] 25000  2500 25000  2500  2500  2500
# Estimating Crop Damage value
unique(new_storm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
new_storm$CROP_new[new_storm$CROPDMGEXP == "K"] <- 10^3
new_storm$CROP_new[new_storm$CROPDMGEXP == "M"] <- 10^6
new_storm$CROP_new[new_storm$CROPDMGEXP == ""] <- 1
new_storm$CROP_new[new_storm$CROPDMGEXP == "B"] <- 10^9
new_storm$CROP_new[new_storm$CROPDMGEXP == "m"] <- 10^6
new_storm$CROP_new[new_storm$CROPDMGEXP == "0"] <- 1
new_storm$CROP_new[new_storm$CROPDMGEXP == "2"] <- 10^2
new_storm$CROP_new[new_storm$CROPDMGEXP == "?"] <- 0
new_storm$CROP_new[new_storm$CROPDMGEXP == "k"] <- 10^3

# Calculate the Total crop damage value
new_storm$CROP_value<-new_storm$CROPDMG*new_storm$CROP_new
head(new_storm$CROP_value)
## [1] 0 0 0 0 0 0
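
The symbol-by-symbol assignments above can also be written as a single lookup table. The following is a minimal sketch under the same multiplier choices as this report; the names exp_lookup and exp_to_multiplier and the toy data frame are illustrative assumptions, not part of the original analysis.

```r
# Multipliers taken from the assignments above; symbols not listed map to NA.
exp_lookup <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9,
                "H" = 1e2, "h" = 1e2, "0" = 1, "1" = 10, "2" = 1e2,
                "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7,
                "8" = 1e8, "+" = 0, "-" = 0, "?" = 0)

# Hypothetical helper: translate exponent symbols into numeric multipliers.
exp_to_multiplier <- function(symbols) {
  symbols <- as.character(symbols)   # works for factor or character input
  out <- unname(exp_lookup[symbols])
  out[symbols == ""] <- 1            # blank exponent: the value is taken as-is
  out
}

# Toy data (not from the Storm Database) to show the calculation.
toy <- data.frame(DMG = c(25, 2.5, 3, 10, 7),
                  EXP = c("K", "M", "", "?", "b"),
                  stringsAsFactors = FALSE)
toy$VALUE <- toy$DMG * exp_to_multiplier(toy$EXP)
toy
```

For example, a PROPDMG of 25 with exponent "K" becomes 25 * 10^3 = 25000, which matches the first value printed for PROP_value. A symbol that is missing from the table, such as the "b" in the toy data, yields NA rather than a silent zero, which makes unexpected symbols easy to spot.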
## aggregating variables by event type
Fatalities <- aggregate(FATALITIES ~ EVTYPE, new_storm, FUN = sum)
Injuries <- aggregate(INJURIES ~ EVTYPE, new_storm, FUN = sum)
Propdmg <- aggregate(PROP_value ~ EVTYPE, new_storm, FUN = sum)
Cropdmg <- aggregate(CROP_value ~ EVTYPE, new_storm, FUN = sum)

## selecting the most harmful event types (top five) for population health and property/crop damage
TopFatalities<- Fatalities[order(-Fatalities$FATALITIES), ][1:5,]
TopInjuries<- Injuries[order(-Injuries$INJURIES), ][1:5,]
TopPropdmg<- Propdmg[order(-Propdmg$PROP_value), ][1:5,]
TopCropdmg<- Cropdmg[order(-Cropdmg$CROP_value), ][1:5,]
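
The aggregate-then-rank steps above all follow the same pattern, so they can be condensed. The sketch below uses toy values (not taken from the Storm Database) to show one aggregate() call summing several measures at once, plus a small helper for picking the top rows; the helper name top_n_by is an illustrative assumption.

```r
# Toy data (illustrative values only, not from the Storm Database).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 2, 5, 1, 3, 0)
)

# One aggregate() call sums several measures per event type.
# Note: with the formula interface, rows with NA in any listed column
# are dropped by default (na.action = na.omit).
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: the n rows with the largest value in column `col`.
top_n_by <- function(df, col, n = 5) {
  head(df[order(-df[[col]]), ], n)
}

top2 <- top_n_by(totals, "FATALITIES", n = 2)
top2
```

Keeping the number of rows as an argument also avoids a mismatch between a comment and the index range actually used.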

Results

Addressing Question 1: Which types of events are most harmful to population health?

First, we aggregate total fatalities and total injuries by event type.

# aggregating variables by event type
Fatalities <- aggregate(FATALITIES ~ EVTYPE, new_storm, FUN = sum) 
head(Fatalities)
##                  EVTYPE FATALITIES
## 1    HIGH SURF ADVISORY          0
## 2         COASTAL FLOOD          0
## 3           FLASH FLOOD          0
## 4             LIGHTNING          0
## 5             TSTM WIND          0
## 6       TSTM WIND (G45)          0
Injuries <- aggregate(INJURIES ~ EVTYPE, new_storm, FUN = sum)
head(Injuries)
##                  EVTYPE INJURIES
## 1    HIGH SURF ADVISORY        0
## 2         COASTAL FLOOD        0
## 3           FLASH FLOOD        0
## 4             LIGHTNING        0
## 5             TSTM WIND        0
## 6       TSTM WIND (G45)        0
## selecting the most harmful (top five) event types for population health
TopFatalities<- Fatalities[order(-Fatalities$FATALITIES), ][1:5,]
TopFatalities
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
TopInjuries<- Injuries[order(-Injuries$INJURIES), ][1:5,]
TopInjuries
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230

Plot1

# Chart1:Fatalities
p1<-ggplot(TopFatalities, aes(x=reorder(EVTYPE,FATALITIES),y=FATALITIES))
p1<- p1+ geom_bar(stat="identity",fill="orange")+ coord_cartesian(ylim =c(0,6000))+
     xlab("Event Type")+ ylab("Number of Fatalities")+
     ggtitle("Total Fatalities By Event Type")+
     theme(axis.text.x = element_text(angle=45,hjust = 1),plot.title = element_text(hjust = 0.5)) +
     # label mapped from the plot data instead of TopFatalities$FATALITIES
     geom_label(aes(label=FATALITIES),vjust=0,color = "Black", fontface = "bold")
# Chart2:Injuries
p2<-ggplot(TopInjuries, aes(x=reorder(EVTYPE,INJURIES),y=INJURIES))
p2<- p2+ geom_bar(stat="identity",fill="Dark Green")+ coord_cartesian(ylim =c(0,100000))+
     xlab("Event Type")+ ylab("Number of Injuries")+
     ggtitle("Total Injuries By Event Type")+
     theme(axis.text.x = element_text(angle=45,hjust = 1),plot.title = element_text(hjust = 0.5)) +
     # label mapped from the plot data instead of TopInjuries$INJURIES
     geom_label(aes(label=INJURIES),vjust=0,color = "Black", fontface = "bold")

Layout both Charts together

plot1<-gridExtra::grid.arrange(p1,p2,ncol=2,nrow=1)

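Based on Plot1, tornadoes are by far the most harmful events to population health, with 5,633 fatalities and 91,346 injuries. Excessive heat ranks second for fatalities (1,903), and thunderstorm wind (TSTM WIND) ranks second for injuries (6,957).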

Addressing Question 2: Which types of events have the greatest economic consequences?

First, we aggregate property damage and crop damage by event type.

Propdmg <- aggregate(PROP_value ~ EVTYPE, new_storm, FUN = sum)# Property Damage by event type
head(Propdmg)
##                  EVTYPE PROP_value
## 1    HIGH SURF ADVISORY     200000
## 2         COASTAL FLOOD          0
## 3           FLASH FLOOD      50000
## 4             LIGHTNING          0
## 5             TSTM WIND    8100000
## 6       TSTM WIND (G45)       8000
Cropdmg <- aggregate(CROP_value ~ EVTYPE, new_storm, FUN = sum)# Crop Damage by event type
head(Cropdmg)
##                  EVTYPE CROP_value
## 1    HIGH SURF ADVISORY          0
## 2         COASTAL FLOOD          0
## 3           FLASH FLOOD          0
## 4             LIGHTNING          0
## 5             TSTM WIND          0
## 6       TSTM WIND (G45)          0
# selecting the most harmful (top five) event types for property and crop damage
TopPropdmg<- Propdmg[order(-Propdmg$PROP_value), ][1:5,]
TopCropdmg<- Cropdmg[order(-Cropdmg$CROP_value), ][1:5,]

Plot2

# Chart3:Property damage
TopPropdmg<- mutate(TopPropdmg, Pexp=PROP_value/1000000)# converting to millions
mrg<-range(TopPropdmg$Pexp)
p3<-ggplot(TopPropdmg, aes(x=reorder(EVTYPE,Pexp),y=Pexp))
p3<-p3+ geom_bar(stat="identity",fill="Dark blue")+ 
   # y axis starts at zero so bar heights stay proportional; 10% headroom for labels
   coord_cartesian(ylim =c(0,1.1*mrg[2]))+
   xlab("Event Type")+ ylab("Property Damage (Million$)")+
   ggtitle("Total Property Damage By Event Type")+
   theme(axis.text.x = element_text(angle=45,hjust = 1),plot.title = element_text(hjust = 0.5)) +
   geom_label(aes(label=round(Pexp)),vjust=0,color = "Black", fontface = "bold")
#Chart4:Crop damage
TopCropdmg<- mutate(TopCropdmg, Cexp=CROP_value/1000000)# converting to millions
Cmrg<-range(TopCropdmg$Cexp)
p4<-ggplot(TopCropdmg, aes(x=reorder(EVTYPE,Cexp),y=Cexp))
p4<-p4+ geom_bar(stat="identity",fill="Pink")+ 
   # y axis starts at zero so bar heights stay proportional; 10% headroom for labels
   coord_cartesian(ylim =c(0,1.1*Cmrg[2]))+
   xlab("Event Type")+ ylab("Crop Damage (Million$)")+
   ggtitle("Total Crop Damage By Event Type")+
   theme(axis.text.x = element_text(angle=45,hjust = 1),plot.title = element_text(hjust = 0.5)) +
   geom_label(aes(label=round(Cexp)),vjust=0,color = "Black", fontface = "bold")

#layout Property and crop damage together

plot2<-gridExtra::grid.arrange(p3,p4,ncol=2,nrow=1)

Total Damage including Property and Crop Damage

#Calculating Total damage
new_storm$Total_Values<-new_storm$PROP_value+new_storm$CROP_value
#aggregating Total Damage by event type
Totaldmg <- aggregate(Total_Values ~ EVTYPE, new_storm, FUN = sum)
head(Totaldmg)
##                  EVTYPE Total_Values
## 1    HIGH SURF ADVISORY       200000
## 2         COASTAL FLOOD            0
## 3           FLASH FLOOD        50000
## 4             LIGHTNING            0
## 5             TSTM WIND      8100000
## 6       TSTM WIND (G45)         8000
# Most Harmful events
TopTotaldmg<- Totaldmg[order(-Totaldmg$Total_Values), ][1:10,]
TopTotaldmg
##                EVTYPE Total_Values
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333886
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986
## 153       FLASH FLOOD  18243991078
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041360
# converting values in Million $
TopTotaldmg<-mutate(TopTotaldmg,Million_Value=Total_Values/1000000)
TopTotaldmg
##               EVTYPE Total_Values Million_Value
## 1              FLOOD 150319678257    150319.678
## 2  HURRICANE/TYPHOON  71913712800     71913.713
## 3            TORNADO  57362333886     57362.334
## 4        STORM SURGE  43323541000     43323.541
## 5               HAIL  18761221986     18761.222
## 6        FLASH FLOOD  18243991078     18243.991
## 7            DROUGHT  15018672000     15018.672
## 8          HURRICANE  14610229010     14610.229
## 9        RIVER FLOOD  10148404500     10148.405
## 10         ICE STORM   8967041360      8967.041
Tmrg<-range(TopTotaldmg$Million_Value)
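
To put the totals in the table above into proportion, the sketch below computes each event type's share of the combined damage of these ten event types. The figures are copied from the printed TopTotaldmg table; the variable names top_total and share are illustrative, and the shares are relative to the ten listed event types only, not to all damage recorded in the database.

```r
# Totals (USD) copied from the TopTotaldmg table above.
top_total <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57362333886,
  "STORM SURGE"       = 43323541000,
  "HAIL"              = 18761221986,
  "FLASH FLOOD"       = 18243991078,
  "DROUGHT"           = 15018672000,
  "HURRICANE"         = 14610229010,
  "RIVER FLOOD"       = 10148404500,
  "ICE STORM"         = 8967041360
)

# Percentage share of each event type within the top ten, one decimal place.
share <- round(100 * top_total / sum(top_total), 1)
share
```

On these figures, floods alone account for a little over a third of the top-ten total.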

Plot3

# plotting TotalValue damage in Million USD
p5<-ggplot(TopTotaldmg, aes(x=reorder(EVTYPE,Million_Value),y=Million_Value))
p5<-p5+ geom_bar(stat="identity",fill="red")+ 
  # y axis starts at zero so bar heights stay proportional; 10% headroom for labels
  coord_cartesian(ylim =c(0,1.1*Tmrg[2]))+
  xlab("Event Type")+ ylab("Total Damage (Million$)")+
  ggtitle("Total Damage (Property and Crop) By Event Type")+
  theme(axis.text.x = element_text(angle=45,hjust = 1),plot.title = element_text(hjust = 0.5)) +
  geom_label(aes(label=round(Million_Value)),vjust=0,color = "Black", fontface = "bold")
plot3<-gridExtra::grid.arrange(p5,ncol=1,nrow=1)

Based on Plot3, floods have the greatest economic consequences, with about 150 billion USD in combined property and crop damage, followed by hurricanes/typhoons (about 72 billion USD) and tornadoes (about 57 billion USD).

Conclusion

Tornadoes are the most harmful events to population health, in terms of both fatalities and injuries. Floods have the greatest economic consequences based on total dollars of damage. Considered separately, floods cause the most property damage, whereas drought is the main cause of crop damage.