The occurrence of severe meteorological phenomena can cause both public health and economic problems for communities and municipalities. Severe events may lead to fatalities, injuries, or significant property damage, and preventing such outcomes to the extent possible is a key concern. The aim of this study is therefore to identify the severe weather events that have caused the most fatalities, injuries, and property damage, using the NOAA Storm Database. This database contains data from 1950 to November 2011, as entered by the U.S. National Oceanic and Atmospheric Administration's (NOAA's) National Weather Service (NWS). It tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The basic goal of this analysis is to explore the NOAA Storm Database and answer two questions concerning severe weather events:
• Which types of severe weather events (EVTYPE) are most harmful with respect to population health?
• Which types of severe weather events have the greatest economic consequences?
The data for this assignment can be downloaded from the course web site:
• Dataset: Weather Data (URL: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
• Definitions are available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf as published in the following document: NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007, Operations and Services Performance, NWSPD 10-16, STORM DATA PREPARATION
library(dplyr)     # mutate()
library(ggplot2)   # bar charts
library(gridExtra) # grid.arrange()
*1. Download Data
File.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) {
  download.file(File.URL, destfile = "./StormData.csv.bz2", mode = "wb")
}
# read.csv() decompresses .bz2 files transparently; unzip() only handles .zip archives
StormData <- read.csv("StormData.csv.bz2", header = TRUE, sep = ",")
str(StormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
head(StormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
*2. Data Preparation
names(StormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The variables needed for this analysis are:
• EVTYPE - event type
• FATALITIES - number of fatalities caused by the event
• INJURIES - number of injuries caused by the event
• PROPDMG/PROPDMGEXP - property damage in US dollars, stored as a number (PROPDMG) and a magnitude symbol (PROPDMGEXP, e.g. K for thousands)
• CROPDMG/CROPDMGEXP - crop damage in US dollars, stored the same way
# Keep only the variables this analysis needs
new_storm<- StormData[c("EVTYPE","FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(new_storm)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
*3. Data Transformation
The damage exponents (PROPDMGEXP and CROPDMGEXP) must be transformed before the actual property damage (PROPDMG) and crop damage (CROPDMG) values can be computed. This is done by converting each exponent symbol into a numeric multiplier and multiplying it by the corresponding value in PROPDMG or CROPDMG. For example, PROPDMG = 25.0 with PROPDMGEXP = "K" gives 25,000 USD.
First, the distinct exponent symbols are listed so that each one can be assigned a multiplier.
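As an alternative to one assignment per symbol, the same symbol-to-multiplier mapping could be kept in a single lookup table. This is a minimal sketch, not the author's code: the names `exp_symbols`, `exp_values`, and `exp_to_multiplier()` are hypothetical, the table covers the union of the property and crop symbols, and the multipliers are copied from the assignments used in this analysis (digits read as powers of ten; "+", "-" and "?" mapped to 0). The toy rows are invented for illustration.

```r
# Sketch only: one lookup table for the union of property and crop exponent
# symbols, with the same multipliers as the assignments in this analysis.
exp_symbols <- c("", "0", "1", "2", "3", "4", "5", "6", "7", "8",
                 "h", "H", "k", "K", "m", "M", "B", "+", "-", "?")
exp_values  <- c(1, 1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
                 1e2, 1e2, 1e3, 1e3, 1e6, 1e6, 1e9, 0, 0, 0)

# Hypothetical helper: one multiplier per symbol (NA if the symbol is unknown).
# match() compares "" like any other string, whereas name-based
# indexing such as x[""] never matches an empty name.
exp_to_multiplier <- function(sym) {
  exp_values[match(as.character(sym), exp_symbols)]
}

# Toy rows (hypothetical) shaped like PROPDMG / PROPDMGEXP
toy <- data.frame(PROPDMG    = c(25, 2.5, 5, 10),
                  PROPDMGEXP = c("K", "K", "M", ""),
                  stringsAsFactors = FALSE)
toy$PROP_value <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

Applied to the real data, this would read `new_storm$PROPDMG * exp_to_multiplier(new_storm$PROPDMGEXP)`, and likewise for the crop columns. `as.character()` lets the helper accept factor columns such as those shown by `str()`.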
unique(new_storm$PROPDMGEXP)# For Property damage
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Map each property exponent symbol to a multiplier (digits are read as powers of ten; +, - and ? are treated as 0)
new_storm$PROP_new[new_storm$PROPDMGEXP == "K"] <- 10^3
new_storm$PROP_new[new_storm$PROPDMGEXP == "M"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == ""] <- 1
new_storm$PROP_new[new_storm$PROPDMGEXP == "B"] <- 10^9
new_storm$PROP_new[new_storm$PROPDMGEXP == "m"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == "0"] <- 1
new_storm$PROP_new[new_storm$PROPDMGEXP == "5"] <- 10^5
new_storm$PROP_new[new_storm$PROPDMGEXP == "6"] <- 10^6
new_storm$PROP_new[new_storm$PROPDMGEXP == "4"] <- 10^4
new_storm$PROP_new[new_storm$PROPDMGEXP == "2"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "3"] <- 10^3
new_storm$PROP_new[new_storm$PROPDMGEXP == "h"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "7"] <- 10^7
new_storm$PROP_new[new_storm$PROPDMGEXP == "H"] <- 10^2
new_storm$PROP_new[new_storm$PROPDMGEXP == "1"] <- 10
new_storm$PROP_new[new_storm$PROPDMGEXP == "8"] <- 10^8
new_storm$PROP_new[new_storm$PROPDMGEXP == "+"] <- 0
new_storm$PROP_new[new_storm$PROPDMGEXP == "-"] <- 0
new_storm$PROP_new[new_storm$PROPDMGEXP == "?"] <- 0
# Calculate the Total property damage value
new_storm$PROP_value<-new_storm$PROPDMG*new_storm$PROP_new
head(new_storm$PROP_value)
## [1] 25000 2500 25000 2500 2500 2500
# Estimating Crop Damage value
unique(new_storm$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
new_storm$CROP_new[new_storm$CROPDMGEXP == "K"] <- 10^3
new_storm$CROP_new[new_storm$CROPDMGEXP == "M"] <- 10^6
new_storm$CROP_new[new_storm$CROPDMGEXP == ""] <- 1
new_storm$CROP_new[new_storm$CROPDMGEXP == "B"] <- 10^9
new_storm$CROP_new[new_storm$CROPDMGEXP == "m"] <- 10^6
new_storm$CROP_new[new_storm$CROPDMGEXP == "0"] <- 1
new_storm$CROP_new[new_storm$CROPDMGEXP == "2"] <- 10^2
new_storm$CROP_new[new_storm$CROPDMGEXP == "?"] <- 0
new_storm$CROP_new[new_storm$CROPDMGEXP == "k"] <- 10^3
# Calculate the Total crop damage value
new_storm$CROP_value<-new_storm$CROPDMG*new_storm$CROP_new
head(new_storm$CROP_value)
## [1] 0 0 0 0 0 0
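Because each multiplier column is filled one symbol at a time, any symbol that no assignment covers is left as NA, and that NA then carries into the damage value. The sketch below shows one way to check for this; `find_unmapped()` is a hypothetical helper and the toy data (with a deliberately unmapped "X") is invented for illustration.

```r
# Hypothetical helper: returns the exponent symbols whose multiplier is still NA,
# i.e. symbols that no assignment statement covered.
find_unmapped <- function(symbols, multipliers) {
  unique(as.character(symbols[is.na(multipliers)]))
}

# Toy data (hypothetical): "X" is deliberately left out of the mapping
sym  <- factor(c("K", "M", "X", "K", ""))
mult <- rep(NA_real_, length(sym))
mult[sym == "K"] <- 1e3
mult[sym == "M"] <- 1e6
mult[sym == ""]  <- 1
find_unmapped(sym, mult)
```

On the real data, `find_unmapped(new_storm$PROPDMGEXP, new_storm$PROP_new)` should return an empty vector when every symbol printed by `unique()` has been assigned a value.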
First, we aggregate total fatalities and total injuries by event type.
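The two separate `aggregate()` calls could also be combined into one by placing both outcome columns in `cbind()` on the left-hand side of the formula. This is a small sketch on invented toy data (the `toy` and `health` names are hypothetical), not the author's code.

```r
# Toy data (hypothetical), shaped like new_storm
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 3, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# One aggregate() call sums both columns within each event type;
# the result has one row per EVTYPE, sorted by event type
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health
```

The result keeps the column names FATALITIES and INJURIES, so each can still be sorted on its own afterwards.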
# aggregating variables by event type
Fatalities <- aggregate(FATALITIES ~ EVTYPE, new_storm, FUN = sum)
head(Fatalities)
## EVTYPE FATALITIES
## 1 HIGH SURF ADVISORY 0
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 0
## 4 LIGHTNING 0
## 5 TSTM WIND 0
## 6 TSTM WIND (G45) 0
Injuries <- aggregate(INJURIES ~ EVTYPE, new_storm, FUN = sum)
head(Injuries)
## EVTYPE INJURIES
## 1 HIGH SURF ADVISORY 0
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 0
## 4 LIGHTNING 0
## 5 TSTM WIND 0
## 6 TSTM WIND (G45) 0
# Select the five event types with the most fatalities and the most injuries
TopFatalities<- Fatalities[order(-Fatalities$FATALITIES), ][1:5,]
TopFatalities
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
TopInjuries<- Injuries[order(-Injuries$INJURIES), ][1:5,]
TopInjuries
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
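The `[order(-x), ][1:5,]` idiom works here because there are far more than five event types, but it pads the result with NA rows when a table has fewer than five rows. A hedged alternative is to take `head()` of the sorted table. `top_n_rows()` below is a hypothetical helper, and the toy table is invented.

```r
# Hypothetical helper: the n rows with the largest values in column `col`.
# head() returns fewer rows instead of NA-filled rows when the table is short.
top_n_rows <- function(df, col, n = 5) {
  head(df[order(df[[col]], decreasing = TRUE), , drop = FALSE], n)
}

totals <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
                     FATALITIES = c(0, 5, 9),
                     stringsAsFactors = FALSE)

res   <- top_n_rows(totals, "FATALITIES", n = 5)  # 3 rows, largest first
naive <- totals[order(-totals$FATALITIES), ][1:5, ]  # 5 rows, last two all NA
res
```

With the real tables, `top_n_rows(Fatalities, "FATALITIES")` should give the same five rows as `TopFatalities`.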
#Plot1
# Chart1:Fatalities
p1 <- ggplot(TopFatalities, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_col(fill = "orange") +
  coord_cartesian(ylim = c(0, 6000)) +
  xlab("Event Type") + ylab("Number of Fatalities") +
  ggtitle("Total Fatalities By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  # map the label to the column itself rather than TopFatalities$FATALITIES
  geom_label(aes(label = FATALITIES), vjust = 0, color = "black", fontface = "bold")
# Chart2:Injuries
p2 <- ggplot(TopInjuries, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_col(fill = "darkgreen") +
  coord_cartesian(ylim = c(0, 100000)) +
  xlab("Event Type") + ylab("Number of Injuries") +
  ggtitle("Total Injuries By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  geom_label(aes(label = INJURIES), vjust = 0, color = "black", fontface = "bold")
plot1<-gridExtra::grid.arrange(p1,p2,ncol=2,nrow=1)
As Plot1 shows, tornadoes are the most harmful events to population health, with 5,633 fatalities and 91,346 injuries, far more than any other event type.
Next, we aggregate property damage and crop damage by event type.
Propdmg <- aggregate(PROP_value ~ EVTYPE, new_storm, FUN = sum)# Property Damage by event type
head(Propdmg)
## EVTYPE PROP_value
## 1 HIGH SURF ADVISORY 200000
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 50000
## 4 LIGHTNING 0
## 5 TSTM WIND 8100000
## 6 TSTM WIND (G45) 8000
Cropdmg <- aggregate(CROP_value ~ EVTYPE, new_storm, FUN = sum)# Crop Damage by event type
head(Cropdmg)
## EVTYPE CROP_value
## 1 HIGH SURF ADVISORY 0
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 0
## 4 LIGHTNING 0
## 5 TSTM WIND 0
## 6 TSTM WIND (G45) 0
# Select the five event types with the most property damage and the most crop damage
TopPropdmg<- Propdmg[order(-Propdmg$PROP_value), ][1:5,]
TopCropdmg<- Cropdmg[order(-Cropdmg$CROP_value), ][1:5,]
# Chart3:Property damage
TopPropdmg <- mutate(TopPropdmg, Pexp = PROP_value / 1000000) # convert to millions of USD
p3 <- ggplot(TopPropdmg, aes(x = reorder(EVTYPE, Pexp), y = Pexp)) +
  geom_col(fill = "darkblue") +
  # start the axis at 0 so bar heights stay proportional;
  # ylim = range(Pexp) would shrink the smallest bar to almost nothing
  coord_cartesian(ylim = c(0, max(TopPropdmg$Pexp) * 1.1)) +
  xlab("Event Type") + ylab("Property Damage (Million$)") +
  ggtitle("Total Property Damage By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  geom_label(aes(label = round(Pexp)), vjust = 0, color = "black", fontface = "bold")
#Chart4:Crop damage
TopCropdmg <- mutate(TopCropdmg, Cexp = CROP_value / 1000000) # convert to millions of USD
p4 <- ggplot(TopCropdmg, aes(x = reorder(EVTYPE, Cexp), y = Cexp)) +
  geom_col(fill = "pink") +
  # start the axis at 0 so bar heights stay proportional
  coord_cartesian(ylim = c(0, max(TopCropdmg$Cexp) * 1.1)) +
  xlab("Event Type") + ylab("Crop Damage (Million$)") +
  ggtitle("Total Crop Damage By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  geom_label(aes(label = round(Cexp)), vjust = 0, color = "black", fontface = "bold")
#layout Property and crop damage together
plot2<-gridExtra::grid.arrange(p3,p4,ncol=2,nrow=1)
#Calculating Total damage
new_storm$Total_Values<-new_storm$PROP_value+new_storm$CROP_value
#aggregating Total Damage by event type
Totaldmg <- aggregate(Total_Values ~ EVTYPE, new_storm, FUN = sum)
head(Totaldmg)
## EVTYPE Total_Values
## 1 HIGH SURF ADVISORY 200000
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 50000
## 4 LIGHTNING 0
## 5 TSTM WIND 8100000
## 6 TSTM WIND (G45) 8000
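Two details of this step matter if any multiplier were ever left as NA: `+` propagates NA into the total, and the formula method of `aggregate()` drops rows with NA by default (`na.action = na.omit`), so such rows would vanish from the per-event sums without warning. The sketch below illustrates both on invented toy data; `rowSums(na.rm = TRUE)`, which treats a missing component as zero, is shown as one possible alternative, and whether that is the right choice is a judgment call.

```r
# Toy data (hypothetical): one crop value is missing
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROP_value = c(1e6, 2e6, 5e4),
  CROP_value = c(0, NA, 1e3),
  stringsAsFactors = FALSE
)

# `+` propagates NA, so the second FLOOD row gets an NA total
toy$Total_plus <- toy$PROP_value + toy$CROP_value
# rowSums(na.rm = TRUE) treats the missing crop value as 0 instead
toy$Total_rowsums <- rowSums(toy[, c("PROP_value", "CROP_value")], na.rm = TRUE)

# The formula method of aggregate() uses na.action = na.omit by default,
# so the NA row silently drops out of the FLOOD total in by_plus
by_plus    <- aggregate(Total_plus ~ EVTYPE, data = toy, FUN = sum)
by_rowsums <- aggregate(Total_rowsums ~ EVTYPE, data = toy, FUN = sum)
by_plus
by_rowsums
```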
# Ten event types with the greatest total damage
TopTotaldmg<- Totaldmg[order(-Totaldmg$Total_Values), ][1:10,]
TopTotaldmg
## EVTYPE Total_Values
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333886
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991078
## 95 DROUGHT 15018672000
## 402 HURRICANE 14610229010
## 590 RIVER FLOOD 10148404500
## 427 ICE STORM 8967041360
# Convert values to millions of USD
TopTotaldmg<-mutate(TopTotaldmg,Million_Value=Total_Values/1000000)
TopTotaldmg
## EVTYPE Total_Values Million_Value
## 1 FLOOD 150319678257 150319.678
## 2 HURRICANE/TYPHOON 71913712800 71913.713
## 3 TORNADO 57362333886 57362.334
## 4 STORM SURGE 43323541000 43323.541
## 5 HAIL 18761221986 18761.222
## 6 FLASH FLOOD 18243991078 18243.991
## 7 DROUGHT 15018672000 15018.672
## 8 HURRICANE 14610229010 14610.229
## 9 RIVER FLOOD 10148404500 10148.405
## 10 ICE STORM 8967041360 8967.041
# plotting total damage in millions of USD
p5 <- ggplot(TopTotaldmg, aes(x = reorder(EVTYPE, Million_Value), y = Million_Value)) +
  geom_col(fill = "red") +
  # start the axis at 0 so bar heights stay proportional
  coord_cartesian(ylim = c(0, max(TopTotaldmg$Million_Value) * 1.1)) +
  xlab("Event Type") + ylab("Total Damage (Million$)") +
  ggtitle("Total Damage (Property and Crop) By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  geom_label(aes(label = round(Million_Value)), vjust = 0, color = "black", fontface = "bold")
plot3 <- gridExtra::grid.arrange(p5, ncol = 1, nrow = 1)
As Plot3 shows, floods have the greatest economic consequences in terms of total (property plus crop) damage, at roughly USD 150 billion.
In summary, tornadoes are the most harmful events to population health, in terms of both fatalities and injuries. Floods have the greatest economic consequences based on total dollars of damage. Considered separately, floods cause the most property damage, whereas drought is the main cause of crop damage.