Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries and property damage, and preventing such outcomes to the extent possible is a key concern. The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries and property damage. This report presents an exploratory analysis of the health and economic impact of severe weather events, based on data from the NOAA storm database, and answers the following two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Loading the data
# download file from URL
if (!file.exists("C:/Users/DELL 1/Documents/Module5PeerProject2/repdata-data-StormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"C:/Users/DELL 1/Documents/Module5PeerProject2/repdata-data-StormData.csv.bz2")
}
# unzip file if the uncompressed .csv is not there yet
# (optional step: read.csv below reads the .bz2 file directly)
if (!file.exists("C:/Users/DELL 1/Documents/Module5PeerProject2/repdata-data-StormData.csv")) {
library(R.utils)
bunzip2("C:/Users/DELL 1/Documents/Module5PeerProject2/repdata-data-StormData.csv.bz2", remove = FALSE)
}
# load data into R
storm <- read.csv("C:/Users/DELL 1/Documents/Module5PeerProject2/repdata-data-StormData.csv.bz2", header = TRUE)
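read.csv can open a bzip2-compressed file directly, which is why the load step above works on the .bz2 file without a separate decompression step. A minimal sketch with a temporary toy file (the file name and the two sample rows are invented for illustration and are not part of the storm data):

```r
# Write a tiny CSV through a bzip2 connection, then read it back
# by path, the same way the report reads the storm data.
tmp <- tempfile(fileext = ".csv.bz2")   # hypothetical temporary file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects bzip2 compression
toy_back <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(toy_back)
```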
Extracting the first few lines of the data
head(storm)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
Selecting data for processing
Seven variables are relevant to the two questions above: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
Selecting specified columns
library(dplyr)
storm <- select(storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(storm)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
Economic impact is measured by damage to property and crops. PROPDMG and CROPDMG hold the damage amounts, and PROPDMGEXP and CROPDMGEXP hold the magnitude of each amount (for example, K for thousands, M for millions and B for billions).
#Extracting unique elements of PROPDMGEXP
unique(storm$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
#Extracting unique elements of CROPDMGEXP
unique(storm$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
These columns need to be converted to numeric exponents before the damage amounts can be computed
# Convert the property damage exponent to a power of ten
storm$PROPDMGEXP <- as.character(storm$PROPDMGEXP)
storm$PROPDMGEXP <- gsub("\\-|\\+|\\?", "0", storm$PROPDMGEXP) # symbols are treated as 10^0
storm$PROPDMGEXP <- gsub("B|b", "9", storm$PROPDMGEXP) # billions
storm$PROPDMGEXP <- gsub("M|m", "6", storm$PROPDMGEXP) # millions
storm$PROPDMGEXP <- gsub("K|k", "3", storm$PROPDMGEXP) # thousands
storm$PROPDMGEXP <- gsub("H|h", "2", storm$PROPDMGEXP) # hundreds
storm$PROPDMGEXP <- as.numeric(storm$PROPDMGEXP)
storm$PROPDMGEXP[is.na(storm$PROPDMGEXP)] <- 0 # blank exponents become 10^0
# Property damage in dollars, summed by event type; keep the top 10
storm$ActPropDam <- storm$PROPDMG * 10^storm$PROPDMGEXP
propDam <- aggregate(ActPropDam ~ EVTYPE, data = storm, sum)
propDam_reorder <- propDam[order(-propDam$ActPropDam), ]
PropDamages <- propDam_reorder[1:10, ]
# Same conversion for the crop damage exponent
storm$CROPDMGEXP <- as.character(storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("\\-|\\+|\\?", "0", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("B|b", "9", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("M|m", "6", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("K|k", "3", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("H|h", "2", storm$CROPDMGEXP)
storm$CROPDMGEXP <- as.numeric(storm$CROPDMGEXP)
storm$CROPDMGEXP[is.na(storm$CROPDMGEXP)] <- 0
# Crop damage in dollars, summed by event type; keep the top 10
storm$ActCropDam <- storm$CROPDMG * 10^storm$CROPDMGEXP
cropDam <- aggregate(ActCropDam ~ EVTYPE, data = storm, sum)
cropDam_reorder <- cropDam[order(-cropDam$ActCropDam), ]
CropDamages <- cropDam_reorder[1:10, ]
# Total (property + crop) damage by event type; keep the top 10
TotalDam <- aggregate(ActPropDam + ActCropDam ~ EVTYPE, data = storm, sum)
names(TotalDam)[2] <- "total"
TotalDamages <- arrange(TotalDam, desc(total)) %>% top_n(10)
## Selecting by total
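The exponent handling above can be illustrated on a few values. This sketch uses a hypothetical helper, exp_to_power, which is not part of the original analysis. It follows the same convention as the code above: the letters H, K, M and B map to 2, 3, 6 and 9, digits are used as powers of ten, and blanks or symbols are treated as 0. The sample amounts are invented for illustration.

```r
# Hypothetical helper: turn a damage exponent code into a power of ten
exp_to_power <- function(x) {
  x <- toupper(as.character(x))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  # Digit codes such as "5" convert directly; everything else becomes NA
  power <- suppressWarnings(as.numeric(x))
  is_letter <- x %in% names(lookup)
  power[is_letter] <- lookup[x[is_letter]]
  # Blanks and symbols ("", "-", "+", "?") are treated as 10^0
  power[is.na(power)] <- 0
  power
}

# Toy amounts and exponent codes (illustrative values only)
dmg  <- c(25, 2.5, 1, 5, 10)
exps <- c("K", "M", "B", "", "h")

# For example, 25 with "K" is 25 * 10^3 dollars
actual <- dmg * 10^exp_to_power(exps)
print(data.frame(dmg, exps, actual))
```

A lookup vector keeps the mapping in one place, so the same helper could serve both the property and the crop columns.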
#Weather events causing the most fatalities
byFATALITIES <- group_by(storm, EVTYPE)
MostFatal<- summarise(byFATALITIES,
total = sum(FATALITIES)
) %>% arrange(desc(total)) %>% top_n(10)
## Selecting by total
MostFatal
## Source: local data frame [10 x 2]
##
## EVTYPE total
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
#Weather events causing the most injuries
byINJURIES <- group_by(storm, EVTYPE)
MostInjuries <- summarise(byINJURIES ,
total = sum(INJURIES)
) %>% arrange(desc(total)) %>% top_n(10)
## Selecting by total
MostInjuries
## Source: local data frame [10 x 2]
##
## EVTYPE total
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
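The group-and-rank step used for both tables above can also be sketched with base R only. The data frame below is invented for illustration: aggregate() sums the counts per event type, and head() keeps the largest rows after sorting.

```r
# Toy event records (illustrative values, not taken from the storm data)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0)
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order and keep the two largest totals
totals <- totals[order(-totals$FATALITIES), ]
top2 <- head(totals, 2)
print(top2)
```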
Visualizing the results in graph format
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(MostFatal$total,
las = 3,
names.arg = MostFatal$EVTYPE,
col = "orange",
ylab = "Total No. of Deaths",
main = "Top 10 Weather Events Causing Fatalities")
barplot(MostInjuries$total,
las = 3,
names.arg = MostInjuries$EVTYPE,
col = "orange",
ylab = "Total No. of Injuries",
main = "Top 10 Weather Events Causing Injuries")
The results show that tornadoes are the most harmful weather event with respect to population health: they caused the most fatalities (5,633) and the most injuries (91,346) across the United States.
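The injuries table above lists TSTM WIND and THUNDERSTORM WIND as separate event types. A possible refinement, which is not part of the original analysis, is to merge obvious label variants before summing. The sketch below uses a hypothetical helper, normalize_evtype, that only trims whitespace, upper-cases the label and expands the TSTM abbreviation. Which other labels should be merged is a judgment call. The input uses three injury totals reported above, so the merged figure covers only those two labels and is not a full recount.

```r
# Hypothetical helper: clean up event type labels before grouping
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # Expand the "TSTM" abbreviation at the start of a label
  gsub("^TSTM", "THUNDERSTORM", x)
}

# Three rows from the injuries table above
inj <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "HAIL"),
  total  = c(6957, 1488, 1361)
)

inj$EVTYPE <- normalize_evtype(inj$EVTYPE)
merged <- aggregate(total ~ EVTYPE, data = inj, FUN = sum)
print(merged)
```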
Top 10 event types by property damage:
PropDamages
## EVTYPE ActPropDam
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
Top 10 event types by crop damage:
CropDamages
## EVTYPE ActCropDam
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
Top 10 event types by total damage:
TotalDamages
## EVTYPE total
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333947
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221986
## 6 FLASH FLOOD 18243991079
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
Visualizing the results in graph format
par(mfrow=c(1,3))
barplot(PropDamages$ActPropDam,
names.arg = PropDamages$EVTYPE,
las = 3,
col = "blue",
ylab = "Total Property Damage ($)",
main = "Top 10 Events Causing \n Most Property Damages")
barplot(CropDamages$ActCropDam,
names.arg = CropDamages$EVTYPE,
las = 3,
col = "blue",
ylab = "Total Crop Damage ($)",
main = "Top 10 Events Causing \n Most Crop Damages")
barplot(TotalDamages$total,
names.arg = TotalDamages$EVTYPE,
las = 3,
col = "red",
ylab = "Total Damages ($)",
main = "Top 10 Events Causing \n Most Total Damages")
The results show that floods, hurricanes/typhoons and tornadoes caused the greatest property damage, while drought and floods caused the greatest crop damage. Overall, floods contributed the most to total economic damage.