Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
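The claim that early years hold fewer records can be checked by counting events per year. The sketch below is a minimal illustration on a toy data frame, not part of the original analysis; `toy_events` and its rows are made up, and the `BGN_DATE` layout (month/day/year followed by a time) is an assumption about how the raw file stores dates.

```r
# Toy stand-in for the storm data (hypothetical rows).
# Assumption: BGN_DATE is stored as "month/day/year time".
toy_events <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "5/1/2011 0:00:00", "5/2/2011 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part (as.Date ignores the trailing time text) and extract the year
event_year <- as.integer(format(as.Date(toy_events$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Count the number of recorded events in each year
events_per_year <- table(event_year)
print(events_per_year)
```

If the date format assumption holds, the same three steps could be applied to the `BGN_DATE` column of the full data set once it has been read in.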
More information about the data is available from the U.S. National Oceanic and Atmospheric Administration (NOAA):
National Weather Service: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
National Climatic Data Center Storm Events: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf
Across the United States:
+ Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
+ Which types of events have the greatest economic consequences?
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
if(!file.exists("stormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "stormData.csv.bz2", method = "curl")
}
#Read data from working directory
activity <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)
#Create tidy data sets with only the needed columns
tidy_data_health <- select(activity, EVTYPE, FATALITIES, INJURIES)
tidy_data_property <- select(activity, BGN_DATE, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
#Find fatalities per event type
tidy_fatalities <- aggregate(FATALITIES ~ EVTYPE, data = tidy_data_health, sum)
#Sort by fatalities in descending order, then show the first 20 rows
tidy_fatalities <- arrange(tidy_fatalities, desc(FATALITIES))
head(tidy_fatalities,20)
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
#Select only 10 lines
tidy_fatalities_selected <- tidy_fatalities[1:10, ]
#Define as factor in order to be ordered in the plot
tidy_fatalities_selected$EVTYPE <- factor(tidy_fatalities_selected$EVTYPE, levels = tidy_fatalities_selected$EVTYPE)
ggplot(tidy_fatalities_selected, aes(x= EVTYPE, y = FATALITIES)) + geom_bar(stat = "identity", fill = "red") + theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(x = "Type of event", y = "Fatalities", title = "Fatalities for the 10 deadliest event types, 1950 to 2011")
As seen in the plot, tornadoes cause more deaths in the USA than any other weather event: 5,633 fatalities, roughly three times the total for the next event type, excessive heat (1,903).
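The fatalities table above lists some event types under more than one label, for example TSTM WIND and THUNDERSTORM WIND, or RIP CURRENT and RIP CURRENTS. The sketch below shows one possible way to merge such labels before aggregating. It is not part of the original analysis: `normalize_evtype` is a hypothetical helper, the mapping covers only these two pairs, and the toy totals are copied from the table.

```r
# Toy totals copied from the fatalities table above
toy_fatalities <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "RIP CURRENT", "RIP CURRENTS"),
  FATALITIES = c(504, 133, 368, 204),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse two known label variants into one spelling.
# A real cleanup would need a much longer, carefully reviewed mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Apply the mapping, then sum fatalities per cleaned label
toy_fatalities$EVTYPE <- normalize_evtype(toy_fatalities$EVTYPE)
merged_fatalities <- aggregate(FATALITIES ~ EVTYPE, data = toy_fatalities, sum)
print(merged_fatalities)
```

With these two merges the thunderstorm wind total becomes 637 and the rip current total 572, both still far below the tornado total, so the top rank is unaffected.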
#Find injuries per event type.
tidy_injuries <- aggregate(INJURIES ~ EVTYPE, data = tidy_data_health, sum)
#Sort by injuries in descending order and keep the top 10
tidy_injuries <- arrange(tidy_injuries, desc(INJURIES))
tidy_injuries_selected <- tidy_injuries[1:10, ]
#Define as factor in order to be ordered in the plot
tidy_injuries_selected$EVTYPE <- factor(tidy_injuries_selected$EVTYPE, levels = tidy_injuries_selected$EVTYPE)
ggplot(tidy_injuries_selected, aes(x= EVTYPE, y = INJURIES)) + geom_bar(stat = "identity", fill = "green") + theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(x = "Type of event", y = "Number of injuries", title = "Injuries for the 10 most harmful event types, 1950 to 2011")
As seen in the plot, tornadoes cause more injuries in the USA than any other weather event.
Tornadoes are therefore the most dangerous weather events in the USA for population health, ranking first in both injuries and deaths.
#Analysis for property damage
# Convert H, K, M, B units to calculate Property Damage
tidy_prop_damage <- mutate(tidy_data_property, Dam_to_property = ifelse(toupper(PROPDMGEXP) =='K', PROPDMG*1000, ifelse(toupper(PROPDMGEXP) =='M', PROPDMG*1000000, ifelse(toupper(PROPDMGEXP) == 'B', PROPDMG*1000000000, ifelse(toupper(PROPDMGEXP) == 'H', PROPDMG*100, PROPDMG)))))
prop_damage <- subset(tidy_prop_damage, select = c("EVTYPE", "Dam_to_property"))
prop_damage_type <- aggregate( Dam_to_property ~ EVTYPE, data = prop_damage, sum)
#Analysis for crop damage
# Convert H, K, M, B units to calculate crop Damage
tidy_crop_damage <- mutate(tidy_data_property, Dam_to_crop = ifelse(toupper(CROPDMGEXP) =='K', CROPDMG*1000, ifelse(toupper(CROPDMGEXP) =='M', CROPDMG*1000000,
ifelse(toupper(CROPDMGEXP) == 'B', CROPDMG*1000000000,
ifelse(toupper(CROPDMGEXP) == 'H', CROPDMG*100, CROPDMG)))))
crop_damage <- subset(tidy_crop_damage, select = c("EVTYPE", "Dam_to_crop"))
crop_damage_type <- aggregate( Dam_to_crop ~ EVTYPE, data = crop_damage, sum)
#Merge data from property and crop damage
total_damage <- merge(crop_damage_type, prop_damage_type, by="EVTYPE")
total_damage <- mutate(total_damage, Tot_damage = Dam_to_crop + Dam_to_property)
#Order by 10 most important
total_damage <- arrange(total_damage, desc(Tot_damage))
total_damage_selected <- total_damage[1:10, ]
#Define as factor in order to be ordered in the plot
total_damage_selected$EVTYPE <- factor(total_damage_selected$EVTYPE, levels = total_damage_selected$EVTYPE)
ggplot(total_damage_selected, aes(x= EVTYPE, y = Tot_damage)) + geom_bar(stat = "identity", fill = "green") + theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(x = "Type of event", y = "Total damage (USD)", title = "Property and crop damage for the 10 costliest event types, 1950 to 2011")
As seen in the plot, floods cause more combined property and crop damage than any other weather event.
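The damage totals above depend on converting the H, K, M, and B exponent codes into multipliers, which the analysis does with nested `ifelse` calls. The sketch below shows the same rule written as a lookup table, which may be easier to read and reuse for both the property and crop columns. `exp_multiplier` and `toy_damage` are hypothetical names introduced for illustration; as in the original code, any other code is treated as a multiplier of 1.

```r
# Hypothetical helper: map an exponent code (H, K, M, B, any case) to a multiplier.
# Codes outside this set, including empty strings, fall back to 1,
# mirroring the final branch of the nested ifelse in the analysis.
exp_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows (made up) covering each code plus an empty one
toy_damage <- data.frame(
  PROPDMG = c(25, 2.5, 1, 3, 10),
  PROPDMGEXP = c("K", "m", "B", "h", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars = reported figure times the multiplier for its code
toy_damage$Dam_to_property <- toy_damage$PROPDMG * exp_multiplier(toy_damage$PROPDMGEXP)
print(toy_damage)
```

The `as.character` call is there so the helper also works when the exponent column has been read in as a factor.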