This study aims to answer the following questions about severe weather events in the US:
To answer these two questions, the NOAA Storm Database is used and processed with the R code described below.
First, we are going to include all necessary packages and download the NOAA dataset from the website.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "storm_data.csv.bz2")
data <- read.csv("storm_data.csv.bz2")
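As written, the download step fetches the archive again on every run. A minimal sketch of a guard that skips the download when the file is already present is shown here; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis, and `mode = "wb"` is added on the assumption that the script may also be run on Windows, where binary files need it.

```r
# Hypothetical helper: download `url` to `destfile` only when the file is absent.
# Returns TRUE if a download was attempted, FALSE if the existing copy was reused.
fetch_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  # mode = "wb" keeps the binary .bz2 archive intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  TRUE
}
```

Calling `fetch_if_missing()` with the URL above and `"storm_data.csv.bz2"` before `read.csv()` would leave the rest of the analysis unchanged.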
Next, we group the dataset by weather event type and calculate the total number of injuries and fatalities for each event type. A similar calculation gives the total economic damage, which is the sum of the damage to crops and to property.
library(dplyr)
summary_damage <- data %>% group_by(EVTYPE) %>% summarize(Total_Injuries = sum(INJURIES),Total_Fatalities = sum(FATALITIES))
injuries <- summary_damage[order(summary_damage$Total_Injuries,decreasing = TRUE),]
fatalities <- summary_damage[order(summary_damage$Total_Fatalities,decreasing = TRUE),]
injuries_max <- injuries[1:5,]
fatalities_max <- fatalities[1:5,]
summary_damage$total <- summary_damage$Total_Injuries + summary_damage$Total_Fatalities
summary_eco_damage <- data %>% group_by(EVTYPE) %>% summarize(Crop_Dmg = sum(CROPDMG),Prop_Dmg = sum(PROPDMG))
summary_eco_damage$Total_Dmg <- summary_eco_damage$Crop_Dmg + summary_eco_damage$Prop_Dmg
eco_dmg <- summary_eco_damage[order(summary_eco_damage$Total_Dmg,decreasing = TRUE),]
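The sums above add the raw `CROPDMG` and `PROPDMG` values. In the NOAA dataset these values are paired with magnitude codes in the `CROPDMGEXP` and `PROPDMGEXP` columns (for example `K`, `M`, `B`), so the raw sums are not dollar amounts. The following is a rough sketch of how the codes could be applied, using base R and a small made-up data frame rather than the real data. `exp_multiplier` and `toy` are hypothetical names, and treating every code other than K, M and B as a multiplier of 1 is an assumption, not the documented meaning of those codes.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K, M and B (any case) are scaled; every other code,
# including blanks and digits, is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(1e3, 1e6, 1e9)[match(code, c("K", "M", "B"))]
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 1, 10),
  PROPDMGEXP = c("M", "K", ""),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

# Scale each damage value by its multiplier, then total per event type
toy$Total_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
totals <- sort(tapply(toy$Total_USD, toy$EVTYPE, sum), decreasing = TRUE)
print(totals)
```

Applying the same scaling to the full dataset before summing would likely change the ranking reported in the results; those figures are not recomputed here.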
In this section we look at the results. First, let's look at the five weather event types that cause the most injuries.
library(ggplot2)
head(injuries_max)
## # A tibble: 5 x 3
## EVTYPE Total_Injuries Total_Fatalities
## <fct> <dbl> <dbl>
## 1 TORNADO 91346 5633
## 2 TSTM WIND 6957 504
## 3 FLOOD 6789 470
## 4 EXCESSIVE HEAT 6525 1903
## 5 LIGHTNING 5230 816
ggplot(data=injuries_max,aes(x=factor(EVTYPE),y=Total_Injuries)) + geom_bar(stat="identity") + coord_flip() + ylab("Total Number of Injuries") + xlab("Event Type") + theme_classic()
Tornado is at the top of the list, followed by thunderstorm wind and flood. Now let's see which five event types cause the most fatalities.
head(fatalities_max)
## # A tibble: 5 x 3
## EVTYPE Total_Injuries Total_Fatalities
## <fct> <dbl> <dbl>
## 1 TORNADO 91346 5633
## 2 EXCESSIVE HEAT 6525 1903
## 3 FLASH FLOOD 1777 978
## 4 HEAT 2100 937
## 5 LIGHTNING 5230 816
ggplot(data=fatalities_max,aes(x=factor(EVTYPE),y=Total_Fatalities)) + geom_bar(stat="identity") + coord_flip() + ylab("Total Number of Fatalities") + theme_classic() + xlab("Event Type")
As before, tornado is at the top of the list, followed by excessive heat.
Now let's look at the events with the greatest economic impact.
head(eco_dmg)
## # A tibble: 6 x 4
## EVTYPE Crop_Dmg Prop_Dmg Total_Dmg
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 100019. 3212258. 3312277.
## 2 FLASH FLOOD 179200. 1420125. 1599325.
## 3 TSTM WIND 109203. 1335966. 1445168.
## 4 HAIL 579596. 688693. 1268290.
## 5 FLOOD 168038. 899938. 1067976.
## 6 THUNDERSTORM WIND 66791. 876844. 943636.
ggplot(data=eco_dmg[1:5,],aes(x=factor(EVTYPE),y=Total_Dmg)) + geom_bar(stat="identity") + coord_flip() + ylab("Total Economic Damage") + theme_classic() + xlab("Event Type")
From the graph above we can see that tornadoes cause the most economic damage, followed by flash floods and thunderstorm wind.
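The economic damage table lists `TSTM WIND` and `THUNDERSTORM WIND` as separate event types, so their totals are split across two rows. One possible cleanup, sketched here on a few example labels, is to normalize `EVTYPE` before grouping. `normalize_evtype` is a hypothetical helper, and it only handles case, surrounding whitespace and the `TSTM` abbreviation.

```r
# Hypothetical helper: collapse spelling variants of an event label.
# Assumption: only case, surrounding whitespace and the "TSTM" abbreviation
# are handled; other variants in the data would need their own rules.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  sub("^TSTM ", "THUNDERSTORM ", x)
}

# Example labels, not taken from the dataset
labels <- c("TSTM WIND", " thunderstorm wind", "FLOOD")
cleaned <- normalize_evtype(labels)
print(table(cleaned))
```

Applying such a helper to `data$EVTYPE` before `group_by(EVTYPE)` would merge the two thunderstorm wind rows. The rankings above were computed without this step.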