The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database began tracking a standard set of 48 storm data events in 1996. After analyzing all storm data events in the database through 2011, with no date filter applied, it was found that floods cause the greatest economic impact in terms of crop and property damage, followed by hurricanes/typhoons, while tornadoes take the greatest toll on the population in terms of injuries and fatalities. This report contains the analysis of the data obtained from the NOAA repository. It covers the specifics of data retrieval and preparation for the analysis, and presents the results in two categories: (1) types of events that are most harmful to population health, and (2) types of events that have the greatest economic consequences.
The compressed data is conditionally downloaded from the source URL if not found locally and then loaded directly via read.csv.
#install.packages("dplyr")
library('dplyr')
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
filename <- 'StormData.csv.bz2'
if (!file.exists(filename)) {
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', filename)
}
storm_data <- read.csv(filename)
df <- storm_data
To evaluate the health impact, the total fatalities and the total injuries for each event type (EVTYPE) are calculated. The code for this calculation is shown below.
df.fatalities <- df %>% select(EVTYPE, FATALITIES) %>% group_by(EVTYPE) %>% summarise(total.fatalities = sum(FATALITIES)) %>% arrange(-total.fatalities)
head(df.fatalities, 10)
## # A tibble: 10 x 2
## EVTYPE total.fatalities
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
The results above list the ten event types with the most fatalities. TORNADO has the largest number, 5633 in total. Next we compute the number of injuries caused by each event type, such as TORNADO, EXCESSIVE HEAT and FLASH FLOOD, as shown in the table below. Only the top 10 rows are shown here.
df.injuries <- df %>% select(EVTYPE, INJURIES) %>% group_by(EVTYPE) %>% summarise(total.injuries = sum(INJURIES)) %>% arrange(-total.injuries)
head(df.injuries, 10)
## # A tibble: 10 x 2
## EVTYPE total.injuries
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
The injury totals above show that TORNADO causes the most injuries, 91346 in total, followed by TSTM WIND with 6957.
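As a complement to the two separate summaries, both totals can also be computed in a single pass. The following is a minimal sketch in base R on a small made-up data frame (`toy`), not the code used in this report. The combined `CASUALTIES` column is an illustrative measure introduced here and is an assumption, not something the report defines.

```r
# Toy data standing in for the EVTYPE, FATALITIES and INJURIES columns (made-up values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 0, 2, 1),
  INJURIES   = c(10, 5, 2, 0, 1)
)

# Sum both columns per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical combined measure: fatalities plus injuries
health$CASUALTIES <- health$FATALITIES + health$INJURIES

# Sort from the most to the least harmful event type
health <- health[order(-health$CASUALTIES), ]
print(health)
```

Adding fatalities and injuries weights them equally, which may or may not be appropriate, so the two separate rankings remain the primary result.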
To analyze the impact of weather events on the economy, the available property damage and crop damage estimates were used. In the raw data, property damage is represented by two fields: a number, PROPDMG, and a magnitude code, PROPDMGEXP, that scales it to dollars. Similarly, crop damage is represented by two fields, CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage for each event. The code below computes the economic damage measured in dollars.
df.damage <- df %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
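# Multiplier is positional: each entry must line up with the sorted Symbol vector (character sort order can depend on the locale)
Symbol <- sort(unique(as.character(df.damage$PROPDMGEXP)))
Multiplier <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)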
convert.Multiplier <- data.frame(Symbol, Multiplier)
df.damage$Prop.Multiplier <- convert.Multiplier$Multiplier[match(df.damage$PROPDMGEXP, convert.Multiplier$Symbol)]
df.damage$Crop.Multiplier <- convert.Multiplier$Multiplier[match(df.damage$CROPDMGEXP, convert.Multiplier$Symbol)]
df.damage <- df.damage %>% mutate(PROPDMG = PROPDMG*Prop.Multiplier) %>% mutate(CROPDMG = CROPDMG*Crop.Multiplier) %>% mutate(TOTAL.DMG = PROPDMG+CROPDMG)
df.damage.total <- df.damage %>% group_by(EVTYPE) %>% summarize(TOTAL.DMG.EVTYPE = sum(TOTAL.DMG))%>% arrange(-TOTAL.DMG.EVTYPE)
head(df.damage.total,10)
## # A tibble: 10 x 2
## EVTYPE TOTAL.DMG.EVTYPE
## <fct> <dbl>
## 1 FLOOD 150319678250
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352117607
## 4 STORM SURGE 43323541000
## 5 FLASH FLOOD 17562132111
## 6 DROUGHT 15018672000
## 7 HURRICANE 14610229010
## 8 RIVER FLOOD 10148404500
## 9 ICE STORM 8967041810
## 10 TROPICAL STORM 8382236550
The two columns, EVTYPE and TOTAL.DMG.EVTYPE, in the table above give the total damage for the ten event types with the largest damage. For example, FLOOD causes the highest damage, about $150 billion (the exact value is shown in the table). HURRICANE/TYPHOON is second, with damage of about $71.9 billion.
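The conversion code pairs the sorted exponent symbols with the Multiplier vector by position, so the result depends on how the symbols happen to sort. Also, `match()` returns NA for any symbol that is not in the lookup table. A possible alternative is an explicit named lookup, sketched below. The symbol-to-value pairs are our assumed reading of the positional vector, and `exp_lookup` and `exp_to_multiplier` are hypothetical names introduced only for this illustration.

```r
# Assumed reading of the positional Multiplier vector, written out explicitly
exp_lookup <- c(
  "-" = 0, "?" = 0, "+" = 1,
  "0" = 10, "1" = 10, "2" = 10, "3" = 10, "4" = 10,
  "5" = 10, "6" = 10, "7" = 10, "8" = 10,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: turn exponent symbols into numeric multipliers
exp_to_multiplier <- function(symbol) {
  symbol <- toupper(as.character(symbol))  # treat "k" and "K" alike
  mult <- unname(exp_lookup[symbol])       # blank or unknown symbols give NA here
  mult[is.na(mult)] <- 0                   # assumption: count them as 0
  mult
}

# Made-up rows to show the conversion
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3), PROPDMGEXP = c("K", "m", "B", ""))
toy$PROP.USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because the lookup is keyed by name, the same helper could be applied to both PROPDMGEXP and CROPDMGEXP without relying on the two columns sharing the same set of symbols.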
In the sections below, we show the variation in fatalities across event types using a bar plot. The code given below is used to create this plot.
library(ggplot2)
g <- ggplot(df.fatalities[1:10,], aes(x=reorder(EVTYPE, -total.fatalities), y=total.fatalities))+geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))+ggtitle("Top 10 Events with Highest Total Fatalities") +labs(x="EVENT TYPE", y="Total Fatalities")
g
The top 10 events with the highest total fatalities are shown graphically above. The plot below shows the highest total injuries by event type, such as Tornado and Excessive Heat.
g <- ggplot(df.injuries[1:10,], aes(x=reorder(EVTYPE, -total.injuries), y=total.injuries))+geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))+ggtitle("Top 10 Events with Highest Total Injuries") +labs(x="EVENT TYPE", y="Total Injuries")
g
The plot shows that the most injuries are caused by tornadoes, followed by thunderstorm wind (TSTM WIND), flood, etc. Only the top 10 event types are shown in this plot.
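The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types. If we assume that "TSTM" is shorthand for "THUNDERSTORM", the two labels could be merged before ranking. The sketch below does this in base R using four totals copied from the injuries table. It only covers the rows shown in the report, so it is an illustration rather than a full cleaning of EVTYPE.

```r
# Injury totals copied from the top-10 table in this report
injuries <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND"),
  total.injuries = c(91346, 6957, 6789, 1488)
)

# Assumption: a leading "TSTM" stands for "THUNDERSTORM"
injuries$EVTYPE <- sub("^TSTM", "THUNDERSTORM", injuries$EVTYPE)

# Re-aggregate so the merged label gets a single total, then sort descending
merged <- aggregate(total.injuries ~ EVTYPE, data = injuries, FUN = sum)
merged <- merged[order(-merged$total.injuries), ]
print(merged)
```

With these four rows, the merged thunderstorm wind total is 8445, which keeps it in second place behind TORNADO.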
Now we depict the variation in economic damage across storm event types. The top 10 events with the highest total economic damage, combining property and crop damage, are shown below.
g <- ggplot(df.damage.total[1:10,], aes(x=reorder(EVTYPE, -TOTAL.DMG.EVTYPE), y=TOTAL.DMG.EVTYPE))+geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))+ggtitle("Top 10 Events with Highest Economic Impact") +labs(x="EVENT TYPE", y="Total Economic Impact ($USD)")
g
The plot above makes it clear that the highest economic damage is caused by flooding, followed by Hurricane/Typhoon and Tornado. These are the top three events, with a larger economic impact than other events such as Ice Storm and Tropical Storm.
In this project, the health and economic impacts of storms have been studied using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. NOAA began tracking a standard set of 48 storm data events in 1996. The analysis found that floods cause the greatest economic impact in terms of crop and property damage, followed by hurricanes/typhoons, while tornadoes take the greatest toll on the population in terms of injuries and fatalities. The report covered the specifics of data retrieval and preparation for the analysis, and presented the results in two categories: (1) types of events that are most harmful to population health, and (2) types of events that have the greatest economic consequences.
Tornado events top the list in the analysis, with nearly three times the fatalities of the second-place event type, Excessive Heat (5633 versus 1903), and about thirteen times the injuries of the second-place TSTM WIND (91346 versus 6957). Excessive Heat is still worth noting: although it is far behind tornadoes in total health impact, it ranks second in fatalities and fourth in injuries.
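The gap between tornadoes and the second-place event types can be checked directly from the totals in the fatalities and injuries tables. The short calculation below copies those four values by hand. The variable names are introduced here only for illustration.

```r
# Totals copied from the fatalities and injuries tables in this report
fatalities <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903)
injuries   <- c("TORNADO" = 91346, "TSTM WIND" = 6957)

# Ratio of the top event type to the runner-up in each table
fatality_ratio <- fatalities[["TORNADO"]] / fatalities[["EXCESSIVE HEAT"]]
injury_ratio   <- injuries[["TORNADO"]] / injuries[["TSTM WIND"]]

print(round(fatality_ratio, 2))  # about 2.96
print(round(injury_ratio, 2))    # about 13.13
```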