Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The basic goal of this project is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Questions To Be Answered
Your data analysis must address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data [47Mb]
There is also some documentation of the database available, describing how some of the variables are constructed and defined:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
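Because coverage is uneven over time, it can help to count recorded events per year before deciding whether to restrict the analysis to recent years. The sketch below assumes that the full data set's BGN_DATE column (which is not among the columns kept later) stores dates as month/day/year strings such as "4/18/1950 0:00:00". The toy rows are hypothetical and only illustrate the calculation.

```r
# Hypothetical rows standing in for stormdata$BGN_DATE;
# the "m/d/Y H:M:S" string format is an assumption about the raw file.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
               "2/20/1951 0:00:00", "11/15/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse only the date part; trailing time text is ignored by strptime()
event_year <- format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y")

# Number of recorded events per year
events_per_year <- table(event_year)
print(events_per_year)
```

On the real data, the same two lines applied to stormdata$BGN_DATE would show how sparse the early years are.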
library(data.table)
library(ggplot2)
library(plyr)
library(dplyr)
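url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Keep the .bz2 extension: the file is bzip2-compressed, and read.csv() decompresses it transparently
if (!file.exists("StormData.csv.bz2")) {
  download.file(url, "StormData.csv.bz2", mode = "wb")
}
stormdata <- read.csv("StormData.csv.bz2", header = TRUE, sep = ",")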
cols <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
stormsubset <- stormdata[, cols]
summary(stormsubset)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG          CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000                     
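R's file connections recognise bzip2 compression from a file's contents (its magic number), not from its name, so read.csv() can read the downloaded archive without a separate decompression step. The self-contained sketch below writes a small, hypothetical table through bzfile() to a temporary file and reads it back with read.csv().

```r
# Write a tiny, hypothetical CSV through a bzip2-compressing connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path via file(), which detects the bzip2 compression
roundtrip <- read.csv(path, stringsAsFactors = FALSE)
print(roundtrip)
```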
Check the unique values of some of the variables. The following code is not run, because eval=FALSE is set in its code chunk.
unique(stormsubset$EVTYPE)
unique(stormsubset$FATALITIES)
unique(stormsubset$INJURIES)
unique(stormsubset$PROPDMG)
unique(stormsubset$CROPDMG)
# Drop the "?" event type and keep only rows with at least one non-zero health or damage value
stormsubset1 <- stormsubset[(stormsubset$EVTYPE != "?" & (stormsubset$INJURIES > 0 | stormsubset$FATALITIES > 0 | stormsubset$PROPDMG > 0 | stormsubset$CROPDMG > 0)), cols]
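The filter above keeps rows with a known event type and at least one non-zero harm or damage measure. An equivalent, somewhat more compact formulation counts the positive columns per row with rowSums() over a logical matrix. The toy data frame below is hypothetical.

```r
# Hypothetical rows mimicking the columns of stormsubset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "?", "HAIL", "FLOOD"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(0, 3, 0, 0),
  PROPDMG    = c(0, 5, 0, 2.5),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

harm_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# TRUE where at least one harm/damage column is positive
any_harm <- rowSums(toy[, harm_cols] > 0) > 0

# Drop the "?" event type and rows with no recorded harm
kept <- toy[toy$EVTYPE != "?" & any_harm, ]
print(kept$EVTYPE)
```

Listing the columns once in harm_cols makes it easy to change which measures count as harm.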
unique(stormsubset1$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
stormsubset1$PROPDMGEXP <- mapvalues(stormsubset1$PROPDMGEXP,
from = c("K", "M","", "B", "m", "+", "0", "5", "6", "4", "2", "3", "h",
"7", "H", "-"),
to = c(10^3, 10^6, 1, 10^9, 10^6, 0, 1, 10^5, 10^6, 10^4, 10^2, 10^3, 10^2,
10^7, 10^2, 0))
unique(stormsubset1$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k"
stormsubset1$CROPDMGEXP <- mapvalues(stormsubset1$CROPDMGEXP,
from = c("", "M", "K", "m", "B", "?", "0", "k"),
to = c(1, 10^6, 10^3, 10^6, 10^9, 0, 1, 10^3))
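plyr::mapvalues() passes any code missing from `from` through unchanged, so an unexpected code such as "8" would later become the number 8 rather than a multiplier. A base-R alternative based on match() returns NA for unmapped codes, which makes such gaps visible. The sketch below reuses this analysis's own mappings (including its choice of 0 for "+", "-" and "?"); the helper name exp_multiplier is hypothetical. match() is used rather than a named vector because R never matches the empty name "".

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Codes not listed in `from` come back as NA instead of passing through.
exp_multiplier <- function(code, from, to) {
  to[match(code, from)]
}

# Property-damage mapping copied from the mapvalues() call above
prop_from <- c("K", "M", "", "B", "m", "+", "0", "5", "6", "4", "2", "3",
               "h", "7", "H", "-")
prop_to   <- c(1e3, 1e6, 1, 1e9, 1e6, 0, 1, 1e5, 1e6, 1e4, 1e2, 1e3,
               1e2, 1e7, 1e2, 0)

# Crop-damage mapping copied from the mapvalues() call above
crop_from <- c("", "M", "K", "m", "B", "?", "0", "k")
crop_to   <- c(1, 1e6, 1e3, 1e6, 1e9, 0, 1, 1e3)

prop_mult <- exp_multiplier(c("K", "B", "", "5", "8"), prop_from, prop_to)
crop_mult <- exp_multiplier(c("k", "?", "M"), crop_from, crop_to)
print(prop_mult)
print(crop_mult)
```

Applied to the real data, this would read stormsubset1$PROPDMGEXP <- exp_multiplier(stormsubset1$PROPDMGEXP, prop_from, prop_to), and the result is already numeric, so no as.numeric() step is needed.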
class(stormsubset1$PROPDMGEXP)
## [1] "character"
class(stormsubset1$PROPDMG)
## [1] "numeric"
# Convert the multipliers to numbers and express property damage in billions of US dollars
stormsubset1$PROPDMGEXP <- as.numeric(as.character(stormsubset1$PROPDMGEXP))
stormsubset1$PROPDMGCOST <- (stormsubset1$PROPDMG * stormsubset1$PROPDMGEXP) / 10^9
class(stormsubset1$CROPDMGEXP)
## [1] "character"
class(stormsubset1$CROPDMG)
## [1] "numeric"
# Same conversion for crop damage, also in billions of US dollars
stormsubset1$CROPDMGEXP <- as.numeric(as.character(stormsubset1$CROPDMGEXP))
stormsubset1$CROPDMGCOST <- (stormsubset1$CROPDMG * stormsubset1$CROPDMGEXP) / 10^9
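Each damage estimate is the recorded value times its multiplier, and dividing by 10^9 expresses it in billions of US dollars, the unit of the economic-consequence table and plot. For example, a value of 25 with code "K" is 25 × 10^3 = 25,000 dollars, or 2.5 × 10^-5 billion. A small sketch on hypothetical rows whose exponent codes have already been converted to multipliers:

```r
# Hypothetical rows after the exponent codes have been converted to multipliers
toy <- data.frame(
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c(1e3, 1e9, 1),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c(1, 1e6, 1e3)
)

# Damage cost in billions of US dollars
toy$PROPDMGCOST <- toy$PROPDMG * toy$PROPDMGEXP / 1e9
toy$CROPDMGCOST <- toy$CROPDMG * toy$CROPDMGEXP / 1e9
toy$TOTALCOST   <- toy$PROPDMGCOST + toy$CROPDMGCOST
print(toy[, c("PROPDMGCOST", "CROPDMGCOST", "TOTALCOST")])
```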
TotalOnHealth1 <- aggregate(cbind(FATALITIES,INJURIES, SUM_FATALITIES_INJURIES=FATALITIES + INJURIES)~EVTYPE,
data = stormsubset1, FUN = sum)
TotalOnHealth2 <- TotalOnHealth1[order(-TotalOnHealth1$SUM_FATALITIES_INJURIES),][1:10,]
head(TotalOnHealth2)
## EVTYPE FATALITIES INJURIES SUM_FATALITIES_INJURIES
## 406 TORNADO 5633 91346 96979
## 60 EXCESSIVE HEAT 1903 6525 8428
## 422 TSTM WIND 504 6957 7461
## 85 FLOOD 470 6789 7259
## 257 LIGHTNING 816 5230 6046
## 150 HEAT 937 2100 3037
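One caveat with [1:10,] is that it pads the result with NA rows whenever there are fewer than ten event types, whereas head(x, 10) simply returns the rows that exist. The base-R sketch below wraps the aggregate, total, and rank steps in one reusable function. top_events() is a hypothetical helper, and the toy data are illustrative only.

```r
# Hypothetical helper: total the given columns per event type,
# add their row-wise sum, and return the n largest totals.
top_events <- function(df, value_cols, n = 10) {
  totals <- aggregate(df[value_cols], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  totals$TOTAL <- rowSums(totals[value_cols])
  totals <- totals[order(-totals$TOTAL), ]
  head(totals, n)  # never pads with NA rows
}

toy <- data.frame(
  EVTYPE     = c("A", "B", "A", "C"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(3, 1, 0, 5),
  stringsAsFactors = FALSE
)

res <- top_events(toy, c("FATALITIES", "INJURIES"), n = 10)
print(res)
```

The same helper would serve both questions, for example top_events(stormsubset1, c("PROPDMGCOST", "CROPDMGCOST")) for the economic ranking.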
TotalOnHealth2 <- as.data.table(TotalOnHealth2)
HealthImpact <- melt(TotalOnHealth2, id.vars="EVTYPE", variable.name="Fatalities_Injuries")
head(HealthImpact)
## EVTYPE Fatalities_Injuries value
## 1: TORNADO FATALITIES 5633
## 2: EXCESSIVE HEAT FATALITIES 1903
## 3: TSTM WIND FATALITIES 504
## 4: FLOOD FATALITIES 470
## 5: LIGHTNING FATALITIES 816
## 6: HEAT FATALITIES 937
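melt() reshapes the wide table (one row per event type) into long form (one row per event type and measure), which lets ggplot2 map the measure to the bar fill. SUM_FATALITIES_INJURIES is melted as well, so each event gets three bars. The base-R sketch below reproduces the same long layout without data.table, with all rows for the first measure, then the second, and so on. wide_to_long() is a hypothetical helper; passing only some measure columns is one way to leave the sum out of the plot.

```r
# Hypothetical helper: stack the measure columns of a wide data frame
# into long form, repeating the id column alongside each value.
wide_to_long <- function(wide, id_col, measure_cols, variable_name = "variable") {
  long <- data.frame(
    id       = rep(wide[[id_col]], times = length(measure_cols)),
    variable = factor(rep(measure_cols, each = nrow(wide)), levels = measure_cols),
    value    = unlist(wide[measure_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long) <- c(id_col, variable_name, "value")
  long
}

wide <- data.frame(
  EVTYPE     = c("A", "B"),
  FATALITIES = c(5, 2),
  INJURIES   = c(90, 7),
  stringsAsFactors = FALSE
)

long <- wide_to_long(wide, "EVTYPE", c("FATALITIES", "INJURIES"),
                     variable_name = "Fatalities_Injuries")
print(long)
```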
gg_plot1 <- ggplot(HealthImpact, aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(stat = "identity", aes(fill = Fatalities_Injuries), position = "dodge") +
ylab("Health Impact (number of people)") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
ggtitle("Top 10 Weather Events Harmful to Population Health in The US") +
theme(plot.title = element_text(hjust = 0.5))
gg_plot1
Totalcost1 <- aggregate(cbind(PROPDMGCOST,CROPDMGCOST,SUM_PROPDMGCOST_CROPDMGCOST=PROPDMGCOST + CROPDMGCOST)~
EVTYPE, data = stormsubset1, FUN = sum)
Totalcost2 <- Totalcost1[order(-Totalcost1$SUM_PROPDMGCOST_CROPDMGCOST),][1:10,]
head(Totalcost2)
## EVTYPE PROPDMGCOST CROPDMGCOST SUM_PROPDMGCOST_CROPDMGCOST
## 85 FLOOD 144.65771 5.6619684 150.31968
## 223 HURRICANE/TYPHOON 69.30584 2.6078728 71.91371
## 406 TORNADO 56.94738 0.4149533 57.36233
## 349 STORM SURGE 43.32354 0.0000050 43.32354
## 133 HAIL 15.73527 3.0259545 18.76122
## 72 FLASH FLOOD 16.82267 1.4213171 18.24399
Totalcost2 <- as.data.table(Totalcost2)
Economic_Consequence <- melt(Totalcost2, id.vars="EVTYPE", variable.name="PROPDMGCOST_CROPDMGCOST")
head(Economic_Consequence)
## EVTYPE PROPDMGCOST_CROPDMGCOST value
## 1: FLOOD PROPDMGCOST 144.65771
## 2: HURRICANE/TYPHOON PROPDMGCOST 69.30584
## 3: TORNADO PROPDMGCOST 56.94738
## 4: STORM SURGE PROPDMGCOST 43.32354
## 5: HAIL PROPDMGCOST 15.73527
## 6: FLASH FLOOD PROPDMGCOST 16.82267
gg_plot2 <- ggplot(Economic_Consequence, aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(stat = "identity", aes(fill = PROPDMGCOST_CROPDMGCOST), position = "dodge") +
ylab("Economic Consequences (billions of USD)") + xlab("Event Type") +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
ggtitle("Top 10 Weather Events with Greatest Economic Consequences in The US")
gg_plot2
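Question 1: The health-impact plot shows that TORNADO is by far the weather event most harmful to population health, with 5,633 fatalities and 91,346 injuries (96,979 people in total), more than ten times the total for the next event type, EXCESSIVE HEAT.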
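Question 2: gg_plot2 shows that FLOOD has the greatest economic consequences, about 150.3 billion US dollars in combined property and crop damage, roughly twice the total of the next event type, HURRICANE/TYPHOON (about 71.9 billion), followed by TORNADO (about 57.4 billion).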