Summary: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
library(ggplot2)
library(reshape2)
Load the dataset:
storms <- read.csv("repdata_data_StormData.csv", na.strings="")
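Before starting the analysis, it is better to remove irrelevant columns from the data frame and clean the data. The damage figures also need converting to numeric dollar values: each amount (CROPDMG, PROPDMG) is multiplied by the magnitude that its exponent code (CROPDMGEXP, PROPDMGEXP) stands for, with K for thousands, M for millions and B for billions.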
storms_processed <- storms
# Scale CROPDMG by its exponent code (K = 1e3, M = 1e6, B = 1e9); other codes are left unscaled
not.na <- !is.na(storms$CROPDMGEXP)
B <- storms$CROPDMGEXP=="b"|storms$CROPDMGEXP=="B"
M <- storms$CROPDMGEXP=="m"|storms$CROPDMGEXP=="M"
K <- storms$CROPDMGEXP=="k"|storms$CROPDMGEXP=="K"
storms_processed$CROPDMG[not.na&K] = storms$CROPDMG[not.na&K]*1000
storms_processed$CROPDMG[not.na&M] = storms$CROPDMG[not.na&M]*1000000
storms_processed$CROPDMG[not.na&B] = storms$CROPDMG[not.na&B]*1000000000
# Scale PROPDMG by its exponent code (K = 1e3, M = 1e6, B = 1e9); other codes are left unscaled
not.na <- !is.na(storms$PROPDMGEXP)
B <- storms$PROPDMGEXP=="b"|storms$PROPDMGEXP=="B"
M <- storms$PROPDMGEXP=="m"|storms$PROPDMGEXP=="M"
K <- storms$PROPDMGEXP=="k"|storms$PROPDMGEXP=="K"
storms_processed$PROPDMG[not.na&K] = storms$PROPDMG[not.na&K]*1000
storms_processed$PROPDMG[not.na&M] = storms$PROPDMG[not.na&M]*1000000
storms_processed$PROPDMG[not.na&B] = storms$PROPDMG[not.na&B]*1000000000
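The same scaling can be sketched more compactly with a named lookup vector. This is only an illustrative alternative, not the code used for the results in this report: `scale_damage` is a hypothetical helper, and it assumes, as the code above does, that any exponent code other than K, M or B (including a missing one) leaves the amount unchanged.

```r
# Hypothetical helper, not part of the original analysis.
scale_damage <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code case-insensitively; unknown or missing codes yield NA
  m <- unname(multipliers[toupper(as.character(exp_code))])
  # Assumption: amounts with an unknown or missing code are left as they are
  m[is.na(m)] <- 1
  value * m
}

# Made-up amounts and codes for illustration
scaled <- scale_damage(c(2.5, 10, 1, 7), c("K", "m", "B", NA))
print(scaled)
```

With this helper, each damage column could be converted in a single call such as `scale_damage(storms$CROPDMG, storms$CROPDMGEXP)`.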
Combine property and crop damage into a single variable called “Total_damage”:
storms_processed$Total_damage <- storms_processed$PROPDMG + storms_processed$CROPDMG
Sum the injuries, fatalities and total damage for each event type:
storms_summary <- aggregate(cbind(INJURIES, FATALITIES, Total_damage) ~ EVTYPE, data=storms_processed, FUN=sum)
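One detail of the formula interface is worth keeping in mind: `aggregate()` drops any row with a missing value in one of the listed columns before summing, because its default `na.action` is `na.omit`. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the NOAA file):

```r
# Hypothetical toy data; the last row has a missing INJURIES value.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO"),
  INJURIES   = c(1, 10, 2, NA),
  FATALITIES = c(0, 3, 1, 2)
)

# Same pattern as above: several measures summed per event type.
toy_summary <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_summary)
```

In this sketch the fourth row is removed entirely, so its two fatalities do not count towards the TORNADO total. If a column in the real data contained missing values, the sums for the other columns would be affected in the same way.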
Total_damage is the combined property and crop damage for each event type and is used here as the measure of economic impact. The three costliest event types are:
damage_sorted <- storms_summary[with(storms_summary,order(-Total_damage)),c("EVTYPE", "Total_damage")]
head(damage_sorted,3)
##                EVTYPE Total_damage
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57352114049
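Floods are the most damaging events, with about $150 billion in combined property and crop damage, followed by hurricanes/typhoons (about $72 billion) and tornadoes (about $57 billion).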
ggplot(head(damage_sorted,10), aes(reorder(EVTYPE,Total_damage), Total_damage)) +
  geom_bar(stat = "identity", fill = "#FF9999", colour = "black") +
  coord_flip() +
  xlab("EVENT") +
  ylab("TOTAL DAMAGE ($)") +
  ggtitle("EVENTS WITH GREATEST ECONOMIC CONSEQUENCES")
The impact on health is summarized below:
top_injuries <- storms_summary[with(storms_summary,order(-INJURIES)),c("EVTYPE", "FATALITIES", "INJURIES")]
head(top_injuries,5)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 130 EXCESSIVE HEAT       1903     6525
## 464      LIGHTNING        816     5230
top_fatalities <- storms_summary[with(storms_summary,order(-FATALITIES)),c("EVTYPE", "FATALITIES", "INJURIES")]
head(top_fatalities,5)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 130 EXCESSIVE HEAT       1903     6525
## 153    FLASH FLOOD        978     1777
## 275           HEAT        937     2100
## 464      LIGHTNING        816     5230
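Tornadoes are the most devastating events for public health, with 5,633 fatalities and 91,346 injuries, the highest count on both measures.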
top_injuries_10 <- head(top_injuries, 10)
most_inj_10 <- melt(top_injuries_10, id.vars="EVTYPE")
ggplot(most_inj_10, aes(reorder(EVTYPE,value), value, fill = variable)) +
  geom_bar(stat = "identity") +
  xlab("EVENT TYPE") +
  ylab("# VICTIMS") +
  coord_flip() +
  ggtitle("NUMBER OF INJURIES AND FATALITIES")
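The `melt()` call above reshapes the table from one row per event type to one row per event type and measure, which is the layout the `fill = variable` aesthetic needs. A minimal base-R sketch of that long layout, using two rows from the injuries table above (the `wide` and `long` names are illustrative only, and `melt()` itself stores `variable` as a factor rather than a character column):

```r
# Toy wide table built from two rows of the output above.
wide <- data.frame(
  EVTYPE     = c("TORNADO", "TSTM WIND"),
  FATALITIES = c(5633, 504),
  INJURIES   = c(91346, 6957)
)

# One row per (event type, measure) pair, mirroring the layout of
# melt(wide, id.vars = "EVTYPE"): columns EVTYPE, variable, value.
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = 2),
  variable = rep(c("FATALITIES", "INJURIES"), each = nrow(wide)),
  value    = c(wide$FATALITIES, wide$INJURIES)
)
print(long)
```

Because the bars are stacked, the length of each bar in the plot is the sum of fatalities and injuries for that event type.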
Tornadoes cause the most harm to human health, and floods cause the greatest economic damage.