Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
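The goal of this analysis is to explore the NOAA Storm Database and answer two basic questions about severe weather events across the United States: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?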
From these data, we found that tornadoes and heat are the most dangerous event types for people, while flooding, hurricanes, and storm surges are the most costly event types for the economy.
The data for this analysis come from the National Weather Service, which also provides documentation of the database.
# Download the Storm Data dataset (a bzip2-compressed CSV) if it is not already present
if (!file.exists("./data")) { dir.create("./data") }
destination.file <- "data/stormdata.csv.bz2"
if (!file.exists(destination.file)) {
  # mode = "wb" keeps the binary archive intact on Windows
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destination.file, method = "auto", mode = "wb")
}
# Load the data into the stormData variable; read.csv() decompresses .bz2 files transparently
stormData <- read.csv(destination.file)
This section addresses the two questions posed at the beginning of the report.
To assess harm to population health, we focus on the FATALITIES and INJURIES fields.
library(plyr)
# Total fatalities and injuries per event type, sorted from highest to lowest
fataldata <- arrange(ddply(stormData, .(EVTYPE), summarise, TotalFatalities = sum(FATALITIES)), desc(TotalFatalities))
injurdata <- arrange(ddply(stormData, .(EVTYPE), summarise, TotalInjuries = sum(INJURIES)), desc(TotalInjuries))
# Keep the six event types with the highest totals (head() returns 6 rows by default)
fataldataSmall <- head(fataldata)
injurdataSmall <- head(injurdata)
# Plot the two rankings side by side
par(mfcol = c(1, 2))
barplot(fataldataSmall$TotalFatalities, names.arg = fataldataSmall$EVTYPE, main = "Total fatalities by event type", cex.names = 0.6, las = 2)
barplot(injurdataSmall$TotalInjuries, names.arg = injurdataSmall$EVTYPE, main = "Total injuries by event type", cex.names = 0.6, las = 2)
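The ddply()/arrange() step above is a "sum per group, then sort descending" operation. As a sketch of the same idea without plyr, base R's aggregate() can compute the per-EVTYPE totals. The helper name totalsByEvent and the toy data frame are hypothetical and only mimic the EVTYPE, FATALITIES, and INJURIES columns.

```r
# Hypothetical helper: total one numeric field per EVTYPE and sort descending
totalsByEvent <- function(data, field) {
  # Sum the chosen field within each EVTYPE group
  totals <- aggregate(data[[field]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "Total"
  # Order from the largest total to the smallest
  totals[order(totals$Total, decreasing = TRUE), , drop = FALSE]
}

# Hypothetical toy data standing in for stormData
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 3, 2, 1, 1),
  INJURIES   = c(20, 1, 10, 2, 0)
)

totalsByEvent(toy, "FATALITIES")
```

On the toy data, TORNADO (7) ranks first, followed by HEAT (4) and FLOOD (1). Passing "INJURIES" instead produces the injury ranking in the same way.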
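To assess economic consequences, we focus on the PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP fields. Each *EXP field holds a magnitude code for the corresponding damage amount: H for hundreds, K for thousands, M for millions, and B for billions of dollars.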
# Convert damage amounts to dollars using the magnitude codes, matched case-insensitively
# (the EXP fields contain lowercase codes such as "m"); any other code (blank, digits,
# "+", "-", "?") leaves the amount unscaled
propExp <- toupper(stormData$PROPDMGEXP)
stormData$PROPDMGValue <- ifelse(propExp == "H", stormData$PROPDMG*100,
                          ifelse(propExp == "K", stormData$PROPDMG*1000,
                          ifelse(propExp == "M", stormData$PROPDMG*1e6,
                          ifelse(propExp == "B", stormData$PROPDMG*1e9,
                          stormData$PROPDMG))))
cropExp <- toupper(stormData$CROPDMGEXP)
stormData$CROPDMGValue <- ifelse(cropExp == "H", stormData$CROPDMG*100,
                          ifelse(cropExp == "K", stormData$CROPDMG*1000,
                          ifelse(cropExp == "M", stormData$CROPDMG*1e6,
                          ifelse(cropExp == "B", stormData$CROPDMG*1e9,
                          stormData$CROPDMG))))
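The nested ifelse() calls above amount to a lookup from magnitude code to multiplier. A more compact sketch of the same conversion uses a named vector as the lookup table. The names expMultiplier and toDollars are hypothetical, and the sketch assumes lowercase codes (for example "m") should be treated like their uppercase counterparts.

```r
# Hypothetical lookup table: magnitude code -> dollar multiplier
expMultiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert raw damage amounts to dollars.
# Codes are upper-cased before lookup; unknown codes (blank, digits,
# "+", "-", "?") or NA produce an NA lookup, which is replaced by 1
# so the amount is left unscaled.
toDollars <- function(amount, exp) {
  mult <- expMultiplier[toupper(as.character(exp))]
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

toDollars(c(2.5, 10, 3, 7), c("K", "m", "", "B"))
```

Here 2.5 with code "K" becomes 2500, 10 with "m" becomes 1e7, 3 with a blank code stays 3, and 7 with "B" becomes 7e9. Applied to the real data, a call such as toDollars(stormData$PROPDMG, stormData$PROPDMGEXP) would replace one of the nested ifelse() chains.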
# Total property and crop damage (in dollars) per event type, sorted from highest to lowest
propdmgdata <- arrange(ddply(stormData, .(EVTYPE), summarise, TotalDamage = sum(PROPDMGValue)), desc(TotalDamage))
cropdmgdata <- arrange(ddply(stormData, .(EVTYPE), summarise, TotalDamage = sum(CROPDMGValue)), desc(TotalDamage))
# Keep the six event types with the highest totals (head() returns 6 rows by default)
propdmgdataSmall <- head(propdmgdata)
cropdmgdataSmall <- head(cropdmgdata)
# Plot both rankings side by side, in millions of dollars
par(mfcol = c(1, 2))
barplot(propdmgdataSmall$TotalDamage/1e6, names.arg = propdmgdataSmall$EVTYPE, main = "Property damage\n(million $)", cex.axis = 0.8, cex.names = 0.5, las = 2)
barplot(cropdmgdataSmall$TotalDamage/1e6, names.arg = cropdmgdataSmall$EVTYPE, main = "Crop damage\n(million $)", cex.axis = 0.8, cex.names = 0.5, las = 2)
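Property and crop damage are ranked separately above. If a single economic ranking is wanted, one option is to add the two dollar amounts for each record before aggregating. This is a sketch under that assumption. The helper combinedDamage and the toy table are hypothetical, and the toy values do not come from the NOAA data.

```r
# Hypothetical toy table standing in for stormData after the
# PROPDMGValue and CROPDMGValue columns (in dollars) have been computed
damage <- data.frame(
  EVTYPE       = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPDMGValue = c(5e9, 0, 1e9, 2e8),
  CROPDMGValue = c(1e8, 4e9, 0, 3e8)
)

# Hypothetical helper: rank event types by combined property + crop damage
combinedDamage <- function(data) {
  data$TotalDamage <- data$PROPDMGValue + data$CROPDMGValue
  # Sum the combined damage within each EVTYPE group
  totals <- aggregate(TotalDamage ~ EVTYPE, data = data, FUN = sum)
  # Order from the most to the least costly event type
  totals[order(totals$TotalDamage, decreasing = TRUE), , drop = FALSE]
}

combinedDamage(damage)
```

On the toy table, FLOOD totals 6.1e9 dollars and ranks first, ahead of DROUGHT (4e9) and HAIL (5e8).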
Event type causing the most fatalities: TORNADO
Event type causing the most injuries: TORNADO
Event type causing the greatest property damage: FLOODS
Event type causing the greatest crop damage: DROUGHT