The goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and analyze the impact of severe weather events. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The download URL appears in the download.file call in the code below.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
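The uneven coverage across years can be checked by counting events per year. The following is a minimal sketch in base R on a made-up vector; it assumes the start dates are stored as text in a month/day/year layout followed by a time (as in a column such as BGN_DATE), which should be verified against the actual file before use.

```r
# Toy stand-in for a start-date column. The values and the
# "month/day/year time" layout are assumptions for illustration.
bgnDate <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
             "11/15/2011 0:00:00", "11/28/2011 0:00:00",
             "3/2/2011 0:00:00")

# Drop the time part, parse the date, and extract the four-digit year.
dateOnly  <- sub(" .*$", "", bgnDate)
eventYear <- as.integer(format(as.Date(dateOnly, format = "%m/%d/%Y"), "%Y"))

# Count events per year; table() returns counts named by year.
eventsPerYear <- table(eventYear)
print(eventsPerYear)
```

On the real data, the same counts could be plotted to see how the number of recorded events grows over time.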
To measure the impact on the population, we sum the FATALITIES and INJURIES columns for each event type.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
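## Download the data file if it is not already present.
if (!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2", mode = "wb")
}
## Read the data into R; read.csv reads the bzip2-compressed file directly
stormData <- read.csv("stormData.csv.bz2", sep = ",", header = TRUE, stringsAsFactors = FALSE)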
## Keep only the columns needed for the health and economic analyses
reqStormData <- stormData %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
## Total fatalities and injuries per event type, largest combined total first
stormEvents <- reqStormData %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES, na.rm = TRUE),
            INJURIES = sum(INJURIES, na.rm = TRUE),
            TOTAL = FATALITIES + INJURIES) %>%
  arrange(desc(TOTAL))
## Warning: package 'bindrcpp' was built under R version 3.4.3
# Convert letter exponents (either case) to the matching power of ten
reqStormData$PROPDMGEXP[reqStormData$PROPDMGEXP %in% c("K", "k")] <- 3
reqStormData$PROPDMGEXP[reqStormData$PROPDMGEXP %in% c("M", "m")] <- 6
reqStormData$PROPDMGEXP[reqStormData$PROPDMGEXP %in% c("B", "b")] <- 9
reqStormData$CROPDMGEXP[reqStormData$CROPDMGEXP %in% c("K", "k")] <- 3
reqStormData$CROPDMGEXP[reqStormData$CROPDMGEXP %in% c("M", "m")] <- 6
reqStormData$CROPDMGEXP[reqStormData$CROPDMGEXP %in% c("B", "b")] <- 9
# Multiply property and crop damage by 10 raised to the power of the exponent.
# Codes that are not numeric after the conversion above (for example an empty
# string) become NA in as.numeric(), so the damage for those rows becomes NA.
reqStormData$PROPDMG <- reqStormData$PROPDMG *
  10^suppressWarnings(as.numeric(reqStormData$PROPDMGEXP))
reqStormData$CROPDMG <- reqStormData$CROPDMG *
  10^suppressWarnings(as.numeric(reqStormData$CROPDMGEXP))
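## Total damage per event type. Note: this aggregates stormData, so PROPDMG and
## CROPDMG are summed as recorded, without the multipliers applied to reqStormData.
stormLoss <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(PROP_LOSS = sum(PROPDMG), CROP_LOSS = sum(CROPDMG),
            ECONOMIC_LOSS = PROP_LOSS + CROP_LOSS) %>%
  arrange(desc(ECONOMIC_LOSS))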
First, we’d like to analyze which types of severe weather events are most harmful to the population.
We plot the top events with maximum casualties to verify the outcome.
## Plot the total no. of casualties per event type
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.3
ggplot(stormEvents[1:30, ], aes(x = EVTYPE, y = TOTAL)) +
  geom_bar(stat = "identity") +
  xlab("Event Type") +
  ylab("Total Casualties") +
  ggtitle("Harmful Events") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Based on the bar chart above, we find that tornadoes caused the most casualties (fatalities and injuries combined).
Now let’s look at the economic impact. First we need to calculate the total economic cost of each event. Multipliers need to be applied to the damage amounts according to the codes in the ‘EXP’ fields.
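The multiplier step described above can be sketched with a small lookup table. This is a minimal illustration in base R on made-up data, not the report's own pipeline: the toy data frame, the expPower lookup, and the PROP_USD column are hypothetical names, and treating unrecognized codes as a multiplier of 1 is an assumption of this sketch.

```r
# Hypothetical toy data; the column names mirror the storm data,
# the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 1, 10, 3),
  PROPDMGEXP = c("B", "m", "K", ""),
  stringsAsFactors = FALSE
)

# Lookup from exponent letter to power of ten.
expPower <- c(K = 3, M = 6, B = 9)

# Upper-case each code and look it up. Codes that are not in the table
# come back as NA and are set to 0 here, i.e. a multiplier of 10^0 = 1
# (an assumption of this sketch).
power <- unname(expPower[toupper(toy$PROPDMGEXP)])
power[is.na(power)] <- 0

# Scale the recorded amount into dollars.
toy$PROP_USD <- toy$PROPDMG * 10^power

# Sum the scaled damage per event type and sort, largest first.
propLoss <- aggregate(PROP_USD ~ EVTYPE, data = toy, FUN = sum)
propLoss <- propLoss[order(-propLoss$PROP_USD), ]
print(propLoss)
```

The point of the sketch is that the per-event totals are computed from the scaled column, so the aggregation has to read from the data frame that holds the multiplied values.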
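ggplot(stormLoss[1:20, ], aes(EVTYPE, ECONOMIC_LOSS)) +
  geom_bar(stat = "identity") +
  xlab("Event Type") +
  ylab("Total Economic Loss") +
  ggtitle("Storm Economics") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Based on the bar chart above, we find that tornadoes show the largest total economic loss. Because stormLoss is aggregated from stormData rather than reqStormData, these totals are sums of the PROPDMG and CROPDMG values as recorded, without the exponent multipliers applied.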