The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Storms and other severe weather events cause both public health and economic problems for communities and municipalities. This database tracks characteristics of major storms and weather events in the U.S., including when and where they occur and estimates of any fatalities, injuries, and property damage. Your data analysis must address the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis shows that Tornado is the most harmful event type with respect to population health, and that Flood is the event type with the greatest economic consequences.
library(knitr)
library(ggplot2)
Loading data.
dsNOAA <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)
head(dsNOAA)
Subset the NOAA storm database to the columns needed for the analysis.
tidyNOAA <- dsNOAA[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
head(tidyNOAA)
str(tidyNOAA)
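Before converting damage amounts, it can help to check which unit codes the PROPDMGEXP and CROPDMGEXP columns contain, since the raw data may include codes other than H, K, M, and B (lowercase letters or blanks, for example). The sketch below tabulates the codes on a small made-up data frame; `toy`, `known`, and `unexpected` are illustrative names, not part of the original analysis.

```r
# Toy stand-in for the real data (values are invented for illustration).
toy <- data.frame(
  PROPDMGEXP = c("K", "K", "M", "", "B", "k", "0"),
  stringsAsFactors = FALSE
)

# Count how often each unit code occurs.
codes <- table(toy$PROPDMGEXP, useNA = "ifany")

# Codes that an H/K/M/B-only conversion would leave untouched.
known <- c("H", "K", "M", "B")
unexpected <- setdiff(names(codes), known)

print(codes)
print(unexpected)
```

On the real data, the same two steps could be run with `tidyNOAA$PROPDMGEXP` and `tidyNOAA$CROPDMGEXP` in place of the toy column.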
To calculate the economic damage, the following variables must be used:
1. PROPDMG and CROPDMG: the amount (without unit) of property damage and crop damage.
2. PROPDMGEXP and CROPDMGEXP: the unit of the above variables, expressed as a power of 10 (H, K, M, and B correspond to hundreds, thousands, millions, and billions, respectively).
Convert the H, K, M, B units to calculate property damage. First create a column initialized to zero.
tidyNOAA$PROPDMGNUM = 0
Fill in the dollar amounts by applying the multiplier for each unit code. Rows with any other code keep a value of 0.
tidyNOAA[tidyNOAA$PROPDMGEXP == "H", ]$PROPDMGNUM = tidyNOAA[tidyNOAA$PROPDMGEXP == "H", ]$PROPDMG * 10^2
tidyNOAA[tidyNOAA$PROPDMGEXP == "K", ]$PROPDMGNUM = tidyNOAA[tidyNOAA$PROPDMGEXP == "K", ]$PROPDMG * 10^3
tidyNOAA[tidyNOAA$PROPDMGEXP == "M", ]$PROPDMGNUM = tidyNOAA[tidyNOAA$PROPDMGEXP == "M", ]$PROPDMG * 10^6
tidyNOAA[tidyNOAA$PROPDMGEXP == "B", ]$PROPDMGNUM = tidyNOAA[tidyNOAA$PROPDMGEXP == "B", ]$PROPDMG * 10^9
head(tidyNOAA, 100)
Convert the H, K, M, B units to calculate crop damage. Create a column initialized to zero.
tidyNOAA$CROPDMGNUM = 0
Fill in the dollar amounts by applying the multiplier for each unit code.
tidyNOAA[tidyNOAA$CROPDMGEXP == "H", ]$CROPDMGNUM = tidyNOAA[tidyNOAA$CROPDMGEXP == "H", ]$CROPDMG * 10^2
tidyNOAA[tidyNOAA$CROPDMGEXP == "K", ]$CROPDMGNUM = tidyNOAA[tidyNOAA$CROPDMGEXP == "K", ]$CROPDMG * 10^3
tidyNOAA[tidyNOAA$CROPDMGEXP == "M", ]$CROPDMGNUM = tidyNOAA[tidyNOAA$CROPDMGEXP == "M", ]$CROPDMG * 10^6
tidyNOAA[tidyNOAA$CROPDMGEXP == "B", ]$CROPDMGNUM = tidyNOAA[tidyNOAA$CROPDMGEXP == "B", ]$CROPDMG * 10^9
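The eight assignments above repeat one pattern: look up a multiplier for the unit code and multiply. A minimal sketch of the same idea with a named lookup vector follows. The helper `to_dollars` and the toy data are hypothetical, and the sketch upper-cases the codes first, so it also counts lowercase `k` and `m`, which the code above does not. Unrecognized codes get a multiplier of 0, matching the zero-initialized columns above.

```r
# Hypothetical helper: convert an amount and its unit code to dollars.
to_dollars <- function(amount, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Look up each code by name; codes not in the table come back as NA.
  m <- unname(multipliers[toupper(as.character(exp_code))])
  # Assumption: unknown or blank codes contribute nothing.
  m[is.na(m)] <- 0
  amount * m
}

# Toy data (invented values) covering known, blank, and lowercase codes.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 10),
  PROPDMGEXP = c("K", "M", "B", "", "k"),
  stringsAsFactors = FALSE
)
toy$PROPDMGNUM <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Applied to the real data, this would read `to_dollars(tidyNOAA$PROPDMG, tidyNOAA$PROPDMGEXP)`, and likewise for the crop columns. Because lowercase codes are included, the totals may differ slightly from those computed above.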
Plot the ten event types with the most fatalities.
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=tidyNOAA, sum)
fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)
ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities") + ggtitle("Number of fatalities by top 10 Weather Events")
Plot the ten event types with the most injuries.
injuries <- aggregate(INJURIES ~ EVTYPE, data=tidyNOAA, sum)
injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Injuries") + ggtitle("Number of injuries by top 10 Weather Events")
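The two plots above rank fatalities and injuries separately. To answer the population health question with a single ranking, one option is to aggregate both columns at once and sort by their sum. The sketch below does this on invented toy data. Weighting a fatality and an injury equally is an assumption made only for illustration, and `CASUALTIES` is a hypothetical column name.

```r
# Toy data (invented values) with the same column names as tidyNOAA.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HEAT", "FLOOD"),
  FATALITIES = c(5, 3, 2, 4, 1, 0),
  INJURIES   = c(50, 20, 1, 10, 5, 3),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in one call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Assumption: fatalities and injuries count equally toward the ranking.
health$CASUALTIES <- health$FATALITIES + health$INJURIES
health <- health[order(-health$CASUALTIES), ]
print(health)
```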
Plot the ten event types with the greatest combined property and crop damage.
damages <- aggregate(PROPDMGNUM + CROPDMGNUM ~ EVTYPE, data=tidyNOAA, sum)
names(damages) = c("EVTYPE", "TOTALDAMAGE")
damages <- damages[order(-damages$TOTALDAMAGE), ][1:10, ]
damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)
ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Damages ($)") + ggtitle("Property & Crop Damages by top 10 Weather Events")
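The damage plot shows only the combined total in raw dollars. A small table that keeps property and crop damage apart, and expresses the total in billions, can make the ranking easier to read. The sketch below uses invented toy values, and `TOTAL_BILLIONS` is a hypothetical column name.

```r
# Toy data (invented values) using the converted damage columns.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGNUM = c(5e9, 1e9, 1e8, 2e9),
  CROPDMGNUM = c(5e8, 0, 3e9, 1e9),
  stringsAsFactors = FALSE
)

# Sum property and crop damage separately per event type.
breakdown <- aggregate(cbind(PROPDMGNUM, CROPDMGNUM) ~ EVTYPE, data = toy, FUN = sum)

# Combined damage in billions of dollars, sorted from largest to smallest.
breakdown$TOTAL_BILLIONS <- (breakdown$PROPDMGNUM + breakdown$CROPDMGNUM) / 1e9
breakdown <- breakdown[order(-breakdown$TOTAL_BILLIONS), ]
print(breakdown)
```

Keeping the two components side by side also shows whether an event type ranks high because of property losses, crop losses, or both.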