Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

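The data analysis must address the following questions:

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Across the United States, which types of events have the greatest economic consequences?

Aggregating the data by storm event type, this analysis shows:

  • Tornadoes are the most harmful event type with respect to population health.

  • Floods have the greatest economic consequences.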

Data Pre-Processing

# Download the data if it is not already present
if(!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
  destfile = "stormData.csv.bz2", method = "curl")
}

# Load the data (read.csv reads the bzip2-compressed file through bzfile)
storm_data <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)

# Keep only the columns needed for the health and economic analysis
tidy_storm_data <- storm_data[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

# Multipliers for the magnitude codes H (hundreds), K (thousands), M (millions), B (billions)
exp_multiplier <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)

# Property damage in dollars; rows with any other code stay at 0
prop_mult <- unname(exp_multiplier[as.character(tidy_storm_data$PROPDMGEXP)])
tidy_storm_data$PROPDMGNUM <- ifelse(is.na(prop_mult), 0, tidy_storm_data$PROPDMG * prop_mult)

# Crop damage in dollars; rows with any other code stay at 0
crop_mult <- unname(exp_multiplier[as.character(tidy_storm_data$CROPDMGEXP)])
tidy_storm_data$CROPDMGNUM <- ifelse(is.na(crop_mult), 0, tidy_storm_data$CROPDMG * crop_mult)
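
The conversion above matches only the uppercase codes H, K, M and B, so a row coded with a lowercase letter would contribute 0. As a minimal sketch on made-up rows (the `toy` data frame and the `to_dollars` helper are hypothetical and not part of the analysis), the snippet below shows one way to upper-case the codes before applying the same multipliers. Whether lowercase codes should be read the same way as uppercase ones is an assumption.

```r
# Hypothetical toy rows; the values are made up for illustration only
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 1, 7),
  PROPDMGEXP = c("K", "M", "k", "B", ""),
  stringsAsFactors = FALSE
)

# Same multipliers as in the pre-processing step
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its magnitude code to dollars.
# Codes are upper-cased first (an assumption); unknown or empty codes yield 0.
to_dollars <- function(value, code) {
  mult <- unname(exp_multiplier[toupper(as.character(code))])
  ifelse(is.na(mult), 0, value * mult)
}

toy$PROPDMGNUM <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Running `table(tidy_storm_data$PROPDMGEXP)` on the real data is a quick way to see which codes occur and how many rows a change like this would affect.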

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

# Load ggplot2 to plot the results
library(ggplot2)
# Sum fatalities per event type and keep the 10 highest totals
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=tidy_storm_data, sum)
fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)

# Bar chart of the top 10 event types; x-axis labels are rotated so they stay readable
fatality_graph <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
    geom_bar(stat = "identity", fill = "gray") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities") + ggtitle("Number of Fatalities by Top 10 Weather Events")
print(fatality_graph)

# Sum injuries per event type and keep the 10 highest totals (ggplot2 is already loaded)
injuries <- aggregate(INJURIES ~ EVTYPE, data=tidy_storm_data, sum)
injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)

# Bar chart of the top 10 event types; x-axis labels are rotated so they stay readable
injuries_graph <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
    geom_bar(stat = "identity", fill = "gray") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Injuries") + ggtitle("Number of Injuries by Top 10 Weather Events")
print(injuries_graph)

Across the United States, which types of events have the greatest economic consequences?

# Sum property and crop damage per event type and keep the 10 highest totals (ggplot2 is already loaded)
damages <- aggregate(PROPDMGNUM + CROPDMGNUM ~ EVTYPE, data=tidy_storm_data, sum)
names(damages) = c("EVTYPE", "TOTALDAMAGE")
damages <- damages[order(-damages$TOTALDAMAGE), ][1:10, ]
damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)

# Bar chart of the top 10 event types; x-axis labels are rotated so they stay readable
property_crop_damages_graph <- ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) + 
    geom_bar(stat = "identity", fill = "gray") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Damages ($)") + ggtitle("Property and Crop Damages by Top 10 Weather Events")
print(property_crop_damages_graph)
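
The three result blocks above repeat the same steps: aggregate by event type, sort in decreasing order, and keep the first 10 rows. As a sketch only, the hypothetical helper `top_events()` below factors those steps into one function and is exercised on made-up rows. It is not part of the original analysis, and the `toy` data frame is an assumption for illustration.

```r
# Hypothetical helper: sum `column` per EVTYPE and keep the n largest totals
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(-totals[[column]]), ]
  totals <- head(totals, n)
  # Fix the factor level order so a bar plot keeps the ranking
  totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
  rownames(totals) <- NULL
  totals
}

# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With such a helper, the fatalities table would be `top_events(tidy_storm_data, "FATALITIES")`; total damage would first need a combined column, for example `PROPDMGNUM + CROPDMGNUM` stored under a name of your choice.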

Conclusions

  • Reviewing the storm data, we found that “Tornado” is the most harmful event type for population health. This conclusion is based on the total fatalities and total injuries summed per event type.

  • Reviewing the storm data, we found that “Flood” has the greatest economic consequences. This conclusion is based on the sum of property damage and crop damage per event type.