Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
This analysis aims to identify:
- the event types that are most harmful with respect to population health
- the event types that have the greatest economic consequences
Download and load the data.
## Download the bzip2-compressed file if it is not already present.
## No separate decompression step is needed: bzfile() reads it directly.
if (!file.exists("StormData.csv.bz2")) {
  ## mode = "wb" keeps the binary file intact on Windows
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "./StormData.csv.bz2", mode = "wb")
}
## Load the data
StormData <- read.csv(bzfile("StormData.csv.bz2"))
Since we are only concerned with which event types are most harmful to population health and which have the greatest economic consequences, we subset the data to the following columns:
- EVTYPE, which describes the type of event
- FATALITIES and INJURIES, to determine health impact
- PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, to determine economic impact
SelectedStormData <- StormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Fatalities
library(dplyr)
## Total the fatalities for each event type
FatalitiesData <- aggregate(FATALITIES ~ EVTYPE, data = SelectedStormData, FUN = sum, na.rm = TRUE)
## Sort in descending order and keep the top 3 event types
SortFatalities <- arrange(FatalitiesData, desc(FATALITIES))
Top3Fatalities <- SortFatalities[1:3, ]
## Plot as a bar chart
library(ggplot2)
ggplot(Top3Fatalities, aes(x = EVTYPE, y = FATALITIES)) + geom_bar(stat = "identity") + ggtitle("Most Fatalities by Event")
Injuries
## dplyr and ggplot2 are already loaded above
## Total the injuries for each event type
InjuriesData <- aggregate(INJURIES ~ EVTYPE, data = SelectedStormData, FUN = sum, na.rm = TRUE)
## Sort in descending order and keep the top 3 event types
SortInjuries <- arrange(InjuriesData, desc(INJURIES))
Top3Injuries <- SortInjuries[1:3, ]
## Plot as a bar chart
ggplot(Top3Injuries, aes(x = EVTYPE, y = INJURIES)) + geom_bar(stat = "identity") + ggtitle("Most Injuries by Event")
From the charts above, we can observe that “TORNADO” is the event type that causes the most fatalities and the most injuries.
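The fatalities and injuries steps above follow the same pattern: total a column per event type, sort in descending order, and keep the top few rows. As a minimal sketch, that pattern could be wrapped in one function. The name `top_events` and the toy data frame are hypothetical and invented for illustration; they are not part of the original analysis or the NOAA data, and the sketch uses base R only.

```r
# Hypothetical helper: total one numeric column per event type,
# sort the totals in descending order, and return the top n rows.
top_events <- function(data, column, n = 3) {
  totals <- aggregate(data[[column]],
                      by = list(EVTYPE = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- column
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy data, invented purely for illustration (not NOAA figures).
toy <- data.frame(
  EVTYPE     = c("A", "B", "A", "C", "B", "D"),
  FATALITIES = c(1, 5, 2, 0, 1, 4)
)

top_events(toy, "FATALITIES", n = 2)
```

With the real data, the equivalent calls would presumably be `top_events(SelectedStormData, "FATALITIES")` and `top_events(SelectedStormData, "INJURIES")`.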
Q2. Across the United States, which types of events have the greatest economic consequences?
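## Convert PROPDMGEXP to a numeric multiplier: H = 1e2, K = 1e3, M = 1e6, B = 1e9.
## Digits and all other codes get a multiplier of 0, so those damage amounts are excluded.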
SelectedStormData2 <- mutate(SelectedStormData, PropM =
ifelse(PROPDMGEXP %in% c(0:8), 0,
ifelse(PROPDMGEXP == "h" | PROPDMGEXP == "H", 100,
ifelse(PROPDMGEXP == "k" | PROPDMGEXP == "K", 1000,
ifelse(PROPDMGEXP == "m" | PROPDMGEXP == "M", 1000000,
ifelse(PROPDMGEXP == "b" | PROPDMGEXP == "B", 1000000000, 0 ))))))
## Convert CROPDMGEXP to a numeric multiplier using the same mapping as for PROPDMGEXP
SelectedStormData2 <- mutate(SelectedStormData2, CropM =
ifelse(CROPDMGEXP %in% c(0:8), 0,
ifelse(CROPDMGEXP == "h" | CROPDMGEXP == "H", 100,
ifelse(CROPDMGEXP == "k" | CROPDMGEXP == "K", 1000,
ifelse(CROPDMGEXP == "m" | CROPDMGEXP == "M", 1000000,
ifelse(CROPDMGEXP == "b" | CROPDMGEXP == "B", 1000000000, 0 ))))))
## Compute total damage in dollars (property plus crop) for each record
SelectedStormData2 <- mutate(SelectedStormData2, Total = PROPDMG * PropM + CROPDMG * CropM)
## Sum the total damage for each event type
TotalValue <- aggregate(Total ~ EVTYPE, data = SelectedStormData2, FUN = sum, na.rm = TRUE)
## Sort in descending order and keep the top 3 event types
SortValue <- arrange(TotalValue, desc(Total))
Top3Value <- SortValue[1:3, ]
## Plot as a bar chart (ggplot2 is already loaded above)
ggplot(Top3Value, aes(x = EVTYPE, y = Total)) + geom_bar(stat = "identity") + ggtitle("Economic value by Event")
From the chart above, “FLOOD” is the event type with the greatest economic consequences, measured as combined property and crop damage.
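The exponent conversion used above (H, K, M, B mapped to multipliers, everything else to 0) can also be written as a named lookup vector instead of nested `ifelse()` calls. The following is a minimal sketch of that alternative, not the code used to produce the results. The names `exp_multiplier` and `to_multiplier` are hypothetical. One assumed difference: a missing (`NA`) code yields 0 here, whereas the nested `ifelse()` version would yield `NA`.

```r
# Named lookup vector mirroring the mapping used in the analysis.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: upper-case each code, look it up, and fall back
# to 0 for digits, blanks, and any other unrecognized symbol.
to_multiplier <- function(code) {
  m <- unname(exp_multiplier[toupper(as.character(code))])
  m[is.na(m)] <- 0
  m
}

# Example codes chosen for illustration: lower case, upper case,
# a digit, an empty string, and a stray symbol.
to_multiplier(c("k", "M", "5", "", "?", "B"))
```

Under these assumptions the helper could be applied to both columns, for example `PropM = to_multiplier(PROPDMGEXP)` and `CropM = to_multiplier(CROPDMGEXP)`, which keeps the two conversions in step with each other.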