Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
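This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and addresses the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?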
The data come in the form of a comma-separated-value file compressed with bzip2 (Storm Data, 47 MB). The raw data are downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and saved as StormData.csv.bz2.
NOAA also provides documentation of the database, which describes how some of the variables are constructed and defined.
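# Work in a directory of your choice; this path is specific to the author's machine
# setwd("C:/Users/irman.zulkeflie/Documents")
# Download the compressed data only if it is not already present
if (!file.exists("StormData.csv.bz2")) {
  Original_Data_URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" avoids corrupting the binary .bz2 file on Windows
  download.file(Original_Data_URL, destfile = "StormData.csv.bz2", mode = "wb")
}
library(dplyr)
# read.csv() decompresses the .bz2 file transparently
Stormdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)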
# Check the number of rows and variables
dim(Stormdata)
## [1] 902297 37
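read.csv() can take the .bz2 file directly because R's file() connection detects bzip2 compression when reading, so no separate decompression step is needed. A minimal sketch with a tiny made-up file written to a temporary path (toy_path and its rows are illustrative only, not NOAA data):

```r
# Write a tiny, made-up CSV through a bzip2-compressing connection
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
writeLines(c("STATE,EVTYPE,FATALITIES",
             "AL,TORNADO,0",
             "AL,TSTM WIND,1"), con)
close(con)

# read.csv() opens the path with file(), which decompresses bzip2 on the fly
toy <- read.csv(toy_path, stringsAsFactors = FALSE)
dim(toy)
# [1] 2 3
```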
# Keep only the columns needed: state, event type, and the health and damage variables
subset.storm <- Stormdata %>%
  select(STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Display the first 6 rows of subset.storm
head(subset.storm)
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 AL TORNADO 0 15 25.0 K 0
## 2 AL TORNADO 0 0 2.5 K 0
## 3 AL TORNADO 0 2 25.0 K 0
## 4 AL TORNADO 0 2 2.5 K 0
## 5 AL TORNADO 0 2 2.5 K 0
## 6 AL TORNADO 0 6 2.5 K 0
Two variables in the dataset reflect how harmful an event type is to population health: FATALITIES and INJURIES.
Each is summed over event types (EVTYPE) to identify the most harmful type of event.
# Total fatalities per event type
fatalData <- aggregate(FATALITIES ~ EVTYPE, data = subset.storm, FUN = sum)
# How many event types share each total fatality count
table(fatalData$FATALITIES)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 817 51 17 15 8 8 3 6 2 1 2 2 1 1 3
## 15 17 18 19 22 23 25 28 29 33 35 38 42 58 61
## 1 3 1 1 1 1 1 2 1 2 2 1 1 1 1
## 62 64 75 89 95 96 98 101 103 125 127 133 160 172 204
## 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1
## 206 224 248 368 470 504 816 937 978 1903 5633
## 1 1 1 1 1 1 1 1 1 1 1
# Total injuries per event type
InjureData <- aggregate(INJURIES ~ EVTYPE, data = subset.storm, FUN = sum)
# How many event types share each total injury count
table(InjureData$INJURIES)
##
## 0 1 2 3 4 5 6 7 8 10 12 13
## 827 34 18 3 7 6 1 1 4 3 2 1
## 15 16 17 20 21 22 23 24 26 27 28 29
## 5 1 2 1 2 1 1 2 2 1 1 2
## 31 35 36 38 40 42 43 46 48 50 52 68
## 1 1 1 1 1 2 1 1 2 1 1 1
## 70 72 77 79 86 95 129 137 150 152 155 170
## 1 1 1 1 1 1 1 1 1 1 1 1
## 216 231 232 251 280 297 302 309 340 342 398 440
## 1 1 1 1 1 1 1 1 1 1 1 1
## 545 734 805 908 911 1021 1137 1275 1321 1361 1488 1777
## 1 1 1 1 1 1 1 1 1 1 1 1
## 1975 2100 5230 6525 6789 6957 91346
## 1 1 1 1 1 1 1
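The two aggregate() calls above can also be combined: the formula interface accepts cbind() on the left-hand side, which sums both measures per event type in a single table that can then be ranked by either column. A minimal sketch on a few made-up records (toy, health and the helper top_n_by() are hypothetical names, not part of the original analysis):

```r
# Made-up records standing in for subset.storm (illustrative values only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 3, 1, 4, 0, 0),
  INJURIES = c(10, 5, 2, 1, 3, 0),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in one aggregate() call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: rank event types by one column, largest first, keep the top n
top_n_by <- function(df, col, n = 5) {
  ranked <- df[order(df[[col]], decreasing = TRUE), c("EVTYPE", col)]
  head(ranked, n)
}

top_n_by(health, "FATALITIES", 2)
top_n_by(health, "INJURIES", 2)
```

On the real data, top_n_by(health, "FATALITIES") with the default n = 5 would give the same event types as the first five rows of PlotFatal.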
# Sort event types by total fatalities and by total injuries, largest first, to plot the top 5 of each
library(ggplot2)
PlotFatal <- fatalData[order(fatalData$FATALITIES, decreasing = TRUE), ]
PlotInjured <- InjureData[order(InjureData$INJURIES, decreasing = TRUE), ]
# Plot the top 5 event types by fatalities, with bars in descending order
ggplot(PlotFatal[1:5, ], aes(reorder(EVTYPE, -FATALITIES), FATALITIES)) + geom_bar(stat = "identity") +
  ylab("Number of Fatalities") + xlab("Event Type") + ggtitle("Top 5 Event Types by Fatalities Across the U.S.")
# Plot the top 5 event types by injuries, with bars in descending order
ggplot(PlotInjured[1:5, ], aes(reorder(EVTYPE, -INJURIES), INJURIES)) + geom_bar(stat = "identity") +
  ylab("Number of Injuries") + xlab("Event Type") + ggtitle("Top 5 Event Types by Injuries Across the U.S.")
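By default ggplot2 orders a character x-axis alphabetically, so the top-5 bars need not appear in rank order. stats::reorder() turns the labels into a factor whose levels follow a numeric score. A minimal base-R check with made-up totals (evtype, total and ordered_evtype are illustrative names only):

```r
# Made-up totals for three event types (illustrative only)
evtype <- c("FLOOD", "HEAT", "TORNADO")
total <- c(12, 30, 55)

# reorder() sorts the levels by the score -total (ascending),
# i.e. from the largest total to the smallest
ordered_evtype <- reorder(evtype, -total)
levels(ordered_evtype)
# [1] "TORNADO" "HEAT"    "FLOOD"
```

Passing such a reordered variable to aes() as the x aesthetic makes the bars appear from largest to smallest.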
As the bar charts of fatalities and injuries show, tornadoes are the most harmful event type for population health, causing 5,633 deaths and 91,346 injuries from 1950 to November 2011.
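Economic damage is recorded in two variables, PROPDMG (property damage) and CROPDMG (crop damage). Each is paired with an exponent column, PROPDMGEXP and CROPDMGEXP, whose code gives the magnitude of the value: “H”/“h” for hundreds, “K”/“k” for thousands, “M”/“m” for millions and “B” for billions of dollars. The code below rescales both damage columns in place so that every value is expressed in millions of dollars.
This conversion does not handle the remaining codes, such as the symbols “-”, “+” and “?” or digits like “1” and “2”; values with these codes, or with no code at all, are treated as plain dollar amounts.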
# Property damage: rescale each value to millions of dollars using PROPDMGEXP ("M"/"m" values are already in millions)
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% "B"] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% "B"] * 1000
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("K", "k")] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("K", "k")] * 0.001
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("H", "h")] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("H", "h")] * 1e-04
# Any other code (blank, "-", "+", "?", digits) is treated as plain dollars
subset.storm$PROPDMG[!(subset.storm$PROPDMGEXP %in% c("B", "M", "m", "K", "k", "H", "h"))] <- subset.storm$PROPDMG[!(subset.storm$PROPDMGEXP %in% c("B", "M", "m", "K", "k", "H", "h"))] * 1e-06
# Crop damage: the same conversion using CROPDMGEXP
subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% "B"] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% "B"] * 1000
subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("K", "k")] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("K", "k")] * 0.001
subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("H", "h")] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("H", "h")] * 1e-04
subset.storm$CROPDMG[!(subset.storm$CROPDMGEXP %in% c("B", "M", "m", "K", "k", "H", "h"))] <- subset.storm$CROPDMG[!(subset.storm$CROPDMGEXP %in% c("B", "M", "m", "K", "k", "H", "h"))] * 1e-06
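The same conversion can be written more compactly as a lookup in a named vector of multipliers, which keeps all exponent codes in one place. A minimal sketch, assuming the code meanings listed above (exp_to_millions and to_millions are hypothetical names; unknown codes fall back to plain dollars, as in the original code):

```r
# Multiplier that converts an amount with a given exponent code to millions of dollars
exp_to_millions <- c(B = 1e3, M = 1, m = 1, K = 1e-3, k = 1e-3, H = 1e-4, h = 1e-4)

to_millions <- function(amount, exp_code) {
  # Indexing by a code that is not a name (including "") yields NA
  mult <- unname(exp_to_millions[exp_code])
  # Blank, "-", "+", "?" and digit codes are treated as plain dollars
  mult[is.na(mult)] <- 1e-6
  amount * mult
}

to_millions(c(25, 2.5, 1.5, 300, 5), c("K", "B", "M", "h", ""))
```

Applied to the unconverted columns, to_millions(subset.storm$PROPDMG, subset.storm$PROPDMGEXP) would do the work of the property block; it must not be applied to values that have already been rescaled.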
The total economic damage of each record is calculated by adding its property and crop damage, and these totals are summed by event type.
The top five event types are then visualized.
# Total economic damage (millions of dollars) for each record
subset.storm$EcoConsDmg <- subset.storm$PROPDMG + subset.storm$CROPDMG
# Sum per event type; using data = keeps the grouping column named EVTYPE
EcoCons <- aggregate(EcoConsDmg ~ EVTYPE, data = subset.storm, FUN = sum)
PlotEcoCons <- EcoCons[order(EcoCons$EcoConsDmg, decreasing = TRUE), ]
ggplot(PlotEcoCons[1:5, ], aes(reorder(EVTYPE, -EcoConsDmg), EcoConsDmg)) + geom_bar(stat = "identity") +
  ylab("Economic Damages (million dollars)") + xlab("Event Type") +
  ggtitle("Top Five Events Causing Economic Damages Across the U.S.")
The bar chart of economic damages shows that floods cause the greatest total damage.
The results show that, from 1950 to November 2011, tornadoes were the most harmful event type for population health, and floods caused the greatest economic losses.