This study was carried out using the National Weather Service Storm Data Documentation. The data was accessed on 5/29/2014, and all manipulations are documented below.
The study aims to answer the following two questions:
Across the United States, which types of events are the most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The analysis revealed that Tornadoes cause the most harm to public health, in terms of both fatalities and injuries.
Additionally, Hurricanes/Typhoons most often caused property damage on the order of billions of U.S. dollars, while Drought most often caused crop damage of that magnitude.
The bzip2-compressed CSV data is read into a data frame using only the necessary columns, since reading the full file takes a lot of processing power.
## Loading required package: ggplot2
stormdt <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, colClasses = c("factor",
"character", "NULL", "NULL", "factor", "character", "factor", "factor",
rep("NULL", 10), "numeric", "numeric", "factor", "factor", "numeric", "numeric",
"numeric", "factor", "numeric", "factor", rep("NULL", 9)))
head(stormdt, 3)
## STATE__ BGN_DATE COUNTY COUNTYNAME STATE EVTYPE LENGTH WIDTH F
## 1 1.00 4/18/1950 0:00:00 97.00 MOBILE AL TORNADO 14.0 100 3
## 2 1.00 4/18/1950 0:00:00 3.00 BALDWIN AL TORNADO 2.0 150 2
## 3 1.00 2/20/1951 0:00:00 57.00 FAYETTE AL TORNADO 0.1 123 2
## MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 0.00 0 15 25.0 K 0
## 2 0.00 0 0 2.5 K 0
## 3 0.00 0 2 25.0 K 0
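The column selection above relies on `colClasses`: an entry of `"NULL"` (as a string) makes `read.csv` skip that column. The following is a minimal sketch on a made-up four-column CSV, not the storm data itself; `csv_text` and `toy` are illustrative names.

```r
# Toy CSV with four columns; only EVTYPE and FATALITIES are kept.
csv_text <- "STATE,EVTYPE,REMARKS,FATALITIES
AL,TORNADO,none,2
TX,FLOOD,none,1"

# "NULL" (as a string) tells read.csv to skip that column entirely;
# the remaining entries set the class of each retained column.
toy <- read.csv(text = csv_text, header = TRUE,
                colClasses = c("NULL", "factor", "NULL", "numeric"))

names(toy)  # only the columns that were not skipped
str(toy)
```

Skipped columns are never stored in the resulting data frame, which is why this approach helps with a large file.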
Once the data frame is established, the parameters for analyzing the damage done to public health should be considered. In this study we focused on the total number of fatalities (fat) and injuries (inj). Each parameter includes both direct and indirect consequences, and this study counts both together.
We simply need to sum the fatalities (and injuries) for each event type (EVTYPE) and then bind the two resulting data frames by rows.
fat <- aggregate(FATALITIES ~ EVTYPE, data = stormdt, FUN = sum, na.rm = TRUE)
fat <- fat[order(fat[, 2], decreasing = TRUE), ]
fat$type <- rep("fatality", length(fat[, 1]))
names(fat) <- c("EvType", "Damage", "DmgType")
head(fat, 3)
## EvType Damage DmgType
## 834 TORNADO 5633 fatality
## 130 EXCESSIVE HEAT 1903 fatality
## 153 FLASH FLOOD 978 fatality
inj <- aggregate(INJURIES ~ EVTYPE, data = stormdt, FUN = sum, na.rm = TRUE)
inj <- inj[order(inj[, 2], decreasing = TRUE), ]
inj$type <- rep("injury", length(inj[, 1]))
names(inj) <- c("EvType", "Damage", "DmgType")
head(inj, 3)
## EvType Damage DmgType
## 834 TORNADO 91346 injury
## 856 TSTM WIND 6957 injury
## 170 FLOOD 6789 injury
health <- rbind(fat, inj)
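The same aggregate, sort, label, and bind pattern can be sketched on a toy data frame. This is only an illustration under assumed values: `toy` and the helper `total_by_event` are hypothetical names and are not part of the analysis above.

```r
# Made-up data: four events, three distinct event types.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Hypothetical helper: sum one column per event type, sort descending,
# and tag every row with the kind of damage it represents.
total_by_event <- function(data, column, label) {
  out <- aggregate(data[[column]], by = list(EvType = data$EVTYPE), FUN = sum)
  names(out) <- c("EvType", "Damage")
  out <- out[order(out$Damage, decreasing = TRUE), ]
  out$DmgType <- label
  out
}

# Stack both summaries into one long data frame, as done for `health`.
toy_health <- rbind(
  total_by_event(toy, "FATALITIES", "fatality"),
  total_by_event(toy, "INJURIES", "injury")
)
toy_health
```

Keeping the result in this long format (one row per event type and damage type) is what allows a single plot to be faceted by damage type later.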
Since there are hundreds of distinct event types (the row indices in the output above exceed 850), we need to subset the ones that are most serious for public health to make the results easier to read. Here an event type is kept for injuries only if it totals more than 1000 injuries, and for fatalities only if it totals more than 100 fatalities.
Additionally, Tornadoes have a significant lead in both fatalities and injuries, so to check which event types follow Tornadoes, we subset the data frame again without them.
srshealth <- health[(health$Damage > 1000 & health$DmgType == "injury") |
(health$Damage > 100 & health$DmgType == "fatality"), ]
sanstornado <- srshealth[!grepl("TORNADO", srshealth$EvType), ]
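The thresholding and exclusion logic can be checked on a small made-up data frame. The sketch below uses illustrative values only. It also shows a general R pitfall worth knowing when excluding rows by pattern: negating the result of `grep()` drops every row when the pattern matches nothing, whereas `!grepl()` keeps all rows in that case.

```r
# Made-up long-format data: one row per event type and damage type.
toy <- data.frame(
  EvType  = c("TORNADO", "FLOOD", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  Damage  = c(5000, 400, 1900, 90000, 6000, 900),
  DmgType = c("fatality", "fatality", "fatality", "injury", "injury", "injury")
)

# Keep injury rows above 1000 and fatality rows above 100.
# & binds tighter than | in R; the parentheses only make that explicit.
serious <- toy[(toy$Damage > 1000 & toy$DmgType == "injury") |
               (toy$Damage > 100 & toy$DmgType == "fatality"), ]

# Exclude tornado rows with a logical mask.
no_tornado <- serious[!grepl("TORNADO", serious$EvType), ]

# Pitfall: "HAIL" matches nothing, so grep() returns integer(0),
# and indexing with -integer(0) selects zero rows.
no_hail_neg  <- serious[-grep("HAIL", serious$EvType), ]
no_hail_mask <- serious[!grepl("HAIL", serious$EvType), ]

nrow(no_hail_neg)   # zero rows survive
nrow(no_hail_mask)  # all rows survive
```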
Once the data frame is established, the parameters for analyzing the economic consequences should be considered. In this study we focused on Property Damage (propDmg) and Crop Damage (cropDmg).
Although the data contains detailed damage figures, the 'highest' economic consequences are taken to be those on the order of billions of US dollars. The PROPDMGEXP and CROPDMGEXP variables give the magnitude of the damage figure, where 'B' indicates billions.
We simply need to subset, for each damage type, the events that have 'B' in their [PC]ROPDMGEXP variable, and then bind the two resulting data frames by rows.
propDmg.f <- factor(stormdt[grep("[Bb]", stormdt$PROPDMGEXP), 6])
propDmg <- as.data.frame(propDmg.f)
colnames(propDmg) <- "EvType"
propDmg$DmgType <- rep("property", length(propDmg$EvType))
head(propDmg, 3)
## EvType DmgType
## 1 WINTER STORM property
## 2 HURRICANE OPAL/HIGH WINDS property
## 3 HURRICANE OPAL property
cropDmg.f <- factor(stormdt[grep("[Bb]", stormdt$CROPDMGEXP), 6])
cropDmg <- as.data.frame(cropDmg.f)
colnames(cropDmg) <- "EvType"
cropDmg$DmgType <- rep("crop", length(cropDmg$EvType))
head(cropDmg, 3)
## EvType DmgType
## 1 HEAT crop
## 2 RIVER FLOOD crop
## 3 DROUGHT crop
HDmg <- rbind(propDmg, cropDmg)
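The subsets above only count events flagged with 'B'. As a complement, the exponent codes could be turned into dollar amounts. The sketch below is illustrative only: it assumes K, M, and B stand for thousands, millions, and billions (only 'B' is stated above), treats every other code as unknown (`NA`), and uses made-up values; `exp_multiplier` is a hypothetical helper name.

```r
# Hypothetical helper: map an exponent code to a multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9; any other code yields NA.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(lookup[toupper(as.character(code))])
}

# Made-up records, including a lowercase code and an empty code.
toy <- data.frame(
  EVTYPE     = c("HURRICANE", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG    = c(2.5, 25, 1, 10),
  PROPDMGEXP = c("B", "K", "m", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars; rows with an unrecognized code stay NA.
toy$PropDollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

Summing such a column per event type would rank events by total damage rather than by how often they reach the billions, which is a different question from the one the counts above answer.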
Two plots are drawn using the data that was cleaned above:
qplot(x = EvType, y = Damage, data = srshealth, facets = DmgType ~ ., geom = "bar",
stat = "identity", main = "Total Number of Fatalities or Injuries per Event Type") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0)) + facet_wrap(~DmgType,
scales = "free_y", nrow = 2)
qplot(x = EvType, y = Damage, data = sanstornado, facets = DmgType ~ ., geom = "bar",
stat = "identity", main = "Total Number of Fatalities or Injuries per Event Type (w/o Tornadoes)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0)) + facet_wrap(~DmgType,
scales = "free_y", nrow = 2)
The first plot clearly shows that Tornadoes caused the most fatalities and injuries across the whole data set. The second one shows that Excessive Heat follows Tornadoes, and is in turn followed by Floods and Thunderstorm Winds.
A single plot is drawn using the data that was cleaned above:
qplot(EvType, data = HDmg, facets = DmgType ~ ., geom = "bar", ylab = "Times Damage Exceeds USD Billions",
main = "Number of Times the Event Type damage Exceeded USD Billions") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + facet_wrap(~DmgType,
scales = "free_y", nrow = 2)
This plot clearly shows that Hurricanes/Typhoons most frequently cause property damage on the order of billions of US dollars, while Drought is the most damaging natural disaster when it comes to crops.