This report explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. More information can be found in the National Weather Service Storm Data Documentation.
This study shows that, over the period from 1950 to 2011, Tornado is the event type with the highest accumulated health impact, while Flood is the one with the highest accumulated economic damage.
Import the libraries which will be used:
library(R.utils)
library(plyr)
library(lattice)
First, the database is downloaded and unzipped if it is not already present in the current folder:
if (!file.exists("./repdata-data-StormData.csv")) {
    fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileUrl, destfile = "repdata-data-StormData.csv.bz2", method = "curl")
    bunzip2("repdata-data-StormData.csv.bz2")
}
stormData <- read.csv("./repdata-data-StormData.csv")
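As a side note, base R's `read.csv` can read a bzip2-compressed file directly, because it opens the path through `file()`, which detects the compression. The explicit `bunzip2` step is therefore optional. The following is a minimal sketch on a small made-up file; the temporary path and the toy rows are assumptions for illustration and are not part of the storm database:

```r
# Toy data frame standing in for the storm database (illustrative values only)
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), FATALITIES = c(1, 2))

# Write it as a bzip2-compressed CSV in a temporary location
bzPath <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bzPath, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path through file(), which detects the bzip2 compression
toyRead <- read.csv(bzPath, stringsAsFactors = FALSE)
toyRead
```

Decompressing once, as the report does, can still be preferable when the file is read repeatedly.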
According to the questions being addressed, only the related fields are selected from the database:
# select only the columns which contain relevant data
stormData <- stormData[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
    "CROPDMG", "CROPDMGEXP")]
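The PROPDMGEXP and CROPDMGEXP columns hold magnitude codes (H for hundreds, K for thousands, M for millions, B for billions). These codes are mapped to numeric multipliers in order to get the real damage value in US dollars, both for properties and crops: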
# map each exponent code to its numeric multiplier
stormData$PROPDMGEXP_VALUE <- mapvalues(stormData$PROPDMGEXP, from = c("K",
    "M", "m", "B", "H", "h"), to = c(1000, 1e+06, 1e+06, 1e+09, 100, 100))
stormData$CROPDMGEXP_VALUE <- mapvalues(stormData$CROPDMGEXP, from = c("K",
    "k", "M", "m", "B"), to = c(1000, 1000, 1e+06, 1e+06, 1e+09))
# set the multiplier to 0 for every code other than h,H,m,M,k,K,b,B
stormData$PROPDMGEXP_VALUE[!stormData$PROPDMGEXP %in% c("h", "H", "m", "M",
    "k", "K", "b", "B")] <- 0
stormData$CROPDMGEXP_VALUE[!stormData$CROPDMGEXP %in% c("h", "H", "m", "M",
    "k", "K", "b", "B")] <- 0
# convert the multipliers from factor to numeric
stormData$PROPDMGEXP_VALUE <- as.numeric(as.character(stormData$PROPDMGEXP_VALUE))
stormData$CROPDMGEXP_VALUE <- as.numeric(as.character(stormData$CROPDMGEXP_VALUE))
# compute the real damage value in US dollars
stormData$PROPDMG_TOTAL <- stormData$PROPDMG * stormData$PROPDMGEXP_VALUE
stormData$CROPDMG_TOTAL <- stormData$CROPDMG * stormData$CROPDMGEXP_VALUE
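The exponent-to-multiplier conversion can also be sketched with a named lookup vector, which keeps the mapping and the "unknown code counts as 0" rule in one place. This is an alternative formulation, not the code used in this report; `expMultiplier` and the toy vectors `dmg` and `dmgExp` are hypothetical names and values introduced only for illustration:

```r
# Hypothetical lookup table: exponent code -> multiplier in US dollars
expMultiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy values standing in for PROPDMG and PROPDMGEXP (not taken from the database)
dmg    <- c(25, 2.5, 10, 3, 7)
dmgExp <- c("K", "M", "h", "", "?")

# Look up each upper-cased code by name; codes missing from the table yield NA
multiplier <- unname(expMultiplier[toupper(dmgExp)])

# Treat unknown or empty codes as a multiplier of 0, as the report does
multiplier[is.na(multiplier)] <- 0

dmgTotal <- dmg * multiplier
dmgTotal
```

In this sketch, 25 with code K becomes 25000, while the records with an empty code or `?` contribute 0 to the total.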
Property damage and crop damage are added to get the total damage value per weather event:
# sum up crop damage and property damage to check total damage
stormData$CROP_PROP_TOTAL <- stormData$PROPDMG_TOTAL + stormData$CROPDMG_TOTAL
The total accumulated damage over all years is calculated per event type. A bar chart of the event types with the highest total economic damage is created:
# calculates the total number of occurrences, total damage and mean damage
# per event type
economicSummary <- ddply(stormData, "EVTYPE", summarise, N = length(CROP_PROP_TOTAL),
    mean = mean(CROP_PROP_TOTAL), total = sum(CROP_PROP_TOTAL))
# orders by total damage value (among all years)
plotData <- tail(economicSummary[order(economicSummary$total), ])
plotData
##                EVTYPE      N      mean     total
## 147       FLASH FLOOD  54277    323565 1.756e+10
## 238              HAIL 288661     64984 1.876e+10
## 666       STORM SURGE    261 165990579 4.332e+10
## 830           TORNADO  60652    945593 5.735e+10
## 406 HURRICANE/TYPHOON     88 817201282 7.191e+10
## 164             FLOOD  25326   5935390 1.503e+11
barchart(total/1e+09 ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Economic Damage (in Billion Dollars)",
    main = "Economic Damage per Event Type (Top 6)")
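The group-wise summary and ranking can be illustrated on a small made-up data frame, here with base R's `aggregate` in place of `ddply`. The values below are invented for the sketch, and `toyStorm`, `toySummary` and `topTwo` are hypothetical names:

```r
# Made-up records: event type and total damage per record (illustrative only)
toyStorm <- data.frame(
  EVTYPE          = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  CROP_PROP_TOTAL = c(100, 10, 300, 50, 20, 200)
)

# Total damage per event type
toySummary <- aggregate(CROP_PROP_TOTAL ~ EVTYPE, data = toyStorm, FUN = sum)
names(toySummary)[2] <- "total"

# Sort ascending and keep the last n rows, i.e. the n largest totals
topTwo <- tail(toySummary[order(toySummary$total), ], n = 2)
topTwo
```

Note that `tail` returns six rows when `n` is not given, so passing `n` explicitly controls how many event types appear in the chart.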
The same is done for the health impact. The data is summarized to compute the accumulated number of fatalities and injuries over all years per event type:
# calculates health summaries
healthSummary <- ddply(stormData, "EVTYPE", summarise, N = length(FATALITIES),
    meanFat = mean(FATALITIES), totalFat = sum(FATALITIES), meanInj = mean(INJURIES),
    totalInj = sum(INJURIES))
# plot the event types with the most fatalities
plotData <- tail(healthSummary[order(healthSummary$totalFat), ])
plotData
##             EVTYPE      N  meanFat totalFat meanInj totalInj
## 854      TSTM WIND 219940 0.002292      504 0.03163     6957
## 452      LIGHTNING  15754 0.051796      816 0.33198     5230
## 269           HEAT    767 1.221643      937 2.73794     2100
## 147    FLASH FLOOD  54277 0.018019      978 0.03274     1777
## 123 EXCESSIVE HEAT   1678 1.134088     1903 3.88856     6525
## 830        TORNADO  60652 0.092874     5633 1.50607    91346
barchart(totalFat ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Number of Fatalities",
    main = "Total Fatalities per Event Type (Top 6)")
# plot the event types with the most injuries
plotData <- tail(healthSummary[order(healthSummary$totalInj), ])
plotData
##             EVTYPE      N  meanFat totalFat meanInj totalInj
## 269           HEAT    767 1.221643      937 2.73794     2100
## 452      LIGHTNING  15754 0.051796      816 0.33198     5230
## 123 EXCESSIVE HEAT   1678 1.134088     1903 3.88856     6525
## 164          FLOOD  25326 0.018558      470 0.26806     6789
## 854      TSTM WIND 219940 0.002292      504 0.03163     6957
## 830        TORNADO  60652 0.092874     5633 1.50607    91346
barchart(totalInj ~ EVTYPE, data = plotData, xlab = "Event Type", ylab = "Total Number of Injuries",
    main = "Total Injuries per Event Type (Top 6)")
The information presented in the Data Processing section shows that the weather event type with the highest economic impact is Flood, which caused more than 150 billion dollars in economic damage from 1950 to 2011. This event type has not only a high damage per occurrence but also a high frequency (more than 25k occurrences), which contributes to its first position in the ranking. Hurricane/Typhoon occupies second place, although these events are rare (only 88 occurrences), followed by Tornado.
Regarding the health impact, the analysis shows that Tornado is by far the event type that has caused the highest number of fatalities and injuries over all the years covered. It caused more than 5.6k fatalities and more than 90k injuries among the US population.