This study is an exploratory analysis of NOAA natural disaster data. The goal is to determine which natural disasters cause the most harm to population health and to the economy. To do so, we summarise data collected by the U.S. National Oceanic and Atmospheric Administration between 1950 and 2011. For each event type, the total damage is calculated and presented in descending order, so that the reader can compare the most damaging events and decide on the required measures.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd('~/Coursera/4 - Reproducible Research/Week 4/Project')
if (!file.exists('NOAAdata.csv.bz2'))
{
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', 'NOAAdata.csv.bz2')
}
if(!exists('NOAA'))
{
NOAA <- read.csv('NOAAdata.csv.bz2')
}
Let’s see which events are most harmful to population health. For this we investigate the numbers of fatalities and injuries.
health <- NOAA[,c('EVTYPE','INJURIES','FATALITIES')] # selecting only relevant data
healthPerEvent <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = health, FUN = sum) # summing per event type; formula passed positionally so the call is portable across R versions
healthPerEvent$EVTYPE <- factor(healthPerEvent$EVTYPE, levels = healthPerEvent[order(healthPerEvent$FATALITIES),'EVTYPE'])
healthPerEvent <- healthPerEvent[order(healthPerEvent$FATALITIES,decreasing = TRUE),] # ordering descending by fatalities
fatalPerEvent <- healthPerEvent[1:20,] #selecting 20 most harmful events
fatpl<- ggplot( mapping = aes(y = FATALITIES, x = EVTYPE), data = fatalPerEvent) +
geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=FATALITIES, y = 500)) #creating barplot
print(fatpl) #plotting
From the above plot we see that wind-related disasters such as tornadoes and thunderstorm winds are by far the most fatal disasters in the US. Heat-related events come next. The remaining events are either sub-categories of the events above, or their severity is negligible compared to the most harmful ones.
Let’s now look at the injuries data.
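The aggregation and ranking steps used for the fatalities plot can be checked on a small example. The sketch below uses an invented toy data frame (the name toy and its values are assumptions, not NOAA figures) and takes the top rows with head() instead of 1:20, which avoids NA rows if fewer event types exist.

```r
# Toy data standing in for the NOAA columns used above (values are invented).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 2, 5, 1, 0),
  FATALITIES = c(3, 4, 2, 1, 0)
)

# Sum both outcome columns within each event type.
toyPerEvent <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, largest first.
toyPerEvent <- toyPerEvent[order(toyPerEvent$FATALITIES, decreasing = TRUE), ]

# Keep at most the top 2 rows; head() never pads with NA rows.
topToy <- head(toyPerEvent, 2)
print(topToy)
```

The same pattern scales to the full data set by replacing toy with health and 2 with 20.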
healthPerEvent$EVTYPE <- factor(healthPerEvent$EVTYPE, levels = healthPerEvent[order(healthPerEvent$INJURIES),'EVTYPE'])
healthPerEvent <- healthPerEvent[order(healthPerEvent$INJURIES,decreasing = TRUE),]
injurPerEvent <- healthPerEvent[1:20,]
injpl<- ggplot( mapping = aes(y = INJURIES, x = EVTYPE), data = injurPerEvent) +
geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=INJURIES, y = 10000))
print(injpl)
First, as expected, the number of injuries is far larger than the number of fatalities. Second, wind-related disasters are again in first place, followed by heat-related disasters.
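Both health plots rely on the factor() calls to control bar order: ggplot2 draws a discrete axis in the order of the factor levels, and coord_flip() places the first level at the bottom. A minimal sketch on invented values (the toy data frame is an assumption for illustration) shows how ordering the levels by the measured column puts the largest bar last, and therefore on top after flipping.

```r
# Invented counts for three event types.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "HEAT", "TORNADO"),
  INJURIES = c(20, 90, 7)
)

# Set the factor levels in ascending order of INJURIES.
# order() returns the row indices from the smallest to the largest value.
toy$EVTYPE <- factor(toy$EVTYPE, levels = toy$EVTYPE[order(toy$INJURIES)])

# Levels now run from the least to the most harmful event type.
print(levels(toy$EVTYPE))
```

Without this step the levels would stay alphabetical and the bars would not appear sorted by severity.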
Let’s see which events are most harmful to the economy. For this we investigate the amounts of property and crop damage.
First we need to tidy up the data. The PROPDMGEXP and CROPDMGEXP values are not suitable for calculations as they stand, so we have to convert the order-of-magnitude abbreviations into actual numbers.
economy <- NOAA[,c('EVTYPE','PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]
# Sorting the property exponent data
economy$PROPDMGEXP <- as.character(economy$PROPDMGEXP)
economy$PROPDMGEXP[economy$PROPDMGEXP == "K"] <- 1000
economy$PROPDMGEXP[economy$PROPDMGEXP == "M"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "" ] <- 1
economy$PROPDMGEXP[economy$PROPDMGEXP == "B"] <- 1e+09
economy$PROPDMGEXP[economy$PROPDMGEXP == "m"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "0"] <- 1
economy$PROPDMGEXP[economy$PROPDMGEXP == "5"] <- 1e+05
economy$PROPDMGEXP[economy$PROPDMGEXP == "6"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "4"] <- 1e+04
economy$PROPDMGEXP[economy$PROPDMGEXP == "2"] <- 1e+02
economy$PROPDMGEXP[economy$PROPDMGEXP == "3"] <- 1e+03
economy$PROPDMGEXP[economy$PROPDMGEXP == "h"] <- 100
economy$PROPDMGEXP[economy$PROPDMGEXP == "7"] <- 1e+07
economy$PROPDMGEXP[economy$PROPDMGEXP == "H"] <- 100
economy$PROPDMGEXP[economy$PROPDMGEXP == "1"] <- 10
economy$PROPDMGEXP[economy$PROPDMGEXP == "8"] <- 1e+08
# give 0 to invalid exponent data, so they will not be counted in
economy$PROPDMGEXP[economy$PROPDMGEXP == "+"] <- 0
economy$PROPDMGEXP[economy$PROPDMGEXP == "-"] <- 0
economy$PROPDMGEXP[economy$PROPDMGEXP == "?"] <- 0
economy$PROPDMGEXP <- as.numeric(economy$PROPDMGEXP)
economy$PROPDMGVAL <- economy$PROPDMG * economy$PROPDMGEXP
economy$CROPDMGEXP <- as.character(economy$CROPDMGEXP)
economy$CROPDMGEXP[economy$CROPDMGEXP == "K"] <- 1000
economy$CROPDMGEXP[economy$CROPDMGEXP == "M"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "" ] <- 1
economy$CROPDMGEXP[economy$CROPDMGEXP == "B"] <- 1e+09
economy$CROPDMGEXP[economy$CROPDMGEXP == "m"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "0"] <- 1
economy$CROPDMGEXP[economy$CROPDMGEXP == "5"] <- 1e+05
economy$CROPDMGEXP[economy$CROPDMGEXP == "6"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "4"] <- 1e+04
economy$CROPDMGEXP[economy$CROPDMGEXP == "2"] <- 1e+02
economy$CROPDMGEXP[economy$CROPDMGEXP == "3"] <- 1e+03
economy$CROPDMGEXP[economy$CROPDMGEXP == "h"] <- 100
economy$CROPDMGEXP[economy$CROPDMGEXP == "7"] <- 1e+07
economy$CROPDMGEXP[economy$CROPDMGEXP == "H"] <- 100
economy$CROPDMGEXP[economy$CROPDMGEXP == "1"] <- 10
economy$CROPDMGEXP[economy$CROPDMGEXP == "8"] <- 1e+08
# give 0 to invalid exponent data, so they will not be counted in
economy$CROPDMGEXP[economy$CROPDMGEXP == "+"] <- 0
economy$CROPDMGEXP[economy$CROPDMGEXP == "-"] <- 0
economy$CROPDMGEXP[economy$CROPDMGEXP == "?"] <- 0
economy$CROPDMGEXP <-as.numeric(economy$CROPDMGEXP)
## Warning: NAs introduced by coercion
economy$CROPDMGVAL <- economy$CROPDMG * economy$CROPDMGEXP
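The two conversion blocks above apply the same mapping line by line. A more compact alternative is a single lookup function, sketched below. The name exp_to_multiplier is hypothetical. The sketch follows the mapping used above (letters for hundred, thousand, million and billion, a digit d meaning 10^d, blank meaning 1, and invalid symbols meaning 0), with two assumptions that go beyond it: letter codes are treated case-insensitively, so a lowercase "k" counts as a thousand, and any other unrecognised code is mapped to 0 rather than NA.

```r
# Hypothetical helper: map damage exponent codes to numeric multipliers.
exp_to_multiplier <- function(code) {
  # Assumption: letter codes are case-insensitive ("k" is read as "K").
  code <- toupper(as.character(code))

  # Letter abbreviations and their orders of magnitude.
  lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  mult <- unname(lookup[code])   # NA for every code not in the table

  # A single digit d is read as 10^d, matching the assignments above.
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  # A blank exponent leaves the recorded value unchanged.
  mult[code %in% ""] <- 1

  # Assumption: anything still unmatched ("+", "-", "?", ...) counts as 0.
  mult[is.na(mult)] <- 0
  mult
}

# Example: 25 with code "K" and 2.5 with code "M".
damage <- c(25, 2.5) * exp_to_multiplier(c("K", "M"))
print(damage)
```

With such a helper, each damage column would need only one line, for example economy$PROPDMG * exp_to_multiplier(economy$PROPDMGEXP), and the coercion to numeric would no longer be needed.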
economyPerEvent <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE, data = economy, FUN = sum) # summing per event type; formula passed positionally so the call is portable across R versions
economyPerEvent$EVTYPE <- factor(economyPerEvent$EVTYPE, levels = economyPerEvent[order(economyPerEvent$PROPDMGVAL),'EVTYPE'])
economyPerEvent <- economyPerEvent[order(economyPerEvent$PROPDMGVAL,decreasing = TRUE),]
propdmgPerEvent <- economyPerEvent[1:20,]
proppl<- ggplot( mapping = aes(y = PROPDMGVAL, x = EVTYPE), data = propdmgPerEvent) +
geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=PROPDMGVAL, y = 1e+10))
print(proppl)
From the above plot we see that wind-related disasters such as tornadoes and thunderstorm winds are by far the most economically damaging disasters in the US. Water- and flood-related events come next. The remaining events are either sub-categories of the events above, or their severity is negligible compared to the most harmful ones.
Let’s now look at the crop damage data.
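The bar labels in the property damage plot are raw dollar totals, which are hard to read at this scale. One option is to shorten them before plotting. The helper below is a sketch only: the name format_dollars and the choice of one decimal place are assumptions, not part of the original analysis.

```r
# Hypothetical helper: abbreviate large dollar amounts for use as plot labels.
format_dollars <- function(x) {
  ifelse(x >= 1e9, sprintf("%.1fB", x / 1e9),      # billions
  ifelse(x >= 1e6, sprintf("%.1fM", x / 1e6),      # millions
  ifelse(x >= 1e3, sprintf("%.1fK", x / 1e3),      # thousands
         sprintf("%.0f", x))))                     # small values unchanged
}

labels <- format_dollars(c(1.5e10, 2.3e6, 750))
print(labels)
```

In the plot, the label aesthetic could then be written as label = format_dollars(PROPDMGVAL) inside geom_text(aes(...)).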
economyPerEvent$EVTYPE <- factor(economyPerEvent$EVTYPE, levels = economyPerEvent[order(economyPerEvent$CROPDMGVAL ),'EVTYPE'])
economyPerEvent <- economyPerEvent[order(economyPerEvent$CROPDMGVAL,decreasing = TRUE),]
cropPerEvent <- economyPerEvent[1:20,]
crpoppl<- ggplot( mapping = aes(y = CROPDMGVAL, x = EVTYPE), data = cropPerEvent) +
geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=CROPDMGVAL, y = 1e+09))
print(crpoppl)
In this case, the most damaging disasters are not the same as in the previous category. Rather, they are the ones we usually associate with crop destruction: extreme precipitation and extreme temperatures.
From this study we see that, except for crops, tornadoes bring the most harm to population and economy. In the case of crops, floods are the main troublemakers.