This study is an exploratory analysis of NOAA natural disaster data. Its goal is to determine which natural disasters cause the most harm to population health and to the economy. To do so, it summarises data collected by the U.S. National Oceanic and Atmospheric Administration during the years 1950-2011. For each event type, the total damage is calculated and presented in descending order, so that the reader can compare the most damaging events and decide on the required measures.

Loading libraries

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data Processing

setwd('~/Coursera/4 - Reproducible Research/Week 4/Project')
if (!file.exists('NOAAdata.csv.bz2'))
{
  download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', 'NOAAdata.csv.bz2')
}
if(!exists('NOAA'))
{
  NOAA <- read.csv('NOAAdata.csv.bz2')
}

Population health impact

Let’s see which events are most harmful to population health. For this we will investigate the numbers of fatalities and injuries.

Fatalities

health <- NOAA[,c('EVTYPE','INJURIES','FATALITIES')]  # selecting only relevant data
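# summarising all event types; the formula is passed unnamed because the name of this argument differs between R versions
healthPerEvent <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = health, FUN = sum)
# ordering factor levels by fatalities so that the flipped bar chart shows the largest bar on top
healthPerEvent$EVTYPE <- factor(healthPerEvent$EVTYPE, levels = healthPerEvent[order(healthPerEvent$FATALITIES),'EVTYPE'])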
healthPerEvent <- healthPerEvent[order(healthPerEvent$FATALITIES,decreasing = TRUE),] # ordering descending by fatalities
fatalPerEvent <- healthPerEvent[1:20,] #selecting 20 most harmful events
fatpl<- ggplot( mapping = aes(y = FATALITIES, x = EVTYPE), data = fatalPerEvent) + 
  geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=FATALITIES, y = 500)) #creating barplot
print(fatpl) #plotting

From the above plot we see that wind-related disasters like Tornado and Thunderstorm wind are by far the deadliest disasters in the US. Heat-related events follow. The remaining events are either sub-categories of the events above, or their severity is negligible compared to the most harmful of them.
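
Because several event labels are sub-categories or spelling variants of each other, it can help to group them before summing. The following is a minimal sketch on toy data, not part of the original analysis: the helper normEvtype, the column EVGROUP, the label variants and the counts are all assumptions made for illustration.

```r
# Toy data: hypothetical label variants and counts, not values from the NOAA file
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM WINDS", "TORNADO", "Tornado"),
  FATALITIES = c(2, 1, 3, 10, 5),
  stringsAsFactors = FALSE
)

# normEvtype is a hypothetical helper: upper-case, trim, and merge a few spellings
normEvtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("^TSTM", "THUNDERSTORM", x)  # expand the common abbreviation
  x <- gsub("WINDS$", "WIND", x)         # fold the plural into the singular
  x
}

toy$EVGROUP <- normEvtype(toy$EVTYPE)

# Same aggregate-and-rank pattern as in the report, applied to the grouped labels
fatalPerGroup <- aggregate(FATALITIES ~ EVGROUP, data = toy, FUN = sum)
fatalPerGroup <- fatalPerGroup[order(fatalPerGroup$FATALITIES, decreasing = TRUE), ]
print(fatalPerGroup)
```

On the real data, the set of patterns would have to be chosen after inspecting the distinct EVTYPE values; the two substitutions here only illustrate the idea.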

Injuries

Let’s now look at the injuries data.

healthPerEvent$EVTYPE <- factor(healthPerEvent$EVTYPE, levels =  healthPerEvent[order(healthPerEvent$INJURIES),'EVTYPE'])
healthPerEvent <- healthPerEvent[order(healthPerEvent$INJURIES,decreasing = TRUE),]
injurPerEvent <- healthPerEvent[1:20,]

injpl<- ggplot( mapping = aes(y = INJURIES, x = EVTYPE), data = injurPerEvent) + 
  geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=INJURIES, y = 10000))
print(injpl)

First of all, as expected, the number of injuries is far larger than the number of fatalities. Second, wind-related disasters are again in first place, followed by heat-related disasters.

Economy impact

Let’s see which events are most harmful to the economy. For this we will investigate the amounts of property and crop damage.

First we need to tidy up the data. The PROPDMGEXP and CROPDMGEXP values are not suitable for calculations as they are. We’ll have to convert the order-of-magnitude abbreviations into actual numbers.

economy <- NOAA[,c('EVTYPE','PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]

# Sorting the property exponent data
economy$PROPDMGEXP <- as.character(economy$PROPDMGEXP)
# "1" is converted first: later steps write "1" for empty and "0" exponents,
# and those rows must not be matched again and turned into 10
economy$PROPDMGEXP[economy$PROPDMGEXP == "1"] <- 10
economy$PROPDMGEXP[economy$PROPDMGEXP == "K"] <- 1000
economy$PROPDMGEXP[economy$PROPDMGEXP == "M"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "" ] <- 1
economy$PROPDMGEXP[economy$PROPDMGEXP == "B"] <- 1e+09
economy$PROPDMGEXP[economy$PROPDMGEXP == "m"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "0"] <- 1
economy$PROPDMGEXP[economy$PROPDMGEXP == "5"] <- 1e+05
economy$PROPDMGEXP[economy$PROPDMGEXP == "6"] <- 1e+06
economy$PROPDMGEXP[economy$PROPDMGEXP == "4"] <- 1e+04
economy$PROPDMGEXP[economy$PROPDMGEXP == "2"] <- 1e+02
economy$PROPDMGEXP[economy$PROPDMGEXP == "3"] <- 1e+03
economy$PROPDMGEXP[economy$PROPDMGEXP == "h"] <- 100
economy$PROPDMGEXP[economy$PROPDMGEXP == "7"] <- 1e+07
economy$PROPDMGEXP[economy$PROPDMGEXP == "H"] <- 100
economy$PROPDMGEXP[economy$PROPDMGEXP == "8"] <- 1e+08
# give 0 to invalid exponent data, so they will not be counted in
economy$PROPDMGEXP[economy$PROPDMGEXP == "+"] <- 0
economy$PROPDMGEXP[economy$PROPDMGEXP == "-"] <- 0
economy$PROPDMGEXP[economy$PROPDMGEXP == "?"] <- 0
economy$PROPDMGEXP <- as.numeric(economy$PROPDMGEXP)
economy$PROPDMGVAL <- economy$PROPDMG * economy$PROPDMGEXP


# Sorting the crop exponent data
economy$CROPDMGEXP <- as.character(economy$CROPDMGEXP)
# "1" is converted first: later steps write "1" for empty and "0" exponents,
# and those rows must not be matched again and turned into 10
economy$CROPDMGEXP[economy$CROPDMGEXP == "1"] <- 10
economy$CROPDMGEXP[economy$CROPDMGEXP == "K"] <- 1000
# lowercase "k" occurs in the crop column; left unconverted it would become NA
economy$CROPDMGEXP[economy$CROPDMGEXP == "k"] <- 1000
economy$CROPDMGEXP[economy$CROPDMGEXP == "M"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "" ] <- 1
economy$CROPDMGEXP[economy$CROPDMGEXP == "B"] <- 1e+09
economy$CROPDMGEXP[economy$CROPDMGEXP == "m"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "0"] <- 1
economy$CROPDMGEXP[economy$CROPDMGEXP == "5"] <- 1e+05
economy$CROPDMGEXP[economy$CROPDMGEXP == "6"] <- 1e+06
economy$CROPDMGEXP[economy$CROPDMGEXP == "4"] <- 1e+04
economy$CROPDMGEXP[economy$CROPDMGEXP == "2"] <- 1e+02
economy$CROPDMGEXP[economy$CROPDMGEXP == "3"] <- 1e+03
economy$CROPDMGEXP[economy$CROPDMGEXP == "h"] <- 100
economy$CROPDMGEXP[economy$CROPDMGEXP == "7"] <- 1e+07
economy$CROPDMGEXP[economy$CROPDMGEXP == "H"] <- 100
economy$CROPDMGEXP[economy$CROPDMGEXP == "8"] <- 1e+08
# give 0 to invalid exponent data, so they will not be counted in
economy$CROPDMGEXP[economy$CROPDMGEXP == "+"] <- 0
economy$CROPDMGEXP[economy$CROPDMGEXP == "-"] <- 0
economy$CROPDMGEXP[economy$CROPDMGEXP == "?"] <- 0
economy$CROPDMGEXP <- as.numeric(economy$CROPDMGEXP)
economy$CROPDMGVAL <- economy$CROPDMG * economy$CROPDMGEXP
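# summing both damage values per event type; the formula is passed unnamed because the name of this argument differs between R versions
economyPerEvent <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE, data = economy, FUN = sum)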
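
The conversion of exponent symbols into multipliers can also be expressed as a single lookup. The sketch below is one possible way to do it on toy data; expLookup, expToMultiplier and the toy values are assumptions made for illustration and are not part of the original analysis. It treats symbols case-insensitively, maps an empty exponent to 1 and any unrecognised symbol to 0.

```r
# Toy records; the values are illustrative, not taken from the NOAA file
toy <- data.frame(
  DMG    = c(25, 2.5, 1, 3, 7, 4),
  DMGEXP = c("K", "M", "B", "", "k", "?"),
  stringsAsFactors = FALSE
)

# Named vector used as a lookup table: symbol -> multiplier
expLookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
# Digits 0..8 are read as powers of ten, e.g. "5" -> 1e5
for (d in 0:8) expLookup[as.character(d)] <- 10^d

# expToMultiplier is a hypothetical helper
expToMultiplier <- function(exp) {
  key  <- toupper(as.character(exp))
  mult <- unname(expLookup[key])  # symbols missing from the table give NA
  mult[key == ""]  <- 1           # empty exponent: value is taken as is
  mult[is.na(mult)] <- 0          # anything else unrecognised: not counted
  mult
}

toy$VALUE <- toy$DMG * expToMultiplier(toy$DMGEXP)
print(toy)
```

Each symbol is looked up exactly once, so a value that has already been converted cannot be matched again by a later comparison, and the same helper could serve both the property and the crop column.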

Property damage

economyPerEvent$EVTYPE <- factor(economyPerEvent$EVTYPE, levels =  economyPerEvent[order(economyPerEvent$PROPDMGVAL),'EVTYPE'])
economyPerEvent <- economyPerEvent[order(economyPerEvent$PROPDMGVAL,decreasing = TRUE),]
propdmgPerEvent <- economyPerEvent[1:20,]
proppl<- ggplot( mapping = aes(y = PROPDMGVAL, x = EVTYPE), data = propdmgPerEvent) + 
  geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=PROPDMGVAL, y = 1e+10))
print(proppl)
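
The bar labels in this plot are raw dollar amounts, which are hard to read at this scale. As a minimal sketch, a small formatting helper could shorten them; fmtBillions and the sample amounts below are assumptions for illustration, not part of the original analysis.

```r
# fmtBillions is a hypothetical helper: dollars -> "$x.xB" labels
fmtBillions <- function(x) {
  paste0("$", formatC(x / 1e9, format = "f", digits = 1), "B")
}

# Illustrative amounts only, not results from the NOAA data
damage    <- c(1.5e9, 2.5e10)
dmgLabels <- fmtBillions(damage)
print(dmgLabels)
```

In the plot, such a helper would presumably be applied inside the label mapping, for example aes(label = fmtBillions(PROPDMGVAL), y = 1e+10).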

From the above plot we see that wind-related disasters like tornadoes and thunderstorm wind are by far the most economically damaging disasters in the US. Water- and flood-related events follow. The remaining events are either sub-categories of the events above, or their severity is negligible compared to the most harmful of them.

Crop damage

Let’s now look at the crop damage data.

economyPerEvent$EVTYPE <- factor(economyPerEvent$EVTYPE, levels =  economyPerEvent[order(economyPerEvent$CROPDMGVAL ),'EVTYPE'])
economyPerEvent <- economyPerEvent[order(economyPerEvent$CROPDMGVAL,decreasing = TRUE),]
cropPerEvent <- economyPerEvent[1:20,]

croppl <- ggplot( mapping = aes(y = CROPDMGVAL, x = EVTYPE), data = cropPerEvent) + 
  geom_bar(stat = 'identity') + coord_flip() + geom_text(aes(label=CROPDMGVAL, y = 1e+09))
print(croppl)

In this case, the most damaging disasters are not the same as in the previous category. Rather, they are the ones we usually associate with crop destruction: extreme precipitation and extreme temperatures.

Results

From this study we see that, except for crop damage, tornadoes cause the most harm to both population health and the economy. For crops, floods are the main troublemakers.