In this investigation data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database has been analysed to determine the type of weather events which are most harmful with respect to population health and have the greatest economic consequences to the United States. The NOAA Storm Database is a compilation of sever weather events that have occurred in the United States during the period 1950 to November 2011. Data processing completed on the data includes loading it into R, cleaning and aggregation so that summary figures could be compiled. The results show that Tornado events are by far the most harmful to population health while Flooding events contribute the greatest economic consequences.
First we will load the data, which can been done directly from the bz2 file type using read.csvfunction
#source URL for data
FileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
#checking to see if there is a data directory in working directory, if not create one
if (!dir.exists("./data")){
dir.create("./data")
}
#checking to see if the file is already downloaded in the data dir, if not download it
if (!file.exists("./data/StormData.csv.bz2")){
download.file(FileUrl,"./data/StormData.csv.bz2")
}
#load the data into R
StormDataLarge<- read.csv("./data/stormData.csv.bz2")
dim(StormDataLarge)
## [1] 902297 37
As can be seen the loaded data frame is very large with 902297 obs and 37 variables. We will subset StormDataLarge into a smaller data frame StormData with only the data we need and will work with for the analysis.
Colnames <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG",
"CROPDMGEXP")
StormData<-StormDataLarge[Colnames]
head(StormData)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
In order to asses which type of weather events are the most harmful to human health we will use the aggregate function to summarize the data so that we can see the number of fatalities for each type of weather event and the number of injures of each type of weather event.
library(dplyr)
EvtypeFatal<-aggregate(FATALITIES ~ EVTYPE, data = StormData, FUN = sum)
EvtypeFatal<-arrange(EvtypeFatal,-EvtypeFatal[,2])
EvtypeInjuries<-aggregate(INJURIES ~ EVTYPE, data = StormData, FUN = sum)
EvtypeInjuries<-arrange(EvtypeInjuries,-EvtypeInjuries[,2])
To assess the economic consequences of sever weather events we will again aggregate the data by each type of weather event. Economic consequences will be considered to be the sum of the two variables property damage PROPDMG and crop damage CROPDMG.
The property and crop damage variables have an additional alphabetical character variable PROPDMGEXP and CROPDMGEXP which defines the magnitude of the particular damage value, for example “k” for thousands, M" for millions and “B” for billions. These will first need to be converted to the appropriate numerical value and multiplied by their respective property and crop damage value.
summary(StormData$PROPDMGEXP)
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
First we see there are some unusual values “-”,“+”,“” etc which we will set to 1.
StormData$PROPDMGEXP<-as.character(StormData$PROPDMGEXP)
StormData$PROPDMGEXP<-toupper(StormData$PROPDMGEXP)
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("-","+","","?","0")]<-"1"
Next we will replace H,K,M,B with their appropriate numeric magnitude and ‘2’,‘3’,‘6’ and’9’ which we will take to be exponents for the magnitude of the values.
StormData$PROPDMGEXP[StormData$PROPDMGEXP == 'H']<-"100"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '2']<-"100"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == 'K']<-"1000"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '3']<-"1000"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '4']<-"1e4"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '5']<-"1e5"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '6']<-"1e6"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == 'M']<-"1e6"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '7']<-"1e7"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == '8']<-"1e8"
StormData$PROPDMGEXP[StormData$PROPDMGEXP == 'B']<-"1e9"
StormData$PROPDMGEXP<-as.factor(StormData$PROPDMGEXP)
summary(StormData$PROPDMGEXP)
## 1 100 1000 1e4 1e5 1e6 1e7 1e8 1e9
## 466189 20 424669 4 28 11341 5 1 40
Now we can repeat the above process for the crop damage
summary(StormData$CROPDMGEXP)
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
StormData$CROPDMGEXP<-as.character(StormData$CROPDMGEXP)
StormData$CROPDMGEXP<-toupper(StormData$CROPDMGEXP)
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("-","+","","?","0")]<-"1"
StormData$CROPDMGEXP[StormData$CROPDMGEXP == '2']<-"100"
StormData$CROPDMGEXP[StormData$CROPDMGEXP == 'H']<-"100"
StormData$CROPDMGEXP[StormData$CROPDMGEXP == 'K']<-"1000"
StormData$CROPDMGEXP[StormData$CROPDMGEXP == 'M']<-"1e6"
StormData$CROPDMGEXP[StormData$CROPDMGEXP == 'B']<-"1e9"
StormData$CROPDMGEXP<-as.factor(StormData$CROPDMGEXP)
summary(StormData$CROPDMGEXP)
## 1 100 1000 1e6 1e9
## 618439 1 281853 1995 9
Now we can multiple the property and crop damage values by their now numerical exponents
StormData$PROPDMGEXP<-as.numeric(as.character(StormData$PROPDMGEXP))
StormData$CROPDMGEXP<-as.numeric(as.character(StormData$CROPDMGEXP))
StormData$PROPDMG<-as.numeric(StormData$PROPDMG)
StormData$CROPDMG<-as.numeric(StormData$CROPDMG)
StormData$PROPDMGLong<-StormData$PROPDMG*StormData$PROPDMGEXP
StormData$CROPDMGLong<-StormData$CROPDMG*StormData$CROPDMGEXP
StormData$Damage<-StormData$PROPDMGLong+StormData$CROPDMGLong
Now we have a new variable Damage which is property damage + crop damage for each sever weather event in the database, we will refer to this as total damage. So we can see which type of weather event causes the most damage overall in the U.S we will aggregate the total damage and event type as we did above with fatalities and injuries.
DamageEst<-aggregate(Damage ~ EVTYPE, data = StormData, FUN = sum)
DamageEst$DamageBillions<-DamageEst$Damage/(1e9)
DamageEst<-DamageEst[,c(1,3)]
DamageEst<-arrange(DamageEst,-DamageEst[,2])
head(DamageEst)
## EVTYPE DamageBillions
## 1 FLOOD 150.31968
## 2 HURRICANE/TYPHOON 71.91371
## 3 TORNADO 57.36233
## 4 STORM SURGE 43.32354
## 5 HAIL 18.76122
## 6 FLASH FLOOD 18.24399
The plot below shows the types of sever weather events which cause the most fatalities.
par(mar= c(7, 4, 4, 2) + 0.1)
barplot(height = EvtypeFatal$FATALITIES[1:10], names.arg = EvtypeFatal$EVTYPE[1:10], las = 2, cex.names= 0.8, col = heat.colors (10), main = "Top 10 Fatalilites by Sever Weather Event Type", ylab = "Fatalilites")
Below shows us the most injury causing types of sever weather events.
par(mar= c(9, 4.5, 4, 2) + 0.1)
barplot(height = EvtypeInjuries$INJURIES[1:10], names.arg = EvtypeInjuries$EVTYPE[1:10], las = 2, cex.names= 0.8, col = heat.colors (10), main = "Top 10 Injuries by Sever Weather Event Type", ylab="" )
mtext("Injuries", side=2, line=3.7)
As we can see from the above two graphs Tornado is by far the most dangerous type of sever weather event for both fatalities and injuries.
In graph below we can see Total Damage by type of weather event in billions $USD where total Damage is the combination of both property + crop damage.
par(mar= c(9, 4.5, 4, 2) + 0.1)
barplot(height = DamageEst$DamageBillions[1:10], names.arg = DamageEst$EVTYPE[1:10], las = 2, cex.names= 0.8, col = heat.colors (10), main = "Top 10 Total Damage by Sever Weather Event Type", ylab="" )
mtext("Total Damage (Billions $USD)", side=2, line=3.7)
The above graph shows the most damaging type of sever weather event is Flooding followed by Hurricane/Typhoon and Tornado.
From the above graphs it can be seen that Tornado is the most harmful type of sever weather event while Flooding is the has the greatest economic consequences in terms of property and crop damage.