Storms and other severe weather events affect both public health and economic activity for communities and governments. Some of these events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a major concern. This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.
This report analyzes the NOAA storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur and estimates of any fatalities, injuries, and property damage. The data cover the period from 1950 through 2011. The analysis addresses two questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The main conclusions of the study are as follows:
1. Tornadoes are the most harmful events for population health, with 5633 deaths and 91346 injuries.
2. Floods have caused the greatest economic damage, approximately 150 billion USD in combined property and crop damage.
Loading Packages and setting up the working directory
setwd("/home/rstudio/Reproducible Research/week2")
library(grid)
library(ggplot2)
library(plyr)
library(gridExtra)
Data Processing
data <- read.csv("repdata_data_StormData1.csv")
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
Since we do not need all 37 columns, we keep only the event type and the columns describing health impact and damage.
reduceddata <- data[ , c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(reduceddata)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
Next, we sum fatalities and injuries by event type to assess the harm each type of event causes to population health. We keep the 15 most harmful event types, sorted in descending order of total casualties (fatalities plus injuries).
harmfulevent<-aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data = reduceddata, sum, na.rm=TRUE)
harmfulevent<-arrange(harmfulevent, desc(FATALITIES+INJURIES))
harmfulevent<-harmfulevent[1:15,]
harmfulevent
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 TSTM WIND 504 6957
## 4 FLOOD 470 6789
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
## 7 FLASH FLOOD 978 1777
## 8 ICE STORM 89 1975
## 9 THUNDERSTORM WIND 133 1488
## 10 WINTER STORM 206 1321
## 11 HIGH WIND 248 1137
## 12 HAIL 15 1361
## 13 HURRICANE/TYPHOON 64 1275
## 14 HEAVY SNOW 127 1021
## 15 WILDFIRE 75 911
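The aggregation and ranking above rely on plyr's arrange() and desc(). As a minimal sketch, assuming plyr is not available, the same steps can be written in base R. The data frame toy is hypothetical and only stands in for reduceddata; its values are invented for illustration.

```r
# Hypothetical toy data standing in for reduceddata (not NOAA values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 2)
)

# Sum fatalities and injuries separately for each event type
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by total casualties (fatalities + injuries), largest first
casualties <- casualties[order(casualties$FATALITIES + casualties$INJURIES,
                               decreasing = TRUE), ]

# Keep the top n event types (n = 2 here; the report keeps 15)
top_casualties <- head(casualties, 2)
rownames(top_casualties) <- NULL
print(top_casualties)
```

Sorting on the sum keeps the ranking consistent with the report, which orders by fatalities plus injuries rather than by either count alone.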
We plot fatalities and injuries side by side for these 15 event types to compare their impact on population health.
names_events <- harmfulevent$EVTYPE
barplot(t(harmfulevent[,-1]), names.arg = names_events, ylim = c(0,95000), beside = TRUE, cex.names = 0.8, las=2, col = c("yellow", "orange"), main="Top Disaster Casualties")
legend("topright",c("Fatalities","Injuries"),fill=c("yellow","orange"),bty = "n")
The table and bar plot show that tornadoes cause the most harm to population health, in terms of both fatalities (5633) and injuries (91346).
Data Processing for Economic Damage
table(reduceddata$PROPDMGEXP)
## - ? + 0 1 2 3 4 5 6
## 465934 1 8 5 216 25 13 4 4 28 4
## 7 8 B h H K m M
## 5 1 40 1 6 424665 7 11330
table(reduceddata$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
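The PROPDMGEXP and CROPDMGEXP columns hold a multiplier for PROPDMG and CROPDMG: H (hundreds) = 10^2, K (thousands) = 10^3, M (millions) = 10^6, and B (billions) = 10^9, in either upper or lower case. To convert the damage into dollar amounts, we create two new variables, propvalue and cropvalue. Any other code (blank, digits, or symbols such as -, ? and +) becomes NA when converted to a factor; we assign these the level "O" (other) and use the recorded value unscaled.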
reduceddata$propFactor<-factor(reduceddata$PROPDMGEXP,levels=c("H","K","M","B","h","m","O"))
reduceddata$propFactor[is.na(reduceddata$propFactor)] <- "O"
table(reduceddata$propFactor)
##
## H K M B h m O
## 6 424665 11330 40 1 7 466248
reduceddata$cropFactor<-factor(reduceddata$CROPDMGEXP,levels=c("K","M","B","k","m","O"))
reduceddata$cropFactor[is.na(reduceddata$cropFactor)] <- "O"
table(reduceddata$cropFactor)
##
## K M B k m O
## 281832 1994 9 21 1 618440
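The NA-to-"O" step works because factor() turns any value that is not among the listed levels into NA. A minimal sketch on a hypothetical vector of exponent codes:

```r
# Hypothetical exponent codes, including codes outside the listed levels
codes <- c("K", "5", "", "M", "+", "B")

# Values not listed in 'levels' become NA
f <- factor(codes, levels = c("H", "K", "M", "B", "h", "m", "O"))
print(is.na(f))

# Replace the NAs with the catch-all level "O"; this only works because
# "O" is already one of the levels (otherwise R warns and inserts NA)
f[is.na(f)] <- "O"
print(table(f))
```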
reduceddata<- mutate(reduceddata,propvalue= 0, cropvalue=0)
reduceddata$propvalue[reduceddata$propFactor=="K"]<-reduceddata$PROPDMG[reduceddata$propFactor=="K"]*1000
reduceddata$propvalue[reduceddata$propFactor=="H"|reduceddata$propFactor=="h"]<-reduceddata$PROPDMG[reduceddata$propFactor=="H"|reduceddata$propFactor=="h"]*100
reduceddata$propvalue[reduceddata$propFactor=="M"|reduceddata$propFactor=="m"]<-reduceddata$PROPDMG[reduceddata$propFactor=="M"|reduceddata$propFactor=="m"]*1e6
reduceddata$propvalue[reduceddata$propFactor=="B"]<-reduceddata$PROPDMG[reduceddata$propFactor=="B"]*1e9
reduceddata$propvalue[reduceddata$propFactor=="O"]<- reduceddata$PROPDMG[reduceddata$propFactor=="O"]*1
reduceddata$cropvalue[reduceddata$cropFactor=="K"|reduceddata$cropFactor=="k"]<-reduceddata$CROPDMG[reduceddata$cropFactor=="K"|reduceddata$cropFactor=="k"]*1000
reduceddata$cropvalue[reduceddata$cropFactor=="M"|reduceddata$cropFactor=="m"]<-reduceddata$CROPDMG[reduceddata$cropFactor=="M"|reduceddata$cropFactor=="m"]*1e6
reduceddata$cropvalue[reduceddata$cropFactor=="B"]<-reduceddata$CROPDMG[reduceddata$cropFactor=="B"]*1e9
reduceddata$cropvalue[reduceddata$cropFactor=="O"]<-reduceddata$CROPDMG[reduceddata$cropFactor=="O"]*1
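The assignments above apply a single rule: multiply the recorded value by the factor implied by its exponent code, ignoring case, and use 1 for any other code. As a sketch of an alternative (not the author's code), the same rule can be expressed with a named lookup vector; the helper name damage_value() is hypothetical.

```r
# Multipliers for the documented exponent codes
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert recorded damage and exponent codes to USD.
# Codes are upper-cased so that "h", "k" and "m" match; any other code
# (blank, digits, "-", "?", "+", NA) gets a multiplier of 1.
damage_value <- function(dmg, exp_code) {
  mult <- unname(multipliers[toupper(exp_code)])
  mult[is.na(mult)] <- 1
  dmg * mult
}

damage_value(c(25, 2.5, 1.5, 3, 7), c("K", "m", "B", "", "?"))
```

Applied to the report's data, this would read reduceddata$propvalue <- damage_value(reduceddata$PROPDMG, reduceddata$PROPDMGEXP), and likewise for crop damage. Unlike the factor levels above, the lookup also accepts "k" for property damage and "H"/"h" for crop damage; neither combination appears in the tables above, so the results should match on this dataset.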
Next, we sum property and crop damage by event type to assess the economic consequences of each type of event. We keep the 20 most damaging event types, sorted in descending order, and express the totals in billions of USD.
economic_dmg<-aggregate(propvalue + cropvalue~ EVTYPE, data = reduceddata, sum, na.rm=TRUE)
names(economic_dmg) = c("EVENT", "TOTAL_DAMAGE")
economic_dmg<-arrange(economic_dmg, desc(TOTAL_DAMAGE))
economic_dmg<-economic_dmg[1:20,]
economic_dmg$TOTAL_DAMAGE <- economic_dmg$TOTAL_DAMAGE/10^9
economic_dmg$EVENT <- factor(economic_dmg$EVENT, levels = economic_dmg$EVENT)
head(economic_dmg)
## EVENT TOTAL_DAMAGE
## 1 FLOOD 150.31968
## 2 HURRICANE/TYPHOON 71.91371
## 3 TORNADO 57.35211
## 4 STORM SURGE 43.32354
## 5 HAIL 18.75822
## 6 FLASH FLOOD 17.56213
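The steps above sum propvalue + cropvalue per event type, keep the largest totals, rescale them to billions of USD, and fix the factor level order. A minimal base-R sketch of the same sequence on a hypothetical toy data frame (values invented for illustration, not NOAA data):

```r
# Hypothetical toy data standing in for reduceddata after conversion (USD)
toy <- data.frame(
  EVTYPE    = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  propvalue = c(2e9, 1e8, 5e8, 1e9, 0),
  cropvalue = c(1e8, 4e8, 0, 0, 2e8)
)

# Total economic damage per event type
toy$total <- toy$propvalue + toy$cropvalue
damage <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)
names(damage) <- c("EVENT", "TOTAL_DAMAGE")

# Sort descending and keep the top n (n = 2 here; the report keeps 20)
damage <- damage[order(damage$TOTAL_DAMAGE, decreasing = TRUE), ]
damage <- head(damage, 2)

# Express totals in billions of USD
damage$TOTAL_DAMAGE <- damage$TOTAL_DAMAGE / 1e9

# Store the descending order in the factor levels, as the report does
damage$EVENT <- factor(damage$EVENT, levels = damage$EVENT)
print(damage)
```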
We plot the total damage for the 20 most costly event types.
with(economic_dmg, barplot(TOTAL_DAMAGE, names.arg = EVENT, cex.names = 0.8, las = 2, col = "lightblue", main = "Total Property & Crop Damage (Top 20)", ylab = "Total damage (billions of USD)"))
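The table and bar plot show that floods cause the greatest economic damage, about 150 billion USD, followed by hurricanes/typhoons (about 72 billion USD) and tornadoes (about 57 billion USD).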