In this report I analyse which kinds of severe weather events have the greatest impact on human fatalities, injuries, and property and crop damage. The analysis uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, recorded between 4/18/1950 and 11/28/2011.
- Set-up (load resources, load data)
- Data Processing: Check data for NAs and transform damage data into integer numbers
- Results: Find the weather events with the most severe impact in each of the following categories. For each category, the analysis focuses on the three most impactful event types.
  - fatalities
  - injuries
  - property damage
  - crop damage
  - total damage (property plus crop)
- Interpretation and conclusion
I begin by showing the system resources I use and loading the required packages.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.1.1 magrittr_2.0.1 fastmap_1.1.0 tools_4.1.1
## [5] htmltools_0.5.2 yaml_2.2.1 jquerylib_0.1.4 stringi_1.7.4
## [9] rmarkdown_2.11 knitr_1.36 stringr_1.4.0 xfun_0.26
## [13] digest_0.6.28 rlang_0.4.11 evaluate_0.14
devtools::install_github("jhrcook/mustashe")
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 4.0 from https://cran.r-project.org/bin/windows/Rtools/.
## Skipping install of 'mustashe' from a github remote, the SHA1 (5547b7fd) has not changed since last install.
## Use `force = TRUE` to force installation
library(mustashe) ## used to stash (cache) the downloaded data
library(ggplot2) ## used for graphing
Data are loaded, stashed (cached), and split into two data frames:
1. health (contains all data relevant to the analysis of the health-related impacts, in particular FATALITIES and INJURIES)
2. dmg (contains all data related to the impact on property and crop damage)
# start from the original bz2 file
stash("Stormdata",{Stormdata<-read.csv("repdata_data_StormData.csv.bz2")})
## Loading stashed object.
health<-Stormdata[,c("EVTYPE","FATALITIES","INJURIES")]
dmg<-Stormdata[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(health)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
## 6 TORNADO 0 6
summary(health)
## EVTYPE FATALITIES INJURIES
## Length:902297 Min. : 0.0000 Min. : 0.0000
## Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000
## Mode :character Median : 0.0000 Median : 0.0000
## Mean : 0.0168 Mean : 0.1557
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :583.0000 Max. :1700.0000
head(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
summary(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG
## Length:902297 Min. : 0.00 Length:902297 Min. : 0.000
## Class :character 1st Qu.: 0.00 Class :character 1st Qu.: 0.000
## Mode :character Median : 0.00 Mode :character Median : 0.000
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
## CROPDMGEXP
## Length:902297
## Class :character
## Mode :character
##
##
##
##summary(propdmg)
##summary(cropdmg)
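For readers without the mustashe package, the caching step can be approximated in base R. The following is a minimal sketch, not the method used in this report: read_cached is a hypothetical helper, and the file in the example is a small temporary stand-in for the storm data.

```r
# Hypothetical helper (not part of the report): cache a CSV as an .rds file
# so that the slow read.csv() call only runs once.
read_cached <- function(csv_path, cache_path = paste0(csv_path, ".rds")) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))   # fast path: reuse the cached object
  }
  data <- read.csv(csv_path)      # read.csv() also reads .bz2 files directly
  saveRDS(data, cache_path)       # store the parsed data for the next run
  data
}

# Self-contained demonstration on a tiny temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1)),
          tmp, row.names = FALSE)
first  <- read_cached(tmp)        # parses the CSV and writes the cache
second <- read_cached(tmp)        # served from the cache file
```

Unlike stash(), this sketch does not notice changes to the code that produced the data; deleting the .rds file forces a fresh read.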
The main task here is to deal with the specific data format that is used for the damage figures. The damage data consist of a numerical value and a character indicating the exponent, i.e. a scaling factor. Some observations contain invalid characters. Hence, the invalid characters have to be filtered out and the damage figure has to be computed from the value multiplied by the scaling factor.
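Only the following EXP values are interpreted as scaling factors: “” (blank), “K”, “M”, and “B”, together with their lower-case forms “k”, “m”, and “b”. Any other symbol is not permissible according to the data description. Note that the code below does not drop such observations; it leaves their damage value unscaled, i.e. it treats them like a blank exponent. The dataset does not contain any NAs in the analysed columns (see the checks below), so there is no need to remove NAs.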
sum(is.na(health$FATALITIES)) ## compute the number of NAs in FATALITIES
## [1] 0
sum(is.na(health$INJURIES)) ## compute the number of NAs in INJURIES
## [1] 0
sum(is.na(dmg$PROPDMG)) ## compute the number of NAs in PROPDMG
## [1] 0
sum(is.na(dmg$CROPDMG)) ## compute the number of NAs in CROPDMG
## [1] 0
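The same kind of quick check can be applied to the exponent columns before scaling. The sketch below uses a small made-up vector in place of dmg$PROPDMGEXP (the names exp_symbols, valid, and n_invalid are illustrative only). It counts how often each symbol occurs and how many entries fall outside the accepted set.

```r
# Made-up stand-in for dmg$PROPDMGEXP; the real column has 902297 entries
exp_symbols <- c("K", "K", "M", "", "B", "k", "?", "")

# Frequency of every symbol that occurs, including the blank one
counts <- table(exp_symbols)

# Symbols the report accepts as scaling factors
valid <- c("", "K", "M", "B", "k", "m", "b")

# Number of entries carrying a symbol outside the accepted set
n_invalid <- sum(!(exp_symbols %in% valid))
```

Running the same two lines on the real columns would show how many observations are affected by the choice of accepted symbols.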
dmg$PROPDMGNUM<-dmg$PROPDMG
PROPisK<-dmg$PROPDMGEXP=="K"|dmg$PROPDMGEXP=="k"
PROPisM<-dmg$PROPDMGEXP=="M"|dmg$PROPDMGEXP=="m"
PROPisB<-dmg$PROPDMGEXP=="B"|dmg$PROPDMGEXP=="b"
dmg$PROPDMGNUM[PROPisK]<-1000*dmg$PROPDMG[PROPisK]
dmg$PROPDMGNUM[PROPisM]<-1000000*dmg$PROPDMG[PROPisM]
dmg$PROPDMGNUM[PROPisB]<-1000000000*dmg$PROPDMG[PROPisB]
dmg$CROPDMGNUM<-dmg$CROPDMG
CROPisK<-dmg$CROPDMGEXP=="K"|dmg$CROPDMGEXP=="k"
CROPisM<-dmg$CROPDMGEXP=="M"|dmg$CROPDMGEXP=="m"
CROPisB<-dmg$CROPDMGEXP=="B"|dmg$CROPDMGEXP=="b"
dmg$CROPDMGNUM[CROPisK]<-1000*dmg$CROPDMG[CROPisK]
dmg$CROPDMGNUM[CROPisM]<-1000000*dmg$CROPDMG[CROPisM]
dmg$CROPDMGNUM[CROPisB]<-1000000000*dmg$CROPDMG[CROPisB]
head(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM CROPDMGNUM
## 1 TORNADO 25.0 K 0 25000 0
## 2 TORNADO 2.5 K 0 2500 0
## 3 TORNADO 25.0 K 0 25000 0
## 4 TORNADO 2.5 K 0 2500 0
## 5 TORNADO 2.5 K 0 2500 0
## 6 TORNADO 2.5 K 0 2500 0
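The scaling step above can also be expressed with a lookup table. The following is a sketch on made-up data, not the code used for the results in this report: exp_to_multiplier is a hypothetical helper. Unlike the code above, which leaves values with other symbols unscaled, this variant marks them as NA so that they can be dropped explicitly.

```r
# Hypothetical helper: translate an exponent symbol into a numeric multiplier.
# Returns NA for symbols that are neither blank nor one of K, M, B (any case).
exp_to_multiplier <- function(exp_symbol) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multiplier <- unname(lookup[toupper(exp_symbol)])  # NA if not in the table
  multiplier[exp_symbol %in% ""] <- 1                # blank: no scaling
  multiplier
}

# Made-up rows covering each accepted symbol plus one invalid symbol
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 3, 7),
                  PROPDMGEXP = c("K", "m", "B", "", "?"))
toy$PROPDMGNUM <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Keep only the rows whose exponent symbol could be interpreted
toy_valid <- toy[!is.na(toy$PROPDMGNUM), ]
```

The same helper would apply unchanged to CROPDMG and CROPDMGEXP, which avoids repeating the six assignment lines per column.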
I start with fatalities and injuries. I calculate the mean of each variable by event type, order the means (descending), and plot the three highest values.
# fatalities
FatMeans<-tapply(health$FATALITIES,health$EVTYPE,mean) #calculate means by event type
OFatMeans<-FatMeans[order(FatMeans,decreasing=TRUE)] #order
FAT<-data.frame(names(OFatMeans),OFatMeans,row.names=1:length(OFatMeans)) #assemble data in one data frame
colnames(FAT)<-c("Event","Fatalities")
# injuries
InjMeans<-tapply(health$INJURIES,health$EVTYPE,mean) #calculate means by event type
OInjMeans<-InjMeans[order(InjMeans,decreasing=TRUE)] #order
INJ<-data.frame(names(OInjMeans),OInjMeans,row.names=1:length(OInjMeans)) #assemble data in one data frame
colnames(INJ)<-c("Event","Injuries")
# graphing
ggplot(data=FAT[1:3,],aes(x=reorder(Event,-Fatalities),y=Fatalities))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Average Number of Fatalities") +
xlab("Event Type") + ylab("Average Number of Fatalities")
ggplot(data=INJ[1:3,],aes(x=reorder(Event,-Injuries),y=Injuries))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Average Number of Injuries") + xlab("Event Type") + ylab("Average Number of Injured Persons")
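Because the ranking above is based on means, it can help to see how many records stand behind each mean. The sketch below shows one way to put the mean, the total, and the record count side by side. It runs on made-up data, and the names toy and summary_tbl are illustrative only.

```r
# Made-up stand-in for the health data frame
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HEAT WAVE", "HAIL", "HAIL"),
  FATALITIES = c(0, 2, 1, 6, 0, 0)
)

# Mean, total, and number of records per event type
fat_mean  <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
fat_total <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
fat_n     <- tapply(toy$FATALITIES, toy$EVTYPE, length)

# All three tapply() results share the same (alphabetical) group order,
# so they can be combined column by column
summary_tbl <- data.frame(Event   = names(fat_mean),
                          Mean    = as.vector(fat_mean),
                          Total   = as.vector(fat_total),
                          Records = as.vector(fat_n))

# Order by mean, descending, as in the report
summary_tbl <- summary_tbl[order(summary_tbl$Mean, decreasing = TRUE), ]
```

In this toy example the event type with the highest mean rests on a single record, which is exactly the kind of detail the extra columns make visible.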
In the same way, I proceed with property damage, crop damage, and total damage. I calculate the mean of each of the three variables by event type, order the means (descending), and plot the top three values for total damage.
#property damage
PropMeans<-tapply(dmg$PROPDMGNUM,dmg$EVTYPE,mean)
OPropMeans<-PropMeans[order(PropMeans,decreasing=TRUE)]
PDMG<-data.frame(names(OPropMeans),OPropMeans,row.names=1:length(OPropMeans))
colnames(PDMG)<-c("Event","PDamage")
head(PDMG)
## Event PDamage
## 1 TORNADOES, TSTM WIND, HAIL 1600000000
## 2 HEAVY RAIN/SEVERE WEATHER 1250000000
## 3 HURRICANE/TYPHOON 787566364
## 4 HURRICANE OPAL 352538444
## 5 STORM SURGE 165990559
## 6 WILD FIRES 156025000
#crop damage
CropMeans<-tapply(dmg$CROPDMGNUM,dmg$EVTYPE,mean)
OCropMeans<-CropMeans[order(CropMeans,decreasing=TRUE)]
CDMG<-data.frame(names(OCropMeans),OCropMeans,row.names=1:length(OCropMeans))
colnames(CDMG)<-c("Event","CDamage")
head(CDMG)
## Event CDamage
## 1 EXCESSIVE WETNESS 142000000
## 2 COLD AND WET CONDITIONS 66000000
## 3 DAMAGING FREEZE 43683333
## 4 Early Frost 42000000
## 5 HURRICANE/TYPHOON 29634918
## 6 RIVER FLOOD 29072017
#total damage
dmg$TDMGNUM<-dmg$PROPDMGNUM+dmg$CROPDMGNUM #calculate total damage
TMeans<-tapply(dmg$TDMGNUM,dmg$EVTYPE,mean)
OTMeans<-TMeans[order(TMeans,decreasing=TRUE)]
TDMG<-data.frame(names(OTMeans),OTMeans,row.names=1:length(OTMeans))
colnames(TDMG)<-c("Event","TDamage")
head(TDMG)
## Event TDamage
## 1 TORNADOES, TSTM WIND, HAIL 1602500000
## 2 HEAVY RAIN/SEVERE WEATHER 1250000000
## 3 HURRICANE/TYPHOON 817201282
## 4 HURRICANE OPAL 354649556
## 5 STORM SURGE 165990579
## 6 WILD FIRES 156025000
#graph
# here are 2 graphs that are omitted in the report
#ggplot(data=PDMG[1:3,],aes(x=reorder(Event,-PDamage),y=PDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg. Amount of Prop. Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Property Damage in US$")
#ggplot(data=CDMG[1:3,],aes(x=reorder(Event,-CDamage),y=CDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg Amount of Crop Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Crop Damage in US$")
# total damage graph in the report
ggplot(data=TDMG[1:3,],aes(x=reorder(Event,-TDamage),y=TDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg. Amount of Total Damage Inflicted \n (Overall Greatest Economic Damage)") + xlab("Event Type") + ylab("Average Amount of Total Damage in US$")
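The three damage blocks above repeat the same steps: mean by event type, descending order, assembly into a data frame. A small helper could express that pattern once. This is only a sketch on made-up data: top_by_mean is a hypothetical function and is not used for the results in this report.

```r
# Hypothetical helper: mean of `values` per group, sorted descending,
# returned as a data frame with the `n` highest groups.
top_by_mean <- function(values, groups, n = 3) {
  means <- tapply(values, groups, mean)
  means <- means[order(means, decreasing = TRUE)]
  head(data.frame(Event = names(means), Mean = as.vector(means)), n)
}

# Made-up stand-in for the dmg data frame after scaling
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "TORNADO", "TORNADO", "DROUGHT"),
  PROPDMGNUM = c(1000, 3000, 500, 20000, 10000, 0),
  CROPDMGNUM = c(500, 500, 100, 0, 0, 9000)
)
toy$TDMGNUM <- toy$PROPDMGNUM + toy$CROPDMGNUM   # total damage per record

top_prop  <- top_by_mean(toy$PROPDMGNUM, toy$EVTYPE)
top_total <- top_by_mean(toy$TDMGNUM, toy$EVTYPE)
```

The returned data frame has the columns Event and Mean, so it could be passed to ggplot() in the same way as TDMG[1:3,] above, with y=Mean.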
Overall, measured by the average per recorded event, TORNADOES, TSTM WIND, HAIL appears to be the weather event with the most severe impact on human lives (fatalities), property damage, and total damage. In terms of injuries, Heat Wave has the worst record, whereas in the category of crop damage, EXCESSIVE WETNESS does the most harm.