In this report I analyse which kinds of severe weather events have the greatest impact on human fatalities, injuries, and property damage. The analysis uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, recorded between 4/18/1950 and 11/28/2011.
- Set-up (load resources, load data)
- Data Processing: Check data for NAs and transform damage data into plain numbers
- Results: Find the weather events with the most severe impact in the following categories. For each category, the three most impactful event types are determined; graphs are shown for fatalities, injuries, and total damage.
- fatalities
- injuries
- property damage
- crop damage
- total damage (property plus crop)
- Interpretation and conclusion
I begin by showing the system resources I use and loading all necessary packages.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.1.1 magrittr_2.0.1 fastmap_1.1.0 tools_4.1.1
## [5] htmltools_0.5.2 yaml_2.2.1 jquerylib_0.1.4 stringi_1.7.4
## [9] rmarkdown_2.11 knitr_1.36 stringr_1.4.0 xfun_0.26
## [13] digest_0.6.28 rlang_0.4.11 evaluate_0.14
devtools::install_github("jhrcook/mustashe")
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 4.0 from https://cran.r-project.org/bin/windows/Rtools/.
## Skipping install of 'mustashe' from a github remote, the SHA1 (5547b7fd) has not changed since last install.
## Use `force = TRUE` to force installation
library(mustashe) ## used to stash (cache) the downloaded data
library(ggplot2) ## used for graphing
Data are loaded, stashed (cached), and split into 2 data frames:
1. health (contains all data relevant to the analysis of the health-related impacts, in particular FATALITIES and INJURIES)
2. dmg (contains all data related to the impact on property and crop damage)
# start from the original bz2 file
stash("Stormdata",{Stormdata<-read.csv("repdata_data_StormData.csv.bz2")})
## Loading stashed object.
health<-Stormdata[,c("EVTYPE","FATALITIES","INJURIES")]
dmg<-Stormdata[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(health)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
## 6 TORNADO 0 6
summary(health)
## EVTYPE FATALITIES INJURIES
## Length:902297 Min. : 0.0000 Min. : 0.0000
## Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000
## Mode :character Median : 0.0000 Median : 0.0000
## Mean : 0.0168 Mean : 0.1557
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :583.0000 Max. :1700.0000
head(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
summary(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG
## Length:902297 Min. : 0.00 Length:902297 Min. : 0.000
## Class :character 1st Qu.: 0.00 Class :character 1st Qu.: 0.000
## Mode :character Median : 0.00 Mode :character Median : 0.000
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
## CROPDMGEXP
## Length:902297
## Class :character
## Mode :character
##
##
##
##summary(propdmg)
##summary(cropdmg)
The main task here is to deal with the specific data format that is used for the damage figures. The damage data consist of a numerical value and a character indicating the exponent, i.e. a scaling factor. Some observations contain invalid characters. Hence, the invalid characters have to be filtered out and the damage figure has to be computed from the value multiplied by the scaling factor.
Only the following EXP values are treated as valid scaling symbols: "" (blank), "K", "M" and "B", together with the lower-case forms "k", "m" and "b". Note that the code below does not drop observations with any other symbol (i.e. one not permissible according to the description); their damage value is kept unscaled. The dataset does not contain any NAs in the analysed numeric columns (see the checks below), so there is no need to remove NAs.
sum(is.na(health$FATALITIES)) ## compute the number of NAs in FATALITIES
## [1] 0
sum(is.na(health$INJURIES)) ## compute the number of NAs in INJURIES
## [1] 0
sum(is.na(dmg$PROPDMG)) ## compute the number of NAs in PROPDMG
## [1] 0
sum(is.na(dmg$CROPDMG)) ## compute the number of NAs in CROPDMG
## [1] 0
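The four NA checks above can also be collapsed into a single call. The following is a minimal sketch on a small invented data frame (`toy` is a stand-in, not the storm data), assuming only base R:

```r
# Toy stand-in for the storm data; all values are invented for illustration
toy <- data.frame(
  FATALITIES = c(0, 1, NA),
  INJURIES   = c(15, 0, 2),
  PROPDMG    = c(25, 2.5, 0)
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the NAs in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Applied to the real data, the same pattern would be `colSums(is.na(health))` and `colSums(is.na(dmg))`.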
dmg$PROPDMGNUM<-dmg$PROPDMG
PROPisK<-dmg$PROPDMGEXP=="K"|dmg$PROPDMGEXP=="k"
PROPisM<-dmg$PROPDMGEXP=="M"|dmg$PROPDMGEXP=="m"
PROPisB<-dmg$PROPDMGEXP=="B"|dmg$PROPDMGEXP=="b"
dmg$PROPDMGNUM[PROPisK]<-1000*dmg$PROPDMG[PROPisK]
dmg$PROPDMGNUM[PROPisM]<-1000000*dmg$PROPDMG[PROPisM]
dmg$PROPDMGNUM[PROPisB]<-1000000000*dmg$PROPDMG[PROPisB]
dmg$CROPDMGNUM<-dmg$CROPDMG
CROPisK<-dmg$CROPDMGEXP=="K"|dmg$CROPDMGEXP=="k"
CROPisM<-dmg$CROPDMGEXP=="M"|dmg$CROPDMGEXP=="m"
CROPisB<-dmg$CROPDMGEXP=="B"|dmg$CROPDMGEXP=="b"
dmg$CROPDMGNUM[CROPisK]<-1000*dmg$CROPDMG[CROPisK]
dmg$CROPDMGNUM[CROPisM]<-1000000*dmg$CROPDMG[CROPisM]
dmg$CROPDMGNUM[CROPisB]<-1000000000*dmg$CROPDMG[CROPisB]
head(dmg)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM CROPDMGNUM
## 1 TORNADO 25.0 K 0 25000 0
## 2 TORNADO 2.5 K 0 2500 0
## 3 TORNADO 25.0 K 0 25000 0
## 4 TORNADO 2.5 K 0 2500 0
## 5 TORNADO 2.5 K 0 2500 0
## 6 TORNADO 2.5 K 0 2500 0
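The scaling step above can also be written with a lookup vector instead of one logical mask per symbol. This is only a sketch: `exp_multiplier`, `scale_damage` and the toy vectors are hypothetical names and values introduced here for illustration, not part of the original analysis. It reproduces the behaviour of the code above (unknown symbols leave the value unscaled) and additionally flags symbols outside the accepted set, so that such rows could be excluded if desired.

```r
# Hypothetical lookup table: exponent symbol -> multiplier
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage value by its exponent symbol
scale_damage <- function(value, exp_symbol) {
  sym  <- toupper(exp_symbol)            # treat "k" like "K", etc.
  mult <- unname(exp_multiplier[sym])    # NA for blank or unknown symbols
  mult[is.na(mult)] <- 1                 # leave those values unscaled
  value * mult
}

# Invented example values
toy_value <- c(25, 2.5, 1.2, 7, 3)
toy_exp   <- c("K", "m", "B", "", "?")

toy_scaled <- scale_damage(toy_value, toy_exp)

# TRUE where the symbol is one of the accepted ones (blank, K, M, B)
toy_valid <- toupper(toy_exp) %in% c("", "K", "M", "B")

print(toy_scaled)
print(toy_valid)
```

With the real data, the call would look like `scale_damage(dmg$PROPDMG, dmg$PROPDMGEXP)`, and `dmg[toy_valid_equivalent, ]` style subsetting could drop the rows with unrecognised symbols.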
I start with fatalities and injuries. I calculate the sum of each variable by event type, order the sums (descending), and plot the top 3 values.
# fatalities
FatSums<-tapply(health$FATALITIES,health$EVTYPE,sum) #calculate sums by event type
OFatSums<-FatSums[order(FatSums,decreasing=TRUE)] #order
FAT<-data.frame(names(OFatSums),OFatSums,row.names=1:length(OFatSums)) #assemble data in one data frame
colnames(FAT)<-c("Event","Fatalities")
# injuries
InjSums<-tapply(health$INJURIES,health$EVTYPE,sum) #calculate sums by event type
OInjSums<-InjSums[order(InjSums,decreasing=TRUE)] #order
INJ<-data.frame(names(OInjSums),OInjSums,row.names=1:length(OInjSums)) #assemble data in one data frame
colnames(INJ)<-c("Event","Injuries")
# graphing
ggplot(data=FAT[1:3,],aes(x=reorder(Event,-Fatalities),y=Fatalities))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Total Number of Fatalities") +
xlab("Event Type") + ylab("Total Number of Fatalities")
ggplot(data=INJ[1:3,],aes(x=reorder(Event,-Injuries),y=Injuries))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Total Number of Injuries") + xlab("Event Type") + ylab("Total Number of Injured Persons")
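The sum, order, and assemble steps are repeated for every variable, so they could be wrapped in a small function. The sketch below uses the same `tapply` and `order` idiom as above; `top_events` and the toy vectors are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: total of `value` per `event`, returned as the top n rows
top_events <- function(value, event, n = 3) {
  sums <- tapply(value, event, sum)              # sums by event type
  sums <- sums[order(sums, decreasing = TRUE)]   # largest first
  out  <- data.frame(Event = names(sums), Total = as.vector(sums))
  out[seq_len(min(n, nrow(out))), ]              # keep at most n rows
}

# Invented example data
toy_event <- c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL")
toy_count <- c(5, 2, 4, 2, 1, 0)

top3 <- top_events(toy_count, toy_event)
print(top3)
```

For the storm data, `top_events(health$FATALITIES, health$EVTYPE)` would give the three rows that are plotted above.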
In the same way, I proceed with property damage, crop damage, and total damage. I calculate the sum of each variable by event type, order the sums (descending), and list the top values. Only the total damage is plotted.
#property damage
PropSums<-tapply(dmg$PROPDMGNUM,dmg$EVTYPE,sum)
OPropSums<-PropSums[order(PropSums,decreasing=TRUE)]
PDMG<-data.frame(names(OPropSums),OPropSums,row.names=1:length(OPropSums))
colnames(PDMG)<-c("Event","PDamage")
head(PDMG)
## Event PDamage
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56937160779
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140812067
## 6 HAIL 15732267048
#crop damage
CropSums<-tapply(dmg$CROPDMGNUM,dmg$EVTYPE,sum)
OCropSums<-CropSums[order(CropSums,decreasing=TRUE)]
CDMG<-data.frame(names(OCropSums),OCropSums,row.names=1:length(OCropSums))
colnames(CDMG)<-c("Event","CDamage")
head(CDMG)
## Event CDamage
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
#total damage
dmg$TDMGNUM<-dmg$PROPDMGNUM+dmg$CROPDMGNUM #calculate total damage
TSums<-tapply(dmg$TDMGNUM,dmg$EVTYPE,sum)
OTSums<-TSums[order(TSums,decreasing=TRUE)]
TDMG<-data.frame(names(OTSums),OTSums,row.names=1:length(OTSums))
colnames(TDMG)<-c("Event","TDamage")
head(TDMG)
## Event TDamage
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352114049
## 4 STORM SURGE 43323541000
## 5 HAIL 18758221521
## 6 FLASH FLOOD 17562129167
#graph
# here are 2 graphs that are omitted in the report
#ggplot(data=PDMG[1:3,],aes(x=reorder(Event,-PDamage),y=PDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg. Amount of Pers. Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Personal Damage in US$")
#ggplot(data=CDMG[1:3,],aes(x=reorder(Event,-CDamage),y=CDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg Amount of Crop Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Crop Damage in US$")
# total damage graph in the report
ggplot(data=TDMG[1:3,],aes(x=reorder(Event,-TDamage),y=TDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Total Damage (Property plus Crop)\n (Overall Greatest Economic Damage)") + xlab("Event Type") + ylab("Total Damage in US$")
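Damage totals of this size are easier to read in billions of US$. As a small sketch, the three top rows printed by `head(TDMG)` above are re-entered by hand here (`tdmg_top` and `TDamageBn` are hypothetical names) and rescaled:

```r
# Top three rows copied from the head(TDMG) output above
tdmg_top <- data.frame(
  Event   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TDamage = c(150319678257, 71913712800, 57352114049)
)

# Express the totals in billions of US$, rounded to one decimal place
tdmg_top$TDamageBn <- round(tdmg_top$TDamage / 1e9, 1)
print(tdmg_top[, c("Event", "TDamageBn")])
```

The rescaled column could then be used as the `y` aesthetic in the plot, with the axis label changed to billions of US$.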
Overall, TORNADO appears to be the weather event with the most severe impact on population health, in terms of both fatalities and injuries. It also ranks third in terms of economic damage. In terms of total economic damage (property and crop damage combined), FLOOD has the greatest impact.