Severe Weather Events as Causes of Fatalities, Injuries and Property Damage

Synopsis

In this report I analyse which kinds of severe weather events have the greatest impact on human fatalities, injuries and property damage. The analysis uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, recorded between 4/18/1950 and 11/28/2011.

I will proceed as follows:

  1. Set-up (load resources, load data)
  2. Data Processing: Check data for NAs and transform damage data into integer numbers
  3. Results: Find the weather events with the most severe impact in each of the following categories and graph the top three event types:
    • fatalities
    • injuries
    • property damage
    • crop damage
    • total damage (property plus crop)
  4. Interpretation and conclusion

1. Set-up

I begin by showing the session information of the system I use and loading the required packages.

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.1.1  magrittr_2.0.1  fastmap_1.1.0   tools_4.1.1    
##  [5] htmltools_0.5.2 yaml_2.2.1      jquerylib_0.1.4 stringi_1.7.4  
##  [9] rmarkdown_2.11  knitr_1.36      stringr_1.4.0   xfun_0.26      
## [13] digest_0.6.28   rlang_0.4.11    evaluate_0.14
devtools::install_github("jhrcook/mustashe")
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install Rtools 4.0 from https://cran.r-project.org/bin/windows/Rtools/.
## Skipping install of 'mustashe' from a github remote, the SHA1 (5547b7fd) has not changed since last install.
##   Use `force = TRUE` to force installation
library(mustashe)  ## used to stash (cache) the downloaded data
library(ggplot2)  ## used for graphing


2. Data Processing

2.1 Loading Data

Data are loaded, stashed (cached), and split into 2 data frames:
1. health (contains all data relevant to the analysis of the health-related impacts, in particular FATALITIES and INJURIES)
2. dmg (contains all data related to the impact on property and crop damage)

# start from the original bz2 file
stash("Stormdata",{Stormdata<-read.csv("repdata_data_StormData.csv.bz2")})
## Loading stashed object.
health<-Stormdata[,c("EVTYPE","FATALITIES","INJURIES")]
dmg<-Stormdata[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
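The caching idea behind stash() can also be illustrated with base R alone. The following is only a minimal sketch of the general pattern, not how mustashe works internally; the helper name cached_read, the toy loader and the temporary file are assumptions made for illustration.

```r
# Hypothetical helper (not part of mustashe): cache an expensive object on disk
cached_read <- function(cache_file, loader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))   # reuse the cached copy
  }
  obj <- loader()                 # run the expensive load only once
  saveRDS(obj, cache_file)
  obj
}

# Toy usage with a temporary file instead of the real storm data
cache_file <- tempfile(fileext = ".rds")
calls <- 0
loader <- function() {
  calls <<- calls + 1             # count how often the loader really runs
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(0, 1))
}
first  <- cached_read(cache_file, loader)   # loads and writes the cache
second <- cached_read(cache_file, loader)   # reads from the cache
```

Unlike stash(), this sketch does not notice when the loading code changes, so the cache file would have to be deleted by hand in that case.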


head(health)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6
summary(health)
##     EVTYPE            FATALITIES          INJURIES        
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Mode  :character   Median :  0.0000   Median :   0.0000  
##                     Mean   :  0.0168   Mean   :   0.1557  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##                     Max.   :583.0000   Max.   :1700.0000
head(dmg)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0
summary(dmg)
##     EVTYPE             PROPDMG         PROPDMGEXP           CROPDMG       
##  Length:902297      Min.   :   0.00   Length:902297      Min.   :  0.000  
##  Class :character   1st Qu.:   0.00   Class :character   1st Qu.:  0.000  
##  Mode  :character   Median :   0.00   Mode  :character   Median :  0.000  
##                     Mean   :  12.06                      Mean   :  1.527  
##                     3rd Qu.:   0.50                      3rd Qu.:  0.000  
##                     Max.   :5000.00                      Max.   :990.000  
##   CROPDMGEXP       
##  Length:902297     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
##summary(propdmg)
##summary(cropdmg)

2.2 Checking for NAs and Transforming Data

The main task here is to deal with the specific data format that is used for the damage figures. The damage data consist of a numerical value and a character indicating the exponent, i.e. a scaling factor. Some observations contain invalid characters. The damage figure is computed as the value multiplied by the scaling factor.
As valid EXP values I accept only "" (blank), "K", "M" and "B", as well as the lower-case forms "k", "m" and "b". Note that the code below does not drop observations with any other symbol (i.e. one not permissible according to the description); it leaves their value unscaled, exactly as for a blank exponent. The dataset does not contain any NAs (see analysis below), so there is no need to remove NAs.

sum(is.na(health$FATALITIES))  ## compute the number of NAs in FATALITIES
## [1] 0
sum(is.na(health$INJURIES))    ## compute the number of NAs in INJURIES
## [1] 0
sum(is.na(dmg$PROPDMG))        ## compute the number of NAs in PROPDMG
## [1] 0
sum(is.na(dmg$CROPDMG))        ## compute the number of NAs in CROPDMG
## [1] 0
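Before scaling, it can help to see which exponent symbols actually occur and how many fall outside the accepted set. The sketch below uses a small made-up vector in place of dmg$PROPDMGEXP; the symbols "+" and "5" are merely stand-ins for invalid entries, and all variable names are assumptions.

```r
# Toy exponent column; "+" and "5" stand in for invalid entries
exp_toy  <- c("K", "k", "M", "", "B", "+", "5", "K")
accepted <- c("", "K", "M", "B")

# Frequency of each symbol, ignoring case
symbol_counts <- table(toupper(exp_toy))

# Flag observations whose symbol is outside the accepted set
is_invalid <- !(toupper(exp_toy) %in% accepted)
n_invalid  <- sum(is_invalid)
```

Applied to the real columns, the same two steps would show how much of the data is affected by the choice of how to treat invalid symbols.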


dmg$PROPDMGNUM<-dmg$PROPDMG
PROPisK<-dmg$PROPDMGEXP=="K"|dmg$PROPDMGEXP=="k"
PROPisM<-dmg$PROPDMGEXP=="M"|dmg$PROPDMGEXP=="m"
PROPisB<-dmg$PROPDMGEXP=="B"|dmg$PROPDMGEXP=="b"
dmg$PROPDMGNUM[PROPisK]<-1000*dmg$PROPDMG[PROPisK]
dmg$PROPDMGNUM[PROPisM]<-1000000*dmg$PROPDMG[PROPisM]
dmg$PROPDMGNUM[PROPisB]<-1000000000*dmg$PROPDMG[PROPisB]

dmg$CROPDMGNUM<-dmg$CROPDMG
CROPisK<-dmg$CROPDMGEXP=="K"|dmg$CROPDMGEXP=="k"
CROPisM<-dmg$CROPDMGEXP=="M"|dmg$CROPDMGEXP=="m"
CROPisB<-dmg$CROPDMGEXP=="B"|dmg$CROPDMGEXP=="b"
dmg$CROPDMGNUM[CROPisK]<-1000*dmg$CROPDMG[CROPisK]
dmg$CROPDMGNUM[CROPisM]<-1000000*dmg$CROPDMG[CROPisM]
dmg$CROPDMGNUM[CROPisB]<-1000000000*dmg$CROPDMG[CROPisB]

head(dmg)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM CROPDMGNUM
## 1 TORNADO    25.0          K       0                 25000          0
## 2 TORNADO     2.5          K       0                  2500          0
## 3 TORNADO    25.0          K       0                 25000          0
## 4 TORNADO     2.5          K       0                  2500          0
## 5 TORNADO     2.5          K       0                  2500          0
## 6 TORNADO     2.5          K       0                  2500          0
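The scaling step can also be expressed with a named lookup vector, which avoids repeating the same three assignments for each column. This is a sketch under the same convention as the code above (blank or unrecognised symbols get a multiplier of 1); the function name scale_damage and the toy data frame are hypothetical.

```r
# Hypothetical helper: convert value plus exponent symbol into a dollar amount.
# Symbols outside the lookup get multiplier 1, as in the step-by-step code.
scale_damage <- function(value, exp_symbol) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(exp_symbol)]  # NA for blank or unknown symbols
  m[is.na(m)] <- 1
  unname(value * m)
}

# Toy data in the same layout as the dmg data frame
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 3, 7),
                  PROPDMGEXP = c("K", "m", "B", "", "?"))
toy$PROPDMGNUM <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
```

The same function could be applied to CROPDMG and CROPDMGEXP without further changes.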


3. Results

Finding and graphing the weather events with the most severe impact

I start with fatalities and injuries. I calculate the sum of each variable by event type, order the sums (descending) and plot the top 3 values.

# fatalities
FatSums<-tapply(health$FATALITIES,health$EVTYPE,sum) #calculate sums by event type
OFatSums<-FatSums[order(FatSums,decreasing=TRUE)]   #order
FAT<-data.frame(names(OFatSums),OFatSums,row.names=1:length(OFatSums)) #assemble data in one data frame
colnames(FAT)<-c("Event","Fatalities")

# injuries
InjSums<-tapply(health$INJURIES,health$EVTYPE,sum) #calculate sums by event type
OInjSums<-InjSums[order(InjSums,decreasing=TRUE)] #order
INJ<-data.frame(names(OInjSums),OInjSums,row.names=1:length(OInjSums)) #assemble data in one data frame
colnames(INJ)<-c("Event","Injuries")

# graphing
ggplot(data=FAT[1:3,],aes(x=reorder(Event,-Fatalities),y=Fatalities))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Total Number of Fatalities") +
  xlab("Event Type") + ylab("Total Number of Fatalities")

ggplot(data=INJ[1:3,],aes(x=reorder(Event,-Injuries),y=Injuries))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Total Number of Injuries") + xlab("Event Type") + ylab("Total Number of Injured Persons")

In the same way, I proceed with property damage, crop damage, and total damage. I calculate the sum of each variable by event type and order the sums (descending). Only the top 3 values for total damage are plotted.

#property damage
PropSums<-tapply(dmg$PROPDMGNUM,dmg$EVTYPE,sum)
OPropSums<-PropSums[order(PropSums,decreasing=TRUE)]
PDMG<-data.frame(names(OPropSums),OPropSums,row.names=1:length(OPropSums))
colnames(PDMG)<-c("Event","PDamage")
head(PDMG)
##               Event      PDamage
## 1             FLOOD 144657709807
## 2 HURRICANE/TYPHOON  69305840000
## 3           TORNADO  56937160779
## 4       STORM SURGE  43323536000
## 5       FLASH FLOOD  16140812067
## 6              HAIL  15732267048
#crop damage
CropSums<-tapply(dmg$CROPDMGNUM,dmg$EVTYPE,sum)
OCropSums<-CropSums[order(CropSums,decreasing=TRUE)]
CDMG<-data.frame(names(OCropSums),OCropSums,row.names=1:length(OCropSums))
colnames(CDMG)<-c("Event","CDamage")
head(CDMG)
##         Event     CDamage
## 1     DROUGHT 13972566000
## 2       FLOOD  5661968450
## 3 RIVER FLOOD  5029459000
## 4   ICE STORM  5022113500
## 5        HAIL  3025954473
## 6   HURRICANE  2741910000
#total damage
dmg$TDMGNUM<-dmg$PROPDMGNUM+dmg$CROPDMGNUM #calculate total damage
TSums<-tapply(dmg$TDMGNUM,dmg$EVTYPE,sum)
OTSums<-TSums[order(TSums,decreasing=TRUE)]
TDMG<-data.frame(names(OTSums),OTSums,row.names=1:length(OTSums))
colnames(TDMG)<-c("Event","TDamage")
head(TDMG)
##               Event      TDamage
## 1             FLOOD 150319678257
## 2 HURRICANE/TYPHOON  71913712800
## 3           TORNADO  57352114049
## 4       STORM SURGE  43323541000
## 5              HAIL  18758221521
## 6       FLASH FLOOD  17562129167
#graph

# the following 2 graphs are omitted from the report
#ggplot(data=PDMG[1:3,],aes(x=reorder(Event,-PDamage),y=PDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Total Amount of Property Damage Inflicted") + xlab("Event Type") + ylab("Total Amount of Property Damage in US$")
#ggplot(data=CDMG[1:3,],aes(x=reorder(Event,-CDamage),y=CDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Total Amount of Crop Damage Inflicted") + xlab("Event Type") + ylab("Total Amount of Crop Damage in US$")

# total damage graph in the report
ggplot(data=TDMG[1:3,],aes(x=reorder(Event,-TDamage),y=TDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Total Amount of Total Damage Inflicted \n (Overall Greatest Economic Damage)") + xlab("Event Type") + ylab("Total Amount of Total Damage in US$")
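The sum, order and top-three steps used for each category above follow one pattern, so they could be wrapped into a single function. The following is a sketch on made-up numbers, not a replacement for the analysis code; the function name top_events and the toy data are assumptions.

```r
# Hypothetical helper: total a numeric column by event type and keep the top n
top_events <- function(values, events, n = 3) {
  sums <- tapply(values, events, sum)     # sums by event type
  sums <- sort(sums, decreasing = TRUE)   # largest totals first
  top  <- head(sums, n)
  data.frame(Event = names(top), Total = as.numeric(top), row.names = NULL)
}

# Toy data in the same layout as the health data frame
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "HEAT", "FLOOD"),
  FATALITIES = c(5, 2, 4, 0, 2, 1)
)
top3 <- top_events(toy$FATALITIES, toy$EVTYPE)
```

With the real data, a call such as top_events(dmg$TDMGNUM, dmg$EVTYPE) should reproduce the first three rows of the corresponding table.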

4. Interpretation and Conclusion

Overall, TORNADO appears to be the weather event with the most severe impact on human health, in terms of both fatalities and injuries. It also ranks third in terms of economic damage. In terms of total economic damage (property and crop damage combined), FLOOD has the greatest impact.
