Severe Weather Events as Causes of Fatalities, Injuries and Property Damage

Synopsis

In this report I analyse which kinds of severe weather events have the greatest impact on human fatalities, injuries and property damage. The analysis uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, recorded in the period 4/18/1950 to 11/28/2011.

I will proceed as follows:

  1. Set-up (load resources, load data)
  2. Data Processing: Check data for NAs and transform damage data into integer numbers
  3. Results: Find and graph the weather events with the most severe impact. For each of the following categories, the three most impactful event types are determined:
    • fatalities
    • injuries
    • property damage
    • crop damage
    • total damage (property plus crop)
  4. Interpretation and conclusion

1. Set-up

I begin by showing the system resources I use and loading the necessary packages.

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.1.1  magrittr_2.0.1  fastmap_1.1.0   tools_4.1.1    
##  [5] htmltools_0.5.2 yaml_2.2.1      jquerylib_0.1.4 stringi_1.7.4  
##  [9] rmarkdown_2.11  knitr_1.36      stringr_1.4.0   xfun_0.26      
## [13] digest_0.6.28   rlang_0.4.11    evaluate_0.14
devtools::install_github("jhrcook/mustashe")
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install Rtools 4.0 from https://cran.r-project.org/bin/windows/Rtools/.
## Skipping install of 'mustashe' from a github remote, the SHA1 (5547b7fd) has not changed since last install.
##   Use `force = TRUE` to force installation
library(mustashe)  ## used to stash (cache) the downloaded data
library(ggplot2)  ## used for graphing


2. Data Processing

2.1 Loading Data

Data are loaded, stashed (cached), and split into 2 data frames:
1. health (contains all data relevant to the analysis of the health-related impacts, in particular FATALITIES and INJURIES)
2. dmg (contains all data related to property and crop damage)

# start from the original bz2 file
stash("Stormdata",{Stormdata<-read.csv("repdata_data_StormData.csv.bz2")})
## Loading stashed object.
health<-Stormdata[,c("EVTYPE","FATALITIES","INJURIES")]
dmg<-Stormdata[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
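
The caching idea can be illustrated without any extra package. The following is only a rough base-R sketch, not how `mustashe` works internally: the helper name `load_cached` and its `cache_path` argument are hypothetical, and the cache is considered valid as soon as the file exists.

```r
# Hypothetical helper: read a CSV once, then reuse a cached .rds copy.
load_cached <- function(csv_path, cache_path) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))   # cache hit: skip the slow CSV parse
  }
  data <- read.csv(csv_path)      # first call: parse the CSV
  saveRDS(data, cache_path)       # store the parsed data for later calls
  data
}

# Toy demonstration with temporary files instead of the storm data.
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1)),
          csv_file, row.names = FALSE)

first  <- load_cached(csv_file, rds_file)  # parses the CSV and writes the cache
second <- load_cached(csv_file, rds_file)  # served from the cache
identical(first, second)
```

Under these assumptions the second call never touches the CSV, which is the behaviour that makes repeated knitting of the report faster.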


head(health)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6
summary(health)
##     EVTYPE            FATALITIES          INJURIES        
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Mode  :character   Median :  0.0000   Median :   0.0000  
##                     Mean   :  0.0168   Mean   :   0.1557  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##                     Max.   :583.0000   Max.   :1700.0000
head(dmg)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0
summary(dmg)
##     EVTYPE             PROPDMG         PROPDMGEXP           CROPDMG       
##  Length:902297      Min.   :   0.00   Length:902297      Min.   :  0.000  
##  Class :character   1st Qu.:   0.00   Class :character   1st Qu.:  0.000  
##  Mode  :character   Median :   0.00   Mode  :character   Median :  0.000  
##                     Mean   :  12.06                      Mean   :  1.527  
##                     3rd Qu.:   0.50                      3rd Qu.:  0.000  
##                     Max.   :5000.00                      Max.   :990.000  
##   CROPDMGEXP       
##  Length:902297     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

2.2 Checking for NAs and Transforming Data

The main task here is to deal with the specific data format that is used for the damage figures. The damage data consist of a numerical value and a character indicating the exponent, i.e. a scaling factor. Some observations contain invalid characters. Hence, the damage figure has to be computed from the value multiplied by the scaling factor, and the invalid characters need special handling.
I accept only “” (blank), “K”, “M” and “B” as exponents, along with the lower-case “k”, “m” and “b”. Observations with any other symbol (i.e. one not permissible according to the description) are not scaled: the code below leaves their damage figure at the raw numeric value rather than dropping them. The dataset does not contain any NAs (see the checks below), so there is no need to remove NAs.

sum(is.na(health$FATALITIES))  ## compute the number of NAs in FATALITIES
## [1] 0
sum(is.na(health$INJURIES))    ## compute the number of NAs in INJURIES
## [1] 0
sum(is.na(dmg$PROPDMG))        ## compute the number of NAs in PROPDMG
## [1] 0
sum(is.na(dmg$CROPDMG))        ## compute the number of NAs in CROPDMG
## [1] 0
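
The four checks above can also be done in a single call. A minimal sketch on made-up data (the data frame `toy` is hypothetical and deliberately contains NAs so that the counts are not all zero):

```r
# Made-up data with a few missing values
toy <- data.frame(
  FATALITIES = c(0, 1, NA),
  INJURIES   = c(2, 0, 0),
  PROPDMG    = c(25, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts
```

Applied to the real data, the same pattern would be `colSums(is.na(health))` and `colSums(is.na(dmg))`.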


dmg$PROPDMGNUM<-dmg$PROPDMG
PROPisK<-dmg$PROPDMGEXP=="K"|dmg$PROPDMGEXP=="k"
PROPisM<-dmg$PROPDMGEXP=="M"|dmg$PROPDMGEXP=="m"
PROPisB<-dmg$PROPDMGEXP=="B"|dmg$PROPDMGEXP=="b"
dmg$PROPDMGNUM[PROPisK]<-1000*dmg$PROPDMG[PROPisK]
dmg$PROPDMGNUM[PROPisM]<-1000000*dmg$PROPDMG[PROPisM]
dmg$PROPDMGNUM[PROPisB]<-1000000000*dmg$PROPDMG[PROPisB]

dmg$CROPDMGNUM<-dmg$CROPDMG
CROPisK<-dmg$CROPDMGEXP=="K"|dmg$CROPDMGEXP=="k"
CROPisM<-dmg$CROPDMGEXP=="M"|dmg$CROPDMGEXP=="m"
CROPisB<-dmg$CROPDMGEXP=="B"|dmg$CROPDMGEXP=="b"
dmg$CROPDMGNUM[CROPisK]<-1000*dmg$CROPDMG[CROPisK]
dmg$CROPDMGNUM[CROPisM]<-1000000*dmg$CROPDMG[CROPisM]
dmg$CROPDMGNUM[CROPisB]<-1000000000*dmg$CROPDMG[CROPisB]

head(dmg)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM CROPDMGNUM
## 1 TORNADO    25.0          K       0                 25000          0
## 2 TORNADO     2.5          K       0                  2500          0
## 3 TORNADO    25.0          K       0                 25000          0
## 4 TORNADO     2.5          K       0                  2500          0
## 5 TORNADO     2.5          K       0                  2500          0
## 6 TORNADO     2.5          K       0                  2500          0
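
The scaling step above can also be written as a table lookup. In the code above, a row whose exponent symbol is none of the accepted ones keeps its raw value. The sketch below is a variant, not the method used for the results in this report: the function name `scale_damage` and the toy data are hypothetical, and rows with an unrecognised symbol come out as NA so that they can be dropped explicitly.

```r
# Hypothetical helper: multiply each value by the factor its symbol stands for.
scale_damage <- function(value, exp) {
  symbols <- c("", "K", "M", "B")
  factors <- c(1, 1e3, 1e6, 1e9)
  idx <- match(toupper(exp), symbols)  # NA for any symbol not in the list
  value * factors[idx]                 # NA marks rows with an invalid symbol
}

# Made-up rows: valid upper/lower-case symbols, a blank, and an invalid "+"
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "m", "B", "", "+"),
  stringsAsFactors = FALSE
)
toy$PROPDMGNUM <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)

# Keep only the rows whose symbol was recognised
valid <- toy[!is.na(toy$PROPDMGNUM), ]
valid
```

Dropping rows this way would change the per-event means slightly, so the figures reported below would not be reproduced exactly by this variant.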


3. Results

Finding and graphing the weather events with the most severe impact

I start with fatalities and injuries. I calculate the mean of each variable by event type, order the means (descending) and plot the top 3 values.

# fatalities
FatMeans<-tapply(health$FATALITIES,health$EVTYPE,mean) #calculate means by event type
OFatMeans<-FatMeans[order(FatMeans,decreasing=TRUE)]   #order
FAT<-data.frame(names(OFatMeans),OFatMeans,row.names=1:length(OFatMeans)) #assemble data in one data frame
colnames(FAT)<-c("Event","Fatalities")

# injuries                                                    
InjMeans<-tapply(health$INJURIES,health$EVTYPE,mean) #calculate means by event type
OInjMeans<-InjMeans[order(InjMeans,decreasing=TRUE)] #order
INJ<-data.frame(names(OInjMeans),OInjMeans,row.names=1:length(OInjMeans)) #assemble data in one data frame
colnames(INJ)<-c("Event","Injuries")

# graphing
ggplot(data=FAT[1:3,],aes(x=reorder(Event,-Fatalities),y=Fatalities))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Average Number of Fatalities") +
  xlab("Event Type") + ylab("Average Number of Fatalities")

ggplot(data=INJ[1:3,],aes(x=reorder(Event,-Injuries),y=Injuries))+geom_bar(stat="identity") + ggtitle("Population Health:\nTop Three Weather Events by Average Number of Injuries") + xlab("Event Type") + ylab("Average Number of Injured Persons")
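
The mean/order/data-frame pattern used above repeats for every category, so it could be wrapped in one helper. The sketch below is illustrative only: the function name `top_by_mean` and the toy data are hypothetical. It additionally reports how many records each mean is based on, since an average over very few events can rank high.

```r
# Hypothetical helper: top-n groups by mean, with the number of records per group.
top_by_mean <- function(values, groups, n = 3) {
  means  <- tapply(values, groups, mean)    # mean per group
  counts <- tapply(values, groups, length)  # records per group, same group order
  out <- data.frame(Event  = names(means),
                    Mean   = as.vector(means),
                    Events = as.vector(counts))
  out <- out[order(out$Mean, decreasing = TRUE), ]  # highest mean first
  rownames(out) <- NULL
  head(out, n)
}

# Made-up data: group "B" has a single, large record
toy_values <- c(0, 2, 10, 1, 1, 4)
toy_groups <- c("A", "A", "B", "C", "C", "C")
top2 <- top_by_mean(toy_values, toy_groups, n = 2)
top2
```

With the real data, a call such as `top_by_mean(health$FATALITIES, health$EVTYPE)` would give the same ranking as `FAT[1:3,]`, plus the record counts.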

In the same way, I proceed with property damage, crop damage, and total damage. I calculate the mean of each variable by event type, order the means (descending) and plot the top 3 values for total damage.

#property damage
PropMeans<-tapply(dmg$PROPDMGNUM,dmg$EVTYPE,mean)
OPropMeans<-PropMeans[order(PropMeans,decreasing=TRUE)]
PDMG<-data.frame(names(OPropMeans),OPropMeans,row.names=1:length(OPropMeans))
colnames(PDMG)<-c("Event","PDamage")
head(PDMG)
##                        Event    PDamage
## 1 TORNADOES, TSTM WIND, HAIL 1600000000
## 2  HEAVY RAIN/SEVERE WEATHER 1250000000
## 3          HURRICANE/TYPHOON  787566364
## 4             HURRICANE OPAL  352538444
## 5                STORM SURGE  165990559
## 6                 WILD FIRES  156025000
#crop damage
CropMeans<-tapply(dmg$CROPDMGNUM,dmg$EVTYPE,mean)
OCropMeans<-CropMeans[order(CropMeans,decreasing=TRUE)]
CDMG<-data.frame(names(OCropMeans),OCropMeans,row.names=1:length(OCropMeans))
colnames(CDMG)<-c("Event","CDamage")
head(CDMG)
##                     Event   CDamage
## 1       EXCESSIVE WETNESS 142000000
## 2 COLD AND WET CONDITIONS  66000000
## 3         DAMAGING FREEZE  43683333
## 4             Early Frost  42000000
## 5       HURRICANE/TYPHOON  29634918
## 6             RIVER FLOOD  29072017
#total damage
dmg$TDMGNUM<-dmg$PROPDMGNUM+dmg$CROPDMGNUM #calculate total damage
TMeans<-tapply(dmg$TDMGNUM,dmg$EVTYPE,mean)
OTMeans<-TMeans[order(TMeans,decreasing=TRUE)]
TDMG<-data.frame(names(OTMeans),OTMeans,row.names=1:length(OTMeans))
colnames(TDMG)<-c("Event","TDamage")
head(TDMG)
##                        Event    TDamage
## 1 TORNADOES, TSTM WIND, HAIL 1602500000
## 2  HEAVY RAIN/SEVERE WEATHER 1250000000
## 3          HURRICANE/TYPHOON  817201282
## 4             HURRICANE OPAL  354649556
## 5                STORM SURGE  165990579
## 6                 WILD FIRES  156025000
#graph

# here are 2 graphs that are omitted in the report
#ggplot(data=PDMG[1:3,],aes(x=reorder(Event,-PDamage),y=PDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg. Amount of Prop. Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Property Damage in US$")
#ggplot(data=CDMG[1:3,],aes(x=reorder(Event,-CDamage),y=CDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg Amount of Crop Damage Inflicted") + xlab("Event Type") + ylab("Average Amount of Crop Damage in US$")

# total damage graph in the report
ggplot(data=TDMG[1:3,],aes(x=reorder(Event,-TDamage),y=TDamage))+geom_bar(stat="identity") + ggtitle("Economic Consequences:\nTop Three Weather Events by Avg. Amount of Total Damage Inflicted \n (Overall Greatest Economic Damage)") + xlab("Event Type") + ylab("Average Amount of Total Damage in US$")
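
Because total damage is defined per record as property plus crop damage, the mean total damage of an event type equals the sum of its mean property and mean crop damage. A short check using the HURRICANE/TYPHOON figures printed above, followed by the same identity on made-up numbers (the vectors `p` and `cr` are hypothetical):

```r
# Figures copied from the printed tables above (HURRICANE/TYPHOON rows)
prop_mean  <- 787566364   # head(PDMG)
crop_mean  <- 29634918    # head(CDMG)
total_mean <- 817201282   # head(TDMG)
matches <- (prop_mean + crop_mean) == total_mean
matches

# The same identity on made-up per-record damage figures
p  <- c(25000, 2500, 0)   # property damage per record
cr <- c(0, 500, 1000)     # crop damage per record
same <- all.equal(mean(p + cr), mean(p) + mean(cr))
same
```

Since the printed means are rounded to whole dollars, other rows may differ from the sum by a dollar or so.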

4. Interpretation and Conclusion

Overall, TORNADOES, TSTM WIND, HAIL appears to be the weather event type with the most severe average impact per recorded event on human lives, property damage and overall damage. In terms of injuries, Heat Wave has the worst record, whereas in the category of crop damage, EXCESSIVE WETNESS does the most harm.

THE END