Synopsis

Severe weather events, such as storms, can significantly impact public health and the economy, often resulting in fatalities, injuries, and extensive property damage. Mitigating these outcomes is a primary concern for many communities and municipalities. This analysis examines the effects of severe weather events on public health and economic damage in the United States using NOAA’s storm database. The results show that floods and hurricanes/typhoons cause the highest property and crop damage costs, leading to the most substantial economic consequences. Additionally, tornadoes have the greatest impact on population health, causing the highest number of fatalities and injuries.

Introduction

Severe weather events, such as storms, can have significant public health and economic impacts. They often lead to fatalities, injuries, and extensive property damage, and mitigating these outcomes is a primary concern for many communities and municipalities.

This project aims to analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database provides detailed information on major storms and weather events across the United States, including their occurrence, locations, and estimates of associated fatalities, injuries, and property damage.

Data Processing

Software environment information

First, we record information about the software environment in which the analysis is run, using the code below.

sessionInfo()

## R version 4.2.3 (2023-03-15 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.33     R6_2.5.1          jsonlite_1.8.7    evaluate_0.22    
##  [5] cachem_1.0.8      rlang_1.1.3       cli_3.5.0         rstudioapi_0.15.0
##  [9] jquerylib_0.1.4   bslib_0.5.1       rmarkdown_2.25    tools_4.2.3      
## [13] xfun_0.40         yaml_2.3.7        fastmap_1.1.1     compiler_4.2.3   
## [17] htmltools_0.5.6.1 knitr_1.44        sass_0.4.7
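For a lasting record, the same information can also be saved next to the analysis. Below is a minimal sketch using base R's `capture.output()` and `writeLines()`. The file name `session_info.txt` is a hypothetical choice and not part of the original project.

```r
# Capture the printed sessionInfo() report as a character vector
info <- capture.output(sessionInfo())

# Save it in the working directory so the environment can be checked later
# ("session_info.txt" is an illustrative file name)
writeLines(info, "session_info.txt")
```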

Loading packages

We can now load the required packages for the analysis.

library(R.utils) # provides bunzip2() for decompressing the .bz2 file

## Loading required package: R.oo

## Loading required package: R.methodsS3

## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.

## R.oo v1.26.0 (2024-01-24 05:12:50 UTC) successfully loaded. See ?R.oo for help.

## 
## Attaching package: 'R.oo'

## The following object is masked from 'package:R.methodsS3':
## 
##     throw

## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods

## The following objects are masked from 'package:base':
## 
##     attach, detach, load, save

## R.utils v2.12.3 (2023-11-18 01:00:02 UTC) successfully loaded. See ?R.utils for help.

## 
## Attaching package: 'R.utils'

## The following object is masked from 'package:utils':
## 
##     timestamp

## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, isOpen, nullfile, parse, warnings

library(data.table)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:data.table':
## 
##     between, first, last

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(reshape2)

## 
## Attaching package: 'reshape2'

## The following objects are masked from 'package:data.table':
## 
##     dcast, melt
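The startup and masking messages above are informative but verbose. If they are not needed in the rendered report, one option is to attach the same packages quietly with base R's `suppressPackageStartupMessages()`. This only changes what is printed, not the analysis. Masking still takes effect, and because reshape2 is attached last, a bare `melt()` resolves to reshape2's version rather than data.table's.

```r
# Attach the analysis packages without printing startup or masking messages
suppressPackageStartupMessages({
  library(R.utils)     # bunzip2() for .bz2 files
  library(data.table)
  library(dplyr)
  library(ggplot2)
  library(reshape2)    # attached last, so its melt() and dcast() come first on the search path
})
```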

Load the dataset

Next, we download the compressed bz2 file, decompress it, and read the CSV into R. We then inspect the first few rows with the head() function.

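url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download and decompress only if the CSV is not already present
if (!file.exists("stormdata.csv")) {
  download.file(url, "stormdata.csv.bz2", mode = "wb")
  bunzip2("stormdata.csv.bz2", "stormdata.csv", remove = FALSE)  # from R.utils; unzip() only handles .zip archives
}

storm <- read.csv("stormdata.csv", fill=TRUE, header=TRUE)
head(storm)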

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
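Another approach skips decompression entirely, because base R can read a bzip2-compressed CSV directly through a `bzfile()` connection. The sketch below writes a tiny stand-in file and reads it back. The two-row data frame is invented for illustration and is not NOAA data. With the real download, the equivalent call would be `read.csv(bzfile("stormdata.csv.bz2"))`.

```r
# Create a small bzip2-compressed CSV as a stand-in for the storm data archive
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() opens the connection, decompresses on the fly, and closes it again
sample_storm <- read.csv(bzfile(tmp))
sample_storm
```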

We then look at the column names of the data using the code below:

names(storm)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The output lists 37 variables. We keep only the seven needed for our analysis, convert their names to lower case, and then inspect the structure of the result with str().

storm2 <- storm %>% 
  select(c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")) %>%
  rename_all(tolower)

str(storm2)

## 'data.frame':    902297 obs. of  7 variables:
##  $ evtype    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: chr  "K" "K" "K" "K" ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: chr  "" "" "" "" ...

The output shows that the subset has 902,297 rows and 7 columns. The variables are described as follows:

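- evtype: storm event type
- fatalities: number of fatalities per event
- injuries: number of injuries per event
- propdmg: property damage estimate, to be scaled by propdmgexp
- propdmgexp: magnitude code for propdmg (e.g. "K" = thousands, "M" = millions, "B" = billions of dollars)
- cropdmg: crop damage estimate, to be scaled by cropdmgexp
- cropdmgexp: magnitude code for cropdmg, using the same codes as propdmgexp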
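To make the magnitude codes concrete: K, M, and B denote thousands, millions, and billions of dollars. The first row of the data (propdmg = 25, propdmgexp = "K") therefore records $25,000 of property damage. The small sketch below shows that arithmetic. The lookup vector `multiplier` is our own illustration, not part of the original code.

```r
# Dollar multipliers for the alphabetical magnitude codes
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# First row of the dataset: propdmg = 25 with propdmgexp = "K"
damage_usd <- 25 * multiplier[["K"]]
damage_usd            # 25000 dollars
damage_usd / 1e6      # 0.025 million dollars, the unit used later in this analysis
```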

Processing data for population health analysis

length(unique(storm$EVTYPE))

## [1] 985

The dataset contains 985 distinct EVTYPE values. To rank event types by harm to population health, we select the relevant columns, group the data by event type, and sum fatalities and injuries for each type. Next, we sort the results in descending order of fatalities (breaking ties by injuries) and keep the top 10 rows. Finally, we reshape the data into long format with melt(). This turns fatalities and injuries into two levels of a single factor, type, which is the layout a grouped bar plot needs.

pop_health <- storm2 %>%
    select(evtype, fatalities, injuries) %>%
    group_by(evtype) %>%
    summarize(fatalities = sum(fatalities), injuries = sum(injuries)) %>%
    arrange(desc(fatalities), desc(injuries)) %>%
    slice(1:10) %>%
    melt(id.vars = "evtype", variable.name = "type", value.name = "value")
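To show the shape this pipeline produces, the sketch below applies the same chain to a tiny invented data frame. The numbers are not NOAA figures, and `slice(1:2)` stands in for `slice(1:10)`. The result has one row per event type and measure, with the measure name in `type` and the count in `value`.

```r
library(dplyr)
library(reshape2)

# Invented toy data with the same columns as the health subset of storm2
toy <- data.frame(
  evtype     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  fatalities = c(2, 1, 1, 0),
  injuries   = c(10, 5, 3, 1)
)

toy_health <- toy %>%
  group_by(evtype) %>%
  summarize(fatalities = sum(fatalities), injuries = sum(injuries)) %>%
  arrange(desc(fatalities), desc(injuries)) %>%
  slice(1:2) %>%                      # keep the top 2 event types
  melt(id.vars = "evtype", variable.name = "type", value.name = "value")

toy_health
```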

Processing data for economic consequences analysis

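To measure economic consequences, we combine the damage estimates (propdmg and cropdmg) with their magnitude codes (propdmgexp and cropdmgexp). These codes indicate whether each estimate is in hundreds, thousands, millions, or billions of dollars. First, we check which codes actually occur: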

unique(storm2$propdmgexp)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

unique(storm2$cropdmgexp)

## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
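Several of these codes differ only in case ("m" vs "M", "h" vs "H", "k" vs "K"). One way to see how the codes collapse once case is ignored is to tabulate them after `toupper()`. The short vector below is an invented example. On the real data, the call would be `table(toupper(storm2$propdmgexp))`.

```r
# Invented sample of magnitude codes, mixing upper and lower case
codes <- c("K", "k", "M", "m", "B", "", "?", "5")

# Tabulate after upper-casing so that "m" and "M" are counted together
code_counts <- table(toupper(codes))
code_counts
```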

The codes are messy: they mix upper and lower case and include numeric and symbolic values. We therefore wrote a function that standardizes each code into a multiplier, so that all damage costs are expressed in millions of dollars.

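# Convert a magnitude code into a multiplier that expresses damage in millions of USD
cost <- function(x) {
  x <- toupper(x)   # treat "h", "k", "m" like "H", "K", "M"
  if (x == "H")
    1E-4            # hundreds
  else if (x == "K")
    1E-3            # thousands
  else if (x == "M")
    1               # millions
  else if (x == "B")
    1E3             # billions
  else
    1E-6            # blank, numeric or symbolic codes: treated as plain dollars
}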
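`cost()` handles one code at a time, so it has to be called through `sapply()` for each of the 902,297 rows. A vectorized alternative uses a named lookup vector. This is our own suggestion, not part of the original analysis. The sketch assumes the multipliers listed above (damage in millions of dollars). It upper-cases codes first and treats blank or unrecognized codes as plain dollars, which gives a multiplier of 1E-6.

```r
# Vectorized mapping from magnitude codes to multipliers (millions of USD)
cost_vec <- function(x) {
  lookup <- c(H = 1E-4, K = 1E-3, M = 1, B = 1E3)
  out <- unname(lookup[toupper(x)])  # NA where the code is not H, K, M or B
  out[is.na(out)] <- 1E-6            # blank/unrecognized codes: assume plain dollars
  out
}

cost_vec(c("K", "m", "B", "", "?"))
```

In the `economic` pipeline, `sapply(propdmgexp, FUN = cost)` could then be replaced by `cost_vec(propdmgexp)`, and likewise for the crop damage codes.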

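Apart from applying this cost function, the procedure mirrors the population health analysis. We compute property and crop damage per event, sum them by event type, keep the 10 event types with the highest property damage, and reshape the result into long format.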

economic <- storm2 %>%
    select(evtype, propdmg, propdmgexp, cropdmg, cropdmgexp) %>% 
    mutate(prop_dmg = propdmg * sapply(propdmgexp, FUN = cost),
           crop_dmg = cropdmg * sapply(cropdmgexp, FUN = cost)) %>%
    group_by(evtype) %>% 
    summarize(property = sum(prop_dmg), crop = sum(crop_dmg)) %>%
    arrange(desc(property), desc(crop)) %>%
    slice(1:10) %>%
    melt(id.vars = "evtype", variable.name = "type", value.name = "value")

Results

This section answers the two questions the analysis set out to address, using the data processed above.

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

ggplot(data=pop_health, aes(reorder(evtype, -value), value, fill=type)) +
  geom_bar(position = "dodge", stat="identity") + 
  labs(x="Event Type", y="Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust=0.7)) + 
  ggtitle("Total Number of Fatalities and Injuries of top 10 storm event types") +
  scale_fill_manual(values=c("blue", "yellow"))

Interpretation: The bar plot shows that tornadoes have the greatest impact on population health, as they cause the most fatalities and injuries.
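Both plots order the x axis with `reorder(evtype, -value)`. The long data contain two rows per event type (one per `type`), so `reorder()` ranks each event type by the mean of `-value` over those rows. As a result, the event type with the largest average count appears first. The small base-R check below uses invented values.

```r
# Two rows per event type, as in the melted pop_health data (values are invented)
evtype <- c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL", "HAIL")
value  <- c(50, 900, 10, 70, 0, 20)

# reorder() sorts factor levels by mean(-value) within each event type
ordered_levels <- levels(reorder(evtype, -value))
ordered_levels
```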

2. Across the United States, which types of events have the greatest economic consequences?

ggplot(data=economic, aes(reorder(evtype, -value), value, fill=type)) +
  geom_bar(position = "dodge", stat="identity") + 
  labs(x="Event Type", y="Damage cost (millions USD)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 25, vjust=0.5)) + 
  ggtitle("Total Cost of Property and Crop Damage by top 10 storm event types") +
  scale_fill_manual(values=c("darkgreen", "red"))

Interpretation: The bar plot shows that floods and hurricanes/typhoons have the highest property and crop damage costs and therefore the greatest economic consequences.

Conclusion

Based on this analysis, resources for public health and safety should be prioritized towards tornadoes, for example through better infrastructure and early warning systems. To reduce economic losses, funding should also go towards more robust systems and infrastructure that withstand floods and hurricanes/typhoons. These improvements would help protect property and crops and minimize potential damage.

Reproducible Research: Storm Data Analysis (Course Project 2)

AbdulAziz Ascandari

2024-06-02