Synopsis: Using time series data from the National Oceanic and Atmospheric Administration, this study examines the occurrences of severe weather in the United States (NOAA). In addition to keeping track of numerous events throughout every US state, the NOAA storm database also offers data on property damage and casualties. The goal of our investigation is to ascertain which event has the greatest economic ramifications and which is most detrimental to the health of the US populace. According to our analysis of fatalities and injuries, tornadoes are the most dangerous natural disaster for US health (population). However, we determine that Flood has the biggest economic impact on the US based on Property and Cost damage.

Introduction: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Questions: 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  1. Across the United States, which types of events have the greatest economic consequences?

Data Processing

#Load Libraries

library(data.table)
library(ggplot2)
library(R.utils) # load bz2 file
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.26.0 (2024-01-24 05:12:50 UTC) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
## 
##     throw
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, load, save
## R.utils v2.12.3 (2023-11-18 01:00:02 UTC) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, isOpen, nullfile, parse, warnings
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:R.utils':
## 
##     extract

#Download and read bz2 file

if (!file.exists("stormdata.csv.bz2")) {
    url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(url, "stormdata.csv.bz2")
    bunzip2("stormdata.csv.bz2", "stormdata.csv", remove=FALSE)
}

storm <- data.table::fread("stormdata.csv", fill=TRUE, header=TRUE)
head(storm)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME  STATE
##     <char>             <char>   <char>    <char>  <num>     <char> <char>
## 1:    1.00  4/18/1950 0:00:00     0130       CST     97     MOBILE     AL
## 2:    1.00  4/18/1950 0:00:00     0145       CST      3    BALDWIN     AL
## 3:    1.00  2/20/1951 0:00:00     1600       CST     57    FAYETTE     AL
## 4:    1.00   6/8/1951 0:00:00     0900       CST     89    MADISON     AL
## 5:    1.00 11/15/1951 0:00:00     1500       CST     43    CULLMAN     AL
## 6:    1.00 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE     AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
##     <char>     <num>  <char>     <char>   <char>   <char>      <num>     <lgcl>
## 1: TORNADO         0                                               0         NA
## 2: TORNADO         0                                               0         NA
## 3: TORNADO         0                                               0         NA
## 4: TORNADO         0                                               0         NA
## 5: TORNADO         0                                               0         NA
## 6: TORNADO         0                                               0         NA
##    END_RANGE END_AZI END_LOCATI LENGTH WIDTH     F   MAG FATALITIES INJURIES
##        <num>  <char>     <char>  <num> <num> <int> <num>      <num>    <num>
## 1:         0                      14.0   100     3     0          0       15
## 2:         0                       2.0   150     2     0          0        0
## 3:         0                       0.1   123     2     0          0        2
## 4:         0                       0.0   100     2     0          0        2
## 5:         0                       0.0   150     2     0          0        2
## 6:         0                       1.5   177     2     0          0        6
##    PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP    WFO STATEOFFIC ZONENAMES LATITUDE
##      <num>     <char>   <num>     <char> <char>     <char>    <char>    <num>
## 1:    25.0          K       0                                            3040
## 2:     2.5          K       0                                            3042
## 3:    25.0          K       0                                            3340
## 4:     2.5          K       0                                            3458
## 5:     2.5          K       0                                            3412
## 6:     2.5          K       0                                            3450
##    LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
##        <num>      <num>      <num>  <char>  <num>
## 1:      8812       3051       8806              1
## 2:      8755          0          0              2
## 3:      8742          0          0              3
## 4:      8626          0          0              4
## 5:      8642          0          0              5
## 6:      8748          0          0              6

#column names of data

names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
storm2 <- storm %>% 
  select(c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")) %>%
  rename_all(tolower)
str(storm2)
## Classes 'data.table' and 'data.frame':   902297 obs. of  7 variables:
##  $ evtype    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: chr  "K" "K" "K" "K" ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr>

The above data table has 902,297 observations and 7 coulmns. Description of each variable: 1. evtype = storm event type 2. fatalities = amount of fatalities per event 3. injuries = amount of injuries per event 4. propdmg = property damage amount 5. propdmgexp = property damage in exponents 6. cropdmg = crop damage amount 7. cropdmgexp= crop damage in exponents

Data processing #Processing data for population health analysis

length(unique(storm$EVTYPE))
## [1] 985

First group the data by type of event, choose the columns I need for the bar plot, and then total the number of injuries and deaths. After that, arrange it in descending order, cut the first ten rows, compile the data, and convert it into categorical variables so that a grouped bar plot can be made.

pop_health <-
    storm2 %>% select(evtype, fatalities, injuries) %>% 
    group_by(evtype) %>% 
    summarize(fatalities = sum(fatalities), injuries = sum(injuries), .groups='drop') %>%
    arrange(desc(fatalities), desc(injuries)) %>%
    slice(1:10) %>% 
    gather(key = type, value = value, fatalities, injuries)

Data processing for an analysis of the economic consequences Since the variable PROPDMGEXP relates to property damage costs, it can be used to identify the events that have the biggest financial consequences.

unique(storm2$propdmgexp)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm2$cropdmgexp)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

I wrote a function to deal with the messy values for the exponents for crop damage and property damage costs. It calculates the cost using each exponent value but in millions of dollars.

#function to caculate cost

cost <- function(x) {
  if (x == "H")
    1E-4
  else if (x == "K")
    1E-3
  else if (x == "M")
    1
  else if (x == "B")
    1E3
  else
    1-6
}
economic <-
    storm2 %>% select("evtype", "propdmg", "propdmgexp", "cropdmg", "cropdmgexp") %>% 
    mutate(prop_dmg = propdmg*sapply(propdmgexp, FUN = cost), crop_dmg = cropdmg*sapply(cropdmgexp, FUN = cost), .keep="unused") %>%
    group_by(evtype) %>% 
    summarize(property = sum(prop_dmg), crop = sum(crop_dmg), .groups='drop') %>%
    arrange(desc(property), desc(crop)) %>%
    slice(1:10) %>% 
    gather(key = type, value = value, property, crop)

Results: Question 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

ggplot(data=pop_health, aes(reorder(evtype, -value), value, fill=type)) +
  geom_bar(position = "dodge", stat="identity") + 
  labs(x="Event Type", y="Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 20, vjust=0.7)) + 
  ggtitle("Total Number of Fatalities and Injuries of top 10 storm event types") +
  scale_fill_manual(values=c("blue", "pink"))

Since tornadoes cause the most deaths and injuries, it is clear from the bar plot that they have the greatest effect on population health.

Question 2: Across the United States, which types of events have the greatest economic consequences?

ggplot(data=economic, aes(reorder(evtype, -value), value, fill=type)) +
  geom_bar(position = "dodge", stat="identity") + 
  labs(x="Event Type", y="Count (millions)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 25, vjust=0.5)) + 
  ggtitle("Total Cost of Property and Crop Damage by top 10 storm event types") +
  scale_fill_manual(values=c("darkgreen", "orange"))

According to the bar plot, floods and hurricanes/typhoons cause the most money in losses to property and crops, which has the greatest impact on the economy.

Conclusion: The analysis suggests that better infrastructure or early warning systems should be built to mitigate the effects of tornadoes on population health and safety. Regarding hurricane and typhoon preparedness, increased funding should be allocated to innovative projects that create stronger infrastructure and systems to protect these properties and crops from damage.