SYNOPSIS

RESULTS

Our analysis show that Tornados have the greatest health impact, whereas Hurricane/Typhoon have the greatest economical impact.

INTRODUCTION

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

QUESTIONS

Your data analysis must address the following questions:

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

SETTING GLOBAL OPTIONS

knitr::opts_chunk$set(echo = TRUE,
                     warning = FALSE,
                     fig.width = 10,
                     fig.height = 5,
                     fig.keep = "all",
                     dev = "png")

library(ggplot2)
library(knitr)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ lubridate 1.9.3     ✔ stringr   1.5.1
## ✔ purrr     1.0.2     ✔ tibble    3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:reshape2':
## 
##     dcast, melt
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] data.table_1.15.0 reshape2_1.4.4    lubridate_1.9.3   forcats_1.0.0    
##  [5] stringr_1.5.1     purrr_1.0.2       readr_2.1.5       tibble_3.2.1     
##  [9] tidyverse_2.0.0   dplyr_1.1.4       tidyr_1.3.1       knitr_1.45       
## [13] ggplot2_3.4.4    
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4      jsonlite_1.8.8    compiler_4.3.2    Rcpp_1.0.12      
##  [5] tidyselect_1.2.0  jquerylib_0.1.4   scales_1.3.0      yaml_2.3.8       
##  [9] fastmap_1.1.1     plyr_1.8.9        R6_2.5.1          generics_0.1.3   
## [13] munsell_0.5.0     bslib_0.6.1       pillar_1.9.0      tzdb_0.4.0       
## [17] rlang_1.1.3       utf8_1.2.4        stringi_1.8.3     cachem_1.0.8     
## [21] xfun_0.41         sass_0.4.8        timechange_0.3.0  cli_3.6.2        
## [25] withr_3.0.0       magrittr_2.0.3    digest_0.6.34     grid_4.3.2       
## [29] rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.4   vctrs_0.6.5      
## [33] evaluate_0.23     glue_1.7.0        fansi_1.0.6       colorspace_2.1-0 
## [37] rmarkdown_2.25    tools_4.3.2       pkgconfig_2.0.3   htmltools_0.5.7

PROCESSING

1. Loading and Inspecting the Data

StormData <- read.csv("repdata_data_StormData.csv")
head(StormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
colnames(StormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

2. Refining the data

As stated on the introduction, we are only looking at data relating to health and economical impact. The only columns used in analysis will be:

  • EVTYPE
  • FATALITITES
  • INJURIES
  • PROPDMGEXP
  • CROPDMG
  • CROPDMGEXP
Refined <- c("EVTYPE",
             "FATALITIES",
             "INJURIES",
             "PROPDMG",
             "PROPDMGEXP",
             "CROPDMG",
             "CROPDMGEXP")
Refined_Model <- StormData[, Refined]
Refined1 <- StormData[, Refined]
summary(Refined1)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000
# Create a subtable where only events that caused damaged are stored

Refined2 <- as_tibble(Refined1)
Refined2 <- Refined1 %>%
  filter(INJURIES > 0, 
         FATALITIES > 0, 
         PROPDMG > 0,
         CROPDMG > 0)
Refined2df <- as.data.frame(Refined2)
Refined2dt <- as.data.table(Refined2)

3. Formating the exponent columns

The code goes as follows:

  • NA = 1
  • K = 1,000
  • M = 1,000,000
  • B = 1,000,000,000
Refined2df$PROPDMGEXP[Refined2df$PROPDMGEXP == "K"] <- 1000
Refined2df$PROPDMGEXP[Refined2df$PROPDMGEXP == "M"] <- 1000000
Refined2df$PROPDMGEXP[Refined2df$PROPDMGEXP == "B"] <- 1000000000
Refined2df$PROPDMGEXP[is.na(Refined2df$PROPDMGEXP) ] <- 1

Refined2df$CROPDMGEXP[Refined2df$CROPDMGEXP == "K"] <- 1000
Refined2df$CROPDMGEXP[Refined2df$CROPDMGEXP == "M"] <- 1000000
Refined2df$CROPDMGEXP[Refined2df$CROPDMGEXP == "B"] <- 1000000000
Refined2df$CROPDMGEXP[is.na(Refined2df$CROPDMGEXP) ] <- 1

# Creating columns with Property and Crop Cost

Refined2df$CropCost <- (Refined2df$CROPDMG * as.numeric(Refined2df$CROPDMGEXP))

Refined2df$PropertyCost <- (Refined2df$PROPDMG * as.numeric(Refined2df$PROPDMGEXP))

ANALYSIS

1. Estimating the total Fatalities and Injuries

With this analysis we will estimate the health impact of the different events.

HealthImpact <- c("EVTYPE",
                  "FATALITIES",
                  "INJURIES")
HealthImpact <- StormData[, HealthImpact]

HI <- HealthImpact %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries = sum(INJURIES),
            Total = (sum(FATALITIES) + sum(INJURIES))) %>%
  arrange(desc(by_group = Total))

Top10_Health <- HI[1:10,]

head(Top10_Health)
## # A tibble: 6 × 4
##   EVTYPE         Fatalities Injuries Total
##   <chr>               <dbl>    <dbl> <dbl>
## 1 TORNADO              5633    91346 96979
## 2 EXCESSIVE HEAT       1903     6525  8428
## 3 TSTM WIND             504     6957  7461
## 4 FLOOD                 470     6789  7259
## 5 LIGHTNING             816     5230  6046
## 6 HEAT                  937     2100  3037

2. Estimating the total cost to Property and Crops

With this analysis we will look at the economical impact.

EI <- Refined2df %>%
  group_by(EVTYPE) %>%
  summarise(CropCost = sum(CropCost),
            PropCost = sum(PropertyCost),
            Total = (sum(CropCost) + sum(PropertyCost))) %>%
  arrange(desc(by_group = Total))

Top10_Eco <- EI[1:10,]

head(Top10_Eco)
## # A tibble: 6 × 4
##   EVTYPE              CropCost    PropCost       Total
##   <chr>                  <dbl>       <dbl>       <dbl>
## 1 HURRICANE/TYPHOON 1795000000 11300000000 13095000000
## 2 WILDFIRE            75150000  1165120000  1240270000
## 3 HIGH WIND          222935000   948690000  1171625000
## 4 TORNADO             93525000  1051902000  1145427000
## 5 TROPICAL STORM     121695000   628520000   750215000
## 6 EXCESSIVE HEAT     492400000      170000   492570000

RESULTS

1. Which events are more harmful with respect to population health?

To present this data we will generate a bar chart

HII <- Top10_Health %>%
  select(-Total)
HII_melt <- melt(HII,
                 id.vars = "EVTYPE",
                 variable.name = "Damage_Type")

ggplot(HII_melt, 
       aes(x = reorder(EVTYPE, -value), 
           y = value)) + 
  geom_bar(stat = "identity", 
           aes(fill = Damage_Type),
           position = "stack") + 
  scale_fill_manual(values = c("red",
                               "blue")) +
  labs(y = "Total",
       x = "Event Type",
       title = "Health Impact",
       subtitle = "Top 10 US Weather Events") + 
  scale_y_continuous(expand = c(0, 0)) +
  #scale_x_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle=45, hjust=1))

ggsave("question1.png")
## Saving 10 x 5 in image

As Tornado overpower the data, we are going to remove them from the graph in order to see the other 9 events into more detail.

HIII <- Top10_Health %>%
  select(-Total) %>%
  filter(EVTYPE != "TORNADO")

HIII_melt <- melt(HIII,
                 id.vars = "EVTYPE",
                 variable.name = "Damage_Type")

ggplot(HIII_melt, 
       aes(x = reorder(EVTYPE, -value), 
           y = value)) + 
  geom_bar(stat = "identity", 
           aes(fill = Damage_Type),
           position = "stack") + 
  scale_fill_manual(values = c("red",
                               "blue")) +
  labs(y = "Total",
       x = "Event Type",
       title = "Health Impact",
       subtitle = "Top 10 US Weather Events (without TORNADO") + 
  scale_y_continuous(expand = c(0, 0)) +
  #scale_x_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle=45, hjust=1))

ggsave("question1-1.png")
## Saving 10 x 5 in image

2. Which events have the greatest economical impact?

EII <- Top10_Eco %>%
  select(-Total)
EII_melt <- melt(EII,
                 id.vars = "EVTYPE",
                 variable.name = "Type")

ggplot(EII_melt, 
       aes(x = reorder(EVTYPE, -value), 
           y = value)) + 
  geom_bar(stat = "identity", 
           aes(fill = Type),
           position = "stack") + 
  scale_fill_manual(values = c("grey",
                               "green")) +
  labs(y = "Total",
       x = "Event Type",
       title = "Economic Impact",
       subtitle = "Top 10 US Weather Events") + 
  scale_y_continuous(expand = c(0, 0)) +
  #scale_x_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle=45, hjust=1))

ggsave("question2.png")
## Saving 10 x 5 in image