Introduction

Severe weather conditions like storms can bring about a host of issues that can affect both the public’s health and the economy of municipalities and communities. Such events have the potential to cause fatalities, injuries, and damage to property, and mitigating such outcomes is a top priority. To this end, the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database serves as a crucial resource for examining the characteristics of significant storms and weather events in the United States. The database contains data on the time and location of such events, along with estimated figures for any loss of life, injuries, and damage to property.

This project explores the NOAA storm database to determine which types of events cause the most harm to populations, in terms of both health and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is downloaded automatically from the source.
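As a minimal sketch of why no separate decompression step is needed: base R detects bzip2 compression when it opens a file for reading, so read.csv can be pointed at a .bz2 file directly. The toy data frame and temporary file below are made up for illustration and are not the NOAA data.

```r
# Toy data standing in for the storm database (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the CSV through a bzip2-compressing connection.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the compressed file transparently.
roundtrip <- read.csv(path)
print(roundtrip)
```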

Documentation of the database is also available. It describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
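One way to check how coverage varies over time is to count events per calendar year. The sketch below uses a handful of made-up dates in place of the parsed begin dates; the vector name bgn is hypothetical.

```r
# Hypothetical begin dates standing in for the parsed BGN_DATE column.
bgn <- as.Date(c("1950-04-18", "1950-06-08", "1995-07-04",
                 "2011-05-22", "2011-05-24", "2011-11-28"))

# Count events per calendar year; sparsely recorded years show small counts.
events_per_year <- table(format(bgn, "%Y"))
print(events_per_year)
```

Applied to the full data set, the same table could be plotted to see where the record becomes denser.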

Data Processing

The data described above are downloaded and loaded into R using the code below. Dates are converted from strings into R date objects. A summary of the data is output.

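library(dplyr)      # mutate, across, group_by, summarise, arrange
library(lubridate)  # mdy
library(stringr)    # str_split_fixed

# Download the compressed CSV (binary mode); read.csv reads .bz2 files directly.
dir.create('files/data', recursive = TRUE, showWarnings = FALSE)
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',destfile = 'files/data/stormdata.csv.bz2', mode = 'wb')
stormdata <- read.csv('files/data/stormdata.csv.bz2',header = TRUE,sep = ',',quote = '"',na.strings = 'NA')

# Keep only the date part (text before the first space) and parse it as month/day/year.
stormdata <- stormdata %>% mutate(across(c('BGN_DATE', 'END_DATE'), function(x){
  mdy(str_split_fixed(x," ",2)[,1])
  }))
stormdata$EVTYPE <- as.factor(stormdata$EVTYPE)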

Data Transformations

Some columns require transformation into proper data types: BGN_DATE and END_DATE are parsed from strings into date objects, and EVTYPE is converted to a factor.
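The date transformation can be illustrated with a base-R sketch. It assumes raw values of the form "month/day/year time" and is an equivalent alternative to the lubridate and stringr calls used in this report, not the code the report runs. The sample strings are hypothetical.

```r
# Hypothetical raw values in the assumed "month/day/year time" layout.
raw <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00", "")

# Drop everything from the first space onward, then parse month/day/year.
date_part <- sub(" .*$", "", raw)
parsed <- as.Date(date_part, format = "%m/%d/%Y")
print(parsed)
```

Empty strings, which can occur when an end date was not recorded, come out as NA rather than raising an error.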

Data Analysis

This analysis aims to answer two questions from the supplied data:

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

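Looking at the provided database, the following attributes can help with answering these questions: FATALITIES and INJURIES for population health, and PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP for economic consequences.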

Population health hazards (1)

To find the event type most hazardous to the population, we sum the fatalities for each event type. Injuries and fatalities differ greatly in severity, so they cannot be meaningfully added together. I therefore chose to count only deaths.

library(ggplot2)
# Sum fatalities per event type and keep the ten deadliest types.
stormdata_fat <- stormdata %>% group_by(EVTYPE) %>% summarise(FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(FATALITIES)) %>% head(10)
most_harmful <- stormdata_fat[which.max(stormdata_fat$FATALITIES), ]
ggplot(stormdata_fat, aes(y = reorder(EVTYPE, FATALITIES), x = FATALITIES)) +
  geom_col() +
  ggtitle("Total Fatalities by top 10 events, 1950-2011") +
  xlab("Total Fatalities") + ylab("Event")
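The per-type summation can be illustrated on toy data with base R. This sketch also reports injuries as a separate column rather than adding them to deaths, which is one possible way to keep both measures visible. The data frame and its values are made up.

```r
# Toy records (hypothetical values), several rows per event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES = c(10, 0, 5, 2, 1)
)

# Sum each measure per event type, keeping them in separate columns.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities only, as in the analysis above.
totals <- totals[order(-totals$FATALITIES), ]
print(totals)
```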

Greatest economic consequences (2)

To find the event type with the greatest economic consequences, we sum the total damage (property plus crop) for each event type. The damage exponent columns first had to be prepared by replacing their letter codes (H, K, M, B) with the corresponding integer powers of ten (2, 3, 6, 9).

library(stringi)  # stri_replace_all_regex

# Format a number as a dollar amount, e.g. 1234.5 becomes "$1,234.50".
print.money <- function(x, ...) {
  print.default(paste0("$", formatC(as.numeric(x), format = "f", digits = 2, big.mark = ",")))
}

# Convert the exponent codes to powers of ten: H = 2, K = 3, M = 6, B = 9.
# Digits are kept as-is; blanks and any other symbols become 0.
stormdata <- stormdata %>% mutate(across(c('PROPDMGEXP', 'CROPDMGEXP'), function(x){
  x <- stri_replace_all_regex(x,
                         pattern = c("[Kk]", "[Mm]", "[Bb]", "[Hh]"),
                         replacement = c("3", "6", "9", "2"), vectorize_all = FALSE)
  x <- stri_replace_all_regex(x, pattern = "[^0-9.]", replacement = "")
  x <- as.numeric(x)
  x[is.na(x)] <- 0
  x
  }))

# Total damage in dollars: property plus crop.
stormdata <- stormdata %>% mutate(DMG = PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP)
stormdata_dmg <- stormdata %>% group_by(EVTYPE) %>% summarise(DMG = sum(DMG)) %>%
  arrange(desc(DMG)) %>% head(10)
most_dmg <- stormdata_dmg[which.max(stormdata_dmg$DMG), ]
ggplot(stormdata_dmg, aes(y = reorder(EVTYPE, DMG), x = DMG / 10^9)) +
  geom_col() +
  ggtitle("Total Damage (property + crop) by top 10 events, 1950-2011") +
  xlab("Total Damage, $ Billions") + ylab("Event")

dmg <- print.money(most_dmg$DMG/10^9)
## [1] "$150.32"
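The exponent handling can be checked on a few hand-made values. The sketch below is a lookup-table variant written in base R, not the regex approach used in this report; the function name exp_to_power and the sample values are hypothetical. It assumes the same convention: H, K, M, B map to 2, 3, 6, 9, single digits are used as-is, and anything else counts as 0.

```r
# Map an exponent code to a power of ten: letters via a lookup table,
# single digits as-is, anything else (blank, "+", "?", ...) to 0.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(code)
  power <- unname(lookup[code])
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power[is.na(power)] <- 0
  power
}

# Hypothetical damage figures and their exponent codes.
amount <- c(25, 2.5, 1.2, 5, 7)
code <- c("K", "m", "B", "", "?")

# Dollar value = reported figure times ten to the decoded power.
total <- amount * 10^exp_to_power(code)
print(format(total, big.mark = ",", scientific = FALSE))
```

For instance, a reported figure of 25 with code "K" is 25 * 10^3, or 25,000 dollars, while a blank code leaves the figure unscaled.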

Results

Population health hazards (1)

Taking death as the most serious hazard to population health, the event type with the most fatalities is TORNADO, with a total of 5,633 deaths.

Greatest economic consequences (2)

Measuring economic consequences as total property and crop damage, the event type with the highest total is FLOOD, at $150.32 billion (USD).