Synopsis

This report reviews US National Oceanic and Atmospheric Administration (NOAA) storm data covering 1950 to November 2011. We are interested in identifying which event types have the greatest economic and human cost on an individual basis. We determined which event type had the most significant cost by taking the average human or economic cost per type of event. Averages per type of event were used because the NOAA documentation indicates that in the early years after 1950 not all types of events were properly documented. Events with a smaller individual impact that occur more frequently could have a larger cumulative impact in the long run, but this analysis focuses on individual events. As a result of the analysis we identified that Hurricanes (Typhoons) have the most significant health and economic impact per individual event.

Load R packages and options

library(readr, warn.conflicts = FALSE, quietly = TRUE)
library(dplyr, warn.conflicts = FALSE, quietly = TRUE)
library(stringdist, warn.conflicts = FALSE, quietly = TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly = TRUE)
options(scipen=50)

Loading Data

We downloaded the necessary data for this exercise from the Cloudfront website provided in the Reproducible Research Coursera course. This includes data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Events in the database are from 1950 to November 2011.

We read the data from a bz2-compressed file that contains a raw comma-separated values (CSV) text file.

noaa<-read_csv("data/repdata%2Fdata%2FStormData.csv.bz2")

The dataset includes 37 columns and 902,297 records.

dim(noaa)
## [1] 902297     37
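As a small illustration of reading a bz2-compressed CSV, the sketch below writes a few made-up rows to a temporary compressed file and reads them back with base R. The file contents and the name `sample_rows` are assumptions for illustration only. `readr::read_csv()`, as used above, decompresses `.bz2` files automatically based on the file extension.

```r
# Write a tiny CSV into a bz2-compressed temporary file (made-up rows).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "HAIL,1"), con)
close(con)

# Read it back through a bzfile() connection; read.csv() opens and closes it.
sample_rows <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
dim(sample_rows)

# Clean up the temporary file.
unlink(tmp)
```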

Data Processing

A few transformations need to be made to the raw data. The main reason is that it contains human errors and typos in the event types and in the magnitude codes for property and crop damage. Also, the human and economic costs are combinations of other variables in the dataset, so these need to be aggregated for analysis.

A clean list of event types could not be found, so one was created from the Storm Data documentation. This was done by copying the “Storm Data Event Table” on page 6 to a CSV text file named “Storm_Data_Event_Table.csv”. This table should have 48 records according to the documentation.

events<-read.csv(file = "Data/Storm_Data_Event_Table.csv", nrows=48)
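The layout of the hand-made lookup file is not shown in this report, so the sketch below assumes a single column with the header "event name", which `read.csv()` would turn into the `event.name` column used later. The three rows are only the first few official event names; the real file should contain 48.

```r
# Hypothetical first lines of Storm_Data_Event_Table.csv (assumed layout).
csv_text <- c("event name",
              "Astronomical Low Tide",
              "Avalanche",
              "Blizzard")

# read.csv() converts the header "event name" into the valid R name event.name.
events_sample <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(events_sample)

# Basic sanity checks one could also run on the full table (expecting 48 rows there).
nrow(events_sample)
anyDuplicated(events_sample$event.name) == 0
```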

Event types in the original NOAA dataset are not clean and include a lot of typos. The following code reduces the dataset to the needed variables and restates the event types, the property damage exponent and the crop damage exponent to clean the typos as best as possible. Please note that due to time limitations this is not an exhaustive transformation.

# Keep only the columns needed for the analysis
noaa2<-noaa%>% select(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)%>%
  # Extract the four-digit year from the event start date
  mutate(BGN_YEAR=format(as.Date(BGN_DATE, "%m/%d/%Y"), "%Y")) %>%
  # Map each raw EVTYPE to the closest official event name (LCS distance of at most 10)
  mutate(fix_EVTTYPE=events$event.name[amatch(toupper(EVTYPE), toupper(events$event.name), 
                                              maxDist=10, method="lcs")])%>%
  # Normalise the exponent codes: upper-case, then replace empty or missing values with "0"
  mutate(PROPDMGEXP=toupper(PROPDMGEXP), CROPDMGEXP=toupper(CROPDMGEXP)) %>%
  mutate(PROPDMGEXP=sub("^$", "0", PROPDMGEXP), CROPDMGEXP=sub("^$", "0", CROPDMGEXP))%>%
  mutate(PROPDMGEXP=if_else(is.na(PROPDMGEXP),"0", PROPDMGEXP), 
         CROPDMGEXP=if_else(is.na(CROPDMGEXP),"0", CROPDMGEXP))
  

head(noaa2)
## # A tibble: 6 x 10
##             BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
##                <chr>   <chr>      <dbl>    <dbl>   <dbl>      <chr>
## 1  4/18/1950 0:00:00 TORNADO          0       15    25.0          K
## 2  4/18/1950 0:00:00 TORNADO          0        0     2.5          K
## 3  2/20/1951 0:00:00 TORNADO          0        2    25.0          K
## 4   6/8/1951 0:00:00 TORNADO          0        2     2.5          K
## 5 11/15/1951 0:00:00 TORNADO          0        2     2.5          K
## 6 11/15/1951 0:00:00 TORNADO          0        6     2.5          K
## # ... with 4 more variables: CROPDMG <dbl>, CROPDMGEXP <chr>,
## #   BGN_YEAR <chr>, fix_EVTTYPE <fctr>
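To make the fuzzy matching step easier to follow, here is a minimal sketch of the same `amatch()` call on a toy lookup table. The four names in `official`, the labels in `raw` and the `matched` data frame are illustrative assumptions, not the full 48-row event table. Keeping the LCS distance next to each match is one way to spot loose matches for manual review.

```r
library(stringdist)

# Toy lookup table and raw labels (illustrative values only).
official <- c("Flood", "Hail", "Tornado", "Thunderstorm Wind")
raw <- c("TORNADO", "TORNDAO", "?")

# Same call shape as in the report: closest entry by LCS distance, at most 10 apart.
idx <- amatch(toupper(raw), toupper(official), maxDist = 10, method = "lcs")

# Keep the distance next to each match so loose matches can be inspected.
matched <- data.frame(
  raw = raw,
  match = official[idx],
  lcs_distance = stringdist(toupper(raw), toupper(official[idx]), method = "lcs"),
  stringsAsFactors = FALSE
)
matched
```

With a threshold of 10, even a one-character label such as "?" lies within range of any short official name, so it still receives a match. A stricter `maxDist`, or a manual review of rows with a large `lcs_distance`, would reduce such false matches.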

Remove records that will not be used because the event type is bad or missing, that is, records whose event type could not be matched to the event table.

noaa2<-noaa2%>%filter(!is.na(fix_EVTTYPE))
dim(noaa2)
## [1] 896241     10

The filter removes 6,056 of the original 902,297 records (about 0.7%).

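Two new columns are added to sum the damage to health and the damage to property:

1. HEALTH.DAMAGE = fatalities + injuries
2. PROPERTY.DAMAGE = property damage + crop damage, each amount multiplied by its exponent code (H = 100, K = 1,000, M = 1,000,000, B = 1,000,000,000); amounts with any other exponent code count as zero.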

noaa2<-mutate(noaa2, HEALTH.DAMAGE=FATALITIES+INJURIES, 
            PROPERTY.DAMAGE=PROPDMG*recode(PROPDMGEXP,"H"=100, "K"=1000, "M"=1000000, "B"=1000000000, .default=0)+
            CROPDMG*recode(CROPDMGEXP,"H"=100, "K"=1000, "M"=1000000, "B"=1000000000, .default=0))
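
The exponent conversion can be checked on a few made-up values. The sketch below mirrors the `recode()` mapping above with a base R lookup vector, swapped in only so the example runs without dplyr. The names `exp_multiplier` and `to_dollars` are hypothetical helpers, not part of the analysis.

```r
# Multipliers for the recognised exponent codes (same mapping as the recode() call).
exp_multiplier <- c(H = 100, K = 1000, M = 1e6, B = 1e9)

# Convert an amount and its exponent code to dollars.
# Any code outside H, K, M, B (including "0" and NA) yields a multiplier of 0,
# matching the .default = 0 behaviour used in the report.
to_dollars <- function(amount, exp_code) {
  mult <- unname(exp_multiplier[toupper(exp_code)])
  mult[is.na(mult)] <- 0
  amount * mult
}

to_dollars(c(25, 2.5, 1), c("K", "m", "0"))
# 25000 2500000 0
```

Note that under this rule an amount reported without an exponent code contributes nothing to the total, which is a simplification of this analysis.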

Results

Question 1:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We acknowledge that there are multiple ways to calculate which events are most harmful to population health, but for this report we take the average health damage (fatalities plus injuries) per event for each event type.

healthdmg<-noaa2 %>% group_by(fix_EVTTYPE) %>% summarize(avg.helth.damage=mean(HEALTH.DAMAGE))
healthdmg<-healthdmg[order(healthdmg$avg.helth.damage), ]
healthdmg$fix_EVTTYPE<-factor(healthdmg$fix_EVTTYPE, levels = healthdmg$fix_EVTTYPE)
tail(healthdmg)
## # A tibble: 6 x 2
##           fix_EVTTYPE avg.helth.damage
##                <fctr>            <dbl>
## 1         Rip Current         1.422481
## 2             Tornado         1.598026
## 3                Heat         3.610069
## 4      Excessive Heat         4.967540
## 5             Tsunami         7.714286
## 6 Hurricane (Typhoon)        15.044944

From the chart below we identify that Hurricanes (Typhoons) are the type of event that causes the most health damage per individual event.

ggplot(healthdmg, aes(x=fix_EVTTYPE, y=avg.helth.damage)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Event Types Most Harmful to Population Health" 
       ,caption="NOAA data"
       ,x="Event Type"
       ,y="Avg. Health Damage (Fatalities+Injuries)") + 
  theme(axis.text.x = element_text(angle=90, vjust=0.6))
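
Because the choice between averages and totals can change the ranking, the following sketch compares both on made-up numbers. The data frame `toy` and its values are assumptions for illustration and are not taken from the NOAA data.

```r
# One severe event versus several milder ones (made-up values).
toy <- data.frame(
  type = c("Hurricane", "Tornado", "Tornado", "Tornado", "Tornado"),
  harm = c(15, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Average harm per event and total harm, by event type.
avg_harm <- tapply(toy$harm, toy$type, mean)
total_harm <- tapply(toy$harm, toy$type, sum)

# Event type that ranks first under each measure.
top_by_average <- names(which.max(avg_harm))
top_by_total <- names(which.max(total_harm))
c(average = top_by_average, total = top_by_total)
```

In this toy case the rare event leads on the per-event average while the frequent event leads on the total, which is why this report states explicitly that it ranks by average.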

Question 2:

Across the United States, which types of events have the greatest economic consequences?

We acknowledge that there are multiple ways to calculate which events have the greatest economic consequences, but for this report we take the average economic damage (property plus crop damage) per event for each event type.

economicdmg<-noaa2 %>% group_by(fix_EVTTYPE) %>% summarize(avg.economic.damage=mean(PROPERTY.DAMAGE))
economicdmg<-economicdmg[order(economicdmg$avg.economic.damage), ]
economicdmg$fix_EVTTYPE<-factor(economicdmg$fix_EVTTYPE, levels = economicdmg$fix_EVTTYPE)
tail(economicdmg)
## # A tibble: 6 x 2
##           fix_EVTTYPE avg.economic.damage
##                <fctr>               <dbl>
## 1             Drought             5831086
## 2               Flood             6029598
## 3             Tsunami             6861048
## 4      Tropical Storm            12064974
## 5    Storm Surge/Tide           117275254
## 6 Hurricane (Typhoon)           808024863

From the chart below and the table above we identify that Hurricanes (Typhoons) have the most significant cost impact per individual event. In second place we have Storm Surges/Tides.

ggplot(economicdmg, aes(x=fix_EVTTYPE, y=avg.economic.damage)) + 
  geom_bar(stat="identity", width=.5, fill="tomato3") + 
  labs(title="Economic Consequences per Event Type" 
       ,caption="NOAA data"
       ,x="Event Type"
       ,y="Economic Consequences (Properties + Crops)") + 
  theme(axis.text.x = element_text(angle=90, vjust=0.6))