Synopsis

In this report we aim to identify and quantify the most destructive types of weather events between 1996 and 2011. We used data from the U.S. National Oceanic and Atmospheric Administration’s storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

      sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] lubridate_1.7.4   R.utils_2.10.1    R.oo_1.24.0       R.methodsS3_1.8.1
## [5] data.table_1.12.8 dplyr_0.8.5       ggplot2_3.2.1     knitr_1.25       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       magrittr_1.5     tidyselect_1.0.0 munsell_0.5.0   
##  [5] colorspace_1.4-1 R6_2.4.0         rlang_0.4.5      stringr_1.4.0   
##  [9] tools_3.6.1      grid_3.6.1       gtable_0.3.0     xfun_0.10       
## [13] withr_2.1.2      htmltools_0.4.0  yaml_2.2.0       lazyeval_0.2.2  
## [17] digest_0.6.22    assertthat_0.2.1 tibble_2.1.3     crayon_1.3.4    
## [21] purrr_0.3.3      glue_1.4.0       evaluate_0.14    rmarkdown_2.1   
## [25] stringi_1.4.3    compiler_3.6.1   pillar_1.4.2     scales_1.0.0    
## [29] pkgconfig_2.0.3

Data Processing

We downloaded the .csv.bz2 file from the URL provided. The data file is comma-delimited and compressed with bzip2. The data is read in its entirety using the fread() function.

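#load packages

library(data.table)
library(R.utils)
library(dplyr)
library(lubridate)
library(ggplot2)

#download data

bz2.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

bz2.file <- "repdata_data_StormData.csv.bz2"
bz2.combine <- file.path(getwd(), bz2.file)

#mode = "wb" writes the compressed file in binary mode, which matters on Windows
download.file(bz2.url, destfile = bz2.combine, method = "libcurl", mode = "wb")

#read data (fread() relies on R.utils to decompress .bz2 files)

data <- fread(bz2.combine)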
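Because the download is large, it can help to skip it when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis; `fetch_once` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper (not in the original report): download only when the
# destination file is missing, so re-running the report does not fetch it again.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file in binary mode, which matters on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Possible usage with the names defined in this report:
# fetch_once(bz2.url, bz2.combine)
```

The helper returns the destination path invisibly, so the call could be passed straight to fread() if desired.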

We adjusted the data in four ways:
1. Added a new date field stored as a date-time value
2. Converted the “EVTYPE” column to lowercase
3. Created a numeric variable from the character codes in the PROPDMGEXP and CROPDMGEXP columns, representing the multiplier that each code stands for
4. Created new columns containing the product of the DMG and DMGEXP columns for both the PROP and CROP values

Note: Adjustments 3 and 4 are detailed in the Results section under question 2.

Dates were converted so that we could filter by year more easily. The “EVTYPE” column was lowercased to minimize the effect of inconsistent capitalization in the way event types were entered into the database. Adjustments 3 and 4 were made so that we could calculate dollar amounts for the damage estimates. The database splits each damage amount into a base column and an exponent column. The exponent column is stored as a character code and had to be converted to its numerical equivalent.

data$DateNew <- mdy_hms(data$BGN_DATE)

data$EVTYPE <- tolower(data$EVTYPE)
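The effect of the first two adjustments can be seen on a few made-up rows. This is only a sketch: the values are invented, `toy`, `Year`, and `recent` are illustrative names, and base R's as.POSIXct() is used as a stand-in for mdy_hms() so the example runs without extra packages.

```r
# Made-up rows in the same shape as the BGN_DATE and EVTYPE columns
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "7/4/2005 0:00:00"),
  EVTYPE   = c("TSTM WIND", "Tstm Wind", "TORNADO"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for lubridate::mdy_hms(): month/day/year followed by a time
toy$DateNew <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Lowercasing merges spellings that differ only by capitalization
toy$EVTYPE <- tolower(toy$EVTYPE)

# Calendar year as an integer, convenient for filtering by year
toy$Year <- as.integer(format(toy$DateNew, "%Y"))

# Keep only events from 1996 onward
recent <- toy[toy$Year >= 1996, ]
```

After lowercasing, the two spellings of the thunderstorm wind event collapse into a single type, which is the typographical effect the adjustment is meant to reduce.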

Results

We aimed to answer two specific questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

For both questions, only data from 1996 until November 2011 was included. Data from earlier years was excluded because it was substantially less complete.

Across the United States, which types of events are most harmful with respect to population health?

Harm is defined as the sum of fatalities and injuries by event type. The top ten weather events are listed.
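As a small illustration of this definition, the sketch below totals fatalities and injuries per event type on invented numbers. It uses base R's aggregate() rather than the dplyr pipeline of this report, and `toy`, `HARM`, and `harm_by_type` are hypothetical names.

```r
# Invented records, for illustration only
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "flood", "lightning"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Harm per record is fatalities plus injuries
toy$HARM <- toy$FATALITIES + toy$INJURIES

# Total harm per event type, sorted from most to least harmful
harm_by_type <- aggregate(HARM ~ EVTYPE, data = toy, FUN = sum)
harm_by_type <- harm_by_type[order(-harm_by_type$HARM), ]

# The first rows are the most harmful event types
head(harm_by_type, 2)
```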

#keep events from 1996 onward with at least one fatality or injury
#(filtering on the year avoids comparing the date-time DateNew with a Date)
newdata <- data %>% 
        filter(year(DateNew) >= 1996 & (FATALITIES > 0 | INJURIES > 0)) %>% 
        select(EVTYPE, FATALITIES, INJURIES) %>%
        mutate(FATALITIES_AND_INJURIES = (FATALITIES + INJURIES))

#total fatalities and injuries per event type, largest first
summaryEVTYPE <- newdata %>%
        group_by(EVTYPE) %>%
        summarize(FATALITIES_AND_INJURIES = sum(FATALITIES_AND_INJURIES)) %>%
        arrange(desc(FATALITIES_AND_INJURIES))

#keep the ten event types with the highest totals
topEVENTS <- head(summaryEVTYPE, 10)

#10 most harmful events
ggplot(data = topEVENTS, mapping = aes(x = EVTYPE, y = FATALITIES_AND_INJURIES)) +
        geom_col(color = "white", fill = "steelblue") +
        labs(title = "THE 10 EVENTS CAUSING THE MOST FATALITIES AND INJURIES", 
             subtitle = "Events occurring since 1996",
             caption = "Source: U.S. National Oceanic and Atmospheric Administration's Storm Database",
             x = "Event Type", y = "Number of Fatalities and Injuries") +
        theme(axis.text.x = element_text(size = 6))

The weather events causing the most fatalities and injuries in the United States between 1996 and 2011 are tornadoes, excessive heat, and floods.

Across the United States, which types of events have the greatest economic consequences?

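Please note the letter-to-numeric equivalents used for the “…EXP” columns: H = 100, K = 1,000, M = 1,000,000, and B = 1,000,000,000. The digits 0 through 8 are all treated as a multiplier of 10, and a blank is treated as 0.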

Economic consequences were calculated as the sum total of the property damage and crop damage estimates.

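#keep events from 1996 onward with some property or crop damage
#(filtering on the year avoids comparing the date-time DateNew with a Date)
newdataECO <- data %>%
        filter(year(DateNew) >= 1996 & (PROPDMG > 0 | CROPDMG > 0)) %>%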
        select(EVTYPE, STATE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
        group_by(EVTYPE) 

#digit codes 0 through 8 are all mapped to a multiplier of 10 below
numbers <- as.character(c(0:8))

newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP == "H"] <- (100)
newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP == "K"] <- (1000)
newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP == "M"] <- (1000000)
newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP == "B"] <- (1000000000)
newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP %in% numbers] <- (10)
newdataECO$PROPDMGEXP[newdataECO$PROPDMGEXP == ""] <- (0)

newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP == "H"] <- (100)
newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP == "K"] <- (1000)
newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP == "M"] <- (1000000)
newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP == "B"] <- (1000000000)
newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP %in% numbers] <- (10)
newdataECO$CROPDMGEXP[newdataECO$CROPDMGEXP == ""] <- (0)

newdataECO$PROPDMGEXP <- as.numeric(newdataECO$PROPDMGEXP)
newdataECO$CROPDMGEXP <- as.numeric(newdataECO$CROPDMGEXP)
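The same letter-to-multiplier conversion could also be written with a named lookup vector, which keeps the mapping in one place. The sketch below is an alternative under stated assumptions, not the method used in this report: `exp_lookup` and `to_multiplier` are hypothetical names, the records are invented, and the mapping simply mirrors the values assigned above.

```r
# Hypothetical lookup table mirroring the mapping used in this report
exp_lookup <- c(H = 100, K = 1e3, M = 1e6, B = 1e9,
                setNames(rep(10, 9), as.character(0:8)))

# Hypothetical helper: translate exponent codes into numeric multipliers
to_multiplier <- function(code) {
  out <- unname(exp_lookup[code])  # codes missing from the table become NA
  out[code %in% ""] <- 0           # a blank code counts as 0, as in the report
  out
}

# Invented damage records, for illustration only
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 0),
  PROPDMGEXP = c("M", "K", ""),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "K", "B"),
  stringsAsFactors = FALSE
)

# Total damage = property damage in dollars + crop damage in dollars
toy$TotalDMG <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)
```

For example, a PROPDMG of 2.5 with the code "M" becomes 2.5 × 1,000,000 = 2,500,000 dollars. Codes that are not in the table come back as NA, which makes unexpected values easy to spot before summing.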

newdataECOfinal <- newdataECO %>%
        group_by(EVTYPE) %>%
        mutate(TotalPropDmg = (PROPDMG * PROPDMGEXP),
               TotalCropDmg = (CROPDMG * CROPDMGEXP),
               TotalDMG = (TotalPropDmg + TotalCropDmg))

newdataECOfinal1 <- newdataECOfinal %>% group_by(EVTYPE) %>%
        summarise(TotalDamage = sum(TotalDMG)) %>%
        arrange(desc(TotalDamage))

ecograph <- newdataECOfinal1[1:10,]

ggplot(data = ecograph, mapping = aes(x = EVTYPE, y = TotalDamage)) +
        geom_col(color = "white", fill = "steelblue") +
        labs(title = "WEATHER EVENTS CAUSING THE MOST ECONOMIC DAMAGE", 
                subtitle = "Economic damage is the sum of crop and property damage from 1996 to 2011",
                caption = "Source: U.S. National Oceanic and Atmospheric Administration's Storm Database",
                x = "Event Type", y = "Dollar Amount of Damage") +
        theme(axis.text.x = element_text(size = 6))

The weather events causing the most economic damage between 1996 and 2011 are floods, hurricanes/typhoons, and storm surges.