Synopsis - Course Project 2 Reproducible Research

Goal: Exploratory analysis intended to answer the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Results: In general, the magnitude of adverse effects on human health scales with the magnitude of economic damage caused by different storm types. These results do not exclude outliers or leverage points.
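The "scales with" claim could be checked with a single number. A minimal sketch, assuming per-event-type totals of health impact and economic damage computed over all event types (before any top-10 cut). The data frame `by_event` and its values are hypothetical. Spearman's rank correlation is used here because it compares ranks and is therefore less affected by the outliers the Results note leaves in.

```r
# Hypothetical per-event-type totals (illustrative values, not from the storm data)
by_event <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E"),
  POP_HEALTH = c(900, 50, 300, 10, 120),
  TOTALDMG   = c(5e9, 4e8, 9e8, 1e6, 2e8)
)

# Spearman's rho: Pearson correlation of the ranks (ties get average ranks)
rho <- cor(by_event$POP_HEALTH, by_event$TOTALDMG, method = "spearman")
print(rho)
```

Values near 1 would support the claim. With the real data, many event types have zero health impact and zero damage, so tied ranks will be common.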

Disclaimer: the EVTYPE field contains many repeated event types under similar spellings and misspellings. The EVTYPE data can therefore be grouped in many different ways, which would take too long to explore for this assignment. I treat every distinct spelling as a separate event type.
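To show what treating every spelling as distinct means in practice, here is a small sketch using a hypothetical vector of EVTYPE values. It compares the raw count of distinct types with the count after a minimal normalization (upper-casing and trimming whitespace). That normalization is an illustration only; it is not applied in this analysis.

```r
# Hypothetical EVTYPE values: several spellings of what may be one event type
evtype <- c("TSTM WIND", "Tstm Wind", " TSTM WIND", "THUNDERSTORM WIND", "TORNADO")

# Every spelling distinct (the approach taken in this analysis)
n_raw <- length(unique(evtype))

# Minimal normalization: upper-case and trim surrounding whitespace
evtype_norm <- toupper(trimws(evtype))
n_norm <- length(unique(evtype_norm))

cat("distinct as-is:", n_raw, " after normalizing:", n_norm, "\n")
```

Even after normalizing, "TSTM WIND" and "THUNDERSTORM WIND" stay separate. Merging synonyms like these would need a hand-built mapping, which is the extra grouping work this disclaimer sets aside.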

Data Processing

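library(dplyr)    # data manipulation: %>%, mutate(), group_by(), summarise()
library(ggplot2)  # plotting
stormdata_raw <- read.csv(file = "C:/Users/Admin/Downloads/repdata_data_StormData.csv.bz2", 
                          header = TRUE, na.strings = "")  # reads the .bz2 directly; blanks become NA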
head(stormdata_raw)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0    <NA>       <NA>     <NA>     <NA>          0         NA
## 2         0    <NA>       <NA>     <NA>     <NA>          0         NA
## 3         0    <NA>       <NA>     <NA>     <NA>          0         NA
## 4         0    <NA>       <NA>     <NA>     <NA>          0         NA
## 5         0    <NA>       <NA>     <NA>     <NA>          0         NA
## 6         0    <NA>       <NA>     <NA>     <NA>          0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0    <NA>       <NA>   14.0   100 3   0          0       15    25.0
## 2         0    <NA>       <NA>    2.0   150 2   0          0        0     2.5
## 3         0    <NA>       <NA>    0.1   123 2   0          0        2    25.0
## 4         0    <NA>       <NA>    0.0   100 2   0          0        2     2.5
## 5         0    <NA>       <NA>    0.0   150 2   0          0        2     2.5
## 6         0    <NA>       <NA>    1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP  WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0       <NA> <NA>       <NA>      <NA>     3040      8812
## 2          K       0       <NA> <NA>       <NA>      <NA>     3042      8755
## 3          K       0       <NA> <NA>       <NA>      <NA>     3340      8742
## 4          K       0       <NA> <NA>       <NA>      <NA>     3458      8626
## 5          K       0       <NA> <NA>       <NA>      <NA>     3412      8642
## 6          K       0       <NA> <NA>       <NA>      <NA>     3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806    <NA>      1
## 2          0          0    <NA>      2
## 3          0          0    <NA>      3
## 4          0          0    <NA>      4
## 5          0          0    <NA>      5
## 6          0          0    <NA>      6
str(stormdata_raw)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
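The read step relies on two behaviours that are easy to miss: `read.csv()` opens the bzip2 archive without a separate decompression step, and `na.strings = ""` turns empty fields into `NA` rather than `""`. This matters later because a missing PROPDMGEXP code arrives as `NA`. A minimal sketch with a hypothetical two-row file written to a temporary location:

```r
# Write a tiny bzip2-compressed CSV to a temporary file (hypothetical rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP",
             "TORNADO,25,K",
             "HAIL,0,"), con)
close(con)

# read.csv() detects the compression itself; na.strings = "" makes the
# empty PROPDMGEXP field NA instead of an empty string
small <- read.csv(tmp, header = TRUE, na.strings = "")
str(small)
unlink(tmp)
```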

Which events are most harmful?

With respect to human health: so what is population health? Let's define population health (POP_HEALTH) as the total number of fatalities plus the total number of injuries. In the code below this quantity is computed as TOTAL_HEALTH_IMPACT.
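As a worked example of this definition, the sketch below sums fatalities and injuries per event type for a few hypothetical records and ranks the result. It uses base R `aggregate()` only so that it runs without extra packages. Note that the definition weights one fatality the same as one injury.

```r
# Hypothetical event records (illustrative values only)
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 3, 0),
  INJURIES   = c(14, 2, 10, 5)
)

# POP_HEALTH per event type = total fatalities + total injuries
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
totals$POP_HEALTH <- totals$FATALITIES + totals$INJURIES

# Most harmful first
totals <- totals[order(-totals$POP_HEALTH), ]
print(totals)
```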

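With respect to economic consequences: let's define economic consequences (Econ) as property damage plus crop damage (TOTALDMG in the code below). Each damage value is stored as a number (PROPDMG, CROPDMG) plus a magnitude code (PROPDMGEXP, CROPDMGEXP) such as K, M, or B, so we need a function that multiplies each number by the value its code represents (thousands, millions, or billions).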
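A worked example of the multiplication for one hypothetical record, using a named vector as a lookup table from code to multiplier. The vector `multiplier` is an illustrative choice; "H" for hundreds is included because the processing code also handles it.

```r
# Multipliers for the magnitude codes (hundreds, thousands, millions, billions)
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical record: 25 "K" of property damage and 2.5 "M" of crop damage
propdmg <- 25;  propexp <- "K"
cropdmg <- 2.5; cropexp <- "M"

# Econ = property damage + crop damage, both in dollars
econ <- propdmg * multiplier[[propexp]] + cropdmg * multiplier[[cropexp]]
print(econ)  # 25 * 1e3 + 2.5 * 1e6
```

With `[[ ]]`, an unrecognised code such as "?" stops with a "subscript out of bounds" error instead of silently producing `NA`, which forces a deliberate decision about irregular codes.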

New data frame for processing:

stormdata <- stormdata_raw

Apply multiplier to economic damage

# Convert a PROPDMGEXP/CROPDMGEXP code into a numeric multiplier.
# H, K, M, B = hundreds, thousands, millions, billions (case-insensitive).
# Assumption: digit codes are read as powers of ten (10^d).
# "+", "-", "?" and missing codes (NA) get a multiplier of 0, so those
# records contribute no damage.
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- rep(0, length(code))  # default for NA and unrecognised codes
  mult[code %in% "H"] <- 1e2
  mult[code %in% "K"] <- 1e3
  mult[code %in% "M"] <- 1e6
  mult[code %in% "B"] <- 1e9
  is_digit <- code %in% as.character(0:9)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Replace the codes with multipliers, then compute damage in dollars
stormdata <- stormdata %>%
  mutate(PROPDMGEXP = exp_to_multiplier(PROPDMGEXP),
         CROPDMGEXP = exp_to_multiplier(CROPDMGEXP),
         TOTALPROPDMG = PROPDMG * PROPDMGEXP,
         TOTALCROPDMG = CROPDMG * CROPDMGEXP,
         TOTALDMG = TOTALPROPDMG + TOTALCROPDMG)
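Before relying on the totals, it can help to list which raw codes fall outside the H/K/M/B letters. Each such code is either dropped (multiplier 0) or handled by an assumption. A minimal sketch on a hypothetical vector of raw codes; on the real data, the same expression could be applied to `stormdata_raw$PROPDMGEXP` and `stormdata_raw$CROPDMGEXP`.

```r
# Hypothetical raw exponent codes, including irregular symbols and a missing value
codes <- c("K", "M", "m", "B", "H", "+", "-", "?", "5", NA)

# Codes not covered by the letter mapping (case-insensitive).
# NA %in% known is FALSE, so missing codes are reported too.
known <- c("H", "K", "M", "B")
unmapped <- codes[!(toupper(codes) %in% known)]
print(table(unmapped, useNA = "ifany"))
```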

Events Most Harmful to Population Health

# Sum fatalities and injuries per event type and keep the 10 largest totals
health_impact <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES, na.rm = TRUE),
            INJURIES = sum(INJURIES, na.rm = TRUE)) %>%
  mutate(TOTAL_HEALTH_IMPACT = FATALITIES + INJURIES) %>%
  slice_max(TOTAL_HEALTH_IMPACT, n = 10)  # sorted descending; ties are kept

ggplot(health_impact, aes(x = reorder(EVTYPE, -TOTAL_HEALTH_IMPACT), y = TOTAL_HEALTH_IMPACT)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type", y = "Total Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
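Because fatalities and injuries are added with equal weight, the combined ranking can differ from a ranking by fatalities alone whenever injuries outnumber fatalities. A small sketch with hypothetical totals for four generic event types; the helper `top_events()` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical per-event-type totals (illustrative values only)
totals <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(50, 90, 10, 40),
  INJURIES   = c(900, 300, 500, 60)
)
totals$TOTAL_HEALTH_IMPACT <- totals$FATALITIES + totals$INJURIES

# Hypothetical helper: names of the n event types with the largest value of `col`
top_events <- function(df, col, n) head(df[order(-df[[col]]), "EVTYPE"], n)

by_total      <- top_events(totals, "TOTAL_HEALTH_IMPACT", 3)
by_fatalities <- top_events(totals, "FATALITIES", 3)
print(by_total)       # ranking by fatalities + injuries
print(by_fatalities)  # ranking by fatalities only
```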

Events With Greatest Economic Consequences

# Sum property + crop damage per event type and keep the 10 largest totals
economic_impact <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_ECONOMIC_DAMAGE = sum(TOTALDMG, na.rm = TRUE)) %>%
  slice_max(TOTAL_ECONOMIC_DAMAGE, n = 10)  # sorted descending; ties are kept

ggplot(economic_impact, aes(x = reorder(EVTYPE, -TOTAL_ECONOMIC_DAMAGE), y = TOTAL_ECONOMIC_DAMAGE / 1e9)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (Billion USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
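A top-10 bar chart does not show how concentrated the damage is. One way to see that is the cumulative share of total damage, with event types sorted from largest to smallest. A minimal sketch on hypothetical per-event-type totals; on the real data, the input would be the per-EVTYPE sums before the top-10 cut.

```r
# Hypothetical per-event-type damage totals in USD (illustrative values only)
damage <- c(A = 1.5e11, B = 7.0e10, C = 4.0e10, D = 2.0e10, E = 1.0e10, F = 1.0e10)

# Sort descending, then compute each event type's cumulative share of all damage
damage <- sort(damage, decreasing = TRUE)
cum_share <- cumsum(damage) / sum(damage)
print(round(cum_share, 3))
```

If a few event types already account for most of the total, the ranking is driven by those few, which ties in with the note that outliers were not excluded.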