Synopsis

This report explores the impact of severe weather events in the United States on public health and the economy between 1950 and 2011, using data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The analysis identifies the weather events that caused the most fatalities and injuries during this period, as well as those that caused the highest economic damage. These findings can help highlight which types of events should be prioritized in disaster preparedness, prevention, and mitigation.

Data Processing

The data used in this analysis comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It includes details of major storms and weather events in the United States from 1950 to November 2011. The raw data is provided in a comma-separated-value (CSV) format compressed via the bzip2 algorithm.

We first load the necessary R packages and read the compressed CSV file directly into R.

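# Load the packages used for data manipulation and plotting
library(dplyr)
library(ggplot2)

# Define the file path
data_path <- "C:/Users/nater/OneDrive/Documents/coursera/reprodR/assignment2/repdata_data_StormData.csv.bz2"

# Load the dataset; read.csv() reads the bzip2-compressed file directly
storm_data <- read.csv(data_path, stringsAsFactors = FALSE)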
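
The direct read works because R's file connections detect bzip2 compression when a file is opened for reading, so no manual decompression step is needed. The following minimal sketch demonstrates this on a throwaway file; the `toy` data frame and the temporary path are illustrative assumptions, not part of the NOAA dataset.

```r
# Toy data standing in for the real storm file (illustrative values only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  stringsAsFactors = FALSE
)

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```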




#look at data structure

str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

### Next, we select the relevant columns for our analysis: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG (property damage), PROPDMGEXP (property damage exponent), CROPDMG (crop damage), and CROPDMGEXP (crop damage exponent).

#refine data, subset columns
storm_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

### Economic damage values are split into two parts: a base number and an exponent code that defines the multiplier (K = thousands, M = millions, B = billions). We convert these codes into numeric multipliers, treating any other code as 1, and then calculate the total property, crop, and combined damage.

# convert exponents to numeric values
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
         ifelse(exp == "B", 1e9, 1)))
}
storm_subset <- storm_subset %>% 
  mutate(
    PROPDMGEXP = convert_exp(PROPDMGEXP),
    CROPDMGEXP = convert_exp(CROPDMGEXP),
    PROP_DAMAGE = PROPDMG * PROPDMGEXP,
    CROP_DAMAGE = CROPDMG * CROPDMGEXP,
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE
  )
### The data is now cleaned and ready for analysis.
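
As a quick sanity check on the conversion above, the sketch below applies the same `convert_exp` rule to a handful of hypothetical codes (chosen only to exercise each branch) and then recomputes the total damage for the first row shown in the `str()` output, where PROPDMG is 25 with exponent "K" and CROPDMG is 0 with an empty exponent.

```r
# Same conversion rule as in the report: K, M, B map to powers of ten,
# and anything else falls back to a multiplier of 1
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Hypothetical codes: upper case, lower case, padded, empty, unrecognized
codes <- c("K", "m", " B", "", "5")
multipliers <- convert_exp(codes)
print(multipliers)

# First row of the str() output: 25 with "K" for property, 0 with "" for crops
first_row_total <- 25 * convert_exp("K") + 0 * convert_exp("")
print(first_row_total)
```

Under this rule an unrecognized code such as "5" contributes only its base value. Running `table(storm_data$PROPDMGEXP)` on the loaded data would show how many rows fall into that fallback case.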

Results

# Summarise fatalities and injuries by event type and keep the top 10
health_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = Fatalities + Injuries
  ) %>%
  arrange(desc(TotalHealthImpact)) %>%
  head(10)
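
To make the grouping step concrete, here is a minimal sketch of the same idea on a made-up data frame. It uses base R's `aggregate()` rather than the dplyr pipeline in the report, and the `toy` values are invented for illustration: sum fatalities and injuries per event type, add them together, and sort in descending order.

```r
# Made-up records (not NOAA data): two tornado rows, one flood, one heat
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES = c(14, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Sum both health columns within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined impact, then rank from most to least harmful
totals$TotalHealthImpact <- totals$FATALITIES + totals$INJURIES
ranked <- totals[order(-totals$TotalHealthImpact), ]
print(head(ranked, 10))
```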

### Figure 1: This plot shows the impact each weather event has in terms of total injuries and deaths, as recorded by NOAA.
ggplot(health_impact, aes(x = reorder(EVTYPE, -TotalHealthImpact), y = TotalHealthImpact)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Health Impact",
       x = "Event Type", y = "Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Summarise total economic damage by event type and keep the top 10
economic_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(TotalDamage = sum(TOTAL_DAMAGE, na.rm = TRUE)) %>%
  arrange(desc(TotalDamage)) %>%
  head(10)


### Figure 2: Economic Impact of Weather Events
# plot
ggplot(economic_impact, aes(x = reorder(EVTYPE, -TotalDamage), y = TotalDamage / 1e9)) +
  geom_col(fill = "darkred") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (in Billions USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))


This plot shows the impact each weather event has in terms of total property and crop damage, as recorded by NOAA.

Events Most Harmful to Population Health

Plotting total fatalities and injuries by event type shows that tornadoes are by far the most harmful to population health, towering above all other event types. The gap is large enough to look like a data error, so we verify it by computing the totals directly for tornadoes and, for comparison, floods:

 tornado_health_impact <- storm_subset %>%
     filter(EVTYPE == "TORNADO") %>%
     summarise(
         TotalFatalities = sum(FATALITIES, na.rm = TRUE),
         TotalInjuries = sum(INJURIES, na.rm = TRUE),
         TotalHealthImpact = TotalFatalities + TotalInjuries
     )
# View the calculated totals
 tornado_health_impact
##   TotalFatalities TotalInjuries TotalHealthImpact
## 1            5633         91346             96979
 # Sum the fatalities and injuries for floods
 flood_health_impact <- storm_subset %>%
     filter(EVTYPE == "FLOOD") %>%
     summarise(
         TotalFatalities = sum(FATALITIES, na.rm = TRUE),
         TotalInjuries = sum(INJURIES, na.rm = TRUE),
         TotalHealthImpact = TotalFatalities + TotalInjuries
     )
 
 # View the calculated totals
 flood_health_impact
##   TotalFatalities TotalInjuries TotalHealthImpact
## 1             470          6789              7259

This check confirms the figures behind the plot: tornadoes account for 96,979 combined fatalities and injuries, compared with 7,259 for floods, roughly 13 times as many.
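
The two verification blocks above differ only in the event name, so the same check could be wrapped in a small helper. The sketch below is one way to do that in base R; `health_totals` is a hypothetical name introduced here, and the `toy` data frame contains invented values rather than NOAA records.

```r
# Hypothetical helper: total fatalities and injuries for one event type
health_totals <- function(data, event) {
  rows <- data[data$EVTYPE == event, ]
  fatalities <- sum(rows$FATALITIES, na.rm = TRUE)
  injuries <- sum(rows$INJURIES, na.rm = TRUE)
  data.frame(
    EVTYPE = event,
    TotalFatalities = fatalities,
    TotalInjuries = injuries,
    TotalHealthImpact = fatalities + injuries,
    stringsAsFactors = FALSE
  )
}

# Invented records, including a missing injury count to exercise na.rm
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(1, 0, 2),
  INJURIES = c(14, 2, NA),
  stringsAsFactors = FALSE
)

# Build one comparison table for several event types at once
comparison <- do.call(rbind, lapply(c("TORNADO", "FLOOD"), health_totals, data = toy))
print(comparison)
```

With the real data, the same call on `storm_subset` would put the tornado and flood totals side by side in a single table.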

Events Most Harmful Economically

Plotting total damage (in billions of USD) by event type shows that floods cause the most economic damage, with hurricanes second at about half that value.