Synopsis

This paper will analyze data from NOAA Storm Database from 1996 to 2011, to answer the questions: Across the United States, which types of events are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences?

Data Processing

Download Data

The NOAA Storm Data can be downloaded here: National Oceanic and Atmospheric Administration (NOAA). It can be downloaded directly here (47Mb).

Load libraries

library(R.utils)
## Warning: package 'R.utils' was built under R version 4.1.2
## Warning: package 'R.oo' was built under R version 4.1.2
## Warning: package 'R.methodsS3' was built under R version 4.1.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.2

Load Data

  file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "storm_data.csv.bz2"
  download.file(file_url, file_name, method = "curl")
  
  storm_data_original <- read.csv(file_name, sep = ",", header = TRUE)

The data set is loaded into storm_data_original.

storm_data_original has the following variables.

str(storm_data_original)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Clean Data

Of the above 37 variables in the original data, we will focus on the following 8 variables:

storm_data <- storm_data_original[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

storm_data is subset to include only events recorded after 1996, since that’s when NOAA changed event type standards. Records from prior to 1996 will be excluded to eliminate errors.

storm_data$BGN_DATE <- as.Date(as.character(storm_data$BGN_DATE), "%m/%d/%Y %H:%M:%S")
storm_data <- subset(storm_data, format(storm_data$BGN_DATE, "%Y") > 1996 )

Here is a list of the 55 event type standards:

storm_events <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Cold", "Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Extreme Cold", "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", "Frost", "Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Hurricane", "Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

First, we will case match to ensure uniformity, then convert to factor, and eliminate the duplicates.

storm_data$EVTYPE <- factor(toupper(storm_data$EVTYPE))
storm_data <- subset(storm_data, (storm_data$EVTYPE %in% toupper(storm_events)))
droplevels(storm_data$EVTYPE)

Breaking out the event types which are harmful to the population

FATALATIES and INJURIES variables are most harmful to population health, we’ll set them to be greater than 0. storm_data is a subset including only FATALATIES and INJURIES variables are greater than 0. We’ll name this subset storm_data_for_harmfulness.

storm_data_for_harmfulness <- subset(storm_data, storm_data$FATALITIES > 0 | storm_data$INJURIES > 0 )

For population health affects, we’ll focus on EVTYPE, FATALATIES and INJURIES.

storm_data_for_harmfulness <- storm_data_for_harmfulness[,c("EVTYPE", "FATALITIES", "INJURIES")]

We’ll create a HARMFULNESS variable to summarize the total harmfulness for the population for FATATLITIES and INJURIES variables:

storm_data_for_harmfulness$HARMFULNESS <- storm_data_for_harmfulness$FATALITIES + storm_data_for_harmfulness$INJURIES

HARMFULLNESS is then summed together by EVTYPE and saved in storm_data_for_harmfulness_grouped_per_event_type

storm_data_for_harmfulness_grouped_per_event_type <- aggregate(x = list(HEALTH_IMPACT = storm_data_for_harmfulness$HARMFULNESS), 
                                  by = list(EVENT_TYPE = storm_data_for_harmfulness$EVTYPE), 
                                  FUN = sum,
                                  na.rm = TRUE)
storm_data_for_harmfulness_grouped_per_event_type <- storm_data_for_harmfulness_grouped_per_event_type[order(storm_data_for_harmfulness_grouped_per_event_type$HEALTH_IMPACT, decreasing = TRUE),]
head(storm_data_for_harmfulness_grouped_per_event_type)
##           EVENT_TYPE HEALTH_IMPACT
## 33           TORNADO         21447
## 10    EXCESSIVE HEAT          8093
## 14             FLOOD          7121
## 26         LIGHTNING          4424
## 13       FLASH FLOOD          2425
## 32 THUNDERSTORM WIND          1530

Breaking out the event types which are harmful to the economy

PROPDMGEXP and CROPDMGEXP use a letter to signify magnitude (“K” for thousands, “M” for millions, and “B” for billions).

  • If PROPDMGEXP or CROPDMGEXP is “K”, the multiplier is 1,000 (e.g. ’PROPDMGFULL = PROPDMG * 1000`).
  • If PROPDMGEXP or CROPDMGEXP is “M”, the multiplier is 1,000,000 (e.g. ’PROPDMGFULL = PROPDMG * 1000000`).
  • If PROPDMGEXP or CROPDMGEXP is “B”, the multiplier is 1,000,000,000 (e.g. ’PROPDMGFULL = PROPDMG * 1000000000`).

First, lets remove the missing values from PROPDMGEXP and CROPDMGEXP.

storm_data_for_economy <- subset(storm_data, storm_data$PROPDMGEXP != "" & storm_data$CROPDMGEXP != "" )

FATALATIES and INJURIES variables are most harmful to population health, we’ll set them to be greater than 0. storm_data is a subset including only FATALATIES and INJURIES variables are greater than 0. We’ll name this subset storm_data_for_harmfulness.

storm_data_for_economy <- subset(storm_data_for_economy, storm_data_for_economy$PROPDMG > 0 | storm_data_for_economy$CROPDMG > 0 )

Then simplify the dataframe to EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

storm_data_for_economy <- storm_data_for_economy[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

We will case match to ensure uniformity, then conver to factor.

storm_data_for_economy$PROPDMGEXP <- factor(toupper(storm_data_for_economy$PROPDMGEXP))
storm_data_for_economy$CROPDMGEXP <- factor(toupper(storm_data_for_economy$CROPDMGEXP))

Calculate the numbers!

storm_data_for_economy$PROPDMGFULL <- ifelse(storm_data_for_economy$PROPDMGEXP == "K",
                                             storm_data_for_economy$PROPDMG * 1000,
                                        ifelse(storm_data_for_economy$PROPDMGEXP == "M",
                                               storm_data_for_economy$PROPDMG * 1000000,
                                        ifelse(storm_data_for_economy$PROPDMGEXP == "B",
                                               storm_data_for_economy$PROPDMG * 1000000000, 0)))

storm_data_for_economy$CROPDMGFULL <- ifelse(storm_data_for_economy$CROPDMGEXP == "K",
                                             storm_data_for_economy$CROPDMG * 1000,
                                        ifelse(storm_data_for_economy$CROPDMGEXP == "M",
                                               storm_data_for_economy$CROPDMG * 1000000,
                                        ifelse(storm_data_for_economy$CROPDMGEXP == "B",
                                               storm_data_for_economy$CROPDMG * 1000000000, 0)))

We’ll create a DAMAGE variable to summarize the total harmfulness for the population for PROPDMGFULL and CROPDMGFULL variables:

storm_data_for_economy$DAMAGE <- storm_data_for_economy$PROPDMGFULL + storm_data_for_economy$CROPDMGFULL

DAMAGE is then summed together by EVTYPE and saved in storm_data_for_economy_grouped_per_event_type

storm_data_for_economy_grouped_per_event_type <- aggregate(x = list(ECONOMIC_IMPACT = storm_data_for_economy$DAMAGE), 
                                  by = list(EVENT_TYPE = storm_data_for_economy$EVTYPE), 
                                  FUN = sum,
                                  na.rm = TRUE)

storm_data_for_economy_grouped_per_event_type <- storm_data_for_economy_grouped_per_event_type[order(storm_data_for_economy_grouped_per_event_type$ECONOMIC_IMPACT, decreasing = TRUE),]
head(storm_data_for_economy_grouped_per_event_type)
##           EVENT_TYPE ECONOMIC_IMPACT
## 15             FLOOD    136877233900
## 27 HURRICANE/TYPHOON     29348167800
## 40           TORNADO     16203902150
## 26         HURRICANE     11474663000
## 20              HAIL      9172124220
## 14       FLASH FLOOD      8246133530

Results

Across the United States, which types of events are most harmful with respect to population health?

Here we see the top 10 event types which negatively affect the population:

head(storm_data_for_harmfulness_grouped_per_event_type, 10)
##           EVENT_TYPE HEALTH_IMPACT
## 33           TORNADO         21447
## 10    EXCESSIVE HEAT          8093
## 14             FLOOD          7121
## 26         LIGHTNING          4424
## 13       FLASH FLOOD          2425
## 32 THUNDERSTORM WIND          1530
## 18              HEAT          1459
## 24 HURRICANE/TYPHOON          1339
## 39      WINTER STORM          1209
## 22         HIGH WIND          1181

Below is a visualization of the top 10 weather event harmful to the population health:

health_impact_chart <- ggplot(data = head(storm_data_for_harmfulness_grouped_per_event_type, 10),
                            aes(x = reorder(EVENT_TYPE, HEALTH_IMPACT), y = HEALTH_IMPACT, fill = EVENT_TYPE)) +
                            coord_flip() +
                            geom_bar(stat = "identity") + 
                            xlab("Event Type") +
                            ylab("Total Fatalities and Injures") +
                            ggtitle("Top 10 Weather Events Most Harmful to\nPopulation Health")
print(health_impact_chart)

Across the United States, which types of events have the greatest economic consequences?

Here we see the top 10 event types which negatively affect the economy:

head(storm_data_for_economy_grouped_per_event_type, 10)
##           EVENT_TYPE ECONOMIC_IMPACT
## 15             FLOOD    136877233900
## 27 HURRICANE/TYPHOON     29348167800
## 40           TORNADO     16203902150
## 26         HURRICANE     11474663000
## 20              HAIL      9172124220
## 14       FLASH FLOOD      8246133530
## 39 THUNDERSTORM WIND      3780985440
## 46          WILDFIRE      3684468370
## 25         HIGH WIND      2873328540
## 42    TROPICAL STORM      1496252350

Below is a visualization of the top 10 weather event harmful to the economy:

health_impact_chart <- ggplot(data = head(storm_data_for_economy_grouped_per_event_type, 10),
                            aes(x = reorder(EVENT_TYPE, ECONOMIC_IMPACT), y = ECONOMIC_IMPACT, fill = EVENT_TYPE)) +
                            coord_flip() +
                            geom_bar(stat = "identity") + 
                            xlab("Event Type") +
                            ylab("Total Fatalities and Injures") +
                            ggtitle("Top 10 Weather Events Most Harmful to\nPopulation Health")
print(health_impact_chart)