Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. In this report we describe the impact of weather events on population health and the economy in the United States between 1950 and 2011. Our overall hypothesis is that the most violent weather events have the greatest impact on the population and the economy. To investigate this hypothesis, we obtained weather event data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. According to NOAA, the data come from the National Weather Service, which in turn “receives their information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.” From these data, we found that, with regard to public health impact, tornadoes were by far the leading cause of injuries and fatalities. With regard to economic impact, we were surprised to find that a less violent event, flooding, was the leader, with hurricanes a distant second.
First, we ensure that all the packages and libraries we will be using for the analysis are installed and loaded.
# Make sure we have all the packages we need installed
# and their libraries loaded
# Create a vector of package names
pkgs <- c("tidyverse", "lubridate", "viridis", "scales")
# Check if each package is installed
for (pkg in pkgs) {
  if (!require(pkg, character.only = TRUE)) {
    # If not installed, install the package
    install.packages(pkg)
    # Load the package library
    library(pkg, character.only = TRUE)
  }
}
Next, we download the raw data file from the Coursera course website, if it is not already present.
# Set file name
filename <- "StormData.csv.bz2"
# Check if file exists in current working directory
if (!file.exists(filename)) {
  # If file does not exist, download data file
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = filename)
}
# Read the compressed data file (read.csv decompresses the bz2 archive via bzfile)
storm_data <- read.csv(bzfile(filename))
Now we can do some high-level information gathering on our initial data.
# get some first-pass information on the data
dim(storm_data)
## [1] 902297 37
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
We will not be using all the information in our initial dataset, so we can reduce the number of columns to begin the paring-down process. We keep only the columns relevant to our analysis: EVTYPE (the recorded event type), FATALITIES and INJURIES (population health impact), and PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP (property and crop damage for economic impact).
# only keep the columns relevant to our analysis
storm_data_col_filter <- storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
We next verify the amount of missing data in each column, which turns out to be zero (no NA values) for every column in our new dataset.
# look for incomplete data
# Calculate percentage of missing values in each column
missing_vals <- colSums(is.na(storm_data_col_filter))
percent_missing <- missing_vals / nrow(storm_data_col_filter) * 100
# Print percentage of missing values for each column
print(percent_missing)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 0 0 0 0 0 0 0
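Note that is.na() does not detect empty strings in character columns; blanks such as those we will encounter in PROPDMGEXP and CROPDMGEXP are stored as "" rather than NA. As a supplementary check (not part of the original pipeline), blanks in the character columns can be counted separately:
# supplementary check: count empty strings ("") in the character columns,
# since they are not NA and are invisible to the is.na() check above
char_cols <- sapply(storm_data_col_filter, is.character)
colSums(storm_data_col_filter[, char_cols] == "")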
Having finished filtering the data, we now perform a series of transformations that make the data easier to work with. The most involved part is handling the PROPDMGEXP and CROPDMGEXP columns; the remaining columns are already set up for our analysis.
Let’s review the current setup of our columns in the filtered data.
str(storm_data_col_filter)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
If we look at the PROPDMGEXP column’s table() output, we can see a wide range of values.
# look at the frequency of unique values in PROPDMGEXP
table(storm_data_col_filter$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
Some values are self-explanatory, such as “K” for thousands, “M” for millions, and “B” for billions. However, several values are not as easily interpreted. The overwhelming majority of these are blank, which leaves us either eliminating those observations or determining what a blank modifier means. We opted for the latter. Based on our best guess, we determined that treating blank values as thousands is the most reasonable course of action. We reached this conclusion by inspecting a sample of observations with blank values and comparing the recorded property damage against the narrative description of the damage; an assumption of thousands of dollars appears appropriate.
# retrieve the observations that have blank values for PROPDMGEXP and the REMARKS that go with them
blankEstimation <- storm_data[(storm_data$PROPDMG != 0) & (storm_data$PROPDMGEXP %in% c("")), c("PROPDMG", "PROPDMGEXP", "REMARKS")]
# show the first 5 observations
head(blankEstimation, 5)
## PROPDMG PROPDMGEXP
## 192467 0.41
## 196196 3.00
## 196687 2.00
## 196961 4.00
## 199598 4.00
## REMARKS
## 192467 Wind gusts to 96 mph Mt Tamalpias and 89 mph at the Golden Gate Bridge Petaluma river at Petuluma went 1.6 feet over flood stage.
## 196196 A small tornado touched down at the North Florida Prison Reception Center damaging the building before dissipating.
## 196687 The sheriff's office reported numerous power lines and trees were down.
## 196961 Thunderstorm winds produced minor damage to five homes and ripped off a tin roof from a barn in Hartwell. ~ Hart County 1 E Hartwell,31,2032EST,0.5,30,0,0,3,?,Tornado (F0) \nA short lived tornado touched down and toppled two large trees and blew over a farm shed. Two large columns that supported the front porch of a house were blown away.
## 199598 Several large limbs were downed on U.S. Route 24 west of Logansport. Four vehicles were damaged, and the highway was blocked.
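If we wanted this spot check to be reproducible and less biased toward the earliest records, the same inspection could be run on a random sample of the blank-value observations. A minimal sketch, using an arbitrary seed chosen purely for illustration:
# draw a reproducible random sample of observations with blank PROPDMGEXP
set.seed(123)  # arbitrary seed, for illustration only
blankEstimation[sample(nrow(blankEstimation), 5), c("PROPDMG", "REMARKS")]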
We have determined that only observations with blank, “K”, “k”, “M”, “m”, or “B” values will be used in our analysis. As a sanity check, we verify that the removed observations account for only a small percentage of the total.
# Calculate the percentage of removed observations
propdmgexp_counts <- table(storm_data_col_filter$PROPDMGEXP)
removed_counts <- sum(propdmgexp_counts[c("-", "?", "+", "1", "2", "3", "4", "5", "6", "7", "8", "h", "H","0")])
total_obs <- nrow(storm_data_col_filter)
pct_removed <- removed_counts / total_obs * 100
pct_kept <- 100 - pct_removed
# Print the results
cat("Removed", removed_counts, "observations out of", total_obs, "(", round(pct_removed, 4), "%)", "\n")
## Removed 321 observations out of 902297 ( 0.0356 %)
cat("Kept", total_obs - removed_counts, "observations (", round(pct_kept, 4), "%)", "\n")
## Kept 901976 observations ( 99.9644 %)
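Rather than hard-coding the list of invalid values, the same set could be derived from the values we intend to keep, which guards against typos. A sketch using setdiff():
# derive the invalid PROPDMGEXP values from the set we intend to keep
valid_exp <- c("", "k", "K", "m", "M", "B")
invalid_auto <- setdiff(names(propdmgexp_counts), valid_exp)
sum(propdmgexp_counts[invalid_auto])  # should equal removed_counts above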
Satisfied that we will only be removing a small amount of the data, we can now remove all observations with “-”, “?”, “+”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “h”, “H”, or “0” from our dataset.
# look at the number of observations pre-removal
dim(storm_data_col_filter)
## [1] 902297 7
# remove observations from PROPDMGEXP that don't match our known values
invalid_propdmgexp <- c("-", "?", "+", "1", "2", "3", "4", "5", "6", "7", "8", "h", "H", "0")
storm_data_col_filter <- storm_data_col_filter[!storm_data_col_filter$PROPDMGEXP %in% invalid_propdmgexp, ]
# look at the number of observations post-removal
dim(storm_data_col_filter)
## [1] 901976 7
We have one more column that needs to be cleaned up in this way: CROPDMGEXP. If we look at its table() output, we can see the range of values is not as wide as with PROPDMGEXP.
# look at the frequency of unique values in CROPDMGEXP
table(storm_data_col_filter$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618100      7     19      1      9     21 281826      1   1992
Just as before, we have determined that only observations with blank, “K”, “k”, “M”, “m”, or “B” values will be used in our analysis. Again, we verify that the removed observations account for only a small percentage of the total.
# Calculate the percentage of removed observations
cropdmgexp_counts <- table(storm_data_col_filter$CROPDMGEXP)
removed_counts <- sum(cropdmgexp_counts[c("?", "2", "0")])
total_obs <- nrow(storm_data_col_filter)
pct_removed <- removed_counts / total_obs * 100
pct_kept <- 100 - pct_removed
# Print the results
cat("Removed", removed_counts, "observations out of", total_obs, "(", round(pct_removed, 4), "%)", "\n")
## Removed 27 observations out of 901976 ( 0.003 %)
cat("Kept", total_obs - removed_counts, "observations (", round(pct_kept, 4), "%)", "\n")
## Kept 901949 observations ( 99.997 %)
Satisfied that we will again be removing only a small amount of data, we can now remove all observations with “?”, “2”, or “0” from our dataset.
# look at the number of observations pre-removal
dim(storm_data_col_filter)
## [1] 901976 7
# remove observations from CROPDMGEXP that don't match our known values
invalid_cropdmgexp <- c("?", "2", "0")
storm_data_col_filter <- storm_data_col_filter[!storm_data_col_filter$CROPDMGEXP %in% invalid_cropdmgexp, ]
# look at the number of observations post-removal
dim(storm_data_col_filter)
## [1] 901949 7
The last step is to standardize the values in these two columns to make the subsequent analysis easier. Only “K”, “M”, and “B” values will remain.
# Convert blank or "k" values to "K" in PROPDMGEXP
storm_data_col_filter$PROPDMGEXP[storm_data_col_filter$PROPDMGEXP %in% c("", "k")] <- "K"
# Convert "m" values to "M" in PROPDMGEXP
storm_data_col_filter$PROPDMGEXP[storm_data_col_filter$PROPDMGEXP == "m"] <- "M"
# Convert blank or "k" values to "K" in CROPDMGEXP
storm_data_col_filter$CROPDMGEXP[storm_data_col_filter$CROPDMGEXP %in% c("", "k")] <- "K"
# Convert "m" values to "M" in CROPDMGEXP
storm_data_col_filter$CROPDMGEXP[storm_data_col_filter$CROPDMGEXP == "m"] <- "M"
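An equivalent, more compact version of this standardization uses toupper() plus a single blank replacement. This is just a sketch of an alternative; it assumes the filtering steps above have already removed every value other than blank, “k”/“K”, “m”/“M”, and “B”:
# compact alternative: upper-case the exponents, then map blanks to "K"
for (col in c("PROPDMGEXP", "CROPDMGEXP")) {
  vals <- toupper(storm_data_col_filter[[col]])  # "k" -> "K", "m" -> "M"
  vals[vals == ""] <- "K"                        # blanks assumed to mean thousands
  storm_data_col_filter[[col]] <- vals
}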
Naturally, we need to verify that all the data has been modified, so we look at the table() output for both columns.
# show the frequency of unique values for our columns
table(storm_data_col_filter$PROPDMGEXP)
##
## B K M
## 40 890573 11336
table(storm_data_col_filter$CROPDMGEXP)
##
## B K M
## 9 899947 1993
Now we can add new columns, called PROPDMGCALC and CROPDMGCALC, that hold the fully multiplied-out damage values in dollars.
# Create new columns PROPDMGCALC and CROPDMGCALC
storm_data_col_filter$PROPDMGCALC <- with(storm_data_col_filter,
  ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
    ifelse(PROPDMGEXP == "M", PROPDMG * 1000000,
      ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000, PROPDMG))))
storm_data_col_filter$CROPDMGCALC <- with(storm_data_col_filter,
  ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
    ifelse(CROPDMGEXP == "M", CROPDMG * 1000000,
      ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000, CROPDMG))))
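As a design note, the nested ifelse() calls can be replaced by a named multiplier vector, which is easier to extend if more exponent codes are ever admitted. This sketch assumes the exponent columns now contain only "K", "M", and "B":
# alternative: look up the dollar multiplier by exponent letter
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
storm_data_col_filter$PROPDMGCALC <-
  storm_data_col_filter$PROPDMG * unname(multipliers[storm_data_col_filter$PROPDMGEXP])
storm_data_col_filter$CROPDMGCALC <-
  storm_data_col_filter$CROPDMG * unname(multipliers[storm_data_col_filter$CROPDMGEXP])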
Unfortunately, the NOAA data use a great many terms to refer to the same thing. For example, “TSTM” and “Thunderstorm” mean the same thing in the EVTYPE column. To make the data easier to work with, we standardize the categories and add a new weather_event column to our dataset.
# create a new column called "weather_event"
storm_data_col_filter$weather_event <- NA
# consolidate similar weather events into broader categories
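# NOTE: assignment order matters here -- when an EVTYPE string matches more
# than one pattern below, the category assigned last wins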
storm_data_col_filter$weather_event[grepl("TSTM|THUNDERSTORM", storm_data_col_filter$EVTYPE)] <- "Thunderstorm"
storm_data_col_filter$weather_event[grepl("RAIN|PRECIPITATION", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Rain"
storm_data_col_filter$weather_event[grepl("SNOW|SLEET", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Snow"
storm_data_col_filter$weather_event[grepl("WIND|GUST", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Wind"
storm_data_col_filter$weather_event[grepl("HAIL", storm_data_col_filter$EVTYPE)] <- "Hail"
storm_data_col_filter$weather_event[grepl("FLOOD|FLASH FLOOD|FLOODING", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Flood"
storm_data_col_filter$weather_event[grepl("HEAT|WARM", storm_data_col_filter$EVTYPE)] <- "Heat"
storm_data_col_filter$weather_event[grepl("COLD|FREEZE|FROST", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Cold"
storm_data_col_filter$weather_event[grepl("FOG|MIST|HAZE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Fog"
storm_data_col_filter$weather_event[grepl("ICE|GLAZE|FREEZING RAIN", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Ice"
storm_data_col_filter$weather_event[grepl("TORNADO|WATERSPOUT|FUNNEL CLOUD", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Tornado"
storm_data_col_filter$weather_event[grepl("BLIZZARD", storm_data_col_filter$EVTYPE)] <- "Blizzard"
storm_data_col_filter$weather_event[grepl("DROUGHT|DRY", storm_data_col_filter$EVTYPE)] <- "Drought"
storm_data_col_filter$weather_event[grepl("TROPICAL STORM", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Tropical Storm"
storm_data_col_filter$weather_event[grepl("HURRICANE|CYCLONE|TYPHOON", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Hurricane"
storm_data_col_filter$weather_event[grepl("EXTREME COLD|WIND CHILL", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Extreme Cold"
storm_data_col_filter$weather_event[grepl("DUST|SAND|SANDSTORM", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Dust/Sandstorm"
storm_data_col_filter$weather_event[grepl("LANDSLIDE|MUDSLIDE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Landslide"
storm_data_col_filter$weather_event[grepl("AVALANCHE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Avalanche"
storm_data_col_filter$weather_event[grepl("WILDFIRE|FOREST FIRE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Wildfire"
storm_data_col_filter$weather_event[grepl("VOLCANIC ASH|VOLCANIC ERUPTION", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Volcanic Activity"
storm_data_col_filter$weather_event[grepl("COASTAL FLOOD|STORM SURGE|TIDAL FLOOD", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Coastal Flood"
storm_data_col_filter$weather_event[grepl("RIP CURRENT|HIGH SURF", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Rip Current/High Surf"
storm_data_col_filter$weather_event[grepl("SEICHE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Seiche"
storm_data_col_filter$weather_event[grepl("TIDE|HIGH TIDE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "High Tide"
storm_data_col_filter$weather_event[grepl("WINTER STORM|WINTER WEATHER|WINTRY MIX|WINTERY MIX|WINTER STORMS", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storm_data_col_filter$weather_event[grepl("LIGHTNING|LIGHTING|LIGHTNING INJURY|LIGHTNING DAMAGE|LIGHTNING FIRE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Lightning"
storm_data_col_filter$weather_event[grepl("FUNNEL|FUNNELS", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Funnel Cloud"
storm_data_col_filter$weather_event[grepl("HEAVY SURF|HIGH SEAS|HIGH WAVES|HEAVY SWELLS|ROUGH SEAS|ROUGH SURF", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "High Waves/Heavy Surf"
storm_data_col_filter$weather_event[grepl("RECORD HIGH|RECORD LOW|RECORD TEMPERATURE|RECORD WARMTH|RECORD COOL", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Record Temperature"
# any remaining values in the "EVTYPE" column are put into the "Other" category
storm_data_col_filter$weather_event[is.na(storm_data_col_filter$weather_event)] <- "Other"
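As a quick sanity check on the consolidation, the resulting categories can be tabulated to see how the observations are distributed and how large the “Other” bucket is:
# sanity check: observations per consolidated category, largest first
sort(table(storm_data_col_filter$weather_event), decreasing = TRUE)
# share of observations that fell through to "Other", as a percentage
mean(storm_data_col_filter$weather_event == "Other") * 100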
At last we can get to the two fundamental questions that we need answered by our data. We will look at each one in turn and examine how we arrived at our conclusions.
Question Two: Across the United States, which types of events have the greatest economic consequences?
We calculate the economic impact by adding the property damage and crop damage together, then compare that total across our standardized event types. Finally, we show the event types most harmful with respect to economic damage. To our surprise, flooding was the clear leader, with hurricanes a distant second. Note that our deadliest event, tornadoes, came in third in terms of economic damage.
# Create a new column to calculate the total economic impact
storm_data_col_filter$ECONOMIC_IMPACT <- storm_data_col_filter$PROPDMGCALC + storm_data_col_filter$CROPDMGCALC
# Calculate the total economic consequences for each event type
economic_impact <- aggregate(ECONOMIC_IMPACT ~ weather_event, data = storm_data_col_filter, FUN = sum)
# Sort the data by the calculated economic impact
economic_impact <- economic_impact[order(economic_impact$ECONOMIC_IMPACT, decreasing = TRUE), ]
# Display the event types and the economic consequences
print(economic_impact)
## weather_event ECONOMIC_IMPACT
## 8 Flood 179480080360
## 15 Hurricane 90872527810
## 26 Tornado 58969948290
## 3 Coastal Flood 43741176060
## 11 Hail 19106782570
## 30 Wind 17761133620
## 5 Drought 15025675380
## 16 Ice 9018499660
## 27 Tropical Storm 8409286550
## 29 Wildfire 8275745130
## 31 Winter Storm 6824751750
## 13 High Tide 4651783000
## 20 Rain 4017831490
## 4 Cold 2287607500
## 7 Extreme Cold 1392048400
## 25 Thunderstorm 1226668550
## 24 Snow 1138352790
## 19 Other 991311000
## 18 Lightning 951112870
## 12 Heat 924549250
## 2 Blizzard 771473950
## 17 Landslide 346094000
## 22 Rip Current/High Surf 105318000
## 9 Fog 25011500
## 14 High Waves/Heavy Surf 11400500
## 6 Dust/Sandstorm 9938130
## 1 Avalanche 8721800
## 23 Seiche 980000
## 28 Volcanic Activity 500000
## 10 Funnel Cloud 199600
## 21 Record Temperature 0
We can also represent economic impact by event type visually, using the following graph:
# Ensure proper formatting of weather_event factor
economic_impact$weather_event <- factor(economic_impact$weather_event)
# Filter the data to keep only the top 10 weather events based on economic impact
top_ten_events_economic <- economic_impact %>%
  group_by(weather_event) %>%
  summarise(Total_ECONOMIC_IMPACT = sum(ECONOMIC_IMPACT)) %>%
  arrange(desc(Total_ECONOMIC_IMPACT)) %>%
  slice_head(n = 10) %>%
  pull(weather_event)
economic_impact_top_ten <- economic_impact %>%
  filter(weather_event %in% top_ten_events_economic)
# Create the ggplot visualization for the top 10 weather events
economic_impact_plot_top_ten <- ggplot(economic_impact_top_ten,
    aes(x = reorder(weather_event, -ECONOMIC_IMPACT),
        y = ECONOMIC_IMPACT / 1e5, fill = weather_event)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Top 10 Weather Events and Their Economic Impact",
       x = "Weather Event",
       y = "Economic Impact (in $100K), Square Root Scale") +
  scale_fill_viridis(discrete = TRUE) +
  scale_y_continuous(trans = "sqrt", labels = comma) +
  coord_flip()
print(economic_impact_plot_top_ten)
Now that we have completed our analysis, there are some interesting paths we could continue to follow.
A deeper dive into the weather events that rank highly in both population health impact and economic impact would be an interesting avenue to pursue. We have included some initial data to get things started.
# Top ten weather events by POP_HEALTH
top_pop_health <- event_harm %>%
  top_n(10, POP_HEALTH) %>%
  arrange(desc(POP_HEALTH))
# Top ten weather events by ECONOMIC_IMPACT
top_econ_impact <- economic_impact %>%
  top_n(10, ECONOMIC_IMPACT) %>%
  arrange(desc(ECONOMIC_IMPACT))
# Top overlapping events
top_overlap <- inner_join(top_pop_health, top_econ_impact, by = "weather_event")
# Print the top overlapping events
print(top_overlap)
## weather_event POP_HEALTH ECONOMIC_IMPACT
## 1 Tornado 97072 58969948290
## 2 Wind 12512 17761133620
## 3 Flood 10113 179480080360
## 4 Ice 2522 9018499660
## 5 Wildfire 1543 8275745130
## 6 Hail 1484 19106782570
## 7 Hurricane 1468 90872527810
While we did our best to capture as many weather events as possible in our consolidation, quite a few were still left over. An analysis of the “Other” events might produce some interesting results. Included here is the list of values currently in the “Other” category.
# Filter the dataframe to keep only rows where weather_event is "Other"
other_events <- storm_data_col_filter[storm_data_col_filter$weather_event == "Other", ]
# Find unique values of EVTYPE for the "Other" category
unique_other_evtypes <- unique(other_events$EVTYPE)
# Print the unique EVTYPE values in the "Other" category
print(unique_other_evtypes)
## [1] "WALL CLOUD" "LOW TEMPERATURE RECORD"
## [3] "MARINE MISHAP" "HIGH TEMPERATURE RECORD"
## [5] "SEVERE TURBULENCE" "APACHE COUNTY"
## [7] "HEAVY PRECIPATATION" "URBAN/SMALL"
## [9] "WILD FIRES" "HIGH"
## [11] "WATER SPOUT" "MICROBURST"
## [13] "URBAN AND SMALL" "DOWNBURST"
## [15] "WET MICROBURST" "FREEZING DRIZZLE"
## [17] "UNSEASONABLY WET" "WAYTERSPOUT"
## [19] "URBAN AND SMALL STREAM" "MUD SLIDE"
## [21] "LIGNTNING" "COOL AND WET"
## [23] "SMALL STREAM AND" "MUD SLIDES"
## [25] "EXCESSIVE WETNESS" "ROTATING WALL CLOUD"
## [27] "LARGE WALL CLOUD" "GRASS FIRES"
## [29] "COASTAL SURGE" "URBAN/SMALL STREAM"
## [31] "TORNDAO" "AVALANCE"
## [33] "OTHER" "ICY ROADS"
## [35] "HEAVY MIX" "DAM FAILURE"
## [37] "SOUTHEAST" "FREEZING DRIZZLE AND FREEZING"
## [39] "HIGH WATER" "WET WEATHER"
## [41] "BEACH EROSIN" "LOW TEMPERATURE"
## [43] "HYPOTHERMIA" "MUD/ROCK SLIDE"
## [45] "RAPIDLY RISING WATER" "FLASH FLOOODING"
## [47] "EXCESSIVE" "HEAVY SEAS"
## [49] "?" "HOT PATTERN"
## [51] "BRUSH FIRES" "WINTER MIX"
## [53] "MILD PATTERN" "HEAVY SHOWERS"
## [55] "BRUSH FIRE" "HEAVY SHOWER"
## [57] "URBAN SMALL" "SMALL STREAM"
## [59] "URBAN/SML STREAM FLD" "Other"
## [61] "Record dry month" "Temperature record"
## [63] "Marine Accident" "COASTAL STORM"
## [65] "Wet Month" "Wet Year"
## [67] "Beach Erosion" "Hot and Dry"
## [69] "Icy Roads" "Landslump"
## [71] "Coastal Storm" "small hail"
## [73] "MIXED PRECIP" "Freezing Spray"
## [75] "Summary Jan 17" "Summary of March 14"
## [77] "Summary of March 23" "Summary of March 24"
## [79] "Summary of April 3rd" "Summary of April 12"
## [81] "Summary of April 13" "Summary of April 21"
## [83] "Summary of April 27" "Summary of May 9-10"
## [85] "Summary of May 10" "Summary of May 13"
## [87] "Summary of May 14" "Summary of May 22 am"
## [89] "Summary of May 22 pm" "Heatburst"
## [91] "Summary of May 26 am" "Summary of May 26 pm"
## [93] "Metro Storm, May 26" "Summary of May 31 am"
## [95] "Summary of May 31 pm" "Summary of June 3"
## [97] "Summary of June 4" "Summary June 5-6"
## [99] "Summary June 6" "Summary of June 11"
## [101] "Summary of June 12" "Summary of June 13"
## [103] "Summary of June 15" "Summary of June 16"
## [105] "Summary June 18-19" "Summary of June 23"
## [107] "Summary of June 24" "Summary of June 30"
## [109] "Summary of July 2" "Summary of July 3"
## [111] "Summary of July 11" "Summary of July 22"
## [113] "Summary July 23-24" "Summary of July 26"
## [115] "Summary of July 29" "Summary September 4"
## [117] "Summary September 20" "Summary September 23"
## [119] "Summary Sept. 25-26" "Summary: Oct. 20-21"
## [121] "Summary: October 31" "Summary: Nov. 6-7"
## [123] "Summary: Nov. 16" "Microburst"
## [125] "wet micoburst" "Hail(0.75)"
## [127] "No Severe Weather" "Summary of May 22"
## [129] "Summary of June 6" "Summary of June 10"
## [131] "Summary of June 18" "Summary September 3"
## [133] "Summary: Sept. 18" "Small Hail"
## [135] "Freezing Drizzle" "Record Heat"
## [137] "Sml Stream Fld" "Heat Wave"
## [139] "NONE" "DAM BREAK"
## [141] "UNSEASONABLY COOL" "URBAN/SML STREAM FLDG"
## [143] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [145] "COASTALSTORM" "Blizzard Summary"
## [147] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [149] "SUMMARY OF MARCH 29" "Freezing drizzle"
## [151] "URBAN/SMALL STRM FLDG" "Mild and Dry Pattern"
## [153] "HIGH SWELLS" "HIGH SWELLS"
## [155] "BEACH EROSION" "HOT SPELL"
## [157] "UNSEASONABLY HOT" "COASTAL EROSION"
## [159] "HYPERTHERMIA/EXPOSURE" "ROCK SLIDE"
## [161] "HOT WEATHER" "TROPICAL DEPRESSION"
## [163] "COOL SPELL" "RED FLAG FIRE WX"
## [165] "VOG" "MONTHLY TEMPERATURE"
## [167] "REMNANTS OF FLOYD" "LANDSPOUT"
## [169] "DRIEST MONTH" "UNSEASONABLY COOL & WET"
## [171] "LANDSLUMP" "UNSEASONAL LOW TEMP"
## [173] "RED FLAG CRITERIA" "WND"
## [175] "SMOKE" "EXTREMELY WET"
## [177] "ROGUE WAVE" "NORTHERN LIGHTS"
## [179] "HAZARDOUS SURF" "ABNORMALLY WET"
## [181] "DROWNING" "TSUNAMI"
## [183] "DENSE SMOKE"