Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. In this report we describe the impact of weather events on population health and the economy in the United States between 1950 and 2011. Our overall hypothesis is that the most violent weather events have the greatest impact on the population and the economy. To investigate this hypothesis, we obtained weather event data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. According to NOAA, the data come from the National Weather Service, which in turn “receives their information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.” From these data, we found that, with regard to public health impact, tornadoes were by far the leading cause of injuries and fatalities. With regard to economic impact, we were surprised to find that a less violent event, flooding, was the leader, with hurricanes a distant second.
First, we ensure that all the packages and libraries we will be using for the analysis are installed and loaded.
# Make sure we have all the packages we need installed
# and their libraries loaded
# Create a vector of package names
pkgs <- c("tidyverse", "lubridate", "viridis", "scales")
# Check if each package is installed
for (pkg in pkgs) {
  if (!require(pkg, character.only = TRUE)) {
    # If not installed, install the package
    install.packages(pkg)
    # Load the package library
    library(pkg, character.only = TRUE)
  }
}
Next, we download the raw data file from the Coursera course website, if it is not already present.
# Set file name
filename <- "StormData.csv.bz2"
# Check if file exists in current working directory
if (!file.exists(filename)) {
  # If file does not exist, download data file
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = filename)
}
# Read the compressed data file (read.csv decompresses the bz2 archive via bzfile)
storm_data <- read.csv(bzfile(filename))
Now we can do some high-level information gathering on our initial data.
# get some first-pass information on the data
dim(storm_data)
## [1] 902297 37
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
We will not be using all the information in our initial dataset, so we can reduce the number of columns to begin the paring-down process. We keep only the columns relevant to our analysis: EVTYPE (the recorded event type), FATALITIES and INJURIES (population health impact), and PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP (property and crop damage for economic impact).
# only keep the columns relevant to our analysis
storm_data_col_filter <- storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
We next verify the amount of missing data in each column, which turns out to be zero (no NA values) for every column in our new dataset.
# look for incomplete data
# Calculate percentage of missing values in each column
missing_vals <- colSums(is.na(storm_data_col_filter))
percent_missing <- missing_vals / nrow(storm_data_col_filter) * 100
# Print percentage of missing values for each column
print(percent_missing)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 0 0 0 0 0 0 0
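Note that is.na() does not detect empty strings in character columns; blanks such as those we will encounter in PROPDMGEXP and CROPDMGEXP are stored as "" rather than NA. As a supplementary check (not part of the original pipeline), blanks in the character columns can be counted separately:
# supplementary check: count empty strings ("") in the character columns,
# since they are not NA and are invisible to the is.na() check above
char_cols <- sapply(storm_data_col_filter, is.character)
colSums(storm_data_col_filter[, char_cols] == "")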
Having finished filtering the data, we now perform a series of transformations that make the data easier to work with. The most involved part is handling the PROPDMGEXP and CROPDMGEXP columns; the remaining columns are already set up for our analysis.
Let’s review the current setup of our columns in the filtered data.
str(storm_data_col_filter)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
If we look at the PROPDMGEXP column’s table() output, we can see a wide range of values.
# look at the frequency of unique values in PROPDMGEXP
table(storm_data_col_filter$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
Some values are self-explanatory, such as “K” for thousands, “M” for millions, and “B” for billions. However, several values are not as easily interpreted. The overwhelming majority of these are blank, which leaves us either eliminating those observations or determining what a blank modifier means. We opted for the latter. Based on our best guess, we determined that treating blank values as thousands is the most reasonable course of action. We reached this conclusion by inspecting a sample of observations with blank values and comparing the recorded property damage against the narrative description of the damage; an assumption of thousands of dollars appears appropriate.
# retrieve the observations that have blank values for PROPDMGEXP and the REMARKS that go with them
blankEstimation <- storm_data[(storm_data$PROPDMG != 0) & (storm_data$PROPDMGEXP %in% c("")), c("PROPDMG", "PROPDMGEXP", "REMARKS")]
# show the first 5 observations
head(blankEstimation, 5)
## PROPDMG PROPDMGEXP
## 192467 0.41
## 196196 3.00
## 196687 2.00
## 196961 4.00
## 199598 4.00
## REMARKS
## 192467 Wind gusts to 96 mph Mt Tamalpias and 89 mph at the Golden Gate Bridge Petaluma river at Petuluma went 1.6 feet over flood stage.
## 196196 A small tornado touched down at the North Florida Prison Reception Center damaging the building before dissipating.
## 196687 The sheriff's office reported numerous power lines and trees were down.
## 196961 Thunderstorm winds produced minor damage to five homes and ripped off a tin roof from a barn in Hartwell. ~ Hart County 1 E Hartwell,31,2032EST,0.5,30,0,0,3,?,Tornado (F0) \nA short lived tornado touched down and toppled two large trees and blew over a farm shed. Two large columns that supported the front porch of a house were blown away.
## 199598 Several large limbs were downed on U.S. Route 24 west of Logansport. Four vehicles were damaged, and the highway was blocked.
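If we wanted this spot check to be reproducible and less biased toward the earliest records, the same inspection could be run on a random sample of the blank-value observations. A minimal sketch, using an arbitrary seed chosen purely for illustration:
# draw a reproducible random sample of observations with blank PROPDMGEXP
set.seed(123)  # arbitrary seed, for illustration only
blankEstimation[sample(nrow(blankEstimation), 5), c("PROPDMG", "REMARKS")]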
We have determined that only observations with blank, “K”, “k”, “M”, “m”, or “B” values will be used in our analysis. As a sanity check, we verify that the removed observations account for only a small percentage of the total.
# Calculate the percentage of removed observations
propdmgexp_counts <- table(storm_data_col_filter$PROPDMGEXP)
removed_counts <- sum(propdmgexp_counts[c("-", "?", "+", "1", "2", "3", "4", "5", "6", "7", "8", "h", "H","0")])
total_obs <- nrow(storm_data_col_filter)
pct_removed <- removed_counts / total_obs * 100
pct_kept <- 100 - pct_removed
# Print the results
cat("Removed", removed_counts, "observations out of", total_obs, "(", round(pct_removed, 4), "%)", "\n")
## Removed 321 observations out of 902297 ( 0.0356 %)
cat("Kept", total_obs - removed_counts, "observations (", round(pct_kept, 4), "%)", "\n")
## Kept 901976 observations ( 99.9644 %)
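Rather than hard-coding the list of invalid values, the same set could be derived from the values we intend to keep, which guards against typos. A sketch using setdiff():
# derive the invalid PROPDMGEXP values from the set we intend to keep
valid_exp <- c("", "k", "K", "m", "M", "B")
invalid_auto <- setdiff(names(propdmgexp_counts), valid_exp)
sum(propdmgexp_counts[invalid_auto])  # should equal removed_counts above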
Satisfied that we will only be removing a small amount of the data, we can now remove all observations with “-”, “?”, “+”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “h”, “H”, or “0” from our dataset.
# look at the number of observations pre-removal
dim(storm_data_col_filter)
## [1] 902297 7
# remove observations from PROPDMGEXP that don't match our known values
invalid_propdmgexp <- c("-", "?", "+", "1", "2", "3", "4", "5", "6", "7", "8", "h", "H", "0")
storm_data_col_filter <- storm_data_col_filter[!storm_data_col_filter$PROPDMGEXP %in% invalid_propdmgexp, ]
# look at the number of observations post-removal
dim(storm_data_col_filter)
## [1] 901976 7
We have one more column that needs to be cleaned up in this way: CROPDMGEXP. If we look at its table() output, we can see the range of values is not as wide as with PROPDMGEXP.
# look at the frequency of unique values in CROPDMGEXP
table(storm_data_col_filter$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618100      7     19      1      9     21 281826      1   1992
Just as before, we have determined that only observations with blank, “K”, “k”, “M”, “m”, or “B” values will be used in our analysis. Again, we verify that the removed observations account for only a small percentage of the total.
# Calculate the percentage of removed observations
cropdmgexp_counts <- table(storm_data_col_filter$CROPDMGEXP)
removed_counts <- sum(cropdmgexp_counts[c("?", "2", "0")])
total_obs <- nrow(storm_data_col_filter)
pct_removed <- removed_counts / total_obs * 100
pct_kept <- 100 - pct_removed
# Print the results
cat("Removed", removed_counts, "observations out of", total_obs, "(", round(pct_removed, 4), "%)", "\n")
## Removed 27 observations out of 901976 ( 0.003 %)
cat("Kept", total_obs - removed_counts, "observations (", round(pct_kept, 4), "%)", "\n")
## Kept 901949 observations ( 99.997 %)
Satisfied that we will again be removing only a small amount of data, we can now remove all observations with “?”, “2”, or “0” from our dataset.
# look at the number of observations pre-removal
dim(storm_data_col_filter)
## [1] 901976 7
# remove observations from CROPDMGEXP that don't match our known values
invalid_cropdmgexp <- c("?", "2", "0")
storm_data_col_filter <- storm_data_col_filter[!storm_data_col_filter$CROPDMGEXP %in% invalid_cropdmgexp, ]
# look at the number of observations post-removal
dim(storm_data_col_filter)
## [1] 901949 7
The last step is to standardize the values in these two columns to make the subsequent analysis easier. Only “K”, “M”, and “B” values will remain.
# Convert blank or "k" values to "K" in PROPDMGEXP
storm_data_col_filter$PROPDMGEXP[storm_data_col_filter$PROPDMGEXP %in% c("", "k")] <- "K"
# Convert "m" values to "M" in PROPDMGEXP
storm_data_col_filter$PROPDMGEXP[storm_data_col_filter$PROPDMGEXP == "m"] <- "M"
# Convert blank or "k" values to "K" in CROPDMGEXP
storm_data_col_filter$CROPDMGEXP[storm_data_col_filter$CROPDMGEXP %in% c("", "k")] <- "K"
# Convert "m" values to "M" in CROPDMGEXP
storm_data_col_filter$CROPDMGEXP[storm_data_col_filter$CROPDMGEXP == "m"] <- "M"
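An equivalent, more compact version of this standardization uses toupper() plus a single blank replacement. This is just a sketch of an alternative; it assumes the filtering steps above have already removed every value other than blank, “k”/“K”, “m”/“M”, and “B”:
# compact alternative: upper-case the exponents, then map blanks to "K"
for (col in c("PROPDMGEXP", "CROPDMGEXP")) {
  vals <- toupper(storm_data_col_filter[[col]])  # "k" -> "K", "m" -> "M"
  vals[vals == ""] <- "K"                        # blanks assumed to mean thousands
  storm_data_col_filter[[col]] <- vals
}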
Naturally, we need to verify that all the data has been modified, so we look at the table() output for both columns.
# show the frequency of unique values for our columns
table(storm_data_col_filter$PROPDMGEXP)
##
## B K M
## 40 890573 11336
table(storm_data_col_filter$CROPDMGEXP)
##
## B K M
## 9 899947 1993
Now we can add new columns, called PROPDMGCALC and CROPDMGCALC, that hold the fully multiplied-out damage values in dollars.
# Create new columns PROPDMGCALC and CROPDMGCALC
storm_data_col_filter$PROPDMGCALC <- with(storm_data_col_filter,
  ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
    ifelse(PROPDMGEXP == "M", PROPDMG * 1000000,
      ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000, PROPDMG))))
storm_data_col_filter$CROPDMGCALC <- with(storm_data_col_filter,
  ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
    ifelse(CROPDMGEXP == "M", CROPDMG * 1000000,
      ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000, CROPDMG))))
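As a design note, the nested ifelse() calls can be replaced by a named multiplier vector, which is easier to extend if more exponent codes are ever admitted. This sketch assumes the exponent columns now contain only "K", "M", and "B":
# alternative: look up the dollar multiplier by exponent letter
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
storm_data_col_filter$PROPDMGCALC <-
  storm_data_col_filter$PROPDMG * unname(multipliers[storm_data_col_filter$PROPDMGEXP])
storm_data_col_filter$CROPDMGCALC <-
  storm_data_col_filter$CROPDMG * unname(multipliers[storm_data_col_filter$CROPDMGEXP])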
Unfortunately, the NOAA data use a great many terms to refer to the same thing. For example, “TSTM” and “Thunderstorm” mean the same thing in the EVTYPE column. To make the data easier to work with, we standardize the categories and add a new weather_event column to our dataset.
# create a new column called "weather_event"
storm_data_col_filter$weather_event <- NA
# consolidate similar weather events into broader categories
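# NOTE: assignment order matters here -- when an EVTYPE string matches more
# than one pattern below, the category assigned last wins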
storm_data_col_filter$weather_event[grepl("TSTM|THUNDERSTORM", storm_data_col_filter$EVTYPE)] <- "Thunderstorm"
storm_data_col_filter$weather_event[grepl("RAIN|PRECIPITATION", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Rain"
storm_data_col_filter$weather_event[grepl("SNOW|SLEET", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Snow"
storm_data_col_filter$weather_event[grepl("WIND|GUST", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Wind"
storm_data_col_filter$weather_event[grepl("HAIL", storm_data_col_filter$EVTYPE)] <- "Hail"
storm_data_col_filter$weather_event[grepl("FLOOD|FLASH FLOOD|FLOODING", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Flood"
storm_data_col_filter$weather_event[grepl("HEAT|WARM", storm_data_col_filter$EVTYPE)] <- "Heat"
storm_data_col_filter$weather_event[grepl("COLD|FREEZE|FROST", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Cold"
storm_data_col_filter$weather_event[grepl("FOG|MIST|HAZE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Fog"
storm_data_col_filter$weather_event[grepl("ICE|GLAZE|FREEZING RAIN", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Ice"
storm_data_col_filter$weather_event[grepl("TORNADO|WATERSPOUT|FUNNEL CLOUD", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Tornado"
storm_data_col_filter$weather_event[grepl("BLIZZARD", storm_data_col_filter$EVTYPE)] <- "Blizzard"
storm_data_col_filter$weather_event[grepl("DROUGHT|DRY", storm_data_col_filter$EVTYPE)] <- "Drought"
storm_data_col_filter$weather_event[grepl("TROPICAL STORM", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Tropical Storm"
storm_data_col_filter$weather_event[grepl("HURRICANE|CYCLONE|TYPHOON", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Hurricane"
storm_data_col_filter$weather_event[grepl("EXTREME COLD|WIND CHILL", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Extreme Cold"
storm_data_col_filter$weather_event[grepl("DUST|SAND|SANDSTORM", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Dust/Sandstorm"
storm_data_col_filter$weather_event[grepl("LANDSLIDE|MUDSLIDE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Landslide"
storm_data_col_filter$weather_event[grepl("AVALANCHE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Avalanche"
storm_data_col_filter$weather_event[grepl("WILDFIRE|FOREST FIRE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Wildfire"
storm_data_col_filter$weather_event[grepl("VOLCANIC ASH|VOLCANIC ERUPTION", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Volcanic Activity"
storm_data_col_filter$weather_event[grepl("COASTAL FLOOD|STORM SURGE|TIDAL FLOOD", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Coastal Flood"
storm_data_col_filter$weather_event[grepl("RIP CURRENT|HIGH SURF", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Rip Current/High Surf"
storm_data_col_filter$weather_event[grepl("SEICHE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Seiche"
storm_data_col_filter$weather_event[grepl("TIDE|HIGH TIDE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "High Tide"
storm_data_col_filter$weather_event[grepl("WINTER STORM|WINTER WEATHER|WINTRY MIX|WINTERY MIX|WINTER STORMS", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Winter Storm"
storm_data_col_filter$weather_event[grepl("LIGHTNING|LIGHTING|LIGHTNING INJURY|LIGHTNING DAMAGE|LIGHTNING FIRE", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Lightning"
storm_data_col_filter$weather_event[grepl("FUNNEL|FUNNELS", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Funnel Cloud"
storm_data_col_filter$weather_event[grepl("HEAVY SURF|HIGH SEAS|HIGH WAVES|HEAVY SWELLS|ROUGH SEAS|ROUGH SURF", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "High Waves/Heavy Surf"
storm_data_col_filter$weather_event[grepl("RECORD HIGH|RECORD LOW|RECORD TEMPERATURE|RECORD WARMTH|RECORD COOL", storm_data_col_filter$EVTYPE, ignore.case = TRUE)] <- "Record Temperature"
# any remaining values in the "EVTYPE" column are put into the "Other" category
storm_data_col_filter$weather_event[is.na(storm_data_col_filter$weather_event)] <- "Other"
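As a quick sanity check on the consolidation, the resulting categories can be tabulated to see how the observations are distributed and how large the “Other” bucket is:
# sanity check: observations per consolidated category, largest first
sort(table(storm_data_col_filter$weather_event), decreasing = TRUE)
# share of observations that fell through to "Other", as a percentage
mean(storm_data_col_filter$weather_event == "Other") * 100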
At last we can get to the two fundamental questions that we need answered by our data. We will look at each one in turn and examine how we arrived at our conclusions.
Question Two: Across the United States, which types of events have the greatest economic consequences?
We calculate the economic impact by adding the property damage and crop damage together, then compare that total across our standardized event types. Finally, we show the event types most harmful with respect to economic damage. To our surprise, flooding was the clear leader, with hurricanes a distant second. Note that our deadliest event, tornadoes, came in third in terms of economic damage.
# Create a new column to calculate the total economic impact
storm_data_col_filter$ECONOMIC_IMPACT <- storm_data_col_filter$PROPDMGCALC + storm_data_col_filter$CROPDMGCALC
# Calculate the total economic consequences for each event type
economic_impact <- aggregate(ECONOMIC_IMPACT ~ weather_event, data = storm_data_col_filter, FUN = sum)
# Sort the data by the calculated economic impact
economic_impact <- economic_impact[order(economic_impact$ECONOMIC_IMPACT, decreasing = TRUE), ]
# Display the event types and the economic consequences
print(economic_impact)
## weather_event ECONOMIC_IMPACT
## 8 Flood 179480080360
## 15 Hurricane 90872527810
## 26 Tornado 58969948290
## 3 Coastal Flood 43741176060
## 11 Hail 19106782570
## 30 Wind 17761133620
## 5 Drought 15025675380
## 16 Ice 9018499660
## 27 Tropical Storm 8409286550
## 29 Wildfire 8275745130
## 31 Winter Storm 6824751750
## 13 High Tide 4651783000
## 20 Rain 4017831490
## 4 Cold 2287607500
## 7 Extreme Cold 1392048400
## 25 Thunderstorm 1226668550
## 24 Snow 1138352790
## 19 Other 991311000
## 18 Lightning 951112870
## 12 Heat 924549250
## 2 Blizzard 771473950
## 17 Landslide 346094000
## 22 Rip Current/High Surf 105318000
## 9 Fog 25011500
## 14 High Waves/Heavy Surf 11400500
## 6 Dust/Sandstorm 9938130
## 1 Avalanche 8721800
## 23 Seiche 980000
## 28 Volcanic Activity 500000
## 10 Funnel Cloud 199600
## 21 Record Temperature 0
We can also represent economic impact by event type visually, using the following graph:
# Ensure proper formatting of weather_event factor
economic_impact$weather_event <- factor(economic_impact$weather_event)
# Filter the data to keep only the top 10 weather events based on economic impact
top_ten_events_economic <- economic_impact %>%
  group_by(weather_event) %>%
  summarise(Total_ECONOMIC_IMPACT = sum(ECONOMIC_IMPACT)) %>%
  arrange(desc(Total_ECONOMIC_IMPACT)) %>%
  slice_head(n = 10) %>%
  pull(weather_event)
economic_impact_top_ten <- economic_impact %>%
  filter(weather_event %in% top_ten_events_economic)
# Create the ggplot visualization for the top 10 weather events
economic_impact_plot_top_ten <- ggplot(economic_impact_top_ten,
    aes(x = reorder(weather_event, -ECONOMIC_IMPACT),
        y = ECONOMIC_IMPACT / 1e5, fill = weather_event)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Top 10 Weather Events and Their Economic Impact",
       x = "Weather Event",
       y = "Economic Impact (in $100K), Square Root Scale") +
  scale_fill_viridis(discrete = TRUE) +
  scale_y_continuous(trans = "sqrt", labels = comma) +
  coord_flip()
print(economic_impact_plot_top_ten)
Now that we have completed our analysis, there are some interesting paths we could continue to follow.
A deeper dive into the weather events that rank highly in both population health impact and economic impact would be an interesting avenue to pursue. We have included some initial data to get things started.
# Top ten weather events by POP_HEALTH
top_pop_health <- event_harm %>%
  top_n(10, POP_HEALTH) %>%
  arrange(desc(POP_HEALTH))
# Top ten weather events by ECONOMIC_IMPACT
top_econ_impact <- economic_impact %>%
  top_n(10, ECONOMIC_IMPACT) %>%
  arrange(desc(ECONOMIC_IMPACT))
# Top overlapping events
top_overlap <- inner_join(top_pop_health, top_econ_impact, by = "weather_event")
# Print the top overlapping events
print(top_overlap)
## weather_event POP_HEALTH ECONOMIC_IMPACT
## 1 Tornado 97072 58969948290
## 2 Wind 12512 17761133620
## 3 Flood 10113 179480080360
## 4 Ice 2522 9018499660
## 5 Wildfire 1543 8275745130
## 6 Hail 1484 19106782570
## 7 Hurricane 1468 90872527810
While we did our best to capture as many weather events as possible in our consolidation, quite a few were still left over. An analysis of the “Other” events might produce some interesting results. Included here is the list of values currently in the “Other” category.
# Filter the dataframe to keep only rows where weather_event is "Other"
other_events <- storm_data_col_filter[storm_data_col_filter$weather_event == "Other", ]
# Find unique values of EVTYPE for the "Other" category
unique_other_evtypes <- unique(other_events$EVTYPE)
# Print the unique EVTYPE values in the "Other" category
print(unique_other_evtypes)
## [1] "WALL CLOUD" "LOW TEMPERATURE RECORD"
## [3] "MARINE MISHAP" "HIGH TEMPERATURE RECORD"
## [5] "SEVERE TURBULENCE" "APACHE COUNTY"
## [7] "HEAVY PRECIPATATION" "URBAN/SMALL"
## [9] "WILD FIRES" "HIGH"
## [11] "WATER SPOUT" "MICROBURST"
## [13] "URBAN AND SMALL" "DOWNBURST"
## [15] "WET MICROBURST" "FREEZING DRIZZLE"
## [17] "UNSEASONABLY WET" "WAYTERSPOUT"
## [19] "URBAN AND SMALL STREAM" "MUD SLIDE"
## [21] "LIGNTNING" "COOL AND WET"
## [23] "SMALL STREAM AND" "MUD SLIDES"
## [25] "EXCESSIVE WETNESS" "ROTATING WALL CLOUD"
## [27] "LARGE WALL CLOUD" "GRASS FIRES"
## [29] "COASTAL SURGE" "URBAN/SMALL STREAM"
## [31] "TORNDAO" "AVALANCE"
## [33] "OTHER" "ICY ROADS"
## [35] "HEAVY MIX" "DAM FAILURE"
## [37] "SOUTHEAST" "FREEZING DRIZZLE AND FREEZING"
## [39] "HIGH WATER" "WET WEATHER"
## [41] "BEACH EROSIN" "LOW TEMPERATURE"
## [43] "HYPOTHERMIA" "MUD/ROCK SLIDE"
## [45] "RAPIDLY RISING WATER" "FLASH FLOOODING"
## [47] "EXCESSIVE" "HEAVY SEAS"
## [49] "?" "HOT PATTERN"
## [51] "BRUSH FIRES" "WINTER MIX"
## [53] "MILD PATTERN" "HEAVY SHOWERS"
## [55] "BRUSH FIRE" "HEAVY SHOWER"
## [57] "URBAN SMALL" "SMALL STREAM"
## [59] "URBAN/SML STREAM FLD" "Other"
## [61] "Record dry month" "Temperature record"
## [63] "Marine Accident" "COASTAL STORM"
## [65] "Wet Month" "Wet Year"
## [67] "Beach Erosion" "Hot and Dry"
## [69] "Icy Roads" "Landslump"
## [71] "Coastal Storm" "small hail"
## [73] "MIXED PRECIP" "Freezing Spray"
## [75] "Summary Jan 17" "Summary of March 14"
## [77] "Summary of March 23" "Summary of March 24"
## [79] "Summary of April 3rd" "Summary of April 12"
## [81] "Summary of April 13" "Summary of April 21"
## [83] "Summary of April 27" "Summary of May 9-10"
## [85] "Summary of May 10" "Summary of May 13"
## [87] "Summary of May 14" "Summary of May 22 am"
## [89] "Summary of May 22 pm" "Heatburst"
## [91] "Summary of May 26 am" "Summary of May 26 pm"
## [93] "Metro Storm, May 26" "Summary of May 31 am"
## [95] "Summary of May 31 pm" "Summary of June 3"
## [97] "Summary of June 4" "Summary June 5-6"
## [99] "Summary June 6" "Summary of June 11"
## [101] "Summary of June 12" "Summary of June 13"
## [103] "Summary of June 15" "Summary of June 16"
## [105] "Summary June 18-19" "Summary of June 23"
## [107] "Summary of June 24" "Summary of June 30"
## [109] "Summary of July 2" "Summary of July 3"
## [111] "Summary of July 11" "Summary of July 22"
## [113] "Summary July 23-24" "Summary of July 26"
## [115] "Summary of July 29" "Summary September 4"
## [117] "Summary September 20" "Summary September 23"
## [119] "Summary Sept. 25-26" "Summary: Oct. 20-21"
## [121] "Summary: October 31" "Summary: Nov. 6-7"
## [123] "Summary: Nov. 16" "Microburst"
## [125] "wet micoburst" "Hail(0.75)"
## [127] "No Severe Weather" "Summary of May 22"
## [129] "Summary of June 6" "Summary of June 10"
## [131] "Summary of June 18" "Summary September 3"
## [133] "Summary: Sept. 18" "Small Hail"
## [135] "Freezing Drizzle" "Record Heat"
## [137] "Sml Stream Fld" "Heat Wave"
## [139] "NONE" "DAM BREAK"
## [141] "UNSEASONABLY COOL" "URBAN/SML STREAM FLDG"
## [143] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [145] "COASTALSTORM" "Blizzard Summary"
## [147] "SUMMARY OF MARCH 24-25" "SUMMARY OF MARCH 27"
## [149] "SUMMARY OF MARCH 29" "Freezing drizzle"
## [151] "URBAN/SMALL STRM FLDG" "Mild and Dry Pattern"
## [153] "HIGH SWELLS" "HIGH SWELLS"
## [155] "BEACH EROSION" "HOT SPELL"
## [157] "UNSEASONABLY HOT" "COASTAL EROSION"
## [159] "HYPERTHERMIA/EXPOSURE" "ROCK SLIDE"
## [161] "HOT WEATHER" "TROPICAL DEPRESSION"
## [163] "COOL SPELL" "RED FLAG FIRE WX"
## [165] "VOG" "MONTHLY TEMPERATURE"
## [167] "REMNANTS OF FLOYD" "LANDSPOUT"
## [169] "DRIEST MONTH" "UNSEASONABLY COOL & WET"
## [171] "LANDSLUMP" "UNSEASONAL LOW TEMP"
## [173] "RED FLAG CRITERIA" "WND"
## [175] "SMOKE" "EXTREMELY WET"
## [177] "ROGUE WAVE" "NORTHERN LIGHTS"
## [179] "HAZARDOUS SURF" "ABNORMALLY WET"
## [181] "DROWNING" "TSUNAMI"
## [183] "DENSE SMOKE"