Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This document presents some analyses of the NOAA storm database that can be useful for leaders and members of local communities in planning for potentially destructive weather events.
The analysis and conclusions described below are directed at identifying the weather events that had the most impact to people, property and crops between 2001 and 2010. The impact to people will be computed using counts of injury and fatality recorded in the storm database. It is then the economic consequences of weather events that will be informed by the damage (in dollars) to property and crops. These will be conducted in two separate analyses, each of which will begin with a visualization that informs the specific computations regarding the type of weather event that may need more analysis to find potential actions that could avoid these costs.
Pre-processing the data, as usual, was the most time-consuming part of the analysis. All code as well as the output are below, including the deadends encountered in trying to clean up the EVTYPE entries with many typos, abbreviation and extraneous adjectives. The use of bulk replacements and individual corrections may or may not be useful on a different time frame since they are based on what is in the EVTYPE column after subsetting on the dates.
Final outcomes: Tornadoes cause the most harm to humans but Heat events have a higher mortality rate. Economically, Flooding causes the most property damage while Drought is the most damaging to crops.
NOAA’s Storm Database was read into R to begin the pre-processing before analysis.
storm_data <- read.csv("repdata_data_StormData.csv.bz2", quote="\"") #Be patient, it is a large file
dim(storm_data)
## [1] 902297 37
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
We see that there 902297 rows and 37 columns in this file. Because this is a large dataset and there are 1745947 NA entries, it makes sense to find out more about the NAs and consider some data reduction before continuing with analysis.
# Extract the year and count records per year
storm_data$YEAR <- year(as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
table(storm_data$YEAR)
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962
## 223 269 272 492 609 1413 1703 2184 2213 1813 1945 2246 2389
## 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975
## 1968 2348 2855 2388 2688 3312 2926 3215 3471 2168 4463 5386 4975
## 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988
## 3768 3728 3657 4279 6146 4517 7132 8322 7335 7979 8726 7367 7257
## 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
## 10410 10946 12522 13534 12607 20631 27970 32270 28680 38128 31289 34471 34962
## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## 36293 39752 39363 39184 44034 43289 55663 45817 48161 62174
The online description of NOAA’s data collection timeline gives a very big hint about the many NAs by visualizing the addition of more Event Types on separate occasions. The 1950 beginning of this database only tracked tornados. We also see that improving technology changed Collection Sources and format for Supplemental Information can account for the growth in reporting since the mid-1990s.
To focus on a single, recent (this file ends in November 2011) decade, we first must convert the beginning date column (BGN_date) to the actual date format.
# Convert the BGN_DATE column to an actual Date format
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
Now we can subset the data to contain only the rows from the first day of 2001 to the last day of 2010 which we can used to answer the research questions:
decade_data <- subset(storm_data, BGN_DATE >= as.Date("2001-01-01") & BGN_DATE <= as.Date("2010-12-31"))
range(decade_data$BGN_DATE) #confirm date range
## [1] "2001-01-01" "2010-12-31"
dim(decade_data) #confirm data reduction
## [1] 426518 38
Now we have fewer than half as many rows, the same number of columns and the data file is restricted to the years 2001-2010.Under the admittedly shaky assumption that the most harmful/damaging events will occur the most frequently, lets find out which of the 48 NWS events are the most numerous in the EVTYPE column.
sort(table(decade_data$EVTYPE), decreasing = TRUE)
##
## HAIL TSTM WIND
## 136697 72855
## THUNDERSTORM WIND FLASH FLOOD
## 59724 34860
## FLOOD TORNADO
## 15417 14326
## HIGH WIND HEAVY SNOW
## 13434 9167
## WINTER STORM HEAVY RAIN
## 8086 8035
## LIGHTNING MARINE TSTM WIND
## 7978 6175
## WINTER WEATHER MARINE THUNDERSTORM WIND
## 5651 4267
## FUNNEL CLOUD STRONG WIND
## 3758 3012
## WATERSPOUT WILDFIRE
## 2140 1957
## BLIZZARD DROUGHT
## 1843 1550
## FROST/FREEZE ICE STORM
## 1221 1210
## WINTER WEATHER/MIX DENSE FOG
## 1104 1035
## URBAN/SML STREAM FLD EXTREME COLD/WIND CHILL
## 903 847
## EXCESSIVE HEAT LAKE-EFFECT SNOW
## 812 569
## WILD/FOREST FIRE HEAT
## 561 519
## LANDSLIDE HIGH SURF
## 496 488
## COLD/WIND CHILL COASTAL FLOOD
## 473 467
## TROPICAL STORM MARINE HAIL
## 446 396
## RIP CURRENT TSTM WIND/HAIL
## 385 338
## DUST STORM AVALANCHE
## 314 279
## HEAVY SURF/HIGH SURF FOG
## 228 200
## ASTRONOMICAL LOW TIDE RIP CURRENTS
## 148 139
## EXTREME COLD WIND
## 134 125
## STORM SURGE SNOW
## 120 114
## ASTRONOMICAL HIGH TIDE STORM SURGE/TIDE
## 103 98
## DUST DEVIL HURRICANE/TYPHOON
## 95 88
## MARINE HIGH WIND FREEZING RAIN
## 76 59
## TROPICAL DEPRESSION LIGHT SNOW
## 59 55
## SLEET HURRICANE
## 54 42
## MODERATE SNOWFALL FREEZING FOG
## 42 41
## COASTAL FLOODING DRY MICROBURST
## 37 37
## FREEZE MARINE STRONG WIND
## 37 36
## WINTRY MIX STRONG WINDS
## 30 25
## UNSEASONABLY WARM HEAVY SURF
## 25 23
## MIXED PRECIPITATION MONTHLY PRECIPITATION
## 23 22
## VOLCANIC ASH GUSTY WINDS
## 20 19
## RECORD WARMTH FROST
## 18 17
## UNSEASONABLY DRY EXTREME WINDCHILL
## 16 15
## SMALL HAIL SLEET STORM
## 14 12
## SMOKE LAKESHORE FLOOD
## 11 10
## RECORD HEAT SEICHE
## 10 10
## THUNDERSTORM TSUNAMI
## 10 10
## DENSE SMOKE WIND ADVISORY
## 9 9
## UNSEASONABLY COLD BLACK ICE
## 7 6
## COLD WINTER WEATHER MIX
## 6 6
## DRY GUSTY THUNDERSTORM WINDS
## 5 5
## HARD FREEZE MUDSLIDE
## 5 5
## PROLONG COLD SNOW SHOWERS
## 5 5
## UNSEASONABLY COOL WIND CHILL
## 5 5
## ACCUMULATED SNOWFALL COLD WIND CHILL TEMPERATURES
## 4 4
## EXTREME WINDCHILL TEMPERATURES ICY ROADS
## 4 4
## PROLONG WARMTH SNOW SQUALLS
## 4 4
## UNSEASONABLY HOT UNUSUALLY COLD
## 4 4
## GUSTY THUNDERSTORM WIND HIGH SEAS
## 3 3
## HIGH SURF ADVISORY LIGHT FREEZING RAIN
## 3 3
## MUD SLIDE SNOW/BLOWING SNOW
## 3 3
## SNOW/SLEET SNOWMELT FLOODING
## 3 3
## VOLCANIC ASHFALL ABNORMALLY DRY
## 3 2
## BLOWING DUST CSTL FLOODING/EROSION
## 2 2
## DRY CONDITIONS EARLY SNOWFALL
## 2 2
## FALLING SNOW/ICE FIRST SNOW
## 2 2
## FREEZING DRIZZLE GRADIENT WIND
## 2 2
## GUSTY WIND ICE/SNOW
## 2 2
## LAKE EFFECT SNOW NON TSTM WIND
## 2 2
## RECORD LOW RAINFALL RED FLAG CRITERIA
## 2 2
## ROUGH SEAS SEVERE THUNDERSTORMS
## 2 2
## SNOW DROUGHT SNOW/FREEZING RAIN
## 2 2
## VERY DRY HIGH SURF ADVISORY
## 2 1
## WATERSPOUT ABNORMALLY WET
## 1 1
## BEACH EROSION BRUSH FIRE
## 1 1
## COLD WEATHER DROWNING
## 1 1
## DRY SPELL DUST DEVEL
## 1 1
## EXCESSIVE HEAT/DROUGHT EXTREMELY WET
## 1 1
## FUNNEL CLOUDS GUSTY LAKE WIND
## 1 1
## HAZARDOUS SURF HEAVY RAIN EFFECTS
## 1 1
## HIGH SURF ADVISORIES HIGH WATER
## 1 1
## ICE ON ROAD LATE SEASON SNOW
## 1 1
## NON-TSTM WIND NORTHERN LIGHTS
## 1 1
## OTHER PATCHY ICE
## 1 1
## RAIN RECORD COLD
## 1 1
## ROGUE WAVE SNOW ADVISORY
## 1 1
## THUNDERSTORMS TORNADO DEBRIS
## 1 1
## UNUSUALLY LATE SNOW VERY WARM
## 1 1
## WHIRLWIND WIND DAMAGE
## 1 1
## WND
## 1
This is a problematic list since it has way too many event types, many of which are recorded only a single-digit number of times.A few manual comparisons of these to the NWS Instruction 10-1605 file (Table 1, p. 6) reveals a pattern of reporting with non-standard terms. Online suggestions include correcting entries that have extraneous spaces and creating a lookup list for the 48 official event names then looking for the non-matches to deal with individually or, perhaps, in groups.
#Remove hidden spaces
decade_data$EVTYPE <- trimws(decade_data$EVTYPE)
#Create the official list of event types
official_events <- c(
"ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD",
"COLD/WIND CHILL", "DEBRIS FLOW", "DENSE FOG", "DENSE SMOKE", "DROUGHT",
"DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "EXTREME COLD/WIND CHILL",
"FLASH FLOOD", "FLOOD", "FROST/FREEZE", "FUNNEL CLOUD", "FREEZING FOG",
"HAIL", "HEAT", "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF", "HIGH WIND",
"HURRICANE (TYPHOON)", "ICE STORM", "LAKE-EFFECT SNOW", "LAKESHORE FLOOD",
"LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND",
"MARINE THUNDERSTORM WIND", "RIP CURRENT", "SEICHE", "SLEET",
"STORM SURGE/TIDE", "STRONG WIND", "THUNDERSTORM WIND", "TORNADO",
"TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH",
"WATERSPOUT", "WILDFIRE", "WINTER STORM", "WINTER WEATHER"
)
length(official_events)
## [1] 48
The count of official event names matches the number listed in NWS Instruction 10-1605 file (Table 1, p. 6) so we have the “lookup table” we need. Next, we’ll create the other table needed for a comparison: the one for rows that do not appear in the official_events list.
unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
#length(unofficial_names) #first run gave 121 names
#head(unofficial_names, 20)
Looking at just the first 20 unofficial event names reveals some of the issues with the way the data was entered into the database. TSTM as an abbreviation for Thunderstorm, Frost and Freeze are combined in the official list, Forest is not part of the event name Wildfire. We can fix these with the mutate function and iterate until we cannot determine the proper official event.
mutate_it1 <- decade_data %>%
mutate(EVTYPE = case_when(
EVTYPE == "TSTM WIND" ~ "THUNDERSTORM WIND",
EVTYPE == "FROST" ~ "FROST/FREEZE",
EVTYPE == "FREEZE" ~ "FROST/FREEZE",
EVTYPE == "WILD/FOREST FIRE" ~ "WILDFIRE",
TRUE ~ EVTYPE)
)
unofficial_names <- unique(mutate_it1$EVTYPE[!(mutate_it1$EVTYPE %in% official_events)])
#length(unofficial_names) #second run gave 117
#head(unofficial_names, 20)
Well, that is a disappointing improvement that makes me realize that the mutate function would mean looking at each individual unofficial name…117 more times. Though I also see a couple here that I will not be able to assign any official event name to without looking at the remarks in the database (i.e., Landslide, Fog and Snow). Let’s do the the obvious ones in this list of 20 then try grepl on some of the others.
mutate_it2 <- mutate_it1 %>%
mutate(EVTYPE = case_when(
EVTYPE == "EXTREME COLD" ~ "EXTREME COLD/WIND CHILL",
EVTYPE == "TSTM WIND/HAIL" ~ "THUNDERSTOM WIND",
EVTYPE == "RIP CURRENTS" ~ "RIP CURRENT",
EVTYPE == "HEAVY SURF" ~ "HIGH SURF",
EVTYPE == "WINTER WEATHER MIX" ~ "WINTER WEATHER",
TRUE ~ EVTYPE)
)
unofficial_names <- unique(mutate_it2$EVTYPE[!(mutate_it2$EVTYPE %in% official_events)])
#length(unofficial_names)
#head(unofficial_names, 30)
Still too many. Trying grepl.
mutate_it2$EVTYPE[grepl("FLASH", mutate_it2$EVTYPE)] <- "FLASH FLOOD"
mutate_it2$EVTYPE[grepl("COASTAL", mutate_it2$EVTYPE)] <- "COASTAL FLOOD"
mutate_it2$EVTYPE[grepl("LAKESHORE", mutate_it2$EVTYPE)] <- "LAKESHORE FLOOD"
mutate_it2$EVTYPE[grepl("FLOOD", mutate_it2$EVTYPE) &
!(mutate_it2$EVTYPE %in% official_events)] <- "FLOOD"
unofficial_names <- unique(mutate_it2$EVTYPE[!(mutate_it2$EVTYPE %in% official_events)])
#length(unofficial_names)
#head(unofficial_names)
This is an insanely manual and slow process. I searched for some more advice online and will regroup with a frequency-based prioritization of which unofficial_names to tackle. Then try the recommended regex categorization approach using the stringr package. This allows for
mismatched_counts <- table(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
mismatched_counts <- sort(mismatched_counts, decreasing = TRUE)
length(mismatched_counts)
## [1] 121
Back to the 121 mismatches seen earlier. Using the stringr package for R allows for replacing the text in an entire cell with the official name as well as collecting groups of abbreviations and misspellings. Note: I used Google Gemini to read the NWS Instructions and the EVTYPE column to match up these variants with official designation then edited them after seeing the next - much shorter - list of unmatched names.
decade_data <- decade_data %>%
mutate(EVTYPE = case_when(
# Catch all variants of Thunderstorm Wind (TSTM, TSTM WIND, THUNDERSTORM WINDS, etc.)
str_detect(EVTYPE, "TSTM|THU|BURST") ~ "THUNDERSTORM WIND",
# Catch all variants of Marine/Coastal/Flash Flooding
str_detect(EVTYPE, "FLASH") ~ "FLASH FLOOD",
str_detect(EVTYPE, "COASTAL|CSTL|TIDE") ~ "COASTAL FLOOD",
str_detect(EVTYPE, "FLOOD|FLD|STREAM") & !str_detect(EVTYPE, "FLASH|COASTAL") ~ "FLOOD",
# Catch Winter/Cold variants
str_detect(EVTYPE, "WINT|SNOW|ICE|FREEZE|FROST|HYPO") ~ "WINTER WEATHER",
str_detect(EVTYPE, "CHILL|COLD") ~ "COLD/WIND CHILL",
# Catch Heat variants
str_detect(EVTYPE, "HEAT|WARM|HOT") ~ "HEAT",
# Catch Wind variants that aren't TSTMs
str_detect(EVTYPE, "WIND") & !str_detect(EVTYPE, "THUNDERSTORM") ~ "HIGH WIND",
# Catch Fire variants
str_detect(EVTYPE, "WILD|FOREST|GRASS|BRUSH") ~ "WILDFIRE",
# Keep original for now if no match
TRUE ~ EVTYPE
))
#How did that help with cleaning the data?
mismatched_counts2 <- table(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
mismatched_counts2 <- sort(mismatched_counts2, decreasing = TRUE)
unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
length(unofficial_names)
## [1] 49
head(unofficial_names, 50)
## [1] "RIP CURRENTS" "HEAVY SURF" "FOG"
## [4] "DRY" "STORM SURGE" "ABNORMALLY DRY"
## [7] "SMALL HAIL" "RED FLAG CRITERIA" "MUD SLIDE"
## [10] "RAIN" "WND" "FREEZING RAIN"
## [13] "UNSEASONABLY DRY" "UNSEASONABLY COOL" "SMOKE"
## [16] "HURRICANE" "EXTREMELY WET" "VERY DRY"
## [19] "RECORD LOW RAINFALL" "ROGUE WAVE" "LANDSLIDE"
## [22] "MONTHLY PRECIPITATION" "MIXED PRECIPITATION" "DRY CONDITIONS"
## [25] "DUST DEVEL" "MUDSLIDE" "OTHER"
## [28] "DRY SPELL" "BEACH EROSION" "ROUGH SEAS"
## [31] "LIGHT FREEZING RAIN" "HEAVY RAIN EFFECTS" "BLOWING DUST"
## [34] "FUNNEL CLOUDS" "HIGH WATER" "NORTHERN LIGHTS"
## [37] "HIGH SURF ADVISORY" "HIGH SEAS" "HAZARDOUS SURF"
## [40] "ABNORMALLY WET" "TORNADO DEBRIS" "FREEZING DRIZZLE"
## [43] "ICY ROADS" "DROWNING" "HIGH SURF ADVISORIES"
## [46] "HURRICANE/TYPHOON" "HEAVY SURF/HIGH SURF" "SLEET STORM"
## [49] "VOLCANIC ASHFALL"
That was MUCH more satisfying! There are still some unofficial names that can be batched after consultation with NWS Instruction 10-1605.
#Note: I asked Gemini to read the NWS Instruction and review the code I had written to match the above format. This took a few rounds to make this code chunk efficient. I do remember that I missed "MUD SLIDE" and the misspellings of hurricane and typhoon; it missed all of the "DROUGHT" and "WINTER WEATHER" because they were not originally specified in my prompt. Gemini added the references to the Sections, though I could have done that manually if I had thought of it to start with.
decade_data <- decade_data %>%
mutate(EVTYPE = case_when(
# DEBRIS FLOW (Section 7.8: Includes Landslides and Mudslides)
str_detect(EVTYPE, "LANDSLIDE|MUD SLIDE|MUDSLIDE|DEBRIS") ~ "DEBRIS FLOW",
# HURRICANE / TYPHOON (Section 7.25: Covers all named tropical systems)
str_detect(EVTYPE, "HURR|TYPH") ~ "HURRICANE (TYPHOON)",
# HIGH SURF (Section 7.23)
str_detect(EVTYPE, "SURF|SWELL|ROUGH SEAS|HAZARDOUS SURF|BEACH") ~ "HIGH SURF",
# DOUGHT (Section 7.25)
str_detect(EVTYPE, "ABNORMALLY DRY|UNSEASONABLY DRY|VERY DRY|DRY SPELL|DRY") ~ "DROUGHT",
# Winter Weather (Section 7.25)
str_detect(EVTYPE, "MIX|FREEZING|DRIZZLE|ICY ROADS|RAIN/SNOW|UNSEASONABLY COOL") ~ "WINTER WEATHER",
# Keep original for everything else
TRUE ~ EVTYPE
))
unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
length(unofficial_names)
## [1] 24
head(unofficial_names, 40)
## [1] "RIP CURRENTS" "FOG" "STORM SURGE"
## [4] "SMALL HAIL" "RED FLAG CRITERIA" "RAIN"
## [7] "WND" "SMOKE" "EXTREMELY WET"
## [10] "RECORD LOW RAINFALL" "ROGUE WAVE" "MONTHLY PRECIPITATION"
## [13] "DUST DEVEL" "OTHER" "HEAVY RAIN EFFECTS"
## [16] "BLOWING DUST" "FUNNEL CLOUDS" "HIGH WATER"
## [19] "NORTHERN LIGHTS" "HIGH SEAS" "ABNORMALLY WET"
## [22] "DROWNING" "SLEET STORM" "VOLCANIC ASHFALL"
With only 24 unmatched names, a manual inspection of these few make the small stuff (pluralizing, typos) and extraneous verbiage much easier to see and fix. One more round of direct reassignments will leave us with a short enough list that leaving them out will have little/no effect on the analysis.
final_rename <- decade_data %>%
mutate(EVTYPE = case_when(
EVTYPE == "RIP CURRENTS" ~ "RIP CURRENT",
EVTYPE == "HEAVY RAIN EFFECTS" ~ "HEAVY RAIN",
EVTYPE == "SLEET STORM" ~ "SLEET",
EVTYPE == "DUST DEVEL" ~ "DUST DEVIL",
EVTYPE == "VOLCANIC ASHFALL" ~ "VOLCANIC ASH",
EVTYPE == "STORM SURGE" ~ "STORM SURGE/TIDE",
EVTYPE == "WND" ~ "WIND",
EVTYPE == "SMALL HAIL" ~ "HAIL",
EVTYPE == "FUNNEL CLOUDS" ~ "FUNNEL CLOUD",
TRUE ~ EVTYPE)
)
unofficial_names <- unique(final_rename$EVTYPE[!(final_rename$EVTYPE %in% official_events)])
table(unofficial_names)
## unofficial_names
## ABNORMALLY WET BLOWING DUST DROWNING
## 1 1 1
## EXTREMELY WET FOG HIGH SEAS
## 1 1 1
## HIGH WATER MONTHLY PRECIPITATION NORTHERN LIGHTS
## 1 1 1
## OTHER RAIN RECORD LOW RAINFALL
## 1 1 1
## RED FLAG CRITERIA ROGUE WAVE SMOKE
## 1 1 1
## WIND
## 1
Digging into just 16 records to determine if items like “RAIN” and “WIND” fit into official events is not that much work considering where we started. However, removing 16 records out of ’r length(decade_data)` seems inconsequential. Turning to Gemini could also do the digging to the possibility of adding a few to official categorizations is another option here. I will choose to filter them out since this analysis is only practice with R, not actual public safety.
clean_data <- final_rename %>%
filter(!EVTYPE %in% c("ABNORMALLY WET", "BLOWING DUST", "DROWNING", "EXTREMELY WET", "FOG", "HIGH SEAS", "HIGH WATER", "MONTHLY PRECIPITATION", "NORTHERN LIGHTS", "OTHER", "RAIN", "RECORD LOW RAINFALL", "RED FLAG CRITERIA", "ROGUE WAVE", "SMOKE", "WIND"))
With the data now processed, the research questions are ready to be answered. In each case, we’ll start with a plot. Rather than plotting all 48 types when the questions are asking about “most” or “greatest”, we’ll stick to looking at just the top ten. This will make it easier to see the event names and determine the highest counts.
# Aggregate
top_10_health <- clean_data %>%
group_by(EVTYPE) %>%
summarise(
Fatalities = sum(FATALITIES, na.rm = TRUE),
Injuries = sum(INJURIES, na.rm = TRUE),
Total = Fatalities + Injuries
) %>%
slice_max(Total, n = 10) %>%
pivot_longer(cols = c(Fatalities, Injuries),
names_to = "Harm_Type",
values_to = "Count")
# Plot
ggplot(top_10_health, aes(x = reorder(EVTYPE, Total), y = Count, fill = Harm_Type)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_manual(values = c("Fatalities" = "#D55E00", "Injuries" = "#56B4E9")) +
labs(
title = "Top 10 Most Harmful Weather Events in the US",
subtitle = "Stacked by impact type (Fatalities vs. Injuries)",
x = "Weather Event Type",
y = "Number of People Affected",
fill = "Impact Type"
) +
theme_minimal()
Figure 1: Top 10 Most Harmful Weather Events (2001-2010). Tornadoes represent the highest total impact, while Heat shows a higher mortality rate.
# Creating a reference table for the text
health_stats <- clean_data %>%
group_by(EVTYPE) %>%
summarise(
Fatalities = sum(FATALITIES, na.rm = TRUE),
Injuries = sum(INJURIES, na.rm = TRUE),
Total = Fatalities + Injuries
)
It is abundantly clear that tornadoes have the greatest impact on human lives in the 2001-2010 timeframe. The combined total from Tornadoes is 8733 that includes 565 fatalities and 8168 injuries. It is also notable that Heat had a larger number of fatalities (987) even with a much smaller total (4702).
# Mapping exponents to their numeric values
damage_stats <- clean_data %>%
mutate(
prop_multiplier = case_when(
PROPDMGEXP %in% c("K", "k") ~ 1000,
PROPDMGEXP %in% c("M", "m") ~ 1000000,
PROPDMGEXP %in% c("B", "b") ~ 1000000000,
TRUE ~ 1
),
crop_multiplier = case_when(
CROPDMGEXP %in% c("K", "k") ~ 1000,
CROPDMGEXP %in% c("M", "m") ~ 1000000,
CROPDMGEXP %in% c("B", "b") ~ 1000000000,
TRUE ~ 1
),
Prop_Damage = PROPDMG * prop_multiplier,
Crop_Damage = CROPDMG * crop_multiplier,
Total_Econ_Damage = Prop_Damage + Crop_Damage
)
#Aggregating
top_10_econ <- damage_stats %>%
group_by(EVTYPE) %>%
summarise(
Property = sum(Prop_Damage, na.rm = TRUE),
Crops = sum(Crop_Damage, na.rm = TRUE),
Total = sum(Total_Econ_Damage, na.rm = TRUE)
) %>%
slice_max(Total, n = 10) %>%
pivot_longer(cols = c(Property, Crops),
names_to = "Damage_Type",
values_to = "USD")
# Plotting with the Billion-dollar scale
ggplot(top_10_econ, aes(x = reorder(EVTYPE, Total), y = USD / 1e9, fill = Damage_Type)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_manual(values = c("Property" = "#0072B2", "Crops" = "#009E73")) +
labs(
title = "Top 10 Weather Events with Greatest Economic Impact",
subtitle = "Property and Crop Damage (2001-2010)",
x = "Event Type",
y = "Total Damage (Billions of USD)",
fill = "Type of Damage"
) +
theme_minimal()
Figure 2: Top 10 Most Expensive Weather Events (2001-2010). Floods represent the highest total total cost, while Drought shows the highest cost in crop damate.
# Creating a reference table for the text
econ_stats <- damage_stats %>%
group_by(EVTYPE) %>%
summarise(
Property = sum(Prop_Damage, na.rm = TRUE),
Crops = sum(Crop_Damage, na.rm = TRUE),
Total = sum(Total_Econ_Damage, na.rm = TRUE)
)
Once again, we see a visual distinction for the weather event causing the most damage between 2001 and 2010. The economic impact is largest from Flood. The damage to property is huge at $ 126.26 billion with only a relatively tiny portion of the total ($ 129.72 billion) being damage to crops ($ 3.46 billion).
The damage to crops is larger for drought events at ($ 6.67 billion) even though it has a much smaller total ($ 7.51 billion) for the combined damages of property and crops by drought.