RepRes_Proj2

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This document presents some analyses of the NOAA storm database that can be useful for leaders and members of local communities in planning for potentially destructive weather events.

Synopsis of Analysis

The analysis and conclusions described below are directed at identifying the weather events that had the most impact to people, property and crops between 2001 and 2010. The impact to people will be computed using counts of injury and fatality recorded in the storm database. It is then the economic consequences of weather events that will be informed by the damage (in dollars) to property and crops. These will be conducted in two separate analyses, each of which will begin with a visualization that informs the specific computations regarding the type of weather event that may need more analysis to find potential actions that could avoid these costs.

Pre-processing the data, as usual, was the most time-consuming part of the analysis. All code as well as the output are below, including the deadends encountered in trying to clean up the EVTYPE entries with many typos, abbreviation and extraneous adjectives. The use of bulk replacements and individual corrections may or may not be useful on a different time frame since they are based on what is in the EVTYPE column after subsetting on the dates.

Final outcomes: Tornadoes cause the most harm to humans but Heat events have a higher mortality rate. Economically, Flooding causes the most property damage while Drought is the most damaging to crops.

Data Processing

NOAA’s Storm Database was read into R to begin the pre-processing before analysis.

storm_data <- read.csv("repdata_data_StormData.csv.bz2", quote="\"") #Be patient, it is a large file
dim(storm_data)

## [1] 902297     37

head(storm_data)

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

We see that there 902297 rows and 37 columns in this file. Because this is a large dataset and there are 1745947 NA entries, it makes sense to find out more about the NAs and consider some data reduction before continuing with analysis.

# Extract the year and count records per year
   storm_data$YEAR <- year(as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
   table(storm_data$YEAR)

## 
##  1950  1951  1952  1953  1954  1955  1956  1957  1958  1959  1960  1961  1962 
##   223   269   272   492   609  1413  1703  2184  2213  1813  1945  2246  2389 
##  1963  1964  1965  1966  1967  1968  1969  1970  1971  1972  1973  1974  1975 
##  1968  2348  2855  2388  2688  3312  2926  3215  3471  2168  4463  5386  4975 
##  1976  1977  1978  1979  1980  1981  1982  1983  1984  1985  1986  1987  1988 
##  3768  3728  3657  4279  6146  4517  7132  8322  7335  7979  8726  7367  7257 
##  1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000  2001 
## 10410 10946 12522 13534 12607 20631 27970 32270 28680 38128 31289 34471 34962 
##  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011 
## 36293 39752 39363 39184 44034 43289 55663 45817 48161 62174

The online description of NOAA’s data collection timeline gives a very big hint about the many NAs by visualizing the addition of more Event Types on separate occasions. The 1950 beginning of this database only tracked tornados. We also see that improving technology changed Collection Sources and format for Supplemental Information can account for the growth in reporting since the mid-1990s.

To focus on a single, recent (this file ends in November 2011) decade, we first must convert the beginning date column (BGN_date) to the actual date format.

# Convert the BGN_DATE column to an actual Date format
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

Now we can subset the data to contain only the rows from the first day of 2001 to the last day of 2010 which we can used to answer the research questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

decade_data <- subset(storm_data, BGN_DATE >= as.Date("2001-01-01") & BGN_DATE <= as.Date("2010-12-31"))
range(decade_data$BGN_DATE) #confirm date range

## [1] "2001-01-01" "2010-12-31"

dim(decade_data) #confirm data reduction

## [1] 426518     38

Now we have fewer than half as many rows, the same number of columns and the data file is restricted to the years 2001-2010.Under the admittedly shaky assumption that the most harmful/damaging events will occur the most frequently, lets find out which of the 48 NWS events are the most numerous in the EVTYPE column.

sort(table(decade_data$EVTYPE), decreasing = TRUE)

## 
##                           HAIL                      TSTM WIND 
##                         136697                          72855 
##              THUNDERSTORM WIND                    FLASH FLOOD 
##                          59724                          34860 
##                          FLOOD                        TORNADO 
##                          15417                          14326 
##                      HIGH WIND                     HEAVY SNOW 
##                          13434                           9167 
##                   WINTER STORM                     HEAVY RAIN 
##                           8086                           8035 
##                      LIGHTNING               MARINE TSTM WIND 
##                           7978                           6175 
##                 WINTER WEATHER       MARINE THUNDERSTORM WIND 
##                           5651                           4267 
##                   FUNNEL CLOUD                    STRONG WIND 
##                           3758                           3012 
##                     WATERSPOUT                       WILDFIRE 
##                           2140                           1957 
##                       BLIZZARD                        DROUGHT 
##                           1843                           1550 
##                   FROST/FREEZE                      ICE STORM 
##                           1221                           1210 
##             WINTER WEATHER/MIX                      DENSE FOG 
##                           1104                           1035 
##           URBAN/SML STREAM FLD        EXTREME COLD/WIND CHILL 
##                            903                            847 
##                 EXCESSIVE HEAT               LAKE-EFFECT SNOW 
##                            812                            569 
##               WILD/FOREST FIRE                           HEAT 
##                            561                            519 
##                      LANDSLIDE                      HIGH SURF 
##                            496                            488 
##                COLD/WIND CHILL                  COASTAL FLOOD 
##                            473                            467 
##                 TROPICAL STORM                    MARINE HAIL 
##                            446                            396 
##                    RIP CURRENT                 TSTM WIND/HAIL 
##                            385                            338 
##                     DUST STORM                      AVALANCHE 
##                            314                            279 
##           HEAVY SURF/HIGH SURF                            FOG 
##                            228                            200 
##          ASTRONOMICAL LOW TIDE                   RIP CURRENTS 
##                            148                            139 
##                   EXTREME COLD                           WIND 
##                            134                            125 
##                    STORM SURGE                           SNOW 
##                            120                            114 
##         ASTRONOMICAL HIGH TIDE               STORM SURGE/TIDE 
##                            103                             98 
##                     DUST DEVIL              HURRICANE/TYPHOON 
##                             95                             88 
##               MARINE HIGH WIND                  FREEZING RAIN 
##                             76                             59 
##            TROPICAL DEPRESSION                     LIGHT SNOW 
##                             59                             55 
##                          SLEET                      HURRICANE 
##                             54                             42 
##              MODERATE SNOWFALL                   FREEZING FOG 
##                             42                             41 
##               COASTAL FLOODING                 DRY MICROBURST 
##                             37                             37 
##                         FREEZE             MARINE STRONG WIND 
##                             37                             36 
##                     WINTRY MIX                   STRONG WINDS 
##                             30                             25 
##              UNSEASONABLY WARM                     HEAVY SURF 
##                             25                             23 
##            MIXED PRECIPITATION          MONTHLY PRECIPITATION 
##                             23                             22 
##                   VOLCANIC ASH                    GUSTY WINDS 
##                             20                             19 
##                  RECORD WARMTH                          FROST 
##                             18                             17 
##               UNSEASONABLY DRY              EXTREME WINDCHILL 
##                             16                             15 
##                     SMALL HAIL                    SLEET STORM 
##                             14                             12 
##                          SMOKE                LAKESHORE FLOOD 
##                             11                             10 
##                    RECORD HEAT                         SEICHE 
##                             10                             10 
##                   THUNDERSTORM                        TSUNAMI 
##                             10                             10 
##                    DENSE SMOKE                  WIND ADVISORY 
##                              9                              9 
##              UNSEASONABLY COLD                      BLACK ICE 
##                              7                              6 
##                           COLD             WINTER WEATHER MIX 
##                              6                              6 
##                            DRY       GUSTY THUNDERSTORM WINDS 
##                              5                              5 
##                    HARD FREEZE                       MUDSLIDE 
##                              5                              5 
##                   PROLONG COLD                   SNOW SHOWERS 
##                              5                              5 
##              UNSEASONABLY COOL                     WIND CHILL 
##                              5                              5 
##           ACCUMULATED SNOWFALL   COLD WIND CHILL TEMPERATURES 
##                              4                              4 
## EXTREME WINDCHILL TEMPERATURES                      ICY ROADS 
##                              4                              4 
##                 PROLONG WARMTH                   SNOW SQUALLS 
##                              4                              4 
##               UNSEASONABLY HOT                 UNUSUALLY COLD 
##                              4                              4 
##        GUSTY THUNDERSTORM WIND                      HIGH SEAS 
##                              3                              3 
##             HIGH SURF ADVISORY            LIGHT FREEZING RAIN 
##                              3                              3 
##                      MUD SLIDE              SNOW/BLOWING SNOW 
##                              3                              3 
##                     SNOW/SLEET              SNOWMELT FLOODING 
##                              3                              3 
##               VOLCANIC ASHFALL                 ABNORMALLY DRY 
##                              3                              2 
##                   BLOWING DUST          CSTL FLOODING/EROSION 
##                              2                              2 
##                 DRY CONDITIONS                 EARLY SNOWFALL 
##                              2                              2 
##               FALLING SNOW/ICE                     FIRST SNOW 
##                              2                              2 
##               FREEZING DRIZZLE                  GRADIENT WIND 
##                              2                              2 
##                     GUSTY WIND                       ICE/SNOW 
##                              2                              2 
##               LAKE EFFECT SNOW                  NON TSTM WIND 
##                              2                              2 
##            RECORD LOW RAINFALL              RED FLAG CRITERIA 
##                              2                              2 
##                     ROUGH SEAS           SEVERE THUNDERSTORMS 
##                              2                              2 
##                   SNOW DROUGHT             SNOW/FREEZING RAIN 
##                              2                              2 
##                       VERY DRY             HIGH SURF ADVISORY 
##                              2                              1 
##                     WATERSPOUT                 ABNORMALLY WET 
##                              1                              1 
##                  BEACH EROSION                     BRUSH FIRE 
##                              1                              1 
##                   COLD WEATHER                       DROWNING 
##                              1                              1 
##                      DRY SPELL                     DUST DEVEL 
##                              1                              1 
##         EXCESSIVE HEAT/DROUGHT                  EXTREMELY WET 
##                              1                              1 
##                  FUNNEL CLOUDS                GUSTY LAKE WIND 
##                              1                              1 
##                 HAZARDOUS SURF             HEAVY RAIN EFFECTS 
##                              1                              1 
##           HIGH SURF ADVISORIES                     HIGH WATER 
##                              1                              1 
##                    ICE ON ROAD               LATE SEASON SNOW 
##                              1                              1 
##                  NON-TSTM WIND                NORTHERN LIGHTS 
##                              1                              1 
##                          OTHER                     PATCHY ICE 
##                              1                              1 
##                           RAIN                    RECORD COLD 
##                              1                              1 
##                     ROGUE WAVE                  SNOW ADVISORY 
##                              1                              1 
##                  THUNDERSTORMS                 TORNADO DEBRIS 
##                              1                              1 
##            UNUSUALLY LATE SNOW                      VERY WARM 
##                              1                              1 
##                      WHIRLWIND                    WIND DAMAGE 
##                              1                              1 
##                            WND 
##                              1

This is a problematic list since it has way too many event types, many of which are recorded only a single-digit number of times.A few manual comparisons of these to the NWS Instruction 10-1605 file (Table 1, p. 6) reveals a pattern of reporting with non-standard terms. Online suggestions include correcting entries that have extraneous spaces and creating a lookup list for the 48 official event names then looking for the non-matches to deal with individually or, perhaps, in groups.

#Remove hidden spaces
decade_data$EVTYPE <- trimws(decade_data$EVTYPE)

#Create the official list of event types
official_events <- c(
  "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD",
  "COLD/WIND CHILL", "DEBRIS FLOW", "DENSE FOG", "DENSE SMOKE", "DROUGHT",
  "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "EXTREME COLD/WIND CHILL",
  "FLASH FLOOD", "FLOOD", "FROST/FREEZE", "FUNNEL CLOUD", "FREEZING FOG",
  "HAIL", "HEAT", "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF", "HIGH WIND",
  "HURRICANE (TYPHOON)", "ICE STORM", "LAKE-EFFECT SNOW", "LAKESHORE FLOOD",
  "LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND",
  "MARINE THUNDERSTORM WIND", "RIP CURRENT", "SEICHE", "SLEET",
  "STORM SURGE/TIDE", "STRONG WIND", "THUNDERSTORM WIND", "TORNADO",
  "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH",
  "WATERSPOUT", "WILDFIRE", "WINTER STORM", "WINTER WEATHER"
   )

length(official_events)

## [1] 48

The count of official event names matches the number listed in NWS Instruction 10-1605 file (Table 1, p. 6) so we have the “lookup table” we need. Next, we’ll create the other table needed for a comparison: the one for rows that do not appear in the official_events list.

unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])

#length(unofficial_names) #first run gave 121 names
#head(unofficial_names, 20)

Looking at just the first 20 unofficial event names reveals some of the issues with the way the data was entered into the database. TSTM as an abbreviation for Thunderstorm, Frost and Freeze are combined in the official list, Forest is not part of the event name Wildfire. We can fix these with the mutate function and iterate until we cannot determine the proper official event.

Deadends (most output removed)

mutate_it1 <- decade_data %>%
  mutate(EVTYPE = case_when(
    EVTYPE == "TSTM WIND" ~ "THUNDERSTORM WIND",
    EVTYPE == "FROST" ~ "FROST/FREEZE",
    EVTYPE == "FREEZE" ~ "FROST/FREEZE",
    EVTYPE == "WILD/FOREST FIRE" ~ "WILDFIRE", 
    TRUE ~ EVTYPE)
    )

unofficial_names <- unique(mutate_it1$EVTYPE[!(mutate_it1$EVTYPE %in% official_events)])
#length(unofficial_names) #second run gave 117
#head(unofficial_names, 20)

Well, that is a disappointing improvement that makes me realize that the mutate function would mean looking at each individual unofficial name…117 more times. Though I also see a couple here that I will not be able to assign any official event name to without looking at the remarks in the database (i.e., Landslide, Fog and Snow). Let’s do the the obvious ones in this list of 20 then try grepl on some of the others.

mutate_it2 <- mutate_it1 %>%
  mutate(EVTYPE = case_when(
    EVTYPE == "EXTREME COLD" ~ "EXTREME COLD/WIND CHILL",
    EVTYPE == "TSTM WIND/HAIL" ~ "THUNDERSTOM WIND",
    EVTYPE == "RIP CURRENTS" ~ "RIP CURRENT",
    EVTYPE == "HEAVY SURF" ~ "HIGH SURF",
    EVTYPE == "WINTER WEATHER MIX" ~ "WINTER WEATHER",
    TRUE ~ EVTYPE)
    )

unofficial_names <- unique(mutate_it2$EVTYPE[!(mutate_it2$EVTYPE %in% official_events)])
#length(unofficial_names)
#head(unofficial_names, 30)

Still too many. Trying grepl.

mutate_it2$EVTYPE[grepl("FLASH", mutate_it2$EVTYPE)] <- "FLASH FLOOD"
mutate_it2$EVTYPE[grepl("COASTAL", mutate_it2$EVTYPE)] <- "COASTAL FLOOD"
mutate_it2$EVTYPE[grepl("LAKESHORE", mutate_it2$EVTYPE)] <- "LAKESHORE FLOOD"
mutate_it2$EVTYPE[grepl("FLOOD", mutate_it2$EVTYPE) & 
                   !(mutate_it2$EVTYPE %in% official_events)] <- "FLOOD"
unofficial_names <- unique(mutate_it2$EVTYPE[!(mutate_it2$EVTYPE %in% official_events)])
#length(unofficial_names)
#head(unofficial_names)

Starting Over

This is an insanely manual and slow process. I searched for some more advice online and will regroup with a frequency-based prioritization of which unofficial_names to tackle. Then try the recommended regex categorization approach using the stringr package. This allows for

mismatched_counts <- table(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
mismatched_counts <- sort(mismatched_counts, decreasing = TRUE)

length(mismatched_counts)

## [1] 121

Back to the 121 mismatches seen earlier. Using the stringr package for R allows for replacing the text in an entire cell with the official name as well as collecting groups of abbreviations and misspellings. Note: I used Google Gemini to read the NWS Instructions and the EVTYPE column to match up these variants with official designation then edited them after seeing the next - much shorter - list of unmatched names.

decade_data <- decade_data %>%
  mutate(EVTYPE = case_when(
    # Catch all variants of Thunderstorm Wind (TSTM, TSTM WIND, THUNDERSTORM WINDS, etc.)
    str_detect(EVTYPE, "TSTM|THU|BURST") ~ "THUNDERSTORM WIND",
    
    # Catch all variants of Marine/Coastal/Flash Flooding
    str_detect(EVTYPE, "FLASH") ~ "FLASH FLOOD",
    str_detect(EVTYPE, "COASTAL|CSTL|TIDE") ~ "COASTAL FLOOD",
    str_detect(EVTYPE, "FLOOD|FLD|STREAM") & !str_detect(EVTYPE, "FLASH|COASTAL") ~ "FLOOD",
    
    # Catch Winter/Cold variants
    str_detect(EVTYPE, "WINT|SNOW|ICE|FREEZE|FROST|HYPO") ~ "WINTER WEATHER",
    str_detect(EVTYPE, "CHILL|COLD") ~ "COLD/WIND CHILL",
    
    # Catch Heat variants
    str_detect(EVTYPE, "HEAT|WARM|HOT") ~ "HEAT",
    
    # Catch Wind variants that aren't TSTMs
    str_detect(EVTYPE, "WIND") & !str_detect(EVTYPE, "THUNDERSTORM") ~ "HIGH WIND",
    
    # Catch Fire variants
    str_detect(EVTYPE, "WILD|FOREST|GRASS|BRUSH") ~ "WILDFIRE",
    
    # Keep original for now if no match
    TRUE ~ EVTYPE
  ))

#How did that help with cleaning the data?
mismatched_counts2 <- table(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
mismatched_counts2 <- sort(mismatched_counts2, decreasing = TRUE)

unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
length(unofficial_names)

## [1] 49

head(unofficial_names, 50)

##  [1] "RIP CURRENTS"          "HEAVY SURF"            "FOG"                  
##  [4] "DRY"                   "STORM SURGE"           "ABNORMALLY DRY"       
##  [7] "SMALL HAIL"            "RED FLAG CRITERIA"     "MUD SLIDE"            
## [10] "RAIN"                  "WND"                   "FREEZING RAIN"        
## [13] "UNSEASONABLY DRY"      "UNSEASONABLY COOL"     "SMOKE"                
## [16] "HURRICANE"             "EXTREMELY WET"         "VERY DRY"             
## [19] "RECORD LOW RAINFALL"   "ROGUE WAVE"            "LANDSLIDE"            
## [22] "MONTHLY PRECIPITATION" "MIXED PRECIPITATION"   "DRY CONDITIONS"       
## [25] "DUST DEVEL"            "MUDSLIDE"              "OTHER"                
## [28] "DRY SPELL"             "BEACH EROSION"         "ROUGH SEAS"           
## [31] "LIGHT FREEZING RAIN"   "HEAVY RAIN EFFECTS"    "BLOWING DUST"         
## [34] "FUNNEL CLOUDS"         "HIGH WATER"            "NORTHERN LIGHTS"      
## [37] "HIGH SURF ADVISORY"    "HIGH SEAS"             "HAZARDOUS SURF"       
## [40] "ABNORMALLY WET"        "TORNADO DEBRIS"        "FREEZING DRIZZLE"     
## [43] "ICY ROADS"             "DROWNING"              "HIGH SURF ADVISORIES" 
## [46] "HURRICANE/TYPHOON"     "HEAVY SURF/HIGH SURF"  "SLEET STORM"          
## [49] "VOLCANIC ASHFALL"

That was MUCH more satisfying! There are still some unofficial names that can be batched after consultation with NWS Instruction 10-1605.

#Note: I asked Gemini to read the NWS Instruction and review the code I had written to match the above format. This took a few rounds to make this code chunk efficient. I do remember that I missed "MUD SLIDE" and the misspellings of hurricane and typhoon; it missed all of the "DROUGHT" and "WINTER WEATHER" because they were not originally specified in my prompt. Gemini added the references to the Sections, though I could have done that manually if I had thought of it to start with.

decade_data <- decade_data %>%
  mutate(EVTYPE = case_when(
    # DEBRIS FLOW (Section 7.8: Includes Landslides and Mudslides)
    str_detect(EVTYPE, "LANDSLIDE|MUD SLIDE|MUDSLIDE|DEBRIS") ~ "DEBRIS FLOW",
    
    # HURRICANE / TYPHOON (Section 7.25: Covers all named tropical systems)
    str_detect(EVTYPE, "HURR|TYPH") ~ "HURRICANE (TYPHOON)",
    
    # HIGH SURF (Section 7.23)
    str_detect(EVTYPE, "SURF|SWELL|ROUGH SEAS|HAZARDOUS SURF|BEACH") ~ "HIGH SURF",
    
    # DOUGHT (Section 7.25)
    str_detect(EVTYPE, "ABNORMALLY DRY|UNSEASONABLY DRY|VERY DRY|DRY SPELL|DRY") ~ "DROUGHT",
    
    # Winter Weather (Section 7.25)
    str_detect(EVTYPE, "MIX|FREEZING|DRIZZLE|ICY ROADS|RAIN/SNOW|UNSEASONABLY COOL") ~ "WINTER WEATHER",

    # Keep original for everything else
    TRUE ~ EVTYPE
  ))

unofficial_names <- unique(decade_data$EVTYPE[!(decade_data$EVTYPE %in% official_events)])
length(unofficial_names)

## [1] 24

head(unofficial_names, 40)

##  [1] "RIP CURRENTS"          "FOG"                   "STORM SURGE"          
##  [4] "SMALL HAIL"            "RED FLAG CRITERIA"     "RAIN"                 
##  [7] "WND"                   "SMOKE"                 "EXTREMELY WET"        
## [10] "RECORD LOW RAINFALL"   "ROGUE WAVE"            "MONTHLY PRECIPITATION"
## [13] "DUST DEVEL"            "OTHER"                 "HEAVY RAIN EFFECTS"   
## [16] "BLOWING DUST"          "FUNNEL CLOUDS"         "HIGH WATER"           
## [19] "NORTHERN LIGHTS"       "HIGH SEAS"             "ABNORMALLY WET"       
## [22] "DROWNING"              "SLEET STORM"           "VOLCANIC ASHFALL"

With only 24 unmatched names, a manual inspection of these few make the small stuff (pluralizing, typos) and extraneous verbiage much easier to see and fix. One more round of direct reassignments will leave us with a short enough list that leaving them out will have little/no effect on the analysis.

final_rename <- decade_data %>%
  mutate(EVTYPE = case_when(
    EVTYPE == "RIP CURRENTS" ~ "RIP CURRENT",
    EVTYPE == "HEAVY RAIN EFFECTS" ~ "HEAVY RAIN",
    EVTYPE == "SLEET STORM" ~ "SLEET",
    EVTYPE == "DUST DEVEL" ~ "DUST DEVIL", 
    EVTYPE == "VOLCANIC ASHFALL" ~ "VOLCANIC ASH",
    EVTYPE == "STORM SURGE" ~ "STORM SURGE/TIDE",
    EVTYPE == "WND" ~ "WIND",
    EVTYPE == "SMALL HAIL" ~ "HAIL",
    EVTYPE == "FUNNEL CLOUDS" ~ "FUNNEL CLOUD",
    TRUE ~ EVTYPE)
    )

unofficial_names <- unique(final_rename$EVTYPE[!(final_rename$EVTYPE %in% official_events)])

table(unofficial_names)

## unofficial_names
##        ABNORMALLY WET          BLOWING DUST              DROWNING 
##                     1                     1                     1 
##         EXTREMELY WET                   FOG             HIGH SEAS 
##                     1                     1                     1 
##            HIGH WATER MONTHLY PRECIPITATION       NORTHERN LIGHTS 
##                     1                     1                     1 
##                 OTHER                  RAIN   RECORD LOW RAINFALL 
##                     1                     1                     1 
##     RED FLAG CRITERIA            ROGUE WAVE                 SMOKE 
##                     1                     1                     1 
##                  WIND 
##                     1

Digging into just 16 records to determine if items like “RAIN” and “WIND” fit into official events is not that much work considering where we started. However, removing 16 records out of ’r length(decade_data)` seems inconsequential. Turning to Gemini could also do the digging to the possibility of adding a few to official categorizations is another option here. I will choose to filter them out since this analysis is only practice with R, not actual public safety.

clean_data <- final_rename %>%
  filter(!EVTYPE %in% c("ABNORMALLY WET", "BLOWING DUST", "DROWNING", "EXTREMELY WET", "FOG", "HIGH SEAS", "HIGH WATER", "MONTHLY PRECIPITATION", "NORTHERN LIGHTS", "OTHER", "RAIN", "RECORD LOW RAINFALL", "RED FLAG CRITERIA", "ROGUE WAVE", "SMOKE", "WIND"))

Results

With the data now processed, the research questions are ready to be answered. In each case, we’ll start with a plot. Rather than plotting all 48 types when the questions are asking about “most” or “greatest”, we’ll stick to looking at just the top ten. This will make it easier to see the event names and determine the highest counts.

Q1: Across the United States, which types of events are most harmful with respect to population health?

# Aggregate
top_10_health <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    Total = Fatalities + Injuries
  ) %>%
  slice_max(Total, n = 10) %>%
  pivot_longer(cols = c(Fatalities, Injuries), 
               names_to = "Harm_Type", 
               values_to = "Count")

# Plot
ggplot(top_10_health, aes(x = reorder(EVTYPE, Total), y = Count, fill = Harm_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Fatalities" = "#D55E00", "Injuries" = "#56B4E9")) +
  labs(
    title = "Top 10 Most Harmful Weather Events in the US",
    subtitle = "Stacked by impact type (Fatalities vs. Injuries)",
    x = "Weather Event Type",
    y = "Number of People Affected",
    fill = "Impact Type"
  ) +
  theme_minimal()

Figure 1: Top 10 Most Harmful Weather Events (2001-2010). Tornadoes represent the highest total impact, while Heat shows a higher mortality rate.

# Creating a reference table for the text
health_stats <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    Total = Fatalities + Injuries
  )

It is abundantly clear that tornadoes have the greatest impact on human lives in the 2001-2010 timeframe. The combined total from Tornadoes is 8733 that includes 565 fatalities and 8168 injuries. It is also notable that Heat had a larger number of fatalities (987) even with a much smaller total (4702).

Q2: Across the United States, which types of events have the greatest economic consequences?

# Mapping exponents to their numeric values
damage_stats <- clean_data %>%
  mutate(
    prop_multiplier = case_when(
      PROPDMGEXP %in% c("K", "k") ~ 1000,
      PROPDMGEXP %in% c("M", "m") ~ 1000000,
      PROPDMGEXP %in% c("B", "b") ~ 1000000000,
      TRUE ~ 1
    ),
    crop_multiplier = case_when(
      CROPDMGEXP %in% c("K", "k") ~ 1000,
      CROPDMGEXP %in% c("M", "m") ~ 1000000,
      CROPDMGEXP %in% c("B", "b") ~ 1000000000,
      TRUE ~ 1
    ),
    Prop_Damage = PROPDMG * prop_multiplier,
    Crop_Damage = CROPDMG * crop_multiplier,
    Total_Econ_Damage = Prop_Damage + Crop_Damage
  )

#Aggregating
top_10_econ <- damage_stats %>%
  group_by(EVTYPE) %>%
  summarise(
    Property = sum(Prop_Damage, na.rm = TRUE),
    Crops = sum(Crop_Damage, na.rm = TRUE),
    Total = sum(Total_Econ_Damage, na.rm = TRUE)
  ) %>%
  slice_max(Total, n = 10) %>%
  pivot_longer(cols = c(Property, Crops), 
               names_to = "Damage_Type", 
               values_to = "USD")

# Plotting with the Billion-dollar scale
ggplot(top_10_econ, aes(x = reorder(EVTYPE, Total), y = USD / 1e9, fill = Damage_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Property" = "#0072B2", "Crops" = "#009E73")) +
  labs(
    title = "Top 10 Weather Events with Greatest Economic Impact",
    subtitle = "Property and Crop Damage (2001-2010)",
    x = "Event Type",
    y = "Total Damage (Billions of USD)",
    fill = "Type of Damage"
  ) +
  theme_minimal()

Figure 2: Top 10 Most Expensive Weather Events (2001-2010). Floods represent the highest total total cost, while Drought shows the highest cost in crop damate.

# Creating a reference table for the text
econ_stats <- damage_stats %>%
  group_by(EVTYPE) %>%
  summarise(
    Property = sum(Prop_Damage, na.rm = TRUE),
    Crops = sum(Crop_Damage, na.rm = TRUE),
    Total = sum(Total_Econ_Damage, na.rm = TRUE)
  )

Once again, we see a visual distinction for the weather event causing the most damage between 2001 and 2010. The economic impact is largest from Flood. The damage to property is huge at $ 126.26 billion with only a relatively tiny portion of the total ($ 129.72 billion) being damage to crops ($ 3.46 billion).

The damage to crops is larger for drought events at ($ 6.67 billion) even though it has a much smaller total ($ 7.51 billion) for the combined damages of property and crops by drought.