Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The database covers 1950 to 2011 and records characteristics of major storms and weather events, including fatalities, injuries, property damage, and crop damage. Our analysis shows that tornadoes cause the most fatalities and the most injuries, floods cause the most property damage, and droughts cause the most crop damage.

Data Processing

Loading and Initial Examination of Data

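# Load the packages used below for aggregation (%>%, group_by) and plotting
library(dplyr)
library(ggplot2)

# Load the storm data file (read.csv decompresses .bz2 files directly)
data <- read.csv("repdata_data_StormData.csv.bz2")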

# Check the structure of the data
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# Summarize key variables (for numeric columns, summary() adds an NA's row when values are missing)
summary(data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##     CROPDMG       
##  Min.   :  0.000  
##  1st Qu.:  0.000  
##  Median :  0.000  
##  Mean   :  1.527  
##  3rd Qu.:  0.000  
##  Max.   :990.000
# Examine the first few rows
head(data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "PROPDMGEXP", "CROPDMGEXP")])
##    EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP CROPDMGEXP
## 1 TORNADO          0       15    25.0       0          K           
## 2 TORNADO          0        0     2.5       0          K           
## 3 TORNADO          0        2    25.0       0          K           
## 4 TORNADO          0        2     2.5       0          K           
## 5 TORNADO          0        2     2.5       0          K           
## 6 TORNADO          0        6     2.5       0          K
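
The summary above only hints at missing data: it would list NA counts for numeric columns, but it says nothing about empty strings, which the str() output shows are common in character columns such as CROPDMGEXP. A minimal base-R sketch of an explicit check follows. The `toy` data frame and its values are hypothetical stand-ins, not rows from the storm database.

```r
# Toy stand-in for the storm data (hypothetical values, not from the database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, NA, 2),
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", "", "M"),
  stringsAsFactors = FALSE
)

# Count NA values in every column
na_counts <- colSums(is.na(toy))

# Count empty strings in the character columns only
is_chr <- vapply(toy, is.character, logical(1))
empty_counts <- colSums(toy[, is_chr, drop = FALSE] == "")

print(na_counts)
print(empty_counts)
```

Applied to the real data, the same two calls would show how many exponent codes are blank, which matters because a blank code leaves the damage value unscaled in the conversion step.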

Data Cleaning and Preparation

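# Convert a damage value to dollars using its exponent code.
# Letter codes: H = hundreds, K = thousands, M = millions, B = billions.
# Digit codes 3-8 are interpreted here as powers of ten ("2" is treated like "H");
# any other code (blank, "0", "1", "+", "-", "?") leaves the value unchanged.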
convert_damage <- function(dmg, exp) {
  exp <- toupper(exp)
  multiplier <- ifelse(exp == "K", 1000,
                ifelse(exp == "M", 1000000, 
                ifelse(exp == "B", 1000000000,
                ifelse(exp %in% c("H", "2"), 100,
                ifelse(exp %in% c("3", "4", "5", "6", "7", "8"), 10^as.numeric(exp),
                1)))))
  dmg * multiplier
}

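# Calculate actual property and crop damage values.
# The coercion warning below is harmless: ifelse() evaluates as.numeric(exp) for
# every code, but only uses the result where the code is a digit from 3 to 8.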
data$PROPDMG_ACTUAL <- convert_damage(data$PROPDMG, data$PROPDMGEXP)
## Warning in ifelse(exp %in% c("3", "4", "5", "6", "7", "8"),
## 10^as.numeric(exp), : NAs introduced by coercion
data$CROPDMG_ACTUAL <- convert_damage(data$CROPDMG, data$CROPDMGEXP)

# Check for missing values in processed data
cat("Missing values in key variables:\n")
## Missing values in key variables:
cat("FATALITIES:", sum(is.na(data$FATALITIES)), "\n")
## FATALITIES: 0
cat("INJURIES:", sum(is.na(data$INJURIES)), "\n")
## INJURIES: 0
cat("PROPDMG_ACTUAL:", sum(is.na(data$PROPDMG_ACTUAL)), "\n")
## PROPDMG_ACTUAL: 0
cat("CROPDMG_ACTUAL:", sum(is.na(data$CROPDMG_ACTUAL)), "\n")
## CROPDMG_ACTUAL: 0
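
The nested ifelse() calls in convert_damage() can also be written as a named lookup vector, which avoids the coercion warning and keeps the mapping in one place. The sketch below is an alternative formulation, not the code that produced the results in this report. The names `exp_multiplier` and `convert_damage_lookup` are hypothetical, and the mapping is assumed to mirror convert_damage() exactly (unrecognized codes get a multiplier of 1).

```r
# Multipliers keyed by exponent code; assumed to mirror convert_damage()
exp_multiplier <- c(
  H = 1e2, K = 1e3, M = 1e6, B = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8
)

convert_damage_lookup <- function(dmg, exp) {
  # Look up each code by name; codes not in the table come back as NA
  multiplier <- unname(exp_multiplier[toupper(exp)])
  # Unrecognized or blank codes leave the value unchanged
  multiplier[is.na(multiplier)] <- 1
  dmg * multiplier
}

# Small demonstration with made-up values
demo <- convert_damage_lookup(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "", "?"))
print(demo)
```

Because the multipliers live in a single vector, changing how an ambiguous code such as "0" or "1" is treated is a one-line edit.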

Results

Question 1: Which types of events are most harmful to population health?

# Aggregate fatalities and injuries by event type
health_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_health_impact = sum(FATALITIES, na.rm = TRUE) + sum(INJURIES, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  arrange(desc(total_health_impact))

# Top 10 most harmful events for population health
top_health <- head(health_impact, 10)
print(top_health)
## # A tibble: 10 x 4
##    EVTYPE            total_fatalities total_injuries total_health_impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
# Separate analysis for fatalities and injuries
top_fatalities <- health_impact %>%
  arrange(desc(total_fatalities)) %>%
  head(10)

top_injuries <- health_impact %>%
  arrange(desc(total_injuries)) %>%
  head(10)

cat("\nTop 10 events by fatalities:\n")
## 
## Top 10 events by fatalities:
print(top_fatalities[c("EVTYPE", "total_fatalities")])
## # A tibble: 10 x 2
##    EVTYPE         total_fatalities
##    <chr>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224
cat("\nTop 10 events by injuries:\n")
## 
## Top 10 events by injuries:
print(top_injuries[c("EVTYPE", "total_injuries")])
## # A tibble: 10 x 2
##    EVTYPE            total_injuries
##    <chr>                      <dbl>
##  1 TORNADO                    91346
##  2 TSTM WIND                   6957
##  3 FLOOD                       6789
##  4 EXCESSIVE HEAT              6525
##  5 LIGHTNING                   5230
##  6 HEAT                        2100
##  7 ICE STORM                   1975
##  8 FLASH FLOOD                 1777
##  9 THUNDERSTORM WIND           1488
## 10 HAIL                        1361
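
The tables above list TSTM WIND and THUNDERSTORM WIND, and HEAT and EXCESSIVE HEAT, as separate event types because EVTYPE is aggregated as recorded. One way to gauge how much this splitting matters is to consolidate labels before summing. The sketch below is illustrative only: `normalize_evtype` and its merge rules are assumptions (whether HEAT and EXCESSIVE HEAT belong together is a judgment call), and the input rows are copied from the top-10 tables above rather than recomputed from the full data.

```r
# Hypothetical consolidation rules (assumptions, not an official NOAA mapping)
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  # Treat TSTM WIND and THUNDERSTORM WIND(S) variants as one category
  x <- sub("^(TSTM|THUNDERSTORM) WIND.*", "THUNDERSTORM WIND", x)
  # Fold the heat labels into a single HEAT category
  x[x %in% c("EXCESSIVE HEAT", "HEAT")] <- "HEAT"
  x
}

# Rows taken from the top-10 tables in this section
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "HEAT"),
  total_fatalities = c(504, 133, 1903, 937),
  total_injuries = c(6957, 1488, 6525, 2100)
)

toy$EVTYPE_CLEAN <- normalize_evtype(toy$EVTYPE)

# Re-aggregate on the consolidated label
merged <- aggregate(cbind(total_fatalities, total_injuries) ~ EVTYPE_CLEAN,
                    data = toy, FUN = sum)
print(merged)
```

The full database contains many more spelling variants than these four rows, so totals from a complete consolidation would differ from the sums shown here.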

Question 2: Which types of events have the greatest economic consequences?

# Aggregate property and crop damage by event type
economic_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(PROPDMG_ACTUAL, na.rm = TRUE),
    total_crop_damage = sum(CROPDMG_ACTUAL, na.rm = TRUE),
    total_economic_damage = sum(PROPDMG_ACTUAL, na.rm = TRUE) + sum(CROPDMG_ACTUAL, na.rm = TRUE),
    .groups = 'drop'
  ) %>%
  arrange(desc(total_economic_damage))

# Top 10 most economically damaging events
top_economic <- head(economic_impact, 10)
print(top_economic)
## # A tibble: 10 x 4
##    EVTYPE           total_property_damage total_crop_damage total_economic_dama…
##    <chr>                            <dbl>             <dbl>                <dbl>
##  1 FLOOD                    144657709807         5661968450        150319678257 
##  2 HURRICANE/TYPHO…          69305840000         2607872800         71913712800 
##  3 TORNADO                   56947380676.         414953270         57362333946.
##  4 STORM SURGE               43323536000               5000         43323541000 
##  5 HAIL                      15735267513.        3025954473         18761221986.
##  6 FLASH FLOOD               16822673978.        1421317100         18243991078.
##  7 DROUGHT                    1046106000        13972566000         15018672000 
##  8 HURRICANE                 11868319010         2741910000         14610229010 
##  9 RIVER FLOOD                5118945500         5029459000         10148404500 
## 10 ICE STORM                  3944927860         5022113500          8967041360
# Separate analysis for property and crop damage
top_property <- economic_impact %>%
  arrange(desc(total_property_damage)) %>%
  head(10)

top_crop <- economic_impact %>%
  arrange(desc(total_crop_damage)) %>%
  head(10)

cat("\nTop 10 events by property damage (in billions):\n")
## 
## Top 10 events by property damage (in billions):
top_property$total_property_damage <- top_property$total_property_damage / 1e9
print(top_property[c("EVTYPE", "total_property_damage")])
## # A tibble: 10 x 2
##    EVTYPE            total_property_damage
##    <chr>                             <dbl>
##  1 FLOOD                            145.  
##  2 HURRICANE/TYPHOON                 69.3 
##  3 TORNADO                           56.9 
##  4 STORM SURGE                       43.3 
##  5 FLASH FLOOD                       16.8 
##  6 HAIL                              15.7 
##  7 HURRICANE                         11.9 
##  8 TROPICAL STORM                     7.70
##  9 WINTER STORM                       6.69
## 10 HIGH WIND                          5.27
cat("\nTop 10 events by crop damage (in billions):\n")
## 
## Top 10 events by crop damage (in billions):
top_crop$total_crop_damage <- top_crop$total_crop_damage / 1e9
print(top_crop[c("EVTYPE", "total_crop_damage")])
## # A tibble: 10 x 2
##    EVTYPE            total_crop_damage
##    <chr>                         <dbl>
##  1 DROUGHT                       14.0 
##  2 FLOOD                          5.66
##  3 RIVER FLOOD                    5.03
##  4 ICE STORM                      5.02
##  5 HAIL                           3.03
##  6 HURRICANE                      2.74
##  7 HURRICANE/TYPHOON              2.61
##  8 FLASH FLOOD                    1.42
##  9 EXTREME COLD                   1.29
## 10 FROST/FREEZE                   1.09
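
Ranking by combined damage favors property-heavy events, which is why DROUGHT tops the crop table but sits seventh in the combined table. A short worked calculation makes the composition explicit. The figures are copied from the combined table above; the `totals` data frame and the `crop_share` column are names introduced here for illustration.

```r
# Property and crop damage in dollars, copied from the combined table above
totals <- data.frame(
  EVTYPE   = c("FLOOD", "DROUGHT"),
  property = c(144657709807, 1046106000),
  crop     = c(5661968450, 13972566000)
)

# Combined damage and the fraction of it that is crop damage
totals$combined   <- totals$property + totals$crop
totals$crop_share <- totals$crop / totals$combined

print(totals)
```

The combined column reproduces the totals reported above, and the crop share shows that drought losses are overwhelmingly agricultural while flood losses are overwhelmingly property.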

Visualization of Results

# Plot for health impact
health_plot_data <- head(health_impact, 10) %>%
  mutate(EVTYPE = reorder(EVTYPE, total_health_impact))

p1 <- ggplot(health_plot_data, aes(x = EVTYPE, y = total_health_impact)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events Most Harmful to Population Health",
       x = "Event Type", 
       y = "Total Health Impact (Fatalities + Injuries)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, hjust = 0.5))

print(p1)

Figure 1: Top 10 Weather Events Most Harmful to Population Health

# Plot for economic impact  
economic_plot_data <- head(economic_impact, 10) %>%
  mutate(
    EVTYPE = reorder(EVTYPE, total_economic_damage),
    total_economic_damage_billions = total_economic_damage / 1e9
  )

p2 <- ggplot(economic_plot_data, aes(x = EVTYPE, y = total_economic_damage_billions)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 10 Weather Events with Greatest Economic Impact",
       x = "Event Type", 
       y = "Total Economic Damage (Billions USD)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, hjust = 0.5))

print(p2)

Figure 2: Top 10 Weather Events with Greatest Economic Impact

Conclusions

Based on our analysis of the NOAA storm database from 1950-2011:

Population Health Impact:
- Tornadoes are by far the most harmful weather events to population health, causing the most fatalities (5,633) and the most injuries (91,346).
- Excessive Heat ranks second in fatalities, while TSTM Wind (thunderstorm wind) ranks second in injuries.
- The top ten event types by fatalities are: Tornado, Excessive Heat, Flash Flood, Heat, Lightning, TSTM Wind, Flood, Rip Current, High Wind, and Avalanche. Ranked by fatalities and injuries combined, Tornado and Excessive Heat still lead, followed by TSTM Wind, Flood, and Lightning.

Economic Consequences:
- Floods cause the most property damage overall, followed by hurricanes/typhoons and tornadoes.
- Droughts cause the most crop damage, followed by floods and river floods.
- When property and crop damage are combined, floods have the greatest total economic impact.

Recommendations: Public health and emergency management resources should prioritize:
1. Tornado preparedness and warning systems, given their extreme health impact
2. Heat wave preparedness programs, especially for vulnerable populations
3. Flood mitigation and insurance programs, given the massive economic consequences
4. Drought preparedness in agricultural regions to minimize crop losses

These findings emphasize the need for comprehensive weather event preparedness that addresses both the human health and economic dimensions of severe weather impacts.