Synopsis

Severe weather events in the United States have significant impacts on both population health and the economy. This analysis uses the NOAA Storm Database (902,297 recorded events) to identify which event types are most harmful and most costly. We load the raw dataset, convert the damage exponent codes into dollar amounts, and aggregate fatalities, injuries, property damage, and crop damage by event type. Tables and bar charts present the ten highest-ranking event types on each measure for government or municipal managers. Tornadoes cause by far the most combined fatalities and injuries (96,979), while floods cause the greatest economic damage (about $150 billion in combined property and crop damage). These rankings offer a data-driven starting point for prioritizing preparedness resources.

Data Processing

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readr)

# read.csv() decompresses .bz2 files transparently.
# Assumes the data file sits in the current working directory.
storm_data <- read.csv("repdata-data-StormData.csv.bz2")

str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(storm_data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES           INJURIES        
##  Min.   :0.00     Min.   :    0.0   Min.   :  0.00000   Min.   :   0.0000  
##  1st Qu.:0.00     1st Qu.:    0.0   1st Qu.:  0.00000   1st Qu.:   0.0000  
##  Median :1.00     Median :   50.0   Median :  0.00000   Median :   0.0000  
##  Mean   :0.91     Mean   :   46.9   Mean   :  0.01678   Mean   :   0.1557  
##  3rd Qu.:1.00     3rd Qu.:   75.0   3rd Qu.:  0.00000   3rd Qu.:   0.0000  
##  Max.   :5.00     Max.   :22000.0   Max.   :583.00000   Max.   :1700.0000  
##  NA's   :843563                                                            
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
storm_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

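# Convert a damage exponent code into a numeric multiplier.
# K = thousands, M = millions, B = billions (case-insensitive).
# Any other value, including a blank code, falls through to 1,
# so the damage figure is used as recorded.
exp_to_multiplier <- function(exp) {
  case_when(
    exp %in% c("K", "k") ~ 1e3,
    exp %in% c("M", "m") ~ 1e6,
    exp %in% c("B", "b") ~ 1e9,
    TRUE ~ 1
  )
}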

storm_data <- storm_data %>%
  mutate(
    PROPDMGNUM = PROPDMG * exp_to_multiplier(PROPDMGEXP),
    CROPDMGNUM = CROPDMG * exp_to_multiplier(CROPDMGEXP)
  )

storm_data <- storm_data %>%
  mutate(TOTALDMG = PROPDMGNUM + CROPDMGNUM)

In this section, we load the raw NOAA Storm Database file (a bzip2-compressed CSV) directly into R and inspect its structure with str() and summary(). The dataset holds 902,297 observations of 37 variables describing severe weather events, including event type (EVTYPE), human impacts (FATALITIES and INJURIES), and economic impacts (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). We keep only these seven columns. Because property and crop damage are stored as a base value plus an exponent code (“K” for thousands, “M” for millions, “B” for billions), we define a function, exp_to_multiplier(), that converts each code into a numeric multiplier; any other code, including a blank, is treated as a multiplier of 1. We then multiply each damage value by its multiplier to obtain dollar amounts (PROPDMGNUM and CROPDMGNUM) and add the two to get total economic damage (TOTALDMG). All subsequent analyses of population health and economic consequences use these numeric values, derived directly from the raw dataset.
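The conversion described above can be checked by hand on a few rows. The following is a minimal sketch in base R, not the code used in the analysis: the sample values and the names sample_damage, multiplier_lookup, and unmatched are hypothetical, and a named lookup vector stands in for the case_when() rule.

```r
# Hypothetical sample rows; the values are illustrative, not taken from the dataset.
sample_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)

# Same rule as in the analysis: K = 1e3, M = 1e6, B = 1e9 (case-insensitive).
multiplier_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes without a match come back as NA.
multiplier <- unname(multiplier_lookup[toupper(sample_damage$PROPDMGEXP)])

# Unmatched codes (here the blank one) get a multiplier of 1.
multiplier[is.na(multiplier)] <- 1

sample_damage$PROPDMGNUM <- sample_damage$PROPDMG * multiplier

# Flag the rows whose code fell through to the default multiplier.
unmatched <- !(toupper(sample_damage$PROPDMGEXP) %in% names(multiplier_lookup))

print(sample_damage)
print(table(sample_damage$PROPDMGEXP[unmatched]))
```

Running the same kind of tabulation on the full PROPDMGEXP and CROPDMGEXP columns would show how many records rely on the default multiplier of 1.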

Results

1. Events Most Harmful to Population Health

health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total_harm = total_fatalities + total_injuries) %>%
  arrange(desc(total_harm))

top_health_events <- head(health_impact, 10)
top_health_events
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total_harm
##    <chr>                        <dbl>          <dbl>      <dbl>
##  1 TORNADO                       5633          91346      96979
##  2 EXCESSIVE HEAT                1903           6525       8428
##  3 TSTM WIND                      504           6957       7461
##  4 FLOOD                          470           6789       7259
##  5 LIGHTNING                      816           5230       6046
##  6 HEAT                           937           2100       3037
##  7 FLASH FLOOD                    978           1777       2755
##  8 ICE STORM                       89           1975       2064
##  9 THUNDERSTORM WIND              133           1488       1621
## 10 WINTER STORM                   206           1321       1527
ggplot(top_health_events, aes(x = reorder(EVTYPE, -total_harm), y = total_harm)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Top 10 Weather Events Harmful to Population Health",
       x = "Event Type",
       y = "Total Harm (Fatalities + Injuries)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

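To evaluate the impact on human life, we group the data by event type (EVTYPE) and sum the fatalities and injuries for each type. We then define total harm as the sum of fatalities and injuries and sort the event types by it in descending order. Tornadoes dominate: they account for 5,633 fatalities and 91,346 injuries, a combined 96,979, which is more than eleven times the figure for excessive heat in second place (8,428). Excessive heat nevertheless has the second-highest death toll (1,903). Event types are used exactly as recorded, so closely related labels such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, are counted separately. The top ten event types are shown in a bar chart for comparison.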
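The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types. As a hedged sketch of how such labels could be merged before aggregating, the base R snippet below recodes one label and re-sums the totals. It uses only five totals copied from the table; the column EVTYPE_CLEAN and the decision to treat the two labels as the same event are assumptions for illustration, not part of the original analysis.

```r
# Totals copied from the top-ten table above (five of the ten rows).
harm <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND"),
  total_harm = c(96979, 8428, 7461, 7259, 1621)
)

# Normalize case and surrounding whitespace, then apply the assumed recoding.
harm$EVTYPE_CLEAN <- toupper(trimws(harm$EVTYPE))
harm$EVTYPE_CLEAN <- sub("^TSTM WIND$", "THUNDERSTORM WIND", harm$EVTYPE_CLEAN)

# Re-aggregate on the cleaned label and sort by total harm, largest first.
merged <- aggregate(total_harm ~ EVTYPE_CLEAN, data = harm, FUN = sum)
merged <- merged[order(-merged$total_harm), ]

print(merged)
```

With only these five rows, the merged thunderstorm wind total (7,461 + 1,621 = 9,082) exceeds the excessive heat total. The full dataset may contain further label variants, so a ranking based on cleaned labels would need to be recomputed from the raw records.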

2. Events with Greatest Economic Consequences

economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(total_damage = sum(TOTALDMG, na.rm = TRUE)) %>%
  arrange(desc(total_damage))

top_econ_events <- head(economic_impact, 10)
top_econ_events
## # A tibble: 10 × 2
##    EVTYPE             total_damage
##    <chr>                     <dbl>
##  1 FLOOD             150319678257 
##  2 HURRICANE/TYPHOON  71913712800 
##  3 TORNADO            57352114049.
##  4 STORM SURGE        43323541000 
##  5 HAIL               18758221521.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000 
##  8 HURRICANE          14610229010 
##  9 RIVER FLOOD        10148404500 
## 10 ICE STORM           8967041360
ggplot(top_econ_events, aes(x = reorder(EVTYPE, -total_damage), y = total_damage/1e9)) +
  geom_bar(stat = "identity", fill = "tomato") +
  labs(title = "Top 10 Weather Events with Greatest Economic Damage",
       x = "Event Type",
       y = "Total Damage (Billion $)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

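To assess economic impacts, we group the data by event type and sum the total economic damage (TOTALDMG, property plus crop damage) across all events of each type, then sort the sums in descending order. Floods are the costliest event type at about $150.3 billion, followed by hurricanes/typhoons ($71.9 billion), tornadoes ($57.4 billion), and storm surge ($43.3 billion). As with the health results, event types are used as recorded, so HURRICANE/TYPHOON and HURRICANE, or FLOOD, FLASH FLOOD, and RIVER FLOOD, appear as separate categories. The ten costliest event types are plotted in a bar chart, with damage expressed in billions of dollars.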
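The scaling to billions used on the plot's y-axis can be reproduced from the printed table. The sketch below, in base R, converts the ten totals to billions of dollars and computes each event type's share of the top-ten sum. The totals are copied from the table above (values printed with a trailing "." are shown without their fractional part); the names top_damage, damage_billions, and share_of_top10 are hypothetical.

```r
# Total damage in dollars, copied from the top-ten table above.
top_damage <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57352114049,
  "STORM SURGE"       = 43323541000,
  "HAIL"              = 18758221521,
  "FLASH FLOOD"       = 17562129167,
  "DROUGHT"           = 15018672000,
  "HURRICANE"         = 14610229010,
  "RIVER FLOOD"       = 10148404500,
  "ICE STORM"         = 8967041360
)

# Express damage in billions of dollars, as on the plot's y-axis.
damage_billions <- round(top_damage / 1e9, 1)

# Percentage share of the top-ten sum (not of all event types).
share_of_top10 <- round(100 * top_damage / sum(top_damage), 1)

print(data.frame(damage_billions, share_of_top10))
```

The shares are relative to the ten listed event types only, since the totals for the remaining event types are not shown in the table.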