Synopsis

Severe weather events pose a significant risk to both public health and economic stability, frequently causing fatalities, injuries, and substantial property damage. Given the magnitude of these impacts, it is essential to evaluate systematically which types of events pose the greatest threats. This study analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, a comprehensive record of major storms and weather events across the United States. The database records when and where such events occurred and what their consequences were, including estimates of fatalities, injuries, and property and crop damage. The analysis addresses two questions: which event types are most harmful to population health, and which have the greatest economic consequences nationwide. The data set is analyzed in R with the dplyr, ggplot2, and gridExtra packages.

Data Processing

Our analysis begins with loading the data set and conducting an initial examination of its contents.

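library(dplyr)      # %>%, mutate(), group_by(), summarize(), arrange()
library(ggplot2)    # ggplot()
library(gridExtra)  # grid.arrange()
data <- read.csv("repdata_data_StormData.csv.bz2", header=TRUE,
            na.strings=c("NA", ""), stringsAsFactors = FALSE)
str(data)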
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

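The variables PROPDMGEXP and CROPDMGEXP encode the magnitude of the PROPDMG and CROPDMG values: H for hundreds, K for thousands, M for millions, and B for billions. Following the guidelines in the corresponding documentation, each damage value is multiplied by the factor its code indicates, and the resulting property and crop damage are summed to give the totaldmg variable, which represents the overall economic impact.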

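data <- data %>%
    # Scale each damage value by its exponent code; other codes give NA
    mutate(totalpropdmg = ifelse(PROPDMGEXP == "H", PROPDMG * 100,
        ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
        ifelse(PROPDMGEXP == "M", PROPDMG * 1e6,
        ifelse(PROPDMGEXP == "B", PROPDMG * 1e9, NA_real_))))) %>%
    mutate(totalcropdmg = ifelse(CROPDMGEXP == "H", CROPDMG * 100,
        ifelse(CROPDMGEXP == "K", CROPDMG * 1e3,
        ifelse(CROPDMGEXP == "M", CROPDMG * 1e6,
        ifelse(CROPDMGEXP == "B", CROPDMG * 1e9, NA_real_))))) %>%
    # Count a missing component as zero so that an event with only property
    # or only crop damage still contributes to the total
    mutate(totaldmg = coalesce(totalpropdmg, 0) + coalesce(totalcropdmg, 0))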
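
The nested ifelse() calls can also be written as a lookup table, which keeps the mapping from exponent code to multiplier in one place. The following is a minimal sketch on made-up toy data, not the code used for the results in this report. The names exp_multiplier and damage_in_dollars are hypothetical, and both the upper-casing of codes and the treatment of unrecognized or missing codes as zero damage are assumptions.

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars
damage_in_dollars <- function(value, exp_code) {
    # Upper-casing folds codes such as "k" into "K" (an assumption about intent)
    multiplier <- unname(exp_multiplier[toupper(exp_code)])
    # Codes missing from the table, including NA, are counted as zero damage
    multiplier[is.na(multiplier)] <- 0
    value * multiplier
}

# Toy data with made-up values, for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                  PROPDMGEXP = c("K", "M", NA, "b"),
                  stringsAsFactors = FALSE)
toy$totalpropdmg <- damage_in_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to CROPDMG and CROPDMGEXP, so the two damage columns would share one definition of the codes.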

Results

Which types of events are most harmful?

To assess which event types pose the greatest threat to population health, we examine their impact in terms of fatalities and injuries. Specifically, we total fatalities and injuries by event type across the whole database and consider the five event types with the highest total fatalities and, separately, the five with the highest total injuries.

top_n <- 5

# Five event types with the highest total fatalities
top_fatalities <- data %>%
    group_by(EVTYPE) %>%
    summarize(fatalities=sum(FATALITIES, na.rm = TRUE)) %>%
    arrange(desc(fatalities)) %>%
    head(top_n)

# scale_x_discrete(limits = ...) keeps the bars in ranked order
g1 <- ggplot(top_fatalities, aes(x=EVTYPE, y=fatalities)) +
    geom_bar(stat="identity") +
    xlab("Event Type") + ylab("Total number of fatalities") +
    scale_x_discrete(limits = top_fatalities$EVTYPE) +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# Five event types with the highest total injuries
top_injuries <- data %>%
    group_by(EVTYPE) %>%
    summarize(injuries=sum(INJURIES, na.rm = TRUE)) %>%
    arrange(desc(injuries)) %>%
    head(top_n)

g2 <- ggplot(top_injuries, aes(x=EVTYPE, y=injuries)) +
    geom_bar(stat="identity") +
    xlab("Event Type") + ylab("Total number of injuries") +
    scale_x_discrete(limits = top_injuries$EVTYPE) +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

# Show the two charts side by side
grid.arrange(g1, g2, ncol=2)

The results indicate that tornadoes account for the largest number of fatalities, followed by excessive heat and flash floods. Tornadoes also cause the most injuries, with thunderstorm winds ranking second and floods third.
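
The fatality and injury rankings follow the same steps: total a column by event type, sort in descending order, and keep the first few rows. As an illustration only, those steps could be wrapped in a small base R helper. The function name top_events and the toy data below are hypothetical and are not part of the analysis reported here.

```r
# Hypothetical helper: total one numeric column per event type and return
# the n event types with the largest totals, in descending order.
top_events <- function(df, column, n = 5) {
    # One total per EVTYPE; missing values are ignored
    totals <- sapply(split(df[[column]], df$EVTYPE), sum, na.rm = TRUE)
    head(sort(totals, decreasing = TRUE), n)
}

# Toy data with made-up counts, for illustration only
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
    FATALITIES = c(3, 1, 2, 4, 0, NA),
    stringsAsFactors = FALSE
)
top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

Passing the column name as a string means the same function would serve for FATALITIES and INJURIES alike.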

Which types of events have the greatest economic consequences?

To evaluate which event types have the greatest economic consequences, we assess their effects in terms of combined property and crop damage. The analysis focuses on the five event types with the highest total damage.

top_n <- 5

# Five event types with the highest combined property and crop damage
top_dmg <- data %>%
    group_by(EVTYPE) %>%
    summarize(totaldmg=sum(totaldmg, na.rm = TRUE)) %>%
    arrange(desc(totaldmg)) %>%
    head(top_n)

g <- ggplot(top_dmg, aes(x=EVTYPE, y=totaldmg)) +
    geom_bar(stat="identity") +
    xlab("Event Type") + ylab("Total damage (USD)") +
    scale_x_discrete(limits = top_dmg$EVTYPE) +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
print(g)

Regarding economic consequences, the findings reveal that floods cause the greatest financial losses, followed by hurricanes/typhoons, which the database records as a single event type, with tornadoes in third position.

Conclusion

Our analysis demonstrates that tornadoes represent the most significant threat to population health, causing the highest number of both fatalities and injuries. Other severe events, such as excessive heat, flash floods, and thunderstorm winds, also contribute substantially to adverse health outcomes. From an economic perspective, floods cause the greatest financial losses, followed by hurricanes/typhoons, with tornadoes ranking third. These findings highlight how differently natural hazards affect human health and economic stability, and they point to the need for targeted risk mitigation and preparedness strategies.