# Synopsis

This analysis uses the raw NOAA Storm Data CSV to identify which event types (EVTYPE) are most harmful to population health (injuries + fatalities) and which have the greatest economic consequences (property + crop damage). The analysis downloads the raw compressed CSV, performs conservative cleaning of event-type strings, translates damage exponent codes into numeric multipliers, and aggregates impacts by event type. Results are presented as ranked tables and two figures showing the top-10 event types by human impact and by economic cost. All code is shown to ensure reproducibility; heavy steps are cached where indicated. The output is suitable for a government or municipal manager who must prioritise resources for different severe-weather events.

# Data Processing

```{r setup, echo=TRUE, message=FALSE}
# packages
library(dplyr)
library(ggplot2)
library(readr)
library(stringr)
library(scales)
```

## Downloading and reading the raw data

The analysis starts from the raw CSV file, compressed as .bz2. The file is downloaded only if it is not already present, and `read.csv()` reads the compressed file directly. The chunk is cached (`cache=TRUE`) so that the slow read is not repeated during iterative editing.

```{r get-data, echo=TRUE, cache=TRUE}
# URL for the raw StormData CSV (Coursera / Johns Hopkins dataset mirror)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (!file.exists("StormData.csv.bz2")) {
  download.file(url, destfile = "StormData.csv.bz2", mode = "wb")
}

# read directly from the compressed file
storm <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

# quick look
dim(storm)
str(storm[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
```



## Cleaning and feature engineering

We will:

- Standardise EVTYPE to uppercase and trim whitespace.
- Create a `HEALTH` metric = FATALITIES + INJURIES.
- Convert PROPDMG/CROPDMG and their exponent codes (PROPDMGEXP/CROPDMGEXP) into US dollars, and sum them into `ECONOMIC_DMG`.
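As a hand-checkable illustration of the third step, the sketch below applies the same multiplier rules as `exp_to_mult()` to six invented records. It restates the rules as a named lookup vector so it runs on its own; `mult_lookup`, `toy_mult` and `toy` are hypothetical illustration names, not part of the analysis, and it is shown with a plain `r` fence so it is displayed rather than run when the report is knitted.

```r
# Hypothetical stand-alone restatement of the multiplier rules
# (mult_lookup, toy_mult and toy are illustration-only names).
mult_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

toy_mult <- function(code) {
  code <- toupper(trimws(ifelse(is.na(code), "", code)))
  out <- unname(mult_lookup[code])              # NA for anything that is not H/K/M/B
  digits <- grepl("^[0-9]+$", code)
  out[digits] <- 10^as.numeric(code[digits])    # numeric exponent: "2" => 10^2
  out[is.na(out)] <- 1                          # "", "-", "+", "?" and others => 1
  out
}

# invented records: 25K, 2.5M, 1.5B, no code, exponent "2", nuisance "?"
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.5, 40, 3, 10),
                  PROPDMGEXP = c("K", "M", "B", "", "2", "?"),
                  stringsAsFactors = FALSE)
toy$PROP_DMG_USD <- toy$PROPDMG * toy_mult(toy$PROPDMGEXP)
toy
# PROP_DMG_USD: 25000, 2500000, 1.5e9, 40, 300, 10
```

Checking a few rows like this by hand is a cheap way to confirm that the multiplier convention behaves as intended before it is applied to all records.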

```{r clean, echo=TRUE}
# normalise EVTYPE, compute the health metric and tidy the damage columns
storm <- storm %>%
  mutate(EVTYPE = str_trim(str_to_upper(EVTYPE)),
         HEALTH = FATALITIES + INJURIES,
         PROPDMG = as.numeric(PROPDMG),
         CROPDMG = as.numeric(CROPDMG),
         PROPDMGEXP = str_trim(toupper(as.character(PROPDMGEXP))),
         CROPDMGEXP = str_trim(toupper(as.character(CROPDMGEXP))))

# function to translate exponent codes to multipliers (vectorised)
exp_to_mult <- function(e) {
  e <- str_trim(toupper(ifelse(is.na(e), "", e)))
  # default: "", "-", "+", "?" and other nuisance codes -> 1
  mult <- rep(1, length(e))
  mult[e == "H"] <- 1e2
  mult[e == "K"] <- 1e3
  mult[e == "M"] <- 1e6
  mult[e == "B"] <- 1e9
  # numeric exponent like '2' => 10^2 ("0" gives 10^0 = 1)
  digits <- grepl("^[0-9]+$", e)
  mult[digits] <- 10^as.numeric(e[digits])
  mult
}

# apply multipliers
storm <- storm %>%
  mutate(PROP_MULT = exp_to_mult(PROPDMGEXP),
         CROP_MULT = exp_to_mult(CROPDMGEXP),
         PROP_DMG_USD = PROPDMG * PROP_MULT,
         CROP_DMG_USD = CROPDMG * CROP_MULT,
         ECONOMIC_DMG = PROP_DMG_USD + CROP_DMG_USD)

# quick checks
summary(storm$HEALTH)
summary(storm$ECONOMIC_DMG)
```

# Results

We aggregate by EVTYPE, rank event types by total human impact and by total economic damage, and report the top 10 for each measure.

```{r summarize, echo=TRUE}
# aggregate by event type
by_event <- storm %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES, na.rm = TRUE),
            TOTAL_INJURIES   = sum(INJURIES, na.rm = TRUE),
            TOTAL_HEALTH     = sum(HEALTH, na.rm = TRUE),
            TOTAL_PROP       = sum(PROP_DMG_USD, na.rm = TRUE),
            TOTAL_CROP       = sum(CROP_DMG_USD, na.rm = TRUE),
            TOTAL_ECON       = sum(ECONOMIC_DMG, na.rm = TRUE)) %>%
  ungroup()

# top 10 by health
top_health <- by_event %>% arrange(desc(TOTAL_HEALTH)) %>% slice_head(n = 10)

# top 10 by economic cost
top_econ <- by_event %>% arrange(desc(TOTAL_ECON)) %>% slice_head(n = 10)

# show tables
knitr::kable(top_health, caption = "Top 10 event types by human impact (injuries + fatalities)")

knitr::kable(top_econ, caption = "Top 10 event types by economic impact (property + crop damage in USD)")
```



## Figures

This section contains two figures: (1) a bar chart of the top-10 event types by human impact and (2) a bar chart of the top-10 event types by economic cost. Two figures is within the required range of one to three.

```{r fig-health, echo=TRUE, fig.width=8, fig.height=5}
# Figure 1: human impact
p1 <- ggplot(top_health, aes(x = reorder(EVTYPE, TOTAL_HEALTH), y = TOTAL_HEALTH)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Event Types by Human Impact",
       x = "Event Type",
       y = "Total people affected (injuries + fatalities)") +
  scale_y_continuous(labels = comma)

print(p1)
```

```{r fig-econ, echo=TRUE, fig.width=8, fig.height=5}
# Figure 2: economic impact
p2 <- ggplot(top_econ, aes(x = reorder(EVTYPE, TOTAL_ECON), y = TOTAL_ECON)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Event Types by Economic Cost (USD)",
       x = "Event Type",
       y = "Total damage (USD)") +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$", scale = 1, big.mark = ","))

print(p2)
```



# Short interpretation of results

```{r interpret, echo=TRUE}
# top 5 event types by each measure, for the interpretation below
top_health %>% select(EVTYPE, TOTAL_HEALTH) %>% slice_head(n = 5)

top_econ %>% select(EVTYPE, TOTAL_ECON) %>% slice_head(n = 5)
```

The tables and figures above rank event types by total human impact and total economic cost. When presenting these results to a municipal manager, report both frequency and per-event severity: totals alone do not distinguish event types that cause many small injuries across many events from those that cause extreme damage in a few events.
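One way to separate frequency from severity is to report the number of events and the per-event mean and maximum alongside each total. The sketch below does this on a small invented data frame; `toy_storm`, `severity`, `N_EVENTS`, `MEAN_HEALTH` and `MAX_HEALTH` are hypothetical names, not part of the analysis above.

```r
# Hypothetical toy data: type "A" has many small events, type "B" one large event.
# In the report the same pipeline could run on `storm` (EVTYPE, HEALTH columns).
library(dplyr)

toy_storm <- data.frame(
  EVTYPE = c("A", "A", "A", "A", "A", "A", "B"),
  HEALTH = c(5, 8, 6, 9, 7, 5, 38),
  stringsAsFactors = FALSE
)

severity <- toy_storm %>%
  group_by(EVTYPE) %>%
  summarise(N_EVENTS     = n(),              # frequency: how many events
            TOTAL_HEALTH = sum(HEALTH),      # overall burden (what the report ranks on)
            MEAN_HEALTH  = mean(HEALTH),     # typical per-event severity
            MAX_HEALTH   = max(HEALTH)) %>%  # worst single event
  arrange(desc(TOTAL_HEALTH))

severity
# "A" ranks first on TOTAL_HEALTH (40 vs 38), but "B" is far worse per event.
```

Substituting `storm` for `toy_storm` would give per-event figures for the real data, and the same pattern applies to `ECONOMIC_DMG`.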

# Reproducibility and publishing

# Appendix: notes on EVTYPE quality

EVTYPE values in the NOAA Storm Database are known to be messy (typos, and variants such as “HURRICANE/TYPHOON” vs “HURRICANE”). For programmatic prioritisation you may wish to apply a second-stage mapping that groups synonyms, for example grouping all hurricane/tropical cyclone variants together, or grouping “FLASH FLOOD” with “FLOOD”, depending on your policy. That more advanced cleaning was not performed here, to keep the analysis minimal and free of policy-dependent grouping decisions.
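A second-stage mapping of this kind could be sketched with `dplyr::case_when()` and regular-expression matching. The function name `group_evtype()`, the patterns and the group labels below are hypothetical policy choices for illustration, not an official NOAA category list; because `case_when()` uses the first matching condition, the order of the rules matters.

```r
# Hypothetical second-stage mapping of EVTYPE variants to broader groups.
# Patterns and labels are illustrative policy choices, not an official list.
library(dplyr)

group_evtype <- function(evtype) {
  case_when(
    grepl("HURRICANE|TYPHOON|TROPICAL STORM", evtype) ~ "HURRICANE/TROPICAL",
    grepl("TSTM|THUNDERSTORM", evtype)                ~ "THUNDERSTORM WIND",
    grepl("FLOOD", evtype)                            ~ "FLOOD (ALL)",  # merges FLASH FLOOD and FLOOD
    TRUE                                              ~ evtype          # leave everything else unchanged
  )
}

group_evtype(c("HURRICANE/TYPHOON", "HURRICANE", "TSTM WIND",
               "FLASH FLOOD", "FLOOD", "TORNADO"))
```

Under these assumptions, the groups would be added with `mutate(EVGROUP = group_evtype(EVTYPE))` and the aggregation repeated by `EVGROUP` instead of `EVTYPE`.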


End of analysis.