Synopsis

This report analyzes the U.S. National Weather Service Storm Data to identify the event types that are most harmful to population health and that have the greatest economic consequences. The data were processed to trim leading and trailing spaces from the event types and to consolidate the many event types into a smaller set of categories. The analysis indicates that tornadoes cause the highest number of casualties (both fatalities and injuries), while floods and hurricanes have the largest economic impact.

# download the file
download_link<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

filename<-"StormData.csv.bz2"

if(!file.exists(filename))
{
        download.file(download_link,filename,method = "auto")
}

storm_data<-read.csv(bzfile(filename))
        
library(dplyr)
library(ggplot2)

# working on a reduced data set
working_dataset<-storm_data[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Data processing

Event types are converted to upper case, stripped of leading and trailing spaces, and then grouped into a smaller set of categories. Event types that match none of the patterns keep their original label.

# code to clean up the working dataset
working_dataset$EVTYPE<-toupper(trimws(working_dataset$EVTYPE))

working_dataset <-working_dataset %>%
        mutate(event_category= case_when(
                grepl("TORNADO", EVTYPE)~"TORNADO",
                grepl("TSTM|THUNDERSTORM", EVTYPE) ~ "THUNDERSTORM",
                grepl("FLASH FLOOD", EVTYPE) ~ "FLASH FLOOD",
                grepl("FLOOD", EVTYPE) ~ "FLOOD",
                grepl("EXCESSIVE HEAT|EXTREME HEAT|RECORD HEAT|HEAT WAVE", EVTYPE) ~ "EXCESSIVE HEAT",
                grepl("^HEAT$", EVTYPE) ~ "HEAT",
                grepl("EXTREME COLD|RECORD COLD|WIND CHILL", EVTYPE) ~ "EXTREME COLD/WIND CHILL",
                grepl("COLD", EVTYPE) ~ "COLD/WIND CHILL",
                grepl("WINTER STORM", EVTYPE) ~ "WINTER STORM",
                grepl("HEAVY SNOW", EVTYPE) ~ "HEAVY SNOW",
                grepl("SNOW", EVTYPE) ~ "WINTER WEATHER",
                grepl("HAIL", EVTYPE) ~ "HAIL",
                grepl("HURRICANE|TYPHOON", EVTYPE) ~ "HURRICANE (TYPHOON)",
                grepl("LIGHTNING|LIGNTNING|LIGHTING", EVTYPE) ~ "LIGHTNING",
                grepl("WILDFIRE|FOREST FIRE|WILD FIRES?", EVTYPE) ~ "WILDFIRE",
                TRUE ~ EVTYPE
        ))
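
Because case_when() returns the value of the first condition that matches, the order of the patterns matters: "FLASH FLOOD" has to be tested before the more general "FLOOD", and "HEAVY SNOW" before "SNOW". A minimal sketch of that behaviour on a few made-up labels (the vector sample_events is hypothetical and not taken from the dataset):

```r
library(dplyr)

# hypothetical labels, for illustration only
sample_events <- c("FLASH FLOODING", "RIVER FLOOD", "HEAVY SNOW",
                   "SNOW SQUALL", "DUST DEVIL")

sample_category <- case_when(
        grepl("FLASH FLOOD", sample_events) ~ "FLASH FLOOD",  # specific pattern first
        grepl("FLOOD", sample_events) ~ "FLOOD",              # general pattern second
        grepl("HEAVY SNOW", sample_events) ~ "HEAVY SNOW",
        grepl("SNOW", sample_events) ~ "WINTER WEATHER",
        TRUE ~ sample_events                                  # unmatched labels are kept as-is
)

print(data.frame(sample_events, sample_category))
```

If the "FLOOD" line were placed first, "FLASH FLOODING" would be absorbed into the FLOOD category.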

The event type field in the data downloaded from the NOAA Storm Database is recorded inconsistently and needed cleaning. The effect of the cleanup on the number of distinct values is:

EVTYPE values before consolidation: 985
EVTYPE values after consolidation: 890
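
One way such before-and-after counts could be computed is to count the distinct labels on each side of the cleanup. The sketch below does this on a toy vector (raw_labels is made up; on the real data the same calls would be applied to the raw and the cleaned event type columns):

```r
# made-up labels with inconsistent case and spacing
raw_labels <- c("Tornado", "TORNADO ", " tornado", "Hail", "HAIL")

# same cleanup as in the report: trim spaces, then upper-case
clean_labels <- toupper(trimws(raw_labels))

n_before <- length(unique(raw_labels))    # every spelling variant counts separately
n_after  <- length(unique(clean_labels))  # variants collapse after cleaning

cat("before:", n_before, "after:", n_after, "\n")
```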

Part 1: Which events have the greatest impact on population health

# code to aggregate fatalities and injuries by event type
aggregated_casualties<-working_dataset %>%
        group_by(event_category) %>%
        summarise(
                total_fatalities=sum(FATALITIES, na.rm = TRUE),
                total_injuries=sum(INJURIES, na.rm = TRUE),
                total_casualties=total_fatalities+total_injuries
                )

# code to order them by the most impactful
aggregated_casualties<-aggregated_casualties %>%
                        arrange(desc(total_casualties))

# extracting the top
top_events<-aggregated_casualties %>%
                slice_head(n=10)

# code to plot the data
ggplot(top_events,aes(x=reorder(event_category, -total_casualties), y=total_casualties))+
        geom_col()+
        scale_y_continuous(labels = scales::comma)+
        labs(title = "Top 10 Event Categories by Health Impact",
       x = "Event Category",
       y = "Total Casualties (Fatalities + Injuries)")+
        theme(
                axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1, color = "blue"),
                axis.text.y =  element_text(color="blue")
                )

Result of analysis on population health (both fatalities and injuries)

The bar chart above shows that, in the NOAA Storm Database, Tornadoes, Thunderstorms and Heat are the top 3 event categories by total casualties.

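# top 10 by fatalities, selected from the full aggregate rather than from
# top_events, which is ranked by total casualties
top_fatal_events<-aggregated_casualties %>%
                slice_max(total_fatalities, n=10, with_ties=FALSE)

# code to plot the data
ggplot(top_fatal_events,aes(x=reorder(event_category, -total_fatalities), y=total_fatalities))+
        geom_col()+
        scale_y_continuous(labels = scales::comma)+
        labs(title = "Top 10 Event Categories by Fatalities",
       x = "Event Category",
       y = "Total Fatalities")+
        theme(
                axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1, color="blue"),
                axis.text.y= element_text(color="blue"))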

Result of analysis of impact on fatalities alone

Fatalities carry more weight than injuries, so it is worth looking at them separately. The bar plot of fatalities shows that Tornadoes, Heat and Flash Floods cause the most fatalities.

Part 2: Which events have the greatest economic impact

Data processing

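PROPDMGEXP and CROPDMGEXP hold exponent codes for the property and crop damage amounts. They are mapped here to numeric multipliers, and each damage amount is multiplied by its multiplier to obtain the damage in dollars. Blank or unrecognized codes are given a multiplier of 1.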
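
As a worked illustration of the conversion, a PROPDMG of 25 with exponent "K" becomes 25 * 1000 = 25,000 dollars. The sketch below applies the same idea to a few made-up records (toy and toy_map are hypothetical names, and toy_map covers only a subset of the codes):

```r
# hypothetical records: damage magnitude plus exponent code
toy <- data.frame(
        PROPDMG = c(25, 2.5, 10, 7),
        PROPDMGEXP = c("K", "m", "", "B"),
        stringsAsFactors = FALSE
)

# subset of the multiplier table, for illustration
toy_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# look up each code after upper-casing; codes missing from the table give NA
multiplier <- unname(toy_map[toupper(toy$PROPDMGEXP)])
multiplier[is.na(multiplier)] <- 1   # blank or unknown codes: multiplier of 1

toy$property_damage <- toy$PROPDMG * multiplier
print(toy)
```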

# code to convert Property and Crop Damage exponents
exp_map <- c(
  "-" = 1,
  "+" = 1,
  "?" = 1,
  "0" = 1,
  "1" = 10,
  "2" = 100,
  "3" = 1000,
  "4" = 10000,
  "5" = 100000,
  "6" = 1000000,
  "7" = 10000000,
  "8" = 100000000,
  "9" = 1000000000,
  "H" = 100,
  "K" = 1000,
  "M" = 1000000,
  "B" = 1000000000
)

# blank or unrecognized exponents are not in exp_map: an empty string never
# matches a name in a lookup, so these rows return NA and fall back to a
# multiplier of 1 below

working_dataset<-working_dataset%>%
        mutate(
                PROPDMGEXP = toupper(PROPDMGEXP),
                CROPDMGEXP = toupper(CROPDMGEXP),
                prop_multiplier = unname(exp_map[PROPDMGEXP]),
                crop_multiplier = unname(exp_map[CROPDMGEXP]),
                prop_multiplier = ifelse(is.na(prop_multiplier), 1, prop_multiplier),
                crop_multiplier = ifelse(is.na(crop_multiplier), 1, crop_multiplier),
                property_damage = PROPDMG * prop_multiplier,
                crop_damage = CROPDMG * crop_multiplier
               )

# code to aggregate property and crop damage by event type
agg_economic_consq<-working_dataset %>%
        group_by(event_category) %>%
        summarise(
                total_crop_damage=sum(crop_damage, na.rm=TRUE),
                total_prop_damage=sum(property_damage, na.rm=TRUE),
                total_economic_damage=total_crop_damage+total_prop_damage
                )

# code to order them by the most impactful
agg_economic_consq<-agg_economic_consq %>%
                        arrange(desc(total_economic_damage))

# extracting the top
top_eco_events<-agg_economic_consq %>%
                slice_head(n=10)

# code to plot the data
ggplot(top_eco_events,aes(x=reorder(event_category, -total_economic_damage), y=total_economic_damage/1000000))+
        geom_col()+
        scale_y_continuous(labels = scales::comma)+
        labs(title = "Top 10 Event Categories by Economic Impact",
       x = "Event Category",
       y = "Total Economic Impact (in Millions)")+
        theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1, color="blue"),
              axis.text.y = element_text(color="blue")
              )

Result of analysis of economic impact of Events

From the economic perspective, Floods, Hurricanes and Tornadoes cause the most damage (property and crop damage combined).

Conclusion

Tornadoes cause the most fatalities and injuries, while Floods cause the greatest economic damage. Heat does not rank among the top causes of economic damage, but it is one of the leading causes of fatalities, so focusing on it can save lives. Better preparedness for Tornadoes and Floods can reduce both casualties and the impact on the economy.