---
title: "NOAA Storm Data Analysis"
author: "Giridhar"
output: html_document
---
# Synopsis
This analysis explores the NOAA Storm Database to identify which weather event types are most harmful to population health and which cause the greatest economic damage. The data covers events from 1950 to 2011. To assess health impact, we summed fatalities and injuries per event type. For economic impact, we converted the magnitude codes (K, M, B) attached to the property and crop damage figures into numeric multipliers and summed the resulting dollar amounts. Tornadoes are the most harmful to population health, while floods cause the greatest economic loss.
# Data Processing
First, we load the required libraries and download the dataset directly from the source.
``` r
library(dplyr)
library(ggplot2)
# Download the raw data once, then read it (read.csv handles .bz2 directly)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormData.csv.bz2")) {
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(url, "stormData.csv.bz2", mode = "wb")
}
stormData <- read.csv("stormData.csv.bz2")
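```

Next, we process the data. A critical step is converting the PROPDMGEXP and CROPDMGEXP columns, which hold magnitude codes such as K (thousands), M (millions), and B (billions), into numeric multipliers so we can calculate total damage in dollars. Any other code is treated as a multiplier of 1.

``` r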
# Convert a single magnitude code to its numeric multiplier.
# Matching is case-insensitive; any code other than K, M, or B maps to 1.
convert_exp <- function(x) {
  x <- toupper(as.character(x))
  if (x == "K") return(1000)
  if (x == "M") return(1000000)
  if (x == "B") return(1000000000)
  return(1)
}
# Apply conversion to multipliers
stormData$prop_mult <- sapply(stormData$PROPDMGEXP, convert_exp)
stormData$crop_mult <- sapply(stormData$CROPDMGEXP, convert_exp)
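# Optional sketch, not part of the original analysis: the same K/M/B mapping
# written as one vectorized table lookup instead of a per-row sapply() call.
# The name convert_exp_vec is hypothetical. Codes other than K, M, and B are
# assumed to mean "no multiplier", matching convert_exp above.
convert_exp_vec <- function(x) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(x))])
  mult[is.na(mult)] <- 1  # blank or unrecognized codes fall back to 1
  mult
}
# Silent self-check on a toy vector (passes without printing anything)
stopifnot(identical(convert_exp_vec(c("K", "m", "B", "", "5")),
                    c(1e3, 1e6, 1e9, 1, 1)))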
# Create combined health and economic variables
processed_data <- stormData %>%
  mutate(Total_Health = FATALITIES + INJURIES,
         Total_Econ = (PROPDMG * prop_mult) + (CROPDMG * crop_mult)) %>%
  select(EVTYPE, Total_Health, Total_Econ)
```

# Results

We grouped the data by event type and summed fatalities and injuries to find the ten event types with the highest combined casualty counts.

``` r
# Total fatalities plus injuries per event type, top 10
health_summary <- processed_data %>%
  group_by(EVTYPE) %>%
  summarize(Sum_Health = sum(Total_Health)) %>%
  arrange(desc(Sum_Health)) %>%
  head(10)

ggplot(health_summary, aes(x = reorder(EVTYPE, -Sum_Health), y = Sum_Health)) +
  geom_bar(stat = "identity", fill = "indianred") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events Harmful to Health",
       x = "Event Type", y = "Total Injuries and Fatalities")
```

We summed property and crop damage per event type. The y-axis is shown in billions of dollars for readability.

``` r
# Total property plus crop damage per event type, top 10
econ_summary <- processed_data %>%
  group_by(EVTYPE) %>%
  summarize(Sum_Econ = sum(Total_Econ)) %>%
  arrange(desc(Sum_Econ)) %>%
  head(10)

# Divide by 1e9 so the y-axis reads in billions of USD
ggplot(econ_summary, aes(x = reorder(EVTYPE, -Sum_Econ), y = Sum_Econ / 1e9)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events with Greatest Economic Impact",
       x = "Event Type", y = "Total Damage (Billions of USD)")
```
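
Both summaries follow the same pattern: sum a measure per event type, sort in descending order, and keep the top rows. As a minimal sketch of that pattern, the block below applies it to a small made-up data frame using base R only, so it runs without extra packages. The `toy` data and its values are invented for illustration and are not taken from the NOAA dataset.

``` r
# Toy stand-in for processed_data (values are made up for illustration)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  Total_Health = c(10, 2, 5, 7, 1)
)

# Sum per event type, sort descending, keep the top 2
totals <- tapply(toy$Total_Health, toy$EVTYPE, sum)
top2 <- head(sort(totals, decreasing = TRUE), 2)
top2
```

Swapping `Total_Health` for an economic column and `2` for `10` gives the shape of the ranking used for the plots, under the assumption that the input has one row per recorded event.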