Severe weather events can have a significant impact on public health
and the economy. Understanding which events pose the greatest risks can
help policymakers and emergency responders better prepare for and
respond to these events.
This analysis attempts to do just
that, by exploring the public health and economic consequences of
severe weather events in the United States using the NOAA Storm
Database.
This analysis focuses on two main questions:
Which events are most harmful to population health?
Which events have the greatest economic consequences?
The data is processed and analyzed using R, with visualizations
created to illustrate the findings. The results show which types of
severe weather events pose the greatest risks to public safety and
economic stability, providing valuable insights for emergency
preparedness and resource allocation.
Before we start the analysis, we need to import the necessary libraries for data processing and visualization.
library(data.table)
We first download the data from the NOAA Storm Database.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "repdata_data_StormData.csv.bz2")
Once the download completes, we can load the data into our session.
data <- fread("repdata_data_StormData.csv.bz2")
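As written, the file is downloaded again every time the report is re-run. One way to avoid that, sketched here with a hypothetical helper named fetch_if_missing (not part of the original analysis), is to download only when the file is not already on disk:

```r
# Hypothetical helper (not in the original analysis): download a file only
# when it is not already present at destfile.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the compressed file in binary mode
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Example use with the file name from above (commented out so that the
# sketch does not trigger a download by itself):
# fetch_if_missing(url, "repdata_data_StormData.csv.bz2")
```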
Let us look at the data structure to understand the columns and values.
str(data)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
That’s clearly way too many variables. Luckily, we are only interested in a few of them. Let’s extract the relevant columns and clean the data.
## Cleaning The Data
Since we are only interested in the economic and health impact of the events, we will extract the relevant columns for analysis. The relevant columns are:
EVTYPE: Event Type
FATALITIES: Number of Fatalities
INJURIES: Number of Injuries
PROPDMG & PROPDMGEXP: Property Damage
CROPDMG & CROPDMGEXP: Crop Damage
cleaned_data <- data[, .(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP, CROPDMGEXP)]
Since some of the data is missing, we will remove rows with missing values to ensure the accuracy of our analysis.
cleaned_data <- cleaned_data[complete.cases(cleaned_data), ]
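Note that complete.cases() only drops rows that contain NA. The empty strings visible in the str() output above (for example in CROPDMGEXP) are ordinary values rather than NA, so those rows are kept. A small sketch on made-up rows (the toy table is an assumption for illustration, not NOAA data) shows the difference:

```r
library(data.table)

# Made-up rows for illustration only
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, NA, 2.5),
  PROPDMGEXP = c("K", "M", "")
)

# Count missing values per column before dropping anything
na_counts <- colSums(is.na(toy))

# Only the row with NA in PROPDMG is removed; the row with "" survives
toy_complete <- toy[complete.cases(toy), ]
```

Running the same colSums(is.na(...)) check on cleaned_data before filtering would show how many rows the step actually removes.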
Observe that the property and crop damage columns each have a companion
multiplier column (PROPDMGEXP and CROPDMGEXP)
that indicates the order of magnitude of the damage value:
K for thousands, M for millions, and B for billions.
So, the true property damage is PROPDMG multiplied by the factor
indicated by PROPDMGEXP, and the same is true for crop
damage. Any other code is treated as a multiplier of 1.
We will adjust the damage values accordingly.
# Map an exponent code to its numeric multiplier; unrecognized codes map to 1
convert_exp <- function(exp) {
  if (exp %in% c("K", "k")) return(1e3)
  if (exp %in% c("M", "m")) return(1e6)
  if (exp %in% c("B", "b")) return(1e9)
  return(1)
}
cleaned_data[, PROPDMG := PROPDMG * sapply(PROPDMGEXP, convert_exp)]
cleaned_data[, CROPDMG := CROPDMG * sapply(CROPDMGEXP, convert_exp)]
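Calling convert_exp once per row through sapply() works, but it can be slow on roughly 900,000 rows. As a sketch of an alternative, assuming the same mapping (K, M, B in either case, anything else treated as 1), a named vector can do the lookup for a whole column at once. The name exp_multiplier is hypothetical:

```r
# Hypothetical vectorized counterpart to convert_exp: same mapping,
# applied to a whole character vector in one step.
exp_multiplier <- function(exp) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(exp)]   # unmatched codes (including "") give NA
  mult[is.na(mult)] <- 1         # treat every other code as a factor of 1
  unname(mult)
}

example_codes <- c("K", "m", "B", "", "5")
example_mult  <- exp_multiplier(example_codes)
```

It would be used in place of the sapply() calls, not in addition to them, for example cleaned_data[, PROPDMG := PROPDMG * exp_multiplier(PROPDMGEXP)].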
We are now ready to analyze the data and answer the two main questions we asked at the beginning.
First, let us total the fatalities and injuries for each event type.
fatalities_data <- cleaned_data[, .(FATALITIES = sum(FATALITIES)), by = EVTYPE]
injuries_data <- cleaned_data[, .(INJURIES = sum(INJURIES)), by = EVTYPE]
Next, we order the data by the number of fatalities and injuries to identify the most harmful events.
fatalities_data <- fatalities_data[order(-FATALITIES)]
injuries_data <- injuries_data[order(-INJURIES)]
Since we want to focus on the events that cause the most harm, we will only look at the top 4 events and aggregate the rest as “Others”.
top_fatalities_data <- fatalities_data[1:4]
# Sum the rest as "Others"
others_fatalities <- sum(fatalities_data[5:.N, FATALITIES])
top_fatalities_data <- rbind(top_fatalities_data, data.table(EVTYPE = "Others", FATALITIES = others_fatalities))
# Same for injuries
top_injuries_data <- injuries_data[1:4]
others_injuries <- sum(injuries_data[5:.N, INJURIES])
top_injuries_data <- rbind(top_injuries_data, data.table(EVTYPE = "Others", INJURIES = others_injuries))
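The same "top n plus Others" pattern is repeated for each measure in this analysis. A minimal sketch of how it could be factored into one function follows. The helper name top_n_with_others is hypothetical, and it assumes a two-column table with EVTYPE first and one numeric column second, as produced by the grouping step above:

```r
library(data.table)

# Hypothetical helper: keep the n largest rows of a two-column table
# (EVTYPE plus one numeric column) and fold the remainder into "Others".
top_n_with_others <- function(dt, value_col, n = 4) {
  idx <- order(-dt[[value_col]])          # largest values first
  ordered <- dt[idx]
  if (nrow(ordered) <= n) return(ordered) # nothing left to aggregate
  top <- ordered[seq_len(n)]
  rest_total <- sum(ordered[[value_col]][(n + 1):nrow(ordered)])
  rest <- data.table(EVTYPE = "Others", value = rest_total)
  setnames(rest, "value", value_col)
  rbind(top, rest)
}

# Made-up data for illustration only
toy <- data.table(
  EVTYPE     = c("A", "B", "C", "D", "E", "F"),
  FATALITIES = c(1, 10, 3, 7, 2, 5)
)
toy_top <- top_n_with_others(toy, "FATALITIES", n = 4)
```

With such a helper, a call like top_n_with_others(fatalities_data, "FATALITIES") would stand in for the three-line pattern above.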
Let us now plot the data to see which events are the most severe in
terms of fatalities and injuries.
We have decided to use pie charts
for this purpose, as we want to show the distribution of harm across
different event types.
par(mfrow = c(1, 2))
# Plot pie chart for fatalities with percentages and reduced font size
fatalities_pct <- round(top_fatalities_data$FATALITIES / sum(top_fatalities_data$FATALITIES) * 100, 1)
pie(top_fatalities_data$FATALITIES, labels = paste(top_fatalities_data$EVTYPE, fatalities_pct, "%"), main = "Fatalities by Event Type", cex = 0.5)
# Plot pie chart for injuries with percentages and reduced font size
injuries_pct <- round(top_injuries_data$INJURIES / sum(top_injuries_data$INJURIES) * 100, 1)
pie(top_injuries_data$INJURIES, labels = paste(top_injuries_data$EVTYPE, injuries_pct, "%"), main = "Injuries by Event Type", cex = 0.5)
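Because paste() separates its arguments with a space by default, the labels above read like "TORNADO 37.2 %". If a tighter label is preferred, one option is sprintf(). The helper name pct_labels below is hypothetical, and unlike round() it always prints one decimal place (for example 65.0 rather than 65):

```r
# Hypothetical helper: build "NAME (12.3%)" labels from names and raw values.
pct_labels <- function(names, values) {
  pct <- values / sum(values) * 100
  sprintf("%s (%.1f%%)", names, pct)   # %% prints a literal percent sign
}

# Made-up values for illustration only
example_labels <- pct_labels(c("A", "B"), c(1, 3))
```

The result could be passed as the labels argument of pie() in place of the paste() call.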
The pie charts show that Tornadoes are the most harmful event in terms of
both fatalities and injuries.
Tornadoes account for
37.2% of total fatalities and 65% of
total injuries.
Tornadoes are therefore the most
harmful event in terms of population health.
Similar to the previous analysis, we will summarize the data for property and crop damage by event type and order them according to their economic impact.
property_data <- cleaned_data[, .(PROPDMG = sum(PROPDMG)), by = EVTYPE]
crop_data <- cleaned_data[, .(CROPDMG = sum(CROPDMG)), by = EVTYPE]
property_data <- property_data[order(-PROPDMG)]
crop_data <- crop_data[order(-CROPDMG)]
Again, we shall only focus on the top 4 events that cause the most economic damage and aggregate the rest as “Others”.
top_property_data <- property_data[1:4]
other_property <- sum(property_data[5:.N, PROPDMG])
top_property_data <- rbind(top_property_data, data.table(EVTYPE = "Others", PROPDMG = other_property))
top_crop_data <- crop_data[1:4]
other_crop <- sum(crop_data[5:.N, CROPDMG])
top_crop_data <- rbind(top_crop_data, data.table(EVTYPE = "Others", CROPDMG = other_crop))
Let us see which events have the greatest economic consequences by plotting the property and crop damage data.
par(mfrow = c(1, 2))
# Plot pie chart for property damage with percentages and reduced font size
property_pct <- round(top_property_data$PROPDMG / sum(top_property_data$PROPDMG) * 100, 1)
pie(top_property_data$PROPDMG, labels = paste(top_property_data$EVTYPE, property_pct, "%"), main = "Property Damage by Event Type", cex = 0.5)
# Plot pie chart for crop damage with percentages and reduced font size
crop_pct <- round(top_crop_data$CROPDMG / sum(top_crop_data$CROPDMG) * 100, 1)
pie(top_crop_data$CROPDMG, labels = paste(top_crop_data$EVTYPE, crop_pct, "%"), main = "Crop Damage by Event Type", cex = 0.5)
Looking at the charts, Floods are the most harmful in terms of property damage, accounting for 33.9% of total property damage, while Drought is the most harmful in terms of crop damage, accounting for 28.5% of total crop damage.
While the above analysis is useful, it does not provide the full picture.
It treats property and crop damage separately, so it does not yet tell us
which events have the greatest overall economic consequences.
Since both PROPDMG and
CROPDMG measure the economic impact in terms of monetary
value, we can add them together to get a better understanding of the overall
economic impact of each event.
damage_data <- merge(property_data, crop_data, by = "EVTYPE")
damage_data[, DAMAGE := PROPDMG + CROPDMG]
damage_data <- damage_data[order(-DAMAGE)]
damage_data <- damage_data[, .(EVTYPE, DAMAGE)]
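By default merge() keeps only the event types present in both tables. That is harmless here because both tables were grouped from the same cleaned_data, so every event type appears in each. As a sketch of an alternative that avoids the merge altogether, the combined total can be aggregated in one pass. The toy table below is made up for illustration, and the approach assumes rows with NA have already been removed:

```r
library(data.table)

# Made-up rows for illustration only
toy <- data.table(
  EVTYPE  = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG = c(100, 50, 0, 20),
  CROPDMG = c(10, 0, 80, 5)
)

# Sum property and crop damage per event type, then sort descending
damage_direct <- toy[, .(DAMAGE = sum(PROPDMG + CROPDMG)), by = EVTYPE][order(-DAMAGE)]
```

Applied to cleaned_data, the same expression should give the same EVTYPE and DAMAGE columns as the merge-based steps above.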
This time, let us focus on the top 7 events instead of the top 4 to get a better view of the distribution.
top_damage_data <- damage_data[1:7]
other_damage <- sum(damage_data[8:.N, DAMAGE])
top_damage_data <- rbind(top_damage_data, data.table(EVTYPE = "Others", DAMAGE = other_damage))
Let us plot the combined data to see the results.
damage_pct <- round(top_damage_data$DAMAGE / sum(top_damage_data$DAMAGE) * 100, 1)
pie(top_damage_data$DAMAGE, labels = paste(top_damage_data$EVTYPE, damage_pct, "%"), main = "Total Damage by Event Type", cex = 0.5)
Looking at the pie chart, it is clear that Floods are the most harmful, accounting for 31.6% of the total economic impact.
From our analysis, we can conclude that Tornadoes
are the most harmful event in terms of population health and Floods
are the most harmful in terms of economic consequences.
Policymakers and emergency responders should therefore focus on preparing for and
responding to Tornadoes and Floods to minimize the impact on public
safety and economic stability.