This project focuses on exploring the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database, which provides detailed information on significant storms and weather events across the United States. The database records the location and timing of these events, as well as their impact in terms of fatalities, injuries, and property damage.
Severe weather events, such as storms, can lead to major public health and economic challenges for communities and local governments. Many of these events result in significant loss of life, injury, and damage to property, making it crucial to mitigate these impacts as much as possible.
The objective of this analysis is to investigate the NOAA Storm Database and answer two key questions about severe weather events: which types of events are most harmful to public health, and which result in the greatest economic losses?
The data were analyzed in R, and graphs and tables are used to illustrate the findings.
Reading data and calculating damage from storms and floods
StormData <- read.csv("StormData.csv")
# Convert PROPDMGEXP values
StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP) # Convert all symbols to uppercase
# Map a single exponent code to its numeric multiplier
convert_prop <- function(exp) {
  if (exp == "K") return(1e3)        # thousands
  else if (exp == "M") return(1e6)   # millions
  else if (exp == "B") return(1e9)   # billions
  else if (exp == "H") return(1e2)   # hundreds
  else if (exp %in% 0:8) return(10^as.numeric(exp))  # digit = power of ten
  else return(1) # dealing with empty values and unknown values
}
StormData$PROPDMGEXP_numeric <- sapply(StormData$PROPDMGEXP, convert_prop)
# Convert CROPDMGEXP values
StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP) # Convert all symbols to uppercase
# Map a single exponent code to its numeric multiplier
convert_crop <- function(exp) {
  if (exp == "K") return(1e3)        # thousands
  else if (exp == "M") return(1e6)   # millions
  else if (exp == "B") return(1e9)   # billions
  else if (exp %in% 0:8) return(10^as.numeric(exp))  # digit = power of ten
  else return(1) # dealing with empty values and unknown values
}
StormData$CROPDMGEXP_numeric <- sapply(StormData$CROPDMGEXP, convert_crop)
# Calculate final damage values
StormData$Total_PROPDMG <- StormData$PROPDMG * StormData$PROPDMGEXP_numeric
StormData$Total_CROPDMG <- StormData$CROPDMG * StormData$CROPDMGEXP_numeric
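The exponent conversion above can also be written as a single vectorized lookup, which avoids calling a function once per row. The following is a minimal sketch on made-up toy data, not the code used for the results in this report. The helper name `exp_to_multiplier` is hypothetical, and it assumes the same mapping as `convert_prop` (including "H" for hundreds, which `convert_crop` does not handle).

```r
# Hypothetical helper (not part of the original analysis): vectorized lookup
# of damage multipliers from exponent codes.
exp_to_multiplier <- function(codes) {
  codes <- toupper(as.character(codes))
  # Letter codes, plus digits 0-8 treated as powers of ten
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
              setNames(10^(0:8), as.character(0:8)))
  mult <- unname(lookup[codes])
  mult[is.na(mult)] <- 1  # empty or unknown codes leave the amount unchanged
  mult
}

# Toy data for illustration only; the values are made up
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "m", "B", "", "2"),
                  stringsAsFactors = FALSE)
toy$Total_PROPDMG <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because the lookup is applied to the whole column at once, the same helper could be reused for both PROPDMGEXP and CROPDMGEXP, provided the shared mapping is acceptable for both.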
The goal is to total the injuries and deaths for each type of event (such as hurricanes or floods) and then plot these totals to see which events cause the most injuries or deaths.
Create a subset of data: collect the injury and death counts (INJURIES and FATALITIES) for each type of event (EVTYPE), such as hurricanes or floods.
Data analysis: use the summarise function from the dplyr library to total the injuries and deaths for each type of event.
Visualization: use the ggplot2 library to plot the totals. The plot shows which types of event caused the most injuries or deaths.
Knowing which events are most damaging to a population helps direct resources towards early warnings and preventive measures. As Benjamin Franklin said, “An ounce of prevention is worth a pound of cure.”
library(dplyr)
library(ggplot2)
library(reshape2)
Injuries_and_deaths_per_event <- StormData %>% group_by(EVTYPE) %>%
summarise(Injuries = sum(INJURIES), deaths = sum(FATALITIES)) %>%
mutate(number_EVTYPE = seq_along(EVTYPE)) %>%
melt(.,id = c("number_EVTYPE","EVTYPE"),
measure.vars = c("Injuries","deaths"))
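# Top five event types for each measure. Filtering on `variable` keeps each
# selection within its own measure, so a value that also occurs in the other
# measure is not labelled by mistake.
max_Injuries <- Injuries_and_deaths_per_event %>%
  filter(variable == "Injuries") %>%
  arrange(desc(value)) %>% head(5)
max_deaths <- Injuries_and_deaths_per_event %>%
  filter(variable == "deaths") %>%
  arrange(desc(value)) %>% head(5)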
ggplot(data = Injuries_and_deaths_per_event,
aes(x = number_EVTYPE,y = value,group = 1)) +
geom_line() + facet_grid(. ~ variable) +
theme_bw(base_family = "Times" ) +
geom_text(data = max_Injuries,
aes(x = number_EVTYPE, y = value+4000, label = EVTYPE),
size = 2,angle=45,col="blue") +
geom_text(data = max_deaths,
aes(x = number_EVTYPE, y = value+4000, label = EVTYPE),
size = 2,angle=45,col="blue")
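To make the aggregation step easier to check by hand, here is a small sketch of the same group-and-sum idea on made-up toy data. The data frame `toy` and its values are assumptions for illustration only and say nothing about the real NOAA figures.

```r
library(dplyr)

# Toy data for illustration only; the values are made up
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  INJURIES   = c(10, 2, 5, 0, 1, 0),
  FATALITIES = c(1, 0, 2, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Total injuries and deaths per event type
per_event <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES), deaths = sum(FATALITIES))

# Rank event types separately for each measure
top_injuries <- per_event %>% arrange(desc(Injuries)) %>% head(2)
top_deaths   <- per_event %>% arrange(desc(deaths)) %>% head(2)

print(top_injuries)
print(top_deaths)
```

In this toy example the two rankings differ, which is why injuries and deaths are ranked separately rather than through a single combined score.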
The goal is to total the property and crop damage for each type of event (such as hurricanes or floods) and then plot these totals to see which events cause the most economic damage.
Create a subset of data: collect the property and crop damage values, already scaled by their exponent columns (Total_PROPDMG and Total_CROPDMG), for each type of event (EVTYPE), such as hurricanes or floods.
Data analysis: use the summarise function from the dplyr library to sum the property and crop damage for each type of event.
Visualization: use the ggplot2 library to plot the totals. The plot shows which types of event caused the most economic damage to property and crops.
Knowing which events cause the most economic damage helps direct efforts to protect resources and property. Proactive measures can be taken to mitigate this damage, whether it is through insurance or strengthening infrastructure to cope with these events.
PROPDMG_and_CROPDMG_per_event <- StormData %>% group_by(EVTYPE) %>%
summarise(PROPDMG = sum(Total_PROPDMG), CROPDMG = sum(Total_CROPDMG))%>%
mutate(number_EVTYPE = seq_along(EVTYPE)) %>%
melt(.,id = c("number_EVTYPE","EVTYPE"),
measure.vars = c("PROPDMG","CROPDMG"))
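# Top five event types for each measure. Filtering on `variable` keeps each
# selection within its own measure, so a value that also occurs in the other
# measure is not labelled by mistake.
max_PROPDMG <- PROPDMG_and_CROPDMG_per_event %>%
  filter(variable == "PROPDMG") %>%
  arrange(desc(value)) %>% head(5)
max_CROPDMG <- PROPDMG_and_CROPDMG_per_event %>%
  filter(variable == "CROPDMG") %>%
  arrange(desc(value)) %>% head(5)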
ggplot(data = PROPDMG_and_CROPDMG_per_event,
aes(x = number_EVTYPE,y = value,group = 1)) +
geom_line() + facet_grid(. ~ variable) +
theme_bw(base_family = "Times" ) +
geom_text(data = max_PROPDMG,
aes(x = number_EVTYPE, y = value+5e+9, label = EVTYPE),
size = 2,angle=15,col="blue") +
geom_text(data = max_CROPDMG,
aes(x = number_EVTYPE, y = value+5e+9, label = EVTYPE),
size = 2,angle=15,col="blue")
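The plot above shows property and crop damage in separate panels. If a single ranking of overall economic damage is wanted, the two totals can be added per event type. The sketch below does this on made-up toy data. The column names `TOTALDMG` and `TOTALDMG_billion` are hypothetical, and treating property and crop damage as directly additive is an assumption of this example.

```r
library(dplyr)

# Toy data for illustration only; the values are made up
toy <- data.frame(
  EVTYPE        = c("FLOOD", "DROUGHT", "FLOOD", "HAIL", "DROUGHT"),
  Total_PROPDMG = c(2e9, 1e6, 5e8, 3e7, 0),
  Total_CROPDMG = c(1e8, 4e9, 0, 2e7, 1e9),
  stringsAsFactors = FALSE
)

damage_ranking <- toy %>%
  group_by(EVTYPE) %>%
  summarise(PROPDMG = sum(Total_PROPDMG), CROPDMG = sum(Total_CROPDMG)) %>%
  # Hypothetical combined measure: property plus crop damage, in dollars
  mutate(TOTALDMG = PROPDMG + CROPDMG,
         TOTALDMG_billion = TOTALDMG / 1e9) %>%
  arrange(desc(TOTALDMG))

print(damage_ranking)
```

Reporting the combined figure in billions keeps the table readable, while the separate PROPDMG and CROPDMG columns still show which kind of loss drives each event type's total.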