This report uses data from the National Oceanic and Atmospheric
Administration (NOAA) storm database to evaluate the impact of severe
weather events. The investigation examines the consequences
across two primary categories: population health and economic
impact.
Population health is assessed using two indicators: total injuries and
total fatalities for each event type. Our analysis revealed that
tornadoes have the most detrimental effects on both indicators.
Economic damage is measured using two indicators: total financial losses
from property damage and total financial losses from crop damage. Droughts
cause the most crop damage, while floods cause the most property
damage.
We download the data into our working directory and load it into our session.
library(dplyr)
library(ggplot2)
mydata=read.csv("repdata_data_StormData.csv", header=TRUE,
na.strings=c("NA", ""))
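The na.strings argument matters for the later steps: blank fields, such as an empty damage exponent, are read as NA rather than as empty strings. A minimal sketch of that behavior, using an invented three-row extract (the values in toy_csv are illustrative and not taken from the database):

```r
# Toy CSV text standing in for the real file; the values are invented.
toy_csv <- 'EVTYPE,PROPDMG,PROPDMGEXP
TORNADO,25,K
HAIL,0,
FLOOD,5,M'

toy <- read.csv(text = toy_csv, header = TRUE, na.strings = c("NA", ""))

# The blank exponent on the HAIL row is read as NA, not as "".
is.na(toy$PROPDMGEXP)
```

Any comparison such as PROPDMGEXP == "K" therefore evaluates to NA on those rows, which is worth keeping in mind when the multipliers are applied.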
We first extract the relevant columns for our analysis from the data set: event type, injuries, fatalities, property damage and its multiplier, crop damage and its multiplier.
mydata1= mydata%>% select(EVTYPE, FATALITIES,INJURIES,CROPDMG,CROPDMGEXP,PROPDMG,PROPDMGEXP)
For the health outcome, we use the number of fatalities and the number of injuries as separate outcomes and aggregate each by event type across the US. Event types are converted to lower case so that differently capitalized spellings of the same event are counted together. We remove any NA values from the sum when aggregating.
temp=mydata1%>%select(EVTYPE,FATALITIES)
fatal=aggregate(temp$FATALITIES, list(tolower(temp$EVTYPE)), FUN=sum, na.rm=TRUE)
colnames(fatal)=c("Event Type","Fatalities")
fatal=fatal%>%filter(Fatalities > 0) %>%
arrange(desc(Fatalities))
temp1=mydata1%>%select(EVTYPE,INJURIES)
injury=aggregate(temp1$INJURIES, list(tolower(temp1$EVTYPE)), FUN=sum, na.rm=TRUE)
colnames(injury)=c("Event Type","Injuries")
injury=injury%>%filter(Injuries > 0) %>%
arrange(desc(Injuries))
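The same aggregate, rename, filter and sort pattern is repeated for each outcome, so it could be wrapped in a small function. The sketch below is one possible way to do that in base R; sum_by_event is a hypothetical helper that is not part of the original analysis, and the toy rows are invented for illustration.

```r
# Hypothetical helper (not in the original analysis): total a numeric column
# by lower-cased event type, drop zero totals, and sort in descending order.
sum_by_event <- function(df, value_col) {
  out <- aggregate(df[[value_col]], list(tolower(df$EVTYPE)),
                   FUN = sum, na.rm = TRUE)
  colnames(out) <- c("Event Type", value_col)
  out <- out[out[[value_col]] > 0, ]
  out[order(out[[value_col]], decreasing = TRUE), ]
}

# Invented toy rows, for illustration only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "Tornado", "HEAT", "HAIL"),
  FATALITIES = c(3, 2, NA, 0)
)
toy_fatal <- sum_by_event(toy, "FATALITIES")
toy_fatal
```

In the toy data the two spellings of tornado are merged into one row, while the event types whose total is zero (including the one that only has an NA value) are dropped by the filter.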
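We clean up the data and calculate total property damage and total crop damage by applying the multiplier encoded in each exponent column (for example, K for thousands, M for millions and B for billions). Rows with a missing or unrecognized exponent are set to NA and therefore drop out of the totals.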
econ_data<-mydata1 %>%
mutate(prop_damage = ifelse(PROPDMGEXP == "H"|PROPDMGEXP == "h"|PROPDMGEXP == "2", PROPDMG * 100,
ifelse(PROPDMGEXP == "M"|PROPDMGEXP == "m"|PROPDMGEXP == "6",PROPDMG * 1000000,
ifelse(PROPDMGEXP == "K"|PROPDMGEXP == "3", PROPDMG * 1000,
ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000,
ifelse(PROPDMGEXP == "8", PROPDMG * 100000000,
ifelse(PROPDMGEXP == "7", PROPDMG * 10000000,
ifelse(PROPDMGEXP == "5", PROPDMG * 100000,
ifelse(PROPDMGEXP == "4", PROPDMG * 10000,
ifelse(PROPDMGEXP == "1", PROPDMG * 10,
ifelse(PROPDMGEXP == "0", PROPDMG, NA))))))))))) %>%
mutate(crop_damage = ifelse(CROPDMGEXP == "K"|CROPDMGEXP == "k", CROPDMG * 1000,
ifelse(CROPDMGEXP == "M"|CROPDMGEXP == "m", CROPDMG * 1000000,
ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000,
ifelse(CROPDMGEXP == "2", CROPDMG * 100,
ifelse(CROPDMGEXP == "0", CROPDMG ,NA))))))
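The nested ifelse() calls can be hard to audit. One alternative, sketched here under the assumption that the same property damage rules are wanted, is a named lookup vector indexed by the exponent code. The name exp_multiplier and the toy values are hypothetical. The multipliers are copied from the property damage rules above.

```r
# Multipliers copied from the nested ifelse() rules for PROPDMGEXP above.
# Any code not listed (including NA) maps to NA, as in the original rules.
exp_multiplier <- c(
  "0" = 1, "1" = 10, "2" = 1e2, "H" = 1e2, "h" = 1e2,
  "3" = 1e3, "K" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "M" = 1e6, "m" = 1e6, "7" = 1e7, "8" = 1e8, "B" = 1e9
)

# Invented toy values, for illustration only.
toy <- data.frame(
  PROPDMG = c(2.5, 3, 1, 4),
  PROPDMGEXP = c("K", "m", NA, "?"),
  stringsAsFactors = FALSE
)

# as.character() guards against the column having been read as a factor.
codes <- as.character(toy$PROPDMGEXP)
toy$prop_damage <- toy$PROPDMG * unname(exp_multiplier[codes])
toy$prop_damage
```

The crop damage rules above recognize a smaller set of codes, so they would need their own lookup vector rather than reusing this one.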
As with the health outcomes, we aggregate crop damage and property damage by event type into two data sets.
temp2=econ_data%>%select(EVTYPE,crop_damage)
crop_data=aggregate(temp2$crop_damage, list(tolower(temp2$EVTYPE)), FUN=sum, na.rm=TRUE)
colnames(crop_data)=c("Event Type","crop_damage")
crop_data=crop_data%>%filter(crop_damage > 0) %>%
arrange(desc(crop_damage))
temp3=econ_data%>%select(EVTYPE,prop_damage)
prop_data=aggregate(temp3$prop_damage, list(tolower(temp3$EVTYPE)), FUN=sum, na.rm=TRUE)
colnames(prop_data)=c("Event Type","prop_damage")
prop_data=prop_data%>%filter(prop_damage > 0) %>%
arrange(desc(prop_damage))
We display bar plots of the five most harmful event types for fatalities and for injuries.
fatal_5=fatal[1:5,]
injury_5=injury[1:5,]
par(mfrow = c(1,2), mar = c(11,5,3,2), cex = 0.9)
barplot(fatal_5$Fatalities, las = 3, names.arg = fatal_5$`Event Type`, main = "Events with Maximum Fatalities", ylab = "Number of Fatalities", col = "green")
barplot(injury_5$Injuries, las = 3, names.arg = injury_5$`Event Type`, main = "Events with Maximum Injuries", ylab = "Number of Injuries", col = "red")
The plots show that tornadoes cause both the most fatalities and the most injuries.
We display bar plots of the five most damaging event types for crop damage and for property damage.
crop_data_5=crop_data[1:5,]
prop_data_5=prop_data[1:5,]
par(mfrow = c(1,2), mar = c(11,5,3,2), cex = 0.9)
barplot(crop_data_5$crop_damage, las = 3, names.arg = crop_data_5$`Event Type`, main = "Events with Max. Crop Damage", ylab = "Crop Damage (in $)", col = "blue")
barplot(prop_data_5$prop_damage, las = 3, names.arg = prop_data_5$`Event Type`, main = "Events with Max. Property Damage", ylab = "Property damage (in $)", col = "pink")
The plots show that drought causes the most crop damage, while flood causes the most property damage.