Impact of Severe Weather Events in the US
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data analysis must address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Aggregating the data by storm event type shows the following:
Excessive heat and tornadoes are the most harmful events for population health in terms of fatalities, and tornadoes are the most harmful in terms of injuries. Floods have the greatest economic impact in terms of property damage, and droughts have the greatest economic impact in terms of crop damage.
Loading the data.
library(dtplyr)
data<-read.csv(file = "StormData.csv", header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)
Because only a few types of weather events were recorded before 1996, the data set is restricted to records from 1996 onward.
data$BGN_DATE<-as.Date(as.character(data$BGN_DATE), format = "%m/%d/%Y")
data_1996<-subset(data, BGN_DATE>as.Date("1995-12-31"))
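As a quick check of the date filter, the sketch below applies the same parsing and comparison to three made-up date strings. The `toy` data frame is hypothetical and only illustrates the boundary behavior; with this format string, as.Date() reads the leading month/day/year and ignores any trailing time text.

```r
# Hypothetical toy data (not taken from the storm database).
toy <- data.frame(
  BGN_DATE = c("12/31/1995 0:00:00", "1/1/1996 0:00:00", "6/15/2003 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse month/day/year; the trailing time portion is ignored.
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Keep only records dated after 1995-12-31, i.e. from 1996-01-01 onward.
toy_1996 <- subset(toy, BGN_DATE > as.Date("1995-12-31"))
print(toy_1996)
```

The record dated 12/31/1995 is dropped, while the one dated 1/1/1996 is kept.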
Subset the data set further by including only the columns related to the analysis.
data_sub<-subset(data_1996, select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
Correct misspelled event types by matching them against the 48 official NOAA event types with the amatch() function from the stringdist package.
library(stringdist)
data_sub$EVTYPE<-as.character(data_sub$EVTYPE)
dict<-c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood","Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
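# Compare in upper case so that differences in letter case do not count as edits
match_matrix<-amatch(toupper(data_sub$EVTYPE), toupper(dict), maxDist = 10)
# Replace only the event types for which a match was found
data_sub$EVTYPE[!is.na(match_matrix)]<-dict[match_matrix[!is.na(match_matrix)]]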
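To illustrate the idea of approximate matching against a dictionary, here is a minimal sketch that uses base R's adist() instead of stringdist's amatch(), so it runs without extra packages. The `raw` labels, the shortened `official` list, and the threshold of 2 edits are assumptions chosen for the example, not values from the analysis.

```r
# Hypothetical raw labels and a shortened list of official names (illustration only).
raw <- c("TORNADO", "FLASH FLOODS", "EXCESIVE HEAT", "HAIL")
official <- c("Excessive Heat", "Flash Flood", "Hail", "Tornado")

# adist() computes generalized Levenshtein distances between every pair;
# ignore.case = TRUE makes "TORNADO" and "Tornado" identical.
d <- adist(raw, official, ignore.case = TRUE)

# For each raw label, pick the closest official name,
# but accept it only if it is within max_edits edits.
max_edits <- 2
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(raw), best)]
matched <- ifelse(best_dist <= max_edits, official[best], NA)
print(matched)
```

A tight threshold keeps unrelated labels from being forced onto an official name; a large one, such as 10, accepts nearly any label.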
Subset the data set further into four dataframes for our analysis on fatalities, injuries, property damage, and crop damage.
data_fata<-data_sub[!(data_sub$FATALITIES==0),c(1,2)]
data_injr<-data_sub[!(data_sub$INJURIES==0),c(1,3)]
data_prop<-data_sub[!(data_sub$PROPDMG==0),c(1,4,5)]
data_crop<-data_sub[!(data_sub$CROPDMG==0),c(1,6,7)]
Create a function that converts the exponent codes for property and crop damage into powers of ten.
CalcExp <- function(e) {
  # Convert a damage exponent code (e.g. "K", "M", "5") into a power of ten
  e <- as.character(e)  # guard against factor input
  if (e %in% c("h", "H"))
    return(2)
  else if (e %in% c("k", "K"))
    return(3)
  else if (e %in% c("m", "M"))
    return(6)
  else if (e %in% c("b", "B"))
    return(9)
  else if (e %in% c("", "-", "?", "+"))
    return(0)
  else if (!is.na(suppressWarnings(as.numeric(e))))
    return(as.numeric(e))
  else {
    stop("Invalid value.")
  }
}
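As an alternative to calling a function once per row, the letter codes can be translated with a named lookup vector. This is only a sketch under stated assumptions: the `dmg` and `code` values are made up, digit codes are not handled, and blank or unknown symbols are treated as an exponent of 0.

```r
# Lookup table from letter code to power of ten (H, K, M, B as in the function above).
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical damage magnitudes and exponent codes.
dmg <- c(2.5, 10, 1.2, 7)
code <- c("K", "m", "B", "")

# Index by upper-cased code; codes missing from the table yield NA.
expo <- unname(exp_lookup[toupper(code)])
expo[is.na(expo)] <- 0  # assumption: blank/unknown codes mean 10^0

# Dollar cost is the magnitude scaled by ten to the exponent.
cost <- dmg * 10^expo
print(cost)
```

Because the lookup is vectorized, it works on a whole column at once without sapply().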
Compute the damage costs, aggregate by event type, and sort the results in descending order.
health_fata<-aggregate(FATALITIES ~ EVTYPE, data = data_fata, sum)
health_fata<-health_fata[order(-health_fata$FATALITIES),]
health_injr<-aggregate(INJURIES ~ EVTYPE, data = data_injr, sum)
health_injr<-health_injr[order(-health_injr$INJURIES),]
data_prop$exp<-sapply(data_prop$PROPDMGEXP, FUN = CalcExp)
data_crop$exp<-sapply(data_crop$CROPDMGEXP, FUN = CalcExp)
data_prop$COST<-data_prop$PROPDMG*(10^data_prop$exp)
data_crop$COST<-data_crop$CROPDMG*(10^data_crop$exp)
econm_prop<-aggregate(COST ~ EVTYPE, data = data_prop, sum)
econm_prop<-econm_prop[order(-econm_prop$COST),]
econm_crop<-aggregate(COST ~ EVTYPE, data = data_crop, sum)
econm_crop<-econm_crop[order(-econm_crop$COST),]
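The aggregate-then-sort pattern used above can be seen on a small made-up data frame. The `toy` values below are hypothetical and serve only to show how aggregate() sums within each event type and how order() with a negated column sorts from largest to smallest.

```r
# Hypothetical per-event costs (illustration only).
toy <- data.frame(
  EVTYPE = c("Flood", "Hail", "Flood", "Tornado", "Hail"),
  COST   = c(100, 20, 250, 300, 5)
)

# Sum COST within each EVTYPE.
totals <- aggregate(COST ~ EVTYPE, data = toy, sum)

# Sort by total cost, largest first.
totals <- totals[order(-totals$COST), ]
print(totals)
```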
The top 10 weather events affecting population health (fatalities and injuries) are shown below. For the given time period, excessive heat and tornadoes are the most harmful events in terms of fatalities, and tornadoes are the most harmful in terms of injuries.
library(ggplot2)
library(gridExtra)
p1 <- ggplot(data=head(health_fata,10), aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES)) +
  geom_bar(fill="red4",stat="identity") + coord_flip() +
  ylab("Total number of fatalities") + xlab("Event type") +
  ggtitle("Health impact of weather events in the US since 1996") +
  theme(legend.position="none")
p2 <- ggplot(data=head(health_injr,10), aes(x=reorder(EVTYPE, INJURIES), y=INJURIES)) +
  geom_bar(fill="olivedrab",stat="identity") + coord_flip() +
  ylab("Total number of injuries") + xlab("Event type") +
  theme(legend.position="none")
grid.arrange(p1, p2, nrow =2)
The top 10 weather events causing financial damage to property and to crops are shown below. For the given time period, floods have the greatest economic impact in terms of property damage, and droughts have the greatest economic impact in terms of crop damage.
p1 <- ggplot(data=head(econm_prop,10), aes(x=reorder(EVTYPE, COST), y=COST)) +
  geom_bar(fill="darkred", stat="identity") + coord_flip() +
  xlab("Event type") + ylab("Property damage in dollars") +
  ggtitle("Economic impact of weather events in the US since 1996") +
  theme(plot.title = element_text(hjust = 0))
p2 <- ggplot(data=head(econm_crop,10), aes(x=reorder(EVTYPE, COST), y=COST)) +
  geom_bar(fill="goldenrod", stat="identity") + coord_flip() +
  xlab("Event type") + ylab("Crop damage in dollars") +
  theme(legend.position="none")
grid.arrange(p1, p2, nrow =2)