Assignment
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Questions
Your data analysis must address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.
According to the data, tornadoes cause both the most injuries and the most fatalities. Further, the six event types that lead to the most fatalities are also the six that cause the most injuries, only ranked differently. We therefore regard these six event types as the most dangerous to public health. Economic impact can be measured as both property damage and crop damage. Flooding in its various forms is responsible for the most overall damage. Property damage is much larger than crop damage, which leads us to conclude that the greatest financial impact from storms comes from property damage.
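```r
library(data.table)  # data.table subsetting syntax, e.g. StormData[, list(...)]
library(ggplot2)     # bar charts
library(dplyr)       # group_by / summarise / arrange pipelines
library(tidyr)
library(knitr)       # kable() tables
library(gridExtra)   # grid.arrange() to stack plots
```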
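Aside from some subsetting, the data have a couple of quirks that we need to sort out before we can begin our analysis.
First, we group the records by event type (EVTYPE) and sum them to see how many injuries and fatalities are attributable to each type. The grouping uses the exact EVTYPE label, so variants of the same event (for example TSTM WIND and THUNDERSTORM WIND) are counted separately.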
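Because grouping by EVTYPE matches labels exactly, spelling variants of one event land in separate groups. If one wanted to combine obvious variants before summing, a minimal sketch could normalize the labels first. It uses base R `aggregate()` in place of the report's dplyr pipeline so it runs without extra packages; the helper `normalize_evtype`, the toy data frame `events`, and the single TSTM-to-THUNDERSTORM rule are hypothetical illustrations, not a complete cleanup of the NOAA event vocabulary.

```r
# Hypothetical helper: collapse spelling variants of the same event type.
# The rules below are illustrative assumptions, not a full EVTYPE mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))            # case and edge whitespace
  x <- gsub("\\s+", " ", x)                        # collapse repeated spaces
  x <- sub("^TSTM WIND", "THUNDERSTORM WIND", x)   # expand one abbreviation
  x
}

# Toy records standing in for StormData
events <- data.frame(
  EVTYPE     = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM  WIND", "FLOOD"),
  FATALITIES = c(1, 2, 0, 2)
)

# Normalize, then sum fatalities per (normalized) event type
events$EVTYPE <- normalize_evtype(events$EVTYPE)
totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)
totals <- totals[order(-totals$FATALITIES), ]
print(totals)
```

Here the three thunderstorm-wind spellings collapse into one group, which is the effect the grouping step is after; any real mapping would need to be built from the labels actually present in the data.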
Second, we need to transform the monetary damage figures according to their exponent code (K, M, or B for thousands, millions, or billions) to obtain an accurate measure of economic impact. A nested conditional checks the exponent column (PROPDMGEXP or CROPDMGEXP) for one of these codes, in either upper or lower case, and scales the corresponding amount; any other code leaves the amount unscaled.
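The same conversion can be written as a lookup table instead of nested `ifelse()` calls. This sketch assumes K, M and B denote thousands, millions and billions (case-insensitive) and, like the report, treats any other code (including a blank or missing one) as a multiplier of 1; `exp_multiplier` and `to_dollars` are hypothetical names.

```r
# Assumed meaning of the exponent codes; anything else is left unscaled
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(amount, exp_code) {
  # Look up each code by name; unknown, blank or NA codes give NA
  mult <- unname(exp_multiplier[toupper(exp_code)])
  mult[is.na(mult)] <- 1   # fall back to no scaling
  amount * mult
}

amounts <- c(2.5, 1.5, 0.5, 50, 7)
codes   <- c("k", "M", "b", "", "H")
to_dollars(amounts, codes)
```

Inside the report's pipeline this would be used as, for example, `mutate(NewPropDmg = to_dollars(PROPDMG, PROPDMGEXP))`. Keeping the multipliers in one named vector means a new code only has to be added in one place rather than in two nested conditionals.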
```r
# Read the data and store the event type as a factor
StormData <- read.csv("~/Data_Science/ReproResearch2/StormData.csv", stringsAsFactors = FALSE)
StormData$EVTYPE <- as.factor(StormData$EVTYPE)
StormData <- as.data.table(StormData)

# Subset for Question 1: total fatalities and injuries per event type
question1 <- StormData[, list(EVTYPE, FATALITIES, INJURIES)]
q1tbl.fatalities <- question1 %>% group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE)) %>% arrange(desc(Fatalities))
q1tbl.injuries <- question1 %>% group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES, na.rm = TRUE)) %>% arrange(desc(Injuries))

# Subset for Question 2
question2 <- StormData[, list(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)]

# Table for property damage: K = thousands, M = millions, B = billions (any case);
# any other exponent code is treated as a multiplier of 1
q2tbl.prop <- question2 %>%
  mutate(NewPropDmg = ifelse(toupper(PROPDMGEXP) == "K", PROPDMG * 1e3,
                      ifelse(toupper(PROPDMGEXP) == "M", PROPDMG * 1e6,
                      ifelse(toupper(PROPDMGEXP) == "B", PROPDMG * 1e9, PROPDMG)))) %>%
  group_by(EVTYPE) %>%
  summarise(PropDmg_Bn = sum(NewPropDmg, na.rm = TRUE) / 1e9) %>%   # dollars -> billions
  arrange(desc(PropDmg_Bn))

# Table for crop damage, using the same exponent codes
q2tbl.crop <- question2 %>%
  mutate(NewCropDmg = ifelse(toupper(CROPDMGEXP) == "K", CROPDMG * 1e3,
                      ifelse(toupper(CROPDMGEXP) == "M", CROPDMG * 1e6,
                      ifelse(toupper(CROPDMGEXP) == "B", CROPDMG * 1e9, CROPDMG)))) %>%
  group_by(EVTYPE) %>%
  summarise(CropDmg_Bn = sum(NewCropDmg, na.rm = TRUE) / 1e9) %>%   # dollars -> billions
  arrange(desc(CropDmg_Bn))
```
```r
# Top 6 event types by fatalities, ordered by count for plotting
plot1 <- transform(q1tbl.fatalities[1:6, ], EVTYPE = reorder(EVTYPE, Fatalities))
p.fatalities <- ggplot(plot1, aes(x = EVTYPE, y = Fatalities, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  ggtitle("Fig 1.1: Fatalities by Event Type") +
  xlab("Event")

# Top 6 event types by injuries
plot2 <- transform(q1tbl.injuries[1:6, ], EVTYPE = reorder(EVTYPE, Injuries))
p.injuries <- ggplot(plot2, aes(x = EVTYPE, y = Injuries, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  ggtitle("Fig 1.2: Injuries by Event Type") +
  xlab("Event")

# Stack the two charts vertically
grid.arrange(p.fatalities, p.injuries)
```
In our municipality, we do not encounter tornadoes, but floods are quite common. It is notable, then, that floods rank third on both the fatalities and the injuries charts above. Given our population size and the condition of our dams and roads, floods have historically posed a serious health risk to our community.
```r
# Top 6 event types by crop damage and by property damage, side by side
kable(cbind(head(q2tbl.crop), head(q2tbl.prop)), caption = "Fig 2: Economic Damage from Storms ($B)")
```
| EVTYPE | CropDmg_Bn | EVTYPE | PropDmg_Bn |
|---|---|---|---|
| HAIL | 57.736936 | TORNADO | 316.83811 |
| FLASH FLOOD | 17.808144 | FLASH FLOOD | 140.72156 |
| FLOOD | 16.308839 | TSTM WIND | 133.30737 |
| TSTM WIND | 10.880187 | FLOOD | 88.18827 |
| TORNADO | 9.957481 | THUNDERSTORM WIND | 87.43522 |
| THUNDERSTORM WIND | 6.647789 | HAIL | 67.64508 |
As for the economic impact shown in the table above, floods wreak havoc here as well, accounting for a large portion of both crop and property damage. Our current budget is not sufficient to cover potential flood damage.
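The table above ranks crop and property damage separately. To rank event types by overall economic impact, the two can be summed per event type after a full outer join, so that an event with only one kind of damage is not dropped. This is a minimal base R sketch on made-up numbers: `prop` and `crop` are hypothetical stand-ins for `q2tbl.prop` and `q2tbl.crop`, and the values are illustrative only, not results from the Storm Database.

```r
# Illustrative per-event totals in billions of dollars (not real data)
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
                   PropDmg_Bn = c(10, 8, 2))
crop <- data.frame(EVTYPE = c("HAIL", "FLOOD", "DROUGHT"),
                   CropDmg_Bn = c(3, 1, 4))

# Full outer join keeps events that appear in only one table
total <- merge(prop, crop, by = "EVTYPE", all = TRUE)
total[is.na(total)] <- 0   # a missing damage figure counts as zero

# Combined damage, ranked from largest to smallest
total$TotalDmg_Bn <- total$PropDmg_Bn + total$CropDmg_Bn
total <- total[order(-total$TotalDmg_Bn), ]
print(total)

# Share of all damage that is property damage
prop_share <- sum(total$PropDmg_Bn) / sum(total$TotalDmg_Bn)
```

Applied to the real tables, the `prop_share` figure would give a direct number behind the statement that property damage outweighs crop damage.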