The types of weather events most likely to have human costs (measured by fatalities and injuries) differed from those most likely to have economic effects (measured by property and crop damage). In the case of human costs, the top five event types were tornados, heat, flood, wind and lightning, though their relative order varied slightly between the two measures. Tornados were the leading cause of fatalities by a factor of about two, and the leading cause of injuries by a factor of about eight. Economic costs showed more variation between the two measures: property damage was mostly caused by tornados, wind or flood, with lesser damage done by hail, lightning, winter weather and wildfire, while crop damage was caused mostly by hail, flood and wind, with lesser damage done by tornados, drought, heavy rain, frost and cold.
This code assumes that the data file is in your working directory.
# First, read in the data -> this will automatically uncompress the data.
#stormData<-read.csv('repdata_data_StormData.csv.bz2')
# Subset out the values for event type, fatalities, injuries, and damage, since these are the things the analysis will be based on
#stormDataImpacts <- stormData[c("EVTYPE", "FATALITIES", "INJURIES","PROPDMG", "CROPDMG")]
library(readr)
stormDataImpacts <- read_csv("stormDataImpacts.csv")
# transform to using only lower case letters for event types, so that they group properly
stormDataImpacts$EVTYPE <- tolower(stormDataImpacts$EVTYPE)
# Summarize the data by event type, using sum
fatalitiesByType <-aggregate(FATALITIES ~ EVTYPE, stormDataImpacts, sum)
injuriesByType <-aggregate(INJURIES ~ EVTYPE, stormDataImpacts, sum)
propDmgByType <-aggregate(PROPDMG ~ EVTYPE, stormDataImpacts, sum)
cropDmgByType <-aggregate(CROPDMG ~ EVTYPE, stormDataImpacts, sum)
# Keep only event types with at least one recorded outcome (i.e. one fatality, one injury, or nonzero damage)
fatalitiesGreaterZero<-fatalitiesByType[which(fatalitiesByType$FATALITIES >0),]
injuriesGreaterZero<-injuriesByType[which(injuriesByType$INJURIES >0),]
propDmgGreaterZero<-propDmgByType[which(propDmgByType$PROPDMG >0),]
cropDmgGreaterZero<-cropDmgByType[which(cropDmgByType$CROPDMG >0),]
# Reorder the factors (which changes the order used by ggplot2) and sort by the number of fatalities/injuries (which changes the order used by everything else)
fatalitiesGreaterZero$EVTYPE <- reorder(fatalitiesGreaterZero$EVTYPE, -fatalitiesGreaterZero$FATALITIES)
fatalitiesGreaterZero<-fatalitiesGreaterZero[order(fatalitiesGreaterZero[,2], na.last = TRUE, decreasing = TRUE),]
injuriesGreaterZero$EVTYPE <- reorder(injuriesGreaterZero$EVTYPE, -injuriesGreaterZero$INJURIES)
injuriesGreaterZero<-injuriesGreaterZero[order(injuriesGreaterZero[,2], na.last = TRUE, decreasing = TRUE),]
propDmgGreaterZero<-propDmgGreaterZero[order(propDmgGreaterZero[,2], na.last = TRUE, decreasing = TRUE),]
propDmgGreaterZero$EVTYPE <- reorder(propDmgGreaterZero$EVTYPE, -propDmgGreaterZero$PROPDMG)
cropDmgGreaterZero<-cropDmgGreaterZero[order(cropDmgGreaterZero[,2], na.last = TRUE, decreasing = TRUE),]
cropDmgGreaterZero$EVTYPE <- reorder(cropDmgGreaterZero$EVTYPE, -cropDmgGreaterZero$CROPDMG)
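The effect of the lowercasing, summing, filtering and sorting steps above can be checked on a small made-up data frame. This is only a sketch: the toy rows and the names toyImpacts, rawTotals and toyTotals are invented for illustration and are not taken from the storm data.

```r
# Toy data (made up for illustration): the same event type appears with different capitalisation
toyImpacts <- data.frame(
  EVTYPE = c("TORNADO", "Tornado", "HEAT", "Hail", "HAIL"),
  FATALITIES = c(3, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Without lowercasing, "TORNADO" and "Tornado" are summed as separate groups
rawTotals <- aggregate(FATALITIES ~ EVTYPE, toyImpacts, sum)

# After lowercasing, the variants collapse into a single group per event type
toyImpacts$EVTYPE <- tolower(toyImpacts$EVTYPE)
toyTotals <- aggregate(FATALITIES ~ EVTYPE, toyImpacts, sum)

# Drop event types with no fatalities, then sort from largest to smallest
toyTotals <- toyTotals[toyTotals$FATALITIES > 0, ]
toyTotals <- toyTotals[order(toyTotals$FATALITIES, decreasing = TRUE), ]
print(toyTotals)
```

In the toy data, lowercasing merges five raw groups into three, and tornado ends up first after sorting. Variants that differ in spelling rather than case (abbreviations, for example) would still be counted as separate groups.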
To examine the results of the analysis, the top eight event types for each type of damage are graphed below.
# Take the top several categories
topFatalEvTypes <- head(fatalitiesGreaterZero, n=8)
topInjuryEvTypes <- head(injuriesGreaterZero, n=8)
topPropDmgEvTypes <- head(propDmgGreaterZero, n=8)
topCropDmgEvTypes <- head(cropDmgGreaterZero, n=8)
Load packages needed for the plots (warnings are suppressed)
library(ggplot2)
library("gridExtra")
library("cowplot")
Then graph the fatalities and injuries.
p1 <- ggplot(data=topFatalEvTypes, aes(x=EVTYPE, y=FATALITIES)) + ggtitle("Count of fatalities for the top event types") +geom_bar(stat="identity")
p2 <- ggplot(data=topInjuryEvTypes, aes(x=EVTYPE, y=INJURIES)) + ggtitle("Count of injuries for the top event types") +geom_bar(stat="identity")
grid.arrange(p1, p2, ncol=1, nrow =2, top="Top event types for human cost")
As can be seen from the above figure, tornados are by far the most deadly event and the largest source of injuries. Excessive heat, wind and flooding are the next three most common causes of both fatalities and injuries, though the relative numbers differ between fatalities and injuries.
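One way to put a number on "by far" is to divide the top total by the runner-up total. The sketch below does this with a hypothetical helper, leadFactor, applied to made-up totals; neither the helper nor the numbers come from the analysis above.

```r
# Hypothetical helper: ratio of the largest total to the second largest
leadFactor <- function(totals) {
  sorted <- sort(totals, decreasing = TRUE)
  sorted[1] / sorted[2]
}

# Made-up totals for illustration only (not the storm data)
toyFatalities <- c(tornado = 100, heat = 50, flood = 30)
print(leadFactor(toyFatalities))
```

Applied to the real tables, the calls would be leadFactor(fatalitiesGreaterZero$FATALITIES) and leadFactor(injuriesGreaterZero$INJURIES).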
Next, graph the property damage and crop damage.
p3 <- ggplot(data=topPropDmgEvTypes, aes(x=EVTYPE, y=PROPDMG)) + ggtitle("Total cost of property damage for the top event types") +geom_bar(stat="identity")
p4 <- ggplot(data=topCropDmgEvTypes, aes(x=EVTYPE, y=CROPDMG)) + ggtitle("Total cost of crop damage for the top event types") +geom_bar(stat="identity")
grid.arrange(p3, p4, ncol=1, nrow =2, top="Top event types for material cost")
This figure shows that the leading causes of economic damage differ substantially between property damage and crop damage. As with the human costs, tornados are the top cause of property damage. Wind and flooding follow closely as the next most common causes of property damage. Other common causes of property damage include hail, lightning, winter storm, heavy snow and wildfire. However, the most common cause of crop damage is hail, followed by flooding and then wind. Other common causes of crop damage are tornado, drought, heavy rain, and frost/cold.
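The overlap between the two top lists can also be checked directly rather than read off the plots. The following is a minimal sketch on made-up event names; the vectors toyPropTop and toyCropTop are stand-ins, not the actual results.

```r
# Made-up top lists for illustration only (not the storm data)
toyPropTop <- c("tornado", "wind", "flood", "hail")
toyCropTop <- c("hail", "flood", "wind", "drought")

# Event types that appear in both lists
sharedTypes <- intersect(toyPropTop, toyCropTop)

# Event types that appear in only one of the lists
propOnly <- setdiff(toyPropTop, toyCropTop)
cropOnly <- setdiff(toyCropTop, toyPropTop)

print(sharedTypes)
print(propOnly)
print(cropOnly)
```

With the real tables, the inputs would be as.character(topPropDmgEvTypes$EVTYPE) and as.character(topCropDmgEvTypes$EVTYPE), since EVTYPE is a factor after the call to reorder().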