Synopsis

This analysis of the US National Weather Service Storm database examines the impact of weather events on population health and on the US economy between 1950 and 2011. Exploration of the database shows that tornadoes have the highest impact on population health, in terms of both injuries and fatalities. In terms of economic consequences, drought causes the most damage to crops, followed closely by flooding, and flooding causes the most property damage.

Data Processing

To load and process the database, the required R packages must first be loaded. The packages needed to plot the data and produce the results are loaded at the same time. Note that require() only loads a package; any package that is missing must be installed beforehand.

require(R.utils)
require(data.table)
require(dplyr)
require(stringr)
require(knitr)
require(ggplot2)
require(grid)
require(gridExtra)

From here, the dataset can be downloaded, decompressed, and read in.

url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "StormData.csv.bz2")
bunzip2("StormData.csv.bz2", "StormData.csv")

stormData<-read.csv("StormData.csv")
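
As a possible shortcut, base R can usually read a bzip2-compressed CSV directly, because read.csv() opens its input through a connection that detects compression. The sketch below illustrates this on a small temporary file rather than on the real dataset; the toy columns and values are made up for illustration.

```r
# Toy data standing in for the storm database (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file without a separate bunzip2() step
toyBack <- read.csv(tmp, stringsAsFactors = FALSE)
print(toyBack)
```

If this behaves the same way on the full file, the separate bunzip2() call would be optional, at the cost of decompressing on every read.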

To process the dataset, it is first condensed into a subset containing only the necessary columns. Event types are then condensed with the grep function, because many variant names refer to the same overall event type; before condensing, all event types are converted to uppercase for consistency. Finally, the subset is divided into one subset for population health and another for economic consequences.

##Subset the data to what is needed
eventData<-stormData[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

##Coerce all event types to uppercase for consistency
##(kept as a factor so that summary() reports counts per event type)
eventData$EVTYPE <- factor(toupper(eventData$EVTYPE))

##Condense event types by major types
eventData$EVTYPE[grep(".*TORNADO.*", eventData$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
eventData$EVTYPE[grep(".*TORNDAO.*", eventData$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
eventData$EVTYPE[grep(".*TSTM.*", eventData$EVTYPE, ignore.case = TRUE)] <- "THUNDERSTORM"
eventData$EVTYPE[grep(".*THUNDERSTORM.*", eventData$EVTYPE, ignore.case = TRUE)] <- "THUNDERSTORM"
eventData$EVTYPE[grep(".*WIND.*", eventData$EVTYPE, ignore.case = TRUE)] <- "HIGH WIND"
eventData$EVTYPE[grep(".*WND.*", eventData$EVTYPE, ignore.case = TRUE)] <- "HIGH WIND"
eventData$EVTYPE[grep(".*FLOOD.*", eventData$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
eventData$EVTYPE[grep(".*FLD.*", eventData$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
eventData$EVTYPE[grep(".*FLOOOD.*", eventData$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
eventData$EVTYPE[grep(".*COOL.*", eventData$EVTYPE, ignore.case = TRUE)] <- "COLD"
eventData$EVTYPE[grep(".*SNOW.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*WINTER.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*BLIZZARD.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*ICE.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*SLEET.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*WINTRY.*", eventData$EVTYPE, ignore.case = TRUE)] <- "SNOW/ICE"
eventData$EVTYPE[grep(".*HURRICANE.*", eventData$EVTYPE, ignore.case = TRUE)] <- "HURRICANE"
eventData$EVTYPE[grep(".*LIGHTNING.*", eventData$EVTYPE, ignore.case = TRUE)] <- "LIGHTNING"
eventData$EVTYPE[grep(".*LIGHTING.*", eventData$EVTYPE, ignore.case = TRUE)] <- "LIGHTNING"
eventData$EVTYPE[grep(".*TROPICAL.*", eventData$EVTYPE, ignore.case = TRUE)] <- "TROPICAL STORM"
eventData$EVTYPE[grep(".*DUST.*", eventData$EVTYPE, ignore.case = TRUE)] <- "DUST STORM"
eventData$EVTYPE[grep(".*DRY.*", eventData$EVTYPE, ignore.case = TRUE)] <- "DRY CONDITIONS"
eventData$EVTYPE[grep(".*DRIEST.*", eventData$EVTYPE, ignore.case = TRUE)] <- "DRY CONDITIONS"
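
The same condensing idea can be expressed as a table of rules applied in a loop. The following is a minimal sketch, not the code used for this report: the helper name condenseEvents, the rules vector, and the sample event names are hypothetical, and only a few of the rules above are reproduced.

```r
# Ordered rules: each regular expression maps to one condensed label.
# Order matters because later rules see the labels written by earlier ones.
rules <- c(
  "TORNADO|TORNDAO"   = "TORNADO",
  "TSTM|THUNDERSTORM" = "THUNDERSTORM",
  "WIND|WND"          = "HIGH WIND",
  "FLOOD|FLD|FLOOOD"  = "FLOOD"
)

# Hypothetical helper: apply every rule in turn to a vector of event types
condenseEvents <- function(evtype, rules) {
  evtype <- toupper(as.character(evtype))
  for (pattern in names(rules)) {
    evtype[grepl(pattern, evtype)] <- rules[[pattern]]
  }
  evtype
}

# Made-up sample values for illustration
sampleEvents <- c("Tstm Wind", "TORNADO F1", "Flash Flood", "gusty wnd", "HAIL")
condensed <- condenseEvents(sampleEvents, rules)
print(condensed)
```

Keeping the rules in one named vector makes the order of application explicit: "Tstm Wind" is relabelled as a thunderstorm before the wind rule runs, just as in the line-by-line version.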

##Subset data for each question (Health and Economic)
healthData<-eventData[,c("EVTYPE","FATALITIES","INJURIES")]
economicData<-eventData[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

summary(healthData)
##           EVTYPE         FATALITIES          INJURIES        
##  THUNDERSTORM:336807   Min.   :  0.0000   Min.   :   0.0000  
##  HAIL        :288661   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  FLOOD       : 86109   Median :  0.0000   Median :   0.0000  
##  TORNADO     : 60701   Mean   :  0.0168   Mean   :   0.1557  
##  SNOW/ICE    : 42284   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  HIGH WIND   : 28195   Max.   :583.0000   Max.   :1700.0000  
##  (Other)     : 59540
summary(economicData)
##           EVTYPE          PROPDMG          PROPDMGEXP    
##  THUNDERSTORM:336807   Min.   :   0.00          :465934  
##  HAIL        :288661   1st Qu.:   0.00   K      :424665  
##  FLOOD       : 86109   Median :   0.00   M      : 11330  
##  TORNADO     : 60701   Mean   :  12.06   0      :   216  
##  SNOW/ICE    : 42284   3rd Qu.:   0.50   B      :    40  
##  HIGH WIND   : 28195   Max.   :5000.00   5      :    28  
##  (Other)     : 59540                     (Other):    84  
##     CROPDMG          CROPDMGEXP    
##  Min.   :  0.000          :618413  
##  1st Qu.:  0.000   K      :281832  
##  Median :  0.000   M      :  1994  
##  Mean   :  1.527   k      :    21  
##  3rd Qu.:  0.000   0      :    19  
##  Max.   :990.000   B      :     9  
##                    (Other):     9

Across the United States, which types of events are most harmful with respect to population health?

To answer this question, the data is processed further to identify the five weather events with the highest number of fatalities and the five with the highest number of injuries. Showing these totals in two side-by-side bar plots gives a direct visual comparison of the results.

First, the health data subset must be condensed by finding the sum of fatalities by event type and the sum of injuries by event type.

fatalitiesByEvent<-aggregate(healthData$FATALITIES, by=list(healthData$EVTYPE), FUN=sum,na.rm=TRUE)
colnames(fatalitiesByEvent)<-c("Event Type", "Fatalities")
injuriesByEvent<-aggregate(healthData$INJURIES, by=list(healthData$EVTYPE), FUN=sum,na.rm=TRUE)
colnames(injuriesByEvent)<-c("Event Type", "Injuries")

From here, the aggregated subset is used to determine the top five weather events with the highest number of fatalities and the highest number of injuries respectively.

##Subset healthData to take top 5 counts for fatalities and injuries respectively
topFiveFatal<-fatalitiesByEvent[order(fatalitiesByEvent$Fatalities,decreasing=TRUE),][1:5,]
topFiveInjur<-injuriesByEvent[order(injuriesByEvent$Injuries,decreasing=TRUE),][1:5,]

#Define event types as factors to keep descending order when creating the bar plot
topFiveFatal$`Event Type`<-factor(topFiveFatal$`Event Type`,levels=topFiveFatal$`Event Type`)
topFiveInjur$`Event Type`<-factor(topFiveInjur$`Event Type`,levels=topFiveInjur$`Event Type`)
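
The aggregate, sort, and factor steps can be seen end to end on a small example. This is an illustrative sketch using made-up values and hypothetical object names (toyHealth, byEvent, topThree), with the formula interface of aggregate() used for brevity.

```r
# Toy data (hypothetical values, not taken from the storm database)
toyHealth <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HEAT", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(10, 4, 7, 5, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities by event type
byEvent <- aggregate(FATALITIES ~ EVTYPE, data = toyHealth, FUN = sum)

# Keep the three largest totals, in descending order
topThree <- head(byEvent[order(byEvent$FATALITIES, decreasing = TRUE), ], 3)

# Fix the factor levels to the current row order; without this,
# the levels (and therefore the bars) default to alphabetical order
topThree$EVTYPE <- factor(topThree$EVTYPE, levels = topThree$EVTYPE)
print(levels(topThree$EVTYPE))
```

Because ggplot2 draws discrete axes in factor-level order, setting the levels explicitly is what keeps the bars sorted from largest to smallest.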

Using the final condensed subsets for both fatalities and injuries, a panel plot is created to show a side-by-side comparison of the top five weather events affecting population health.

Figure 1

fatalPlot<-ggplot()+geom_bar(data=topFiveFatal,
                             aes(x=topFiveFatal[,1], 
                                 y=topFiveFatal[,2],
                                 fill=interaction(topFiveFatal[,2],topFiveFatal[,1])), 
                             stat="identity", 
                             show.legend=FALSE)+
        theme(axis.title.x = element_blank())+
        ylab("Number of Fatalities")+
        ggtitle("Fatalities vs. Event Type")+
        theme(plot.title = element_text(size = 12,face="bold"))+
        theme(axis.text.x=element_text(angle=30, hjust=1))+
        theme(panel.background=element_rect(fill = "white",colour = "white",size = 0.5, linetype = "solid"),
                panel.grid.major=element_line(size = 0.5, linetype = 'dotted',colour = "darkgrey"), 
                panel.grid.minor=element_line(size = 0.25, linetype = 'dotted',colour = "darkgrey"))

injuryPlot<-ggplot()+geom_bar(data=topFiveInjur,
                              aes(x=topFiveInjur[,1], 
                                  y=topFiveInjur[,2],
                                  fill=interaction(topFiveInjur[,2], topFiveInjur[,1])), 
                              stat="identity", 
                              show.legend=FALSE)+
        theme(axis.title.x = element_blank())+
        ylab("Number of Injuries")+
        ggtitle("Injuries vs. Event Type")+
        theme(plot.title = element_text(size = 12,face="bold"))+
        theme(axis.text.x=element_text(angle=30, hjust=1))+
        theme(panel.background=element_rect(fill = "white",colour = "white",size = 0.5, linetype = "solid"),
              panel.grid.major=element_line(size = 0.5, linetype = 'dotted',colour = "darkgrey"), 
              panel.grid.minor=element_line(size = 0.25, linetype = 'dotted',colour = "darkgrey"))

grid.arrange(injuryPlot,
             fatalPlot,
             ncol=2,
             nrow=1,
             top=textGrob("Top 5 Weather Events Harmful to Population Health",gp=gpar(fontsize=20,font=3)))

Across the United States, which types of events have the greatest economic consequences?

To assess the economic consequences of weather events, the economic data subset must be processed further. In the original subset, property damage and crop damage are each recorded in two columns: the first holds a numeric value and the second holds a code for its magnitude (e.g., “k” or “K” for thousands, “m” or “M” for millions). Using the database documentation and the gsub function, these codes are replaced with numeric multipliers. Once the second column is numeric, it is multiplied by the first to produce a new column of total monetary damage, one for property and one for crops.

economicData$PROPDMGEXP<-gsub("2", 100, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("H", 100, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("h", 100, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("K", 1000, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("k", 1000, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("3", 1000, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("4", 1e+04, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("5", 1e+05, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("6", 1e+06, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("M", 1e+06, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("m", 1e+06, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("7", 1e+07, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("8", 1e+08, economicData$PROPDMGEXP, ignore.case = TRUE)
economicData$PROPDMGEXP<-gsub("B", 1e+09, economicData$PROPDMGEXP, ignore.case = TRUE)

economicData$CROPDMGEXP<-gsub("2", 100, economicData$CROPDMGEXP, ignore.case = TRUE)
economicData$CROPDMGEXP<-gsub("K", 1000, economicData$CROPDMGEXP, ignore.case = TRUE)
economicData$CROPDMGEXP<-gsub("k", 1000, economicData$CROPDMGEXP, ignore.case = TRUE)
economicData$CROPDMGEXP<-gsub("M", 1e+06, economicData$CROPDMGEXP, ignore.case = TRUE)
economicData$CROPDMGEXP<-gsub("m", 1e+06, economicData$CROPDMGEXP, ignore.case = TRUE)
economicData$CROPDMGEXP<-gsub("B", 1e+09, economicData$CROPDMGEXP, ignore.case = TRUE)

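##force columns to numeric (blank or unrecognised codes such as "?" become NA
##and are dropped later by na.rm=TRUE when the totals are summed)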
economicData$PROPDMGEXP<-as.numeric(economicData$PROPDMGEXP)
economicData$CROPDMGEXP<-as.numeric(economicData$CROPDMGEXP)

##calculate total damage to crops and properties & add in respective columns to economic subset
economicData$totalPROPDMG<-economicData$PROPDMG * economicData$PROPDMGEXP
economicData$totalCROPDMG<-economicData$CROPDMG * economicData$CROPDMGEXP
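
The exponent conversion can also be written as a single lookup table, which avoids chaining substitutions. The sketch below is an alternative illustration, not the method used for the results in this report; expMultiplier and toMultiplier are hypothetical names, the values are made up, and, unlike the gsub approach, any code outside the table (including "0" and "1") is assumed to become NA.

```r
# Lookup table from exponent code to multiplier, mirroring the
# substitutions above; codes not listed here become NA (an assumption)
expMultiplier <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8
)

# Hypothetical helper: convert a vector of exponent codes to numbers
toMultiplier <- function(code) {
  unname(expMultiplier[toupper(as.character(code))])
}

# Toy values (not taken from the storm database)
dmg      <- c(25, 2.5, 1, 10, 3)
dmgExp   <- c("K", "m", "B", "?", "")
totalDmg <- dmg * toMultiplier(dmgExp)
print(totalDmg)
```

A lookup matches each code exactly once, so a replacement value can never be altered by a later substitution, and unknown codes surface as NA instead of being converted silently.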

Once the two columns for total damage are included in the dataset, it is possible to calculate the total sum of damage to property and crops by event type.

PROPDMGByEvent<-aggregate(economicData$totalPROPDMG, by=list(economicData$EVTYPE), FUN=sum,na.rm=TRUE)
colnames(PROPDMGByEvent)<-c("Event Type", "TotalPROPDMG")
CROPDMGByEvent<-aggregate(economicData$totalCROPDMG, by=list(economicData$EVTYPE), FUN=sum,na.rm=TRUE)
colnames(CROPDMGByEvent)<-c("Event Type", "TotalCROPDMG")

Using this new aggregated data subset, it is possible to extract the top five weather events causing the most damage to property and crops respectively.

The event types are coerced to uppercase again here as a safeguard; they were already converted during data processing, so this step should leave them unchanged.

topFivePROPDMG<-PROPDMGByEvent[order(PROPDMGByEvent$TotalPROPDMG,decreasing=TRUE),][1:5,]
topFiveCROPDMG<-CROPDMGByEvent[order(CROPDMGByEvent$TotalCROPDMG,decreasing=TRUE),][1:5,]

topFivePROPDMG[,1] = as.data.frame(sapply(topFivePROPDMG[,1], toupper))
topFiveCROPDMG[,1] = as.data.frame(sapply(topFiveCROPDMG[,1], toupper))

#Define event types as factors to keep descending order when creating the bar plot
topFivePROPDMG$`Event Type`<-factor(topFivePROPDMG$`Event Type`,levels=topFivePROPDMG$`Event Type`)
topFiveCROPDMG$`Event Type`<-factor(topFiveCROPDMG$`Event Type`,levels=topFiveCROPDMG$`Event Type`)

Once the top five weather events have been extracted, the bar plot comparison can be made.

Figure 2

PROPDMGPlot<-ggplot()+geom_bar(data=topFivePROPDMG,
                             aes(x=topFivePROPDMG[,1], 
                                 y=topFivePROPDMG[,2],
                                 fill=interaction(topFivePROPDMG[,2],topFivePROPDMG[,1])), 
                             stat="identity", 
                             show.legend=FALSE)+
        theme(axis.title.x = element_blank())+
        ylab("Total Property Damage in Dollars")+
        ggtitle("Property Damage vs. Event Type")+
        theme(plot.title = element_text(size = 12,face="bold"))+
        theme(axis.text.x=element_text(angle=30, hjust=1,size=8))+
        theme(panel.background=element_rect(fill = "white",
                                            colour = "white",
                                            size = 0.5, linetype = "solid"),
              panel.grid.major=element_line(size = 0.5, linetype = 'dotted',
                                            colour = "darkgrey"), 
              panel.grid.minor=element_line(size = 0.25, linetype = 'dotted',
                                            colour = "darkgrey"))

CROPDMGPlot<-ggplot()+geom_bar(data=topFiveCROPDMG,
                              aes(x=topFiveCROPDMG[,1], 
                                  y=topFiveCROPDMG[,2],
                                  fill=interaction(topFiveCROPDMG[,2], topFiveCROPDMG[,1])), 
                              stat="identity",
                              show.legend=FALSE)+
        theme(axis.title.x = element_blank())+
        ylab("Total Crop Damage in Dollars")+
        ggtitle("Crop Damage vs. Event Type")+
        theme(plot.title = element_text(size = 12,face="bold"))+
        theme(axis.text.x=element_text(angle=30, hjust=1,size=8))+
        theme(panel.background=element_rect(fill = "white",
                                            colour = "white",
                                            size = 0.5, linetype = "solid"),
              panel.grid.major=element_line(size = 0.5, linetype = 'dotted',
                                            colour = "darkgrey"), 
              panel.grid.minor=element_line(size = 0.25, linetype = 'dotted',
                                            colour = "darkgrey"))

grid.arrange(CROPDMGPlot,
             PROPDMGPlot,
             ncol=2,
             nrow=1,
             top=textGrob("Top 5 Weather Events with Economic Consequences",gp=gpar(fontsize=20,font=3)))

Results

Based on the results in Figure 1, tornadoes are by far the leading weather event in terms of both fatalities and injuries. Based on the results in Figure 2, drought causes the most crop damage, with flooding a close second, while flooding causes the most property damage.