This report will examine how weather impacts population and the economy. We’re using the NOAA Storm Database to examine patterns between weather event types and population health and economic consequences. The dataset tracks storm events in the United States between 1950 and 2011.
### Load Libraries
require(dplyr)
require(tidyr)
require(ggplot2)
require(lubridate)
require(readr)
require(stringr)
The data was downloaded from the Coursera Website at the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
rawData <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)
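To make the download step repeatable, the file can be fetched only when it is not already on disk. The helper below is a minimal sketch; `fetchIfMissing` is a hypothetical name introduced here for illustration and is not part of the original analysis.

```r
# Hypothetical helper: download a file only if it is not already present.
# Returns TRUE (invisibly) when a download happened, FALSE otherwise.
fetchIfMissing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # file already on disk, nothing to do
  }
  # mode = "wb" keeps the compressed .bz2 file intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Usage (not run here, since it needs network access):
# fetchIfMissing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "repdata_data_StormData.csv.bz2"
# )
```

`read.csv()` can then read the `.bz2` file directly, as in the call above.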
The data was examined using the functions below
# The following functions were used to examine the data
nrow(rawData)
ncol(rawData)
#Data has 37 columns and 902,297 rows
str(rawData)
head(rawData)
tail(rawData)
#Examine The Event Types
unique(rawData$EVTYPE)
#There are 985 Event Types
#View summary statistics for the following columns: FATALITIES, INJURIES, PROPDMG, CROPDMG
summary(rawData$FATALITIES)
summary(rawData$INJURIES)
summary(rawData$PROPDMG)
summary(rawData$CROPDMG)
Since the analysis is focused on weather with health or economic consequences, the data was filtered for events that have fatalities, injuries, crop damage, or property damage. For the health impact analysis, injuries and fatalities were added together into a single measure. After grouping the data by storm event, some event types in the top 15 most impactful events needed to be cleaned up to match the NOAA codes.
#Filter the Data for storms that have a health or economic impact
stormImpacts <- filter(rawData,FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 |CROPDMG > 0)
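#Convert the data frame to a tibble so the dplyr verbs below (group_by, summarise) print and behave consistently
#as_tibble() replaces tbl_df(), which is deprecated in current dplyr
tableImpacts <- as_tibble(stormImpacts)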
#Calculate the Total Health Impacts
tableImpacts <- mutate(tableImpacts, healthimpacts = FATALITIES + INJURIES)
#After running the group-by calculation, some duplicate event types in the top 15 needed to be cleaned up
tableImpacts$EVTYPE[grep("TSTM WIND",tableImpacts$EVTYPE)]<-"THUNDERSTORM WIND"
tableImpacts$EVTYPE[grep("THUNDERSTORM WINDS",tableImpacts$EVTYPE)]<-"THUNDERSTORM WIND"
#Group the Events by Type and total the sum for each
healthImpactByEvent <- tableImpacts %>%
group_by(EVTYPE) %>%
summarise(healthimpacts = sum(healthimpacts)) %>%
arrange(desc(healthimpacts))
top_n(healthImpactByEvent,15)
head(healthImpactByEvent,15)
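The cleanup above matches event types by substring, so a pattern such as "TSTM WIND" would also relabel any longer name that contains it (for example a label like "MARINE TSTM WIND"). The sketch below shows a stricter alternative on a small made-up vector: it normalises case and whitespace first, then uses an anchored regular expression. It is an illustration under those assumptions, not the method used in this report.

```r
# Toy event labels (made up for illustration, not taken from the dataset)
evtype <- c("TSTM WIND", "THUNDERSTORM WINDS", " tstm wind ",
            "MARINE TSTM WIND", "TORNADO")

# Normalise case and surrounding whitespace before matching
cleaned <- toupper(trimws(evtype))

# ^ and $ anchor the pattern so only whole-label matches are relabelled
isTstm <- grepl("^(TSTM|THUNDERSTORM) WINDS?$", cleaned)
cleaned[isTstm] <- "THUNDERSTORM WIND"

print(cleaned)
```

With anchoring, the marine label is left alone while the three spelling variants collapse to one type. Whether marine events should be merged is an analysis decision, so either behaviour can be reasonable as long as it is stated.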
For the economic analysis, the property damage and crop damage totals each needed to be calculated from two separate columns: a damage value and an “EXP” multiplier code. The “EXP” column for property and crop damage was converted to a number based on its text value (h = 100, k = 1,000, m = 1,000,000, b = 1,000,000,000). This factor was multiplied by the damage column, and the crop and property totals were added together for a total economic impact. After grouping the data by storm event, some event types in the top 15 most impactful events needed to be cleaned up to match the NOAA codes.
############### Economic Impacts Analysis###############
#Examine the factors
unique(tableImpacts$CROPDMGEXP)
#? 0 2 B k K m M
#Create a crop damage total column initialised to zero for every row
#(rows with an unrecognised exponent code stay at zero instead of NA)
tableImpacts$cropDamage <- 0
#Calculate crop damage using the CROPDMGEXP column
tableImpacts$cropDamage[tableImpacts$CROPDMGEXP=="K" | tableImpacts$CROPDMGEXP=="k"] <- 1000*tableImpacts$CROPDMG[tableImpacts$CROPDMGEXP=="K" | tableImpacts$CROPDMGEXP=="k"]
tableImpacts$cropDamage[tableImpacts$CROPDMGEXP=="m" | tableImpacts$CROPDMGEXP=="M"] <- 1000000*tableImpacts$CROPDMG[tableImpacts$CROPDMGEXP=="m" | tableImpacts$CROPDMGEXP=="M"]
tableImpacts$cropDamage[tableImpacts$CROPDMGEXP=="B"] <- 1000000000*tableImpacts$CROPDMG[tableImpacts$CROPDMGEXP=="B"]
#Examine the factors for Property Damage
unique(tableImpacts$PROPDMGEXP)
#K M B m + 0 5 6 4 h 2 7 3 H -
#Create a property damage total column initialised to zero for every row
#(rows with an unrecognised exponent code stay at zero instead of NA)
tableImpacts$propertyDamage <- 0
#Calculate property damage using the PROPDMGEXP column
tableImpacts$propertyDamage[tableImpacts$PROPDMGEXP=="h" | tableImpacts$PROPDMGEXP=="H"] <- 100*tableImpacts$PROPDMG[tableImpacts$PROPDMGEXP=="h" | tableImpacts$PROPDMGEXP=="H"]
tableImpacts$propertyDamage[tableImpacts$PROPDMGEXP=="K" | tableImpacts$PROPDMGEXP=="k"] <- 1000*tableImpacts$PROPDMG[tableImpacts$PROPDMGEXP=="K" | tableImpacts$PROPDMGEXP=="k"]
tableImpacts$propertyDamage[tableImpacts$PROPDMGEXP=="m" | tableImpacts$PROPDMGEXP=="M"] <- 1000000*tableImpacts$PROPDMG[tableImpacts$PROPDMGEXP=="m" | tableImpacts$PROPDMGEXP=="M"]
tableImpacts$propertyDamage[tableImpacts$PROPDMGEXP=="B"] <- 1000000000*tableImpacts$PROPDMG[tableImpacts$PROPDMGEXP=="B"]
#Calculate Total Economic Impact
tableImpacts <- mutate(tableImpacts, econImpact = cropDamage + propertyDamage)
#Clean Event Types
tableImpacts$EVTYPE[grep("WILD/FOREST FIRE",tableImpacts$EVTYPE)]<-"WILDFIRE"
tableImpacts$EVTYPE[grep("WILD FIRES",tableImpacts$EVTYPE)]<-"WILDFIRE"
tableImpacts$EVTYPE[grep("FREEZE",tableImpacts$EVTYPE)]<-"FROST/FREEZE"
tableImpacts$EVTYPE[grep("STORM SURGE",tableImpacts$EVTYPE)]<-"STORM SURGE/TIDE"
tableImpacts$EVTYPE[grep("HURRICANE OPAL",tableImpacts$EVTYPE)]<-"HURRICANE/TYPHOON"
tableImpacts$EVTYPE[grep("HURRICANE",tableImpacts$EVTYPE)]<-"HURRICANE/TYPHOON"
tableImpacts$EVTYPE[grep("TYPHOON",tableImpacts$EVTYPE)]<-"HURRICANE/TYPHOON"
#Group the Events by Type and total the sum for Economic Impacts
econImpactByEvent <- tableImpacts %>%
group_by(EVTYPE) %>%
summarise(econImpact = sum(econImpact)) %>%
arrange(desc(econImpact))
top_n(econImpactByEvent,15)
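The exponent conversion used in this chunk can also be expressed as a single lookup table, which avoids repeating the same condition on both sides of each assignment. The following is a minimal sketch on made-up rows; `expMultiplier` is a hypothetical helper name, and treating unrecognised codes (such as "+" or an empty string) as a multiplier of zero is an assumption made here for illustration.

```r
# Hypothetical helper: map an exponent code to its numeric multiplier.
# Assumption: codes other than H, K, M, B (any case) contribute zero.
expMultiplier <- function(exp) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy rows (made up for illustration)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "+"),
  stringsAsFactors = FALSE
)

# One vectorised multiplication replaces the per-code assignments
toy$propertyDamage <- toy$PROPDMG * expMultiplier(toy$PROPDMGEXP)
print(toy)
```

Because the same function serves both the property and crop columns, the damage column and its exponent column are always paired explicitly, which makes a mix-up between them harder to introduce.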
Based on this analysis, tornadoes have had the highest health impact and hurricanes have had the highest economic impact in the United States.
### Health Impacts Plot ###########
ggplot(data = top_n(healthImpactByEvent,10), aes(x= reorder(EVTYPE,-healthimpacts),y=healthimpacts, fill = healthimpacts)) +
geom_bar(stat = "identity")+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("Storm Event Type") + ylab("Health Impacts (Injuries & Fatalities)") +
ggtitle("Storm Events With Highest Health Impact")
### Economic Impacts Plot ####
ggplot(data = top_n(econImpactByEvent,10), aes(x= reorder(EVTYPE,-econImpact),y=econImpact, fill = econImpact)) +
geom_bar(stat = "identity")+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("Storm Event Type") + ylab("Economic Impacts (Property & Crop Damage)") +
ggtitle("Storm Events With Highest Economic Impact")
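The plots select their rows with `top_n()`, which picks the ranking column implicitly (the last column) when no weight is given. As a hedged alternative, `slice_max()` from dplyr 1.0.0 or later names the ranking column explicitly and lets ties be controlled. The data below is made up for illustration.

```r
library(dplyr)

# Toy summary table (values are made up, not results from this report)
toy <- data.frame(
  EVTYPE        = c("TORNADO", "FLOOD", "HEAT", "HAIL"),
  healthimpacts = c(100, 40, 60, 5),
  stringsAsFactors = FALSE
)

# Keep the two rows with the largest healthimpacts, ranked explicitly
top2 <- toy %>%
  slice_max(order_by = healthimpacts, n = 2, with_ties = FALSE)

print(top2)
```

The result is already sorted in descending order of the ranking column, so it can be passed straight to `ggplot()`.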