In this paper, we look at storm data for the United States between 1950 and 2011. We try to find which types of weather event caused the most damage to population health, measured by the number of injuries and fatalities each type of event caused. Next, we look at how much economic damage each type of weather event caused. By the end of the report, we hope to know how resources should be allocated in preparation for future weather events.
Here I loaded the packages. Then I downloaded the file and read it into the data frame storm_data.
library(dplyr)
library(ggplot2)
library(gridExtra)
#Download, Read In and Store Data in a Data Frame
fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(fileURL, destfile="StormData.csv.bz2", method = "curl")
storm_data <- read.csv("StormData.csv.bz2")
Here I added the columns crop_damage and property_damage. This step was necessary because PROPDMG and CROPDMG hold only a base figure for each damage estimate, while PROPDMGEXP and CROPDMGEXP hold a code for its magnitude (H, K, M, or B for hundreds, thousands, millions, or billions). I converted each code to a power of ten and multiplied the base figure by ten raised to that power to get the full damage amount.
#Convert the exponent codes (H, K, M, B) to powers of ten
get_exponent <- function(exponent) {
  exponent <- as.character(exponent)
  exponent[toupper(exponent) == "H"] <- "2" #Hundreds
  exponent[toupper(exponent) == "K"] <- "3" #Thousands
  exponent[toupper(exponent) == "M"] <- "6" #Millions
  exponent[toupper(exponent) == "B"] <- "9" #Billions
  exponent[is.na(exponent)] <- "0" #Missing codes count as 10^0
  #Codes that are not numbers (for example blanks) become NA here
  as.numeric(exponent)
}
storm_data$PROPDMGEXP <- get_exponent(storm_data$PROPDMGEXP)
storm_data$CROPDMGEXP <- get_exponent(storm_data$CROPDMGEXP)
#Multiply each damage figure by ten raised to its exponent
storm_data <- storm_data %>%
  mutate(property_damage = PROPDMG * 10^PROPDMGEXP,
         crop_damage = CROPDMG * 10^CROPDMGEXP)
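The conversion described above can be sketched on a toy vector as follows. This is only an illustration under stated assumptions: the helper name exponent_to_power is hypothetical, and treating blank or unrecognized codes as 10^0 is an assumption, not something taken from the dataset documentation. In base R, as.numeric("") returns NA and sum() returns NA when any input is NA, so it may be worth checking how blank codes are handled before totals are computed.

```r
# Hypothetical helper (not part of the analysis above): map an exponent code
# to a power of ten. Assumption: blank or unrecognized codes count as 0.
exponent_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])                   # NA for anything but H/K/M/B
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])   # digit codes are used as-is
  power[is.na(power)] <- 0                        # assumed fallback: 10^0 = 1
  power
}

# Toy codes, made up for illustration
codes <- c("K", "m", "", "?", "5", NA, "B", "h")
powers <- exponent_to_power(codes)

# Worked example: a recorded figure of 25 with code "K" is 25 * 10^3
damage <- 25 * 10^exponent_to_power("K")
```

With a fallback like this, every row gets a numeric exponent, so a single blank code no longer turns the total for an event type into NA.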
Here I changed the class of the column BGN_DATE to a date class.
#Change BGN_DATE column to a Date Class
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, "%m/%d/%Y %H:%M:%S")
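As a small illustration of the date conversion, the sketch below parses two toy strings written in the same layout as the format string above. The strings are made up for illustration and are not rows from the dataset.

```r
# Toy strings in the month/day/year hour:minute:second layout (made up)
raw_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")
parsed <- as.Date(raw_dates, "%m/%d/%Y %H:%M:%S")

# as.Date keeps only the calendar date, so the time of day is dropped
formatted <- format(parsed)

# Once the column is a Date, it can be compared against a cutoff date
is_recent <- parsed > as.Date("1991-11-01")
```

Converting to a Date class is what makes the later comparisons against a cutoff date meaningful, since the raw strings would otherwise be compared as text.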
Here I found out which types of weather event caused the most fatalities in the whole data set. Then I found out which types of weather event caused the most fatalities in the most recent twenty years in the data set.
top_fatality <- storm_data %>%
  group_by(EVTYPE) %>% #Group by Event Type
  summarize(total_fatality = sum(FATALITIES)) %>% #Find total fatalities per event type
  arrange(desc(total_fatality)) %>% #Arrange in descending order
  head(20)
top_fatality #Print results
## Source: local data frame [20 x 2]
##
## EVTYPE total_fatality
## (fctr) (dbl)
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
#Perform same as previous chunk just filtering out everything before November 1991
top_fatality_20y <- storm_data %>%
  filter(BGN_DATE > "1991-11-01") %>%
  group_by(EVTYPE) %>%
  summarize(total_fatality = sum(FATALITIES)) %>%
  arrange(desc(total_fatality)) %>%
  head(20)
top_fatality_20y
## Source: local data frame [20 x 2]
##
## EVTYPE total_fatality
## (fctr) (dbl)
## 1 EXCESSIVE HEAT 1903
## 2 TORNADO 1662
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 FLOOD 470
## 7 RIP CURRENT 368
## 8 TSTM WIND 255
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
Here I found which types of weather event caused the most injuries in the whole dataset and in the most recent twenty years.
#Performing same process as earlier just for injuries
top_injury <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(total_injury = sum(INJURIES)) %>%
  arrange(desc(total_injury)) %>%
  head(20)
top_injury
## Source: local data frame [20 x 2]
##
## EVTYPE total_injury
## (fctr) (dbl)
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## 11 WINTER STORM 1321
## 12 HURRICANE/TYPHOON 1275
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
top_injury_20y <- storm_data %>%
  filter(BGN_DATE > "1991-11-01") %>%
  group_by(EVTYPE) %>%
  summarize(total_injury = sum(INJURIES)) %>%
  arrange(desc(total_injury)) %>%
  head(20)
top_injury_20y
## Source: local data frame [20 x 2]
##
## EVTYPE total_injury
## (fctr) (dbl)
## 1 TORNADO 24739
## 2 FLOOD 6789
## 3 EXCESSIVE HEAT 6525
## 4 LIGHTNING 5230
## 5 TSTM WIND 3973
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 WINTER STORM 1321
## 11 HURRICANE/TYPHOON 1275
## 12 HIGH WIND 1137
## 13 HAIL 1068
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
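The fatality and injury summaries above all follow one pattern: optionally filter by date, group by event type, sum a column, sort, and keep the top rows. A minimal sketch of that pattern as a single function is given below. The function name top_events, its arguments, and the toy data frame are hypothetical and only illustrate the idea; this is not the code that produced the tables above.

```r
library(dplyr)

# Hypothetical helper: total `value_col` per EVTYPE, optionally restricted to
# events after the date `since`, and return the `n` largest totals.
top_events <- function(data, value_col, n = 20, since = NULL) {
  if (!is.null(since)) {
    data <- filter(data, BGN_DATE > as.Date(since))
  }
  data %>%
    group_by(EVTYPE) %>%
    summarize(total = sum(.data[[value_col]])) %>%
    arrange(desc(total)) %>%
    head(n)
}

# Toy data, made up for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  BGN_DATE = as.Date(c("1960-05-01", "1995-07-12", "2003-04-20", "2008-06-02")),
  FATALITIES = c(10, 4, 3, 5),
  stringsAsFactors = FALSE
)

all_time <- top_events(toy, "FATALITIES", n = 2)
recent <- top_events(toy, "FATALITIES", n = 2, since = "1991-11-01")
```

Passing the column name as a string keeps one function usable for FATALITIES, INJURIES, or any other numeric column, at the cost of a generic output column name (total).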
Here I graphed the findings of Top 20: Fatalities by Weather Event Type and Top 20: Injury by Weather Event Types.
fatality_graph <- ggplot(top_fatality, aes(x = EVTYPE, y = total_fatality)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Fatalities", title = "Fatalities")
fatality_graph20 <- ggplot(top_fatality_20y, aes(x = EVTYPE, y = total_fatality)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Fatalities", title = "Fatalities (1991-2011)")
injury_graph <- ggplot(top_injury, aes(x = EVTYPE, y = total_injury)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Injuries", title = "Injuries")
injury_graph20 <- ggplot(top_injury_20y, aes(x = EVTYPE, y = total_injury)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Injuries", title = "Injuries (1991-2011)")
grid.arrange(fatality_graph, fatality_graph20, ncol = 2)
grid.arrange(injury_graph, injury_graph20, ncol = 2)
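With x = EVTYPE, ggplot2 places the bars in alphabetical order of event type instead of by size. If bars sorted from largest to smallest are preferred, one option is to reorder the factor levels with reorder() before plotting. The sketch below uses a toy table whose values are made up; the names toy_totals and sorted_graph are hypothetical.

```r
# Toy totals standing in for a table such as top_fatality (values are made up)
toy_totals <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  total = c(5, 4, 13)
)

# reorder() sets the factor levels by a numeric key; negating the key
# puts the largest total first
toy_totals$EVTYPE <- reorder(toy_totals$EVTYPE, -toy_totals$total)
bar_order <- levels(toy_totals$EVTYPE)

# Build the plot only if ggplot2 is installed, so the sketch runs either way
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  sorted_graph <- ggplot(toy_totals, aes(x = EVTYPE, y = total)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90)) +
    labs(x = "Event Type", y = "Total", title = "Sorted Bars (toy data)")
}
```

Sorted bars make the ranking readable directly from the plot, which matters when twenty event types share one axis.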
Here I computed the crop damage totals for the whole dataset and for the most recent 20 years of the data. The two rankings are identical. This suggests that values are missing from the early part of the dataset, or that record keeping has improved in recent years.
top_crop <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(total_crop = sum(crop_damage)) %>%
  arrange(desc(total_crop)) %>%
  head(20)
top_crop
## Source: local data frame [20 x 2]
##
## EVTYPE total_crop
## (fctr) (dbl)
## 1 EXCESSIVE WETNESS 142000000
## 2 COLD AND WET CONDITIONS 66000000
## 3 Early Frost 42000000
## 4 Damaging Freeze 34130000
## 5 Freeze 10500000
## 6 HURRICANE OPAL/HIGH WINDS 10000000
## 7 UNSEASONAL RAIN 10000000
## 8 HIGH WINDS/COLD 7000000
## 9 Unseasonable Cold 5100000
## 10 COOL AND WET 5000000
## 11 WINTER STORM HIGH WINDS 5000000
## 12 TORNADOES, TSTM WIND, HAIL 2500000
## 13 Heavy Rain/High Surf 1500000
## 14 DUST STORM/HIGH WINDS 500000
## 15 FOREST FIRES 500000
## 16 TROPICAL STORM GORDON 500000
## 17 FLASH FLOODING/FLOOD 175000
## 18 Frost/Freeze 100000
## 19 HAIL/WINDS 50050
## 20 HEAT WAVE DROUGHT 50000
top_crop_20y <- storm_data %>%
  filter(BGN_DATE > "1991-11-01") %>%
  group_by(EVTYPE) %>%
  summarize(total_crop = sum(crop_damage)) %>%
  arrange(desc(total_crop)) %>%
  head(20)
top_crop_20y
## Source: local data frame [20 x 2]
##
## EVTYPE total_crop
## (fctr) (dbl)
## 1 EXCESSIVE WETNESS 142000000
## 2 COLD AND WET CONDITIONS 66000000
## 3 Early Frost 42000000
## 4 Damaging Freeze 34130000
## 5 Freeze 10500000
## 6 HURRICANE OPAL/HIGH WINDS 10000000
## 7 UNSEASONAL RAIN 10000000
## 8 HIGH WINDS/COLD 7000000
## 9 Unseasonable Cold 5100000
## 10 COOL AND WET 5000000
## 11 WINTER STORM HIGH WINDS 5000000
## 12 TORNADOES, TSTM WIND, HAIL 2500000
## 13 Heavy Rain/High Surf 1500000
## 14 DUST STORM/HIGH WINDS 500000
## 15 FOREST FIRES 500000
## 16 TROPICAL STORM GORDON 500000
## 17 FLASH FLOODING/FLOOD 175000
## 18 Frost/Freeze 100000
## 19 HAIL/WINDS 50050
## 20 HEAT WAVE DROUGHT 50000
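One way to check the missing-values explanation is to count missing damage values on each side of the cutoff date. The sketch below does this on a toy data frame. The values are made up, and the names toy, period, and missing_by_period are hypothetical; it shows the kind of check that could be run, not a result from the storm data.

```r
# Toy records, made up for illustration: two early rows with missing damage
toy <- data.frame(
  BGN_DATE = as.Date(c("1955-03-01", "1972-08-15", "1994-06-10",
                       "2001-01-20", "2010-09-05")),
  crop_damage = c(NA, NA, 5000, 0, 120000)
)

# Label each row by its side of the cutoff used in the filters above
toy$period <- ifelse(toy$BGN_DATE > as.Date("1991-11-01"),
                     "after cutoff", "before cutoff")

# Count missing damage values in each period
missing_by_period <- tapply(is.na(toy$crop_damage), toy$period, sum)
```

If the real data showed missing values on both sides of the cutoff, the identical rankings would need another explanation, such as how sum() treats NA.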
Here I computed the property damage totals for the whole dataset and for the most recent 20 years of the data. Again, the two rankings are identical.
top_prop <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(total_prop = sum(property_damage)) %>%
  arrange(desc(total_prop)) %>%
  head(20)
top_prop
## Source: local data frame [20 x 2]
##
## EVTYPE total_prop
## (fctr) (dbl)
## 1 TORNADOES, TSTM WIND, HAIL 1600000000
## 2 WILD FIRES 624100000
## 3 HAILSTORM 241000000
## 4 HIGH WINDS/COLD 110500000
## 5 River Flooding 106155000
## 6 MAJOR FLOOD 105000000
## 7 HURRICANE OPAL/HIGH WINDS 100000000
## 8 WINTER STORM HIGH WINDS 60000000
## 9 HURRICANE EMILY 50000000
## 10 Erosion/Cstl Flood 16200000
## 11 COASTAL FLOODING/EROSION 15000000
## 12 Heavy Rain/High Surf 13500000
## 13 LAKESHORE FLOOD 7540000
## 14 HIGH WINDS HEAVY RAINS 7500000
## 15 FLOODS 6000000
## 16 ICE JAM FLOODING 5516000
## 17 FOREST FIRES 5000000
## 18 HEAVY RAIN/LIGHTNING 5000000
## 19 HEAVY SNOW/BLIZZARD/AVALANCHE 5000000
## 20 HEAVY SNOWPACK 5000000
top_prop_20y <- storm_data %>%
  filter(BGN_DATE > "1991-11-01") %>%
  group_by(EVTYPE) %>%
  summarize(total_prop = sum(property_damage)) %>%
  arrange(desc(total_prop)) %>%
  head(20)
top_prop_20y
## Source: local data frame [20 x 2]
##
## EVTYPE total_prop
## (fctr) (dbl)
## 1 TORNADOES, TSTM WIND, HAIL 1600000000
## 2 WILD FIRES 624100000
## 3 HAILSTORM 241000000
## 4 HIGH WINDS/COLD 110500000
## 5 River Flooding 106155000
## 6 MAJOR FLOOD 105000000
## 7 HURRICANE OPAL/HIGH WINDS 100000000
## 8 WINTER STORM HIGH WINDS 60000000
## 9 HURRICANE EMILY 50000000
## 10 Erosion/Cstl Flood 16200000
## 11 COASTAL FLOODING/EROSION 15000000
## 12 Heavy Rain/High Surf 13500000
## 13 LAKESHORE FLOOD 7540000
## 14 HIGH WINDS HEAVY RAINS 7500000
## 15 FLOODS 6000000
## 16 ICE JAM FLOODING 5516000
## 17 FOREST FIRES 5000000
## 18 HEAVY RAIN/LIGHTNING 5000000
## 19 HEAVY SNOW/BLIZZARD/AVALANCHE 5000000
## 20 HEAVY SNOWPACK 5000000
Here I graphed the findings of Top 20: Crop Damage by Weather Event Type and Top 20: Property Damage by Weather Event Type.
crop_graph <- ggplot(top_crop, aes(x = EVTYPE, y = total_crop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Crop Damage", title = "Crop Damage")
crop_graph20 <- ggplot(top_crop_20y, aes(x = EVTYPE, y = total_crop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Crop Damage", title = "Crop Damage (1991-2011)")
prop_graph <- ggplot(top_prop, aes(x = EVTYPE, y = total_prop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Property Damage", title = "Property Damage")
prop_graph20 <- ggplot(top_prop_20y, aes(x = EVTYPE, y = total_prop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Event Type", y = "Total Property Damage", title = "Property Damage (1991-2011)")
grid.arrange(crop_graph, crop_graph20, ncol = 2)
grid.arrange(prop_graph, prop_graph20, ncol = 2)
From this analysis, we conclude that the most dangerous weather events are tornadoes, heat, flooding, and lightning. These events cause the most fatalities and injuries. We can also conclude that floods, hurricanes, tornadoes, and hail cause the most economic damage to property and crops. Overall, floods and tornadoes appear to cause the most damage to both the economy and public health.