The weather. It’s a topic of conversation at nearly every social event and gets a slot in every evening news broadcast. Sometimes when we focus our attention on daily weather, it is difficult to put into perspective which weather events really have the greatest impact on our lives. This analysis seeks to answer that question by examining data from the National Oceanic and Atmospheric Administration (NOAA) collected between 1950 and 2011.
To perform the analysis, the 902,297 weather events in the database were first assigned to one of 21 primary categories: avalanche, blizzard, cold, drought, fire, flood, fog, hail, heat, hurricane, ice/frost, landslide, lightning, rain, snow, storm, surf/tide, tornado, volcano, wind and other. For each weather event type, the researchers calculated the total and average fatalities, injuries, property damage and crop damage based on available data.
Mean Fatalities per Incident was selected as the metric to determine human impact and Mean Combined Property and Crop Damage per Incident was selected as the metric to determine economic impact. Not surprisingly, Hurricanes are the event with the greatest economic impact per event, but readers may be surprised to learn that Excessive Heat is the event with the greatest human impact.
The data for this analysis are downloaded as a bzip2-compressed CSV file from cloud storage. The file originated from the National Oceanic and Atmospheric Administration and contains data for over 900,000 weather events in the United States between 1950 and 2011.
#load libraries
library(utils)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:graphics':
##
## layout
library(plyr)
library(stringr)
library(pander)
#set options
options("scipen" = 10)
#read the data
setwd("C:/Users/kroppa/Desktop/Coursera/Reproducible Research")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,"StormData.csv.bz2")
data <- read.csv(bzfile("StormData.csv.bz2"))
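Because the archive is large, re-running the report downloads it again each time. A minimal sketch of one way to avoid that, assuming a local copy is acceptable as a cache; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis:

```r
# Hypothetical helper: download a file only when no local copy exists yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive byte-for-byte intact on Windows
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out because it needs network access):
# fetch_if_missing(url, "StormData.csv.bz2")
# data <- read.csv(bzfile("StormData.csv.bz2"))
```

The trade-off is that a stale or partially downloaded file would be reused silently, so deleting the local copy forces a fresh download.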
Before proceeding with the quantitative analysis, the 985 unique values in the data field ‘EVTYPE’ containing the event type needed to be rationalized. Visual inspection of the event type labels revealed many minor variations which could be logically combined, such as ‘TORNADO DEBRIS’ with ‘TORNADO’ or ‘WIND GUSTS’ with ‘NON TSTM WIND’. Based on this visual inspection, the researchers created 21 standard event categories: avalanche, blizzard, cold, drought, fire, flood, fog, hail, heat, hurricane, ice/frost, landslide, lightning, rain, snow, storm, surf/tide, tornado, volcano, wind and other.
To assign an event to a standard category, the researchers used regular expressions to search within the original ‘EVTYPE’ field for matches. The goal was to assign each event to a single standard category. This required certain judgement calls for event types whose description spanned more than one of the standard categories, such as ‘TSTM WIND AND LIGHTNING’ or ‘HEAVY SNOW/HIGH WINDS & FLOOD’. Because the rules are applied in sequence, a later match overwrites an earlier one. The placement of each original event into a standard category can be discerned from the code chunk below.
data$event_type <- str_to_lower(data$EVTYPE) #convert all text to lowercase
data$event_standard <- rep("Other", nrow(data)) #Set all rows to other; rules below will fill in matched values
data$event_standard[grepl("storm",data$event_type) == TRUE] <- "storm" #most general at the top; overwritten if more specific name given
data$event_standard[grepl("winter weather",data$event_type) == TRUE] <- "storm"
data$event_standard[grepl("rain",data$event_type) == TRUE] <- "rain"
data$event_standard[grepl("freez(?!rain)",data$event_type,perl = TRUE) == TRUE] <- "rain" #freeze not followed by rain
data$event_standard[grepl("cold",data$event_type) == TRUE] <- "cold"
data$event_standard[grepl("wind[ -]?chill",data$event_type) == TRUE] <- "cold"
data$event_standard[grepl("heat",data$event_type) == TRUE] <- "heat"
data$event_standard[grepl("warm",data$event_type) == TRUE] <- "heat"
data$event_standard[grepl("\\bhot\\b",data$event_type) == TRUE] <- "heat"
data$event_standard[grepl("hail",data$event_type) == TRUE] <- "hail"
data$event_standard[grepl("tornado",data$event_type) == TRUE] <- "tornado"
data$event_standard[grepl("funnel",data$event_type) == TRUE] <- "tornado"
data$event_standard[grepl("flood",data$event_type) == TRUE] <- "flood"
data$event_standard[grepl("stream fld",data$event_type) == TRUE] <- "flood"
data$event_standard[grepl("high water",data$event_type) == TRUE] <- "flood"
data$event_standard[grepl("wind(?!chill)",data$event_type,perl = TRUE) == TRUE] <- "wind" #wind not followed by chill
data$event_standard[grepl("lightning",data$event_type) == TRUE] <- "lightning"
data$event_standard[grepl("snow",data$event_type) == TRUE] <- "snow"
data$event_standard[grepl("blizzard",data$event_type) == TRUE] <- "blizzard"
data$event_standard[grepl("drought",data$event_type) == TRUE] <- "drought"
data$event_standard[grepl("dry",data$event_type) == TRUE] <- "drought"
data$event_standard[grepl("fire",data$event_type) == TRUE] <- "fire"
data$event_standard[grepl("\\bic[ey]\\b",data$event_type) == TRUE] <- "ice/frost" #whole words "ice" or "icy"
data$event_standard[grepl("frost",data$event_type) == TRUE] <- "ice/frost"
data$event_standard[grepl("sleet",data$event_type) == TRUE] <- "ice/frost"
data$event_standard[grepl("\\bfog",data$event_type) == TRUE] <- "fog"
data$event_standard[grepl("surf",data$event_type) == TRUE] <- "surf/tide"
data$event_standard[grepl("tid(e|al)",data$event_type) == TRUE] <- "surf/tide" #"tide" or "tidal"
data$event_standard[grepl("rip current",data$event_type) == TRUE] <- "surf/tide"
data$event_standard[grepl("tsunami",data$event_type) == TRUE] <- "surf/tide"
data$event_standard[grepl("\\bseas\\b",data$event_type) == TRUE] <- "surf/tide" #whole word "seas"; word boundaries need backslashes, not forward slashes
data$event_standard[grepl("swell",data$event_type) == TRUE] <- "surf/tide"
data$event_standard[grepl("land[ -]?slide",data$event_type) == TRUE] <- "landslide"
data$event_standard[grepl("mud[ -]?slide",data$event_type) == TRUE] <- "landslide"
data$event_standard[grepl("avalanche",data$event_type) == TRUE] <- "avalanche"
data$event_standard[grepl("hurricane",data$event_type) == TRUE] <- "hurricane"
data$event_standard[grepl("typhoon",data$event_type) == TRUE] <- "hurricane"
data$event_standard[grepl("tropical depression",data$event_type) == TRUE] <- "hurricane"
data$event_standard[grepl("volcan",data$event_type) == TRUE] <- "volcano"
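Two properties of this rule chain are easy to miss: the last matching rule wins, and a negative look-ahead such as `(?!chill)` only inspects the characters immediately after `wind`. The sketch below replays four of the rules on a few toy labels. The labels, the `classify` helper and the stricter pattern `wind(?![ -]?chill)` are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: apply named regex rules in order; later rules overwrite earlier ones.
classify <- function(labels, rules) {
  out <- rep("Other", length(labels))
  for (i in seq_along(rules)) {
    hit <- grepl(rules[[i]], labels, perl = TRUE)
    out[hit] <- names(rules)[i]
  }
  out
}

# Toy labels, already lowercased (invented for illustration)
labels <- c("tstm wind", "wind chill", "windchill", "heavy snow/high winds & flood")

# Same patterns and relative order as in the chunk above
rules_as_written <- c(cold = "wind[ -]?chill", flood = "flood",
                      wind = "wind(?!chill)", snow = "snow")
as_written <- classify(labels, rules_as_written)
# "wind" "wind" "cold" "snow": "wind chill" (with a space) still matches the wind rule

# Assumed stricter variant: also exclude an optional space or hyphen before "chill"
rules_strict <- c(cold = "wind[ -]?chill", flood = "flood",
                  wind = "wind(?![ -]?chill)", snow = "snow")
strict <- classify(labels, rules_strict)
# "wind" "cold" "cold" "snow"
```

Whether the stricter variant matches the intended grouping is a judgement call; switching to it would change the category counts reported later.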
The dollar values for the property and crop damage spanned 2 columns each in the original data set: one for the value and one for the units. For example, the ‘PROPDMG’ field might contain the value ‘2.5’ while the ‘PROPDMGEXP’ field contains ‘M’. The correct interpretation in this case is 2.5 million, or 2,500,000. To be able to calculate summary statistics by event type, the researchers first needed to combine the value and units columns. This too required some judgement calls, which can be read from the code chunk below.
data$Property_Damage_Multiplier <- data$PROPDMGEXP #Copy original values specifying units
data$Property_Damage_Multiplier[data$PROPDMGEXP == ""] <-1 #blank means no exponent or multiplier
data$Property_Damage_Multiplier <- gsub("[[:punct:]]",1,data$Property_Damage_Multiplier,perl=TRUE) #punctuation marks such as - or ? assumed to mean no exponent or multiplier
data$Property_Damage_Multiplier <- gsub("0",1,data$Property_Damage_Multiplier) #zero means no exponent or multiplier
data$Property_Damage_Multiplier <- gsub("B",1000000000,data$Property_Damage_Multiplier) #b stands for billions
data$Property_Damage_Multiplier <- gsub("M",1000000,data$Property_Damage_Multiplier) #m stands for millions
data$Property_Damage_Multiplier <- gsub("m",1000000,data$Property_Damage_Multiplier) #m stands for millions
data$Property_Damage_Multiplier <- gsub("k",1000,data$Property_Damage_Multiplier) #k stands for thousands
data$Property_Damage_Multiplier <- gsub("K",1000,data$Property_Damage_Multiplier) #k stands for thousands
data$Property_Damage_Multiplier <- gsub("h",100,data$Property_Damage_Multiplier) # h stands for hundred
data$Property_Damage_Multiplier <- gsub("H",100,data$Property_Damage_Multiplier) # h stands for hundred
data$Property_Damage_Multiplier <- gsub(2,100,data$Property_Damage_Multiplier) #100 is 10^2
data$Property_Damage_Multiplier <- gsub(3,1000,data$Property_Damage_Multiplier) #10^3
data$Property_Damage_Multiplier <- gsub(4,10000,data$Property_Damage_Multiplier) #10^4
data$Property_Damage_Multiplier <- gsub(5,100000,data$Property_Damage_Multiplier) #10^5
data$Property_Damage_Multiplier <- gsub(6,1000000,data$Property_Damage_Multiplier) #10^6
data$Property_Damage_Multiplier <- gsub(7,10000000,data$Property_Damage_Multiplier) # 10^7
data$Property_Damage_Multiplier <- gsub(8,100000000,data$Property_Damage_Multiplier) # 10^8
data$Property_Damage_Multiplier <- as.numeric(data$Property_Damage_Multiplier)
data$Property_Damage_Multiplier[is.na(data$Property_Damage_Multiplier)] <- 1 # replace NA with 1 to set up for multiplication step
data$Property_Damage <- NA
data$Property_Damage <- data$Property_Damage_Multiplier*data$PROPDMG #The clean property damage as a number which can be compared across all events
data$Crop_Damage_Multiplier <- data$CROPDMGEXP
data$Crop_Damage_Multiplier[data$CROPDMGEXP == ""] <-1 #blank means no exponent
## Warning in `[<-.factor`(`*tmp*`, data$CROPDMGEXP == "", value =
## structure(c(NA, : invalid factor level, NA generated
data$Crop_Damage_Multiplier <- gsub("[[:punct:]]",1,data$Crop_Damage_Multiplier,perl=TRUE) #punctuation mark assumed to mean no exponent
data$Crop_Damage_Multiplier <- gsub("0",1,data$Crop_Damage_Multiplier) #zero means no exponent
data$Crop_Damage_Multiplier <- gsub("B",1000000000,data$Crop_Damage_Multiplier) #b stands for billions
data$Crop_Damage_Multiplier <- gsub("M",1000000,data$Crop_Damage_Multiplier) #m stands for millions
data$Crop_Damage_Multiplier <- gsub("m",1000000,data$Crop_Damage_Multiplier) #m stands for millions
data$Crop_Damage_Multiplier <- gsub("k",1000,data$Crop_Damage_Multiplier) #k stands for thousands
data$Crop_Damage_Multiplier <- gsub("K",1000,data$Crop_Damage_Multiplier) #k stands for thousands
data$Crop_Damage_Multiplier <- gsub("h",100,data$Crop_Damage_Multiplier) # h stands for hundred
data$Crop_Damage_Multiplier <- gsub("H",100,data$Crop_Damage_Multiplier) # h stands for hundred
data$Crop_Damage_Multiplier <- gsub(2,100,data$Crop_Damage_Multiplier) #100 is 10^2
data$Crop_Damage_Multiplier <- gsub(3,1000,data$Crop_Damage_Multiplier) #10^3
data$Crop_Damage_Multiplier <- gsub(4,10000,data$Crop_Damage_Multiplier) #10^4
data$Crop_Damage_Multiplier <- gsub(5,100000,data$Crop_Damage_Multiplier) #10^5
data$Crop_Damage_Multiplier <- gsub(6,1000000,data$Crop_Damage_Multiplier) #10^6
data$Crop_Damage_Multiplier <- gsub(7,10000000,data$Crop_Damage_Multiplier) # 10^7
data$Crop_Damage_Multiplier <- gsub(8,100000000,data$Crop_Damage_Multiplier) # 10^8
data$Crop_Damage_Multiplier <- as.numeric(data$Crop_Damage_Multiplier)
data$Crop_Damage_Multiplier[is.na(data$Crop_Damage_Multiplier)] <- 1 # replace NA with 1 to set up for multiplication step
data$Crop_Damage <- NA
data$Crop_Damage <- data$Crop_Damage_Multiplier*data$CROPDMG #The clean crop damage as a number which can be compared across all events
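The same interpretation of the units column can be written more compactly with a lookup table. This is a minimal sketch under the same judgement calls as the chunk above (letters h, k, m, b in either case; digits 2 to 8 as powers of ten; everything else treated as a multiplier of 1). `exp_to_multiplier` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: translate a units code into a numeric multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  code <- tolower(as.character(code))
  mult <- rep(1, length(code))   # default for blank, punctuation, "0", "1", NA, unknown

  is_letter <- code %in% names(lookup)
  mult[is_letter] <- unname(lookup[code[is_letter]])

  is_digit <- grepl("^[2-8]$", code)   # a single digit 2..8 means 10^digit
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# The example from the text: a value of 2.5 with units "M"
example_damage <- 2.5 * exp_to_multiplier("M")   # 2500000
```

Converting with `as.character()` first also sidesteps the factor-level warning that appears when the units column has been read in as a factor.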
data$counter <- 1
#new data frame to hold summary values
human_impact <- data.frame(aggregate(data$counter, by = list(data$event_standard), FUN = sum))
#The next 4 lines calculate aggregate values and merge them into the summary data frame
human_impact <- merge(human_impact,aggregate(data$FATALITIES, by = list(data$event_standard), FUN = sum), by="Group.1")
human_impact <- merge(human_impact,aggregate(data$FATALITIES, by = list(data$event_standard), FUN = mean), by="Group.1")
human_impact <- merge(human_impact,aggregate(data$INJURIES, by = list(data$event_standard), FUN = sum), by="Group.1")
human_impact <- merge(human_impact,aggregate(data$INJURIES, by = list(data$event_standard), FUN = mean), by="Group.1")
#add column names
colnames(human_impact) <- c("Event_Type","Event_Count","Fatalities_Sum","Fatalities_Mean", "Injuries_Sum","Injuries_Mean")
#round for the purpose of table displays
human_impact$Fatalities_Mean <- round(human_impact$Fatalities_Mean,3)
human_impact$Injuries_Mean <- round(human_impact$Injuries_Mean,3) #round the injuries column itself rather than overwriting it with fatalities
#combine fatalities and injuries
human_impact$FatInj_Sum <- human_impact$Fatalities_Sum + human_impact$Injuries_Sum
human_impact$FatInj_Mean <- round(human_impact$FatInj_Sum/human_impact$Event_Count,3)
Quantifying human impact is challenging because not all of the consequences are evident immediately after the event and easily tallied. For the purposes of this analysis we define human impact from natural disasters in terms of the fatalities and injuries recorded in official government databases. The true human impact undoubtedly extends far beyond these numbers, affecting the lives of the families and communities of the individuals injured and killed.
Some events occur frequently with minimal injuries or fatalities during each occurrence. Others occur only rarely, but have a devastating impact when they do. How should these types of events be compared?
The best single metric is the Mean Fatalities per Incident. We are choosing to ignore injuries for the purposes of this comparison for two reasons. First, humans can recover from injuries, and second, events severe enough to cause fatalities will probably also result in injuries, while the reverse is not true.
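The difference between ranking by totals and ranking by the per-incident mean can be seen on a tiny invented data set. The numbers below are made up purely for illustration and do not come from the NOAA file.

```r
# Toy data: one frequent, low-severity event type and one rare, high-severity type
toy <- data.frame(
  event_standard = c(rep("frequent", 8), rep("rare", 2)),
  FATALITIES     = c(1, 0, 2, 1, 0, 1, 2, 1, 4, 2)
)

# Same aggregate() approach as in the analysis, using the formula interface
totals <- aggregate(FATALITIES ~ event_standard, data = toy, FUN = sum)
means  <- aggregate(FATALITIES ~ event_standard, data = toy, FUN = mean)

totals  # frequent: 8, rare: 6  -> "frequent" leads on total fatalities
means   # frequent: 1, rare: 3  -> "rare" leads on fatalities per incident
```

The per-incident mean answers "how dangerous is one such event", while the total also reflects how often the event happens.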
The event type with the greatest number of fatalities per event is Excessive Heat. This is a composite category which includes events tagged in the original database as heat, excessive heat, abnormal warmth, prolong warmth, record warm, record warm temps, record warmth, unseasonably warm, unseasonably warm & wet, unseasonably warm and dry, unseasonably warm year, unseasonably warm/wet, unusual warmth, unusual/record warmth, unusually warm, very warm, warm dry conditions, warm weather, dry hot weather, hot and dry, hot pattern, hot spell, hot weather, hot/dry pattern, or unseasonably hot. Excessive Heat results in 1.063 deaths per incident whereas the next highest cause - Avalanche - results in 0.579 deaths per incident.
The total number of deaths from Excessive Heat stands at 3143 from 2958 separate events across 61 years. This is the second highest total number of deaths in the six-decade span, behind Tornadoes, which occur more frequently but are less deadly per event.
#columns are referenced with ~ (formula syntax), which plotly 4.0 and later require
p <- plot_ly(human_impact, y = ~Event_Type, x = ~Fatalities_Mean, type = "bar", orientation = "h") %>%
  layout(
    title = "Average Fatalities by Event Type",
    yaxis = list(
      title = "",       # y-axis title left blank; the event names label the bars
      showgrid = FALSE  # hide y-axis grid lines
    ),
    xaxis = list(
      title = "Average Fatalities", # x-axis title
      showgrid = FALSE              # hide x-axis grid lines
    )
  )
p
#new data frame to hold summary values
economic_impact <- data.frame(aggregate(data$counter, by = list(data$event_standard), FUN = sum))
#The next 4 lines calculate aggregate values and merge them into the summary data frame
economic_impact <- merge(economic_impact,aggregate(data$Property_Damage, by = list(data$event_standard), FUN = sum), by="Group.1")
economic_impact <- merge(economic_impact,aggregate(data$Property_Damage, by = list(data$event_standard), FUN = mean), by="Group.1")
economic_impact <- merge(economic_impact,aggregate(data$Crop_Damage, by = list(data$event_standard), FUN = sum), by="Group.1")
economic_impact <- merge(economic_impact,aggregate(data$Crop_Damage, by = list(data$event_standard), FUN = mean), by="Group.1")
#add column names
colnames(economic_impact) <- c("Event_Type","Event_Count","Property_Sum_M","Property_Mean_K", "Crop_Sum_M","Crop_Mean_K")
#convert to the millions or thousands as appropriate
economic_impact$Property_Sum_M <- economic_impact$Property_Sum_M/1000000
economic_impact$Property_Mean_K <- economic_impact$Property_Mean_K/1000
economic_impact$Crop_Sum_M <- economic_impact$Crop_Sum_M/1000000
economic_impact$Crop_Mean_K <- economic_impact$Crop_Mean_K/1000
#combine property and crop
economic_impact$PropCrop_Sum_M <- economic_impact$Property_Sum_M + economic_impact$Crop_Sum_M
economic_impact$PropCrop_Mean_K <- (economic_impact$PropCrop_Sum_M/economic_impact$Event_Count)*1000
Quantifying economic impact is similarly challenging because certain events have a very brief impact on the economic activity of a region while others have long-lasting effects. Longer-term effects arise when a key building is not reconstructed after a disaster, thus permanently removing jobs from the local economy, or when a piece of land is flooded with sea water and becomes unsuitable for planting for many years. For the purposes of this analysis we define economic impact from natural disasters in terms of the property damage and crop damage recorded in official government databases. The true economic impact undoubtedly extends far beyond these numbers.
Some events occur frequently with minimal damage during each occurrence. Others occur only rarely but have a devastating impact when they do. How should these types of events be compared?
The best single metric is the Mean Combined Property and Crop Damage per Incident. This measure combines the damage to real property based on its reconstruction cost and the damage to crops based on their market value.
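The combined metric mixes two units: sums are held in millions of dollars and means in thousands. The sketch below walks through that conversion on invented numbers. The column names mirror the summary data frame used in this analysis, but the values are made up for illustration.

```r
# Toy summary table (values invented for illustration)
toy <- data.frame(
  Event_Type     = c("a", "b"),
  Event_Count    = c(4, 200),
  Property_Sum_M = c(800, 50),   # total property damage, millions of dollars
  Crop_Sum_M     = c(40, 150)    # total crop damage, millions of dollars
)

# Combined total, still in millions
toy$PropCrop_Sum_M <- toy$Property_Sum_M + toy$Crop_Sum_M

# Per-incident mean: millions divided by count, times 1000 to express in thousands
toy$PropCrop_Mean_K <- toy$PropCrop_Sum_M / toy$Event_Count * 1000

toy$PropCrop_Mean_K  # a: 210000 (i.e. $210 million per event), b: 1000 (i.e. $1 million per event)
```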
The event type with the greatest economic impact is Hurricanes. This is a composite category which includes events tagged in the original database as tropical depression, typhoon, hurricane, hurricane/typhoon, hurricane-generated swells, hurricane edouard, hurricane emily, hurricane erin, hurricane felix, hurricane gordon, hurricane opal, and hurricane opal/high winds. Based on the summary table below, Hurricanes result in about $253 million in combined property and crop damage per incident. The next highest category, Drought, averages about $5.3 million per incident, making Hurricanes the most devastating economic event by a factor of nearly 50.
The total amount of Hurricane-related damage in the summary table stands at $90.9 billion from 359 separate events across 61 years. This is second in total damage to Floods, which accounted for $180.5 billion in damage across the same time period from 86,062 separate events.
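#columns are referenced with ~ (formula syntax), which plotly 4.0 and later require
#the plot object is named econ_plot so that it does not mask base R's c()
econ_plot <- plot_ly(economic_impact, y = ~Event_Type, x = ~PropCrop_Mean_K, type = "bar", orientation = "h") %>%
  layout(
    title = "Average Combined Property and Crop Damage by Event Type",
    yaxis = list(
      title = "",       # y-axis title left blank; the event names label the bars
      showgrid = FALSE  # hide y-axis grid lines
    ),
    xaxis = list(
      title = "Average Damage (in Thousands)", # x-axis title
      showgrid = FALSE                         # hide x-axis grid lines
    )
  )
econ_plot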
The table below summarizes the human and economic impact from the 21 types of weather events studied.
combined <- merge(human_impact, economic_impact, by=c("Event_Type","Event_Count"))
combined <- combined[-c(7,8,13,14)]
colnames(combined) <- c("Event Type", "Event Count", "Total Fatalities", "Average Fatalities per Event", "Total Injuries", "Average Injuries per Event", "Total Property Damage (in millions)", "Average Property Damage (in thousands)", "Total Crop Damage (in millions)", "Average Crop Damage (in thousands)")
pandoc.table(combined, style = "rmarkdown", split.cells = 15)
## Warning in table.expand(t.colnames, t.width, justify, sep.col):
## '.Random.seed' is not an integer vector but of type 'NULL', so ignored
##
##
## | Event Type | Event Count | Total Fatalities |
## |:------------:|:-------------:|:------------------:|
## | avalanche | 387 | 224 |
## | blizzard | 2744 | 101 |
## | cold | 1106 | 231 |
## | drought | 2819 | 38 |
## | fire | 4240 | 90 |
## | flood | 86062 | 1556 |
## | fog | 1883 | 81 |
## | hail | 289274 | 15 |
## | heat | 2958 | 3143 |
## | hurricane | 359 | 135 |
## | ice/frost | 3759 | 110 |
## | landslide | 649 | 44 |
## | lightning | 15775 | 817 |
## | Other | 4703 | 59 |
## | rain | 12256 | 111 |
## | snow | 17580 | 164 |
## | storm | 21081 | 383 |
## | surf/tide | 2326 | 783 |
## | tornado | 67687 | 5636 |
## | volcano | 29 | 0 |
## | wind | 364620 | 1424 |
##
## Table: Table continues below
##
##
##
## | Average Fatalities per Event | Total Injuries |
## |:------------------------------:|:-----------------:|
## | 0.579 | 171 |
## | 0.037 | 805 |
## | 0.209 | 284 |
## | 0.013 | 48 |
## | 0.021 | 1608 |
## | 0.018 | 8680 |
## | 0.043 | 1077 |
## | 0 | 1371 |
## | 1.063 | 9228 |
## | 0.376 | 1333 |
## | 0.029 | 2200 |
## | 0.068 | 55 |
## | 0.052 | 5232 |
## | 0.013 | 418 |
## | 0.009 | 318 |
## | 0.009 | 1153 |
## | 0.018 | 2778 |
## | 0.337 | 910 |
## | 0.083 | 91410 |
## | 0 | 0 |
## | 0.004 | 11449 |
##
## Table: Table continues below
##
##
##
## | Average Injuries per Event | Total Property Damage (in millions) |
## |:----------------------------:|:-------------------------------------:|
## | 0.442 | 8.722 |
## | 0.293 | 659.9 |
## | 0.257 | 125.1 |
## | 0.017 | 1053 |
## | 0.379 | 8502 |
## | 0.101 | 168254 |
## | 0.572 | 25.01 |
## | 0.005 | 15977 |
## | 3.12 | 20.13 |
## | 3.713 | 85358 |
## | 0.585 | 3987 |
## | 0.085 | 326.7 |
## | 0.332 | 935.7 |
## | 0.089 | 18.56 |
## | 0.026 | 3229 |
## | 0.066 | 1020 |
## | 0.132 | 58968 |
## | 0.391 | 4910 |
## | 1.35 | 57004 |
## | 0 | 0.5 |
## | 0.031 | 17843 |
##
## Table: Table continues below
##
##
##
## | Average Property Damage (in thousands) | Total Crop Damage (in millions) |
## |:---------------------------------------:|:---------------------------------:|
## | 22.54 | 0 |
## | 240.5 | 112.1 |
## | 113.2 | 1426 |
## | 373.6 | 13973 |
## | 2005 | 403.3 |
## | 1955 | 12271 |
## | 13.28 | 0 |
## | 55.23 | 3047 |
## | 6.804 | 904.4 |
## | 237766 | 5516 |
## | 1061 | 6229 |
## | 503.4 | 20.02 |
## | 59.32 | 12.1 |
## | 3.946 | 148 |
## | 263.4 | 1600 |
## | 58.03 | 134.7 |
## | 2797 | 758.6 |
## | 2111 | 2.37 |
## | 842.2 | 415 |
## | 17.24 | 0 |
## | 48.94 | 2132 |
##
## Table: Table continues below
##
##
##
## | Average Crop Damage (in thousands) |
## |:------------------------------------:|
## | 0 |
## | 40.84 |
## | 1289 |
## | 4957 |
## | 95.11 |
## | 142.6 |
## | 0 |
## | 10.53 |
## | 305.8 |
## | 15365 |
## | 1657 |
## | 30.84 |
## | 0.7669 |
## | 31.48 |
## | 130.5 |
## | 7.661 |
## | 35.99 |
## | 1.019 |
## | 6.131 |
## | 0 |
## | 5.848 |