In this document, we analyse NOAA weather data to quantify the impact of adverse weather on human health and property. We seek out data where the weather has caused any fatality or injury to human beings as well as events that caused property or crop damage.
From the data and the plots, we can infer that tornadoes have had the greatest adverse effect on human health: they have caused over 5,000 deaths and more than 90,000 injuries.
In terms of economic damage, floods are the leading cause of property damage, at over $150B. Drought and excessive heat, however, have been the leading cause of agricultural damage in the US.
The following sections perform the operations on the data needed for these assessments.
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(grid))
suppressMessages(library(gridExtra))
suppressMessages(library(scales))
stormData <- read.csv("repdata-data-StormData.csv", as.is = TRUE)
str(stormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
In this section, we weed out data elements that are not directly relevant to assessing the impact on health and the economy. As the structure of the data set shows, many elements are unrelated to our assessment.
# Identify the fields needed for our assessment so that we can reduce the data during analysis
reqColumns <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
# Subset the data to include only the essential data elements
essentialStormData <- stormData[, reqColumns]
There are over 900 event types listed in the dataset, a great many of which are either variants of one another or have minimal impact in terms of economic or health damage. Therefore, we consolidate the event types into a manageable set of 10.
Event types with minimal impact, which match none of the search patterns below, are left as “Other”.
# The unique event types; length(eventTypes) gives their number
eventTypes <- unique(stormData$EVTYPE)
# Create a new variable to represent the new set of event types and set to "Other"
essentialStormData$disasterType <- "Other"
# Search for the approximate type and consolidate to one in the new set
essentialStormData[grepl("flood|surf|blow-out|swells|fld|dam break|tsunami",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Floods"
essentialStormData[grepl("storm|hurricane|typhoon",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Hurricanes & Storms"
essentialStormData[grepl("tornado|spout|funnel|whirlwind",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Tornadoes"
essentialStormData[grepl("heat|warmth|warm|dry|hot|drought|record high|record temperature|temperature record",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Drought & Heat"
essentialStormData[grepl("cold|cool|ice|icy|frost|freeze|snow|winter|blizzard|chill|freezing|avalanche|glaze|sleet|wintry|wintery",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Freeze"
essentialStormData[grepl("tstm wind|tstm|thunderstorm|lightning|thunder|hail",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Thunder & Lightning"
essentialStormData[grepl("fire|smoke|volcanic",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Fire"
essentialStormData[grepl("erosion|slide|slump",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Erosion"
essentialStormData[grepl("dust|saharan|wind|wnd",
essentialStormData$EVTYPE, ignore.case = TRUE), "disasterType"] <- "Winds"
# Convert the new event type to a factor
essentialStormData$disasterType <- as.factor(essentialStormData$disasterType)
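Because each assignment above overwrites the previous ones, an event type that matches several patterns ends up in whichever category is assigned last. The sketch below illustrates this on a few hand-picked labels. It reproduces only three of the patterns, and `classifyEvents` and `sampleEvents` are hypothetical names introduced for illustration.

```r
# Illustrative sketch: three of the patterns above, applied in the same order.
# classifyEvents and sampleEvents are hypothetical names, not part of the analysis.
classifyEvents <- function(evtype) {
  type <- rep("Other", length(evtype))
  type[grepl("storm|hurricane|typhoon", evtype, ignore.case = TRUE)] <- "Hurricanes & Storms"
  type[grepl("cold|cool|ice|icy|frost|freeze|snow|winter|blizzard|chill|freezing|avalanche|glaze|sleet|wintry|wintery",
             evtype, ignore.case = TRUE)] <- "Freeze"
  type[grepl("dust|saharan|wind|wnd", evtype, ignore.case = TRUE)] <- "Winds"
  type
}

sampleEvents <- c("HURRICANE", "WINTER STORM", "ICE STORM",
                  "THUNDERSTORM WIND", "EXTREME WINDCHILL", "FOG")
classified <- setNames(classifyEvents(sampleEvents), sampleEvents)
print(classified)
```

In this sketch "WINTER STORM" ends up as Freeze rather than Hurricanes & Storms, and "EXTREME WINDCHILL" ends up as Winds rather than Freeze, because the later assignment wins. The order of the statements is therefore part of the classification rule.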
In further data processing, as we are assessing only the impact on health and the economy, we can subset the dataset to only those records where the impact is non-zero.
# Health subset: keep rows where fatalities or injuries are non-zero
healthEssentialStormData <- subset(essentialStormData, !(FATALITIES == 0 & INJURIES == 0))
# Economy subset: keep rows where property damage or crop damage is non-zero
economyEssentialStormData <- subset(essentialStormData, !(PROPDMG == 0 & CROPDMG == 0))
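The condition `!(A == 0 & B == 0)` keeps a row when at least one of the two values is non-zero. A minimal sketch on made-up rows shows which records survive; the data frame `toyEvents` and its values are hypothetical.

```r
# Hypothetical rows, for illustration only
toyEvents <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(0, 0, 2, 1),
  INJURIES   = c(15, 0, 0, 3)
)

# Same form as the filter above: drop rows where both counts are zero
kept <- subset(toyEvents, !(FATALITIES == 0 & INJURIES == 0))

# By De Morgan's law, this is equivalent to keeping rows where either count is non-zero
keptAlt <- subset(toyEvents, FATALITIES != 0 | INJURIES != 0)

print(kept$EVTYPE)
```

Only the "HAIL" row, where both counts are zero, is dropped.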
PROPDMGEXP and CROPDMGEXP are character variables whose codes give the multiplier to apply to the PROPDMG and CROPDMG values to obtain the actual damage amount. We first map the EXP codes to a standard set of numeric multipliers (PROPEXP and CROPEXP), then store the multiplied values as PROPDMGVAL and CROPDMGVAL.
# Map the property damage exponent codes to numeric multipliers
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == ""] <- 1
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "B"] <- 1e+09
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "H"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "K"] <- 1000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "M"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "h"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "m"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "0"] <- 1
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "1"] <- 10
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "2"] <- 100
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "3"] <- 1000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "4"] <- 10000
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "5"] <- 1e+05
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "6"] <- 1e+06
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "7"] <- 1e+07
economyEssentialStormData$PROPEXP[economyEssentialStormData$PROPDMGEXP == "8"] <- 1e+08
# Map the crop damage exponent codes to numeric multipliers
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "B"] <- 1e+09
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "K"] <- 1000
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "M"] <- 1e+06
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "k"] <- 1000
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "m"] <- 1e+06
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "0"] <- 1
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "2"] <- 100
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == ""] <- 1
economyEssentialStormData$CROPEXP[economyEssentialStormData$CROPDMGEXP == "?"] <- 0
# Create new consolidated variables for Damage Values
economyEssentialStormData$PROPDMGVAL <- economyEssentialStormData$PROPDMG * economyEssentialStormData$PROPEXP
economyEssentialStormData$CROPDMGVAL <- economyEssentialStormData$CROPDMG * economyEssentialStormData$CROPEXP
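The same kind of mapping can be written more compactly with a named lookup vector. This is a sketch of an alternative, not the approach used above. `expLookup` and `toMultiplier` are hypothetical names. Codes absent from the lookup (such as "+" or "?") are deliberately left as NA here so that they stay visible, which differs from the crop mapping above where "?" is set to 0.

```r
# Hypothetical lookup of exponent codes to multipliers (letters are upper-cased first)
expLookup <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8
)

toMultiplier <- function(code) {
  code <- toupper(code)
  mult <- unname(expLookup[code])  # unknown codes and "" come back as NA
  mult[code == ""] <- 1            # blank exponent: take the value as-is
  mult
}

multipliers <- toMultiplier(c("K", "m", "", "B", "3", "+"))
print(multipliers)
```

With such a helper, a damage value would be computed as, for example, `PROPDMG * toMultiplier(PROPDMGEXP)`.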
We now create health- and economy-related aggregate datasets for plotting.
# Aggregate health related data; injuries and fatalities by disaster types
injuries <- healthEssentialStormData %>% group_by(disasterType) %>% summarise(total=sum(INJURIES)) %>% arrange(-total)
fatalities <- healthEssentialStormData %>% group_by(disasterType) %>% summarise(total=sum(FATALITIES)) %>% arrange(-total)
# Aggregate economy related data; property and crop damage by disaster types
propDamage <- aggregate(PROPDMGVAL ~ disasterType, economyEssentialStormData, sum) %>% arrange(desc(PROPDMGVAL))
cropDamage <- aggregate(CROPDMGVAL ~ disasterType, economyEssentialStormData, sum) %>% arrange(desc(CROPDMGVAL))
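One detail of the formula interface of `aggregate()` is worth keeping in mind: its default `na.action` is `na.omit`, so rows whose damage value is NA (for instance because an exponent code was not mapped) are dropped before summing. The sketch below shows this on made-up numbers; `toyDamage` is a hypothetical data frame.

```r
# Made-up values; the NA mimics a row whose exponent code was not mapped
toyDamage <- data.frame(
  disasterType = c("Floods", "Floods", "Winds", "Winds"),
  PROPDMGVAL   = c(2e9, 5e8, 1e6, NA)
)

# Formula interface: the NA row is removed before sum() is applied
totals <- aggregate(PROPDMGVAL ~ disasterType, toyDamage, sum)
totals <- totals[order(-totals$PROPDMGVAL), ]
print(totals)
```

The "Winds" total is reported from the one non-missing row only, rather than becoming NA.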
The following sub-sections use plots to show the impact of adverse weather on human health and the economy.
The two leading causes of fatalities related to adverse weather in the US are tornadoes and excessive heat.
# Bar charts of total injuries and fatalities per consolidated event type.
# The data are not filtered by year, so the titles give the full span of the dataset.
injuriesPlot <- ggplot(injuries, aes(x = disasterType, y = total)) +
  geom_col() +
  scale_y_continuous("Number of Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1950 - 2011")
fatalitiesPlot <- ggplot(fatalities, aes(x = disasterType, y = total)) +
  geom_col() +
  scale_y_continuous("Number of Fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1950 - 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
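Sorting the rows with `arrange()` does not change the order of the bars, because ggplot2 orders a discrete axis by factor levels. If bars sorted by total are wanted, one option is to reorder the factor levels first. The following is a minimal sketch with made-up totals; `toyTotals` is a hypothetical data frame.

```r
# Made-up totals, for illustration only
toyTotals <- data.frame(
  disasterType = factor(c("Floods", "Tornadoes", "Winds")),
  total        = c(30, 50, 10)
)

# reorder() sorts levels by increasing value, so negate to get descending totals
toyTotals$disasterType <- reorder(toyTotals$disasterType, -toyTotals$total)
barOrder <- levels(toyTotals$disasterType)
print(barOrder)
```

A plot built on `toyTotals` would then draw the bars in the order Tornadoes, Floods, Winds.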
In the US, the most property damage is caused by floods and storms. Significant agricultural damage, on the other hand, is caused by multiple event types, such as drought, floods, excessive cold and storms.
# Bar charts of property and crop damage (in billions of USD) per consolidated event type.
# The data are not filtered by year, so the titles give the full span of the dataset.
propDamagePlot <- ggplot(propDamage, aes(x = disasterType, y = PROPDMGVAL / 1e+09)) +
  geom_col() +
  scale_y_continuous("Damage in Billions USD") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Property Damage by Severe Weather\n Events in the U.S.\n from 1950 - 2011")
cropDamagePlot <- ggplot(cropDamage, aes(x = disasterType, y = CROPDMGVAL / 1e+09)) +
  geom_col() +
  scale_y_continuous("Damage in Billions USD") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Crop Damage by Severe Weather\n Events in the U.S.\n from 1950 - 2011")
grid.arrange(propDamagePlot, cropDamagePlot, ncol = 2)