In this report we aim to identify the weather-related events that cause the most harm to people and the economy. For this, we analyzed the NOAA Storm Database, which contains events spanning from 1950 to almost the end of 2011. Data availability is not uniform over this time span, with sparser data collection in the earlier years. The data set required a lot of cleaning, and in the absence of input from experts in the field, several assumptions were made. From our analysis, we identified excessive heat as the number one cause of fatalities, while thunderstorms are the leading cause of injuries. On the economic front, floods were the leading cause of property damage and drought was the leading cause of crop damage; floods were also the leading cause of combined economic loss.
The Storm Data provided for the project (from the course website) is a comma-separated-value (CSV) file compressed via the bzip2 algorithm to reduce its size to about 48 MB. Additional documentation of the data is available in the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
We first set the working directory, load the required libraries, and read in the data from its compressed file. Missing data were coded as NA, “?”, or an empty string.
library(dplyr)
library(tidyr)
library(reshape2)
#read in the data file
noaa_data = read.csv(bzfile('StormData.csv.bz2'),na.strings=c("","?","NA"),stringsAsFactors=FALSE)
#explore the data
dim(noaa_data)
## [1] 902297 37
names(noaa_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
head(noaa_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 2 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 3 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 4 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 5 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 6 TORNADO 0 <NA> <NA> <NA> <NA> 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 <NA> <NA> 14.0 100 3 0 0
## 2 NA 0 <NA> <NA> 2.0 150 2 0 0
## 3 NA 0 <NA> <NA> 0.1 123 2 0 0
## 4 NA 0 <NA> <NA> 0.0 100 2 0 0
## 5 NA 0 <NA> <NA> 0.0 150 2 0 0
## 6 NA 0 <NA> <NA> 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0 <NA> <NA> <NA> <NA>
## 2 0 2.5 K 0 <NA> <NA> <NA> <NA>
## 3 2 25.0 K 0 <NA> <NA> <NA> <NA>
## 4 2 2.5 K 0 <NA> <NA> <NA> <NA>
## 5 2 2.5 K 0 <NA> <NA> <NA> <NA>
## 6 6 2.5 K 0 <NA> <NA> <NA> <NA>
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 <NA> 1
## 2 3042 8755 0 0 <NA> 2
## 3 3340 8742 0 0 <NA> 3
## 4 3458 8626 0 0 <NA> 4
## 5 3412 8642 0 0 <NA> 5
## 6 3450 8748 0 0 <NA> 6
any(is.na(noaa_data)) # returns TRUE - so there is missing data
## [1] TRUE
length(sort(unique(noaa_data$EVTYPE))) # 984 unique non-missing event types (sort() drops NA)
## [1] 984
A cursory look at the data reveals that this data set has 902297 rows and 37 variables/columns. We check the first few rows of the data. Since we are interested in the types of events that impact population health and the economy, we examine the EVTYPE variable further. There are 984 unique non-missing event types. However, the NOAA document lists only 48 event types (page 6 of the document). A closer examination revealed that, for example, there are over 100 variations of the NOAA event Thunderstorm Wind. These variations are due to spelling errors in data entry, non-standard abbreviations, and wind speeds tacked onto the event name.
### Data Cleaning - Phase I
# Retaining only necessary columns
noaa <- noaa_data[,c(1,2,7,8,12,23,24,25,26,27,28,29,30)]
noaa$EVTYPE <- gsub("[[:space:]]+","",noaa$EVTYPE) # remove all whitespace characters
noaa <- noaa[!grepl("^Summary",noaa$EVTYPE),] # remove rows starting with "Summary"
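# NOTE: grepl("NA", ...) drops every EVTYPE containing the substring "NA" (e.g. "TORNADO", "TSUNAMI");
# rows whose EVTYPE is a true missing value (NA, read in from "?") are not matched by grepl()
noaa <- noaa[!grepl("NA",noaa$EVTYPE),]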
# retaining only rows with fatalities, injuries, property or crop damage
noaa <- noaa[!(noaa$FATALITIES==0) | !(noaa$INJURIES==0) | !(noaa$PROPDMG == 0) | !(noaa$CROPDMG==0), ]
# convert all EVTYPE to lowercase
noaa$EVTYPE <- tolower(noaa$EVTYPE)
We first did some minimal cleaning by retaining only the necessary variables; for example, the TIME_ZONE variable could be removed. We decided to focus on only 13 of the 37 variables. We also eliminated white space in the EVTYPE variable. We next eliminated rows with missing EVTYPE, rows whose EVTYPE started with “Summary”, and rows where there was no damage or loss to people or property. Further, we converted the EVTYPE values to lower case to eliminate distinctions within the same event due to case. This minimal cleaning substantially reduced the number of observations in our data set, from 902297 to 214627.
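A point worth checking in this step is how missing event types are detected. The sketch below uses a made-up vector (the values and the names `evtype`, `by_substring`, and `by_is_na` are illustrative assumptions, not taken from the data set) to contrast a substring match on "NA" with an explicit `is.na()` test.

```r
# Toy event types; NA stands for a missing value read in via na.strings
evtype <- c("TORNADO", "FLOOD", NA, "TSUNAMI", "HAIL")

# Substring match: TRUE wherever the text contains the letters "NA";
# grepl() returns FALSE for a true missing value
by_substring <- grepl("NA", evtype)

# Explicit test: TRUE only for true missing values
by_is_na <- is.na(evtype)

kept_substring <- evtype[!by_substring]  # drops TORNADO and TSUNAMI, keeps NA
kept_is_na     <- evtype[!by_is_na]      # drops only the missing entry
```

If the intent is to drop only rows with a missing event type, the `is.na()` form is likely the safer choice, since names such as "TORNADO" contain the letters "NA".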
Before modifying the EVTYPE variable any further, we wanted to get an idea of the leading weather causes of fatalities, injuries, and property and crop damage. For this, we first have to calculate the property and crop damage because, for example, damage worth $2500 is recorded as “2.5 K” across two variables. The NOAA documentation (p. 12, third paragraph of Section 2.7) states:
Estimates can be obtained from emergency managers, U.S. Geological Survey, U.S. Army Corps of Engineers, power utility companies, and newspaper articles. If the values provided are rough estimates, then this should be stated as such in the narrative. Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry. When damage is due to more than one element of the storm, indicate, when possible, the amount of damage caused by each element. If the dollar amount of damage is unknown, or not available, check the “no information available” box.
Based on this, we are taking the PROPDMGEXP and CROPDMGEXP variables to mean the corresponding exponent. In addition to K, M, B etc., and numbers for the exponents, there were also “+” and “-” values which we decided to code as 1.
y <- ifelse(noaa$PROPDMGEXP=="B",9,ifelse(noaa$PROPDMGEXP=="K",3,ifelse(noaa$PROPDMGEXP=="M",6,
ifelse(noaa$PROPDMGEXP=="m",6,ifelse(noaa$PROPDMGEXP=="+",1,ifelse(noaa$PROPDMGEXP=="h",2,
ifelse(noaa$PROPDMGEXP=="H",2,ifelse(noaa$PROPDMGEXP=="-",1,ifelse(noaa$PROPDMGEXP=="0",0,
ifelse(noaa$PROPDMGEXP=="2",2,ifelse(noaa$PROPDMGEXP=="3",3,ifelse(noaa$PROPDMGEXP=="4",4,
ifelse(noaa$PROPDMGEXP=="5",5,ifelse(noaa$PROPDMGEXP=="6",6,ifelse(noaa$PROPDMGEXP=="7",7,0)))))))))))))))
y[is.na(y)] <- 0
noaa$PDMGEXP <- y
z <- ifelse(noaa$CROPDMGEXP=="B",9,ifelse(noaa$CROPDMGEXP=="K",3,ifelse(noaa$CROPDMGEXP=="M",6,
ifelse(noaa$CROPDMGEXP=="m",6,ifelse(noaa$CROPDMGEXP=="0",0,ifelse(noaa$CROPDMGEXP=="k",3,0))))))
z[is.na(z)] <- 0
noaa$CDMGEXP <- z
# Currently we are not adjusting for inflation
noaa$PDMG <- noaa$PROPDMG*(10^noaa$PDMGEXP) # total property damage
noaa$CDMG <- noaa$CROPDMG*(10^noaa$CDMGEXP) # total crop damage
noaa$EDMG <- noaa$PDMG + noaa$CDMG # total economic damage
noaa$HDMG <- noaa$FATALITIES + noaa$INJURIES # total human loss/damage
Without adjusting for inflation, we calculated the property damage, crop damage, total economic damage (as the sum of property and crop damage), and total human damage (as the sum of fatalities and injuries, although we note that a fatality and an injury cannot be considered equivalent).
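The conversion from a magnitude code to a dollar amount can also be sketched with a lookup table instead of nested `ifelse()` calls. This is a minimal sketch under stated assumptions: `exp_lookup` and `to_exponent` are hypothetical names, the table is the union of the codes handled for property and crop damage above, and “+” and “-” map to 1 as described in the text.

```r
# Hypothetical lookup table: exponent code -> power of ten.
# Union of the codes handled above; "+" and "-" map to 1 as in the text.
exp_lookup <- c(H = 2, h = 2, K = 3, k = 3, M = 6, m = 6, B = 9,
                "+" = 1, "-" = 1,
                "0" = 0, "2" = 2, "3" = 3, "4" = 4, "5" = 5, "6" = 6, "7" = 7)

# Convert a vector of codes to exponents; unknown or missing codes become 0
to_exponent <- function(code) {
  out <- unname(exp_lookup[code])
  out[is.na(out)] <- 0
  out
}

# Toy example: 2.5 K, 1.55 B, and a value with no exponent code
amount  <- c(2.5, 1.55, 10)
code    <- c("K", "B", NA)
dollars <- amount * 10^to_exponent(code)
```

A named vector keeps every code and its exponent visible in one place, which makes it easier to spot a code that has been left out.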
noaaF <- aggregate(noaa$FATALITIES ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaF) <- c("EVTYPE","FATALITIES")
noaaF<- noaaF[order(-noaaF$FATALITIES),]
noaaI <- aggregate(noaa$INJURIES ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaI) <- c("EVTYPE","INJURIES")
noaaI<- noaaI[order(-noaaI$INJURIES),]
noaaH <- aggregate(noaa$HDMG ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaH) <- c("EVTYPE","HDMG")
noaaH<- noaaH[order(-noaaH$HDMG),]
noaaP <- aggregate(noaa$PDMG ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaP) <- c("EVTYPE","PDMG")
noaaP<- noaaP[order(-noaaP$PDMG),]
noaaC <- aggregate(noaa$CDMG ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaC) <- c("EVTYPE","CDMG")
noaaC<- noaaC[order(-noaaC$CDMG),]
noaaE <- aggregate(noaa$EDMG ~ noaa$EVTYPE, noaa, sum, na.action=na.omit)
colnames(noaaE) <- c("EVTYPE","EDMG")
noaaE<- noaaE[order(-noaaE$EDMG),]
## Over the entire US, events causing major damage BEFORE cleaning up the event types
noaaF[1,]
## EVTYPE FATALITIES
## 46 excessiveheat 1903
noaaI[1,]
## EVTYPE INJURIES
## 361 tstmwind 6957
noaaP[1,]
## EVTYPE PDMG
## 69 flood 144657709807
noaaC[1,]
## EVTYPE CDMG
## 35 drought 13972566000
noaaE[1,]
## EVTYPE EDMG
## 69 flood 150319678257
noaaH[1,]
## EVTYPE HDMG
## 46 excessiveheat 8428
We aggregated the total damage by event type (as given in the data set) for each category: FATALITIES, INJURIES, HDMG, PDMG, CDMG, and EDMG. Fatalities, injuries, and HDMG are given as numbers of people, and PDMG (property), CDMG (crop), and EDMG (economy) in dollars. Over the entire US, we identified the events causing the most damage in each category before cleaning up the EVTYPE variable.
# EVTYPE FATALITIES
# excessiveheat 1903
# EVTYPE INJURIES
# tstmwind 6957
# EVTYPE PDMG
# flood 144657709807
# EVTYPE CDMG
# drought 13972566000
# EVTYPE EDMG
# flood 150319678257
# EVTYPE HDMG
# excessiveheat 8428
Based on the above cumulative ordering, we list the top 5 EVTYPES most harmful for each kind of health and economic loss:
FATALITIES (54.61%): excessive heat, flash flood, heat, lightning, and thunderstorm wind
INJURIES (56.36%): thunderstorm wind, flood, excessive heat, lightning, and heat
PROPERTY DAMAGE (78.45%): flood, hurricane, storm surge, flash flood, and hail
CROP DAMAGE (67.24%): drought, flood, river flood, ice storm, and hail
ECONOMIC DAMAGE (72.36%): flood, hurricane, storm surge, hail, and flash flood
HUMAN LOSS/DAMAGE (55.21%): excessive heat, thunderstorm wind, flood, lightning, and heat
The number beside each category is the percentage of the total damage in that category caused by its top 5 event types. For example, 54.61% of weather-related fatalities in the US are caused by excessive heat, flash flood, heat, lightning, and thunderstorm wind.
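The top-5 percentages can be computed along the following lines. This is a sketch on toy totals, not the report's actual computation: the helper `top_share` and the numbers in `totals` are assumptions made for illustration.

```r
# Toy totals per event type (illustrative numbers, not from the data set)
totals <- data.frame(
  EVTYPE = c("a", "b", "c", "d", "e", "f", "g"),
  FATALITIES = c(50, 20, 10, 8, 6, 4, 2),
  stringsAsFactors = FALSE
)

# Percentage of the overall total contributed by the n largest event types
top_share <- function(df, value_col, n = 5) {
  ordered <- df[order(-df[[value_col]]), ]
  100 * sum(head(ordered[[value_col]], n)) / sum(df[[value_col]])
}

share <- round(top_share(totals, "FATALITIES"), 2)
```

Applied to an aggregated table such as `noaaF`, the same helper would give the share of all fatalities attributable to the five leading event types.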
To get some deeper understanding of the EVTYPEs, we look at all the heat-related EVTYPEs (since excessive heat was the number one cause for fatalities).
unique(noaa[grepl(".*heat.*",noaa$EVTYPE),]$EVTYPE)
## [1] "heat" "excessiveheat" "heatwave"
## [4] "extremeheat" "drought/excessiveheat" "recordheat"
## [7] "heatwavedrought" "heatwaves" "record/excessiveheat"
Heat-related causes were variously recorded as: “heat”, “excessiveheat”, “extremeheat”, “heatwave”, “record/excessiveheat”, “heatwaves”, “heatwavedrought”, “drought/excessiveheat” and “recordheat”. On closer examination of the data, we noted that during a 4-day period in July 1995 (July 12-16), there were 583 fatalities in Illinois, and the cause was recorded as “Heat” rather than “Excessive Heat” or “Record Heat”. However, an Internet search reveals that this was a rare event. From the CDC website:
Heat-Related Mortality – Chicago, July 1995: During July 12-16, 1995, Chicago experienced unusually high maximum daily temperatures, ranging from 93 F to 104 F (33.9 C to 40.0 C). On July 13, the heat index peaked at 119 F (48.3 C) – a record high for the city.
Given these discrepancies in data gathering and entry, we decided to fold all of the above heat conditions into “Heat”. This leads us to our second phase of data cleaning.
As noted earlier, the data set has 984 unique non-missing EVTYPEs while NOAA lists only 48 unique types. Given the ambiguity and spelling variations in the EVTYPEs, the discrepancies in recording explained above, and the lack of input from a subject matter expert, we chose a set of broad categories. Based on the top 5 EVTYPEs in each category, these are: heat, thunderstorm, flood, hurricane, hail, lightning, storm surge, drought, and ice storm. We further added the broad categories snow, rain, winter weather, avalanche, blizzard, surf, wind, tsunami, tornado, wildfire, tropical storm, rip current, and fog. Note that:
we are folding in multiple events per EVTYPE into one of these; for example, “TORNADO/WATERSPOUT” will be recorded as “Tornado”.
on a syntactical issue, we may deviate very slightly from the NOAA names; for example, we record “THUNDERSTORM WINDS” as just “Thunderstorm”.
After this, we subset the data set to include only the above-mentioned event categories.
library(qdap)
## Warning: package 'qdap' was built under R version 3.2.1
## Loading required package: qdapDictionaries
## Loading required package: qdapRegex
## Warning: package 'qdapRegex' was built under R version 3.2.1
## Loading required package: qdapTools
## Warning: package 'qdapTools' was built under R version 3.2.1
## Loading required package: RColorBrewer
##
## Attaching package: 'qdap'
##
## The following object is masked from 'package:base':
##
## Filter
patterns = c(".*heat.*")
replacements = c("Heat")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c("^thu.*|^ts.*|^tu.*|^coastalstorm|.*thunderstorm.*")
replacements = c("Thunderstorm")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c(".*stream.*|dambreak|.*flood.*|.*landslide.*|.*mudslide.*",".*hurricane.*|.*typhoon.*",".*hail.*")
replacements = c("Flood","Hurricane","Hail")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c("^light.*","^stormsurge.*|.*tide.*","^drought","^icestorm.*")
replacements = c("Lightning","Storm Surge", "Drought","Ice Storm")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c(".*snow.*",".*rain.*")
replacements = c("Snow","Rain")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c("^hypotherm.*|^lowtemp.*|.*cold.*|.*freez.*|^winter.*|^wintry.*|^sleet",".*avalanc.*",".*blizzard.*")
replacements = c("Winter Weather","Avalanche","Blizzard")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c(".*surf.*",".*wind.*",".*tsunami.*","^torndao|.*tornado.*",".*fire.*")
replacements = c("Surf","Wind","Tsunami","Tornado","Wildfire")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
patterns = c(".*trop.*","^ripcurrent.*",".*fog.*")
replacements = c("Tropical Storm","Rip Current","Fog")
noaa$EVTYPE <- mgsub(patterns, replacements, noaa$EVTYPE, fixed = FALSE)
events <- c("Heat","Thunderstorm","Flood","Hurricane","Hail","Lightning","Storm Surge","Drought","Ice Storm","Snow","Rain",
"Winter Weather", "Avalanche", "Blizzard", "Surf","Wind","Tsunami","Tornado","Wildfire","Tropical Storm",
"Rip Current","Fog")
noaaSmall <- noaa[noaa$EVTYPE %in% events,]
noaaRest <- noaa[!(noaa$EVTYPE %in% events),]
nrow(noaa)
## [1] 214627
nrow(noaaSmall)
## [1] 214065
nrow(noaaRest)
## [1] 562
After cleaning the EVTYPE as above, we subset the data set to include only the above-mentioned 22 weather events. Ignoring the remaining weather events only eliminated 0.26% of the data.
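The folding of variant spellings into broad categories can also be sketched in base R, without `qdap`. The function `recode_events` and the shortened `rules` vector below are hypothetical and cover only a few of the patterns used above; they illustrate the idea that the first matching pattern determines the category.

```r
# Ordered rules: the first matching pattern wins (names are the target categories)
rules <- c(
  Heat         = "heat",
  Thunderstorm = "^thu|^ts|thunderstorm",
  Flood        = "flood",
  Wind         = "wind"
)

# Assign each event type to the first category whose pattern matches;
# unmatched values are left unchanged
recode_events <- function(x, rules) {
  out <- x
  done <- rep(FALSE, length(x))
  for (category in names(rules)) {
    hit <- !done & grepl(rules[[category]], x)
    out[hit] <- category
    done <- done | hit
  }
  out
}

raw <- c("excessiveheat", "tstmwind", "flashflood", "highwind", "volcanicash")
recoded <- recode_events(raw, rules)
```

Because the rules are applied in order, “tstmwind” is assigned to Thunderstorm before the more general wind pattern is tried, which mirrors the order of the replacements above.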
As done during the exploratory data analysis, we once again determine the leading causes for human/economic loss.
noaaF <- aggregate(noaaSmall$FATALITIES ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaF) <- c("EVTYPE","FATALITIES")
noaaF<- noaaF[order(-noaaF$FATALITIES),]
noaaI <- aggregate(noaaSmall$INJURIES ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaI) <- c("EVTYPE","INJURIES")
noaaI<- noaaI[order(-noaaI$INJURIES),]
noaaP <- aggregate(noaaSmall$PDMG ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaP) <- c("EVTYPE","PDMG")
noaaP<- noaaP[order(-noaaP$PDMG),]
noaaC <- aggregate(noaaSmall$CDMG ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaC) <- c("EVTYPE","CDMG")
noaaC<- noaaC[order(-noaaC$CDMG),]
noaaE <- aggregate(noaaSmall$EDMG ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaE) <- c("EVTYPE","EDMG")
noaaE<- noaaE[order(-noaaE$EDMG),]
noaaH <- aggregate(noaaSmall$HDMG ~ noaaSmall$EVTYPE, noaaSmall, sum, na.action=na.omit)
colnames(noaaH) <- c("EVTYPE","HDMG")
noaaH<- noaaH[order(-noaaH$HDMG),]
## Over the entire US, events causing major damage AFTER cleaning up the event types
noaaF[1,]
## EVTYPE FATALITIES
## 7 Heat 3138
noaaI[1,]
## EVTYPE INJURIES
## 16 Thunderstorm 9538
noaaP[1,]
## EVTYPE PDMG
## 4 Flood 168597088783
noaaC[1,]
## EVTYPE CDMG
## 3 Drought 13972566000
noaaE[1,]
## EVTYPE EDMG
## 4 Flood 181005672983
noaaH[1,]
## EVTYPE HDMG
## 7 Heat 12362
Over the entire US, we identified the events causing the most damage in each category after cleaning up the EVTYPE variable; they are consistent with what we observed before the cleanup.
# EVTYPE FATALITIES
# Heat 3138
# EVTYPE INJURIES
# Thunderstorm 9538
# EVTYPE PDMG
# Flood 168597088783
# EVTYPE CDMG
# Drought 13972566000
# EVTYPE EDMG
# Flood 181005672983
# EVTYPE HDMG
# Heat 12362
We plot the distribution of each category - fatalities, injuries, property damage, crop damage - across their respective top 3 weather events.
par(mfrow=c(2,2)) # arranges the plots by row
#top left plot
barplot(noaaF$FATALITIES[1:3], axisnames=TRUE, names.arg=noaaF$EVTYPE[1:3], col="red",
main="Fatalities - Top 3 Weather Causes", xlab="Weather Event",ylab="Number of Fatalities")
#top right plot
barplot(noaaP$PDMG[1:3], axisnames=TRUE, names.arg=noaaP$EVTYPE[1:3], col="cyan",
main="Property Damage - Top 3 Weather Causes", xlab="Weather Event",ylab="Damage ($)")
#bottom left plot
barplot(noaaI$INJURIES[1:3], axisnames=TRUE, names.arg=noaaI$EVTYPE[1:3], col="pink",
main="Injuries - Top 3 Weather Causes", xlab="Weather Event",ylab="Number of Injuries")
#bottom right plot
barplot(noaaC$CDMG[1:3], axisnames=TRUE, names.arg=noaaC$EVTYPE[1:3], col="green",
main="Crop Damage - Top 3 Weather Causes", xlab="Weather Event",ylab="Damage ($)")
Next, for each category of damage, we aggregate the leading cause of damage by State/Territory as listed in the STATE variable.
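The per-state selection can be illustrated on a small made-up data frame. The values in `toy` are assumptions for illustration only; the pattern is to aggregate by event type and state, sort within each state by descending total, and keep the first row per state.

```r
# Toy records (illustrative values only)
toy <- data.frame(
  STATE      = c("AA", "AA", "AA", "BB", "BB"),
  EVTYPE     = c("Heat", "Flood", "Heat", "Hail", "Flood"),
  FATALITIES = c(3, 5, 4, 1, 2),
  stringsAsFactors = FALSE
)

# Total fatalities for every event type within every state
by_state <- aggregate(FATALITIES ~ EVTYPE + STATE, data = toy, FUN = sum)

# Sort by state, then by descending total, and keep the first row per state
by_state <- by_state[order(by_state$STATE, -by_state$FATALITIES), ]
top_by_state <- by_state[!duplicated(by_state$STATE), ]
```

When two event types tie for the maximum in a state, only the one that sorts first is kept, so ties are not visible in the result.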
noaaF <- aggregate(noaaSmall$FATALITIES ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaF) <- c("EVTYPE","STATE","FATALITIES")
noaaF <- noaaF[order(noaaF$STATE,-noaaF$FATALITIES),]
noaaFS <- noaaF[order(noaaF$STATE, -noaaF$FATALITIES),]
noaaF.max <- noaaFS[!duplicated(noaaFS$STATE),] #removing duplicated states if 2 equal max
noaaI <- aggregate(noaaSmall$INJURIES ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaI) <- c("EVTYPE","STATE","INJURIES")
noaaI <- noaaI[order(noaaI$STATE,-noaaI$INJURIES),]
noaaIS <- noaaI[order(noaaI$STATE, -noaaI$INJURIES),]
noaaI.max <- noaaIS[!duplicated(noaaIS$STATE),] #removing duplicated states if 2 equal max
noaaP <- aggregate(noaaSmall$PDMG ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaP) <- c("EVTYPE","STATE","PDMG")
noaaP <- noaaP[order(noaaP$STATE,-noaaP$PDMG),]
noaaPS <- noaaP[order(noaaP$STATE, -noaaP$PDMG),]
noaaP.max <- noaaPS[!duplicated(noaaPS$STATE),] #removing duplicated states if 2 equal max
noaaC <- aggregate(noaaSmall$CDMG ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaC) <- c("EVTYPE","STATE","CDMG")
noaaC <- noaaC[order(noaaC$STATE,-noaaC$CDMG),]
noaaCS <- noaaC[order(noaaC$STATE, -noaaC$CDMG),]
noaaC.max <- noaaCS[!duplicated(noaaCS$STATE),] #removing duplicated states if 2 equal max
noaaE <- aggregate(noaaSmall$EDMG ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaE) <- c("EVTYPE","STATE","EDMG")
noaaE <- noaaE[order(noaaE$STATE,-noaaE$EDMG),]
noaaES <- noaaE[order(noaaE$STATE, -noaaE$EDMG),]
noaaE.max <- noaaES[!duplicated(noaaES$STATE),] #removing duplicated states if 2 equal max
noaaH <- aggregate(noaaSmall$HDMG ~ noaaSmall$EVTYPE+noaaSmall$STATE, noaaSmall, sum, na.action=na.omit)
colnames(noaaH) <- c("EVTYPE","STATE","HDMG")
noaaH <- noaaH[order(noaaH$STATE,-noaaH$HDMG),]
noaaHS <- noaaH[order(noaaH$STATE, -noaaH$HDMG),]
noaaH.max <- noaaHS[!duplicated(noaaHS$STATE),] #removing duplicated states if 2 equal max
# FULL TABLE
results <- data.frame(cbind(noaaF.max$STATE,noaaF.max$EVTYPE,noaaF.max$FATALITIES,
noaaI.max$EVTYPE,noaaI.max$INJURIES,noaaP.max$EVTYPE,noaaP.max$PDMG,noaaC.max$EVTYPE,noaaC.max$CDMG))
colnames(results) <- c("State","Fatality Event", "# Fatalities","Injury Event", "#Injuries", "P.Damage Event", "Property Damage ($)","C.Damage Event", "Crop Damage ($)")
#rownames(results) <- NULL
print("Weather Events Causing the Largest Damage to Population Health and Economic Damage by Each State/Region")
## [1] "Weather Events Causing the Largest Damage to Population Health and Economic Damage by Each State/Region"
print(results, row.names = FALSE)
## State Fatality Event # Fatalities Injury Event #Injuries P.Damage Event
## AK Avalanche 33 Ice Storm 34 Flood
## AL Thunderstorm 40 Thunderstorm 431 Winter Weather
## AM Thunderstorm 6 Thunderstorm 22 Wind
## AN Wind 11 Wind 19 Thunderstorm
## AR Flood 61 Thunderstorm 237 Ice Storm
## AS Flood 4 Hurricane 20 Hurricane
## AZ Flood 63 Thunderstorm 189 Hail
## CA Heat 118 Wildfire 1128 Flood
## CO Avalanche 48 Lightning 260 Hail
## CT Wind 10 Thunderstorm 64 Tropical Storm
## DC Heat 22 Heat 316 Tropical Storm
## DE Heat 8 Surf 66 Flood
## FL Rip Current 271 Lightning 859 Hurricane
## GA Flood 43 Blizzard 402 Flood
## GM Thunderstorm 1 Thunderstorm 0 Wind
## GU Rip Current 38 Hurricane 339 Hurricane
## HI Surf 28 Surf 32 Flood
## IA Thunderstorm 12 Thunderstorm 293 Flood
## ID Avalanche 16 Thunderstorm 153 Flood
## IL Heat 983 Heat 594 Flood
## IN Flood 43 Thunderstorm 282 Flood
## KS Flood 24 Thunderstorm 352 Flood
## KY Flood 60 Thunderstorm 402 Flood
## LA Heat 62 Thunderstorm 278 Storm Surge
## LE Thunderstorm 0 Thunderstorm 0 Thunderstorm
## LM Thunderstorm 2 Thunderstorm 1 Wind
## LO Thunderstorm 0 Thunderstorm 0 Wind
## LS Wind 1 Wind 0 Wind
## MA Thunderstorm 10 Lightning 171 Flood
## MD Heat 100 Heat 545 Tropical Storm
## ME Lightning 6 Lightning 70 Ice Storm
## MH Surf 0 Surf 1 Surf
## MI Thunderstorm 49 Heat 594 Thunderstorm
## MN Flood 18 Thunderstorm 123 Flood
## MO Heat 233 Heat 4185 Hail
## MS Heat 26 Thunderstorm 246 Hurricane
## MT Winter Weather 10 Wildfire 33 Hail
## NC Flood 70 Lightning 278 Hurricane
## ND Winter Weather 14 Blizzard 97 Flood
## NE Winter Weather 12 Thunderstorm 100 Hail
## NH Thunderstorm 8 Lightning 85 Flood
## NJ Heat 48 Heat 304 Flood
## NM Flood 18 Lightning 52 Wildfire
## NV Heat 67 Flood 64 Flood
## NY Heat 100 Thunderstorm 345 Flood
## OH Flood 54 Ice Storm 1652 Flood
## OK Heat 87 Thunderstorm 324 Thunderstorm
## OR Wind 20 Wind 50 Flood
## PA Heat 514 Heat 381 Flood
## PH Wind 1 Wind 0 Wind
## PK Wind 0 Wind 0 Wind
## PR Flood 48 Thunderstorm 11 Hurricane
## PZ Wind 5 Wind 3 Wind
## RI Surf 3 Lightning 17 Flood
## SC Heat 41 Thunderstorm 241 Ice Storm
## SD Ice Storm 8 Thunderstorm 111 Flood
## SL Wind 0 Wind 0 Wind
## TN Flood 59 Thunderstorm 253 Flood
## TX Heat 298 Flood 6926 Tropical Storm
## UT Avalanche 44 Winter Weather 415 Flood
## VA Flood 50 Heat 252 Hurricane
## VI Surf 3 Lightning 1 Hurricane
## VT Flood 7 Thunderstorm 34 Flood
## WA Wind 45 Wind 81 Flood
## WI Heat 98 Thunderstorm 226 Flood
## WV Flood 42 Thunderstorm 151 Flood
## WY Avalanche 23 Winter Weather 119 Hail
## Property Damage ($) C.Damage Event Crop Damage ($)
## 195077200 Wind 157000
## 5002043000 Heat 400100000
## 5e+05 Thunderstorm 50000
## 169000 Hail 0
## 687091000 Flood 150090000
## 60550000 Flood 1267000
## 2828908700 Tropical Storm 2e+08
## 117349439520.41 Winter Weather 1016062000
## 1426944755 Hail 116490000
## 60004000 Hail 30000
## 127600000 Drought 5000
## 73409500 Drought 29100000
## 31794496000 Hurricane 1448210000
## 701850720 Drought 717285000
## 3326740 Thunderstorm 0
## 914090000 Hurricane 105575000
## 162867050 Wind 2600000
## 1788777500 Drought 2009630000
## 130624100 Thunderstorm 6038000
## 6048302010 Flood 5070459050
## 1167822680 Flood 790916500
## 548165425 Hail 259405300
## 790431010 Drought 2.26e+08
## 31827987000 Drought 587430000
## 25000 Thunderstorm 0
## 2627600 Thunderstorm 0
## 50000 Thunderstorm 0
## 4e+05 Wind 0
## 280546000 Thunderstorm 1260000
## 538505000 Drought 99720000
## 318230000 Rain 5e+05
## 5e+06 Surf 0
## 381743904 Drought 1.5e+08
## 1557371000 Hail 140700800
## 1136027370 Flood 664863300
## 14178100010 Ice Storm 5000060000
## 94729700 Hail 34345000
## 5569621000 Hurricane 1456730000
## 4071822500 Thunderstorm 196163000
## 920903070 Hail 738083650
## 105317770 Flood 2e+05
## 2904290000 Drought 8e+07
## 1648582000 Drought 14400000
## 739302100 Flood 6040000
## 3209516490 Drought 100200000
## 1717481404 Drought 2e+08
## 1018746955 Drought 1097040000
## 743918500 Hail 36028000
## 2523575009 Drought 539400000
## 0 Wind 0
## 31000 Wind 0
## 1824431000 Hurricane 4.51e+08
## 76000 Wind 0
## 93541000 Blizzard 0
## 153207500 Winter Weather 28050000
## 137903300 Flood 79292000
## 15000 Wind 0
## 4764892274 Thunderstorm 9729500
## 5491598000 Drought 6373438000
## 383398000 Wind 2110000
## 635012000 Drought 297480000
## 28220000 Drought 2e+05
## 1428573500 Flood 26675000
## 277802000 Winter Weather 270070000
## 986565750 Flood 817578500
## 786503100 Drought 19746000
## 111222200 Hail 1881200
library(mapproj)
## Loading required package: maps
par(mfrow=c(1,1))
GW=noaaH.max$EVTYPE
# Map of the leading cause of human loss/damage by state
n = length(GW)
data(state.fips)
colorBuckets = rep(0,n)
for( i in 1:n){
if(GW[i]=="Flood") colorBuckets[i] = 1
if(GW[i]=="Heat") colorBuckets[i] = 2
if(GW[i]=="Blizzard") colorBuckets[i] = 3
if(GW[i]=="Ice Storm") colorBuckets[i] = 4
if(GW[i]=="Lightning") colorBuckets[i] = 5
if(GW[i]=="Thunderstorm") colorBuckets[i] = 6
if(GW[i]=="Surf") colorBuckets[i] = 7
if(GW[i]=="Wildfire") colorBuckets[i] = 8
if(GW[i]=="Wind") colorBuckets[i] = 9
if(GW[i]=="Winter Weather") colorBuckets[i] = 10
}
colors = c("darkBlue","red","white","lightBlue","yellow","green","grey","orange","brown","cyan")
leg.txt <- c("Flood","Heat","Blizzard","Ice Storm","Lightning","Thunderstorm","Surf","Wildfire","Wind","Winter Weather")
colorsmatched <- colorBuckets [match(state.fips$abb, noaaH.max[[2]])]
# draw map
map("state", col = colors[colorsmatched], fill = TRUE, resolution = 0, lwd=.2,
lty = 1, projection = "polyconic")
legend("bottomleft", leg.txt, horiz = F, fill = colors,cex=.8)
title("Weather Events Causing the Largest Number of Human Loss/Damage \n (Fatalities & Injuries) by State")
For the contiguous US, the two maps show, by state, the leading weather-related cause of total human loss/damage and of total economic loss/damage. At first glance, it may seem strange that the leading cause of human loss/damage for Georgia, a southern state, is blizzard. But a quick Internet search revealed that this was due to the Great Blizzard of 1993.
library(mapproj)
par(mfrow=c(1,1))
GW=noaaE.max$EVTYPE
# Map of the leading cause of economic damage by state
n = length(GW)
data(state.fips)
colorBuckets = rep(0,n)
for( i in 1:n){
if(GW[i]=="Flood") colorBuckets[i] = 1
if(GW[i]=="Hurricane") colorBuckets[i] = 2
if(GW[i]=="Ice Storm") colorBuckets[i] = 3
if(GW[i]=="Thunderstorm") colorBuckets[i] = 4
if(GW[i]=="Drought") colorBuckets[i] = 5
if(GW[i]=="Hail") colorBuckets[i] = 6
if(GW[i]=="Storm Surge") colorBuckets[i] = 7
}
colors = c("darkBlue","orange","lightBlue","green","brown","cyan","grey")
leg.txt <- c("Flood","Hurricane","Ice Storm","Thunderstorm","Drought","Hail","Storm Surge")
colorsmatched <- colorBuckets [match(state.fips$abb, noaaE.max[[2]])]
# draw map
map("state", col = colors[colorsmatched], fill = TRUE, resolution = 0, lwd=.2,
lty = 1, projection = "polyconic")
legend("bottomleft", leg.txt, horiz = F, fill = colors,cex=.8)
title("Weather Events Causing the Largest Economic Damage \n (Property & Crop) by State")
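The color assignment used for both maps can also be written as a lookup in a named vector. In R, an index of 0 selects nothing, so a bucket left at 0 shortens the resulting color vector; a named lookup with a fallback color avoids that. The sketch below is an assumption-laden illustration: `palette_by_event`, `leading`, the region codes, and the fallback color are all made up.

```r
# Hypothetical palette keyed by event name (colors chosen for illustration)
palette_by_event <- c(Flood = "darkblue", Hurricane = "orange", Drought = "brown")

# Toy leading causes for four regions; "Hail" has no palette entry
leading <- c(AA = "Flood", BB = "Drought", CC = "Hail", DD = "Hurricane")

# Look up a color per region; events without an entry get a fallback color
fill <- unname(palette_by_event[leading])
fill[is.na(fill)] <- "grey90"
names(fill) <- names(leading)
```

The result always has one color per region, so it stays aligned with the regions passed to the plotting call.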