Storms and other severe weather events can cause both public health and economic consequences for communities. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, crop and property damage from the years 1950 to 2011. Our goals for the analysis is to find out the types of events are most harmful to population health; and the types of events having the greatest economic consequences across the United States.
This section details the data processing steps, assumptions and justifications taken to address the two questions.
The data is read from the StormData.csv.bz2 file.
#downloading the required file to the woring directory
if(!file.exists("./repdata-data-StormData.csv.bz2")) {
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
,"./repdata-data-StormData.csv.bz2")
}
#reading the data
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"),header=TRUE, sep=",")
#loading the required packages
library(lubridate)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
Exploratory data analysis on this huge dataset was carried out to determine the years to zoom into, select the column variables that are relevant to the analysis after interpreting the question.
The NOAA website http://www.ncdc.noaa.gov/stormevents/details.jsp details the event types in the Storm Events Database. With reference to this link, the dataset was further grouped by the years 1950-1954, 1955-1989 and 1990-2011. The unique event types in these years was studied further.
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
The variables needed for question 1 are the begin date, event type, fatalities and injuries. The begin date is taken so that the year of incident can be extracted from the data. Fatlities and injuries are the metrics representing population health.
dataq1 <- data[,c("BGN_DATE","EVTYPE","FATALITIES","INJURIES")]
#there are a total of 902297 events in dataq1 dataset. 895323 events have zero fatalities and 884693 events have zero injuries.
Since our goal is focusing on the most harmful,it is interpreted that fatalities have a serious impact than injuries. Hence the data was filtered such that fatlities were more than zero and the associated injuries were also included. The event types for the years 1950 to 2011 are studied further.
dataq1fat <- subset(dataq1, FATALITIES>0)
#6974 rows where fatalities are more than 0.
dataq1fat <- mutate(dataq1fat, YEAR = year(as.POSIXlt(BGN_DATE,format="%m/%d/%Y")))
#unique event types in the years 1950 to 1954
unique((subset(dataq1fat,YEAR>=1950 & YEAR <=1954))$EVTYPE)
## [1] TORNADO
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
#unique event types in the years 1955 to 1989
unique((subset(dataq1fat,YEAR>=1955 & YEAR <= 1989))$EVTYPE)
## [1] TORNADO TSTM WIND
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
#unique event types in the years 1990 to 2011
unique((subset(dataq1fat,YEAR>= 1990))$EVTYPE)
## [1] TORNADO TSTM WIND
## [3] HAIL WINTER STORM
## [5] HURRICANE OPAL/HIGH WINDS DENSE FOG
## [7] RIP CURRENT HEAT
## [9] LIGHTNING COLD
## [11] FLOODING HEAVY RAIN
## [13] FLASH FLOOD EXTREME COLD
## [15] THUNDERSTORM WINDS MARINE MISHAP
## [17] HIGH WIND/SEAS HIGH SEAS
## [19] DUST STORM SLEET
## [21] FLOOD EXCESSIVE HEAT
## [23] GUSTY WINDS HIGH WINDS
## [25] HIGH SURF WILD FIRES
## [27] WINTER STORM HIGH WINDS WINTER STORMS
## [29] THUNDERSTORM WIND FLOOD/FLASH FLOOD
## [31] HEAVY SNOW ICE STORM
## [33] HEAT WAVE UNSEASONABLY WARM
## [35] STRONG WINDS HURRICANE ERIN
## [37] WIND HURRICANE OPAL
## [39] TORNADOES, TSTM WIND, HAIL TROPICAL STORM GORDON
## [41] WATERSPOUT STORM SURGE
## [43] BLIZZARD FREEZING RAIN/SNOW
## [45] THUNDERSNOW HIGH WIND
## [47] FLASH FLOODING FREEZING RAIN
## [49] FOG SNOW AND ICE
## [51] WIND STORM ICE
## [53] URBAN AND SMALL STREAM FLOODIN WATERSPOUT/TORNADO
## [55] EXTREME HEAT FREEZE
## [57] LIGHTNING. THUNDERTORM WINDS
## [59] COLD WAVE FLOOD/RIVER FLOOD
## [61] RIVER FLOOD AVALANCE
## [63] HEAVY SNOW AND HIGH WINDS RIP CURRENTS/HEAVY SURF
## [65] FOG AND COLD TEMPERATURES HEAVY SURF
## [67] ICY ROADS SNOW
## [69] RIP CURRENTS HURRICANE FELIX
## [71] BLOWING SNOW RAIN/WIND
## [73] HEAT WAVE DROUGHT HEAT WAVES
## [75] UNSEASONABLY WARM AND DRY UNSEASONABLY COLD
## [77] RECORD/EXCESSIVE HEAT THUNDERSTORM WIND G52
## [79] HIGH WAVES LOW TEMPERATURE
## [81] HYPOTHERMIA COLD/WINDS
## [83] RECORD COLD SNOW/ BITTER COLD
## [85] COLD WEATHER RAPIDLY RISING WATER
## [87] HIGH WINDS/SNOW FLASH FLOOD/FLOOD
## [89] EXCESSIVE RAINFALL FLASH FLOODING/FLOOD
## [91] GLAZE LANDSLIDE
## [93] HIGH WIND AND SEAS RIVER FLOODING
## [95] MINOR FLOODING DROUGHT/EXCESSIVE HEAT
## [97] HEAVY SEAS AVALANCHE
## [99] FLOOD & HEAVY RAIN FLASH FLOODS
## [101] TROPICAL STORM URBAN/SML STREAM FLD
## [103] ROUGH SURF Marine Accident
## [105] Heavy Surf DRY MICROBURST
## [107] WINDS COASTAL STORM
## [109] HURRICANE High Surf
## [111] Extreme Cold Strong Winds
## [113] Extended Cold EXTREME WINDCHILL
## [115] Whirlwind MIXED PRECIP
## [117] Freezing Spray Mudslides
## [119] STRONG WIND Cold
## [121] Cold Temperature Coastal Flooding
## [123] COLD AND SNOW RAIN/SNOW
## [125] Hypothermia/Exposure HYPOTHERMIA/EXPOSURE
## [127] BLACK ICE COASTALSTORM
## [129] blowing snow FREEZING DRIZZLE
## [131] FROST WILD/FOREST FIRE
## [133] Snow Squalls Mudslide
## [135] Heavy surf and wind LANDSLIDES
## [137] HIGH SWELLS TSTM WIND/HAIL
## [139] TSTM WIND (G35) SNOW SQUALL
## [141] HYPERTHERMIA/EXPOSURE RECORD HEAT
## [143] GUSTY WIND WINTRY MIX
## [145] ROUGH SEAS THUNDERSTORM WIND (G40)
## [147] HIGH WATER LIGHT SNOW
## [149] THUNDERSTORM FALLING SNOW/ICE
## [151] COASTAL FLOODING ICE ON ROAD
## [153] DROWNING EXTREME COLD/WIND CHILL
## [155] HURRICANE/TYPHOON WILDFIRE
## [157] HEAVY SURF/HIGH SURF WINTER WEATHER/MIX
## [159] DUST DEVIL MARINE TSTM WIND
## [161] WINTER WEATHER COLD/WIND CHILL
## [163] MARINE THUNDERSTORM WIND MARINE STRONG WIND
## [165] COASTAL FLOOD STORM SURGE/TIDE
## [167] MARINE HIGH WIND TSUNAMI
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
All the fatalities that occured from 1950 to 1954 are attributed to tornado event type. All the fatalities that occured from 1955 to 1989 are attributed to tornado and thunderstorm wind event type.
For the years 1990 to 2011, there are 168 unique event types. For further analysis, data from the year 1990 will be taken and the data were fatalities more than zero are taken from the raw data. The event types are mapped to the recognisable event names provided in the National Weather Service instructions 10-1605 https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
#data considered from 1990 onwards
dataq1fat_1990 <- subset(dataq1fat,YEAR>= 1990)
#intialised a new column in the dataframe with an input and these will be replaced later as they are mapped
dataq1fat_1990["EVTYPEMAP"] = "AVALANCHE"
#For cases with two categories, the first type is taken and mapped to one of the 48 events, i.e.EVTYPE=Rain/Wind, EVTYPEMAP= Heavy Rain in this case. Grep is used for matching.
dataq1fat_1990$EVTYPEMAP[grep("^ava",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "AVALANCHE"
dataq1fat_1990$EVTYPEMAP[grep("^blizzard",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "BLIZZARD"
dataq1fat_1990$EVTYPEMAP[grep("^coastal",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "COASTAL FLOOD"
dataq1fat_1990$EVTYPEMAP[grep("^cold|low temperature|unseasonably cold",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "COLD/WIND CHILL"
dataq1fat_1990$EVTYPEMAP[grep("slide",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "DEBRIS FLOW"
dataq1fat_1990$EVTYPEMAP[grep("fog",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "DENSE FOG"
dataq1fat_1990$EVTYPEMAP[grep("dust",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "DUST STORM"
dataq1fat_1990$EVTYPEMAP[grep("^heat wave|(excessive|record|extreme) heat|hyperthermia",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "EXCESSIVE HEAT"
dataq1fat_1990$EVTYPEMAP[grep("^cold wave|((extended|record|extreme) (cold|windchill))|hypothermia",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "EXTREME COLD/WIND CHILL"
dataq1fat_1990$EVTYPEMAP[grep("flash flood",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "FLASH FLOOD"
dataq1fat_1990$EVTYPEMAP[grep("^(river?) flood|^flood|flooding$|urban",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "FLOOD"
dataq1fat_1990$EVTYPEMAP[grep("^freez|frost",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "FROST/FREEZE"
dataq1fat_1990$EVTYPEMAP[grep("^hail",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HAIL"
dataq1fat_1990$EVTYPEMAP[grep("^heat$|warm",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HEAT"
dataq1fat_1990$EVTYPEMAP[grep("^rain|^excessive rain|^heavy rain",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HEAVY RAIN"
dataq1fat_1990$EVTYPEMAP[grep("^snow|thundersnow|blowing|falling snow|heavy snow|squall",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HEAVY SNOW"
dataq1fat_1990$EVTYPEMAP[grep("^(heavy|high|rough) surf|high waves",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HIGH SURF"
dataq1fat_1990$EVTYPEMAP[grep("^high wind|gusty|dry microburst|^wind",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HIGH WIND"
dataq1fat_1990$EVTYPEMAP[grep("^hurricane",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "HURRICANE (TYPHOON)"
dataq1fat_1990$EVTYPEMAP[grep("^ice|icy|black ice",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "ICE STORM"
dataq1fat_1990$EVTYPEMAP[grep("^lightn",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "LIGHTNING"
dataq1fat_1990$EVTYPEMAP[grep("^marine (high wind|mishap|accident)",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "MARINE HIGH WIND"
dataq1fat_1990$EVTYPEMAP[grep("^marine strong wind",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "MARINE STRONG WIND"
dataq1fat_1990$EVTYPEMAP[grep("^marine (tstm|thunderstorm) wind",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "MARINE THUNDERSTORM WIND"
dataq1fat_1990$EVTYPEMAP[grep("^rip current",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "RIP CURRENT"
dataq1fat_1990$EVTYPEMAP[grep("^sleet",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "SLEET"
dataq1fat_1990$EVTYPEMAP[grep("tide|storm surge|^(high|heavy|rough) seas|swells|water$|drown",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "STORM SURGE/TIDE"
dataq1fat_1990$EVTYPEMAP[grep("^strong wind|whirl",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "STRONG WIND"
dataq1fat_1990$EVTYPEMAP[grep("^(tstm|thunderstorm) wind|wind storm",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "THUNDERSTORM WIND"
dataq1fat_1990$EVTYPEMAP[grep("^tornado",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "TORNADO"
dataq1fat_1990$EVTYPEMAP[grep("tropical storm",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "TROPICAL STORM"
dataq1fat_1990$EVTYPEMAP[grep("tsunami",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "TSUNAMI"
dataq1fat_1990$EVTYPEMAP[grep("^waterspout",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "WATERSPOUT"
dataq1fat_1990$EVTYPEMAP[grep("^wild|fire",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "WILDFIRE"
dataq1fat_1990$EVTYPEMAP[grep("^winter storm",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "WINTER STORM"
dataq1fat_1990$EVTYPEMAP[grep("^winter weather|precip|wintry|glaze",dataq1fat_1990$EVTYPE,ignore.case=TRUE)] <- "WINTER WEATHER"
EVq1length <- length(unique(dataq1fat_1990$EVTYPEMAP))
There are 36 unique mapped events for the health dataset.
unique(dataq1fat_1990$EVTYPEMAP)
## [1] "TORNADO" "THUNDERSTORM WIND"
## [3] "HAIL" "WINTER STORM"
## [5] "HURRICANE (TYPHOON)" "DENSE FOG"
## [7] "RIP CURRENT" "HEAT"
## [9] "LIGHTNING" "COLD/WIND CHILL"
## [11] "FLOOD" "HEAVY RAIN"
## [13] "FLASH FLOOD" "EXTREME COLD/WIND CHILL"
## [15] "MARINE HIGH WIND" "HIGH WIND"
## [17] "STORM SURGE/TIDE" "DUST STORM"
## [19] "SLEET" "EXCESSIVE HEAT"
## [21] "HIGH SURF" "WILDFIRE"
## [23] "HEAVY SNOW" "ICE STORM"
## [25] "STRONG WIND" "TROPICAL STORM"
## [27] "WATERSPOUT" "BLIZZARD"
## [29] "FROST/FREEZE" "AVALANCHE"
## [31] "WINTER WEATHER" "DEBRIS FLOW"
## [33] "COASTAL FLOOD" "MARINE THUNDERSTORM WIND"
## [35] "MARINE STRONG WIND" "TSUNAMI"
Across the United States, which types of events have the greatest economic consequences?
The variables needed for question 2 are the begin date, event type, crop and property damage and the associated exponents. The begin date is taken so that the year of incident can be extracted from the data. Crop and property damage are the metrics representing economic consequences. The exponents for crop and property damage can be used to calculate the total damage cost.
dataq2 <- data[,c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
Since our goal is focusing on the greatest economic consequence, the data was filtered such that crop or property damage were more than zero.
dataq2Dam <- subset(dataq2, CROPDMG>0|PROPDMG>0)
dataq2Dam <- mutate(dataq2Dam, YEAR = year(as.POSIXlt(BGN_DATE,format="%m/%d/%Y")))
#unique event types in the years 1950 to 1954
unique((subset(dataq2Dam,YEAR>=1950 & YEAR <=1954))$EVTYPE)
## [1] TORNADO
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
#unique event types in the years 1955 to 1989
unique((subset(dataq2Dam,YEAR>=1955 & YEAR <= 1989))$EVTYPE)
## [1] TORNADO
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
#unique event types in the years 1990 to 2011
unique1990 <-length(unique((subset(dataq2Dam,YEAR>= 1990))$EVTYPE))
All the crop and property damage that occured from 1950 to 1989 are attributed to tornado event type.
For the years 1990 to 2011, there are 431 unique event types. For further analysis, data from the year 1990 will be taken and the data where crop or property damage than zero are taken from the raw data.
The exponent levels for crop and property damage are tabulated to assess if they can be simplified. The exponents B(billion), M(million), K(thousands) were considered more relevant and likelt to impact the total damage cost. Hence, the data where the damage was in billions, millions and thousands were filtered. The damage cost (in millions) for each entry was calculated and included in the dataset.
#data considered from 1990 onwards
dataq2Dam_1990 <- subset(dataq2Dam,YEAR>= 1990)
#To see the levels of the EXP. taking on B,H,K,M as these are more significant
table(dataq2Dam_1990$YEAR, dataq2Dam_1990$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5 6
## 1990 0 0 0 0 0 0 0 0 0 0 0
## 1991 0 0 0 0 0 0 0 0 0 0 0
## 1992 0 0 0 0 0 0 0 0 0 0 0
## 1993 135 0 0 0 1 0 0 0 0 0 0
## 1994 214 0 0 1 25 0 0 0 1 1 0
## 1995 327 1 0 4 183 0 1 1 3 17 3
## 1996 273 0 0 0 0 0 0 0 0 0 0
## 1997 371 0 0 0 0 0 0 0 0 0 0
## 1998 442 0 0 0 0 0 0 0 0 0 0
## 1999 286 0 0 0 0 0 0 0 0 0 0
## 2000 504 0 0 0 0 0 0 0 0 0 0
## 2001 502 0 0 0 0 0 0 0 0 0 0
## 2002 227 0 0 0 0 0 0 0 0 0 0
## 2003 269 0 0 0 0 0 0 0 0 0 0
## 2004 375 0 0 0 0 0 0 0 0 0 0
## 2005 204 0 0 0 0 0 0 0 0 0 0
## 2006 228 0 0 0 0 0 0 0 0 0 0
## 2007 0 0 0 0 0 0 0 0 0 0 0
## 2008 0 0 0 0 0 0 0 0 0 0 0
## 2009 0 0 0 0 0 0 0 0 0 0 0
## 2010 0 0 0 0 0 0 0 0 0 0 0
## 2011 0 0 0 0 0 0 0 0 0 0 0
##
## 7 8 B h H K m M
## 1990 0 0 0 0 0 606 0 174
## 1991 0 0 0 0 0 592 0 86
## 1992 0 0 0 0 0 721 0 139
## 1993 0 0 3 0 0 5257 0 291
## 1994 0 0 0 1 0 8710 1 314
## 1995 2 0 5 0 6 8762 6 578
## 1996 0 0 0 0 0 8795 0 500
## 1997 0 0 1 0 0 9084 0 392
## 1998 0 0 1 0 0 12501 0 557
## 1999 0 0 1 0 0 9437 0 370
## 2000 0 0 1 0 0 10233 0 282
## 2001 0 0 1 0 0 8960 0 349
## 2002 0 0 0 0 0 9469 0 294
## 2003 0 0 2 0 0 9848 0 504
## 2004 0 0 5 0 0 9297 0 445
## 2005 0 0 10 0 0 9060 0 373
## 2006 0 0 1 0 0 10894 0 462
## 2007 0 0 0 0 0 11153 0 455
## 2008 0 0 2 0 0 16560 0 712
## 2009 0 0 0 0 0 13658 0 437
## 2010 0 0 2 0 0 15188 0 416
## 2011 0 0 5 0 0 19312 0 815
table(dataq2Dam_1990$YEAR, dataq2Dam_1990$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 1990 780 0 0 0 0 0 0 0 0
## 1991 678 0 0 0 0 0 0 0 0
## 1992 860 0 0 0 0 0 0 0 0
## 1993 4845 2 2 0 1 0 798 0 39
## 1994 7639 0 9 0 1 18 1550 0 51
## 1995 8952 4 6 0 3 3 797 1 133
## 1996 7902 0 0 0 0 0 1486 0 180
## 1997 8014 0 0 0 0 0 1701 0 133
## 1998 11165 0 0 0 0 0 2181 0 155
## 1999 9090 0 0 0 0 0 902 0 102
## 2000 9124 0 0 0 0 0 1792 0 104
## 2001 8091 0 0 0 0 0 1652 0 69
## 2002 8777 0 0 0 0 0 1136 0 77
## 2003 9183 0 0 0 0 0 1339 0 101
## 2004 8633 0 0 0 0 0 1421 0 68
## 2005 8844 0 0 0 1 0 729 0 73
## 2006 9126 0 0 0 1 0 2356 0 102
## 2007 0 0 0 0 0 0 11511 0 97
## 2008 0 0 0 0 0 0 17119 0 155
## 2009 0 0 0 0 0 0 14015 0 80
## 2010 0 0 0 0 0 0 15469 0 137
## 2011 0 0 0 0 0 0 20006 0 126
#filtered only the B,H,K,M components as they will make more impact to the economic consequence
dataq2Dam_1990_Exp <- filter(dataq2Dam_1990, PROPDMGEXP %in% c("B","k","K","m","M","h","H") & CROPDMGEXP %in% c("B","k","K","m","M"))
#Assigning the damage exponents to a consistent variable. Converting the exponent to a value so the expenditure in millions can be calculated.
dataq2Dam_1990_Exp$CROPDMGEXP[which(dataq2Dam_1990_Exp$CROPDMGEXP=="k")]<- "K"
dataq2Dam_1990_Exp$CROPDMGEXP[which(dataq2Dam_1990_Exp$CROPDMGEXP=="m")]<- "M"
dataq2Dam_1990_Exp$PROPDMGEXP[which(dataq2Dam_1990_Exp$PROPDMGEXP=="m")]<- "M"
dataq2Dam_1990_Exp$PROPDMGEXP[which(dataq2Dam_1990_Exp$PROPDMGEXP=="k")]<- "K"
#Assigning the exponent values so that to compute the damage cost in millions for each row.
dataq2Dam_1990_Exp$CropExpV[which(dataq2Dam_1990_Exp$CROPDMGEXP=="K")] <- 0.001
dataq2Dam_1990_Exp$CropExpV[which(dataq2Dam_1990_Exp$CROPDMGEXP=="M")] <- 1
dataq2Dam_1990_Exp$CropExpV[which(dataq2Dam_1990_Exp$CROPDMGEXP=="B")] <- 1000
dataq2Dam_1990_Exp$PropExpV[which(dataq2Dam_1990_Exp$PROPDMGEXP=="K")] <- 0.001
dataq2Dam_1990_Exp$PropExpV[which(dataq2Dam_1990_Exp$PROPDMGEXP=="M")] <- 1
dataq2Dam_1990_Exp$PropExpV[which(dataq2Dam_1990_Exp$PROPDMGEXP=="B")] <- 1000
#calculated crop damge and property damage cost in millions
dataq2Dam_1990_Cost <- mutate(dataq2Dam_1990_Exp, CROPCOST = CROPDMG*CropExpV, PROPCOST = PROPDMG*PropExpV)
The unique event types for this data are shown below.
unique(dataq2Dam_1990_Exp$EVTYPE)
## [1] HURRICANE OPAL/HIGH WINDS THUNDERSTORM WINDS
## [3] HURRICANE ERIN HURRICANE OPAL
## [5] TORNADO THUNDERSTORM WINDS/HAIL
## [7] FLASH FLOOD FLASH FLOODING
## [9] THUNDERSTORM WINDS HAIL FLOODING
## [11] HEAVY RAIN HIGH WINDS HEAVY RAINS
## [13] FLOOD THUNDERSTORM WIND
## [15] RIVER FLOOD HAIL
## [17] WINTER STORM HIGH WINDS
## [19] WINTER STORM HIGH WINDS WINTER STORMS
## [21] SEVERE THUNDERSTORMS SEVERE THUNDERSTORM WINDS
## [23] THUNDERSTORMS FLOOD/FLASH FLOOD
## [25] LIGHTNING HEAVY SNOW
## [27] TORNADOES, TSTM WIND, HAIL TROPICAL STORM
## [29] TROPICAL STORM GORDON TROPICAL STORM JERRY
## [31] TORNADO F0 BLIZZARD
## [33] HEAT GLAZE ICE
## [35] URBAN FLOOD URBAN FLOODING
## [37] THUNDERSTORMS WINDS GUSTNADO
## [39] COLD AIR TORNADO WIND DAMAGE
## [41] HIGH WIND SNOW
## [43] HAIL/WINDS HAIL/WIND
## [45] HEAVY RAINS TSTM WIND
## [47] THUNDERSTORM WINDS LIGHTNING ICE STORM
## [49] DUST STORM/HIGH WINDS ICE JAM FLOODING
## [51] FOREST FIRES HAIL 100
## [53] HEAT WAVE HEAVY SNOW/HIGH WINDS & FLOOD
## [55] FLOODS HURRICANE FELIX
## [57] THUNDERSTORM HAIL THUDERSTORM WINDS
## [59] DROUGHT HEAT WAVE DROUGHT
## [61] COASTAL FLOODING WINDS
## [63] HEAVY RAINS/FLOODING HIGH WINDS/COLD
## [65] RIVER FLOODING WILD/FOREST FIRE
## [67] THUNDERSTORMS WIND FLASH FLOOD/FLOOD
## [69] FLASH FLOODING/FLOOD THUNDERSTORM WINDSS
## [71] TROPICAL STORM DEAN THUNDERSTORM WINDS/ FLOOD
## [73] WILD/FOREST FIRES WILDFIRES
## [75] HURRICANE TSTM WIND/HAIL
## [77] URBAN/SML STREAM FLD WATERSPOUT
## [79] Heavy Rain/High Surf STORM SURGE
## [81] River Flooding Frost/Freeze
## [83] EXTREME COLD TYPHOON
## [85] GUSTY WINDS ICY ROADS
## [87] FOG DUST STORM
## [89] FREEZE DRY MICROBURST
## [91] SMALL HAIL STRONG WIND
## [93] WILDFIRE HURRICANE/TYPHOON
## [95] ASTRONOMICAL HIGH TIDE COASTAL FLOOD
## [97] LANDSLIDE EXCESSIVE HEAT
## [99] TSUNAMI WINTER WEATHER
## [101] HIGH SURF DENSE FOG
## [103] LAKE-EFFECT SNOW SEICHE
## [105] LAKESHORE FLOOD MARINE STRONG WIND
## [107] MARINE THUNDERSTORM WIND COLD/WIND CHILL
## [109] FROST/FREEZE ASTRONOMICAL LOW TIDE
## [111] AVALANCHE STORM SURGE/TIDE
## [113] DENSE SMOKE EXTREME COLD/WIND CHILL
## [115] DUST DEVIL TROPICAL DEPRESSION
## [117] FUNNEL CLOUD MARINE HIGH WIND
## [119] MARINE HAIL FREEZING FOG
## [121] RIP CURRENT
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
The event types are mapped to the recognisable event names provided in the National Weather Service instructions 10-1605 https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
#intialised a new column in the dataframe with an input and these will be replaced later as they are mapped
dataq2Dam_1990_Cost["EVTYPEMAP"] = "AVALANCHE"
#For cases with two categories, the first type is taken and mapped to one of the 48 events, i.e.EVTYPE=Rain/Wind, EVTYPEMAP= Heavy Rain in this case.
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^astrono",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "ASTRONOMICAL LOW TIDE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^ava",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "AVALANCHE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^blizzard",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "BLIZZARD"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^coastal",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "COASTAL FLOOD"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^cold",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "COLD/WIND CHILL"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("slide",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DEBRIS FLOW"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^dense fog|^fog",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DENSE FOG"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("smoke",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DENSE SMOKE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^drought",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DROUGHT"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^dust devil",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DUST DEVIL"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^dust storm",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "DUST STORM"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^heat wave|(excessive|record|extreme) heat",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "EXCESSIVE HEAT"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^extreme (cold|windchill)",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "EXTREME COLD/WIND CHILL"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^flash flood",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "FLASH FLOOD"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^(river?) flood|^flood|^flooding$|urban",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "FLOOD"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^freezing fog",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "FREEZING FOG"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^freeze|frost",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "FROST/FREEZE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^funnel cloud",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "FUNNEL CLOUD"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^hail",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HAIL"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^heat$|warm",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HEAT"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^rain|^excessive rain|^heavy rain",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HEAVY RAIN"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^snow|thundersnow|blowing|falling snow|heavy snow|squall",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HEAVY SNOW"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^(heavy|high|rough) surf|high waves",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HIGH SURF"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^high wind|gusty|dry microburst|^wind",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HIGH WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^hurricane",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "HURRICANE (TYPHOON)"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^ice|icy|black ice",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "ICE STORM"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^lake-effect",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "LAKE-EFFECT SNOW"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^lightn",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "LIGHTNING"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^marine (high wind)",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "MARINE HIGH WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^marine strong wind",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "MARINE STRONG WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^marine (tstm|thunderstorm) wind",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "MARINE THUNDERSTORM WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^rip current",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "RIP CURRENT"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^seiche",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "SEICHE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("storm surge",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "STORM SURGE/TIDE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^strong wind",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "STRONG WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^(tstm|thunderstorm) wind|severe",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "THUNDERSTORM WIND"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("tornado|gustnado",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "TORNADO"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^tropical storm",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "TROPICAL STORM"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^tropical depression",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "TROPICAL DEPRESSION"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^tsunami",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "TSUNAMI"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^waterspout",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "WATERSPOUT"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^wild|fire",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "WILDFIRE"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^winter storm",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "WINTER STORM"
dataq2Dam_1990_Cost$EVTYPEMAP[grep("^winter weather|glaze",dataq2Dam_1990_Cost$EVTYPE,ignore.case=TRUE)] <- "WINTER WEATHER"
EVq2length <- length(unique(dataq2Dam_1990_Cost$EVTYPEMAP))
There are 44 unique mapped events for this dataset.
unique(dataq2Dam_1990_Cost$EVTYPEMAP)
## [1] "HURRICANE (TYPHOON)" "THUNDERSTORM WIND"
## [3] "TORNADO" "FLASH FLOOD"
## [5] "FLOOD" "HEAVY RAIN"
## [7] "HIGH WIND" "HAIL"
## [9] "WINTER STORM" "AVALANCHE"
## [11] "LIGHTNING" "HEAVY SNOW"
## [13] "TROPICAL STORM" "BLIZZARD"
## [15] "HEAT" "WINTER WEATHER"
## [17] "ICE STORM" "DUST STORM"
## [19] "WILDFIRE" "EXCESSIVE HEAT"
## [21] "DROUGHT" "COASTAL FLOOD"
## [23] "WATERSPOUT" "STORM SURGE/TIDE"
## [25] "FROST/FREEZE" "EXTREME COLD/WIND CHILL"
## [27] "DENSE FOG" "STRONG WIND"
## [29] "ASTRONOMICAL LOW TIDE" "DEBRIS FLOW"
## [31] "TSUNAMI" "HIGH SURF"
## [33] "LAKE-EFFECT SNOW" "SEICHE"
## [35] "MARINE STRONG WIND" "MARINE THUNDERSTORM WIND"
## [37] "COLD/WIND CHILL" "DENSE SMOKE"
## [39] "DUST DEVIL" "TROPICAL DEPRESSION"
## [41] "FUNNEL CLOUD" "MARINE HIGH WIND"
## [43] "FREEZING FOG" "RIP CURRENT"
This section presents the results for the two questions.
The plot below shows the total number of fatalities and injuries from 1990 to 2011 for different events in the Storm Database. The tabulated results for fatalities and injuries for 36 different event types are also shown.
q1ans <- arrange(aggregate(cbind(FATALITIES, INJURIES)~EVTYPEMAP, data=dataq1fat_1990,sum), desc(FATALITIES),desc(INJURIES))
#converting wide to long form to enable plotting
q1ansL <- reshape(data = q1ans,varying = c("FATALITIES","INJURIES"),v.names = "NUMBER",timevar = "HARM",times=c("Fatalities","Injuries"),new.row.names = 1:100,direction="long")
qplot(x=log(NUMBER),y=EVTYPEMAP,data=q1ansL,colour=HARM,xlab="log(TotalNumber)",ylab="Mapped Event Type",main="Total number of fatalities and injuries from 1990 to 2011 \nfor different Storm Data events")
q1ans
## EVTYPEMAP FATALITIES INJURIES
## 1 EXCESSIVE HEAT 2202 5075
## 2 TORNADO 1777 16012
## 3 FLASH FLOOD 999 641
## 4 HEAT 977 1420
## 5 LIGHTNING 817 649
## 6 RIP CURRENT 577 154
## 7 FLOOD 551 2717
## 8 THUNDERSTORM WIND 533 642
## 9 HIGH WIND 325 405
## 10 EXTREME COLD/WIND CHILL 317 216
## 11 AVALANCHE 228 81
## 12 WINTER STORM 217 631
## 13 COLD/WIND CHILL 164 50
## 14 HIGH SURF 162 54
## 15 HEAVY SNOW 146 247
## 16 HURRICANE (TYPHOON) 135 1237
## 17 STRONG WIND 112 41
## 18 HEAVY RAIN 105 64
## 19 ICE STORM 102 1755
## 20 BLIZZARD 101 718
## 21 WILDFIRE 90 427
## 22 DENSE FOG 81 466
## 23 WINTER WEATHER 71 242
## 24 TROPICAL STORM 66 317
## 25 STORM SURGE/TIDE 46 9
## 26 DEBRIS FLOW 44 28
## 27 TSUNAMI 33 129
## 28 DUST STORM 24 106
## 29 MARINE THUNDERSTORM WIND 19 10
## 30 HAIL 15 6
## 31 MARINE STRONG WIND 14 10
## 32 FROST/FREEZE 13 21
## 33 MARINE HIGH WIND 9 8
## 34 COASTAL FLOOD 7 1
## 35 WATERSPOUT 6 39
## 36 SLEET 2 0
In order to combine the effect of fatalities and injuries to be able to identify the most harmful event type, the fatalities and injuries were added. The multiplier used for the fatalities and injuries are 1 and 0.5 respectively. The top ten harmful events with respect to population health are ranked for the event types and are presented below.
#multiplier to combine fatality and injury
q1ansM <- arrange(
mutate(q1ans,COMBINED=FATALITIES+0.5*INJURIES),
desc(COMBINED))
head(q1ansM,10)
## EVTYPEMAP FATALITIES INJURIES COMBINED
## 1 TORNADO 1777 16012 9783.0
## 2 EXCESSIVE HEAT 2202 5075 4739.5
## 3 FLOOD 551 2717 1909.5
## 4 HEAT 977 1420 1687.0
## 5 FLASH FLOOD 999 641 1319.5
## 6 LIGHTNING 817 649 1141.5
## 7 ICE STORM 102 1755 979.5
## 8 THUNDERSTORM WIND 533 642 854.0
## 9 HURRICANE (TYPHOON) 135 1237 753.5
## 10 RIP CURRENT 577 154 654.0
The plot below shows the total crop damage cost and property damage cost in millions from 1990 to 2011 for different events in the Storm Database. The tabulated results for crop and property damage cost in millions for 44 different event types are shown.
q2ans <- arrange(aggregate(cbind(CROPCOST, PROPCOST)~EVTYPEMAP, data=dataq2Dam_1990_Cost,sum), desc(CROPCOST),desc(PROPCOST))
#converting wide to long form to enable plotting
q2ansL <- reshape(data = q2ans,varying = c("CROPCOST","PROPCOST"),v.names = "COST",timevar = "DAMAGE",times=c("Crop","Property"),new.row.names = 1:100,direction="long")
qplot(x=log(COST),y=EVTYPEMAP,data=q2ansL,colour=DAMAGE,xlab="log(Total Cost in Millions)",ylab="Mapped Event Type",main="Total cost(millions) of crop and property damage from 1990 to 2011 \nfor different Storm Data events")
q2ans
## EVTYPEMAP CROPCOST PROPCOST
## 1 FLOOD 10335.60205 138162.97265
## 2 HURRICANE (TYPHOON) 5332.29280 38981.15300
## 3 ICE STORM 5027.11000 908.08430
## 4 HAIL 2028.86795 7992.28919
## 5 DROUGHT 1652.69600 233.72100
## 6 FLASH FLOOD 1398.22510 7661.63808
## 7 THUNDERSTORM WIND 1168.61675 4344.11810
## 8 FROST/FREEZE 932.37600 10.68000
## 9 HIGH WIND 658.85385 2635.81684
## 10 EXCESSIVE HEAT 492.60000 1.76320
## 11 TROPICAL STORM 468.26100 1062.09135
## 12 TORNADO 355.88526 17766.83299
## 13 WILDFIRE 285.82210 3552.72747
## 14 HEAVY SNOW 131.67310 179.79300
## 15 HEAVY RAIN 126.93880 339.92193
## 16 BLIZZARD 112.06000 94.98100
## 17 STRONG WIND 64.94850 119.25206
## 18 WINTER STORM 29.22400 1078.34420
## 19 DEBRIS FLOW 20.01700 150.30350
## 20 WINTER WEATHER 15.00080 20.20200
## 21 LIGHTNING 5.51215 315.27398
## 22 DUST STORM 2.85000 3.44900
## 23 EXTREME COLD/WIND CHILL 2.25500 9.39300
## 24 AVALANCHE 1.39100 25.77480
## 25 STORM SURGE/TIDE 0.85500 4643.55800
## 26 HEAT 0.73500 1.65500
## 27 COLD/WIND CHILL 0.60000 1.99000
## 28 COASTAL FLOOD 0.05600 192.88056
## 29 MARINE THUNDERSTORM WIND 0.05000 0.43640
## 30 TSUNAMI 0.02000 144.06200
## 31 HIGH SURF 0.00000 83.01750
## 32 LAKE-EFFECT SNOW 0.00000 40.03500
## 33 WATERSPOUT 0.00000 5.20620
## 34 DENSE FOG 0.00000 2.87400
## 35 FREEZING FOG 0.00000 2.18200
## 36 TROPICAL DEPRESSION 0.00000 1.30200
## 37 MARINE HIGH WIND 0.00000 1.14001
## 38 MARINE STRONG WIND 0.00000 0.40333
## 39 DUST DEVIL 0.00000 0.38113
## 40 ASTRONOMICAL LOW TIDE 0.00000 0.32500
## 41 DENSE SMOKE 0.00000 0.10000
## 42 SEICHE 0.00000 0.10000
## 43 FUNNEL CLOUD 0.00000 0.06510
## 44 RIP CURRENT 0.00000 0.00100
In order to identify the event type causing the greatest economic consequence, the crop and property damage cost were added. A multiplier of 1 is assumed for both as food and shelter are the basic necessities for human beings and damage translates to additional cost incurred and thus economic consequence. The top ten events with greatest economic consequences are ranked for the event types and are presented below.
q2ansM <- arrange(
mutate(q2ans,TOTALCOST_M=CROPCOST+PROPCOST),
desc(TOTALCOST_M))
head(q2ansM,10)
## EVTYPEMAP CROPCOST PROPCOST TOTALCOST_M
## 1 FLOOD 10335.6020 138162.9727 148498.575
## 2 HURRICANE (TYPHOON) 5332.2928 38981.1530 44313.446
## 3 TORNADO 355.8853 17766.8330 18122.718
## 4 HAIL 2028.8680 7992.2892 10021.157
## 5 FLASH FLOOD 1398.2251 7661.6381 9059.863
## 6 ICE STORM 5027.1100 908.0843 5935.194
## 7 THUNDERSTORM WIND 1168.6167 4344.1181 5512.735
## 8 STORM SURGE/TIDE 0.8550 4643.5580 4644.413
## 9 WILDFIRE 285.8221 3552.7275 3838.550
## 10 HIGH WIND 658.8538 2635.8168 3294.671
The weather events in the StormEvents database occuring in United States are studied from the year 1990 to 2011. The data are analysed and it is found that tornado, excessive heat and flood are the most harmful events with respect to population health. Flood, hurricane and tornado events have resulted in greatest economic consequences.