In this analysis we consider the effects of harmful meteorological events for communities and municipalities.
Based on the collected data from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, we need to understand which of these events are the most destructive and malicious. We also estimate the probability of occurrence of these events, based on the frequency of their occurrence in the past. The ultimate goal of this research is to prepare for severe weather events and to prioritize resources for different types of events.
Based on this analysis, we can conclude that excessive heat causes the most fatalities, tornadoes the most injuries, drought has the most damages in crops and hurricanes/typhoons has the greatest economic impacts.
Then we present polar plots to demonstrate the frequencies of the events with the most impact during the year.
Throughout this analysis you can always find the code that I used to generate the output. When writing code chunks in the R markdown document,I always used echo = TRUE so that everyone to be able to read the code.
Then we load the libraries used in our analysis:
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.3
opts_chunk$set(echo = TRUE)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.3
## Loading required package: grid
library(plotrix)
## Warning: package 'plotrix' was built under R version 3.1.3
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Next, we show our system environment, maybe it is useful for reproducible proposals:
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: i386-w64-mingw32/i386 (32-bit)
##
## locale:
## [1] LC_COLLATE=Greek_Greece.1253 LC_CTYPE=Greek_Greece.1253
## [3] LC_MONETARY=Greek_Greece.1253 LC_NUMERIC=C
## [5] LC_TIME=Greek_Greece.1253
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8 evaluate_0.7 formatR_1.2 htmltools_0.2.6
## [5] knitr_1.10.5 magrittr_1.5 rmarkdown_0.6.1 stringi_0.4-1
## [9] stringr_1.0.0 tools_3.1.2 yaml_2.1.13
The data for this analysis come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file here: Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
First, we need a raw data for work. We check if the file is already in the working directory. If not, we download the file and load the data and assign it to the “data” data frame. We set NA’s to be values with: “NA” or “”.
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("StormData.csv.bz2")) {
download.file(fileURL, "./StormData.csv.bz2", method = "curl")
}
data <- read.csv(bzfile("StormData.csv.bz2"),
header = TRUE, na.strings = c("NA", ""))
Now, we can proceed with the data pre-examination, first we examine the structure of the data set.
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29600 levels "5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13512 1872 4597 10591 4371 10093 1972 23872 24417 4597 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 34 levels " N"," NW","E",..: NA NA NA NA NA NA NA NA NA NA ...
## $ BGN_LOCATI: Factor w/ 54428 levels "- 1 N Albion",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_DATE : Factor w/ 6662 levels "1/1/1993 0:00:00",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_TIME : Factor w/ 3646 levels " 0900CST"," 200CST",..: NA NA NA NA NA NA NA NA NA NA ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 23 levels "E","ENE","ESE",..: NA NA NA NA NA NA NA NA NA NA ...
## $ END_LOCATI: Factor w/ 34505 levels "- .5 NNW","- 11 ESE Jay",..: NA NA NA NA NA NA NA NA NA NA ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
## $ WFO : Factor w/ 541 levels " CI","$AC","$AG",..: NA NA NA NA NA NA NA NA NA NA ...
## $ STATEOFFIC: Factor w/ 249 levels "ALABAMA, Central",..: NA NA NA NA NA NA NA NA NA NA ...
## $ ZONENAMES : Factor w/ 25111 levels " "| __truncated__,..: NA NA NA NA NA NA NA NA NA NA ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436780 levels "-2 at Deer Park\n",..: NA NA NA NA NA NA NA NA NA NA ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
We see that there are a lot of variables, so we will need to narrow them down.
Then we look into EVTYPE variable that has the event type of each observation.
length(unique(data$EVTYPE))
## [1] 985
As we can see above there are 985 unique event types in the raw data set.
In National Weather Service Storm Data Documentation paragraphs 7.1 through 7.48 there are 48 event types, so later we will group them to achieve this.
First, we add a new column named “YEAR” to specify the year of each observation.
data$YEAR <- as.numeric(format(as.Date(data$BGN_DATE,format="%m/%d/%Y %H:%M:%S"),"%Y"))
Then we select the columns needed for our analysis and subset our data set. These columns are:
“BGN_DATE”, “STATE”, “COUNTYNAME”, “EVTYPE”, “FATALITIES”, “INJURIES”, “CROPDMG”, “CROPDMGEXP”, “PROPDMG”, “PROPDMGEXP”, “YEAR”
cols <- c("BGN_DATE","STATE","COUNTYNAME", "EVTYPE", "FATALITIES", "INJURIES",
"CROPDMG", "CROPDMGEXP", "PROPDMG", "PROPDMGEXP", "YEAR")
data <- subset(data, select = cols)
Next, we remove unuseful events from the data set to reduce the date size for easier handel, the reason for this is if an event had no fatalities and no injured and no crop or property damage, than for the purpose of this analysis it is useless. We name this new data set “tidy”.
tidy <- subset(data, FATALITIES > 0 | INJURIES > 0 | CROPDMG > 0 | PROPDMG > 0)
As mentioned above, more recent years should be considered more complete. As we can see here, after 1996 data is more complete as all 48 types are recorded. So we calculate the percentage of observations after 1996.
per96 <- round(nrow(tidy[tidy$YEAR >=1996,]) / nrow(tidy) * 100, 2)
per96
## [1] 79.06
As 79.06 % of the observations are after 1996, we subset our data set for the years after 1996 and we create a new data set called “clean.1996”. Then we check EVTYPE again to see if it has changed. We also remove the previous data sets (“data”, “tidy”) to save memory.
clean.1996 <- tidy[tidy$YEAR >= 1996,]
rm(data, tidy)
unique(clean.1996$EVTYPE)
## [1] WINTER STORM TORNADO
## [3] TSTM WIND HIGH WIND
## [5] FLASH FLOOD FREEZING RAIN
## [7] EXTREME COLD LIGHTNING
## [9] HAIL FLOOD
## [11] TSTM WIND/HAIL EXCESSIVE HEAT
## [13] RIP CURRENTS Other
## [15] HEAVY SNOW WILD/FOREST FIRE
## [17] ICE STORM BLIZZARD
## [19] STORM SURGE Ice jam flood (minor
## [21] DUST STORM STRONG WIND
## [23] DUST DEVIL Tstm Wind
## [25] URBAN/SML STREAM FLD FOG
## [27] ROUGH SURF Heavy Surf
## [29] Dust Devil HEAVY RAIN
## [31] Marine Accident AVALANCHE
## [33] Freeze DRY MICROBURST
## [35] Strong Wind WINDS
## [37] COASTAL STORM Erosion/Cstl Flood
## [39] River Flooding WATERSPOUT
## [41] DAMAGING FREEZE Damaging Freeze
## [43] HURRICANE TROPICAL STORM
## [45] Beach Erosion High Surf
## [47] Heavy Rain/High Surf Unseasonable Cold
## [49] Early Frost Wintry Mix
## [51] Extreme Cold DROUGHT
## [53] Coastal Flooding Torrential Rainfall
## [55] Landslump Hurricane Edouard
## [57] Coastal Storm TIDAL FLOODING
## [59] Tidal Flooding Strong Winds
## [61] EXTREME WINDCHILL Glaze
## [63] Extended Cold Whirlwind
## [65] Heavy snow shower Light snow
## [67] COASTAL FLOOD Light Snow
## [69] MIXED PRECIP COLD
## [71] Freezing Spray DOWNBURST
## [73] Mudslides Microburst
## [75] Mudslide Cold
## [77] SNOW Coastal Flood
## [79] Snow Squalls Wind Damage
## [81] Light Snowfall Freezing Drizzle
## [83] Gusty wind/rain GUSTY WIND/HVY RAIN
## [85] Wind Cold Temperature
## [87] Heat Wave Snow
## [89] COLD AND SNOW HEAVY SURF
## [91] RAIN/SNOW WIND
## [93] FREEZE TSTM WIND (G45)
## [95] Gusty Winds GUSTY WIND
## [97] TSTM WIND 40 TSTM WIND 45
## [99] HARD FREEZE TSTM WIND (41)
## [101] HEAT RIVER FLOOD
## [103] TSTM WIND (G40) RIP CURRENT
## [105] HIGH SURF MUD SLIDE
## [107] Frost/Freeze SNOW AND ICE
## [109] COASTAL FLOODING AGRICULTURAL FREEZE
## [111] WINTER WEATHER STRONG WINDS
## [113] SNOW SQUALL ICY ROADS
## [115] OTHER THUNDERSTORM
## [117] Hypothermia/Exposure HYPOTHERMIA/EXPOSURE
## [119] Lake Effect Snow Freezing Rain
## [121] Mixed Precipitation BLACK ICE
## [123] COASTALSTORM LIGHT SNOW
## [125] DAM BREAK Gusty winds
## [127] blowing snow FREEZING DRIZZLE
## [129] FROST GRADIENT WIND
## [131] UNSEASONABLY COLD GUSTY WINDS
## [133] TSTM WIND AND LIGHTNING gradient wind
## [135] Gradient wind Freezing drizzle
## [137] WET MICROBURST Heavy surf and wind
## [139] FUNNEL CLOUD TYPHOON
## [141] LANDSLIDES HIGH SWELLS
## [143] HIGH WINDS SMALL HAIL
## [145] UNSEASONAL RAIN COASTAL FLOODING/EROSION
## [147] TSTM WIND (G45) TSTM WIND (G45)
## [149] HIGH WIND (G40) TSTM WIND (G35)
## [151] GLAZE COASTAL EROSION
## [153] UNSEASONABLY WARM SEICHE
## [155] COASTAL FLOODING/EROSION HYPERTHERMIA/EXPOSURE
## [157] WINTRY MIX RIVER FLOODING
## [159] ROCK SLIDE GUSTY WIND/HAIL
## [161] HEAVY SEAS TSTM WIND
## [163] LANDSPOUT RECORD HEAT
## [165] EXCESSIVE SNOW LAKE EFFECT SNOW
## [167] FLOOD/FLASH/FLOOD MIXED PRECIPITATION
## [169] WIND AND WAVE FLASH FLOOD/FLOOD
## [171] LIGHT FREEZING RAIN ICE ROADS
## [173] HIGH SEAS RAIN
## [175] ROUGH SEAS TSTM WIND G45
## [177] NON-SEVERE WIND DAMAGE WARM WEATHER
## [179] THUNDERSTORM WIND (G40) LANDSLIDE
## [181] HIGH WATER FLASH FLOOD
## [183] LATE SEASON SNOW WINTER WEATHER MIX
## [185] ROGUE WAVE FALLING SNOW/ICE
## [187] NON-TSTM WIND NON TSTM WIND
## [189] MUDSLIDE BRUSH FIRE
## [191] BLOWING DUST VOLCANIC ASH
## [193] HIGH SURF ADVISORY HAZARDOUS SURF
## [195] WILDFIRE COLD WEATHER
## [197] WHIRLWIND ICE ON ROAD
## [199] SNOW SQUALLS DROWNING
## [201] EXTREME COLD/WIND CHILL MARINE TSTM WIND
## [203] HURRICANE/TYPHOON DENSE FOG
## [205] WINTER WEATHER/MIX FROST/FREEZE
## [207] ASTRONOMICAL HIGH TIDE HEAVY SURF/HIGH SURF
## [209] TROPICAL DEPRESSION LAKE-EFFECT SNOW
## [211] MARINE HIGH WIND THUNDERSTORM WIND
## [213] TSUNAMI STORM SURGE/TIDE
## [215] COLD/WIND CHILL LAKESHORE FLOOD
## [217] MARINE THUNDERSTORM WIND MARINE STRONG WIND
## [219] ASTRONOMICAL LOW TIDE DENSE SMOKE
## [221] MARINE HAIL FREEZING FOG
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
As we can see, there are 222 event types, so now we will group them. To achieve this we used the official site Storm Events Database. We took every event type shown above and we used the state, the county name and the date to search and compare the search results with the one from the data set in order to group them in 48 event types as shown in National Weather Service Storm Data Documentation paragraphs 7.1 through 7.48.
Here is an example:
First we check the observation with the specific event type (“ROCK SLIDE”).
clean.1996[which(clean.1996$EVTYPE == "ROCK SLIDE"),]
## BGN_DATE STATE COUNTYNAME EVTYPE FATALITIES INJURIES
## 345192 3/22/1998 0:00:00 VA VAZ057 ROCK SLIDE 0 0
## CROPDMG CROPDMGEXP PROPDMG PROPDMGEXP YEAR
## 345192 0 <NA> 150 K 1998
Then we use the state, the county name and the date to search and compare the search results with the one from the data set as seen above. So we group this observation with “DEBRIS FLOW” event type category.
So after a very very long research, we grouped “EVTYPE” variable using these commands.
# Replace 'and' or '&' with space
clean.1996$EVTYPE <- gsub("(\\sAND(\\s|$))|(\\s&\\s)", " ", clean.1996$EVTYPE)
# Remove leading and trailing space
clean.1996$EVTYPE <- gsub("^\\s+|\\s+$", "", clean.1996$EVTYPE)
# Remove multiple spaces
clean.1996$EVTYPE <- gsub("\\s{2,}", " ", clean.1996$EVTYPE)
# uppercase all
clean.1996$EVTYPE <- toupper(clean.1996[, "EVTYPE"])
# rename and group EVTYPES
clean.1996$EVTYPE <- gsub("GUSTY ", "", clean.1996$EVTYPE)
clean.1996$EVTYPE <- gsub("TSTM", "THUNDERSTORM", clean.1996$EVTYPE)
clean.1996[grepl("COASTAL", clean.1996$EVTYPE)|
grepl("EROSION", clean.1996$EVTYPE)|
grepl("TIDAL", clean.1996$EVTYPE), c("EVTYPE")] <- "COASTAL FLOOD"
clean.1996[grepl("MUDSLIDES", clean.1996$EVTYPE)|
grepl("LANDSL", clean.1996$EVTYPE)|
grepl("ROCK SLIDE", clean.1996$EVTYPE), c("EVTYPE")] <- "DEBRIS FLOW"
clean.1996[grepl("FOG", clean.1996$EVTYPE), c("EVTYPE")] <- "DENSE FOG"
clean.1996[grepl("WHIRLWIND", clean.1996$EVTYPE)|
grepl("LANDSPOUT", clean.1996$EVTYPE), c("EVTYPE")] <- "DUST DEVIL"
clean.1996[grepl("BLOWING DUST", clean.1996$EVTYPE), c("EVTYPE")] <- "DUST STORM"
clean.1996[grepl("WARM WEATHER", clean.1996$EVTYPE), c("EVTYPE")] <- "EXCESSIVE HEAT"
clean.1996[grepl("EXTREME", clean.1996$EVTYPE), c("EVTYPE")] <- "EXTREME COLD/WIND CHILL"
clean.1996[grepl("FLASH", clean.1996$EVTYPE), c("EVTYPE")] <- "FLASH FLOOD"
clean.1996[grepl("FROST", clean.1996$EVTYPE) |
grepl("FREEZE", clean.1996$EVTYPE), c("EVTYPE")] <- "FROST/FREEZE"
clean.1996[grepl("SMALL HAIL", clean.1996$EVTYPE), c("EVTYPE")] <- "HAIL"
clean.1996[grepl("FREEZING SPRAY", clean.1996$EVTYPE), c("EVTYPE")] <- "FREEZING FOG"
clean.1996[grepl("HIGH WATER", clean.1996$EVTYPE) |
grepl("ICE JAM", clean.1996$EVTYPE) |
grepl("DAM BREAK", clean.1996$EVTYPE) |
grepl("DROWNING", clean.1996$EVTYPE) |
grepl("RIVER", clean.1996$EVTYPE) |
grepl("URBAN", clean.1996$EVTYPE), c("EVTYPE")] <- "FLOOD"
clean.1996[grepl("HEAT WAVE", clean.1996$EVTYPE) |
grepl("HYPERTHERMIA", clean.1996$EVTYPE) |
grepl("RECORD HEAT", clean.1996$EVTYPE) |
grepl("WARM", clean.1996$EVTYPE), c("EVTYPE")] <- "HEAT"
clean.1996[grepl("HEAT WAVE", clean.1996$EVTYPE) |
grepl("HEAVY SNOW SHOWER", clean.1996$EVTYPE) |
grepl("EXCESSIVE SNOW", clean.1996$EVTYPE) |
grepl("LATE SEASON SNOW", clean.1996$EVTYPE), c("EVTYPE")] <- "HEAVY SNOW"
clean.1996[grepl("HEAVY R", clean.1996$EVTYPE) |
grepl("HIGH WINDS", clean.1996$EVTYPE) |
grepl("MUD", clean.1996$EVTYPE) |
grepl("OTHER", clean.1996$EVTYPE) |
grepl("^RAIN$", clean.1996$EVTYPE) |
grepl("RAINFALL", clean.1996$EVTYPE) |
grepl("HVY RAIN", clean.1996$EVTYPE) |
grepl("NAL RAIN", clean.1996$EVTYPE), c("EVTYPE")] <- "HEAVY RAIN"
clean.1996[grepl("SURF", clean.1996$EVTYPE) |
grepl("SWELLS", clean.1996$EVTYPE) |
grepl("ROGUE WAVE", clean.1996$EVTYPE) |
grepl("SEAS$", clean.1996$EVTYPE), c("EVTYPE")] <- "HIGH SURF"
clean.1996[grepl("HURRICANE", clean.1996$EVTYPE) |
grepl("TYPHOON", clean.1996$EVTYPE), c("EVTYPE")] <- "HURRICANE/TYPHOON"
clean.1996[grepl("FALLING SNOW", clean.1996$EVTYPE), c("EVTYPE")] <- "ICE STORM"
clean.1996[grepl("EFFECT", clean.1996$EVTYPE), c("EVTYPE")] <- "LAKE-EFFECT SNOW"
clean.1996[grepl("MARINE ACCIDENT", clean.1996$EVTYPE), c("EVTYPE")] <- "MARINE HIGH WIND"
clean.1996[grepl("RIP CURRENTS", clean.1996$EVTYPE), c("EVTYPE")] <- "RIP CURRENT"
clean.1996[grepl("HIGH TIDE", clean.1996$EVTYPE) |
grepl("SURGE", clean.1996$EVTYPE), c("EVTYPE")] <- "STORM SURGE/TIDE"
clean.1996[grepl("GRADIENT WIND", clean.1996$EVTYPE) |
grepl("NON", clean.1996$EVTYPE) |
grepl("WINDS", clean.1996$EVTYPE) |
grepl("WIND/RAIN", clean.1996$EVTYPE), c("EVTYPE")] <- "STRONG WIND"
clean.1996[grepl("GRADIENT WIND", clean.1996$EVTYPE) |
grepl("^THUNDERSTORM", clean.1996$EVTYPE) |
grepl("BURST", clean.1996$EVTYPE) |
grepl("WIND/HAIL", clean.1996$EVTYPE), c("EVTYPE")] <- "THUNDERSTORM WIND"
clean.1996[grepl("FIRE", clean.1996$EVTYPE), c("EVTYPE")] <- "WILDFIRE"
clean.1996[grepl("^SNOW", clean.1996$EVTYPE) |
grepl("BLOWING SNOW", clean.1996$EVTYPE) |
grepl("ING RAIN", clean.1996$EVTYPE) |
grepl("ROAD", clean.1996$EVTYPE) |
grepl("BLACK ICE", clean.1996$EVTYPE) |
grepl("COLD SNOW", clean.1996$EVTYPE) |
grepl("MIX", clean.1996$EVTYPE) |
grepl("GLAZE", clean.1996$EVTYPE) |
grepl("LIGHT SNOW", clean.1996$EVTYPE) |
grepl("GLAZE", clean.1996$EVTYPE) |
grepl("RAIN/SNOW", clean.1996$EVTYPE) |
grepl("DRIZZLE", clean.1996$EVTYPE), c("EVTYPE")] <- "WINTER WEATHER"
clean.1996[grepl("^COLD", clean.1996$EVTYPE) |
grepl("COLD$", clean.1996$EVTYPE) |
grepl("HYPOTHERMIA", clean.1996$EVTYPE), c("EVTYPE")] <- "COLD/WIND CHILL"
clean.1996[grepl("^HIGH WIND", clean.1996$EVTYPE) |
grepl("^WIND", clean.1996$EVTYPE) |
grepl("HYPOTHERMIA", clean.1996$EVTYPE), c("EVTYPE")] <- "HIGH WIND"
# Print ordered unique(clean.1996$EVTYPE) to see result
o <- unique(clean.1996$EVTYPE)
o <- o[order(o)]
o
## [1] "ASTRONOMICAL LOW TIDE" "AVALANCHE"
## [3] "BLIZZARD" "COASTAL FLOOD"
## [5] "COLD/WIND CHILL" "DEBRIS FLOW"
## [7] "DENSE FOG" "DENSE SMOKE"
## [9] "DROUGHT" "DUST DEVIL"
## [11] "DUST STORM" "EXCESSIVE HEAT"
## [13] "EXTREME COLD/WIND CHILL" "FLASH FLOOD"
## [15] "FLOOD" "FREEZING FOG"
## [17] "FROST/FREEZE" "FUNNEL CLOUD"
## [19] "HAIL" "HEAT"
## [21] "HEAVY RAIN" "HEAVY SNOW"
## [23] "HIGH SURF" "HIGH WIND"
## [25] "HURRICANE/TYPHOON" "ICE STORM"
## [27] "LAKE-EFFECT SNOW" "LAKESHORE FLOOD"
## [29] "LIGHTNING" "MARINE HAIL"
## [31] "MARINE HIGH WIND" "MARINE STRONG WIND"
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"
## [35] "SEICHE" "STORM SURGE/TIDE"
## [37] "STRONG WIND" "THUNDERSTORM WIND"
## [39] "TORNADO" "TROPICAL DEPRESSION"
## [41] "TROPICAL STORM" "TSUNAMI"
## [43] "VOLCANIC ASH" "WATERSPOUT"
## [45] "WILDFIRE" "WINTER STORM"
## [47] "WINTER WEATHER"
During this research, we found a strange data observation.
clean.1996[grepl("CA", clean.1996$STATE) & grepl("B", clean.1996$PROPDMGEXP) & grepl("FLOOD", clean.1996$EVTYPE),]
## BGN_DATE STATE COUNTYNAME EVTYPE FATALITIES INJURIES
## 605953 1/1/2006 0:00:00 CA NAPA FLOOD 0 0
## CROPDMG CROPDMGEXP PROPDMG PROPDMGEXP YEAR
## 605953 32.5 M 115 B 2006
This means that property damage for this observation is 115 Billion Dollars. So we checked it and corrected it in Million Dollars.
clean.1996[grepl("CA", clean.1996$STATE) & grepl("B", clean.1996$PROPDMGEXP) & grepl("FLOOD", clean.1996$EVTYPE), c("PROPDMGEXP")]<-"M"
Then, we made the table of CROPDMGEXP variable in order to examine it.
table(clean.1996$CROPDMGEXP, useNA = "ifany")
##
## ? 0 2 B k K m M <NA>
## 0 0 0 2 0 96787 0 1762 102767
a<-clean.1996[is.na(clean.1996$CROPDMGEXP),]
table(a$CROPDMG, useNA = "ifany")
##
## 0
## 102767
We see that, when CROPDMGEXP variable is “NA” crop damage is zero.
Then, we made the table of PROPDMGEXP variable in order to examine it.
table(clean.1996$PROPDMGEXP, useNA = "ifany")
##
## - ? + 0 1 2 3 4 5 6
## 0 0 0 0 0 0 0 0 0 0
## 7 8 B h H K m M <NA>
## 0 0 31 0 0 185474 0 7365 8448
a<-clean.1996[is.na(clean.1996$PROPDMGEXP),]
table(a$PROPDMG, useNA = "ifany")
##
## 0
## 8448
We see that, when PROPDMGEXP variable is “NA” crop damage is zero.
We also see that exponents are: “B”, “M”, “K” for “Billions”, “Millions”, “Thousands”. So, in order to calculate crop and property damage we used a function to translate those exponents. Then we converted damage to millions of dollars (M). The result is saved in two new variables, “CROPDAMAGE” and “PROPDAMAGE”. We also created a variable “TOTAL” to calculate the total damage of an event type.
# Function to calculate damage in dollars
use_exponent <- function(varexp) {
if (is.na(varexp)) exponent <- 1
else {
exp <- toupper(varexp);
if (exp == "K") { exponent <- 1000 }
else if (exp == "M") { exponent <- 1000000 }
else if (exp == "B") { exponent <- 1000000000 }
}
exponent
}
# Convert damage to millions (M)
clean.1996$CROPDAMAGE <- round(with(clean.1996,
CROPDMG * sapply(CROPDMGEXP,
use_exponent)/1000000), 3)
clean.1996$PROPDAMAGE <- round(with(clean.1996,
PROPDMG * sapply(PROPDMGEXP,
use_exponent)/1000000), 3)
clean.1996$TOTAL <- clean.1996$CROPDAMAGE + clean.1996$PROPDAMAGE
# aggregate fatalities
fatalities <- aggregate(FATALITIES ~ EVTYPE, clean.1996, sum)
fatalities <- fatalities[order(-fatalities$FATALITIES), ]
fatalities$EVTYPE <- factor(fatalities$EVTYPE,
levels = fatalities[order(fatalities$FATALITIES),
"EVTYPE"])
# aggregate injuries
injuries <- aggregate(INJURIES ~ EVTYPE, clean.1996, sum)
injuries <- injuries[order(-injuries$INJURIES), ]
injuries$EVTYPE <- factor(injuries$EVTYPE,
levels = injuries[order(injuries$INJURIES),
"EVTYPE"])
After that, we make a table of ten most impactful in public health events
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.3
table.pub.health <- merge(fatalities, injuries,
by = 'EVTYPE', all.x = TRUE, all.y = TRUE)
table.pub.health <- format(table.pub.health, big.mark=",", decimal.mark=".")
table.pub.health <- table.pub.health[order(table.pub.health$FATALITIES,
decreasing=TRUE), ]
kable(table.pub.health[1:10,], format = "pandoc",
caption = "Events with greatest consequences in public health",
row.names = FALSE,
col.names=c("Event Type", "Fatalities", "Injuries"),
align=c("l", "r", "r"))
Event Type | Fatalities | Injuries |
---|---|---|
EXCESSIVE HEAT | 1,797 | 6,393 |
TORNADO | 1,511 | 20,667 |
FLASH FLOOD | 887 | 1,674 |
LIGHTNING | 651 | 4,141 |
RIP CURRENT | 542 | 503 |
FLOOD | 448 | 6,838 |
THUNDERSTORM WIND | 382 | 5,154 |
EXTREME COLD/WIND CHILL | 257 | 108 |
HIGH WIND | 254 | 1,168 |
HEAT | 240 | 1,309 |
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.3
## Loading required package: grid
# fatalities plot
fat.plot <- ggplot(fatalities[1:10,], aes(x = EVTYPE, y = FATALITIES)) +
theme(text = element_text(size=32, face = "bold")) +
xlab("Event Type") + ylab("Fatalities") + geom_bar(stat="identity", fill = "red") +
geom_text(aes(label = FATALITIES), hjust=1, size=9) + coord_flip() +
theme(plot.margin = unit(c(2, 2, 2, 0), "lines"))
# injuries plot
inj.plot <- ggplot(injuries[1:10,], aes(x = EVTYPE, y = INJURIES)) +
theme(text = element_text(size=32, face = "bold")) +
xlab("") + ylab("Injuries") + geom_bar(stat="identity", fill = "light blue") +
geom_text(aes(label = INJURIES), hjust=1, size=9) + coord_flip()+
theme(plot.margin = unit(c(2, 0, 2, 2), "lines"))
grid.arrange(fat.plot, inj.plot, ncol = 2,
main=textGrob("Fatalities and Injuries by Event",
gp=gpar(fontsize=50, col="blue")))
The table and the plots shows that Excessive Heat causes the most fatalities (1,797).
Furthermore, Tornadoes causes the most injuries (20,667) and the second most fatalities (1,511).
# aggregate cropdamage
cropdamage <- aggregate(CROPDAMAGE ~ EVTYPE, clean.1996, sum)
cropdamage$CROPDAMAGE <- round(cropdamage$CROPDAMAGE, 2)
cropdamage <- cropdamage[order(-cropdamage$CROPDAMAGE), ]
cropdamage$EVTYPE <- factor(cropdamage$EVTYPE,
levels = cropdamage[order(cropdamage$CROPDAMAGE),
"EVTYPE"])
# aggregate propdamage
propdamage <- aggregate(PROPDAMAGE ~ EVTYPE, clean.1996, sum)
propdamage$PROPDAMAGE <- round(propdamage$PROPDAMAGE, 2)
propdamage <- propdamage[order(-propdamage$PROPDAMAGE), ]
propdamage$EVTYPE <- factor(propdamage$EVTYPE,
levels = propdamage[order(propdamage$PROPDAMAGE),
"EVTYPE"])
# aggregate total damage
damage <- aggregate(TOTAL ~ EVTYPE, clean.1996, sum)
damage$TOTAL <- round(damage$TOTAL, 2)
damage <- damage[order(-damage$TOTAL), ]
damage$EVTYPE <- factor(damage$EVTYPE,
levels = damage[order(damage$TOTAL),
"EVTYPE"])
Then we make some functions to simplify plotting multiple plots in one figure.
# define function to create multi-plot setup (nrow, ncol)
vp.setup <- function(x,y){
# create a new layout with grid
grid.newpage()
# define viewports and assign it to grid layout
pushViewport(viewport(layout = grid.layout(x,y)))
}
# define function to easily access layout (row, col)
vp.layout <- function(x,y){
viewport(layout.pos.row=x, layout.pos.col=y)
}
After that, we make a table of ten most impactful in public health events
table_damage <- merge(merge(cropdamage, propdamage,
by = 'EVTYPE', all.x = TRUE, all.y = TRUE),
damage, by = 'EVTYPE', all.x = TRUE, all.y = TRUE)
table_damage <- format(table_damage, big.mark=",", decimal.mark=".")
table_damage <- table_damage[order(table_damage$TOTAL, decreasing=TRUE), ]
kable(table_damage[1:10,], format = "pandoc",
caption = "Events with greatest economic consequences (Millions $)",
row.names = FALSE,
col.names=c("Event Type", "Property Damages", "Crop Damages", "Total"),
align=c("l", "r", "r", "r"))
Event Type | Property Damages | Crop Damages | Total |
---|---|---|---|
HURRICANE/TYPHOON | 5,350.11 | 81,718.89 | 87,069.00 |
STORM SURGE/TIDE | 0.86 | 47,844.15 | 47,845.01 |
FLOOD | 5,013.16 | 29,245.54 | 34,258.70 |
TORNADO | 283.42 | 24,616.90 | 24,900.32 |
HAIL | 2,496.82 | 14,595.12 | 17,091.94 |
FLASH FLOOD | 1,334.90 | 15,222.20 | 16,557.10 |
DROUGHT | 13,367.57 | 1,046.10 | 14,413.67 |
THUNDERSTORM WIND | 1,016.95 | 7,913.86 | 8,930.81 |
TROPICAL STORM | 677.71 | 7,642.47 | 8,320.18 |
WILDFIRE | 402.26 | 7,760.45 | 8,162.70 |
crop.plot <- ggplot(cropdamage[1:10,], aes(x = EVTYPE, y = CROPDAMAGE)) +
theme(text = element_text(size=50, face = "bold")) +
xlab("Event Type") + ylab("Crop Damage (Million $)") +
geom_bar(stat="identity", fill = "green") +
geom_text(aes(label = CROPDAMAGE), hjust=1, size=15) + coord_flip() +
theme(plot.margin = unit(c(2, 0, 2, 0), "lines"))
prop.plot <- ggplot(propdamage[1:10,], aes(x = EVTYPE, y = PROPDAMAGE)) +
theme(text = element_text(size=52, face = "bold")) +
xlab("") + ylab("Property Damage (Million $)") +
geom_bar(stat="identity", fill = "orange") +
geom_text(aes(label = PROPDAMAGE), hjust=1, size=15) + coord_flip() +
theme(plot.margin = unit(c(2, 0, 2, 0), "lines"))
total.plot <- ggplot(damage[1:10,], aes(x = EVTYPE, y = TOTAL)) +
theme(text = element_text(size=54, face = "bold")) +
xlab("Event Type") + ylab("Total Damage (Million $)") +
geom_bar(stat="identity", fill = "royalblue3") +
geom_text(aes(label = TOTAL), hjust=1, size=15) + coord_flip() +
theme(plot.margin = unit(c(2, 0, 2, 0), "lines"))
par(mfrow=c(2,2))
vp.setup(2,2)
print(crop.plot, vp=vp.layout(1, 1))
print(prop.plot, vp=vp.layout(1, 2))
print(total.plot, vp=vp.layout(2, 1:2))
The table and the plots shows that Drought causes the most crop damage (13,367.57 Million Dollars).
Furthermore, Hurricane/Typhoon causes the most property damage (81,718.89 Million Dollars) and the most total damage (87,069 Million Dollars).
Finally, we present the frequency for the most harmful event types during the year in order to observe when and how often these event types occur.
These event types, based on the tables and plots above, are: Drought, Excessive Heat, Flash Flood, Flood, Hail, Hurricane/Typhoon, Lightning, Storm Surge/Tide and Tornado.
library(plotrix)
## Warning: package 'plotrix' was built under R version 3.1.3
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# function for evtype freq
evtype.freq <- function(evtype){
ans <- subset(clean.1996, clean.1996$EVTYPE == evtype, select = c("BGN_DATE"))
ans$MONTH <- as.numeric(format(as.Date(ans$BGN_DATE,
format="%m/%d/%Y %H:%M:%S"),"%m"))
ans <- count(ans, MONTH)
if (length(ans$n) == 12){
return(ans$n)
} else {
temp <-numeric(length = 12)
for (i in 1:length(ans$n)) {
temp[ans$MONTH[i]] <- ans$n[i]
}
}
return(temp)
}
# calculation of frequencies for each event type
drought <- evtype.freq("DROUGHT")
excessive.heat <- evtype.freq("EXCESSIVE HEAT")
flash.flood <- evtype.freq("FLASH FLOOD")
flood <- evtype.freq("FLOOD")
hail <- evtype.freq("HAIL")
hurricane.typhoon <- evtype.freq("HURRICANE/TYPHOON")
lightning <- evtype.freq("LIGHTNING")
storm.surge <- evtype.freq("STORM SURGE/TIDE")
tornado <- evtype.freq("TORNADO")
# function for plotting polar plot
polar.month.evtype <- function(evtype, name){
testpos<-seq(0,350, by = 30)
labels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
ans <- polar.plot(evtype, testpos, labels, testpos,
main = paste("Month Polar Plot of", name),
radial.lim=c(0,max(evtype)), rp.type= "p",
start=90, clockwise=TRUE, lwd=3, line.col = "red",
show.grid.labels = TRUE, boxed.radial = FALSE)
}
# make 3x3 grid
par(mfrow=c(3,3))
# plot each event types polar plot
drought.plot <- polar.month.evtype(drought, "Drought")
excessive.heat.plot <- polar.month.evtype(excessive.heat, "Excessive Heat")
flash.flood.plot <- polar.month.evtype(flash.flood, "Flash Flood")
flood.plot <- polar.month.evtype(flood, "Flood")
hail.plot <- polar.month.evtype(hail, "Hail")
hurricane.typhoon.plot <- polar.month.evtype(hurricane.typhoon, "Hurricane/Typhoon")
lightning.plot <- polar.month.evtype(lightning, "Lightning")
storm.surge.plot <- polar.month.evtype(storm.surge, "Storm Surge/Tide")
tornado.plot <- polar.month.evtype(tornado, "Tornado")