In this assignment, we try to answer some basic questions about storm events that harm people and cause economic damage. To do so, we explore the NOAA Storm Database, which tracks major storm systems in the US and records best estimates of property and crop damage along with fatalities and injuries. We restrict the analysis to recent records (1995 onwards). These are recorded more accurately, and restricting to them also limits distortion from inflation and other time-value-of-money effects. We find that some events are far more harmful to people than others. Tornadoes, excessive heat, floods, thunderstorms and lightning top the list of fatalities and injuries. Floods, hurricanes, tornadoes, hail, drought, tropical storms and wildfires cause the most damage to property and crops.
The data for this report come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The file itself, however, was downloaded from the copy hosted for the course.
(Note: bz2 is a compressed file format (bzip2). read.csv decompresses it transparently, so no separate unzip step is needed. Decompressing and loading the table takes a long time, so if space permits, keep a copy of the loaded data set in case the working copy gets corrupted.)
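You can check this behaviour on a tiny throwaway file instead of the real archive. The file and its two columns below are made up for illustration. Base R's `bzfile()` writes a bzip2-compressed CSV, and `read.csv()` then reads it back with no explicit decompression step.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the bzip2 compression
toy <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(toy)
```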
setwd("~/Course 5 -Reproducible Research/Week 4")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "repdata_data_StormData.csv.bz2"
# mode = "wb" keeps the binary archive intact on Windows; skip the download if the file already exists
if (!file.exists(destfile)) download.file(url, destfile, mode = "wb")
# read.csv decompresses the .bz2 file on the fly
stormdata <- read.csv(destfile, header = TRUE)
copydata <- stormdata
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
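The exponent codes in PROPDMGEXP and CROPDMGEXP are interpreted following the analysis at https://rpubs.com/flyingdisc/PROPDMGEXP. Each code gives the multiplier applied to PROPDMG or CROPDMG:
+ [Hh] = hundreds (×100)
+ [Kk] = thousands (×1,000)
+ [Mm] = millions (×1,000,000)
+ [Bb] = billions (×1,000,000,000)
+ [0-8] = tens (×10)
+ [+] = units (×1)
+ [blank, ?, -] = zero (×0)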
unique(stormdata$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(stormdata$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
stormdata$year <- as.numeric(format(as.Date(stormdata$BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),"%Y"))
summary(stormdata$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1950 1995 2002 1999 2007 2011
The data range from 1950 to 2011, and the older years may have too few records to base the analysis on. We therefore assume that the more recent records are representative and keep only those. To choose the cutoff, we first check how the observations are distributed by year using a histogram. The lines mark the 1995 cutoff and a reference level of 60,000 observations per bin.
library(ggplot2)
ggplot(stormdata, aes(x = year)) +
  geom_histogram() +
  geom_hline(yintercept = 60000) +
  geom_vline(xintercept = 1995) +
  labs(title = "Histogram of observations by year",
       subtitle = "with lines marking the data to be considered for analysis")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
anlstormdata <- subset(stormdata,year >= 1995)
dim(anlstormdata)
## [1] 681500 38
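The histogram supports the 1995 cutoff visually, and the same decision can also be backed with counts. The sketch below runs on a few made-up `BGN_DATE` strings in the same `m/d/Y H:M:S` layout. On the real data, `table(stormdata$year)` and `mean(stormdata$year >= 1995)` would give the per-year counts and the share of records kept. Going by the dimensions reported above (681500 of 902297 rows), that share is roughly 75.5%.

```r
# Made-up dates in the same layout as BGN_DATE (illustration only)
bgn_date <- c("4/18/1950 0:00:00", "5/2/1966 0:00:00", "1/6/1995 0:00:00",
              "7/4/2002 0:00:00", "8/29/2005 0:00:00", "4/27/2011 0:00:00")

# strptime-based parsing ignores the trailing time once the date part is read
year <- as.numeric(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Number of records per year
counts <- table(year)
print(counts)

# Share of records that survive a year >= 1995 filter
kept_share <- mean(year >= 1995)
kept_share
```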
First we need to convert the exponent codes from alphanumeric characters to numeric multipliers, so that the damages can be computed in dollars.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Exponent codes, grouped by the multiplier each one represents
zeros <- c("","?","-")
units <- "+"
tens <- c("0","1","2","3","4","5","6","7","8")
hundreds <- c("H","h")
thousands <- c("K","k")
millions <- c("M","m")
billions <- c("B","b")
# Evaluate property damage in dollars
anlstormdata[anlstormdata$PROPDMGEXP %in% zeros,"PROPDMGEXPN"] <- 0
anlstormdata[anlstormdata$PROPDMGEXP %in% units,"PROPDMGEXPN"] <- 1
anlstormdata[anlstormdata$PROPDMGEXP %in% tens,"PROPDMGEXPN"] <- 10
anlstormdata[anlstormdata$PROPDMGEXP %in% hundreds,"PROPDMGEXPN"] <- 100
anlstormdata[anlstormdata$PROPDMGEXP %in% thousands,"PROPDMGEXPN"] <- 1000
anlstormdata[anlstormdata$PROPDMGEXP %in% millions,"PROPDMGEXPN"] <- 1000000
anlstormdata[anlstormdata$PROPDMGEXP %in% billions,"PROPDMGEXPN"] <- 1000000000
anlstormdata$PROPERTYDAMAGE <- anlstormdata$PROPDMG * anlstormdata$PROPDMGEXPN
# Evaluate crop damage in dollars
anlstormdata[anlstormdata$CROPDMGEXP %in% zeros,"CROPDMGEXPN"] <- 0
anlstormdata[anlstormdata$CROPDMGEXP %in% units,"CROPDMGEXPN"] <- 1
anlstormdata[anlstormdata$CROPDMGEXP %in% tens,"CROPDMGEXPN"] <- 10
anlstormdata[anlstormdata$CROPDMGEXP %in% hundreds,"CROPDMGEXPN"] <- 100
anlstormdata[anlstormdata$CROPDMGEXP %in% thousands,"CROPDMGEXPN"] <- 1000
anlstormdata[anlstormdata$CROPDMGEXP %in% millions,"CROPDMGEXPN"] <- 1000000
anlstormdata[anlstormdata$CROPDMGEXP %in% billions,"CROPDMGEXPN"] <- 1000000000
anlstormdata$CROPDAMAGE <- anlstormdata$CROPDMG * anlstormdata$CROPDMGEXPN
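The assignments above can also be condensed into a single lookup. The helper below, `exp_multiplier()`, is a hypothetical name introduced here and is not part of the original analysis. It applies the same code-to-multiplier mapping listed earlier. It upper-cases each code first, so `k`/`K` and `m`/`M` need only one entry. Any code outside the mapping comes back as `NA`, so unexpected values surface instead of being silently treated as zero.

```r
# Hypothetical helper: convert PROPDMGEXP / CROPDMGEXP codes to dollar multipliers,
# using the same mapping as the zeros/units/tens/... vectors above
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(setNames(rep(10, 9), as.character(0:8)),  # digits 0-8 -> tens
              "+" = 1, "?" = 0, "-" = 0,
              H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[code])
  # An empty string never matches a name when subsetting, so blanks are set explicitly
  out[which(code == "")] <- 0
  out
}

# Worked example: a PROPDMG of 25 with code "K" is 25 * 1000 = $25,000
exp_multiplier(c("K", "m", "", "?", "5", "+", "B", "h"))
25 * exp_multiplier("K")
```

With the helper, each damage column becomes one line, e.g. `anlstormdata$PROPERTYDAMAGE <- anlstormdata$PROPDMG * exp_multiplier(anlstormdata$PROPDMGEXP)`.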
length(unique(anlstormdata$EVTYPE))
## [1] 799
head(unique(anlstormdata$EVTYPE),30)
## [1] FREEZING RAIN SNOW
## [3] SNOW/ICE HURRICANE OPAL/HIGH WINDS
## [5] HAIL THUNDERSTORM WINDS
## [7] RECORD COLD HURRICANE ERIN
## [9] HURRICANE OPAL DENSE FOG
## [11] RIP CURRENT TORNADO
## [13] THUNDERSTORM WINS LIGHTNING
## [15] FLASH FLOOD FLASH FLOODING
## [17] HIGH WINDS TORNADO F0
## [19] THUNDERSTORM WINDS LIGHTNING FUNNEL CLOUD
## [21] THUNDERSTORM WINDS/HAIL THUNDERSTORM WIND
## [23] HEAT WIND
## [25] HEAVY RAINS LIGHTNING AND HEAVY RAIN
## [27] HEAVY RAIN THUNDERSTORM WINDS HAIL
## [29] FLOOD COLD
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
As we can see, there are about 800 distinct event types (799), and some are named after individual storms, such as “Hurricane Opal”. To study costs and casualties in a simpler form, the event types need to be regrouped into broader categories. The attempt below groups similar terms and common typos. Ambiguous entries are left unclassified (the “other” bucket), and summary records are ignored because they do not describe any specific event.
(Note: the pattern passed to grepl shows which raw event names are collected into each group.)
tsunami <- unique(anlstormdata[grepl("tsunami",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
hurricane <- unique(anlstormdata[grepl("hurricane|floy",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
typhoon <- unique(anlstormdata[grepl("typh",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tornado <- unique(anlstormdata[grepl("tornado",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
thunderstorm <- unique(anlstormdata[grepl("thunderstorm|tstm",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tropicalstorm <- unique(anlstormdata[grepl("tropical",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
lightning <- unique(anlstormdata[grepl("lightning|lignt",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
flood <- unique(anlstormdata[grepl("flood|fld|surge|dam f|dam b|high wa|rising|seiche",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tide <- unique(anlstormdata[grepl("tide|sea|wave|swel",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
erosion <- unique(anlstormdata[grepl("eros",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
blizzard <- unique(anlstormdata[grepl("blizzard",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
avalanche <- unique(anlstormdata[grepl("avalanche",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
volcano <- unique(anlstormdata[grepl("volcan|vog",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
freeze <- unique(anlstormdata[grepl("freez|ice|icy|frost|sleet|glaze",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
drought <- unique(anlstormdata[grepl("drought|dry|drie",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
wildfire <- unique(anlstormdata[grepl("wildfire|fire",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
mudslide <- unique(anlstormdata[grepl("slide",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
hail <- unique(anlstormdata[grepl("hail",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
storm <- unique(anlstormdata[grepl("storm",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
rain <- unique(anlstormdata[grepl("rain|prec|wet|shower|heavy r|downbur|urb",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
snow <- unique(anlstormdata[grepl("snow|wint",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
heat <- unique(anlstormdata[grepl("heat|warm|hot|high t|hyper",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
wind <- unique(anlstormdata[grepl("wind|wnd|gust",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
cold <- unique(anlstormdata[grepl("cold|hypo|low t|cool",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
fog <- unique(anlstormdata[grepl("fog",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
surf <- unique(anlstormdata[grepl("surf",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
microburst <- unique(anlstormdata[grepl("micro|mico",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
dust <- unique(anlstormdata[grepl("dust",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
smoke <- unique(anlstormdata[grepl("smoke",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
spout <- unique(anlstormdata[grepl("spout",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
funnel <- unique(anlstormdata[grepl("funnel|cloud",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
current <- unique(anlstormdata[grepl("current",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
record <- unique(anlstormdata[grepl("record t|temperature rec|record h|record l",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
drowning <- unique(anlstormdata[grepl("drowning",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
slump <- unique(anlstormdata[grepl("slump",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
marine <- unique(anlstormdata[grepl("marine",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
summ <- unique(anlstormdata[grepl("summary",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
A sample of the resulting groups:
hurricane
## [1] HURRICANE OPAL/HIGH WINDS HURRICANE ERIN
## [3] HURRICANE OPAL HURRICANE-GENERATED SWELLS
## [5] HURRICANE FELIX HURRICANE
## [7] Hurricane Edouard REMNANTS OF FLOYD
## [9] HURRICANE/TYPHOON
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
tornado
## [1] TORNADO TORNADO F0 TORNADOS
## [4] WATERSPOUT TORNADO WATERSPOUT/TORNADO WATERSPOUT-TORNADO
## [7] WATERSPOUT/ TORNADO TORNADO F3 TORNADO F1
## [10] TORNADO/WATERSPOUT TORNADO F2 TORNADO DEBRIS
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
thunderstorm
## [1] THUNDERSTORM WINDS THUNDERSTORM WINS
## [3] THUNDERSTORM WINDS LIGHTNING THUNDERSTORM WINDS/HAIL
## [5] THUNDERSTORM WIND THUNDERSTORM WINDS HAIL
## [7] THUNDERSTORM TSTM WIND
## [9] SEVERE THUNDERSTORMS SEVERE THUNDERSTORM WINDS
## [11] THUNDERSTORMS WINDS SEVERE THUNDERSTORM
## [13] LIGHTNING THUNDERSTORM WINDSS THUNDERSTORM WINDSS
## [15] LIGHTNING THUNDERSTORM WINDS LIGHTNING AND THUNDERSTORM WIN
## [17] THUNDERSTORM WINDS53 THUNDERSTORM WINDS URBAN FLOOD
## [19] THUNDERSTORM WINDS SMALL STREA THUNDERSTORM WINDS 2
## [21] TSTM WIND 51 TSTM WIND 50
## [23] TSTM WIND 52 TSTM WIND 55
## [25] THUNDERSTORM WINDS 61 THUNDERSTORM DAMAGE
## [27] THUNDERSTORMW 50 THUNDERSTORMS WIND
## [29] THUNDERSTORM WINDS THUNDERSTORM WINDS/ HAIL
## [31] THUNDERSTORM WIND/LIGHTNING THUNDERSTORM WIND G50
## [33] THUNDERSTORM WINDS/HEAVY RAIN THUNDERSTORM WINDS G
## [35] THUNDERSTORM WIND G60 THUNDERSTORM WIND G55
## [37] THUNDERSTORM WINDS G60 THUNDERSTORM WINDS FUNNEL CLOU
## [39] THUNDERSTORM WINDS/FLASH FLOOD THUNDERSTORM WIND 59
## [41] THUNDERSTORM WIND 52 THUNDERSTORM WIND 69
## [43] TSTM WIND G58 THUNDERSTORM WIND 60 MPH
## [45] THUNDERSTORM WIND 65MPH THUNDERSTORM WIND/ TREES
## [47] THUNDERSTORM WIND/AWNING THUNDERSTORM WIND 98 MPH
## [49] THUNDERSTORM WIND TREES THUNDERSTORM WIND 59 MPH
## [51] THUNDERSTORM WINDS 63 MPH THUNDERSTORM WIND/ TREE
## [53] THUNDERSTORM DAMAGE TO THUNDERSTORM WIND 65 MPH
## [55] THUNDERSTORM WIND. THUNDERSTORM WIND 59 MPH.
## [57] THUNDERSTORM WINDSHAIL THUNDERSTORM WINDS AND
## [59] TSTM WIND DAMAGE THUNDERSTORM WIND G52
## [61] THUNDERSTORM WIND G51 THUNDERSTORM WIND G61
## [63] THUNDERSTORM WINDS. THUNDERSTORM W INDS
## [65] THUNDERSTORM WIND 50 THUNDERSTORM WIND 56
## [67] THUNDERSTORMW TSTM WINDS
## [69] TSTM WIND 65) THUNDERSTORM WINDS/ FLOOD
## [71] THUNDERSTORM WINDS HEAVY RAIN TSTM WIND/HAIL
## [73] Tstm Wind THUNDERSTORMS
## [75] Thunderstorm Wind TSTM WIND (G45)
## [77] TSTM HEAVY RAIN TSTM WIND 40
## [79] TSTM WIND 45 TSTM WIND (41)
## [81] TSTM WIND (G40) TSTM WND
## [83] TSTM WIND TSTM WIND AND LIGHTNING
## [85] TSTM WIND (G45) TSTM WIND (G45)
## [87] TSTM WIND (G35) TSTM
## [89] TSTM WIND G45 THUNDERSTORM WIND (G40)
## [91] NON-TSTM WIND NON TSTM WIND
## [93] GUSTY THUNDERSTORM WINDS MARINE TSTM WIND
## [95] GUSTY THUNDERSTORM WIND MARINE THUNDERSTORM WIND
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
A new column, EVENTTYPE, is filled in the reverse order of the groups defined above. When an original event type matches more than one group, the more severe group (assigned last) wins. For example, “HURRICANE OPAL/HIGH WINDS” is classified as Hurricane and not as Wind.
anlstormdata[anlstormdata$EVTYPE %in% summ,"EVENTTYPE"] <- "Summary"
anlstormdata[anlstormdata$EVTYPE %in% marine,"EVENTTYPE"] <- "Marine"
anlstormdata[anlstormdata$EVTYPE %in% slump,"EVENTTYPE"] <- "LandSlump"
anlstormdata[anlstormdata$EVTYPE %in% drowning,"EVENTTYPE"] <- "Drowning"
anlstormdata[anlstormdata$EVTYPE %in% record,"EVENTTYPE"] <- "RecordTemperature"
anlstormdata[anlstormdata$EVTYPE %in% current,"EVENTTYPE"] <- "RipCurrent"
anlstormdata[anlstormdata$EVTYPE %in% funnel,"EVENTTYPE"] <- "FunnelCloud"
anlstormdata[anlstormdata$EVTYPE %in% spout,"EVENTTYPE"] <- "Spouts"
anlstormdata[anlstormdata$EVTYPE %in% smoke,"EVENTTYPE"] <- "Smoke"
anlstormdata[anlstormdata$EVTYPE %in% dust,"EVENTTYPE"] <- "Dust"
anlstormdata[anlstormdata$EVTYPE %in% microburst,"EVENTTYPE"] <- "Microburst"
anlstormdata[anlstormdata$EVTYPE %in% surf,"EVENTTYPE"] <- "Surf"
anlstormdata[anlstormdata$EVTYPE %in% fog,"EVENTTYPE"] <- "Fog"
anlstormdata[anlstormdata$EVTYPE %in% cold,"EVENTTYPE"] <- "Cold"
anlstormdata[anlstormdata$EVTYPE %in% wind,"EVENTTYPE"] <- "Wind"
anlstormdata[anlstormdata$EVTYPE %in% heat,"EVENTTYPE"] <- "Heat"
anlstormdata[anlstormdata$EVTYPE %in% snow,"EVENTTYPE"] <- "Snow"
anlstormdata[anlstormdata$EVTYPE %in% rain,"EVENTTYPE"] <- "Rain"
anlstormdata[anlstormdata$EVTYPE %in% storm,"EVENTTYPE"] <- "Storm"
anlstormdata[anlstormdata$EVTYPE %in% hail,"EVENTTYPE"] <- "Hail"
anlstormdata[anlstormdata$EVTYPE %in% mudslide,"EVENTTYPE"] <- "Mudslide"
anlstormdata[anlstormdata$EVTYPE %in% wildfire,"EVENTTYPE"] <- "Wildfire"
anlstormdata[anlstormdata$EVTYPE %in% drought,"EVENTTYPE"] <- "Drought"
anlstormdata[anlstormdata$EVTYPE %in% freeze,"EVENTTYPE"] <- "Freeze"
anlstormdata[anlstormdata$EVTYPE %in% volcano,"EVENTTYPE"] <- "Volcano"
anlstormdata[anlstormdata$EVTYPE %in% avalanche,"EVENTTYPE"] <- "Avalanche"
anlstormdata[anlstormdata$EVTYPE %in% blizzard,"EVENTTYPE"] <- "Blizzard"
anlstormdata[anlstormdata$EVTYPE %in% erosion,"EVENTTYPE"] <- "Erosion"
anlstormdata[anlstormdata$EVTYPE %in% tide,"EVENTTYPE"] <- "Tide"
anlstormdata[anlstormdata$EVTYPE %in% flood,"EVENTTYPE"] <- "Flood"
anlstormdata[anlstormdata$EVTYPE %in% lightning,"EVENTTYPE"] <- "Lightning"
anlstormdata[anlstormdata$EVTYPE %in% tropicalstorm,"EVENTTYPE"] <- "Tropicalstorm"
anlstormdata[anlstormdata$EVTYPE %in% thunderstorm,"EVENTTYPE"] <- "Thunderstorm"
anlstormdata[anlstormdata$EVTYPE %in% tornado,"EVENTTYPE"] <- "Tornado"
anlstormdata[anlstormdata$EVTYPE %in% typhoon,"EVENTTYPE"] <- "Typhoon"
anlstormdata[anlstormdata$EVTYPE %in% hurricane,"EVENTTYPE"] <- "Hurricane"
anlstormdata[anlstormdata$EVTYPE %in% tsunami,"EVENTTYPE"] <- "Tsunami"
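The pattern vectors and the reverse-order assignments above can also be written as one table of patterns plus a loop. This is a sketch, not the author's code. `classify_events()` and `event_patterns` are hypothetical names, and only six of the groups are included to keep it short. The patterns are listed from highest to lowest priority, and the loop applies them lowest first, so a higher-priority group overwrites a lower one. That is the same rule as above.

```r
# Hypothetical subset of the grouping patterns above, highest priority first
event_patterns <- c(Tsunami      = "tsunami",
                    Hurricane    = "hurricane|floy",
                    Tornado      = "tornado",
                    Thunderstorm = "thunderstorm|tstm",
                    Flood        = "flood|fld|surge",
                    Wind         = "wind|wnd|gust")

classify_events <- function(evtype, patterns) {
  evtype <- as.character(evtype)
  out <- rep(NA_character_, length(evtype))
  # Apply groups from lowest to highest priority; later matches overwrite earlier ones
  for (group in rev(names(patterns))) {
    out[grepl(patterns[[group]], evtype, ignore.case = TRUE)] <- group
  }
  out  # NA means no pattern matched (the "other" bucket)
}

classify_events(c("HURRICANE OPAL/HIGH WINDS", "TSTM WIND", "FLASH FLOOD", "NORTHERN LIGHTS"),
                event_patterns)
```

Once extended to all of the groups above, the whole column could be built with `anlstormdata$EVENTTYPE <- classify_events(anlstormdata$EVTYPE, event_patterns)`.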
After the reclassification, only a few event types remain unmatched, and these form the “other” bucket. There is not enough information about them or about the summary records, so we filter both out.
unique(anlstormdata[is.na(anlstormdata$EVENTTYPE),"EVTYPE"])
## [1] OTHER HEAVY MIX SOUTHEAST
## [4] EXCESSIVE Other No Severe Weather
## [7] NONE MONTHLY TEMPERATURE RED FLAG CRITERIA
## [10] NORTHERN LIGHTS
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
Filter out the records that could not be classified:
rcstdata <- anlstormdata[!is.na(anlstormdata$EVENTTYPE),c("year","EVENTTYPE","FATALITIES","INJURIES","PROPERTYDAMAGE","CROPDAMAGE")]
Filter out the summary records:
rcstdata <- subset(rcstdata,EVENTTYPE != "Summary")
Compute the number of people affected (fatalities plus injuries):
rcstdata$PeopleAffected <- rcstdata$FATALITIES + rcstdata$INJURIES
Compute the total damage in dollars (property plus crop damage):
rcstdata$TotalDamage <- rcstdata$PROPERTYDAMAGE + rcstdata$CROPDAMAGE
table(rcstdata$EVENTTYPE)
##
## Avalanche Blizzard Cold Drought
## 380 2666 799 2720
## Drowning Dust Erosion Flood
## 1 150 6 83051
## Fog Freeze FunnelCloud Hail
## 1805 4048 6429 216523
## Heat Hurricane LandSlump Lightning
## 2708 284 2 14288
## Marine Microburst Mudslide Rain
## 3 6 634 11861
## RecordTemperature RipCurrent Smoke Snow
## 75 763 21 24574
## Spouts Storm Surf Thunderstorm
## 3567 11820 1057 234069
## Tide Tornado Tropicalstorm Tsunami
## 645 24373 749 20
## Typhoon Volcano Wildfire Wind
## 11 30 4215 27018
sort(tapply(rcstdata$PeopleAffected, rcstdata$EVENTTYPE, sum), decreasing = TRUE)
## Tornado Heat Flood Thunderstorm
## 23332 11633 10072 6126
## Lightning Wind Storm Snow
## 5362 2310 1957 1642
## Wildfire Hurricane RipCurrent Fog
## 1545 1462 1088 1065
## Hail Freeze Tide Blizzard
## 936 766 649 456
## Surf Rain Tropicalstorm Avalanche
## 405 399 395 382
## Cold Tsunami Mudslide Dust
## 296 162 99 45
## Drought Spouts Marine Typhoon
## 37 31 15 5
## Drowning FunnelCloud Erosion LandSlump
## 1 1 0 0
## Microburst RecordTemperature Smoke Volcano
## 0 0 0 0
people = as.data.frame.table(sort(tapply(rcstdata$PeopleAffected, rcstdata$EVENTTYPE,sum), decreasing = TRUE))
colnames(people) = c("Event", "PeopleAffected")
sort(tapply(rcstdata$TotalDamage, rcstdata$EVENTTYPE, sum), decreasing = TRUE)
## Flood Hurricane Tornado Hail
## 215088038498 90164972810 25227212402 17922530677
## Drought Thunderstorm Tropicalstorm Wildfire
## 14969925380 11002294042 8353958550 8163274130
## Wind Freeze Rain Storm
## 6293398855 5575689360 4103117240 1624267250
## Cold Heat Snow Lightning
## 1358809400 903474200 875704157 803095052
## Typhoon Blizzard Mudslide Tsunami
## 601055000 533568950 346093100 144082000
## Surf Tide Fog Spouts
## 95924500 67322550 21474500 5739200
## Avalanche Erosion Dust LandSlump
## 3716800 866000 723130 570000
## Volcano RipCurrent FunnelCloud Smoke
## 500000 163000 134100 100000
## Marine Microburst Drowning RecordTemperature
## 50000 20000 0 0
property = as.data.frame.table(sort(tapply(rcstdata$TotalDamage, rcstdata$EVENTTYPE,sum), decreasing = TRUE))
colnames(property) = c("Event", "TotalDamage")
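The two rankings above each come from their own `tapply()` call. As an alternative sketch, `aggregate()` can sum both measures per event type in one call, and each ranking then becomes a simple reordering. The example uses a made-up four-row data frame with the same column names as `rcstdata`. Note that the formula interface of `aggregate()` drops rows with missing values by default.

```r
# Made-up rows with the same columns as rcstdata (illustration only)
toy <- data.frame(EVENTTYPE      = c("Flood", "Tornado", "Flood", "Heat"),
                  PeopleAffected = c(3, 10, 2, 4),
                  TotalDamage    = c(5e6, 1e6, 2e6, 0))

# Sum both measures per event type in a single pass
totals <- aggregate(cbind(PeopleAffected, TotalDamage) ~ EVENTTYPE, data = toy, FUN = sum)

# Rank event types by each measure
by_people <- totals[order(totals$PeopleAffected, decreasing = TRUE), ]
by_damage <- totals[order(totals$TotalDamage, decreasing = TRUE), ]
by_people
by_damage
```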
ppl <- ggplot(data = people, aes(x = Event, y = PeopleAffected)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(x = "Storm Weather Event", y = "# People Affected (Killed + Injured)")
ppl + labs(subtitle = "Dangerous Storm Events")
p2 <- ggplot(data = property, aes(x = Event, y = TotalDamage)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(x = "Storm Weather Event", y = "Damage to Property and Crops (Dollars)")
p2 + labs(subtitle = "Expensive Storm Events")
The plots show that the events which have historically killed and injured the most people are:
+ Tornado
+ Heat
+ Flood
+ Thunderstorm
+ Lightning

From the economic perspective, the leading storm events are:
+ Flood
+ Hurricane
+ Tornado
+ Hail
+ Drought

These events also make good candidates for some kind of insurance, either for personal protection or for property, for anyone living in a region prone to them.
(Note: a few individual events may have skewed the totals considerably, particularly where the country was unprepared, as with the flooding caused by Hurricane Katrina in 2005. Hurricane Andrew, which struck in 1992, falls before the 1995 cutoff and is not part of these totals.)
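One way to check whether a handful of records dominates a total is to list the largest individual records and their share of the overall sum. Below is a minimal sketch with a hypothetical helper, `largest_records()`, and made-up data. On the real data it would be called on `anlstormdata` with the `PROPERTYDAMAGE` column. Any record that looks implausible could then be checked against its `REMARKS` text.

```r
# Hypothetical helper: the n rows with the largest values in `col`,
# plus each row's share of the column total
largest_records <- function(df, col, n = 5) {
  top <- head(df[order(df[[col]], decreasing = TRUE), , drop = FALSE], n)
  top$share_of_total <- top[[col]] / sum(df[[col]], na.rm = TRUE)
  top
}

# Made-up records: one event accounts for most of the damage
toy <- data.frame(EVENTTYPE   = c("Flood", "Hail", "Flood", "Tornado"),
                  TotalDamage = c(9e9, 2e6, 5e8, 4e8))
largest_records(toy, "TotalDamage", n = 2)
```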