Synopsis

In this assignment, we try to answer some basic questions about which storm events harm people and cause economic loss. To do so, we explore the NOAA Storm Database, which tracks major storm systems in the US and records best-estimate information on property and crop damage along with fatalities and injuries. We restrict the analysis to recent data (1995 onwards), which is more completely recorded and also limits the effect of inflation and other cost-of-money factors on the damage figures. We find that some events are far more harmful to people than others, with tornadoes, excessive heat, floods, thunderstorms and lightning topping the list. Floods, hurricanes, tornadoes, hail, drought, thunderstorms, tropical storms and wildfires cause the most damage to property and crops.

Let's start!

Loading the Raw Data

The data for this report is obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The file used here, however, was downloaded from the course's copy of the data.

(Note: bz2 is a compression format (bzip2), and read.csv can read such files directly without a separate unzip step. Decompressing and loading the table is a lengthy process, so users are advised, if memory permits, to keep a copy of the loaded data set in case the original gets corrupted.)

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "~\\Course 5 -Reproducible Research\\Week 4\\repdata_data_StormData.csv.bz2"
download.file(url, destfile)
setwd("~/Course 5 -Reproducible Research/Week 4")
stormdata <- read.csv("repdata_data_StormData.csv.bz2",header = TRUE)
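The claim that read.csv reads bz2 files directly can be checked on a small scale. The following is a minimal sketch on a hypothetical toy data frame written to a temporary file; it is not part of the original analysis, and the names `toy`, `tmp` and `toy_back` are made up for illustration.

```r
# Hypothetical toy data written to a temporary bzip2-compressed CSV file
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv is given only the path; the compression is detected when the file is opened
toy_back <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern is what the loading step relies on for the full storm data file.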

Optional step

copydata <- stormdata
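An in-memory copy is lost when the R session ends. One possible alternative, sketched here under assumptions, is to cache the parsed table on disk with saveRDS so that the slow load runs only once. `load_cached`, `cache_file` and `slow_loader` are hypothetical names, and the loader below is a stand-in for the real read.csv call.

```r
# Return the cached object if the cache file exists; otherwise run the loader and cache its result
load_cached <- function(cache_file, loader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data <- loader()
  saveRDS(data, cache_file)
  data
}

# Stand-in for the expensive load; the counter records how often it actually runs
counter <- new.env()
counter$n <- 0
slow_loader <- function() {
  counter$n <- counter$n + 1
  data.frame(x = 1:3)
}

cache_file <- tempfile(fileext = ".rds")        # hypothetical cache location
first  <- load_cached(cache_file, slow_loader)  # runs the loader and writes the cache
second <- load_cached(cache_file, slow_loader)  # served from the cache file
```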

Data Processing

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Important things to note from the str output and the documentation

  • The data frame consists of about 902 thousand observations with 37 columns
  • Dates are in the %m/%d/%Y format
  • Time is in the %H:%M:%S format
  • We have a time zone column
  • We have a county ID and a county name for each incident
  • We have 985 types of events, and the magnitude in addition to F-scale values
    • Hail magnitude is represented in hundredths of an inch
    • Hurricanes are rated on the Saffir/Simpson Hurricane Scale (1-5)
    • Tornado wind speeds are rated on the Fujita scale (0-5)
  • We have the range of the areas affected, along with the latitude and longitude of the areas
  • A reference number for easy location of records, and special remarks if any
  • Finally, we have the fatalities, injuries, crop damage and property damage; for the damage columns, the exponent column gives the multiplier, as per the documentation.

http://www.ire.org

https://rpubs.com/flyingdisc/PROPDMGEXP

    + [Hh]   = Hundreds
    + [Kk]   = Thousand
    + [Mm]   = Million
    + [Bb]   = Billion
    + [0-8]  = Tens
    + [+]    = Unit
    + [ /?/-]= 0
unique(stormdata$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(stormdata$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

Considering recent data for analysis

stormdata$year <- as.numeric(format(as.Date(stormdata$BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),"%Y"))
summary(stormdata$year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1950    1995    2002    1999    2007    2011
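The year extraction can be tried on a couple of strings in the same layout as BGN_DATE. This is a small sketch with made-up sample values; `raw_dates`, `parsed` and `sample_years` are illustrative names only.

```r
# Two sample strings in the BGN_DATE layout (month/day/year followed by a time)
raw_dates <- c("4/18/1950 0:00:00", "12/31/2011 0:00:00")

# as.Date parses with the full format and keeps only the calendar date
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")

# Format the date back to a four-digit year and convert it to a number
sample_years <- as.numeric(format(parsed, "%Y"))
print(sample_years)
```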

Since the data ranges from 1950 to 2011, the older years may hold too few observations to base our research on, so we make an assumption and restrict the analysis to the newer data. Let's first look at the number of observations per year.
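Counting observations per year, and the share of rows kept by a cutoff, can be sketched as follows. The vector `toy_years` is hypothetical data and `share_from_cutoff` is a helper name introduced only for this illustration; on the real data the same calls would be applied to stormdata$year.

```r
# Hypothetical toy years standing in for stormdata$year
toy_years <- c(1950, 1950, 1994, 1995, 1995, 1996, 2001, 2011)

# Number of observations recorded in each year
obs_per_year <- table(toy_years)
print(obs_per_year)

# Fraction of observations that fall on or after a cutoff year
share_from_cutoff <- function(years, cutoff) {
  mean(years >= cutoff, na.rm = TRUE)
}
kept_share <- share_from_cutoff(toy_years, 1995)
print(kept_share)
```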

We can check the distribution of data by creating a histogram

library(ggplot2)
qplot(year, data = stormdata) + geom_hline(yintercept = 60000) + geom_vline(xintercept = 1995) + labs(title = "Histogram of observations by year", subtitle = "with lines marking the data to be considered for analysis")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

From the summary and the plot, we can see that the first quartile of year is 1995, so about three quarters of the observations date from 1995 onwards. We will use the data from 1995 onwards as the sample for our analysis.

anlstormdata <- subset(stormdata,year >= 1995)
dim(anlstormdata)
## [1] 681500     38

Calculation of Damages

First we need to convert the exponent (EXP) values from alphanumeric characters to numeric multipliers, so that we can compute the damages in dollars.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Exponent codes, grouped by the multiplier each group stands for
zeros <- c("","?","-")
units <- "+"
tens <- c("0","1","2","3","4","5","6","7","8")
hundreds <- c("H","h")
thousands <- c("K","k")
millions <- c("M","m")
billions <- c("B","b")

# Evaluate property damage in dollars 
anlstormdata[anlstormdata$PROPDMGEXP %in% zeros,"PROPDMGEXPN"] <- 0
anlstormdata[anlstormdata$PROPDMGEXP %in% units,"PROPDMGEXPN"] <- 1
anlstormdata[anlstormdata$PROPDMGEXP %in% tens,"PROPDMGEXPN"] <- 10
anlstormdata[anlstormdata$PROPDMGEXP %in% hundreds,"PROPDMGEXPN"] <- 100
anlstormdata[anlstormdata$PROPDMGEXP %in% thousands,"PROPDMGEXPN"] <- 1000
anlstormdata[anlstormdata$PROPDMGEXP %in% millions,"PROPDMGEXPN"] <- 1000000
anlstormdata[anlstormdata$PROPDMGEXP %in% billions,"PROPDMGEXPN"] <- 1000000000
anlstormdata$PROPERTYDAMAGE <- anlstormdata$PROPDMG * anlstormdata$PROPDMGEXPN

# Evaluate crop damage in dollars 
anlstormdata[anlstormdata$CROPDMGEXP %in% zeros,"CROPDMGEXPN"] <- 0
anlstormdata[anlstormdata$CROPDMGEXP %in% units,"CROPDMGEXPN"] <- 1
anlstormdata[anlstormdata$CROPDMGEXP %in% tens,"CROPDMGEXPN"] <- 10
anlstormdata[anlstormdata$CROPDMGEXP %in% hundreds,"CROPDMGEXPN"] <- 100
anlstormdata[anlstormdata$CROPDMGEXP %in% thousands,"CROPDMGEXPN"] <- 1000
anlstormdata[anlstormdata$CROPDMGEXP %in% millions,"CROPDMGEXPN"] <- 1000000
anlstormdata[anlstormdata$CROPDMGEXP %in% billions,"CROPDMGEXPN"] <- 1000000000
anlstormdata$CROPDAMAGE <- anlstormdata$CROPDMG * anlstormdata$CROPDMGEXPN
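The same mapping can also be written as a single lookup. This is an alternative sketch rather than the code used for the results in this report; `exp_to_multiplier` is a hypothetical helper, and it assumes the multiplier table listed earlier, with any unlisted code mapped to NA.

```r
# Map exponent codes to numeric multipliers with one lookup (case-insensitive)
exp_to_multiplier <- function(exp_codes) {
  codes  <- toupper(as.character(exp_codes))
  keys   <- c("", "?", "-", "+", as.character(0:8), "H", "K", "M", "B")
  values <- c(0, 0, 0, 1, rep(10, 9), 100, 1e3, 1e6, 1e9)
  # match() returns NA for codes that are not in the table
  values[match(codes, keys)]
}

# Toy check with a mix of codes, including lower case and the empty string
toy_exp <- c("K", "m", "", "+", "5", "B", "h", "?")
toy_dmg <- c(25, 2.5, 10, 3, 4, 1, 2, 7)
toy_dollars <- toy_dmg * exp_to_multiplier(toy_exp)
print(toy_dollars)
```

With a helper like this, the property and crop columns could each be handled in one line, at the cost of hiding the individual groups that the step-by-step version makes explicit.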

Too many event types

Number of distinct values in our sample data

length(unique(anlstormdata$EVTYPE))
## [1] 799

Sample of the event types

head(unique(anlstormdata$EVTYPE),30)
##  [1] FREEZING RAIN                SNOW                        
##  [3] SNOW/ICE                     HURRICANE OPAL/HIGH WINDS   
##  [5] HAIL                         THUNDERSTORM WINDS          
##  [7] RECORD COLD                  HURRICANE ERIN              
##  [9] HURRICANE OPAL               DENSE FOG                   
## [11] RIP CURRENT                  TORNADO                     
## [13] THUNDERSTORM WINS            LIGHTNING                   
## [15] FLASH FLOOD                  FLASH FLOODING              
## [17] HIGH WINDS                   TORNADO F0                  
## [19] THUNDERSTORM WINDS LIGHTNING FUNNEL CLOUD                
## [21] THUNDERSTORM WINDS/HAIL      THUNDERSTORM WIND           
## [23] HEAT                         WIND                        
## [25] HEAVY RAINS                  LIGHTNING AND HEAVY RAIN    
## [27] HEAVY RAIN                   THUNDERSTORM WINDS HAIL     
## [29] FLOOD                        COLD                        
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

As we can see, there are about 800 classifications of event, and some refer to explicitly named storms such as “Hurricane Opal”. To study the costs and casualties in a simpler format, the classification needs to be made broader, so a reclassification is definitely needed. Here is an attempt at one, made by grouping similar terminology and allowing for typos. We will mark the ambiguous ones as Others and ignore summary records, as they do not describe any specific event.

(Note: the grepl string gives a clue on what actual event is being reclassified)

tsunami <- unique(anlstormdata[grepl("tsunami",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
hurricane <- unique(anlstormdata[grepl("hurricane|floy",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
typhoon <- unique(anlstormdata[grepl("typh",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tornado <- unique(anlstormdata[grepl("tornado",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
thunderstorm <- unique(anlstormdata[grepl("thunderstorm|tstm",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tropicalstorm <- unique(anlstormdata[grepl("tropical",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
lightning <- unique(anlstormdata[grepl("lightning|lignt",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
flood <- unique(anlstormdata[grepl("flood|fld|surge|dam f|dam b|high wa|rising|seiche",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
tide <- unique(anlstormdata[grepl("tide|sea|wave|swel",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
erosion <- unique(anlstormdata[grepl("eros",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
blizzard <- unique(anlstormdata[grepl("blizzard",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
avalanche <- unique(anlstormdata[grepl("avalanche",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
volcano <- unique(anlstormdata[grepl("volcan|vog",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
freeze <- unique(anlstormdata[grepl("freez|ice|icy|frost|sleet|glaze",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
drought <- unique(anlstormdata[grepl("drought|dry|drie",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
wildfire <- unique(anlstormdata[grepl("wildfire|fire",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
mudslide <- unique(anlstormdata[grepl("slide",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
hail <- unique(anlstormdata[grepl("hail",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
storm <- unique(anlstormdata[grepl("storm",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"]) 
rain <- unique(anlstormdata[grepl("rain|prec|wet|shower|heavy r|downbur|urb",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
snow <- unique(anlstormdata[grepl("snow|wint",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
heat <- unique(anlstormdata[grepl("heat|warm|hot|high t|hyper",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
wind <- unique(anlstormdata[grepl("wind|wnd|gust",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
cold <- unique(anlstormdata[grepl("cold|hypo|low t|cool",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
fog <- unique(anlstormdata[grepl("fog",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
surf <- unique(anlstormdata[grepl("surf",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
microburst <- unique(anlstormdata[grepl("micro|mico",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
dust <- unique(anlstormdata[grepl("dust",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
smoke <- unique(anlstormdata[grepl("smoke",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
spout <- unique(anlstormdata[grepl("spout",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
funnel <- unique(anlstormdata[grepl("funnel|cloud",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
current <- unique(anlstormdata[grepl("current",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
record <- unique(anlstormdata[grepl("record t|temperature rec|record h|record l",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
drowning <- unique(anlstormdata[grepl("drowning",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
slump <- unique(anlstormdata[grepl("slump",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
marine <- unique(anlstormdata[grepl("marine",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
summ <- unique(anlstormdata[grepl("summary",anlstormdata$EVTYPE,ignore.case = TRUE),"EVTYPE"])
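Each of the lines above follows the same pattern, which could be wrapped in a small helper. This is a sketch only; `match_events` is a hypothetical function, and unlike the original lines it returns a character vector rather than a factor. The toy event names are taken from the output shown in this report.

```r
# Return the distinct event types whose name matches a pattern, ignoring case
match_events <- function(pattern, evtype) {
  evtype <- as.character(evtype)
  unique(evtype[grepl(pattern, evtype, ignore.case = TRUE)])
}

# Toy vector with a repeated value and mixed case
toy_evtype <- c("HURRICANE OPAL", "Hurricane Edouard", "REMNANTS OF FLOYD",
                "TSTM WIND", "HURRICANE OPAL")
toy_hurricane <- match_events("hurricane|floy", toy_evtype)
print(toy_hurricane)
```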

A sample of some of the classifications is shown below

hurricane
## [1] HURRICANE OPAL/HIGH WINDS  HURRICANE ERIN            
## [3] HURRICANE OPAL             HURRICANE-GENERATED SWELLS
## [5] HURRICANE FELIX            HURRICANE                 
## [7] Hurricane Edouard          REMNANTS OF FLOYD         
## [9] HURRICANE/TYPHOON         
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
tornado
##  [1] TORNADO             TORNADO F0          TORNADOS           
##  [4] WATERSPOUT TORNADO  WATERSPOUT/TORNADO  WATERSPOUT-TORNADO 
##  [7] WATERSPOUT/ TORNADO TORNADO F3          TORNADO F1         
## [10] TORNADO/WATERSPOUT  TORNADO F2          TORNADO DEBRIS     
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
thunderstorm
##  [1] THUNDERSTORM WINDS             THUNDERSTORM WINS             
##  [3] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS/HAIL       
##  [5] THUNDERSTORM WIND              THUNDERSTORM WINDS HAIL       
##  [7] THUNDERSTORM                   TSTM WIND                     
##  [9] SEVERE THUNDERSTORMS           SEVERE THUNDERSTORM WINDS     
## [11] THUNDERSTORMS WINDS            SEVERE THUNDERSTORM           
## [13] LIGHTNING THUNDERSTORM WINDSS  THUNDERSTORM WINDSS           
## [15] LIGHTNING THUNDERSTORM WINDS   LIGHTNING AND THUNDERSTORM WIN
## [17] THUNDERSTORM WINDS53           THUNDERSTORM WINDS URBAN FLOOD
## [19] THUNDERSTORM WINDS SMALL STREA THUNDERSTORM WINDS 2          
## [21] TSTM WIND 51                   TSTM WIND 50                  
## [23] TSTM WIND 52                   TSTM WIND 55                  
## [25] THUNDERSTORM WINDS 61          THUNDERSTORM DAMAGE           
## [27] THUNDERSTORMW 50               THUNDERSTORMS WIND            
## [29] THUNDERSTORM  WINDS            THUNDERSTORM WINDS/ HAIL      
## [31] THUNDERSTORM WIND/LIGHTNING    THUNDERSTORM WIND G50         
## [33] THUNDERSTORM WINDS/HEAVY RAIN  THUNDERSTORM WINDS G          
## [35] THUNDERSTORM WIND G60          THUNDERSTORM WIND G55         
## [37] THUNDERSTORM WINDS G60         THUNDERSTORM WINDS FUNNEL CLOU
## [39] THUNDERSTORM WINDS/FLASH FLOOD THUNDERSTORM WIND 59          
## [41] THUNDERSTORM WIND 52           THUNDERSTORM WIND 69          
## [43] TSTM WIND G58                  THUNDERSTORM WIND 60 MPH      
## [45] THUNDERSTORM WIND 65MPH        THUNDERSTORM WIND/ TREES      
## [47] THUNDERSTORM WIND/AWNING       THUNDERSTORM WIND 98 MPH      
## [49] THUNDERSTORM WIND TREES        THUNDERSTORM WIND 59 MPH      
## [51] THUNDERSTORM WINDS 63 MPH      THUNDERSTORM WIND/ TREE       
## [53] THUNDERSTORM DAMAGE TO         THUNDERSTORM WIND 65 MPH      
## [55] THUNDERSTORM WIND.             THUNDERSTORM WIND 59 MPH.     
## [57] THUNDERSTORM WINDSHAIL         THUNDERSTORM WINDS AND        
## [59] TSTM WIND DAMAGE               THUNDERSTORM WIND G52         
## [61] THUNDERSTORM WIND G51          THUNDERSTORM WIND G61         
## [63] THUNDERSTORM WINDS.            THUNDERSTORM W INDS           
## [65] THUNDERSTORM WIND 50           THUNDERSTORM WIND 56          
## [67] THUNDERSTORMW                  TSTM WINDS                    
## [69] TSTM WIND 65)                  THUNDERSTORM WINDS/ FLOOD     
## [71] THUNDERSTORM WINDS HEAVY RAIN  TSTM WIND/HAIL                
## [73] Tstm Wind                      THUNDERSTORMS                 
## [75] Thunderstorm Wind              TSTM WIND (G45)               
## [77] TSTM HEAVY RAIN                TSTM WIND 40                  
## [79] TSTM WIND 45                   TSTM WIND (41)                
## [81] TSTM WIND (G40)                TSTM WND                      
## [83]  TSTM WIND                     TSTM WIND AND LIGHTNING       
## [85]  TSTM WIND (G45)               TSTM WIND  (G45)              
## [87] TSTM WIND (G35)                TSTM                          
## [89] TSTM WIND G45                  THUNDERSTORM WIND (G40)       
## [91] NON-TSTM WIND                  NON TSTM WIND                 
## [93] GUSTY THUNDERSTORM WINDS       MARINE TSTM WIND              
## [95] GUSTY THUNDERSTORM WIND        MARINE THUNDERSTORM WIND      
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Populating the reclassification column

The new classification column is populated in reverse order of priority, so that when an original event type matches several groups, the group assigned last (the more severe one) wins. For example, “Hurricane Opal/high winds” is classified as Hurricane and not Wind.

anlstormdata[anlstormdata$EVTYPE %in% summ,"EVENTTYPE"] <- "Summary"
anlstormdata[anlstormdata$EVTYPE %in% marine,"EVENTTYPE"] <- "Marine"
anlstormdata[anlstormdata$EVTYPE %in% slump,"EVENTTYPE"] <- "LandSlump"
anlstormdata[anlstormdata$EVTYPE %in% drowning,"EVENTTYPE"] <- "Drowning"
anlstormdata[anlstormdata$EVTYPE %in% record,"EVENTTYPE"] <- "RecordTemperature"
anlstormdata[anlstormdata$EVTYPE %in% current,"EVENTTYPE"] <- "RipCurrent"
anlstormdata[anlstormdata$EVTYPE %in% funnel,"EVENTTYPE"] <- "FunnelCloud"
anlstormdata[anlstormdata$EVTYPE %in% spout,"EVENTTYPE"] <- "Spouts"
anlstormdata[anlstormdata$EVTYPE %in% smoke,"EVENTTYPE"] <- "Smoke"
anlstormdata[anlstormdata$EVTYPE %in% dust,"EVENTTYPE"] <- "Dust"
anlstormdata[anlstormdata$EVTYPE %in% microburst,"EVENTTYPE"] <- "Microburst"
anlstormdata[anlstormdata$EVTYPE %in% surf,"EVENTTYPE"] <- "Surf"
anlstormdata[anlstormdata$EVTYPE %in% fog,"EVENTTYPE"] <- "Fog"
anlstormdata[anlstormdata$EVTYPE %in% cold,"EVENTTYPE"] <- "Cold"
anlstormdata[anlstormdata$EVTYPE %in% wind,"EVENTTYPE"] <- "Wind"
anlstormdata[anlstormdata$EVTYPE %in% heat,"EVENTTYPE"] <- "Heat"
anlstormdata[anlstormdata$EVTYPE %in% snow,"EVENTTYPE"] <- "Snow"
anlstormdata[anlstormdata$EVTYPE %in% rain,"EVENTTYPE"] <- "Rain"
anlstormdata[anlstormdata$EVTYPE %in% storm,"EVENTTYPE"] <- "Storm"
anlstormdata[anlstormdata$EVTYPE %in% hail,"EVENTTYPE"] <- "Hail"
anlstormdata[anlstormdata$EVTYPE %in% mudslide,"EVENTTYPE"] <- "Mudslide"
anlstormdata[anlstormdata$EVTYPE %in% wildfire,"EVENTTYPE"] <- "Wildfire"
anlstormdata[anlstormdata$EVTYPE %in% drought,"EVENTTYPE"] <- "Drought"
anlstormdata[anlstormdata$EVTYPE %in% freeze,"EVENTTYPE"] <- "Freeze"
anlstormdata[anlstormdata$EVTYPE %in% volcano,"EVENTTYPE"] <- "Volcano"
anlstormdata[anlstormdata$EVTYPE %in% avalanche,"EVENTTYPE"] <- "Avalanche"
anlstormdata[anlstormdata$EVTYPE %in% blizzard,"EVENTTYPE"] <- "Blizzard"
anlstormdata[anlstormdata$EVTYPE %in% erosion,"EVENTTYPE"] <- "Erosion"
anlstormdata[anlstormdata$EVTYPE %in% tide,"EVENTTYPE"] <- "Tide"
anlstormdata[anlstormdata$EVTYPE %in% flood,"EVENTTYPE"] <- "Flood"
anlstormdata[anlstormdata$EVTYPE %in% lightning,"EVENTTYPE"] <- "Lightning"
anlstormdata[anlstormdata$EVTYPE %in% tropicalstorm,"EVENTTYPE"] <- "Tropicalstorm"
anlstormdata[anlstormdata$EVTYPE %in% thunderstorm,"EVENTTYPE"] <- "Thunderstorm"
anlstormdata[anlstormdata$EVTYPE %in% tornado,"EVENTTYPE"] <- "Tornado"
anlstormdata[anlstormdata$EVTYPE %in% typhoon,"EVENTTYPE"] <- "Typhoon"
anlstormdata[anlstormdata$EVTYPE %in% hurricane,"EVENTTYPE"] <- "Hurricane"
anlstormdata[anlstormdata$EVTYPE %in% tsunami,"EVENTTYPE"] <- "Tsunami"
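The effect of the assignment order can be illustrated on a few event names. The sketch below is a simplified stand-in for the steps above, using a hypothetical `classify_events` helper and only four of the patterns; rules are applied from lowest to highest priority, so a later match overwrites an earlier one.

```r
# Apply the rules in order; later (higher-priority) rules overwrite earlier labels
classify_events <- function(evtype, rules) {
  result <- rep(NA_character_, length(evtype))
  for (label in names(rules)) {
    hits <- grepl(rules[[label]], evtype, ignore.case = TRUE)
    result[hits] <- label
  }
  result
}

# A subset of the patterns used above, listed from lowest to highest priority
toy_rules <- c(Wind = "wind|wnd|gust",
               Hail = "hail",
               Thunderstorm = "thunderstorm|tstm",
               Hurricane = "hurricane|floy")

toy_events <- c("HURRICANE OPAL/HIGH WINDS", "TSTM WIND", "HIGH WINDS", "NORTHERN LIGHTS")
toy_labels <- classify_events(toy_events, toy_rules)
print(toy_labels)
```

Names that match none of the patterns stay NA, which is how the unclassified event types are identified afterwards.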

After the reclassification, only a few event types remain unclassified; these form the “other” bucket. Since we do not have enough information about these or about the summary records, we can filter both out.

unique(anlstormdata[is.na(anlstormdata$EVENTTYPE),"EVTYPE"])
##  [1] OTHER               HEAVY MIX           SOUTHEAST          
##  [4] EXCESSIVE           Other               No Severe Weather  
##  [7] NONE                MONTHLY TEMPERATURE RED FLAG CRITERIA  
## [10] NORTHERN LIGHTS    
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Get new fully reclassified data with selected columns to formulate the analytical model

Filter off the unclassified data

rcstdata <- anlstormdata[!is.na(anlstormdata$EVENTTYPE),c("year","EVENTTYPE","FATALITIES","INJURIES","PROPERTYDAMAGE","CROPDAMAGE")]

Filter off the summary data

rcstdata <- subset(rcstdata,EVENTTYPE != "Summary")

Accumulate the People Affected

rcstdata$PeopleAffected <- rcstdata$FATALITIES + rcstdata$INJURIES

Accumulate the Total Damages

rcstdata$TotalDamage <- rcstdata$PROPERTYDAMAGE + rcstdata$CROPDAMAGE
table(rcstdata$EVENTTYPE)
## 
##         Avalanche          Blizzard              Cold           Drought 
##               380              2666               799              2720 
##          Drowning              Dust           Erosion             Flood 
##                 1               150                 6             83051 
##               Fog            Freeze       FunnelCloud              Hail 
##              1805              4048              6429            216523 
##              Heat         Hurricane         LandSlump         Lightning 
##              2708               284                 2             14288 
##            Marine        Microburst          Mudslide              Rain 
##                 3                 6               634             11861 
## RecordTemperature        RipCurrent             Smoke              Snow 
##                75               763                21             24574 
##            Spouts             Storm              Surf      Thunderstorm 
##              3567             11820              1057            234069 
##              Tide           Tornado     Tropicalstorm           Tsunami 
##               645             24373               749                20 
##           Typhoon           Volcano          Wildfire              Wind 
##                11                30              4215             27018

Obtain the listing of events by the number of people affected

sort(tapply(rcstdata$PeopleAffected, rcstdata$EVENTTYPE, sum), decreasing = TRUE)
##           Tornado              Heat             Flood      Thunderstorm 
##             23332             11633             10072              6126 
##         Lightning              Wind             Storm              Snow 
##              5362              2310              1957              1642 
##          Wildfire         Hurricane        RipCurrent               Fog 
##              1545              1462              1088              1065 
##              Hail            Freeze              Tide          Blizzard 
##               936               766               649               456 
##              Surf              Rain     Tropicalstorm         Avalanche 
##               405               399               395               382 
##              Cold           Tsunami          Mudslide              Dust 
##               296               162                99                45 
##           Drought            Spouts            Marine           Typhoon 
##                37                31                15                 5 
##          Drowning       FunnelCloud           Erosion         LandSlump 
##                 1                 1                 0                 0 
##        Microburst RecordTemperature             Smoke           Volcano 
##                 0                 0                 0                 0

Let's save the data in a table format, so we can plot it

people = as.data.frame.table(sort(tapply(rcstdata$PeopleAffected, rcstdata$EVENTTYPE,sum), decreasing = TRUE))
colnames(people) = c("Event", "PeopleAffected")
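The per-event totals can also be produced with aggregate, which returns a data frame directly. This is an alternative sketch on hypothetical toy data, not the method used for the figures in this report; `toy` and `toy_totals` are illustrative names.

```r
# Hypothetical toy records: one row per event occurrence
toy <- data.frame(
  EVENTTYPE = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  PeopleAffected = c(10, 4, 5, 7, 1)
)

# Sum the people affected within each event type
toy_totals <- aggregate(PeopleAffected ~ EVENTTYPE, data = toy, FUN = sum)

# Order the event types from most to least affected
toy_totals <- toy_totals[order(-toy_totals$PeopleAffected), ]
print(toy_totals)
```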

Obtain the listing of events by the total damage

sort(tapply(rcstdata$TotalDamage, rcstdata$EVENTTYPE, sum), decreasing = TRUE)
##             Flood         Hurricane           Tornado              Hail 
##      215088038498       90164972810       25227212402       17922530677 
##           Drought      Thunderstorm     Tropicalstorm          Wildfire 
##       14969925380       11002294042        8353958550        8163274130 
##              Wind            Freeze              Rain             Storm 
##        6293398855        5575689360        4103117240        1624267250 
##              Cold              Heat              Snow         Lightning 
##        1358809400         903474200         875704157         803095052 
##           Typhoon          Blizzard          Mudslide           Tsunami 
##         601055000         533568950         346093100         144082000 
##              Surf              Tide               Fog            Spouts 
##          95924500          67322550          21474500           5739200 
##         Avalanche           Erosion              Dust         LandSlump 
##           3716800            866000            723130            570000 
##           Volcano        RipCurrent       FunnelCloud             Smoke 
##            500000            163000            134100            100000 
##            Marine        Microburst          Drowning RecordTemperature 
##             50000             20000                 0                 0

Saving the data for the plot

property = as.data.frame.table(sort(tapply(rcstdata$TotalDamage, rcstdata$EVENTTYPE,sum), decreasing = TRUE))
colnames(property) = c("Event", "TotalDamage")
# Bar chart of people affected (fatalities plus injuries) per event type
ppl = ggplot(data = people, aes(x = Event, y = PeopleAffected)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + geom_bar(stat = "identity") + labs(x = "Storm Weather Event", y = "# People Affected (Killed + Injured)")
ppl + labs(subtitle = "Dangerous Storm Events")

# Bar chart of total damage (property plus crops) per event type
p2 = ggplot(data = property, aes(x = Event, y = TotalDamage)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + geom_bar(stat = "identity") + labs(x = "Storm Weather Event", y = "Damage to Property and Crops (Dollars)")
p2 + labs(subtitle = "Expensive Storm Events")

Results

It is seen from the plots that the events that historically claim the most lives and injure the most people are tornadoes, heat, floods, thunderstorms and lightning.

From the economic perspective, floods, hurricanes, tornadoes, hail and drought are the leading storm events. These events also make good candidates for insurance, whether for personal protection or for property, if we live in a geographical region that is prone to any of them.

(Note: a few individual events may have skewed the numbers considerably because of the country's lack of preparedness, such as Hurricane Andrew or the flooding caused by Katrina.)