Synopis

This report is published to identify which type of storm events, across the United States, that are (1) most harmful in terms of population health and (2) has greatest economic consequences. Data used is from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, having 902,297 rows and 37 columns of data. While data availability is from 1950 to 2011, only those data having fatalities and injuries, and damages to crops and properties were selected, which results in two datasets of 21,929 records (3 columns) and 245,031 records (5 columns) respectively, for purpose of this report. The selected datasets were further processed to map the events into 48 types as per the National Weather Service Storm Data Documentation. Further data cleaning is done to fatalities, casualties, property damage exponent, and crop damage exponents column of the datasets. Finally, the data is aggregated and sorted accordingly to derive the answer. Based on the analysis, Tornado is most harmful in terms of population health (fatalities and injuries totalling 96,997 people) while Flood caused greatest economic consequences (damages of 161 billion dollars).

Data Processing

Loading and Understanding Data

df <- read.csv("repdata-data-StormData.csv.bz2")

#understand dataset
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
summary(df)
##     STATE__                  BGN_DATE             BGN_TIME     
##  Min.   : 1.0   5/25/2011 0:00:00:  1202   12:00:00 AM: 10163  
##  1st Qu.:19.0   4/27/2011 0:00:00:  1193   06:00:00 PM:  7350  
##  Median :30.0   6/9/2011 0:00:00 :  1030   04:00:00 PM:  7261  
##  Mean   :31.2   5/30/2004 0:00:00:  1016   05:00:00 PM:  6891  
##  3rd Qu.:45.0   4/4/2011 0:00:00 :  1009   12:00:00 PM:  6703  
##  Max.   :95.0   4/2/2006 0:00:00 :   981   03:00:00 PM:  6700  
##                 (Other)          :895866   (Other)    :857229  
##    TIME_ZONE          COUNTY           COUNTYNAME         STATE       
##  CST    :547493   Min.   :  0.0   JEFFERSON :  7840   TX     : 83728  
##  EST    :245558   1st Qu.: 31.0   WASHINGTON:  7603   KS     : 53440  
##  MST    : 68390   Median : 75.0   JACKSON   :  6660   OK     : 46802  
##  PST    : 28302   Mean   :100.6   FRANKLIN  :  6256   MO     : 35648  
##  AST    :  6360   3rd Qu.:131.0   LINCOLN   :  5937   IA     : 31069  
##  HST    :  2563   Max.   :873.0   MADISON   :  5632   NE     : 30271  
##  (Other):  3631                   (Other)   :862369   (Other):621339  
##                EVTYPE         BGN_RANGE           BGN_AZI      
##  HAIL             :288661   Min.   :   0.000          :547332  
##  TSTM WIND        :219940   1st Qu.:   0.000   N      : 86752  
##  THUNDERSTORM WIND: 82563   Median :   0.000   W      : 38446  
##  TORNADO          : 60652   Mean   :   1.484   S      : 37558  
##  FLASH FLOOD      : 54277   3rd Qu.:   1.000   E      : 33178  
##  FLOOD            : 25326   Max.   :3749.000   NW     : 24041  
##  (Other)          :170878                      (Other):134990  
##          BGN_LOCATI                  END_DATE             END_TIME     
##               :287743                    :243411              :238978  
##  COUNTYWIDE   : 19680   4/27/2011 0:00:00:  1214   06:00:00 PM:  9802  
##  Countywide   :   993   5/25/2011 0:00:00:  1196   05:00:00 PM:  8314  
##  SPRINGFIELD  :   843   6/9/2011 0:00:00 :  1021   04:00:00 PM:  8104  
##  SOUTH PORTION:   810   4/4/2011 0:00:00 :  1007   12:00:00 PM:  7483  
##  NORTH PORTION:   784   5/30/2004 0:00:00:   998   11:59:00 PM:  7184  
##  (Other)      :591444   (Other)          :653450   (Other)    :622432  
##    COUNTY_END COUNTYENDN       END_RANGE           END_AZI      
##  Min.   :0    Mode:logical   Min.   :  0.0000          :724837  
##  1st Qu.:0    NA's:902297    1st Qu.:  0.0000   N      : 28082  
##  Median :0                   Median :  0.0000   S      : 22510  
##  Mean   :0                   Mean   :  0.9862   W      : 20119  
##  3rd Qu.:0                   3rd Qu.:  0.0000   E      : 20047  
##  Max.   :0                   Max.   :925.0000   NE     : 14606  
##                                                 (Other): 72096  
##            END_LOCATI         LENGTH              WIDTH         
##                 :499225   Min.   :   0.0000   Min.   :   0.000  
##  COUNTYWIDE     : 19731   1st Qu.:   0.0000   1st Qu.:   0.000  
##  SOUTH PORTION  :   833   Median :   0.0000   Median :   0.000  
##  NORTH PORTION  :   780   Mean   :   0.2301   Mean   :   7.503  
##  CENTRAL PORTION:   617   3rd Qu.:   0.0000   3rd Qu.:   0.000  
##  SPRINGFIELD    :   575   Max.   :2315.0000   Max.   :4400.000  
##  (Other)        :380536                                         
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9  
##       WFO                                       STATEOFFIC    
##         :142069                                      :248769  
##  OUN    : 17393   TEXAS, North                       : 12193  
##  JAN    : 13889   ARKANSAS, Central and North Central: 11738  
##  LWX    : 13174   IOWA, Central                      : 11345  
##  PHI    : 12551   KANSAS, Southwest                  : 11212  
##  TSA    : 12483   GEORGIA, North and Central         : 11120  
##  (Other):690738   (Other)                            :595920  
##                                                                                                                                                                                                     ZONENAMES     
##                                                                                                                                                                                                          :594029  
##                                                                                                                                                                                                          :205988  
##  GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M                                                                                                                                         :   639  
##  GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA                                                                                                                                                       :   592  
##  JEFFERSON - JEFFERSON                                                                                                                                                                                   :   303  
##  MADISON - MADISON                                                                                                                                                                                       :   302  
##  (Other)                                                                                                                                                                                                 :100444  
##     LATITUDE      LONGITUDE        LATITUDE_E     LONGITUDE_    
##  Min.   :   0   Min.   :-14451   Min.   :   0   Min.   :-14455  
##  1st Qu.:2802   1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0  
##  Median :3540   Median :  8707   Median :   0   Median :     0  
##  Mean   :2875   Mean   :  6940   Mean   :1452   Mean   :  3509  
##  3rd Qu.:4019   3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735  
##  Max.   :9706   Max.   : 17124   Max.   :9706   Max.   :106220  
##  NA's   :47                      NA's   :40                     
##                                            REMARKS           REFNUM      
##                                                :287433   Min.   :     1  
##                                                : 24013   1st Qu.:225575  
##  Trees down.\n                                 :  1110   Median :451149  
##  Several trees were blown down.\n              :   568   Mean   :451149  
##  Trees were downed.\n                          :   446   3rd Qu.:676723  
##  Large trees and power lines were blown down.\n:   432   Max.   :902297  
##  (Other)                                       :588295
str(df)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

From the summary, the relevant columns (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) does not contain NAs.

Create a list of the 48 events for crosschecking after mapping the event types.

Events <- as.factor(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))

Load relevant libraries

library(stringr)
library(ggplot2)
library(scales)

Processing and Analysing for Most Harmful Storm Event

The following steps were done to identify the most harmful storm event.

Select Relevant Rows and Columns

Select columns EVTYPE, FATALITIES and INJURIES. Select rows where FATALITIES OR INJURIES are more than zero.

df2 <- df[,c(8,23,24)]
df2.health <-df2[df2$FATALITIES>0|df2$INJURIES>0,]
nrow(df2.health)
## [1] 21929
sum(complete.cases(df2.health))
## [1] 21929

This results in 21929 rows of data. All columns does not contain NAs as indicated by sum(complete.cases) which returns same value as nrow().

Basic Cleaning

Convert values to upper case, remove leading/trailing whitespaces,remove whitespaces in between.

#upper case 
df2.health$EVTYPE <- toupper(df2.health$EVTYPE)

#remove leading or trailing white spaces
df2.health$EVTYPE <- str_trim(df2.health$EVTYPE)

#remove double spaces
df2.health$EVTYPE <- gsub("  "," ",df2.health$EVTYPE)

Map Event Types

Map the values in EVTYPE column to 48 types. Most mapping were done based on pattern matching. Some required further reading, by referring to the REMARKS column of the raw dataset, the storm data documentation or via Internet searches.

#correct Avalanche
df2.health$EVTYPE <- gsub("AVALANCE","AVALANCHE",df2.health$EVTYPE)

#correct Blizzard
df2.health$EVTYPE <- gsub("BLOWING SNOW|BLIZZARD SUMMARY","BLIZZARD",df2.health$EVTYPE)

#correct Coastal Flood
df2.health$EVTYPE <- gsub("^COASTAL FLOODING.*|^TIDAL FLOOD$|^TIDAL FLOODING$","COASTAL FLOOD",df2.health$EVTYPE)

#correct Cold/Wind Chill
df2.health$EVTYPE <- gsub("^LOW TEMPERATURE$|^COLD$|^COLD/WINDS$|COLD WEATHER|^COLD TEMPERATURE$|^COLD WAVE$|UNSEASONABLY COLD|HYPOTHERMIA.*|^COLD AND SNOW$|^FOG AND COLD/WIND CHILLS$","COLD/WIND CHILL",df2.health$EVTYPE)

#correct Debris Flow
df2.health$EVTYPE <- gsub("^MUDSLIDE.*|^LANDSLIDE.*","DEBRIS FLOW",df2.health$EVTYPE)

#correct Dense Fog
df2.health$EVTYPE <- gsub("^FOG$|^FOG AND COLD TEMPERATURES$","DENSE FOG",df2.health$EVTYPE)

#correct Dense Smoke - not found in dataset

#correct Drought - not found in dataset

#correct Dust Devil
df2.health$EVTYPE <- gsub("^OTHER$","DUST DEVIL",df2.health$EVTYPE)

#correct Dust Storm - not found in dataset

#correct Excessive Heat
df2.health$EVTYPE <- gsub("^EXTREME HEAT$|DROUGHT/EXCESSIVE HEAT|RECORD HEAT|RECORD/EXCESSIVE HEAT|^HEAT WAVE.*|HYPERTHERMIA.*","EXCESSIVE HEAT",df2.health$EVTYPE)

#correct Extreme Cold/Wind Chill
df2.health$EVTYPE <- gsub("^EXTREME WINDCHILL$|^EXTENDED COLD$|^EXTREME COLD$|RECORD COLD","EXTREME COLD/WIND CHILL",df2.health$EVTYPE)

#correct Flash Flood
df2.health$EVTYPE <- gsub("^FLASH FLOOD/FLOOD$|LOCAL FLASH FLOOD|^FLASH FLOODING$|^FLASH FLOODING/FLOOD$|^FLASH FLOODS$","FLASH FLOOD",df2.health$EVTYPE)

#correct Flood
df2.health$EVTYPE <- gsub("^FLOOD/FLASH FLOOD$|^FLOOD/FLOOD$|FLOOD & HEAVY RAIN|^FLOODING$|^LOCAL FLOOD|^MAJOR FLOOD$|^MINOR FLOOD$|^MINOR FLOODING$|^URBAN FLOOD$|^URBAN FLOODING$|^URBAN FLOODS$|URBAN/STREET FLOODING|URBAN/SML STREAM FLDG|URBAN/SML STREAM FLD|URBAN/SMALL STRM FLDG|URBAN/SMALL STREAM FLOODING|URBAN/SMALL STREAM FLOOD|URBAN/SMALL FLOODING|URBAN SMALL STREAM FLOOD|URBAN AND SMALL STREAM FLOODIN|URBAN AND SMALL STREAM FLOOD|SMALL STREAM AND URBAN FLOOD|SMALL STREAM AND URBAN FLOODIN|SMALL STREAM FLOOD|SMALL STREAM FLOODING|SMALL STREAM URBAN FLOOD|SMALL STREAM/URBAN FLOOD|SML STREAM FLD|^RURAL FLOOD$|^RIVER FLOOD$|^RIVER FLOODING$|RIVER AND STREAM FLOOD|^FLOOD/RIVER FLOOD$","FLOOD",df2.health$EVTYPE)

#correct Freezing Fog
df2.health$EVTYPE <- gsub("^GLAZE$|GLAZE/ICE STORM","FREEZING FOG",df2.health$EVTYPE)

#correct Frost/Freeze
df2.health$EVTYPE <- gsub("^ICE$|^ICE ROADS$|^ICY ROADS$|ICE ON ROAD|BLACK ICE|LIGHT SNOW|FROST\\FREEZE|HARD FREEZE|LATE FREEZE|DAMAGING FREEZE|AGRICULTURAL FREEZE|^FREEZING.*|^FROST$|^FREEZE$","FROST/FREEZE",df2.health$EVTYPE)

#correct Funnel Cloud - not found in dataset

#correct Hail
df2.health$EVTYPE <- gsub("^GUSTY WIND/HAIL$|DEEP HAIL|^WIND/HAIL$|HAIL ALOFT|HAIL FLOODING|HAIL DAMAGE|HAIL/ICY ROADS|LATE SEASON HAIL|^HAILSTORM$|^HAILSTORMS$|NON SEVER HAIL|SMALL HAIL","HAIL",df2.health$EVTYPE)

#correct Heat
df2.health$EVTYPE <- gsub("^WARM WEATHER$|^UNSEASONABLY WARM.*","HEAT",df2.health$EVTYPE)

#correct Heavy Rain
df2.health$EVTYPE <- gsub("^RAIN/WIND$|^HIGH WATER$|^HEAVY RAINFALL$|EXCESSIVE RAINFALL|^DROWNING$|RAPIDLY RISING WATER|TORRENTIAL RAIN|^HEAVY RAINS$","HEAVY RAIN",df2.health$EVTYPE)

#correct Heavy Snow
df2.health$EVTYPE <- gsub("^SNOW$|^SNOW AND ICE$|^SNOW/ BITTER COLD$|^SNOW/HIGH WINDS$|FALLING SNOW/ICE|^EXCESSIVE SNOW$|^SNOW SQUALL.*|^HEAVY SNOW.*","HEAVY SNOW",df2.health$EVTYPE)

#correct High Surf
df2.health$EVTYPE <- gsub("^ROGUE WAVE$|^ROUGH SEAS$|^ROUGH SURF$|^MARINE ACCIDENT$|^HEAVY SEAS$|^HIGH SEAS$|^HIGH SWELLS$|^HIGH WAVES$|HAZARDOUS SURF|HEAVY SURF.*","HIGH SURF",df2.health$EVTYPE)

#correct High Wind
df2.health$EVTYPE <- gsub("^NON THUNDERSTORM WIND$|^NON-SEVERE WIND DAMAGE$|^HIGH WIND.*|^HIGH$","HIGH WIND",df2.health$EVTYPE)

#correct Hurricane/Typhoon
df2.health$EVTYPE <- gsub("^HURRICANE.*|^TYPHOON$","HURRICANE/TYPHOON",df2.health$EVTYPE)

#correct Ice Storm
df2.health$EVTYPE <- gsub("^ICE STORM/FLASH FLOOD$","ICE STORM",df2.health$EVTYPE)

#correct Lakeshore Flood - not found in dataset

#correct "Lake-Effect Snow - not found in dataset

#correct Lightning
df2.health$EVTYPE <- gsub("LIGHTING|LIGHTNING .*|LIGHTNING.|LIGNTNING","LIGHTNING",df2.health$EVTYPE)

#correct Marine Hail - not found in dataset

#correct Marine High Wind - not found in dataset

#correct Marine Strong Wind
df2.health$EVTYPE <- gsub("MARINE MISHAP","MARINE STRONG WIND",df2.health$EVTYPE)

#correct Marine Thunderstorm Wind
df2.health$EVTYPE <- gsub("^MARINE TSTM WIND$","MARINE THUNDERSTORM WIND",df2.health$EVTYPE)

#correct Rip Current
df2.health$EVTYPE <- gsub("^RIP CURRENT.*","RIP CURRENT",df2.health$EVTYPE)

#correct Seiche - not found in dataset

#correct Sleet - not found in dataset

#correct Storm Surge/Tide
df2.health$EVTYPE <- gsub("^STORM SURGE$","STORM SURGE/TIDE",df2.health$EVTYPE)

#correct Strong Wind
df2.health$EVTYPE <- gsub("^NON TSTM WIND$|^WIND$|^WINDS$|^WIND STORM$|^STRONG WINDS$|^GUSTY WIND.*","STRONG WIND",df2.health$EVTYPE)

#correct Thunderstorm Wind
df2.health$EVTYPE <- gsub("^THUNDERSTORM$|^THUNDERSTORM WIND (G40)$|^THUNDERSTORM WIND G52$|^THUNDERSTORM WINDS$|^THUNDERSTORM WINDS 13$|^THUNDERSTORM WINDS/HAIL$|^THUNDERSTORM WINDSS$|^THUNDERSTORMS WIND$|^THUNDERSTORMS WINDS$|^THUNDERTORM WINDS$|^TORNADOES, TSTM WIND, HAIL$|^THUNDERSTORM$|^TSTM WIND$|^TSTM WIND \\(G.*|THUNDERSTORM WINS|DRY MICROBURST|DRY MIRCOBURST WIND|THUNDERESTORM WIND|^THUNDERSTORM WIND 13|THUNDERSTORM WIND.|^THUNDERSTORMW.*|THUNDERSTROM WIND|^THUNDERTORM WIND$|^THUNDERSTORMS$|^THUNDERSTORM WIND .*|WHIRLWIND|^TSTM WIND/HAIL$","THUNDERSTORM WIND",df2.health$EVTYPE)

#correct Tornado
df2.health$EVTYPE <- gsub("TORNADO DEBRIS|TORNDAO|^TORNADOES$|TORNADO F0|TORNADO F1|TORNADO F2|TORNADO F3|COLD AIR TORNADO|^TORNADOS$","TORNADO",df2.health$EVTYPE)

#correct Tropical Depression - not found in dataset

#correct Tropical Storm
df2.health$EVTYPE <- gsub("COASTAL STORM|COASTALSTORM|TROPICAL STORM.*","TROPICAL STORM",df2.health$EVTYPE)

#correct Waterspout
df2.health$EVTYPE <- gsub("WATERSPOUT TORNADO|WATERSPOUT/TORNADO","WATERSPOUT",df2.health$EVTYPE)

#correct Wildfire
df2.health$EVTYPE <- gsub("WILD FIRES|WILD/FOREST FIRE|WILD/FOREST FIRES|WILDFIRES|BRUSH FIRE|BRUSH FIRES|FOREST FIRES|GRASS FIRES","WILDFIRE",df2.health$EVTYPE)

#correct Winter Storm
df2.health$EVTYPE <- gsub("^RAIN/SNOW$|THUNDERSNOW|^MIXED PRECIP$|WINTER STORM HIGH WINDS|WINTER STORMS","WINTER STORM",df2.health$EVTYPE)

#correct Winter Weather
df2.health$EVTYPE <- gsub("WINTER WEATHER MIX|WINTER WEATHER/MIX|WINTRY MIX","WINTER WEATHER",df2.health$EVTYPE)

#final cleanup
df2.health$EVTYPE <- gsub("^THUNDERSTORM WINDS$","THUNDERSTORM WIND",df2.health$EVTYPE)
df2.health$EVTYPE <- gsub("^HEAVY RAINFALL$","HEAVY RAIN",df2.health$EVTYPE)

After the above mapping, all 21,929 values in EVTYPE has been mapped to the 48 Event Types. This can be verified by below code:

df2.health[!df2.health$EVTYPE %in% toupper(Events),]
## [1] EVTYPE     FATALITIES INJURIES  
## <0 rows> (or 0-length row.names)

There are no values that doesn’t match any event in the Events list.

Aggregate the Data

Next step is to aggregate the FATALITIES and INJURIES values by EVTYPE. Both values are merged into a table df2.sum.

df2.health.Fsum <- aggregate(FATALITIES ~ EVTYPE, data = df2.health,sum)
df2.health.Isum <- aggregate(INJURIES ~ EVTYPE, data = df2.health,sum)
df2.sum <- cbind(df2.health.Fsum,df2.health.Isum$INJURIES)

Calculate Total Casualties

For purpose of this analysis, criteria for establishing harm to public health is to add up number of fatalities and number of injuries as casualties.

df2.sum$CASUALTIES <- df2.sum$FATALITIES+df2.sum$`df2.health.Isum$INJURIES`

Rename columns appropriately.

colnames(df2.sum) <- c("EventType","Fatalities","Injuries","Total")

Ordering Data and Identifying Top Value

To identify the top event, the dataset is ordered in descending order by TOTAL column. Then the top 5 events are selected for comparison.

df2.ordered <- df2.sum[order(df2.sum$Total,decreasing = TRUE),]
df2.top <- df2.ordered[1:5,]

knitr::kable(df2.top)
EventType Fatalities Injuries Total
33 TORNADO 5633 91364 96997
32 THUNDERSTORM WIND 740 9537 10277
10 EXCESSIVE HEAT 2202 7124 9326
13 FLOOD 529 6888 7417
24 LIGHTNING 817 5232 6049
ggplot(df2.top,aes(EventType,Total)) + geom_bar(stat="identity",fill=rainbow(5)) + labs(title="Top 5 Most Harmful Events", x="Event Type", y="Total Casualties")

Top Five Harmful Storm Events

Based on the table and plot above, the most harmful storm type is TORNADO which caused casualties (fatalities + injuries) of 96,997. It is far ahead in terms of casualties if compared to the second ranked event.

Processing and Analysing for Most Economic Consequences Storm Event

The following steps were done to identify the storm event with most economic consequences.

Select Relevant Rows and Columns

Select columns EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. Select rows where PROPDMG OR CROPDMG values are more than zero.

df3 <- df[,c(8,25,26,27,28)]
df3.economy <-df3[df3$PROPDMG>0|df3$CROPDMG>0,]
nrow(df3.economy)
## [1] 245031
sum(complete.cases(df3.economy))
## [1] 245031

This results in 245031 rows of data. Again, there is no NAs as the sum(complete.cases()) returns same value as nrow().

Basic Cleaning

Convert values to upper case, remove leading/trailing whitespaces, remove whitespaces in between.

#upper case 
df3.economy$EVTYPE <- toupper(df3.economy$EVTYPE)

#remove leading or trailing white spaces (reduces 8 duplicates)
df3.economy$EVTYPE <- str_trim(df3.economy$EVTYPE)

#remove double spaces
df3.economy$EVTYPE <- gsub("  "," ",df3.economy$EVTYPE)

There were 2 rows with invalid data for EVTYPE (values ? and “APACHE”). The strategy used is to remove the rows.

#remove rows with invalid value
df3.economy <- df3.economy[-grep("\\?|APACHE",df3.economy$EVTYPE),]

Map Event Types

Map the values in EVTYPE column to 48 types. Most mapping were done based on pattern matching. Some required further reading, by referring to the REMARKS column of the raw dataset, the storm data documentation or via Internet searches. For values stated as OTHER, the mapping was done based on their respective REMARKS columns.

#correct OTHER
df3.economy["399112",1] <- "DUST DEVIL"
df3.economy["296122",1] <- "DUST DEVIL"
df3.economy["249470",1] <- "FLOOD"
df3.economy$EVTYPE <- gsub("^OTHER$","HEAVY RAIN",df3.economy$EVTYPE)

#correct Avalanche
df3.economy$EVTYPE <- gsub("AVALANCE","AVALANCHE",df3.economy$EVTYPE)

#correct Blizzard
df3.economy$EVTYPE <- gsub("^GROUND BLIZZARD$|^BLIZZARD/WINTER STORM$|BLOWING SNOW|BLIZZARD SUMMARY","BLIZZARD",df3.economy$EVTYPE)

#correct Coastal Flood
df3.economy$EVTYPE <- gsub("^EROSION/CSTL FLOOD$|^COASTAL SURGE$|^COASTAL FLOODING.*|^TIDAL FLOOD$|^TIDAL FLOODING$","COASTAL FLOOD",df3.economy$EVTYPE)

#correct Cold/Wind Chill
df3.economy$EVTYPE <- gsub("^UNSEASONABLE COLD$|^COOL AND WET$|^LOW TEMPERATURE$|^COLD$|^COLD/WINDS$|COLD WEATHER|^COLD TEMPERATURE$|^COLD WAVE$|^UNSEASONABLY COLD$|^HYPOTHERMIA.*|^COLD AND SNOW$|^FOG AND COLD/WIND CHILLS$","COLD/WIND CHILL",df3.economy$EVTYPE)

#correct Debris Flow
df3.economy$EVTYPE <- gsub("^ROCK SLIDE$|^LANDSLUMP$|^ICE FLOES$|^MUDSLIDE.*|^MUD SLIDE.*|^LANDSLIDE.*","DEBRIS FLOW",df3.economy$EVTYPE)

#correct Dense Fog
df3.economy$EVTYPE <- gsub("^FOG$|^FOG AND COLD TEMPERATURES$","DENSE FOG",df3.economy$EVTYPE)

#correct Dense Smoke - not found in dataset

#correct Drought - not found in dataset

#correct Dust Devil
df3.economy$EVTYPE <- gsub("^LANDSPOUT$|^DUST DEVIL WATERSPOUT$|^BLOWING DUST$","DUST DEVIL",df3.economy$EVTYPE)

#correct Dust Storm
df3.economy$EVTYPE <- gsub("^DUST STORM/HIGH WINDS$","DUST STORM",df3.economy$EVTYPE)

#correct Excessive Heat
df3.economy$EVTYPE <- gsub("^EXTREME HEAT$|DROUGHT/EXCESSIVE HEAT|^RECORD HEAT$|RECORD/EXCESSIVE HEAT|^HEAT WAVE.*|^HYPERTHERMIA.*","EXCESSIVE HEAT",df3.economy$EVTYPE)

#correct Extreme Cold/Wind Chill
df3.economy$EVTYPE <- gsub("^EXTREME WIND CHILL$|^EXTREME WINDCHILL$|^EXTENDED COLD$|^EXTREME COLD$|^RECORD COLD$","EXTREME COLD/WIND CHILL",df3.economy$EVTYPE)

#correct Flash Flood
df3.economy$EVTYPE <- gsub("^FLASH FLOODING/THUNDERSTORM WI$|^FLASH FLOOD WINDS$|^FLOODING/HEAVY RAIN$|^FLOOD/FLASH$|^FLOOD FLASH$|^FLASH FLOOD/LANDSLIDE$|^FLASH FLOOD/ STREET$|^FLASH FLOOD LANDSLIDES$|^FLASH FLOOD/$|^FLASH FLOOD FROM ICE JAMS$|^FLASH FLOOD - HEAVY RAIN$|^FLASH FLOOD/FLOOD$|LOCAL FLASH FLOOD|^FLASH FLOODING$|^FLASH FLOODING/FLOOD$|^FLASH FLOODS$","FLASH FLOOD",df3.economy$EVTYPE)

#correct Flood
df3.economy$EVTYPE <- gsub("^SNOWMELT FLOODING$|^FLOODS$|^FLOOD/RAIN/WINDS$|^FLOOD/FLASH/FLOOD$|^FLOOD/FLASHFLOOD$|^BREAKUP FLOODING$|^FLOOD/FLASH FLOOD$|^FLOOD/FLOOD$|FLOOD & HEAVY RAIN|^FLOODING$|^LOCAL FLOOD|^MAJOR FLOOD$|^MINOR FLOOD$|^MINOR FLOODING$|^URBAN FLOOD$|^URBAN FLOODING$|^URBAN FLOODS$|URBAN/STREET FLOODING|URBAN/SML STREAM FLDG|URBAN/SML STREAM FLD|URBAN/SMALL STRM FLDG|URBAN/SMALL STREAM FLOODING|URBAN/SMALL STREAM FLOOD|URBAN/SMALL FLOODING|URBAN SMALL STREAM FLOOD|URBAN AND SMALL STREAM FLOODIN|URBAN AND SMALL STREAM FLOOD|SMALL STREAM AND URBAN FLOOD|SMALL STREAM AND URBAN FLOODIN|SMALL STREAM FLOOD|SMALL STREAM FLOODING|SMALL STREAM URBAN FLOOD|SMALL STREAM/URBAN FLOOD|SML STREAM FLD|^RURAL FLOOD$|^RIVER FLOOD$|^RIVER FLOODING$|RIVER AND STREAM FLOOD|^FLOOD/RIVER FLOOD$|^ICE JAM$|^ICE JAM FLOOD \\(MINOR$|^ICE JAM FLOODING$|^URBAN/SMALL STREAM$|^URBAN SMALL$|^URBAN AND SMALL$","FLOOD",df3.economy$EVTYPE)

#correct Freezing Fog
df3.economy$EVTYPE <- gsub("^GLAZE ICE$|^GLAZE$|GLAZE/ICE STORM","FREEZING FOG",df3.economy$EVTYPE)

#correct Frost/Freeze
df3.economy$EVTYPE <- gsub("^FROST\\\\FREEZE$|^FROST/FREEZEFALL|^EARLY FROST$|^ICE$|^ICE ROADS$|^ICY ROADS$|ICE ON ROAD|BLACK ICE|LIGHT SNOW|HARD FREEZE|LATE FREEZE|DAMAGING FREEZE|AGRICULTURAL FREEZE|^FREEZING.*|^FROST$|^FREEZE$","FROST/FREEZE",df3.economy$EVTYPE)

#correct Funnel Cloud - not found in dataset

#correct Hail
df3.economy$EVTYPE <- gsub("^GUSTY WIND/HAIL$|^DEEP HAIL$|^WIND/HAIL$|HAIL ALOFT|^HAIL FLOODING$|HAIL DAMAGE|HAIL/ICY ROADS|LATE SEASON HAIL|^HAILSTORM$|^HAILSTORMS$|NON SEVER HAIL|SMALL HAIL|HAIL 0.75|HAIL 075|HAIL 100|HAIL 125|HAIL 150|HAIL 175|HAIL 200|HAIL 275|HAIL 450|HAIL 75|^HAIL/WIND$|^HAIL/WINDS$","HAIL",df3.economy$EVTYPE)

#correct Heat
df3.economy$EVTYPE <- gsub("^WARM WEATHER$|^UNSEASONABLY WARM.*","HEAT",df3.economy$EVTYPE)

#correct Heavy Rain
df3.economy$EVTYPE <- gsub("^RAINSTORM$|^RECORD RAINFALL$|^EXCESSIVE WETNESS$|^DAM BREAK$|^COLD AND WET CONDITIONS$|^COLD AND WET$|^RAIN/WIND$|^HIGH WATER$|^HEAVY RAINFALL$|EXCESSIVE RAINFALL|^DROWNING$|RAPIDLY RISING WATER|TORRENTIAL RAIN|^HEAVY RAINS$|^HEAVY RAIN AND FLOOD$|^HEAVY RAIN/HIGH SURF$|^HEAVY RAIN/LIGHTNING$|^HEAVY RAIN/SEVERE WEATHER$|^HEAVY RAIN/SMALL STREAM URBAN$|^HEAVY RAIN/SNOW$|^HEAVY RAINS/FLOODING$|^HEAVY SHOWER$|^HVY RAIN$|^RAIN$|^UNSEASONAL RAIN$","HEAVY RAIN",df3.economy$EVTYPE)

#correct Heavy Snow
df3.economy$EVTYPE <- gsub("^SNOW/BLIZZARD$|^SNOW/ ICE$|^SNOW/COLD$|^SNOW/HEAVY SNOW$|^SNOW/ICE$|^SNOW/ICE STORM$|^SNOW ACCUMULATION$|^SNOW AND HEAVY SNOW$|^LATE SEASON SNOW$|^SNOW$|^SNOW AND ICE$|^SNOW/ BITTER COLD$|^SNOW/HIGH WINDS$|FALLING SNOW/ICE|^EXCESSIVE SNOW$|^SNOW SQUALL.*|^HEAVY SNOW.*|^RECORD SNOW.*","HEAVY SNOW",df3.economy$EVTYPE)

#correct High Surf
df3.economy$EVTYPE <- gsub("^HIGH SURF ADVISORY$|^HIGH TIDES$|^HEAVY SWELLS$|^COASTAL EROSION$|^BEACH EROSION$|^ASTRONOMICAL HIGH TIDE$|^ROGUE WAVE$|^ROUGH SEAS$|^ROUGH SURF$|^MARINE ACCIDENT$|^HEAVY SEAS$|^HIGH SEAS$|^HIGH SWELLS$|^HIGH WAVES$|HAZARDOUS SURF|HEAVY SURF.*","HIGH SURF",df3.economy$EVTYPE)

#correct High Wind
df3.economy$EVTYPE <- gsub("^GRADIENT WIND$|^NON THUNDERSTORM WIND$|^NON-SEVERE WIND DAMAGE$|^HIGH WIND.*|^HIGH$","HIGH WIND",df3.economy$EVTYPE)

#correct Hurricane/Typhoon
df3.economy$EVTYPE <- gsub("^HURRICANE.*|^TYPHOON$","HURRICANE/TYPHOON",df3.economy$EVTYPE)

#correct Ice Storm
df3.economy$EVTYPE <- gsub("^LIGHT FREEZING RAIN$|^ICE STORM/FLASH FLOOD$","ICE STORM",df3.economy$EVTYPE)

#correct Lakeshore Flood
df3.economy$EVTYPE <- gsub("^LAKE FLOOD$","LAKESHORE FLOOD",df3.economy$EVTYPE)

#correct "Lake-Effect Snow
df3.economy$EVTYPE <- gsub("^LAKE EFFECT SNOW$|^HEAVY LAKE SNOW$","LAKE-EFFECT SNOW",df3.economy$EVTYPE)

#correct Lightning
df3.economy$EVTYPE <- gsub("^LIGHTNING AND HEAVY RAIN$|^LIGHTNING FIRE$|^LIGHTNING THUNDERSTORM WINDS$|^LIGHTNING WAUSEON$|^LIGHTNING/HEAVY RAIN$|LIGHTING|LIGHTNING .*|LIGHTNING.|LIGNTNING","LIGHTNING",df3.economy$EVTYPE)

#correct Marine Hail - not found in dataset

#correct Marine High Wind - not found in dataset

#correct Marine Strong Wind
df3.economy$EVTYPE <- gsub("^WIND AND WAVE$|MARINE MISHAP","MARINE STRONG WIND",df3.economy$EVTYPE)

#correct Marine Thunderstorm Wind
df3.economy$EVTYPE <- gsub("^MARINE TSTM WIND$","MARINE THUNDERSTORM WIND",df3.economy$EVTYPE)

#correct Rip Current
df3.economy$EVTYPE <- gsub("^RIP CURRENT.*","RIP CURRENT",df3.economy$EVTYPE)

#correct Seiche - not found in dataset

#correct Sleet
df3.economy$EVTYPE <- gsub("^SLEET/ICE STORM$|^ICE AND SNOW$","SLEET",df3.economy$EVTYPE)

#correct Storm Surge/Tide
df3.economy$EVTYPE <- gsub("^STORM SURGE$","STORM SURGE/TIDE",df3.economy$EVTYPE)

#correct Strong Wind
df3.economy$EVTYPE <- gsub("^STORM FORCE WINDS$|^ICE/STRONG WINDS$|^SEVERE TURBULENCE$|^NON-TSTM WIND$|^NON TSTM WIND$|^WIND$|^WINDS$|^WIND STORM$|^STRONG WINDS$|^GUSTY WIND.*","STRONG WIND",df3.economy$EVTYPE)

#correct Thunderstorm Wind
df3.economy$EVTYPE <- gsub("^WIND DAMAGE$|^THUDERSTORM WINDS$|^SEVERE THUNDERSTORM$|^SEVERE THUNDERSTORMS$|^SEVERE THUNDERSTORM WINDS$|^GUSTNADO$|^THUNDERSTORM$|^THUNDERSTORM WIND (G40)$|^THUNDERSTORM WIND G52$|^THUNDERSTORM WINDS$|^THUNDERSTORM WINDS 13$|^THUNDERSTORM WINDS/HAIL$|^THUNDERSTORM WINDSS$|^THUNDERSTORMS WIND$|^THUNDERSTORMS WINDS$|^THUNDERTORM WINDS$|^TORNADOES, TSTM WIND, HAIL$","THUNDERSTORM WIND",df3.economy$EVTYPE)

df3.economy$EVTYPE <- gsub("^THUNDERSTORM$|^TSTM WIND$|^TSTM WIND \\(G.*|THUNDERSTORM WINS|^DOWNBURST$|DRY MICROBURST|DRY MIRCOBURST WIND|^WET MICROBURST$|^MICROBURST$|^MICROBURST WINDS$|THUNDERESTORM WIND|^THUNDERSTORM WIND 13|^THUNDERSTORM WIND.$|^THUNDERSTORMW.*|THUNDERSTROM WIND|^THUNDERTORM WIND$|^THUNDERSTORMS$|^THUNDERSTORM WIND .*|WHIRLWIND|^TSTM WIND/HAIL$|^THUNDEERSTORM WINDS$","THUNDERSTORM WIND",df3.economy$EVTYPE)

df3.economy$EVTYPE <- gsub("^THUNDERSTORM DAMAGE TO$|^THUNDERSTORM HAIL$|^THUNDERSTORM WIND FLOOD$|^THUNDERSTORM WIND3$|^THUNDERSTORM WINDAIL$|^THUNDERSTORM WINDFLOODING$|^THUNDERSTORM WINDFUNNEL CLOU$|^THUNDERSTORM WINDIGHTNING$|^THUNDERSTORM WINDWNING$|^THUNERSTORM WINDS$|^TSTM WIND.*|^TSTMW$|^TUNDERSTORM WIND$|^THUNDERSTORM WINDHTNING$|^THUNDERSTORM WINDING$|^THUNDERSTORM WINDL$|^THUNDERSTORM WINDNNEL CLOU$|^THUNDERSTORM WINDOODING$","THUNDERSTORM WIND",df3.economy$EVTYPE)

#correct Tornado
df3.economy$EVTYPE <- gsub("TORNADO DEBRIS|TORNDAO|^TORNADOES$|TORNADO F0|TORNADO F1|TORNADO F2|TORNADO F3|COLD AIR TORNADO|^TORNADOS$","TORNADO",df3.economy$EVTYPE)

#correct Tropical Depression - not found in dataset

#correct Tropical Storm
df3.economy$EVTYPE <- gsub("COASTAL STORM|COASTALSTORM|TROPICAL STORM.*","TROPICAL STORM",df3.economy$EVTYPE)

#correct Waterspout
df3.economy$EVTYPE <- gsub("^WATERSPOUT-$|^WATERSPOUT-TORNADO$|WATERSPOUT TORNADO|^WATERSPOUT/ TORNADO$|WATERSPOUT/TORNADO","WATERSPOUT",df3.economy$EVTYPE)

#correct Wildfire
df3.economy$EVTYPE <- gsub("^WILD/FOREST FIRE$|^GRASS FIRES$|^FOREST FIRES$|WILD FIRES|^WILD/FOREST FIRES$|^WILDFIRES$|^BRUSH FIRE$|^BRUSH FIRES$|GRASS FIRES","WILDFIRE",df3.economy$EVTYPE)

#correct Winter Storm
df3.economy$EVTYPE <- gsub("^SNOW/FREEZING RAIN$|^SNOW/SLEET$|^SNOW/SLEET/FREEZING RAIN$|^SNOW FREEZING RAIN$|^SNOW AND ICE STORM$|^HEAVY MIX$|^HEAVY PRECIPITATION$|^RAIN/SNOW$|THUNDERSNOW|^MIXED PRECIP$|^MIXED PRECIPITATION$|WINTER STORM HIGH WINDS|WINTER STORMS","WINTER STORM",df3.economy$EVTYPE)

#correct Winter Weather
df3.economy$EVTYPE <- gsub("WINTER WEATHER MIX|WINTER WEATHER/MIX|WINTRY MIX","WINTER WEATHER",df3.economy$EVTYPE)

#final cleanup
df3.economy$EVTYPE <- gsub("^FROST/FREEZEFALL$","FROST/FREEZE",df3.economy$EVTYPE)
df3.economy$EVTYPE <- gsub("^THUNDERSTORM WIND.*","FROST/FREEZE",df3.economy$EVTYPE)

After the above mapping, all values in EVTYPE has been mapped to the 48 Event Types. This can be verified by below code:

df3.economy[!df3.economy$EVTYPE %in% toupper(Events),]
## [1] EVTYPE     PROPDMG    PROPDMGEXP CROPDMG    CROPDMGEXP
## <0 rows> (or 0-length row.names)

There are no values that doesn’t match any event in the Events list.

Cleaning PROPDMG and CROPDMG Columns

A check of PROPDMG and CROPDMG columns shows that all values are filled and properly formatted, thus requiring no further processing.

#create a copy of dataset to work on
df4.economy <- df3.economy

p <- as.data.frame(unique(df4.economy$PROPDMG))
c <- as.data.frame(unique(df4.economy$CROPDMG))

Processing CROPDMGEXP Column

Remove leading or trailing white spaces:

df4.economy$CROPDMGEXP <- str_trim(df4.economy$CROPDMGEXP)

A list of unique values in the column is generated.

ce <- as.data.frame(unique(df4.economy$CROPDMGEXP))
ce
##   unique(df4.economy$CROPDMGEXP)
## 1                               
## 2                              M
## 3                              K
## 4                              m
## 5                              B
## 6                              ?
## 7                              0
## 8                              k

Based on the unique values above, the following strategies were adopted based on the Storm Data documentation:

  1. “” converted to 0

  2. “?” converted to 0

  3. “B” converted to 1,000,000,000 (B refers to billion)

  4. “k” and “K” converted to 1,000 (k or K refers to kilo or thousand)

  5. “m” and “M” converted to 1,000,000 (m or M refers to million)

However, after running step 1 above, the other values changed as below:

  1. “?” converted to “0?0”"

  2. “B” converted to “0B0”

  3. “k” and “K” converted to “0k0” and “0K0”

  4. “m” and “M” converted to “0m0” and “0M0”

Thus the following adjustments were made to the replacement expression codes:

df4.economy$CROPDMGEXP <- gsub("",0,df4.economy$CROPDMGEXP)
df4.economy$CROPDMGEXP <- gsub("0\\?0","0",df4.economy$CROPDMGEXP)
df4.economy$CROPDMGEXP <- gsub("0|000","0",df4.economy$CROPDMGEXP)
df4.economy$CROPDMGEXP <- gsub("0k0|0K0",1000,df4.economy$CROPDMGEXP)
df4.economy$CROPDMGEXP <- gsub("0m0|0M0",1000000,df4.economy$CROPDMGEXP)
df4.economy$CROPDMGEXP <- gsub("0b0|0B0",1000000000,df4.economy$CROPDMGEXP)

After replacing the values, the unique values for CROPDMGEXP are 0, 1e+06, 1000, 1e+09 as shown below:

ce <- as.data.frame(unique(df4.economy$CROPDMGEXP))
ce
##   unique(df4.economy$CROPDMGEXP)
## 1                              0
## 2                          1e+06
## 3                           1000
## 4                          1e+09

Processing PROPDMGEXP Column

Remove leading or trailing white spaces:

df4.economy$PROPDMGEXP <- str_trim(df4.economy$PROPDMGEXP)

A list of unique values in the column is generated.

pe <- as.data.frame(unique(df4.economy$PROPDMGEXP))
pe
##    unique(df4.economy$PROPDMGEXP)
## 1                               K
## 2                               M
## 3                               B
## 4                               m
## 5                                
## 6                               +
## 7                               0
## 8                               5
## 9                               6
## 10                              4
## 11                              h
## 12                              2
## 13                              7
## 14                              3
## 15                              H
## 16                              -

Based on the unique values above, the following strategies were adopted based on the Storm Data documentation and source listed in Reference section:

  1. “” converted to 0

  2. Integers 1 to 10 converted to 10

  3. “+” converted to 1

  4. “-” converted to 0

  5. “h” and “H” converted to 100 (h or H refers to hundred)

  6. “k” and “K” converted to 1,000 (k or K refers to kilo or thousand)

  7. “m” and “M” converted to 1,000,000 (m or M refers to million)

  8. “B” converted to 1,000,000,000 (B refers to billion)

However, after running step 1 above, the other values changed as below:

  1. Integers 1 to 10 converted to 010 to 0100

  2. “+” converted to “0+0”

  3. “-” converted to “0-0”"

  4. “h” and “H” converted to “0h0” and “0H0”

  5. “k” and “K” converted to “0k0” and “0K0”

  6. “m” and “M” converted to “0m0” and “0M0”

  7. “B” converted to “0B0”

Thus the following adjustments were made to the replacement expression codes:

df4.economy$PROPDMGEXP <- gsub("",0,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("010|020|030|040|050|060|070|080|090|0100",10,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0\\+0",1,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0-0|000",0,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0H0|0h0",100,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0k0|0K0",1000,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0m0|0M0",1000000,df4.economy$PROPDMGEXP)
df4.economy$PROPDMGEXP <- gsub("0b0|0B0",1000000000,df4.economy$PROPDMGEXP)

After replacing the values, the unique values for PROPDMGEXP are 1000, 1e+06, 1e+09, 0, 1, 10, 100 as shown below:

pe <- as.data.frame(unique(df4.economy$PROPDMGEXP))
pe
##   unique(df4.economy$PROPDMGEXP)
## 1                           1000
## 2                          1e+06
## 3                          1e+09
## 4                              0
## 5                              1
## 6                             10
## 7                            100

Calculate Property and Crop Damage Values

The damage value for Property is PROPDMG times PROPDMGEXP while damage value for Crop is CROPDMG time CROPDMGEXP. Thus the following new columns, PROP and CROP are added, which are product of the respective columns.

df4.economy$PROP <- df4.economy$PROPDMG*as.numeric(df4.economy$PROPDMGEXP)
df4.economy$CROP <- df4.economy$CROPDMG*as.numeric(df4.economy$CROPDMGEXP)
head(df4.economy)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP  PROP CROP
## 1 TORNADO    25.0       1000       0          0 25000    0
## 2 TORNADO     2.5       1000       0          0  2500    0
## 3 TORNADO    25.0       1000       0          0 25000    0
## 4 TORNADO     2.5       1000       0          0  2500    0
## 5 TORNADO     2.5       1000       0          0  2500    0
## 6 TORNADO     2.5       1000       0          0  2500    0

Calculate Total Economic Damage

The damage in terms of economy is calculated by adding together the damages to property and crops. A new column TOTALDMG is added to reflect this.

df4.economy$TOTALDMG <- df4.economy$PROP + df4.economy$CROP
head(df4.economy)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP  PROP CROP TOTALDMG
## 1 TORNADO    25.0       1000       0          0 25000    0    25000
## 2 TORNADO     2.5       1000       0          0  2500    0     2500
## 3 TORNADO    25.0       1000       0          0 25000    0    25000
## 4 TORNADO     2.5       1000       0          0  2500    0     2500
## 5 TORNADO     2.5       1000       0          0  2500    0     2500
## 6 TORNADO     2.5       1000       0          0  2500    0     2500

Aggregate the Data

To calculate total damages by event, the dataset is aggregated by using sum().

df4.sum <- aggregate(TOTALDMG ~ EVTYPE, data = df4.economy,sum)
df4.sum
##                      EVTYPE     TOTALDMG
## 1     ASTRONOMICAL LOW TIDE       320000
## 2                 AVALANCHE      3721800
## 3                  BLIZZARD    771888950
## 4             COASTAL FLOOD    444335060
## 5           COLD/WIND CHILL     38286500
## 6               DEBRIS FLOW    347518100
## 7                 DENSE FOG     22829500
## 8               DENSE SMOKE       100000
## 9                   DROUGHT  15018672000
## 10               DUST DEVIL       751630
## 11               DUST STORM      9199000
## 12           EXCESSIVE HEAT    521536530
## 13  EXTREME COLD/WIND CHILL   1463313400
## 14              FLASH FLOOD  18171037648
## 15                    FLOOD 161341827502
## 16             FROST/FREEZE  15895960491
## 17             FUNNEL CLOUD       194600
## 18                     HAIL  19021448120
## 19                     HEAT    403268500
## 20               HEAVY RAIN   4253396390
## 21               HEAVY SNOW   1108907857
## 22                HIGH SURF    111901500
## 23                HIGH WIND   6689699845
## 24        HURRICANE/TYPHOON  90872527810
## 25                ICE STORM   8967492310
## 26         LAKE-EFFECT SNOW     40682000
## 27          LAKESHORE FLOOD      7570000
## 28                LIGHTNING    945834542
## 29              MARINE HAIL         4000
## 30         MARINE HIGH WIND      1297010
## 31       MARINE STRONG WIND      1418330
## 32 MARINE THUNDERSTORM WIND      5907400
## 33              RIP CURRENT       163000
## 34                   SEICHE       980000
## 35                    SLEET      5500000
## 36         STORM SURGE/TIDE  47965579000
## 37              STRONG WIND    263718240
## 38                  TORNADO  57356894672
## 39      TROPICAL DEPRESSION      1737000
## 40           TROPICAL STORM   8409336550
## 41                  TSUNAMI    144082000
## 42             VOLCANIC ASH       500000
## 43               WATERSPOUT     60730200
## 44                 WILDFIRE   8899910130
## 45             WINTER STORM   6788541250
## 46           WINTER WEATHER     42310500

Ordering Data and Identifying Top Value

To identify the top event, the dataset is ordered in descending order by TOTALDMG column. Then the top 5 events are selected for comparison.

df4.ordered <- df4.sum[order(df4.sum$TOTALDMG,decreasing = TRUE),]
df4.top <- df4.ordered[1:5,]

knitr::kable(df4.top)
EVTYPE TOTALDMG
15 FLOOD 161341827502
24 HURRICANE/TYPHOON 90872527810
38 TORNADO 57356894672
36 STORM SURGE/TIDE 47965579000
18 HAIL 19021448120
ggplot(df4.top,aes(EVTYPE,TOTALDMG/1000000000)) + geom_bar(stat="identity",fill=rainbow(5)) + labs(title="Top 5 Most Economically Damaging Events", x="Event Type", y="Total Damage (Billion Dollars)")

Top Five Economically Damaging Storm Events

Based on the table and plot above, the most economically damaging storm type is FLOOD which caused damages (properties + crops) of 161,341,827,502 dollars. It is about 44% more damaging than the second ranked event.

Results

Based on the analysis in the previous section:

  1. The event which is most harmful to population health is TORNADO which caused casualties (fatalities + injuries) of 96,997.

  2. The event with greatest economic consequence is FLOOD which caused damages (properties + crops) of 161,341,827,502 dollars.

Reference

[link] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

[link] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

[link] https://rstudio-pubs-static.s3.amazonaws.com/58957_37b6723ee52b455990e149edde45e5b6.html