Data Source: Storm Data [47Mb]
Other Resource:
* National Weather Service Storm Data Documentation
* National Climatic Data Center Storm Events FAQ
* Data Collection Info
* How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP
* More Detail on Cleaning EVTYPE

synopsis

Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Though the documentation is clear with the Event Types, dataset is not, so cleaning that is the 1st priority. Then we will analyses Fatalities, Injuries, Property damage and Corp damage. Analysis will be done by yearly basis and also in total. The analysis will reveal that “TORNADO” and “FLOOD” are the most dangerous. So let’s begin with importing the libraries.

# Loading libraries
library(dplyr)
library(lubridate)
library(stringr)
library(knitr)
library(kableExtra)
library(ggplot2)
library(reshape2)
library(ggpubr)
library(RColorBrewer)

Reading data

Let’s read the data from the Data Source: Storm Data [47Mb]:

df <- read.csv("repdata_data_StormData.csv.bz2",stringsAsFactors = FALSE)
dim(df)
## [1] 902297     37

Thats a lot of rowssss……
Column names:

df <- tbl_df(df)
names(df)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The given assignment focuses on the types of event(‘EVTYPE’), so let’s check how many events are there in the data:

length(unique(df$EVTYPE))
## [1] 985

This is interesting to see that there are about 1000 types of event even though the documentation says there are only 48 types of event present, we have to fix that in the data preprocessing stage.
Now let’s create a checklist of info’s that will help us to subset the data as analyzing the 902297 rows will take too long.
* Need to convert the 985 event type present in the dataset to the event mentioned in the documentation and drop the ones we cannot convert. As there are hedge no of rows that won’t effect the analysis that much.
* From the Data Collection Info link we can see that from only 1996 all 48 event types are recorded. So we will only consider data from 1996.
* According to the assignment we have to analyze effect of events on population health and economic consequences. So we will only need these columns of the analysis: “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”.
* We will keep the column: “REFNUM” for future reference to the main data as it is primary key for the dataset.
* The column: “BGN_DATE” is kept to view the events with respect to time.
* As we are analyzing the effect over the country we can also drop state and county code columns.

Data Processing

Lets process the data and sub set it:

#dates are in the format mdy_hms format
df_new <- df %>% select(REFNUM, EVTYPE, BGN_DATE, FATALITIES, INJURIES, 
                        PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% 
    mutate(BGN_DATE = mdy_hms(BGN_DATE)) %>% filter(year(BGN_DATE) >= 1996)

We can also drop the rows that have 0 values in FATALITIES, INJURIES,PROPDMG, and CROPDMG as it indicates that there were no damage done:

logical <- (df_new$FATALITIES == 0 & df_new$INJURIES == 0 & 
                df_new$PROPDMG == 0 & df_new$CROPDMG == 0)
df_new <- df_new %>% filter(!logical)

dim(df_new)
## [1] 201318      9

So we have reduced the row and column no.
Now lets check for NA values in the given dataset:

apply(df_new, 2, function(x) sum(is.na(x)))
##     REFNUM     EVTYPE   BGN_DATE FATALITIES   INJURIES    PROPDMG 
##          0          0          0          0          0          0 
## PROPDMGEXP    CROPDMG CROPDMGEXP 
##          0          0          0

So there are no NA values in the data.
But there was blank values in PROPDMGEXP and CROPDMGEXP lets check if they are still present.

all(df_new[df_new$PROPDMGEXP %in% c("","?","-"),]$PROPDMG == 0)
## [1] TRUE
all(df_new[df_new$CROPDMGEXP %in% c("","?","-"),]$CROPDMG == 0)
## [1] TRUE

So all the blank values, “?” and “-” values in the EXP columns have 0 values in their corresponding DMG columns. We can use this info to subset the data further while doing the 2nd part of the assignment.

Lets work with EXP and DMG columns-
For these columns below are true:
* EXP(0, 1, 2, 3, 4, 5, 6, 7, 8) = 10
* EXP(H,h) = 100
* EXP(K,k) = 1000
* EXP(M,m) = 1000000 * EXP(B,b) = 1000000000
* EXP(+) = 1
* EXP(blank(),?,-) = Have no values in DMG
Lets convert damages to absolute damages-

key <- c(as.character(0:8),"H","h","K","k","M","m","B","b","+","","?","-")
values <- c(rep(10,9),rep(100,2),rep(1000,2),rep(1000000,2),rep(1000000000,2),1,rep(0,3))

df_new <- df_new %>% 
    mutate(PROPDMG_ABS = as.vector(sapply(PROPDMGEXP,
                                          function(x) values[key == x]))*PROPDMG,
           CROPDMG_ABS = as.vector(sapply(CROPDMGEXP,
                                          function(x) values[key == x]))*CROPDMG,
           TOTAL_DMG = PROPDMG_ABS+CROPDMG_ABS)

We have extracted the absolute property damage(PROPDMG_ABS) and crop damage(CROPDMG_ABS). TOTAL_DMG is the total economic damage as per the dataset.

Lets now work with the event types:

#no of unique event type in the sub data
length(unique(df_new$EVTYPE))
## [1] 222

So there are much less event types in the sub data.
So lets record the event types mentioned in the dataset:

events <- "Astronomical Low Tide Z
Avalanche Z
Blizzard Z
Coastal Flood Z
Cold/Wind Chill Z
Debris Flow C
Dense Fog Z
Dense Smoke Z
Drought Z
Dust Devil C
Dust Storm Z
Excessive Heat Z
Extreme Cold/Wind Chill Z
Flash Flood C
Flood C
Frost/Freeze Z
Funnel Cloud C
Freezing Fog Z
Hail C
Heat Z
Heavy Rain C
Heavy Snow Z
High Surf Z
High Wind Z
Hurricane (Typhoon) Z
Ice Storm Z
Lake-Effect Snow Z
Lakeshore Flood Z
Lightning C
Marine Hail M
Marine High Wind M
Marine Strong Wind M
Marine Thunderstorm Wind M
Rip Current Z
Seiche Z
Sleet Z
Storm Surge/Tide Z
Strong Wind Z
Thunderstorm Wind C
Tornado C
Tropical Depression Z
Tropical Storm Z
Tsunami Z
Volcanic Ash Z
Waterspout M
Wildfire Z
Winter Storm Z
Winter Weather Z"

events <- strsplit(events,"\n")
events <- as.vector(sapply(events, function(x) {substr(x,start=1,stop = nchar(x)-2)}))
events
##  [1] "Astronomical Low Tide"    "Avalanche"               
##  [3] "Blizzard"                 "Coastal Flood"           
##  [5] "Cold/Wind Chill"          "Debris Flow"             
##  [7] "Dense Fog"                "Dense Smoke"             
##  [9] "Drought"                  "Dust Devil"              
## [11] "Dust Storm"               "Excessive Heat"          
## [13] "Extreme Cold/Wind Chill"  "Flash Flood"             
## [15] "Flood"                    "Frost/Freeze"            
## [17] "Funnel Cloud"             "Freezing Fog"            
## [19] "Hail"                     "Heat"                    
## [21] "Heavy Rain"               "Heavy Snow"              
## [23] "High Surf"                "High Wind"               
## [25] "Hurricane (Typhoon)"      "Ice Storm"               
## [27] "Lake-Effect Snow"         "Lakeshore Flood"         
## [29] "Lightning"                "Marine Hail"             
## [31] "Marine High Wind"         "Marine Strong Wind"      
## [33] "Marine Thunderstorm Wind" "Rip Current"             
## [35] "Seiche"                   "Sleet"                   
## [37] "Storm Surge/Tide"         "Strong Wind"             
## [39] "Thunderstorm Wind"        "Tornado"                 
## [41] "Tropical Depression"      "Tropical Storm"          
## [43] "Tsunami"                  "Volcanic Ash"            
## [45] "Waterspout"               "Wildfire"                
## [47] "Winter Storm"             "Winter Weather"

From the Disscusion forum some one pointed out that there are few events that consists of to events from documentation seperated by “/”. Lets check that.

event_unique <- unique(df_new$EVTYPE)
event_unique[grep("/",event_unique)]
##  [1] "TSTM WIND/HAIL"            "WILD/FOREST FIRE"         
##  [3] "URBAN/SML STREAM FLD"      "Erosion/Cstl Flood"       
##  [5] "Heavy Rain/High Surf"      "Gusty wind/rain"          
##  [7] "GUSTY WIND/HVY RAIN"       "RAIN/SNOW"                
##  [9] "Frost/Freeze"              "Hypothermia/Exposure"     
## [11] "HYPOTHERMIA/EXPOSURE"      "COASTAL FLOODING/EROSION" 
## [13] "COASTAL  FLOODING/EROSION" "HYPERTHERMIA/EXPOSURE"    
## [15] "GUSTY WIND/HAIL"           "FLOOD/FLASH/FLOOD"        
## [17] "FLASH FLOOD/FLOOD"         "FALLING SNOW/ICE"         
## [19] "EXTREME COLD/WIND CHILL"   "HURRICANE/TYPHOON"        
## [21] "WINTER WEATHER/MIX"        "FROST/FREEZE"             
## [23] "HEAVY SURF/HIGH SURF"      "STORM SURGE/TIDE"         
## [25] "COLD/WIND CHILL"

Now lets look at all the event types:

unique(df_new$EVTYPE)
##   [1] "WINTER STORM"              "TORNADO"                  
##   [3] "TSTM WIND"                 "HIGH WIND"                
##   [5] "FLASH FLOOD"               "FREEZING RAIN"            
##   [7] "EXTREME COLD"              "LIGHTNING"                
##   [9] "HAIL"                      "FLOOD"                    
##  [11] "TSTM WIND/HAIL"            "EXCESSIVE HEAT"           
##  [13] "RIP CURRENTS"              "Other"                    
##  [15] "HEAVY SNOW"                "WILD/FOREST FIRE"         
##  [17] "ICE STORM"                 "BLIZZARD"                 
##  [19] "STORM SURGE"               "Ice jam flood (minor"     
##  [21] "DUST STORM"                "STRONG WIND"              
##  [23] "DUST DEVIL"                "Tstm Wind"                
##  [25] "URBAN/SML STREAM FLD"      "FOG"                      
##  [27] "ROUGH SURF"                "Heavy Surf"               
##  [29] "Dust Devil"                "HEAVY RAIN"               
##  [31] "Marine Accident"           "AVALANCHE"                
##  [33] "Freeze"                    "DRY MICROBURST"           
##  [35] "Strong Wind"               "WINDS"                    
##  [37] "COASTAL STORM"             "Erosion/Cstl Flood"       
##  [39] "River Flooding"            "WATERSPOUT"               
##  [41] "DAMAGING FREEZE"           "Damaging Freeze"          
##  [43] "HURRICANE"                 "TROPICAL STORM"           
##  [45] "Beach Erosion"             "High Surf"                
##  [47] "Heavy Rain/High Surf"      "Unseasonable Cold"        
##  [49] "Early Frost"               "Wintry Mix"               
##  [51] "Extreme Cold"              "DROUGHT"                  
##  [53] "Coastal Flooding"          "Torrential Rainfall"      
##  [55] "Landslump"                 "Hurricane Edouard"        
##  [57] "Coastal Storm"             "TIDAL FLOODING"           
##  [59] "Tidal Flooding"            "Strong Winds"             
##  [61] "EXTREME WINDCHILL"         "Glaze"                    
##  [63] "Extended Cold"             "Whirlwind"                
##  [65] "Heavy snow shower"         "Light snow"               
##  [67] "COASTAL FLOOD"             "Light Snow"               
##  [69] "MIXED PRECIP"              "COLD"                     
##  [71] "Freezing Spray"            "DOWNBURST"                
##  [73] "Mudslides"                 "Microburst"               
##  [75] "Mudslide"                  "Cold"                     
##  [77] "SNOW"                      "Coastal Flood"            
##  [79] "Snow Squalls"              "Wind Damage"              
##  [81] "Light Snowfall"            "Freezing Drizzle"         
##  [83] "Gusty wind/rain"           "GUSTY WIND/HVY RAIN"      
##  [85] "Wind"                      "Cold Temperature"         
##  [87] "Heat Wave"                 "Snow"                     
##  [89] "COLD AND SNOW"             "HEAVY SURF"               
##  [91] "RAIN/SNOW"                 "WIND"                     
##  [93] "FREEZE"                    "TSTM WIND (G45)"          
##  [95] "Gusty Winds"               "GUSTY WIND"               
##  [97] "TSTM WIND 40"              "TSTM WIND 45"             
##  [99] "HARD FREEZE"               "TSTM WIND (41)"           
## [101] "HEAT"                      "RIVER FLOOD"              
## [103] "TSTM WIND (G40)"           "RIP CURRENT"              
## [105] "HIGH SURF"                 "MUD SLIDE"                
## [107] "Frost/Freeze"              "SNOW AND ICE"             
## [109] "COASTAL FLOODING"          "AGRICULTURAL FREEZE"      
## [111] "WINTER WEATHER"            "STRONG WINDS"             
## [113] "SNOW SQUALL"               "ICY ROADS"                
## [115] "OTHER"                     "THUNDERSTORM"             
## [117] "Hypothermia/Exposure"      "HYPOTHERMIA/EXPOSURE"     
## [119] "Lake Effect Snow"          "Freezing Rain"            
## [121] "Mixed Precipitation"       "BLACK ICE"                
## [123] "COASTALSTORM"              "LIGHT SNOW"               
## [125] "DAM BREAK"                 "Gusty winds"              
## [127] "blowing snow"              "FREEZING DRIZZLE"         
## [129] "FROST"                     "GRADIENT WIND"            
## [131] "UNSEASONABLY COLD"         "GUSTY WINDS"              
## [133] "TSTM WIND AND LIGHTNING"   "gradient wind"            
## [135] "Gradient wind"             "Freezing drizzle"         
## [137] "WET MICROBURST"            "Heavy surf and wind"      
## [139] "FUNNEL CLOUD"              "TYPHOON"                  
## [141] "LANDSLIDES"                "HIGH SWELLS"              
## [143] "HIGH WINDS"                "SMALL HAIL"               
## [145] "UNSEASONAL RAIN"           "COASTAL FLOODING/EROSION" 
## [147] " TSTM WIND (G45)"          "TSTM WIND  (G45)"         
## [149] "HIGH WIND (G40)"           "TSTM WIND (G35)"          
## [151] "GLAZE"                     "COASTAL EROSION"          
## [153] "UNSEASONABLY WARM"         "SEICHE"                   
## [155] "COASTAL  FLOODING/EROSION" "HYPERTHERMIA/EXPOSURE"    
## [157] "WINTRY MIX"                "RIVER FLOODING"           
## [159] "ROCK SLIDE"                "GUSTY WIND/HAIL"          
## [161] "HEAVY SEAS"                " TSTM WIND"               
## [163] "LANDSPOUT"                 "RECORD HEAT"              
## [165] "EXCESSIVE SNOW"            "LAKE EFFECT SNOW"         
## [167] "FLOOD/FLASH/FLOOD"         "MIXED PRECIPITATION"      
## [169] "WIND AND WAVE"             "FLASH FLOOD/FLOOD"        
## [171] "LIGHT FREEZING RAIN"       "ICE ROADS"                
## [173] "HIGH SEAS"                 "RAIN"                     
## [175] "ROUGH SEAS"                "TSTM WIND G45"            
## [177] "NON-SEVERE WIND DAMAGE"    "WARM WEATHER"             
## [179] "THUNDERSTORM WIND (G40)"   "LANDSLIDE"                
## [181] "HIGH WATER"                " FLASH FLOOD"             
## [183] "LATE SEASON SNOW"          "WINTER WEATHER MIX"       
## [185] "ROGUE WAVE"                "FALLING SNOW/ICE"         
## [187] "NON-TSTM WIND"             "NON TSTM WIND"            
## [189] "MUDSLIDE"                  "BRUSH FIRE"               
## [191] "BLOWING DUST"              "VOLCANIC ASH"             
## [193] "   HIGH SURF ADVISORY"     "HAZARDOUS SURF"           
## [195] "WILDFIRE"                  "COLD WEATHER"             
## [197] "WHIRLWIND"                 "ICE ON ROAD"              
## [199] "SNOW SQUALLS"              "DROWNING"                 
## [201] "EXTREME COLD/WIND CHILL"   "MARINE TSTM WIND"         
## [203] "HURRICANE/TYPHOON"         "DENSE FOG"                
## [205] "WINTER WEATHER/MIX"        "FROST/FREEZE"             
## [207] "ASTRONOMICAL HIGH TIDE"    "HEAVY SURF/HIGH SURF"     
## [209] "TROPICAL DEPRESSION"       "LAKE-EFFECT SNOW"         
## [211] "MARINE HIGH WIND"          "THUNDERSTORM WIND"        
## [213] "TSUNAMI"                   "STORM SURGE/TIDE"         
## [215] "COLD/WIND CHILL"           "LAKESHORE FLOOD"          
## [217] "MARINE THUNDERSTORM WIND"  "MARINE STRONG WIND"       
## [219] "ASTRONOMICAL LOW TIDE"     "DENSE SMOKE"              
## [221] "MARINE HAIL"               "FREEZING FOG"

Cleaning process for event types

Lets create a list for cleaning the events-
* We can see that the events are not consistent with uppercase or lowercase. So we have to convert them all to anyone.
* We can also see that there are blank spaces in front of some event type. We will use str_trim to fix them.
* “/” are used in some places we will work with them with the below process:
+ If one is found in documentation use that. GUSTY WIND/HAIL->Hail
+ If both are found in the documentation then use the 1st one. EXCESSIVE HEAT/DROUGHT ->EXCESSIVE HEAT
+ If both are not found in the doc then use the which one is close. WILD/FOREST FIRE -> Wildfire
* Some event type doesn’t specify properly, like TSTM WIND this will be Thunderstorm Wind. Similarly LANDSLIDES will be Debris Flow.
* Keeping the event type “Other” as this one cannot be converted to anything.
* Also there 3 types of FOG mentioned in the data where the documentation specifies only two. We need to check that also.

Lets create a backup:

backup_df_new <- df_new

Lets bring all to uppercase and also trim the events:

df_new$EVTYPE = toupper(df_new$EVTYPE)
df_new$EVTYPE = str_trim(df_new$EVTYPE)
length(unique(df_new$EVTYPE))
## [1] 183

That helped a lot, now lets clean

Event type - Coastal Flood

table(df_new[grep("COASTAL",df_new$EVTYPE),]$EVTYPE)
## 
## COASTAL  FLOODING/EROSION           COASTAL EROSION 
##                         1                         1 
##             COASTAL FLOOD          COASTAL FLOODING 
##                       153                        35 
##  COASTAL FLOODING/EROSION             COASTAL STORM 
##                         3                         4 
##              COASTALSTORM 
##                         1
df_new[grep("COASTAL",df_new$EVTYPE),]$EVTYPE <- toupper("Coastal Flood")

Event type - Cold/Wind Chill

table(df_new[grep("COLD|HYPOTHERMIA|WINDCHILL",df_new$EVTYPE),]$EVTYPE)
## 
##                    COLD           COLD AND SNOW        COLD TEMPERATURE 
##                      20                       1                       2 
##            COLD WEATHER         COLD/WIND CHILL           EXTENDED COLD 
##                       1                      90                       1 
##            EXTREME COLD EXTREME COLD/WIND CHILL       EXTREME WINDCHILL 
##                     166                     111                      19 
##    HYPOTHERMIA/EXPOSURE       UNSEASONABLE COLD       UNSEASONABLY COLD 
##                       6                       1                       3
df_new[grep("^COLD|HYPOTHERMIA",
            df_new$EVTYPE),]$EVTYPE <- toupper("Cold/Wind Chill")

df_new[grep(" COLD|WINDCHILL",
            df_new$EVTYPE),]$EVTYPE <- toupper("Extreme Cold/Wind Chill")

Event type - Debris Flow

table(df_new[grep("SLIDE|SLUMP",df_new$EVTYPE),]$EVTYPE)
## 
##  LANDSLIDE LANDSLIDES  LANDSLUMP  MUD SLIDE   MUDSLIDE  MUDSLIDES 
##        190          1          1          2          5          1 
## ROCK SLIDE 
##          1
df_new[grep("SLIDE|SLUMP",
            df_new$EVTYPE),]$EVTYPE <- toupper("Debris Flow")

Event type - Fog’s

table(df_new[grep("FOG",df_new$EVTYPE),]$EVTYPE)
## 
##    DENSE FOG          FOG FREEZING FOG 
##           58          101            7

DENSE FOG and FREEZING FOG are documented event types but we can see that FOG is much more common.

temp_df <- df_new[grep("FOG",df_new$EVTYPE),]
temp_df <- temp_df %>% group_by(EVTYPE) %>% 
    summarise(count=n(),
              mn_FATALITIES=mean(FATALITIES),
              mn_INJURIES=mean(INJURIES),
              mn_PROPDMG = mean(PROPDMG_ABS),
              mn_CROPDMG = mean(CROPDMG_ABS),
              mn_TOTALDMG = mean(TOTAL_DMG))
kable(temp_df) %>%
  kable_styling(bootstrap_options = c("striped", "hover","condensed","responsive"))
EVTYPE count mn_FATALITIES mn_INJURIES mn_PROPDMG mn_CROPDMG mn_TOTALDMG
DENSE FOG 58 0.1551724 2.465517 126189.7 0 126189.7
FOG 101 0.5940594 7.049505 130153.5 0 130153.5
FREEZING FOG 7 0.0000000 0.000000 311714.3 0 311714.3

Types of FOG causes mostly injuries and property damage. As there is no apparent difference between the three types on FOG we will be keeping all 3 FOG types.

Event type - Dust

table(df_new[grep("DUST",df_new$EVTYPE),]$EVTYPE)
## 
## BLOWING DUST   DUST DEVIL   DUST STORM 
##            1           84           96
df_new[grep("BLOWING DUST",
            df_new$EVTYPE),]$EVTYPE <- toupper("Dust Storm")

Event type - Heat

table(df_new[grep("HEAT|HYPERTHERMIA|WARM",df_new$EVTYPE),]$EVTYPE)
## 
##        EXCESSIVE HEAT                  HEAT             HEAT WAVE 
##                   685                   164                     1 
## HYPERTHERMIA/EXPOSURE           RECORD HEAT     UNSEASONABLY WARM 
##                     1                     1                     5 
##          WARM WEATHER 
##                     1
df_new[grep("HYPERTHERMIA|WARM",
            df_new$EVTYPE),]$EVTYPE <- toupper("Heat")
df_new[grep("HEAT WAVE|RECORD HEAT",
            df_new$EVTYPE),]$EVTYPE <- toupper("Excessive Heat")

Event type - Flash Flood

table(df_new[grep("FLASH|DAM BREAK|HIGH WATER",df_new$EVTYPE),]$EVTYPE)
## 
##         DAM BREAK       FLASH FLOOD FLASH FLOOD/FLOOD FLOOD/FLASH/FLOOD 
##                 2             19012                 1                 1 
##        HIGH WATER 
##                 2
df_new[grep("FLASH|DAM BREAK|HIGH WATER",
            df_new$EVTYPE),]$EVTYPE <- toupper("Flash Flood")

Event type - Flood

table(df_new[grep("FLOOD",df_new$EVTYPE),]$EVTYPE)
## 
##        COASTAL FLOOD   EROSION/CSTL FLOOD          FLASH FLOOD 
##                  198                    2                19018 
##                FLOOD ICE JAM FLOOD (MINOR      LAKESHORE FLOOD 
##                 9513                    1                    5 
##          RIVER FLOOD       RIVER FLOODING       TIDAL FLOODING 
##                   80                    6                    4
df_new[grep("RIVER",
            df_new$EVTYPE),]$EVTYPE <- toupper("Flood")
df_new[grep("CSTL|TIDAL FLOODING",
            df_new$EVTYPE),]$EVTYPE <- toupper("Coastal Flood")
df_new[grep("ICE JAM|FLOOD/FLASH/FLOOD|FLASH FLOOD/FLOOD",
            df_new$EVTYPE),]$EVTYPE <- toupper("FLASH FLOOD")

Event type - Frost/Freeze

table(df_new[grep("FROST|FREEZE",df_new$EVTYPE),]$EVTYPE)
## 
## AGRICULTURAL FREEZE     DAMAGING FREEZE         EARLY FROST 
##                   3                   3                   1 
##              FREEZE               FROST        FROST/FREEZE 
##                  14                   1                 117 
##         HARD FREEZE 
##                   1
df_new[grep("FROST|FREEZE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Frost/Freeze")

Event type - Hail

table(df_new[grep("HAIL",df_new$EVTYPE),]$EVTYPE)
## 
## GUSTY WIND/HAIL            HAIL     MARINE HAIL      SMALL HAIL 
##               1           22679               2              11 
##  TSTM WIND/HAIL 
##             441
df_new[grep("SMALL HAIL",
            df_new$EVTYPE),]$EVTYPE <- toupper("Hail")

Event type - Gusty wind

table(df_new[grep("GUSTY",df_new$EVTYPE),]$EVTYPE)
## 
##          GUSTY WIND     GUSTY WIND/HAIL GUSTY WIND/HVY RAIN 
##                  13                   1                   1 
##     GUSTY WIND/RAIN         GUSTY WINDS 
##                   1                  30
df_new[grep("GUSTY WIND/HAIL",
            df_new$EVTYPE),]$EVTYPE <- toupper("Hail")
df_new[grep("GUSTY WIND/HVY RAIN|GUSTY WIND/RAIN",
            df_new$EVTYPE),]$EVTYPE <- toupper("Heavy Rain")
df_new[grep("GUSTY WIND",
            df_new$EVTYPE),]$EVTYPE <- toupper("Strong Wind")

Event type - Rain

table(df_new[grep("RAIN|DRIZZLE",df_new$EVTYPE),]$EVTYPE)
## 
##     FREEZING DRIZZLE        FREEZING RAIN           HEAVY RAIN 
##                    5                    9                 1049 
## HEAVY RAIN/HIGH SURF  LIGHT FREEZING RAIN                 RAIN 
##                    1                   22                    3 
##            RAIN/SNOW  TORRENTIAL RAINFALL      UNSEASONAL RAIN 
##                    2                    1                    2
df_new[grep("FREEZING RAIN|RAIN/SNOW|FREEZING DRIZZLE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Winter Weather")
df_new[grep("RAIN",
            df_new$EVTYPE),]$EVTYPE <- toupper("Heavy Rain")

Event type - Snow

table(df_new[grep("SNOW",df_new$EVTYPE),]$EVTYPE)
## 
##      BLOWING SNOW    EXCESSIVE SNOW  FALLING SNOW/ICE        HEAVY SNOW 
##                 1                25                 2              1029 
## HEAVY SNOW SHOWER  LAKE-EFFECT SNOW  LAKE EFFECT SNOW  LATE SEASON SNOW 
##                 1               194                 4                 1 
##        LIGHT SNOW    LIGHT SNOWFALL              SNOW      SNOW AND ICE 
##               141                 1                16                 1 
##       SNOW SQUALL      SNOW SQUALLS 
##                 3                 5
df_new[grep("EXCESSIVE SNOW|LATE SEASON SNOW|HEAVY SNOW SHOWER",
            df_new$EVTYPE),]$EVTYPE <- toupper("Heavy Snow")
df_new[grep("LAKE-EFFECT|LAKE EFFECT",
            df_new$EVTYPE),]$EVTYPE <- toupper("Lake-Effect Snow")
df_new[grep("BLOWING SNOW|LIGHT SNOW|SNOW SQUALL|FALLING SNOW/ICE|SNOW AND ICE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Winter Weather")
df_new$EVTYPE[which(df_new$EVTYPE=="SNOW")] <- toupper("Heavy Snow")

Event type - High Surf

table(df_new[grep("SURF",df_new$EVTYPE),]$EVTYPE)
## 
##       HAZARDOUS SURF           HEAVY SURF  HEAVY SURF AND WIND 
##                    1                   29                    1 
## HEAVY SURF/HIGH SURF            HIGH SURF   HIGH SURF ADVISORY 
##                   50                  128                    1 
##           ROUGH SURF 
##                    2
df_new[grep("SURF",
            df_new$EVTYPE),]$EVTYPE <- toupper("High Surf")

Event type - High Wind

table(df_new[grep("HIGH WIND",df_new$EVTYPE),]$EVTYPE)
## 
##        HIGH WIND  HIGH WIND (G40)       HIGH WINDS MARINE HIGH WIND 
##             5402                2                1               19
df_new[grep("^HIGH WIND",
            df_new$EVTYPE),]$EVTYPE <- toupper("High Wind")

Event type - Hurricane (Typhoon)

table(df_new[grep("HURRICANE|TYPHOON",df_new$EVTYPE),]$EVTYPE)
## 
##         HURRICANE HURRICANE EDOUARD HURRICANE/TYPHOON           TYPHOON 
##               126                 1                72                 9
df_new[grep("HURRICANE|TYPHOON",
            df_new$EVTYPE),]$EVTYPE <- toupper("Hurricane (Typhoon)")

Event type - Rip Current

table(df_new[grep("RIP|GRADIENT",df_new$EVTYPE),]$EVTYPE)
## 
## GRADIENT WIND   RIP CURRENT  RIP CURRENTS 
##             6           364           239
df_new[grep("RIP|GRADIENT",
            df_new$EVTYPE),]$EVTYPE <- toupper("Rip Current")

Event type - Storm Surge/Tide

table(df_new[grep("SURGE|TIDE",df_new$EVTYPE),]$EVTYPE)
## 
## ASTRONOMICAL HIGH TIDE  ASTRONOMICAL LOW TIDE            STORM SURGE 
##                      8                      2                    169 
##       STORM SURGE/TIDE 
##                     47
df_new[grep("STORM SURGE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Storm Surge/Tide")
df_new[grep("ASTRONOMICAL HIGH",
            df_new$EVTYPE),]$EVTYPE <- toupper("Coastal Flood")

Event type - Strong Wind

table(df_new[grep("STRONG WIND",df_new$EVTYPE),]$EVTYPE)
## 
## MARINE STRONG WIND        STRONG WIND       STRONG WINDS 
##                 46               3412                 45
df_new[grep("^STRONG WIND",
            df_new$EVTYPE),]$EVTYPE <- toupper("Strong Wind")

Event type - Thunderstorm Wind

table(df_new[grep("THUNDERSTORM|TSTM|BURST",df_new$EVTYPE),]$EVTYPE)
## 
##                DOWNBURST           DRY MICROBURST MARINE THUNDERSTORM WIND 
##                        1                       75                       33 
##         MARINE TSTM WIND               MICROBURST            NON-TSTM WIND 
##                      109                        1                        1 
##            NON TSTM WIND             THUNDERSTORM        THUNDERSTORM WIND 
##                        1                        2                    43097 
##  THUNDERSTORM WIND (G40)                TSTM WIND         TSTM WIND  (G45) 
##                        1                    61778                        1 
##           TSTM WIND (41)          TSTM WIND (G35)          TSTM WIND (G40) 
##                        1                        1                        9 
##          TSTM WIND (G45)             TSTM WIND 40             TSTM WIND 45 
##                       37                        1                        1 
##  TSTM WIND AND LIGHTNING            TSTM WIND G45           TSTM WIND/HAIL 
##                        1                        1                      441 
##           WET MICROBURST 
##                        3
df_new[grep("^THUNDERSTORM|^TSTM|BURST",
            df_new$EVTYPE),]$EVTYPE <- toupper("Thunderstorm Wind")
df_new[grep("MARINE TSTM",
            df_new$EVTYPE),]$EVTYPE <- toupper("Marine Thunderstorm Wind")
df_new[agrep("NON-TSTM WIND",
            df_new$EVTYPE),]$EVTYPE <- toupper("Other")

Event type - Tornado

table(df_new[grep("TORNADO|LANDSPOUT",df_new$EVTYPE),]$EVTYPE)
## 
## LANDSPOUT   TORNADO 
##         2     12366
df_new[grep("TORNADO|LANDSPOUT",
            df_new$EVTYPE),]$EVTYPE <- toupper("Tornado")

Event type - Wildfire

table(df_new[grep("FIRE",df_new$EVTYPE),]$EVTYPE)
## 
##       BRUSH FIRE WILD/FOREST FIRE         WILDFIRE 
##                1              381              847
df_new[grep("FIRE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Wildfire")

Event type - Winter Weather

table(df_new[grep("WINTER WEATHER|ICE|ICY|GLAZE|MIXED PRECIP|WINTRY|FREEZING SPRAY",df_new$EVTYPE),]$EVTYPE)
## 
##           BLACK ICE      FREEZING SPRAY               GLAZE 
##                   1                   1                  16 
##         ICE ON ROAD           ICE ROADS           ICE STORM 
##                   1                   1                 631 
##           ICY ROADS        MIXED PRECIP MIXED PRECIPITATION 
##                  18                   6                  18 
##      WINTER WEATHER  WINTER WEATHER MIX  WINTER WEATHER/MIX 
##                 597                   2                 139 
##          WINTRY MIX 
##                   4
df_new[grep("WINTER WEATHER|ICE$|ICE ROADS|ICE ON ROAD|ICY|GLAZE|MIXED PRECIP|WINTRY|FREEZING SPRAY",
            df_new$EVTYPE),]$EVTYPE <- toupper("Winter Weather")

Event type - Wind

table(df_new[grep("WIND",df_new$EVTYPE),]$EVTYPE)
## 
##          COLD/WIND CHILL  EXTREME COLD/WIND CHILL                HIGH WIND 
##                      120                      301                     5405 
##         MARINE HIGH WIND       MARINE STRONG WIND MARINE THUNDERSTORM WIND 
##                       19                       46                      142 
##   NON-SEVERE WIND DAMAGE              STRONG WIND        THUNDERSTORM WIND 
##                        1                     3457                   105452 
##                WHIRLWIND                     WIND            WIND AND WAVE 
##                        3                       67                        1 
##              WIND DAMAGE                    WINDS 
##                        1                        1
df_new[grep("^WIND|WHIRLWIND|WIND DAMAGE",
            df_new$EVTYPE),]$EVTYPE <- toupper("Other")

Clean the rest event types

df_new[grep("URBAN",df_new$EVTYPE),]$EVTYPE="HEAVY RAIN"
df_new[grep("DROWNING|MARINE ACCIDENT|BEACH EROSION",
            df_new$EVTYPE),]$EVTYPE="OTHER"
df_new[grep("SWELLS|ROGUE|SEAS",df_new$EVTYPE),]$EVTYPE="HIGH SURF"

After cleaning

Claening of the event type is done. Lest chcek them.

unique(df_new$EVTYPE)
##  [1] "WINTER STORM"             "TORNADO"                 
##  [3] "THUNDERSTORM WIND"        "HIGH WIND"               
##  [5] "FLASH FLOOD"              "WINTER WEATHER"          
##  [7] "EXTREME COLD/WIND CHILL"  "LIGHTNING"               
##  [9] "HAIL"                     "FLOOD"                   
## [11] "EXCESSIVE HEAT"           "RIP CURRENT"             
## [13] "OTHER"                    "HEAVY SNOW"              
## [15] "WILDFIRE"                 "ICE STORM"               
## [17] "BLIZZARD"                 "STORM SURGE/TIDE"        
## [19] "DUST STORM"               "STRONG WIND"             
## [21] "DUST DEVIL"               "HEAVY RAIN"              
## [23] "FOG"                      "HIGH SURF"               
## [25] "AVALANCHE"                "FROST/FREEZE"            
## [27] "COASTAL FLOOD"            "WATERSPOUT"              
## [29] "HURRICANE (TYPHOON)"      "TROPICAL STORM"          
## [31] "DROUGHT"                  "DEBRIS FLOW"             
## [33] "COLD/WIND CHILL"          "HEAT"                    
## [35] "LAKE-EFFECT SNOW"         "FUNNEL CLOUD"            
## [37] "SEICHE"                   "VOLCANIC ASH"            
## [39] "MARINE THUNDERSTORM WIND" "DENSE FOG"               
## [41] "TROPICAL DEPRESSION"      "MARINE HIGH WIND"        
## [43] "TSUNAMI"                  "LAKESHORE FLOOD"         
## [45] "MARINE STRONG WIND"       "ASTRONOMICAL LOW TIDE"   
## [47] "DENSE SMOKE"              "MARINE HAIL"             
## [49] "FREEZING FOG"

So cleaning of event type is complete. SLEET is not present in our sub dataset and we have included FOG and OTHER events, that resulted in 48-1+2 = 49 event type.

With this data processing stage is complete.
Lets divide the sub datset in two dataset for the two assignment.

sub_df1 <- df_new %>% 
    select(REFNUM,EVTYPE,BGN_DATE,FATALITIES,INJURIES) %>% 
    filter(!(FATALITIES == 0 & INJURIES == 0))

sub_df2 <- df_new %>% 
    select(everything(),-FATALITIES,-INJURIES) %>% 
    filter(!(PROPDMG_ABS == 0 & CROPDMG_ABS == 0))

Across the United States, which types of events are most harmful with respect to population health?

To analyze this part we will use sub_df1.
Analyzing and comparing the total FATALITIES or INJURIES by the events will not be as informative, as the events are random and total human damage is different every year. So I will convert FATALITIES and INJURIES of say event “a” to ->
(FATALITIES or INJURIES of event “a” for a certain year)/(total FATALITIES or INJURIES across all the events for that year )

sub_df1_year <- sub_df1 %>% mutate(years = year(BGN_DATE)) %>% 
    group_by(years) %>% summarise(T_FATALITIES = sum(FATALITIES),
                                  T_INJURIES = sum(INJURIES))

sub_df1_year_event <- sub_df1 %>% mutate(years = year(BGN_DATE)) %>% 
    group_by(years,EVTYPE) %>% 
    summarise(T_FATALITIES = sum(FATALITIES),
              T_INJURIES = sum(INJURIES))

for(idx in 1:nrow(sub_df1_year_event)){
    temp_df2 <- sub_df1_year[sub_df1_year$years == sub_df1_year_event[idx,]$years,]
    sub_df1_year_event[idx,
                       "T_FATALITIES"] = sub_df1_year_event[idx,
                                                            "T_FATALITIES"] / temp_df2$T_FATALITIES
    sub_df1_year_event[idx,
                       "T_INJURIES"] = sub_df1_year_event[idx,
                                                            "T_INJURIES"] / temp_df2$T_INJURIES
}

sub_df1_year_event$T_FATALITIES = sub_df1_year_event$T_FATALITIES * 100
sub_df1_year_event$T_INJURIES = sub_df1_year_event$T_INJURIES * 100

Lets create a subdataset for the top events that caused most human damage throughout the years.

plot_df1 <- sub_df1_year_event %>% group_by(years) %>% top_n(n=1,wt=T_FATALITIES)
plot_df2 <- sub_df1_year_event %>% group_by(years) %>% top_n(n=1,wt=T_INJURIES)

plot_df1 <- sub_df1_year_event %>% 
    filter(EVTYPE %in% unique(plot_df1$EVTYPE)) %>% group_by(years)

plot_df2 <- sub_df1_year_event %>% 
    filter(EVTYPE %in% unique(plot_df2$EVTYPE)) %>% group_by(years)

#for line plots 1, 2

Lets create a subdataset for total human damage by the event type and take only the top 10.

sub_df1_event <- sub_df1 %>% group_by(EVTYPE) %>% 
    summarise(T_FATALITIES = sum(FATALITIES),
              T_INJURIES = sum(INJURIES))
plot_df3 <-  sub_df1_event %>% 
    top_n(n=10,wt=T_FATALITIES)
plot_df4 <-  sub_df1_event %>% 
    top_n(n=10,wt=T_INJURIES)
events1 <- unique(c(plot_df3$EVTYPE, plot_df4$EVTYPE))
len <- length(events1)
plot_df5 <- tbl_df(data = list("EVTYPE"=1:len,"Total Fatalities"=1:len,"Total Injuries"=1:len))
plot_df5$EVTYPE <- events1

for(event in plot_df5$EVTYPE){
    t_f <- sub_df1_event[sub_df1_event$EVTYPE == event,]$T_FATALITIES
    t_i <- sub_df1_event[sub_df1_event$EVTYPE == event,]$T_INJURIES
    plot_df5[plot_df5$EVTYPE == event,"Total Fatalities"] <- t_f
    plot_df5[plot_df5$EVTYPE == event,"Total Injuries"] <- t_i
}

plot_df5 <- melt(data = plot_df5, id.vars="EVTYPE")

#for bar plots 5

Lets create a table to show some states for top 5 killers through the years.

#use sub_df1_year_event, plot_df1 and plot_df2

plot_df8 <- sub_df1_year_event %>% group_by(years) %>% top_n(n=5,wt=T_FATALITIES)
plot_df9 <- sub_df1_year_event %>% group_by(years) %>% top_n(n=5,wt=T_INJURIES)

events2 <- unique(c(plot_df8$EVTYPE, plot_df9$EVTYPE))

df <- 16

plot_df6 <- sub_df1_year_event %>% filter(EVTYPE %in% events2) %>% 
    group_by(EVTYPE) %>% 
    summarise(mean_fatalities = mean(T_FATALITIES),
              sd_fatalities = sd(T_FATALITIES),
              conf_interval_f = paste(round((mean_fatalities + 
                                           c(-1,1)*qt(0.975,df-1)*sd_fatalities/sqrt(df)),2),
                                      collapse = " - "),
              mean_injuries = mean(T_INJURIES),
              sd_injuries = sd(T_INJURIES),
              conf_interval_i = paste(round((mean_injuries + 
                                           c(-1,1)*qt(0.975,df-1)*sd_injuries/sqrt(df)),2),
                                      collapse = " - "))

#plot table with 6

Lets plot :) …………

plot_df1 <- plot_df1 %>% rename(Event_Type = EVTYPE)
p1 <- ggplot(plot_df1,aes(x=years,y=T_FATALITIES,color=Event_Type))+
    geom_point(size=4,alpha=0.5)+
    geom_line()+
    labs(x="",y="% Fatalities per year")+
    scale_color_brewer(palette = "Paired")

plot_df2 <- plot_df2 %>% rename(Event_Type = EVTYPE)
p2 <- ggplot(plot_df2,aes(x=years,y=T_INJURIES,color=Event_Type))+
    geom_point(size=4,alpha=0.5)+
    geom_line()+
    labs(x="",y="% Injuries per year")+
    scale_color_brewer(palette = "Paired")

plot_df5 <- plot_df5 %>% rename(Type_of_damage = variable)
p3 <- ggplot(plot_df5,aes(x=EVTYPE,y=value,fill=Type_of_damage))+
    geom_bar(stat = "identity")+
    theme(axis.text.x = element_text(angle = 20))+
    labs(y="Total human effected",x="Event type")+
    theme(legend.position = c(0.08, 0.9),
          legend.background = element_rect(fill = "transparent"))+
    scale_fill_brewer(palette = "Set2")
plot_df6 <- plot_df6 %>% rename(Event_Type = EVTYPE)
plot_df6 <- plot_df6 %>% arrange(mean_fatalities)
p4 <- ggtexttable(plot_df6, rows = NULL, 
                        theme = ttheme("mOrange"))
text.p <- paste("This figure shows the which types of events are most harmful",
                "with respect to population health. ",
                "\n1: Top 3 killer by the years, ",
                "\n2: Top 3 events causing most injuries by the years,",
                "\n3: Top 10 events causing most death and injuries in total since 1996, ",
                "\n4: Statistics of the Top 5 events causing most death and injuries", sep=" ")
text.p <- ggparagraph(text = text.p, face = "bold", size = 15, color = "black")
figure <- ggarrange(p1, p2, p3, p4,text.p,
                    labels = c("1", "2", "3", "4",""),
                    ncol = 1, nrow = 5)
figure

Results

So the analysis reviled some interesting points:
* “EXCESSIVE HEAT” was cause of most fatalities till 2007. Then it dropped.
* “TORNADO” consistently kills 10-20% of total deaths. For the year 2011 it caused about 60% of the total deaths. * “FLASH FLOOD” is consistent of % deaths(5-19%).
* “TORNADO” consistently injures more people than other types of storm and a hedge spike 2011.
* “FLOOD” caused big % of injures in the year 1998.
* “EXCESSIVE HEAT”-(1999,2006) & “HURRICANE (TYPHOON)”-(2004) have few spikes in % injures.
* In total “TORNADO”, “EXCESSIVE HEAT”, “FLOOD”, “THUNDERSTORM WIND” and “LIGHTNING” are the clear winners when it comes to historical cumulative damage to humans.
* The stat chart figure(1.4) shows the mean casualty and 95% confidence interval of the top 15 event types.

Across the United States, which types of events have the greatest economic consequences?

To analyze this part we will use sub_df2.
Analyzing and comparing the total PROPDMG_ABS or CROPDMG_ABS by the events will not be as informative, as the events are random and total economic damage is different every year. So I will convert PROPDMG_ABS and CROPDMG_ABS of say event “a” to ->
(PROPDMG_ABS or CROPDMG_ABS of event “a” for a certain year)/(total PROPDMG_ABS or CROPDMG_ABS across all the events for that year )

sub_df2_year <- sub_df2 %>% mutate(years = year(BGN_DATE)) %>% 
    group_by(years) %>% summarise(T_PROPDMG_ABS = sum(PROPDMG_ABS),
                                  T_CROPDMG_ABS = sum(CROPDMG_ABS))

sub_df2_year_event <- sub_df2 %>% mutate(years = year(BGN_DATE)) %>% 
    group_by(years,EVTYPE) %>% 
    summarise(T_PROPDMG_ABS = sum(PROPDMG_ABS),
              T_CROPDMG_ABS = sum(CROPDMG_ABS))

for(idx in 1:nrow(sub_df2_year_event)){
    temp_df3 <- sub_df2_year[sub_df2_year$years == sub_df2_year_event[idx,]$years,]
    sub_df2_year_event[idx,
                       "T_PROPDMG_ABS"] = sub_df2_year_event[idx,
                                                            "T_PROPDMG_ABS"] / temp_df3$T_PROPDMG_ABS
    sub_df2_year_event[idx,
                       "T_CROPDMG_ABS"] = sub_df2_year_event[idx,
                                                            "T_CROPDMG_ABS"] / temp_df3$T_CROPDMG_ABS
}

sub_df2_year_event$T_PROPDMG_ABS = sub_df2_year_event$T_PROPDMG_ABS * 100
sub_df2_year_event$T_CROPDMG_ABS = sub_df2_year_event$T_CROPDMG_ABS * 100

Property damage and crop damges are not consistent with the events. Lets create a subdataset for the top event that caused most economic damage throughout the years.

plot_df11 <- sub_df2_year_event %>% group_by(years) %>% top_n(n=1,wt=T_PROPDMG_ABS)
plot_df12 <- sub_df2_year_event %>% group_by(years) %>% top_n(n=1,wt=T_CROPDMG_ABS)

plot_df11 <- sub_df2_year_event %>% 
    filter(EVTYPE %in% unique(plot_df11$EVTYPE)) %>% group_by(years)

plot_df12 <- sub_df2_year_event %>% 
    filter(EVTYPE %in%  unique(plot_df12$EVTYPE)) %>% group_by(years)
#for line plots 11, 12

Lets create a subdataset for total economic damage by the event type and take only the top 10.

sub_df2_event <- sub_df2 %>% group_by(EVTYPE) %>% 
    summarise(T_PROPDMG_ABS = sum(PROPDMG_ABS),
              T_CROPDMG_ABS = sum(CROPDMG_ABS))

plot_df13 <-  sub_df2_event %>% 
    top_n(n=10,wt=T_PROPDMG_ABS)

plot_df14 <-  sub_df2_event %>% 
    top_n(n=10,wt=T_CROPDMG_ABS)

events3 <- unique(c(plot_df13$EVTYPE, plot_df14$EVTYPE))
len <- length(events3)
plot_df15 <- tbl_df(data = list("EVTYPE"=1:len,"Total PropertyDmg"=1:len,"Total CropDmg"=1:len))
plot_df15$EVTYPE <- events3

for(event in plot_df15$EVTYPE){
    t_f <- sub_df2_event[sub_df2_event$EVTYPE == event,]$T_PROPDMG_ABS
    t_i <- sub_df2_event[sub_df2_event$EVTYPE == event,]$T_CROPDMG_ABS
    plot_df15[plot_df15$EVTYPE == event,"Total PropertyDmg"] <- t_f
    plot_df15[plot_df15$EVTYPE == event,"Total CropDmg"] <- t_i
}

plot_df15 <- melt(data = plot_df15, id.vars="EVTYPE")

#for bar plots 5

Lets create a table to show some states for top 5 killers through the years.

#use sub_df1_year_event, plot_df1 and plot_df2
plot_df18 <- sub_df2_year_event %>% group_by(years) %>% top_n(n=5,wt=T_PROPDMG_ABS)
plot_df19 <- sub_df2_year_event %>% group_by(years) %>% top_n(n=5,wt=T_CROPDMG_ABS)

events4 <- unique(c(plot_df18$EVTYPE, plot_df19$EVTYPE))

df <- 16

plot_df16 <- sub_df2_year_event %>% filter(EVTYPE %in% events4) %>% 
    group_by(EVTYPE) %>% 
    summarise(mean_PropDmg = mean(T_PROPDMG_ABS),
              sd_PropDmg = sd(T_PROPDMG_ABS),
              conf_interval_p = paste(round((mean_PropDmg + 
                                           c(-1,1)*qt(0.975,df-1)*sd_PropDmg/sqrt(df)),2),
                                      collapse = " - "),
              mean_CorpDmg = mean(T_CROPDMG_ABS),
              sd_CorpDmg = sd(T_CROPDMG_ABS),
              conf_interval_c = paste(round((mean_CorpDmg + 
                                           c(-1,1)*qt(0.975,df-1)*sd_CorpDmg/sqrt(df)),2),
                                      collapse = " - "))

#plot table with 6

Lets plot :) …………

plot_df11 <- plot_df11 %>% rename(Event_Type = EVTYPE)
p11 <- ggplot(plot_df11,aes(x=years,y=T_PROPDMG_ABS,color=Event_Type))+
    geom_point(size=4,alpha=0.5)+
    geom_line()+
    labs(x="",y="% Property damage per year")+
    scale_color_brewer(palette = "Paired")

plot_df12 <- plot_df12 %>% rename(Event_Type = EVTYPE)
p12 <- ggplot(plot_df12,aes(x=years,y=T_CROPDMG_ABS,color=Event_Type))+
    geom_point(size=4,alpha=0.5)+
    geom_line()+
    labs(x="",y="% Crop damage per year")+
    scale_color_brewer(palette = "Paired")

plot_df15 <- plot_df15 %>% rename(Type_of_damage = variable)
p13 <- ggplot(plot_df15,aes(x=EVTYPE,y=value,fill=Type_of_damage))+
    geom_bar(stat = "identity")+
    theme(axis.text.x = element_text(angle = 20))+
    labs(y="Total economic damage",x="Event type")+
    theme(legend.position = c(0.08, 0.9),
          legend.background = element_rect(fill = "transparent"))+
    scale_fill_brewer(palette = "Set2")

plot_df16 <- plot_df16 %>% rename(Event_Type = EVTYPE)
plot_df16 <- plot_df16 %>% arrange(mean_PropDmg)
p14 <- ggtexttable(plot_df16, rows = NULL, 
                        theme = ttheme("mOrange"))
text.p <- paste("This figure shows the which types of events are most harmful",
                "with respect to economic damage. ",
                "\n1: Top 3 events causing most property damage by the years,",
                "\n2: Top 3 events causing most corp damage by the years,",
                "\n3: Top 10 events causing most economic damage in total since 1996, ",
                "\n4: Statistics of the Top 5 events causing most economic damage", sep=" ")
text.p <- ggparagraph(text = text.p, face = "bold", size = 15, color = "black")
figure1 <- ggarrange(p11, p12, p13, p14,text.p,
                    labels = c("1", "2", "3", "4",""),
                    ncol = 1, nrow = 5)
figure1

Results

So the analysis reviled some interesting points:
* No event type consistently causes property damage and corp damage.
* “FLOOD” shows two hedge spikes for property damage.(1997,2005)
* “HURRICANE (TYPHOON)” shows two hedge spikes for property damage.(1999,2004)
* “TROPICAL STORM”-(2001), “STORM SURGE/TIDE”-(2005,2008) and “HAIL”-(2010) showed few prominent spikes.
* “DROUGHT”-(2000-2003,2006), “HAIL”-(2009) and “HURRICANE (TYPHOON)”-(1999,2005) event types are main causes of corp damage.
* “FLOOD” is cause a problem for corps since 2008.
* In Total “FLOOD”, “HURRICANE (TYPHOON)”, “STORM SURGE/TIDE” and “TORNADO” are the clear winners when it comes to historical cumulative economic damage.
* The stat chart figure(2.4) shows the mean casualty and 95% confidence interval of the top 15 event types.

END