EVALUATION OF FATALITIES, INJURIES, AND DAMAGE TO BOTH PROPERTY AND CROPS BY TYPES OF EVENTS IN THE UNITED STATES FROM 1950 TO 2011

SYNOPSIS:

The event type that is responsible for the most fatalities and injuries is the tornado. After this, the events most responsible for fatality are heat, flood, winter weather, and lightning, while those most responsible for injury are thunderstorm wind, heat, flood, and winter weather.

The event type that is responsible for the most property damage is the tornado. After this, the events most responsible for property damage are thunderstorm wind, flood, hail, and lightning.

The event type most responsible for crop damage is hail. After this, the event types most responsible for crop damage are flood, thunderstorm wind, tornado, and drought.

Property damage is greatest in the month of January. From 1950 to 2011, annual property damage from storms increased roughly exponentially.

DATA PROCESSING - PART 1

First, I read the data into R, converted the dates in the second column to Date objects, and loaded dplyr.

setwd("C:/Users/user/Dropbox/Education/Coursera/JH Data Science/REPRODUCIBLE RESEARCH/COURSE PROJECT 5.2/")
stormData<-read.csv("dataset.csv")
stormData[,2]<-as.Date(stormData[,2], format="%m/%d/%Y")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Next, I consolidated the EVTYPE (event type) values by grouping labels with similar names under a single label. For example, I placed “EXCESSIVE HEAT”, “HEAT WAVE”, and “EXTREME HEAT” together under “HEAT”.

stormData <- stormData %>%
  mutate(EVTYPE = case_when(
    EVTYPE=="FLASH FLOOD" ~ "FLOOD",
    EVTYPE=="EXCESSIVE HEAT" ~ "HEAT",
    EVTYPE=="HEAT WAVE" ~ "HEAT",
    EVTYPE=="EXTREME HEAT" ~ "HEAT",
    EVTYPE=="THUNDERSTORM WINDS" ~ "THUNDERSTORM WIND",
    EVTYPE=="TSTM WIND" ~ "THUNDERSTORM WIND",
    EVTYPE=="RIP CURRENT"~"RIP CURRENTS",
    EVTYPE=="HEAVY SURF/HIGH SURF"~"HIGH SURF",
    EVTYPE=="WIND"~"STRONG WIND",
    EVTYPE=="HIGH WIND"~"STRONG WIND",
    EVTYPE=="HIGH WINDS"~"STRONG WIND",
    EVTYPE=="TORNADOES, TSTM WIND, HAIL"~"TORNADO",
    EVTYPE=="TROPICAL STORM"~"HURRICANE/TYPHOON",
    EVTYPE=="HURRICANE"~"HURRICANE/TYPHOON",
    EVTYPE=="WILDFIRE"~"WILD/FOREST FIRE",
    EVTYPE=="EXTREME COLD/WIND CHILL"~"WINTER WEATHER",
    EVTYPE=="WINTER STORM"~"WINTER WEATHER",
    EVTYPE=="EXTREME COLD"~"WINTER WEATHER",
    EVTYPE=="WINTER WEATHER/MIX"~"WINTER WEATHER",
    EVTYPE=="COLD"~"WINTER WEATHER",
    EVTYPE=="COLD/WIND CHILL"~"WINTER WEATHER",
    EVTYPE=="ICE STORM"~"WINTER WEATHER",
    EVTYPE=="HEAVY SNOW"~"WINTER WEATHER",
    EVTYPE=="BLIZZARD"~"WINTER WEATHER",
    EVTYPE=="FLASH FLOODING"~"FLOOD",
    EVTYPE=="FLOOD/FLASH FLOOD"~"FLOOD",
    EVTYPE=="RIVER FLOOD"~"FLOOD",
    EVTYPE=="URBAN FLOOD"~"FLOOD",
    EVTYPE=="COASTAL FLOOD"~"FLOOD",
    EVTYPE=="URBAN FLOODING"~"FLOOD",
    EVTYPE=="FLOODING"~"FLOOD",
    EVTYPE=="URBAN/SML STREAM FLD"~"FLOOD",
    EVTYPE=="COASTAL FLOODING"~"FLOOD",
    EVTYPE=="FLASH FLOOD/FLOOD"~"FLOOD",
    
    TRUE ~ EVTYPE  # Keep other values unchanged
  ))
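
The same consolidation can also be expressed as a lookup table, which keeps the whole mapping in one place. The following is a minimal sketch in base R, not the code used for this report. The names evtype_map and consolidate_evtype are hypothetical, and the map holds only a few of the pairs listed above.

```r
# Hypothetical lookup table: names are raw EVTYPE labels, values are the
# consolidated labels. Only a handful of the pairs from the report are shown.
evtype_map <- c(
  "EXCESSIVE HEAT" = "HEAT",
  "HEAT WAVE"      = "HEAT",
  "TSTM WIND"      = "THUNDERSTORM WIND",
  "FLASH FLOOD"    = "FLOOD"
)

# Replace every label found in the map; leave all other labels unchanged.
consolidate_evtype <- function(x, map = evtype_map) {
  x <- as.character(x)
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

consolidated <- consolidate_evtype(c("HEAT WAVE", "TORNADO", "FLASH FLOOD"))
print(consolidated)
```

Adding a new grouping then only requires one more entry in the table.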

DATA PROCESSING - PART 2

Next, I calculated the property damage for each event by multiplying PROPDMG by the multiplier encoded in PROPDMGEXP. The multiplier is stored in column 38 and the resulting dollar amount in column 39.

stormData[stormData$PROPDMGEXP=="", 38]<-"1"
stormData[stormData$PROPDMGEXP=="-", 38]<-"1"
stormData[stormData$PROPDMGEXP=="?", 38]<-"1"
stormData[stormData$PROPDMGEXP=="+", 38]<-"1"
stormData[stormData$PROPDMGEXP=="0", 38]<-"1"
stormData[stormData$PROPDMGEXP=="1", 38]<-"10"
stormData[stormData$PROPDMGEXP=="2", 38]<-"100"
stormData[stormData$PROPDMGEXP=="3", 38]<-"1000"
stormData[stormData$PROPDMGEXP=="4", 38]<-"10000"
stormData[stormData$PROPDMGEXP=="5", 38]<-"100000"
stormData[stormData$PROPDMGEXP=="6", 38]<-"1000000"
stormData[stormData$PROPDMGEXP=="7", 38]<-"10000000"
stormData[stormData$PROPDMGEXP=="8", 38]<-"100000000"
stormData[stormData$PROPDMGEXP=="B", 38]<-"1000000000"
stormData[stormData$PROPDMGEXP=="h", 38]<-"100"
stormData[stormData$PROPDMGEXP=="H", 38]<-"100"
stormData[stormData$PROPDMGEXP=="K", 38]<-"1000"
stormData[stormData$PROPDMGEXP=="m", 38]<-"1000000"
stormData[stormData$PROPDMGEXP=="M", 38]<-"1000000"

stormData[,38]<-as.numeric(stormData[,38])
## Warning: NAs introduced by coercion
stormData[,39]<-as.numeric(stormData[,25])*stormData[,38]
## Warning: NAs introduced by coercion
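
The conversion above can be summarized as a single function from exponent code to multiplier. This is a sketch under stated assumptions rather than the code used for this report: exp_multiplier is a hypothetical name, it follows the mapping shown above (digits 1 to 8 as powers of ten, H, K, M, and B as hundred, thousand, million, and billion), and it assumes that any other code, including the empty string, means a multiplier of 1.

```r
# Hypothetical helper: turn an exponent code into a numeric multiplier.
exp_multiplier <- function(code) {
  letters_map <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  code <- toupper(as.character(code))
  # Assumption: "", "-", "?", "+", "0" and unrecognized codes count as 1.
  mult <- rep(1, length(code))
  # Digits 1 to 8 are read as powers of ten.
  is_digit <- grepl("^[1-8]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  # Letter codes are looked up case-insensitively.
  is_letter <- code %in% names(letters_map)
  mult[is_letter] <- unname(letters_map[code[is_letter]])
  mult
}

# Toy values, not taken from the data set.
magnitude <- c(25, 2.5, 1, 3)
code      <- c("K", "m", "", "2")
dollars   <- magnitude * exp_multiplier(code)
print(dollars)
```

Because the function is vectorized, it could be applied to a whole exponent column in one call.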

Next, I calculated the crop damage for each event by multiplying CROPDMG by the multiplier encoded in CROPDMGEXP. The multiplier is stored in column 40 and the resulting dollar amount in column 41.

# The crop multiplier must be looked up from CROPDMGEXP, not PROPDMGEXP
stormData[stormData$CROPDMGEXP=="", 40]<-"1"
stormData[stormData$CROPDMGEXP=="?", 40]<-"1"
stormData[stormData$CROPDMGEXP=="0", 40]<-"1"
stormData[stormData$CROPDMGEXP=="2", 40]<-"100"
stormData[stormData$CROPDMGEXP=="B", 40]<-"1000000000"
stormData[stormData$CROPDMGEXP=="k", 40]<-"1000"
stormData[stormData$CROPDMGEXP=="K", 40]<-"1000"
stormData[stormData$CROPDMGEXP=="m", 40]<-"1000000"
stormData[stormData$CROPDMGEXP=="M", 40]<-"1000000"

stormData[,40]<-as.numeric(stormData[,40])
## Warning: NAs introduced by coercion
stormData[,41]<-as.numeric(stormData[,27])*stormData[,40]
## Warning: NAs introduced by coercion
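
The coercion warnings above mean that some values could not be parsed as numbers. Before converting a column, it can help to count how many entries are affected. The following is a minimal sketch with toy values; count_non_numeric is a hypothetical helper and is not part of the report's code.

```r
# Hypothetical helper: count entries that are neither missing nor empty
# but still fail to convert to a number.
count_non_numeric <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  sum(!is.na(x) & x != "" & is.na(parsed))
}

# Toy column: one entry ("WNW") is not a number.
toy_column <- c("1", "0", "WNW", "", NA, "2.5")
n_bad <- count_non_numeric(toy_column)
print(n_bad)
```

A nonzero count suggests inspecting the offending rows before trusting the totals.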

RESULTS - PART 1

Next, I totaled fatalities by event type and printed the 15 event types with the highest totals.

stormData <- stormData %>%
  mutate(FATALITIES = as.numeric(FATALITIES))  # Ensure numeric
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `FATALITIES = as.numeric(FATALITIES)`.
## Caused by warning:
## ! NAs introduced by coercion
fatalStormData <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE))

fatalStormDataOrdered <- fatalStormData %>%
  arrange(desc(Total_Fatalities))
print(fatalStormDataOrdered[1:15, ])
## # A tibble: 15 × 2
##    EVTYPE            Total_Fatalities
##    <chr>                        <dbl>
##  1 TORNADO                      5618 
##  2 HEAT                         3108 
##  3 FLOOD                        1538 
##  4 WINTER WEATHER                999 
##  5 LIGHTNING                     816 
##  6 N                             812 
##  7 THUNDERSTORM WIND             701 
##  8 RIP CURRENTS                  572 
##  9 WNW                           560.
## 10 S                             544 
## 11 NW                            500.
## 12 WSW                           491 
## 13 SSE                           425 
## 14 STRONG WIND                   409 
## 15 NNE                           262

Next, I totaled injuries by event type and printed the 15 event types with the highest totals.

stormData <- stormData %>%
  mutate(INJURIES = as.numeric(INJURIES))  # Convert to numeric
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `INJURIES = as.numeric(INJURIES)`.
## Caused by warning:
## ! NAs introduced by coercion
injuredStormData <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Injured = sum(INJURIES, na.rm = TRUE))

injuredStormDataOrdered<-injuredStormData %>%
  arrange(desc(Total_Injured))
print(injuredStormDataOrdered[1:15, ])
## # A tibble: 15 × 2
##    EVTYPE            Total_Injured
##    <chr>                     <dbl>
##  1 TORNADO                   90671
##  2 THUNDERSTORM WIND          9353
##  3 HEAT                       9089
##  4 FLOOD                      8674
##  5 WINTER WEATHER             5907
##  6 LIGHTNING                  5230
##  7 STRONG WIND                1805
##  8 HURRICANE/TYPHOON          1661
##  9 WILD/FOREST FIRE           1456
## 10 HAIL                       1361
## 11 FOG                         734
## 12 RIP CURRENTS                529
## 13 DUST STORM                  440
## 14 DENSE FOG                   342
## 15 HEAVY RAIN                  251

The event type that is responsible for the most fatalities and injuries is the tornado. After this, the events most responsible for fatality are heat, flood, winter weather, and lightning, while those most responsible for injury are thunderstorm wind, heat, flood, and winter weather.

RESULTS - PART 2

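Next, I totaled property damage by event type and printed the 15 event types with the highest totals. These totals sum the raw PROPDMG magnitudes, without the PROPDMGEXP multiplier computed in DATA PROCESSING - PART 2.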

# Convert PROPDMG to numeric
stormData <- stormData %>%
  mutate(PROPDMG = as.numeric(PROPDMG))  
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `PROPDMG = as.numeric(PROPDMG)`.
## Caused by warning:
## ! NAs introduced by coercion
# Now perform aggregation
propdmgStormDataOrdered <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Prop_Damage = sum(PROPDMG, na.rm = TRUE)) %>%
  arrange(desc(Total_Prop_Damage))

# Print top 15 results
print(propdmgStormDataOrdered[1:15,])
## # A tibble: 15 × 2
##    EVTYPE            Total_Prop_Damage
##    <chr>                         <dbl>
##  1 TORNADO                    3208665.
##  2 THUNDERSTORM WIND          2658988.
##  3 FLOOD                      2450035 
##  4 HAIL                        687863.
##  5 LIGHTNING                   603352.
##  6 STRONG WIND                 446001.
##  7 WINTER WEATHER              375942.
##  8 WILD/FOREST FIRE            123804.
##  9 HURRICANE/TYPHOON            69777.
## 10 HEAVY RAIN                   50842.
## 11 STORM SURGE                  19393.
## 12 LANDSLIDE                    18962.
## 13 LAKE-EFFECT SNOW             14141 
## 14 WATERSPOUT                    9354.
## 15 FOG                           8850.

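Next, I totaled crop damage by event type and printed the 15 event types with the highest totals. These totals sum the raw CROPDMG magnitudes, without the CROPDMGEXP multiplier computed in DATA PROCESSING - PART 2.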

# Convert CROPDMG to numeric
stormData <- stormData %>%
  mutate(CROPDMG = as.numeric(CROPDMG))  
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `CROPDMG = as.numeric(CROPDMG)`.
## Caused by warning:
## ! NAs introduced by coercion
# Now perform aggregation
cropdmgStormDataOrdered <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Crop_Damage = sum(CROPDMG, na.rm = TRUE)) %>%
  arrange(desc(Total_Crop_Damage))

# Print top 15 results
print(cropdmgStormDataOrdered[1:15,])
## # A tibble: 15 × 2
##    EVTYPE            Total_Crop_Damage
##    <chr>                         <dbl>
##  1 HAIL                        579596.
##  2 FLOOD                       365338.
##  3 THUNDERSTORM WIND           194679.
##  4 TORNADO                     100021.
##  5 DROUGHT                      33899.
##  6 STRONG WIND                  20960.
##  7 HURRICANE/TYPHOON            16037.
##  8 WINTER WEATHER               12792.
##  9 HEAVY RAIN                   11123.
## 10 WILD/FOREST FIRE              8554.
## 11 FROST/FREEZE                  7034.
## 12 TSTM WIND/HAIL                4357.
## 13 LIGHTNING                     3581.
## 14 HIGH WINDS/COLD               2005 
## 15 SMALL HAIL                    1732.

The event type that is responsible for the most property damage is the tornado. After this, the events most responsible for property damage are thunderstorm wind, flood, hail, and lightning.

The event type most responsible for crop damage is hail. After this, the event types most responsible for crop damage are flood, thunderstorm wind, tornado, and drought.
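
The two tables in this part add up the PROPDMG and CROPDMG magnitudes as recorded. A ranking by dollar amount would multiply each magnitude by its multiplier before summing. The following is a minimal sketch of that variant on toy data; the data frame toy and its columns multiplier and prop_dollars are hypothetical, and the numbers are made up for illustration.

```r
# Toy data: a magnitude and an already-decoded multiplier per event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 1.5, 10, 500),
  multiplier = c(1e3, 1e9, 1e6, 1e3)
)

# Dollar amount per event, then total per event type, largest first.
toy$prop_dollars <- toy$PROPDMG * toy$multiplier
totals <- aggregate(prop_dollars ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$prop_dollars), ]
print(totals)
```

In this toy example the flood ranks first even though its magnitude (1.5) is the smallest, which shows why the multiplier can change the ordering.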

PLOT OF PROPERTY DAMAGE BY CALENDAR YEAR

stormData[,42]<-as.numeric(format(stormData[,2],"%Y"))
propDmgPerYear<-aggregate(stormData[39], by=stormData[42], FUN=sum, na.rm=TRUE)  # skip NA damage values, as in the tables above
propDmgPerYear[,2]<-log10(propDmgPerYear[,2])
plot(propDmgPerYear, main="log10 Property Damage by Year - Plot", xlab = "Year", ylab = "log10 Property Damage (in Dollars)", type="l")
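
Exponential growth appears as a straight line on a log10 scale, so one way to check the trend is to fit a line to log10 damage against year. The following is a sketch on synthetic data that grows by 20% per year; it does not use the storm data, and the variable names are chosen for illustration.

```r
# Synthetic series: damage grows by a factor of 1.2 each year.
years  <- 2000:2009
damage <- 1e6 * 1.2^(years - 2000)

# Fit a straight line on the log10 scale.
fit <- lm(log10(damage) ~ years)

# The slope is log10 of the yearly growth factor.
growth_factor <- 10^coef(fit)[["years"]]
print(growth_factor)
```

On real data the points will scatter around the line, and the fitted slope gives an average yearly growth factor.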

PLOT OF PROPERTY DAMAGE BY MONTH OF THE YEAR

stormData[,43]<-as.numeric(format(stormData[,2],"%m"))
propDmgPerMonth<-aggregate(stormData[39], by=stormData[43], FUN=sum, na.rm=TRUE)  # skip NA damage values, as in the tables above
propDmgPerMonth[,2]<-propDmgPerMonth[,2]/10^9
plot(propDmgPerMonth, main="Property Damage by Month - Plot", xlab = "Month of Year", ylab = "Property Damage (in Billions of Dollars)", type="l")
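
The month with the greatest damage can also be read off numerically instead of from the plot. The following is a minimal sketch on toy data, using the same date format as the report. The values are made up and the variable names are hypothetical.

```r
# Toy events: dates in month/day/year form and a damage amount in dollars.
dates <- as.Date(c("1/15/2001", "1/20/2005", "7/4/2003"), format = "%m/%d/%Y")
dmg   <- c(2e9, 1e9, 5e8)

# Extract the month number and label it with the built-in month abbreviations.
month_num <- as.integer(format(dates, "%m"))
month_fac <- factor(month_num, levels = 1:12, labels = month.abb)

# Total damage per month; months without events come out as NA.
by_month   <- tapply(dmg, month_fac, sum)
peak_month <- names(which.max(by_month))
print(peak_month)
```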