Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The storm data covers events from 1950 through November 2011. Weather events affect the economy on two main fronts: crops and property. Crops are damaged most by drought, followed by floods and ice storms, while property damage is dominated by floods, followed by hurricanes/typhoons, tornadoes, and storm surge. Tornadoes are by far the leading cause of injuries and fatalities.

Data Processing

The data set contains 902,297 observations, each with 37 features. The event type feature (EVTYPE) contains 985 unique values. Many of these are the same event written with different capitalisation or stray punctuation, so they need to be normalised.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tm)
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
storm <- read.csv("repdata_data_StormData.csv")
dim(storm)
## [1] 902297     37
summary(storm$EVTYPE)
##    Length     Class      Mode 
##    902297 character character

Because many event types differ only in letter case or extra punctuation, the event type is stripped of punctuation and converted to lower case.

# removePunctuation() and tolower() are vectorised, so no sapply() is needed
storm$EVTYPE <- as.character(storm$EVTYPE)
storm$EVTYPE <- removePunctuation(storm$EVTYPE)
storm$EVTYPE <- tolower(storm$EVTYPE)
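
As a rough illustration of why this step reduces the number of distinct event types, the sketch below applies the same two transformations to a few made-up labels. The labels are hypothetical and not taken from the data set, and base R's gsub() with the [[:punct:]] class is used here as a stand-in for removePunctuation(), on the assumption that both strip the same characters for simple inputs like these.

```r
# Hypothetical raw labels (illustrative only, not from the storm data)
raw <- c("TSTM WIND", "Tstm Wind", "HAIL.", "hail", "Flash Flood", "FLASH FLOOD")

# Strip punctuation, then lower-case (stand-in for removePunctuation + tolower)
clean <- tolower(gsub("[[:punct:]]+", "", raw))

length(unique(raw))    # 6 distinct labels before cleaning
length(unique(clean))  # 3 distinct labels after cleaning
```

Note that this kind of cleaning also fuses labels that contain a slash, which is why names such as hurricanetyphoon appear in the tables further down.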

Impact of event on population health

Impact on Injuries

Tornadoes cause by far the most injuries, 91,346 in total. They are followed by tstm wind (6,957), flood (6,789), excessive heat (6,525), lightning (5,230), and heat (2,100).

injuries <- storm %>% group_by(EVTYPE) %>% 
  summarise(Total_Injuries = sum(INJURIES)) %>% arrange(desc(Total_Injuries))
## `summarise()` ungrouping output (override with `.groups` argument)
injuries
## # A tibble: 874 x 2
##    EVTYPE            Total_Injuries
##    <chr>                      <dbl>
##  1 tornado                    91346
##  2 tstm wind                   6957
##  3 flood                       6789
##  4 excessive heat              6525
##  5 lightning                   5230
##  6 heat                        2100
##  7 ice storm                   1975
##  8 flash flood                 1777
##  9 thunderstorm wind           1488
## 10 hail                        1361
## # … with 864 more rows
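df <- injuries[1:10, ]
# Bar chart of the ten event types with the most injuries, largest first
g <- ggplot(df, aes(x = reorder(EVTYPE, -Total_Injuries), y = Total_Injuries))
g <- g + geom_col() + labs(x = "Event type", y = "Total injuries")
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))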
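
The table above still lists tstm wind and thunderstorm wind as separate rows, since lower-casing and punctuation removal cannot merge abbreviations. The following is a minimal sketch of how such labels could be combined, assuming the two names refer to the same kind of event. The synonym map is a hypothetical addition and not part of the analysis above; the totals are copied from the injuries table.

```r
# Totals copied from the injuries table above
inj <- data.frame(
  EVTYPE = c("tornado", "tstm wind", "flood", "thunderstorm wind"),
  Total_Injuries = c(91346, 6957, 6789, 1488),
  stringsAsFactors = FALSE
)

# Hypothetical synonym map: abbreviated label -> full label
synonyms <- c("tstm wind" = "thunderstorm wind")

# Replace abbreviated labels, then re-aggregate the totals per event type
hit <- inj$EVTYPE %in% names(synonyms)
inj$EVTYPE[hit] <- unname(synonyms[inj$EVTYPE[hit]])
merged <- aggregate(Total_Injuries ~ EVTYPE, data = inj, FUN = sum)
merged[order(-merged$Total_Injuries), ]
```

Under this assumption the combined thunderstorm wind total is 6957 + 1488 = 8445 injuries, which remains second to tornado.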

Impact on Fatalities

The five event types with the most fatalities are tornado (5,633), excessive heat (1,903), flash flood (978), heat (937), and lightning (817).

fatalities <- storm %>% group_by(EVTYPE) %>% 
  summarise(Total_fatalities = sum(FATALITIES)) %>% arrange(desc(Total_fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
fatalities
## # A tibble: 874 x 2
##    EVTYPE         Total_fatalities
##    <chr>                     <dbl>
##  1 tornado                    5633
##  2 excessive heat             1903
##  3 flash flood                 978
##  4 heat                        937
##  5 lightning                   817
##  6 tstm wind                   504
##  7 flood                       470
##  8 rip current                 368
##  9 high wind                   248
## 10 avalanche                   224
## # … with 864 more rows
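df <- fatalities[1:10, ]
# Bar chart of the ten event types with the most fatalities, largest first
g <- ggplot(df, aes(x = reorder(EVTYPE, -Total_fatalities), y = Total_fatalities))
g <- g + geom_col() + labs(x = "Event type", y = "Total fatalities")
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))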

Impact of event on US economy

Weather events mainly destroy property and crops. The damage amounts are stored as numeric values in the CROPDMG and PROPDMG features, and their magnitudes are stored in CROPDMGEXP and PROPDMGEXP respectively. Each EXP value is a symbol for hundreds (H), thousands (K), millions (M), or billions (B). Multiplying the numeric value by the corresponding power of ten gives a quantitative measure of the economic damage.

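# Map an exponent symbol to its power of ten; any other value maps to 0 (10^0 = 1)
get_numeric_exp <- function(x) {
  if (x == 'h' || x == 'H') {
    return(2)   # hundreds
  } else if (x == 'k' || x == 'K') {
    return(3)   # thousands
  } else if (x == 'm' || x == 'M') {
    return(6)   # millions
  } else if (x == 'b' || x == 'B') {
    return(9)   # billions
  } else {
    return(0)
  }
}

# Multiplier (a power of ten) for every element of an exponent column
find_exp <- function(x) {
  return(10 ^ sapply(x, get_numeric_exp))
}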
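
The same mapping can also be written as a vectorised lookup, which avoids calling a function once per row. The sketch below is one possible alternative, not the approach used in this report: exp_power and exp_multiplier are hypothetical names, and the sample values are made up to show the arithmetic.

```r
# Named lookup: exponent symbol -> power of ten (same mapping as get_numeric_exp)
exp_power <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: multiplier for a whole character vector of symbols
exp_multiplier <- function(sym) {
  p <- exp_power[toupper(sym)]  # NA for symbols that are not in the table
  p[is.na(p)] <- 0              # unknown or empty symbols -> 10^0 = 1
  unname(10 ^ p)
}

# Illustrative values only (not rows from the storm data)
dmg <- c(25, 2.5, 10, 3)
sym <- c("K", "m", "", "B")
dmg * exp_multiplier(sym)
```

Here 25 with K becomes 25,000, 2.5 with m becomes 2,500,000, 10 with an empty symbol stays 10, and 3 with B becomes 3,000,000,000.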

storm$crop_damage <- storm$CROPDMG * find_exp(storm$CROPDMGEXP)
storm$prop_damage <- storm$PROPDMG * find_exp(storm$PROPDMGEXP)
crops_damage <- storm %>% group_by(EVTYPE) %>% 
  summarise(Total_crop_damage = sum(crop_damage)) %>% arrange(desc(Total_crop_damage))
## `summarise()` ungrouping output (override with `.groups` argument)
crops_damage
## # A tibble: 874 x 2
##    EVTYPE           Total_crop_damage
##    <chr>                        <dbl>
##  1 drought                13972566000
##  2 flood                   5661968450
##  3 river flood             5029459000
##  4 ice storm               5022113500
##  5 hail                    3025954473
##  6 hurricane               2741910000
##  7 hurricanetyphoon        2607872800
##  8 flash flood             1421317100
##  9 extreme cold            1312973000
## 10 frostfreeze             1094186000
## # … with 864 more rows
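df <- crops_damage[1:10, ]
# Bar chart of the ten event types with the most crop damage, largest first
g <- ggplot(df, aes(x = reorder(EVTYPE, -Total_crop_damage), y = Total_crop_damage))
g <- g + geom_col() + labs(x = "Event type", y = "Total crop damage")
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))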

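Floods have the highest impact on property, with a total of about 144.7 billion in damage, followed by hurricanes/typhoons, tornadoes, and storm surge.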

prop_damage <- storm %>% group_by(EVTYPE) %>% 
  summarise(Total_prop_damage = sum(prop_damage)) %>% arrange(desc(Total_prop_damage))
## `summarise()` ungrouping output (override with `.groups` argument)
prop_damage
## # A tibble: 874 x 2
##    EVTYPE           Total_prop_damage
##    <chr>                        <dbl>
##  1 flood                144657709807 
##  2 hurricanetyphoon      69305840000 
##  3 tornado               56937160779.
##  4 storm surge           43323536000 
##  5 flash flood           16141312067.
##  6 hail                  15732267543.
##  7 hurricane             11868319010 
##  8 tropical storm         7703890550 
##  9 winter storm           6688497251 
## 10 high wind              5270046295 
## # … with 864 more rows
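df <- prop_damage[1:10, ]
# Bar chart of the ten event types with the most property damage, largest first
g <- ggplot(df, aes(x = reorder(EVTYPE, -Total_prop_damage), y = Total_prop_damage))
g <- g + geom_col() + labs(x = "Event type", y = "Total property damage")
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))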

Results

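Crops are damaged most by drought, followed by floods; ice storms also rank among the top causes of crop damage. Property is damaged most by floods, followed by hurricanes/typhoons, tornadoes, and storm surge. For population health, tornadoes cause the most injuries and the most fatalities.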