Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The storm data covers events from 1950 through November 2011. Crops and property are the two economic fronts affected by weather events. Drought and flood do the most damage to crops, with ice storms also ranking high, while property is damaged most by floods, hurricanes/typhoons, tornadoes, and storm surges. Tornadoes are by far the leading cause of harm to population health.
The data set contains 902,297 observations, each with 37 features. The event type feature (EVTYPE) contains 985 unique values. Many of these are duplicates that differ only in letter case or punctuation, so they need to be normalized.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
storm <- read.csv("repdata_data_StormData.csv")
dim(storm)
## [1] 902297 37
summary(storm$EVTYPE)
## Length Class Mode
## 902297 character character
Because many event type names differ only in letter case or stray punctuation, the event types are stripped of punctuation and converted to lower case.
# removePunctuation() and tolower() are vectorized, so sapply() is not needed
storm$EVTYPE <- as.character(storm$EVTYPE)
storm$EVTYPE <- tolower(removePunctuation(storm$EVTYPE))
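As a rough illustration of what this cleaning step does, the sketch below applies the same two transformations to a handful of made-up labels using only base R. The sample vector `raw_types` is hypothetical, and `gsub` with the `[[:punct:]]` class is used as a stand-in for `removePunctuation`; it is assumed to behave the same way on plain ASCII labels like these.

```r
# Hypothetical sample labels; the real values come from storm$EVTYPE
raw_types <- c("TORNADO", "Tornado", "TSTM WIND", "Hurricane/Typhoon", "Frost/Freeze")

# Delete punctuation (no space is substituted), then lower-case
clean_types <- tolower(gsub("[[:punct:]]", "", raw_types))

clean_types
length(unique(raw_types))    # distinct labels before cleaning
length(unique(clean_types))  # distinct labels after cleaning
```

Because punctuation is deleted rather than replaced with a space, a label such as Hurricane/Typhoon collapses to hurricanetyphoon, which is why that spelling appears in the result tables.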
Tornadoes caused the most injuries, 91,346 in total. They are followed by tstm wind, flood, excessive heat, lightning, and heat.
injuries <- storm %>% group_by(EVTYPE) %>%
summarise(Total_Injuries = sum(INJURIES)) %>% arrange(desc(Total_Injuries))
## `summarise()` ungrouping output (override with `.groups` argument)
injuries
## # A tibble: 874 x 2
## EVTYPE Total_Injuries
## <chr> <dbl>
## 1 tornado 91346
## 2 tstm wind 6957
## 3 flood 6789
## 4 excessive heat 6525
## 5 lightning 5230
## 6 heat 2100
## 7 ice storm 1975
## 8 flash flood 1777
## 9 thunderstorm wind 1488
## 10 hail 1361
## # … with 864 more rows
df <- injuries[1:10, ]
g <- ggplot(df, aes(x=EVTYPE, y=Total_Injuries))
g <- g + geom_count()
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
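A bar chart ordered by total may be easier to read than point markers for this kind of ranking. The sketch below is one possible alternative, not the plot used in this report. It rebuilds the top five rows of the injuries table by hand (the data frame `top_injuries` is a hypothetical name) and assumes ggplot2 is installed.

```r
library(ggplot2)

# Top five rows copied from the injuries table above
top_injuries <- data.frame(
  EVTYPE = c("tornado", "tstm wind", "flood", "excessive heat", "lightning"),
  Total_Injuries = c(91346, 6957, 6789, 6525, 5230)
)

# reorder() sorts the bars by total; geom_col() takes bar height from y
p <- ggplot(top_injuries,
            aes(x = reorder(EVTYPE, -Total_Injuries), y = Total_Injuries)) +
  geom_col() +
  labs(x = "Event type", y = "Total injuries") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

print(p)
```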
### Impact on Fatalities
The five event types with the most fatalities are tornado (5,633), excessive heat (1,903), flash flood (978), heat (937), and lightning (817).
fatalities <- storm %>% group_by(EVTYPE) %>%
summarise(Total_fatalities = sum(FATALITIES)) %>% arrange(desc(Total_fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
fatalities
## # A tibble: 874 x 2
## EVTYPE Total_fatalities
## <chr> <dbl>
## 1 tornado 5633
## 2 excessive heat 1903
## 3 flash flood 978
## 4 heat 937
## 5 lightning 817
## 6 tstm wind 504
## 7 flood 470
## 8 rip current 368
## 9 high wind 248
## 10 avalanche 224
## # … with 864 more rows
df <- fatalities[1:10, ]
g <- ggplot(df, aes(x=EVTYPE, y=Total_fatalities))
g <- g + geom_count()
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
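The injuries and fatalities summaries above follow the same pattern: group by event type, sum one column, and sort in descending order. As a sketch, that pattern could be wrapped in a small helper. The function name `top_events` and the toy data frame are hypothetical, and base R's `tapply` is used here instead of dplyr only to keep the example self-contained.

```r
top_events <- function(data, value_col, n = 10) {
  # Sum the chosen column within each event type
  totals <- tapply(data[[value_col]], data$EVTYPE, sum)
  # Largest totals first, keeping at most n event types
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data standing in for the cleaned storm data frame
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "heat", "flood"),
  INJURIES   = c(10, 3, 5, 2, 1),
  FATALITIES = c(2, 0, 1, 4, 0)
)

top_events(toy, "INJURIES", n = 2)
top_events(toy, "FATALITIES", n = 2)
```

The same helper would apply to any other numeric column that is summed per event type.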
Weather events also destroy property and crops. The damage amounts are recorded as numeric values in the CROPDMG and PROPDMG features, and the corresponding magnitudes are given in CROPDMGEXP and PROPDMGEXP respectively. Each EXP value is a symbol for a power of ten: H for hundred, K for thousand, M for million, and B for billion. Multiplying the numeric value by that power of ten gives the quantitative value used to measure the economic damage.
get_numeric_exp <- function(x) {
if (x == 'h' | x == 'H') {
return(2)
} else if (x == 'k' | x == 'K') {
return(3)
} else if (x == 'm' | x == 'M') {
return(6)
} else if (x == 'b' | x == 'B') {
return(9)
} else {
return(0)
}
}
find_exp <- function(x) {
return(10 ** sapply(x, get_numeric_exp))
}
storm$crop_damage <- storm$CROPDMG * find_exp(storm$CROPDMGEXP)
storm$prop_damage <- storm$PROPDMG * find_exp(storm$PROPDMGEXP)
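The same mapping can also be written as a lookup table, which avoids calling a function once per row. This is a minimal sketch on a small made-up sample; `exp_lookup` and `exp_multiplier` are hypothetical names, and, like the function above, it assumes that any code other than H, K, M, or B means a multiplier of 1.

```r
# Power of ten for each recognised exponent code
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

exp_multiplier <- function(codes) {
  # Look up by lower-cased code; unrecognised codes come back as NA
  powers <- unname(exp_lookup[tolower(as.character(codes))])
  powers[is.na(powers)] <- 0  # treat everything else as 10^0 = 1
  10^powers
}

# Hypothetical sample rows: damage value and its exponent code
dmg  <- c(25, 2.5, 1.2, 10, 5)
code <- c("K", "M", "B", "", "h")

dmg * exp_multiplier(code)
```

Codes outside this set are silently treated as a multiplier of 1, so it may be worth tabulating the EXP columns to see how many rows that fallback touches.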
crops_damage <- storm %>% group_by(EVTYPE) %>%
summarise(Total_crop_damage = sum(crop_damage)) %>% arrange(desc(Total_crop_damage))
## `summarise()` ungrouping output (override with `.groups` argument)
crops_damage
## # A tibble: 874 x 2
## EVTYPE Total_crop_damage
## <chr> <dbl>
## 1 drought 13972566000
## 2 flood 5661968450
## 3 river flood 5029459000
## 4 ice storm 5022113500
## 5 hail 3025954473
## 6 hurricane 2741910000
## 7 hurricanetyphoon 2607872800
## 8 flash flood 1421317100
## 9 extreme cold 1312973000
## 10 frostfreeze 1094186000
## # … with 864 more rows
df <- crops_damage[1:10, ]
g <- ggplot(df, aes(x=EVTYPE, y=Total_crop_damage))
g <- g + geom_count()
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Flood has the highest impact on property, followed by hurricane/typhoon, tornado, and storm surge.
prop_damage <- storm %>% group_by(EVTYPE) %>%
summarise(Total_prop_damage = sum(prop_damage)) %>% arrange(desc(Total_prop_damage))
## `summarise()` ungrouping output (override with `.groups` argument)
prop_damage
## # A tibble: 874 x 2
## EVTYPE Total_prop_damage
## <chr> <dbl>
## 1 flood 144657709807
## 2 hurricanetyphoon 69305840000
## 3 tornado 56937160779.
## 4 storm surge 43323536000
## 5 flash flood 16141312067.
## 6 hail 15732267543.
## 7 hurricane 11868319010
## 8 tropical storm 7703890550
## 9 winter storm 6688497251
## 10 high wind 5270046295
## # … with 864 more rows
df <- prop_damage[1:10, ]
g <- ggplot(df, aes(x=EVTYPE, y=Total_prop_damage))
g <- g + geom_count()
g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Crops are damaged mostly by drought, followed by flood. Property is damaged mostly by floods, followed by hurricanes/typhoons, tornadoes, and storm surges.