Analysis of NOAA Storm database to assess the impact of severe weather events on public health and economy


1. Synopsis
NOAA is the National Oceanic and Atmospheric Administration of the United States, which publishes weather data and research in its field. Its public storm database records weather data per storm event dating back over 50 years. These data show which weather events have occurred and what consequences they had for the safety and well-being of the surrounding communities. Damage to people, property, and crops can be summarised and visualised by storm event type.

In the dataset provided, some event types proved to be more dangerous than others. The weather event most harmful to public health is the tornado: after aggregating and visualising the data, tornadoes account for the highest number of fatalities. For economic damage, the most costly events are floods, droughts and hurricanes, but for different reasons. For example, the biggest risk to crops is drought, whereas the biggest threat to property is flooding.


2. Data Processing
2.1 Loading dataset and libraries

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.3
df <- read.csv('repdata_data_StormData.csv')

2.2 Selecting important attributes

new_df <- select(df,'EVTYPE','FATALITIES','INJURIES','PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')

2.3 Cleaning the dataset

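# Translate each PROPDMGEXP code into a numeric multiplier (PROPEXP).
# Letters are magnitudes (H = 1e2, K = 1e3, M = 1e6, B = 1e9); digits are powers of ten.
new_df$PROPEXP[new_df$PROPDMGEXP == "K"] <- 1000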
new_df$PROPEXP[new_df$PROPDMGEXP == "M"] <- 1e+06
new_df$PROPEXP[new_df$PROPDMGEXP == ""] <- 1
new_df$PROPEXP[new_df$PROPDMGEXP == "B"] <- 1e+09
new_df$PROPEXP[new_df$PROPDMGEXP == "m"] <- 1e+06
new_df$PROPEXP[new_df$PROPDMGEXP == "0"] <- 1
new_df$PROPEXP[new_df$PROPDMGEXP == "5"] <- 1e+05
new_df$PROPEXP[new_df$PROPDMGEXP == "6"] <- 1e+06
new_df$PROPEXP[new_df$PROPDMGEXP == "4"] <- 10000
new_df$PROPEXP[new_df$PROPDMGEXP == "2"] <- 100
new_df$PROPEXP[new_df$PROPDMGEXP == "3"] <- 1000
new_df$PROPEXP[new_df$PROPDMGEXP == "h"] <- 100
new_df$PROPEXP[new_df$PROPDMGEXP == "7"] <- 1e+07
new_df$PROPEXP[new_df$PROPDMGEXP == "H"] <- 100
new_df$PROPEXP[new_df$PROPDMGEXP == "1"] <- 10
new_df$PROPEXP[new_df$PROPDMGEXP == "8"] <- 1e+08

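# Codes with no clear meaning ("+", "-", "?") get a multiplier of 0,
# so those rows contribute nothing to the damage totals.
new_df$PROPEXP[new_df$PROPDMGEXP == "+"] <- 0
new_df$PROPEXP[new_df$PROPDMGEXP == "-"] <- 0
new_df$PROPEXP[new_df$PROPDMGEXP == "?"] <- 0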

new_df$PROPDMGVAL <- new_df$PROPDMG * new_df$PROPEXP

# Same translation for the crop damage exponent codes (CROPDMGEXP -> CROPEXP).
new_df$CROPEXP[new_df$CROPDMGEXP == "M"] <- 1e+06
new_df$CROPEXP[new_df$CROPDMGEXP == "K"] <- 1000
new_df$CROPEXP[new_df$CROPDMGEXP == "m"] <- 1e+06
new_df$CROPEXP[new_df$CROPDMGEXP == "B"] <- 1e+09
new_df$CROPEXP[new_df$CROPDMGEXP == "0"] <- 1
new_df$CROPEXP[new_df$CROPDMGEXP == "k"] <- 1000
new_df$CROPEXP[new_df$CROPDMGEXP == "2"] <- 100
new_df$CROPEXP[new_df$CROPDMGEXP == ""] <- 1
new_df$CROPEXP[new_df$CROPDMGEXP == "?"] <- 0
new_df$CROPDMGVAL <- new_df$CROPDMG * new_df$CROPEXP
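
The code-by-code assignments above could also be written as one reusable function. The following is only a sketch of that alternative, not the method used in this report: the helper name exp_to_multiplier and the toy data frame are hypothetical, and the sketch assumes that digit codes always mean a power of ten, which matches the values assigned above. Codes it does not recognise are left as NA.

```r
# Hypothetical helper: turn damage exponent codes into numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))           # handles factors and lower-case codes
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])                  # NA for anything not in the lookup
  digit <- grepl("^[0-9]$", code)
  if (any(digit)) mult[digit] <- 10^as.numeric(code[digit])  # "5" -> 1e5
  mult[code == ""] <- 1                         # no code: use the value as given
  mult[code %in% c("+", "-", "?")] <- 0         # unclear codes contribute nothing
  mult
}

# Toy data with made-up values, standing in for new_df.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3, 7),
                  PROPDMGEXP = c("K", "m", "B", "5", "?"),
                  stringsAsFactors = FALSE)
toy$PROPDMGVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

On the real data the same function could be applied to both PROPDMGEXP and CROPDMGEXP, which keeps the two mappings consistent.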

2.4 Top 10 event types by fatalities

fatal <- aggregate(FATALITIES~EVTYPE,new_df,sum)
fatal <- fatal[order(-fatal$FATALITIES),]
top_fatal <- fatal[1:10,]
top_fatal
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
barplot(top_fatal$FATALITIES, las = 3, names.arg = top_fatal$EVTYPE, main = "Top 10 Fatalities by Weather Events", ylab = "Total Fatalities", col = "green")
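
The INJURIES column was selected in 2.2 but is not used in the ranking above. As a rough sketch of how injuries could be included, the example below aggregates both columns at once and ranks by their sum. The toy data frame and its values are made up for illustration, and adding fatalities and injuries with equal weight is an assumption, not a choice made in this report.

```r
# Toy data with made-up values, standing in for new_df.
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 1, 0),
  INJURIES   = c(10, 0, 5, 2),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in a single aggregate() call.
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy_health, sum)

# Assumption: fatalities and injuries are weighted equally in the total.
harm$TOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$TOTAL), ]
head(harm, 10)
```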

2.5 Top 10 economic damages

property_damage <- aggregate(PROPDMGVAL~EVTYPE,new_df,sum)
property_damage <- property_damage[order(-property_damage$PROPDMGVAL),]
property_damage <- property_damage[1:10,]
property_damage
##                EVTYPE   PROPDMGVAL
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260
barplot(property_damage$PROPDMGVAL/(10^9), las = 3, names.arg = property_damage$EVTYPE, main = "Top 10 Property Damages by Weather Events", ylab = "In Billions", col = "blue")
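
The CROPDMGVAL column computed in 2.3 can be ranked in the same way as property damage. The sketch below shows the pattern on a toy data frame with made-up values, so its output says nothing about the real dataset. It uses head() rather than [1:10,] as a small design choice: head() does not pad the result with NA rows when fewer than 10 event types are present.

```r
# Toy data with made-up values, standing in for new_df.
toy_df <- data.frame(
  EVTYPE     = c("DROUGHT", "FLOOD", "DROUGHT", "HAIL", "FLOOD"),
  CROPDMGVAL = c(5e6, 2e6, 1e6, 5e5, 1e6),
  stringsAsFactors = FALSE
)

# Total crop damage per event type, largest first.
crop_damage <- aggregate(CROPDMGVAL ~ EVTYPE, toy_df, sum)
crop_damage <- crop_damage[order(-crop_damage$CROPDMGVAL), ]
top_crop <- head(crop_damage, 10)
top_crop
```

With the real data, replacing toy_df with new_df should give the crop damage counterpart of the property damage table above.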


3. Conclusion
The events that caused the most fatalities are tornado, excessive heat, flash flood, heat and lightning. The events that caused the most property damage are flood, hurricane/typhoon, tornado, storm surge, flash flood and hail.