Synopsis

Climatic events differ widely in their consequences for public health and the economy: some have severe effects, while others are negligible. It is therefore important to study how these weather events affect health and the economy, so that precautions can be taken.

In this case study, we use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Loading the required libraries

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Loading Data

The data is loaded into R only if stormData does not already exist in the workspace.

if(!exists("stormData")){
    stormData <- read.csv(file="repdata_data_StormData.csv.bz2")    
}
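
read.csv can be pointed at the .bz2 file directly because R's file connections detect bzip2 compression when reading. A minimal sketch of that behaviour on a throwaway file (demo_path and demo_storms are hypothetical names, and the two rows are made up for illustration):

```r
# Write a tiny CSV through a bzip2 connection, then read it back with read.csv
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# No explicit decompression step is needed when reading
demo_storms <- read.csv(file = demo_path)
print(demo_storms)
```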

Initial exploration of stormData

dim(stormData)
## [1] 902297     37
str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Discovering relations between the various Events and Public Health and Economy

The event type is recorded in the EVTYPE column.

Impact on Health

For the health impact, we consider the following columns:

  1. FATALITIES: Number of deaths (approx.)
  2. INJURIES: Number of injured people (approx.)

Impact on Economy

For the economic impact, we consider the following columns:

  1. PROPDMG: Property Damages
  2. PROPDMGEXP: Property Damage Unit Value
  3. CROPDMG: Crop Damages
  4. CROPDMGEXP: Crop Damage Unit Value

Cleaning the Dataset

The above-mentioned columns are extracted from stormData to get a smaller, focused dataset.

req <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
req_stormData <- stormData[, req]
dim(req_stormData)
## [1] 902297      7

The economic damage is recorded in two parts each for property and crops. PROPDMG and CROPDMG hold the amounts, while PROPDMGEXP and CROPDMGEXP hold a code for the power of 10 by which the corresponding amount is multiplied.

The characters in PROPDMGEXP and CROPDMGEXP both follow the convention stated below:

  • H/h = Hundred dollars (10^2)
  • K/k = Thousand dollars (10^3)
  • M/m = Million dollars (10^6)
  • B/b = Billion dollars (10^9)
  • any other code counts as a dollar each (10^0)

# function to replace each exponent code with its power of ten
exp_unit <- function(dmgexp){
    dmgexp <- toupper(as.character(dmgexp))
    # start from 10^0, which covers every code other than H/K/M/B
    exponent <- rep(0, length(dmgexp))
    exponent[dmgexp == "H"] <- 2
    exponent[dmgexp == "K"] <- 3
    exponent[dmgexp == "M"] <- 6
    exponent[dmgexp == "B"] <- 9
    exponent
}

# Calculating total cost for each entry in the dataset
req_stormData <- req_stormData %>%
    mutate(propusd = PROPDMG * 10^exp_unit(PROPDMGEXP),
           cropusd = CROPDMG * 10^exp_unit(CROPDMGEXP),
           totalusd = cropusd + propusd)
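
The unit rule listed above can be checked by hand on a few values. The sketch below uses a named lookup vector instead of pattern matching; exp_lookup, to_usd and the sample amounts are hypothetical and serve only to illustrate the rule, not to replace the code in this analysis.

```r
# Hypothetical lookup table: exponent code -> power of ten
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

to_usd <- function(amount, code) {
    # Look up the power for each code, ignoring case
    power <- unname(exp_lookup[toupper(code)])
    # Codes that are not in the table (e.g. "") fall back to 10^0
    power[is.na(power)] <- 0
    amount * 10^power
}

# 25 K, 2.5 m, 1.2 B and 7 with a blank code
demo_usd <- to_usd(c(25, 2.5, 1.2, 7), c("K", "m", "B", ""))
print(demo_usd)
```

With this rule, an entry of 25 with code K is worth 25,000 dollars, while an amount with a blank code is taken at face value.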

Analysis

Going through fatalities for each Event type

# Sum the fatalities over all occurrences of each event type, highest first
total_fatalities <- req_stormData %>% select(EVTYPE, FATALITIES) %>% group_by(EVTYPE) %>%
    summarise(total_fatalities = sum(FATALITIES)) %>% arrange(desc(total_fatalities))
head(total_fatalities, 10)
## # A tibble: 10 x 2
##    EVTYPE         total_fatalities
##    <chr>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224

Going through injuries for each Event type

# Sum the injuries over all occurrences of each event type, highest first
total_injuries <- req_stormData %>% select(EVTYPE, INJURIES) %>% group_by(EVTYPE) %>%
    summarise(total_injuries = sum(INJURIES)) %>% arrange(desc(total_injuries))
head(total_injuries, 10)
## # A tibble: 10 x 2
##    EVTYPE            total_injuries
##    <chr>                      <dbl>
##  1 TORNADO                    91346
##  2 TSTM WIND                   6957
##  3 FLOOD                       6789
##  4 EXCESSIVE HEAT              6525
##  5 LIGHTNING                   5230
##  6 HEAT                        2100
##  7 ICE STORM                   1975
##  8 FLASH FLOOD                 1777
##  9 THUNDERSTORM WIND           1488
## 10 HAIL                        1361
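
The tables above list TSTM WIND and THUNDERSTORM WIND as separate event types, because EVTYPE is grouped exactly as recorded. A minimal sketch of one possible label cleanup follows. The helper normalise_evtype and its rules are assumptions for illustration, and the totals reported in this document were computed without it.

```r
# Hypothetical helper: harmonise a few obvious spelling variants of EVTYPE
normalise_evtype <- function(evtype) {
    evtype <- toupper(trimws(evtype))               # unify case, strip outer blanks
    evtype <- gsub("\\s+", " ", evtype)             # collapse repeated spaces
    evtype <- sub("^TSTM", "THUNDERSTORM", evtype)  # expand the TSTM abbreviation
    evtype
}

demo_types <- normalise_evtype(c("TSTM WIND", " thunderstorm  wind", "Flood"))
print(demo_types)
```

Applying such a step before group_by(EVTYPE) would merge the variants into one row, at the cost of having to justify each merge rule.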

Going through total USD cost for each Event type

# Sum the total cost over all occurrences of each event type, highest first
total_cost <- req_stormData %>% select(EVTYPE, totalusd) %>% group_by(EVTYPE) %>%
    summarise(total_cost = sum(totalusd, na.rm = TRUE)) %>% arrange(desc(total_cost))
head(total_cost, 10)
## # A tibble: 10 x 2
##    EVTYPE              total_cost
##    <chr>                    <dbl>
##  1 FLOOD             138007444500
##  2 HURRICANE/TYPHOON  29348167800
##  3 TORNADO            16570326363
##  4 HURRICANE          12405268000
##  5 RIVER FLOOD        10108369000
##  6 HAIL               10048596590
##  7 FLASH FLOOD         8716525177
##  8 ICE STORM           5925150850
##  9 STORM SURGE/TIDE    4641493000
## 10 THUNDERSTORM WIND   3813647990
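
The three summaries above share the same group-and-sum pattern, so they could also be computed in a single pass. A sketch on a small made-up data frame (demo_events and its values are invented for illustration; only the column names come from the analysis above):

```r
library(dplyr)

# Toy data with the same column names as req_stormData
demo_events <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
    FATALITIES = c(1, 2, 0),
    INJURIES   = c(15, 0, 2),
    totalusd   = c(25000, 2500, 1e6)
)

# One grouped summary holding all three totals per event type
demo_impact <- demo_events %>%
    group_by(EVTYPE) %>%
    summarise(total_fatalities = sum(FATALITIES),
              total_injuries   = sum(INJURIES),
              total_cost       = sum(totalusd, na.rm = TRUE)) %>%
    arrange(desc(total_fatalities))

print(demo_impact)
```

Keeping the totals in one table makes it easier to compare the health and economic rankings of an event type side by side.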

Results

Across the United States, which types of events are most harmful with respect to population health?

plot1 <- ggplot(total_injuries[1:10,], aes(x = EVTYPE, y = total_injuries)) + 
    geom_bar(stat = "identity") + 
    labs(x = "Event Type", y = "Number of Injuries", title = "Events with Highest Injuries") +
    theme(axis.text = element_text(size = 5))
plot1

plot2 <- ggplot(total_fatalities[1:10,], aes(x = EVTYPE, y = total_fatalities)) +
    geom_bar(stat = "identity") + 
    labs(x = "Event Type", y = "Number of Fatalities", title = "Events with Highest Fatalities") +
    theme(axis.text = element_text(size = 5))
plot2

As depicted by the above figures, tornadoes account for the highest number of both fatalities and injuries.
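
Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order rather than by height. A sketch of one way to sort the bars, using three fatality totals from the table above (demo_fatal, bar_order and demo_plot are hypothetical names, and the rotated labels are a styling assumption):

```r
library(ggplot2)

# Three fatality totals taken from the table above
demo_fatal <- data.frame(
    EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
    total_fatalities = c(470, 5633, 937)
)

# reorder() turns EVTYPE into a factor whose levels follow the totals;
# negating the totals puts the largest bar first
bar_order <- levels(reorder(demo_fatal$EVTYPE, -demo_fatal$total_fatalities))

demo_plot <- ggplot(demo_fatal,
                    aes(x = reorder(EVTYPE, -total_fatalities), y = total_fatalities)) +
    geom_col() +
    labs(x = "Event Type", y = "Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
demo_plot
```

Rotating the axis labels is an alternative to shrinking the font when the event names are long.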

Across the United States, which types of events have the greatest economic consequences?

# Named plot3 so that the base R function plot() is not masked
plot3 <- ggplot(total_cost[1:10,], aes(x = EVTYPE, y = total_cost/10^9)) +
    geom_bar(stat = "identity") + 
    labs(x = "Event Type", y = "Total Cost (billion USD)", title = "Events with Highest Economic Impact") +
    theme(axis.text = element_text(size = 5))
plot3

As depicted by the above figure, floods are responsible for the highest economic impact.