Course Project 2

Gilbert T. Tarus

The Weather Events Most Harmful to Population Health and the Economy

Synopsis

This project identifies the weather events most harmful to population health and the economy, based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers 1950 to 2011. To investigate these events, we use the recorded event types together with the estimates of fatalities, injuries, property damage and crop damage. From these data, we found that tornadoes and excessive heat are the most harmful to population health, while floods, hurricanes/typhoons, tornadoes and storm surges have the greatest economic consequences.

Loading and Processing the Raw Data

The data for this project was obtained from the course project’s website. It comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded using this link.

Load the required packages

library(dplyr)
library(ggplot2)
library(R.utils)

Loading the data

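The data has been downloaded to the local disk.

# Decompress the archive only if the CSV is not already on disk.
# file.exists() checks the file system; exists() would look for an R object.
if(!file.exists("stormdata.csv")){
    bunzip2("repdata_data_StormData.csv.bz2","stormdata.csv")
}
# stringsAsFactors = TRUE keeps the factor columns that later steps rely on
stormdata <- read.table("stormdata.csv",sep = ",",header = TRUE,stringsAsFactors = TRUE)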
dim(stormdata)
## [1] 902297     37
names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
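
As a side note, R can usually read a bzip2-compressed CSV without a separate decompression step, because `read.csv()` opens the path through `file()`, which detects the compression when reading. The following is a minimal sketch on a made-up two-row file; `demo_path` and the toy rows are assumptions for illustration and are not part of the storm data.

```r
# Write a tiny CSV through a bzip2 connection, then read it back directly.
# `demo_path` and the toy rows are hypothetical, for illustration only.
demo_path <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))

con <- bzfile(demo_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# file.exists() tests the file system, unlike exists(), which tests R objects.
if (file.exists(demo_path)) {
    demo <- read.csv(demo_path, stringsAsFactors = TRUE)
}
dim(demo)
```

Reading the archive directly avoids keeping a second, uncompressed copy on disk, at the cost of decompressing on every read.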

Data Processing

Select the required variables

In order to assess the health and economic effects of weather events, we select the required variables from the storm data for the analysis. The required variables are:

EVTYPE: weather event.

For Economic consequences, we select:

  1. PROPDMG: approx. property damage
  2. PROPDMGEXP: the units for property damage value
  3. CROPDMG: approx. crop damages
  4. CROPDMGEXP: the units for crop damage value

For Health effect, we select:

  • FATALITIES: approximate number of deaths
  • INJURIES: approximate number of injuries

working_data <- stormdata %>% 
    select(EVTYPE,PROPDMGEXP,PROPDMG,CROPDMG,CROPDMGEXP,FATALITIES,INJURIES)

Check missing values

# Share of missing values in each selected column
check <- numeric(ncol(working_data))
for (i in seq_len(ncol(working_data))) {
    check[i] <- mean(is.na(working_data[,i]))
}
check
## [1] 0 0 0 0 0 0 0
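
The same check can be written without a loop. The sketch below uses a made-up data frame (`toy` is an assumption, not the storm data) to show that `colMeans(is.na(...))` returns the share of missing values per column, labelled with the column names.

```r
# Toy data with one missing value per column, for illustration only
toy <- data.frame(
    EVTYPE = c("TORNADO", "HAIL", NA, "FLOOD"),
    INJURIES = c(2, NA, 0, 1)
)

# is.na() gives a logical matrix; colMeans() averages each column,
# which is the proportion of missing entries in that column.
na_share <- colMeans(is.na(toy))
na_share
```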

1. Population health

Injuries

injuries <- working_data %>% 
    select(EVTYPE,INJURIES) %>% 
    group_by(EVTYPE) %>% 
    summarise(Total.injuries = sum(INJURIES,na.rm = TRUE)) %>% 
    arrange(desc(Total.injuries))
## `summarise()` ungrouping output (override with `.groups` argument)
head(injuries,10)
## # A tibble: 10 x 2
##    EVTYPE            Total.injuries
##    <fct>                      <dbl>
##  1 TORNADO                    91346
##  2 TSTM WIND                   6957
##  3 FLOOD                       6789
##  4 EXCESSIVE HEAT              6525
##  5 LIGHTNING                   5230
##  6 HEAT                        2100
##  7 ICE STORM                   1975
##  8 FLASH FLOOD                 1777
##  9 THUNDERSTORM WIND           1488
## 10 HAIL                        1361

Fatalities

fatalities <- working_data %>% 
    select(EVTYPE,FATALITIES) %>% 
    group_by(EVTYPE) %>% 
    summarise(Total.fatalities = sum(FATALITIES,na.rm = TRUE)) %>% 
    arrange(desc(Total.fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
head(fatalities,10)
## # A tibble: 10 x 2
##    EVTYPE         Total.fatalities
##    <fct>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224
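
The two summaries above follow the same pattern, so they could also be computed in a single pass. This is a minimal sketch on a made-up data frame (`toy` and `health` are hypothetical names, and the numbers are not from the storm data); it assumes a dplyr version that supports the `.groups` argument mentioned in the message above.

```r
library(dplyr)

# Toy records, made up for illustration only
toy <- data.frame(
    EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(2, 1, 4, 0),
    INJURIES = c(10, 5, 1, 3)
)

# One grouped summary producing both totals per event type
health <- toy %>%
    group_by(EVTYPE) %>%
    summarise(
        Total.fatalities = sum(FATALITIES, na.rm = TRUE),
        Total.injuries = sum(INJURIES, na.rm = TRUE),
        .groups = "drop"
    ) %>%
    arrange(desc(Total.fatalities))
health
```

Setting `.groups = "drop"` also silences the "ungrouping output" message.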

2. Economic consequences

Property damage (PROPDMG) and crop damage (CROPDMG) are the two features that describe the economic effects of weather events. Each holds only the numeric part of the estimate; the magnitude needed to express it in USD is given by the PROPDMGEXP and CROPDMGEXP variables.

According to this link, the possible values of CROPDMGEXP and PROPDMGEXP are:

  • H, h, K, k, M, m, B, b, +, -, ?, 0, 1, 2, 3, 4, 5, 6, 7, 8 and the blank character

These levels can be interpreted as:

  • H,h = hundreds = 100
  • K,k = kilos = thousands = 1,000
  • M,m = millions = 1,000,000
  • B,b = billions = 1,000,000,000
  • (+) = 1
  • (-) = 0
  • (?) = 0
  • blank/empty character = 0
  • numeric 0..8 = 10

The damage values therefore have to be converted to a single unit (USD) by multiplying each one by the value of its symbol, using the information above.

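# PROPDMGEXP: one multiplier per symbol.
# The values must be listed in the same order as levels(df$PROPDMGEXP).
df <- working_data
symbolsP <- levels(df$PROPDMGEXP)
PropdmgexpSV <- c(0,0,0,1,rep(10,9),10^9,100,100,1000,10^6,10^6)
value.data <- data.frame(PROPDMGEXP=symbolsP,PropdmgexpSV)
# Merge with df on PROPDMGEXP to attach the multiplier to every row
df <- merge(df,value.data)

# CROPDMGEXP: same approach, in the order of levels(df$CROPDMGEXP)
symbolsC <- levels(df$CROPDMGEXP)
CropdmgexpSV <- c(0,0,10,10,10^9,10^3,10^3,10^6,10^6)
value.data2 <- data.frame(CROPDMGEXP = symbolsC,CropdmgexpSV)

# Merge with df on CROPDMGEXP
df <- merge(df,value.data2)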

# Create a data frame with total value of economic damage
df <- df %>% 
    mutate(
        PropdmgV = PROPDMG*PropdmgexpSV,
        CropdmgV = CROPDMG*CropdmgexpSV,
        Total.damage = PropdmgV+CropdmgV
        )
head(df)
##   CROPDMGEXP PROPDMGEXP    EVTYPE PROPDMG CROPDMG FATALITIES INJURIES
## 1                       TSTM WIND       0       0          0        0
## 2                       TSTM WIND       0       0          0        0
## 3                       TSTM WIND       0       0          0        0
## 4                       TSTM WIND       0       0          0        0
## 5                       TSTM WIND       0       0          0        0
## 6                            HAIL       0       0          0        0
##   PropdmgexpSV CropdmgexpSV PropdmgV CropdmgV Total.damage
## 1            0            0        0        0            0
## 2            0            0        0        0            0
## 3            0            0        0        0            0
## 4            0            0        0        0            0
## 5            0            0        0        0            0
## 6            0            0        0        0            0
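
The positional vectors above only work if the multipliers are typed in exactly the order of the factor levels. One possible alternative, sketched here under the interpretation listed earlier, is a lookup vector keyed by the symbol itself. The names `exp_value` and `to_multiplier` and the toy rows are hypothetical, and treating any unlisted symbol as 0 is an assumption of this sketch.

```r
# Multipliers keyed by symbol, following the interpretation listed above.
exp_value <- c(
    H = 100, h = 100, K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9, b = 1e9,
    "+" = 1, "-" = 0, "?" = 0,
    setNames(rep(10, 9), as.character(0:8))
)

# Hypothetical helper: look up each symbol by name.
# Assumption: blank, missing or unlisted symbols map to 0.
to_multiplier <- function(symbol) {
    value <- unname(exp_value[as.character(symbol)])
    value[is.na(value)] <- 0
    value
}

# Toy rows, made up for illustration: 25 with "K" becomes 25 * 1,000 USD
toy <- data.frame(
    PROPDMG = c(25, 2.5, 10, 7),
    PROPDMGEXP = c("K", "M", "", "5")
)
toy$PropdmgV <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP)
toy
```

Because the lookup is by name, it does not depend on how the factor levels happen to be sorted, and it works the same whether the column is a factor or a character vector.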

Calculate the total economic damage of each event type.

Economic.DMG <- df %>% 
    group_by(EVTYPE) %>% 
    summarise(total.DMG = sum(Total.damage)) %>% 
    arrange(desc(total.DMG),EVTYPE)
## `summarise()` ungrouping output (override with `.groups` argument)
head(Economic.DMG)
## # A tibble: 6 x 2
##   EVTYPE               total.DMG
##   <fct>                    <dbl>
## 1 FLOOD             150319678250
## 2 HURRICANE/TYPHOON  71913712800
## 3 TORNADO            57352117607
## 4 STORM SURGE        43323541000
## 5 HAIL               18758224527
## 6 FLASH FLOOD        17562132111

Results

Population Health

ggplot(fatalities[1:10,],aes(reorder(EVTYPE,-Total.fatalities),Total.fatalities))+
    geom_bar(stat = "identity",fill = "red")+
    coord_cartesian(ylim = c(0,6000))+
    theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90,hjust = 1,vjust = 0.5),panel.background = element_rect(fill = "wheat"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"))+
    labs(x = "Event Type", y = "Total Fatalities",title = "Top 10 Events Causing Highest Fatalities")

ggplot(injuries[1:10,],aes(reorder(EVTYPE,-Total.injuries),Total.injuries))+
    geom_bar(stat = "identity",fill = "navyblue")+
    coord_cartesian(ylim = c(0,100000))+
    theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90,hjust = 1,vjust = 0.5),panel.background = element_rect(fill = "pink"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"))+
    labs(x = "Event Type", y = "Total Injuries",title = "Top 10 Events Causing Highest Injuries")

Economic Consequences

ggplot(Economic.DMG[1:10,],aes(x = reorder(EVTYPE,-total.DMG), y = total.DMG/10^9))+
    geom_bar(stat = "identity",fill = rgb(0.5,0,0.5))+
    coord_cartesian(ylim = c(0,200))+
    theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90),panel.background = element_rect(fill = "wheat"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"),axis.title = element_text(color = "blue"))+
    labs(x = "Event Type",y = "Total Economic Damage (billion USD)",title = "Top 10 Events With Highest Economic Impacts")
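
The bar charts above share one pattern: reorder the event types by their totals, then draw one bar per event. A compact variant of that pattern is sketched below on made-up totals (`toy` and `p` are hypothetical names). It uses `geom_col()`, which is shorthand for `geom_bar(stat = "identity")`, and `coord_flip()` so the long event names read horizontally without rotating the axis text.

```r
library(ggplot2)

# Toy totals, made up for illustration only
toy <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
    Total.fatalities = c(30, 12, 7)
)

# reorder() sorts the bars by total; after coord_flip() the largest is on top
p <- ggplot(toy, aes(x = reorder(EVTYPE, Total.fatalities), y = Total.fatalities)) +
    geom_col(fill = "red") +
    coord_flip() +
    labs(x = "Event Type", y = "Total Fatalities")
p
```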