Synopsis

This report explores the impact that severe weather has on public health and the economy. It uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The report identifies the types of severe weather associated with the most fatalities, injuries, property damage, and crop damage in the United States from 1950 to 2011.

Data Processing

Loading libraries

library(readr)
library(ggplot2)
library(data.table)
library(patchwork)

Set the working directory to a folder that contains the data

setwd("~/Documents/Reproduceable_research")

Loading data

The dataset is downloaded to the local directory. The `read.csv` function reads the bzip2-compressed file through a `bzfile` connection, which decompresses it on the fly. The data were then explored by inspecting the variable names and a subset of the rows.

data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
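The loading step can be tried without the full storm file. The sketch below writes a tiny bzip2-compressed CSV to a temporary file and reads it back the same way; the toy values and the temporary file are made up for illustration, and `stringsAsFactors = FALSE` is an explicit choice here rather than what the report uses.

```r
# Toy data standing in for the storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the bzfile connection, decompresses while reading,
# and closes the connection when it is done
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(toy_back)

unlink(tmp)
```

Reading through the connection avoids keeping an uncompressed copy of the file on disk.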

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Inspect data

head(data, 2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
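The `str` output shows 985 levels for EVTYPE, and the first level ("   HIGH SURF ADVISORY") starts with spaces, which suggests that some labels differ only in whitespace or case. The report does not clean these labels; the following is only a sketch of how such variants could be collapsed, using made-up labels apart from the one quoted above.

```r
# Illustrative labels: one copied from the str() output, the rest made up
evtype <- c("   HIGH SURF ADVISORY", "Tornado", "TORNADO", " tornado ")

# Trim surrounding whitespace and use a single case
evtype_clean <- toupper(trimws(evtype))

# Count how many records fall under each cleaned label
print(table(evtype_clean))
```

Whether such a step changes the rankings in this report is not something the sketch can show; it would have to be checked against the real data.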

Subset the data set

Health Impact

For ease of computation, the data are converted to a data.table, and the analysis uses only the variables directly related to health and economic impacts.

To find the events with the highest health impact, fatalities and injuries are summed by event type.

Because the data set contains many event types, the analysis is restricted to the top 20 weather events in terms of the deaths and injuries they cause. To determine the most harmful weather events, the data are summarized into two tables: Casualities (fatalities) and Injuries.

data <- data.table(data)

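# total fatalities per event type, sorted so the first 20 rows are the top 20
Casualities <- data[, .(casualities = sum(FATALITIES)), by = EVTYPE][order(-casualities)]

# total injuries per event type, sorted the same way
Injuries <- data[, .(injuries = sum(INJURIES)), by = EVTYPE][order(-injuries)]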
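The summarize-then-rank step can be checked on a small example. This is a minimal sketch on made-up data, not the report's actual figures; the names `toy`, `health`, `top_fatal` and `top_injury` are hypothetical.

```r
library(data.table)

# Toy stand-in for the storm data; all values are made up
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 0, 5, 1, 2, 0)
)

# Sum both outcomes per event type in one pass
health <- toy[, .(fatalities = sum(FATALITIES),
                  injuries   = sum(INJURIES)), by = EVTYPE]

# Sort in decreasing order so the first n rows really are the top n
top_fatal  <- head(health[order(-fatalities)], 2)
top_injury <- head(health[order(-injuries)], 2)

print(top_fatal)
print(top_injury)
```

Without the `order()` call, taking the first rows of the summary would return event types in the order they first appear in the data rather than the most harmful ones.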

Economic Impact

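Economic impact is measured by two variables in the dataset, PROPDMG (property damage) and CROPDMG (crop damage), each paired with an exponent column (PROPDMGEXP and CROPDMGEXP) that gives its unit.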

Transformation

Costs are recorded in thousands (K), millions (M), and billions (B) of dollars. All costs are converted to US dollars for easy comparison, as follows. Rows with any other exponent code are not included in the totals.

data <- data.table(data)

propDamage <- data[, c('EVTYPE', 'PROPDMG', 'PROPDMGEXP')]
  
  # multiply the cost by units
  propDamageK <- propDamage[PROPDMGEXP=='K', .(cost=PROPDMG*1e3),by= EVTYPE]
  propDamageM <- propDamage[PROPDMGEXP=='M', .(cost=PROPDMG*1e6), by= EVTYPE]
  propDamageB <- propDamage[PROPDMGEXP=='B', .(cost=PROPDMG*1e9), by= EVTYPE]
  
  # merge all
  propDamageAll <- rbind(propDamageK, propDamageM, propDamageB)
  # sum property damage cost per event type, largest first
  prop_cost_sum = propDamageAll[, .(cost=sum(cost)), by= (EVTYPE)][order(-cost)]
  

  # Process crop damage cost
  
  cropDamage <- data[, c('EVTYPE', 'CROPDMG', 'CROPDMGEXP')]
  
  # multiply the cost by units
  cropDamageK <- cropDamage[CROPDMGEXP=='K', .(cost=CROPDMG*1e3),by= EVTYPE]
  cropDamageM <- cropDamage[CROPDMGEXP=='M', .(cost=CROPDMG*1e6), by= EVTYPE]
  cropDamageB <- cropDamage[CROPDMGEXP=='B', .(cost=CROPDMG*1e9), by= EVTYPE]
  
  # merge all
  cropDamageAll <- rbind(cropDamageK, cropDamageM, cropDamageB)
  
  # sum crop damage cost per event type, largest first
  crop_cost_sum = cropDamageAll[, .(cost=sum(cost)), by= (EVTYPE)][order(-cost)]
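The unit conversion above can also be written with a single lookup vector instead of one subset per unit. The sketch below uses made-up values; `exp_lookup` is a hypothetical name, and treating a lower-case code such as 'k' like 'K' is an assumption that the code above does not make.

```r
library(data.table)

# Hypothetical lookup: exponent code -> multiplier in dollars
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy records; all values are made up
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "k", "")
)

# Look up the multiplier; codes missing from the lookup give NA
# (assumption: lower-case codes are folded to upper case first)
toy[, multiplier := unname(exp_lookup[toupper(as.character(PROPDMGEXP))])]
toy[, cost := PROPDMG * multiplier]

# Drop rows without a recognized unit, then total and rank by event type
prop_cost <- toy[!is.na(cost), .(cost = sum(cost)), by = EVTYPE][order(-cost)]
print(prop_cost)
```

The same lookup could be applied to CROPDMG and CROPDMGEXP, which avoids repeating the subset-and-bind steps for each unit.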

Results

Questions

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Tornado is the most harmful weather event as a cause of mortality and injury. Flash flood is the second most important cause of fatality, while TSTM wind stands as the second most important cause of injury.

Health Impact of Weather in US, 1950-2011

# plotting impact of weather events on health (Fatalities)

a <- ggplot(Casualities[1:20,],
              aes(x = reorder(EVTYPE, -casualities), 
                  y = casualities,
                  )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 fatal weather events', 
         x = 'Type of event',
         y = 'Counts',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))


# plotting impact of weather events on health (Injuries)
b <- ggplot(Injuries[1:20,],
              aes(x = reorder(EVTYPE, -injuries), 
                  y = injuries,
                  )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 weather events by injury', 
         x = 'Type of event',
         y = 'Counts',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

patchwork =a+b
patchwork + plot_annotation(
  title = 'Health Impact of Weather in US, 1950-2011')

Figure 1. Top 20 severe weather types by health consequences, 1950-2011

Economic Consequences

2. Across the United States, which types of events have the greatest economic consequences?

The economic damage caused by severe weather events is plotted below. As with the health consequences, the analysis is based on the top 20 weather events.

Tornado is the most harmful weather type in terms of property damage (panel 1). Its property damage costs are extremely high, reported to be more than USD 50 billion. Floods, on the other hand, cause serious damage to crops, costing billions of dollars.

Economic Impact of Weather in US, 1950-2011

# plotting impact of weather events on property damage cost

  
  c <- ggplot(prop_cost_sum[1:20,],
              aes(x = reorder(EVTYPE, -cost), 
                  y = cost,
              )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 events by property cost', 
         x = 'Type of event',
         y = 'total cost, USD',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
  
# plotting impact of weather events on crop damage cost
d <- ggplot(crop_cost_sum[1:20,],
              aes(x = reorder(EVTYPE, -cost), 
                  y = cost,
                  )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 events by Crop damage', 
         x = 'Type of event',
         y = 'total cost, USD',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

patchwork=c+d

patchwork + plot_annotation(
  title = 'Economic Impact of Weather in US, 1950-2011')

Figure 2. Top 20 severe weather types by economic consequences

Transformation

The cost data are highly skewed, so the totals are log10-transformed and plotted again.

crop_cost_sum$cost = log10(crop_cost_sum$cost)
prop_cost_sum$cost = log10(prop_cost_sum$cost)
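To see what the transformation does, consider a few made-up totals that span several orders of magnitude. This is only an illustrative sketch; the event names and amounts are hypothetical, and dropping zero totals before the transform is an assumption rather than a step the report performs.

```r
# Made-up cost totals in USD, spanning several orders of magnitude
cost <- c(TORNADO = 5e10, HAIL = 2e7, FOG = 1e3, DUST = 0)

# log10(0) is -Inf, so keep only positive totals (an assumption of this sketch)
cost_pos <- cost[cost > 0]

# On the log10 scale, each unit step is a tenfold change in cost
log_cost <- log10(cost_pos)
print(round(log_cost, 2))
```

On the original scale, the largest total dwarfs the others in a bar chart; on the log10 scale, the bars stay comparable, at the price of heights that are no longer proportional to dollars.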

Economic Impact of Weather in US, 1950-2011, log transformed cost

# plotting impact of weather events on property damage cost (log10 scale)

  
  e <- ggplot(prop_cost_sum[1:20,],
              aes(x = reorder(EVTYPE, -cost), 
                  y = cost,
              )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 events by property cost', 
         x = 'Type of event',
         y = 'total cost, USD(log scale)',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
  
# plotting impact of weather events on crop damage cost
f <- ggplot(crop_cost_sum[1:20,],
              aes(x = reorder(EVTYPE, -cost), 
                  y = cost,
                  )) +
    geom_bar(stat = 'identity', col = 'black') +
    labs(title = 'Top 20 events by Crop damage', 
         x = 'Type of event',
         y = 'total cost, USD(log scale)',
         fill = '') +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

e+f

Figure 3. Top 20 severe weather types by economic consequences (log-transformed cost)

Conclusions

Tornado is the most harmful weather event. It is the top cause of weather-related mortality, injury, and property damage. However, floods are the main cause of crop damage compared to other forms of severe weather.