Understanding Severe Weather Events and their human and economic impact

Intro

The intention of this document is to understand the impact that severe weather events have on our society. Using NOAA historical data, we identify the events that have caused the most casualties and the events that have caused the most economic damage.

Logical approach

This document is organized in 4 steps:

  1. Data loading: import the NOAA data into R
  2. Data cleaning: prepare a tidy dataset
  3. Public health impact analysis: identify the severe weather event most dangerous to people
  4. Economic impact analysis: identify the most expensive severe weather event for the federal government

Data loading

Below are the steps followed to import NOAA data:

#Load all the libraries required across the analysis
library(ggplot2)
library(dplyr)
library(data.table)
library(reshape2)
#Load dataset to data
data <- read.csv("repdata-data-StormData.csv.bz2")
#Select all relevant variables
data <- select(data, 
               EVTYPE, 
               FATALITIES, 
               INJURIES, 
               PROPDMG, 
               PROPDMGEXP, 
               CROPDMG, 
               CROPDMGEXP)

Data cleaning

The next step is to clean the data to simplify later calculations. Two columns, PROPDMGEXP and CROPDMGEXP, encode a magnitude as K, M, or B and need to be converted to 1,000, 1,000,000, or 1,000,000,000. To do so we use sapply with a small helper function, which maps any other code, including a blank, to 0:

#Convert a *DMGEXP code (K, M or B, case-insensitive) to its multiplier
#Any other code, including a blank, becomes 0
f1 <- function(x) {
  x <- toupper(as.character(x))
  if (x == "K") {
    1000 #Thousands
  } else if (x == "M") {
    1000000 #Millions
  } else if (x == "B") {
    1000000000 #Billions
  } else {
    0 #Anything else
  }
}
data$PROPDMGEXP <- sapply(data$PROPDMGEXP, f1) #Replace PROPDMGEXP values
data$CROPDMGEXP <- sapply(data$CROPDMGEXP, f1) #Replace CROPDMGEXP values
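The same replacement can also be done without calling a function once per row. The sketch below is an alternative, not the method used in this document: it looks the codes up in a named vector, and the name `exp_multiplier` and the sample codes are made up for illustration.

```r
# Hypothetical vectorized alternative to f1: look codes up in a named vector.
exp_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Codes with no entry in `lookup` (blank, digits, symbols) come back as NA.
  mult <- unname(lookup[toupper(as.character(codes))])
  mult[is.na(mult)] <- 0  # same fallback as f1: unknown codes become 0
  mult
}

# Made-up sample codes, including a lowercase one, a blank, and a digit.
exp_multiplier(c("K", "m", "B", "", "5"))
# 1e+03 1e+06 1e+09 0e+00 0e+00
```

Under these assumptions, `data$PROPDMGEXP <- exp_multiplier(data$PROPDMGEXP)` would stand in for the sapply call.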

We still need to “trim a lot of fat” from our data table. Using mutate, we calculate the total damage and the total casualties caused by each severe weather event, then split the result into a Health table and a Damage table:

data <- mutate(data, 
               Property = PROPDMG * PROPDMGEXP, 
               Crops = CROPDMG * CROPDMGEXP,
               Total = Property + Crops,
               Population = FATALITIES + INJURIES)
Health <- select(data,
                EVTYPE,
                FATALITIES,
                INJURIES,
                Population)
Damage <- select(data,
                EVTYPE,
                Property,
                Crops,
                Total)
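As a quick check of the arithmetic, the sketch below applies the same formulas to two made-up rows in base R. The rows and the name `toy` are illustrative and not taken from the NOAA data. The *DMGEXP columns are written as numeric multipliers, as they would be after the replacement step.

```r
# Two made-up records; the *DMGEXP columns already hold numeric multipliers.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO"),
  FATALITIES = c(1, 3),
  INJURIES   = c(4, 20),
  PROPDMG    = c(2.5, 250),
  PROPDMGEXP = c(1e6, 1e3),
  CROPDMG    = c(50, 0),
  CROPDMGEXP = c(1e3, 0)
)

toy$Property   <- toy$PROPDMG * toy$PROPDMGEXP  # 2,500,000 and 250,000
toy$Crops      <- toy$CROPDMG * toy$CROPDMGEXP  # 50,000 and 0
toy$Total      <- toy$Property + toy$Crops      # 2,550,000 and 250,000
toy$Population <- toy$FATALITIES + toy$INJURIES # 5 and 23

toy[, c("EVTYPE", "Property", "Crops", "Total", "Population")]
```

Note that a record whose exponent code is mapped to 0 contributes no damage, whatever its PROPDMG or CROPDMG value.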

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question we use the Health dataset created above, as it includes the required variables:

  1. EVTYPE
  2. FATALITIES
  3. INJURIES
  4. Population: new variable created by adding 2 and 3

First we select the top 10 events ranked by the total of fatalities plus injuries:

q1 <- Health %>% 
  group_by(EVTYPE) %>% 
  summarize(TFatalities = sum(FATALITIES),
            TInjuries = sum(INJURIES),
            TPopulation = sum(Population)) %>%
  arrange(desc(TPopulation))
q1 <- q1[1:10,] #Limiting the table to only the top 10 events
q1
## Source: local data frame [10 x 4]
## 
##               EVTYPE TFatalities TInjuries TPopulation
## 1            TORNADO        5633     91346       96979
## 2     EXCESSIVE HEAT        1903      6525        8428
## 3          TSTM WIND         504      6957        7461
## 4              FLOOD         470      6789        7259
## 5          LIGHTNING         816      5230        6046
## 6               HEAT         937      2100        3037
## 7        FLASH FLOOD         978      1777        2755
## 8          ICE STORM          89      1975        2064
## 9  THUNDERSTORM WIND         133      1488        1621
## 10      WINTER STORM         206      1321        1527

The previous table can be easily visualized with a bar chart:

question1<-melt(q1[,1:3], id.vars="EVTYPE")
plot1 <- ggplot(question1, aes(x=EVTYPE, 
                               y=value, 
                               fill=variable)) + 
  geom_bar(stat="identity") + 
  theme_minimal() +
  ggtitle("Severe weather consequences to Population Health (by type of event)") +
  xlab("Event Type") +
  ylab("Total casualties (both fatalities and injuries)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot1

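Tornadoes dominate the chart: with 96,979 combined fatalities and injuries, they account for more than ten times the casualties of the second-ranked event, excessive heat (8,428).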
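The top 10 table lists TSTM WIND and THUNDERSTORM WIND as separate rows, so equivalent labels are counted separately. The sketch below shows one way such labels could be tidied before grouping. It is not part of the analysis above, the helper name `normalize_evtype` is hypothetical, and it handles only whitespace, letter case, and the TSTM abbreviation.

```r
# Hypothetical helper: tidy EVTYPE labels before group_by(EVTYPE).
normalize_evtype <- function(evtype) {
  out <- toupper(trimws(as.character(evtype)))  # drop outer spaces, upper-case
  out <- gsub("[[:space:]]+", " ", out)         # collapse repeated whitespace
  out <- sub("^TSTM", "THUNDERSTORM", out)      # expand the TSTM abbreviation
  out
}

# Made-up sample labels.
normalize_evtype(c("TSTM WIND", " thunderstorm  wind", "FLOOD"))
# "THUNDERSTORM WIND" "THUNDERSTORM WIND" "FLOOD"
```

Merging labels this way would change the totals and possibly the ranking, so any such rule should be stated alongside the results.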

2. Across the United States, which types of events have the greatest economic consequences?

To answer this question we use the Damage dataset created above, as it includes the required variables:

  1. EVTYPE
  2. Property: new variable created by multiplying PROPDMG and PROPDMGEXP
  3. Crops: new variable created by multiplying CROPDMG and CROPDMGEXP
  4. Total: new variable created by adding 2 and 3

Similar to question 1, we select the top 10 events ranked by total dollar impact:

q2 <- Damage %>% 
  group_by(EVTYPE) %>% 
  summarize(TProperty = sum(Property),
            TCrops = sum(Crops),
            TTotal = sum(Total)) %>%
  arrange(desc(TTotal))
q2 <- q2[1:10,] #Limiting the table to only the top 10 events
q2
## Source: local data frame [10 x 4]
## 
##               EVTYPE    TProperty      TCrops       TTotal
## 1              FLOOD 144657709800  5661968450 150319678250
## 2  HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 3            TORNADO  56937160480   414953110  57352113590
## 4        STORM SURGE  43323536000        5000  43323541000
## 5               HAIL  15732266720  3025954450  18758221170
## 6        FLASH FLOOD  16140811510  1421317100  17562128610
## 7            DROUGHT   1046106000 13972566000  15018672000
## 8          HURRICANE  11868319010  2741910000  14610229010
## 9        RIVER FLOOD   5118945500  5029459000  10148404500
## 10         ICE STORM   3944927810  5022113500   8967041310

Visualizing the previous table we can see the following:

question2<-melt(q2[,1:3], id.vars="EVTYPE")
question2$value <- question2$value/1000000000
plot2 <- ggplot(question2, aes(x=EVTYPE, 
                               y=value, 
                               fill=variable)) + 
  geom_bar(stat="identity") + 
  theme_minimal() +
  ggtitle("Severe weather consequences in dollars (by type of event)") +
  xlab("Event Type") +
  ylab("Total damage (both property and crops) in billions of dollars") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2
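By default ggplot2 places a character x variable in alphabetical order, so the bars do not follow the ranking in the table. A minimal sketch of one way to order them by total damage uses `reorder` from base R. The data frame `damage_long` is a made-up stand-in for the melted table, holding three events with values rounded from the table above.

```r
# Stand-in for the melted table: three events, values in billions of dollars.
damage_long <- data.frame(
  EVTYPE   = rep(c("DROUGHT", "FLOOD", "HAIL"), times = 2),
  variable = rep(c("TProperty", "TCrops"), each = 3),
  value    = c(1.05, 144.66, 15.73,  # property damage
               13.97, 5.66, 3.03)    # crop damage
)

# Sum value per event and sort the levels by that sum, largest first
# (the minus sign turns the default ascending order into descending).
ordered_type <- reorder(damage_long$EVTYPE, -damage_long$value, FUN = sum)
levels(ordered_type)
# "FLOOD" "HAIL" "DROUGHT"
```

In the plot, the same idea would be written as `aes(x = reorder(EVTYPE, -value, FUN = sum), ...)`.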

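Floods are the most expensive severe weather event, with roughly $150 billion in combined property and crop damage, more than twice the total for the second-ranked event, hurricanes/typhoons (about $72 billion).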

Conclusions

  1. Tornadoes are the deadliest severe weather events since NOAA started recording these events.
  2. Floods are the most expensive severe weather events since NOAA started recording these events.

Note that, according to the NOAA Storm Events Database, only tornadoes were recorded from 1950 to 1954. From 1955 to 1992 only tornado, thunderstorm wind, and hail events were recorded, and from 1993 onward all event types were logged. Totals accumulated over the full period therefore favor the event types with the longest records, tornadoes in particular.
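One way to compare event types on equal footing would be to restrict the analysis to records from 1993 onward. The sketch below assumes the raw data has a BGN_DATE column holding strings such as "4/18/1950 0:00:00". That column is not among the variables selected in this document, so it would have to be kept at the data loading step. The sample values are made up.

```r
# Assumption: BGN_DATE holds strings such as "4/18/1950 0:00:00".
# Made-up sample values stand in for the real column.
bgn_date <- c("4/18/1950 0:00:00", "1/6/1993 0:00:00", "8/29/2005 0:00:00")

# as.Date() reads the leading month/day/year and ignores the trailing time.
event_year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# TRUE for records from 1993 onward, when all event types were being logged.
keep <- event_year >= 1993
keep
# FALSE TRUE TRUE
```

With the real data, `data[keep, ]` would keep only the later records before the totals are computed.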