Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. We use the database to assess the impact of weather events on public health and on the economy, and we show the code for the entire analysis. The analysis consists of tables, figures, and other summaries.

DATA PROCESSING

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

The first step is to download the file:

# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","../StormData.csv.bz2")

The second step is to read the file into a data frame:

tormen <- read.csv(bzfile("C:/Users/javier/Data Science Specialization/Reproducible Research/Proyecto2/StormData.csv.bz2"))
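The `bzfile()` connection lets `read.csv` decompress the file while reading, so no separate unpacking step is needed. The round trip can be checked on a small file before loading the full dataset. This is only an illustrative sketch: the toy data frame and the temporary file are made up, and `stringsAsFactors = FALSE` is an added assumption that keeps text columns such as EVTYPE as character vectors.

```r
# Toy data standing in for the storm file (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write a bzip2-compressed CSV to a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, as in the step above.
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(toy_back)
```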

An important part of this analysis is the set of distinct event types, so we first determine how many distinct values the EVTYPE field contains.

In the original EVTYPE field, the total number of distinct values is:

length(unique(tormen$EVTYPE))
## [1] 985

We inspect the list of these events

eventos<- tormen$EVTYPE

We can see values such as “FLASH FLOOD/LANDSLIDE” and “FLASH FLOOD LANDSLIDE” that refer to the same event. To fix this, we convert all letters to lowercase and replace punctuation characters and blanks with spaces:

eventos <- tolower(eventos)
event_tipo <- gsub("[[:blank:][:punct:]+]", " ", eventos)
length(unique(event_tipo))
## [1] 874
tormen$EVTYPE<- event_tipo

Further cleaning of the event types is possible; for example, “tstm wind” and “thunderstorm wind” still appear as separate types in the tables below.
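As one possible direction for that further cleaning, the sketch below repeats the two steps used above on a few sample strings and then adds two extra steps. The extra steps (collapsing repeated spaces, trimming, and expanding the abbreviation "tstm") are assumptions for illustration and were not applied to the results in this report; the sample strings other than the flash flood pair are made up.

```r
raw <- c("FLASH FLOOD/LANDSLIDE", "FLASH FLOOD LANDSLIDE",
         " TSTM WIND", "TSTM WIND (G45)")

# Same idea as above: lowercase, then blanks and punctuation become spaces.
step1 <- gsub("[[:blank:][:punct:]]", " ", tolower(raw))

# Extra cleanup (assumption): collapse runs of spaces and trim both ends.
step2 <- trimws(gsub(" +", " ", step1))

# Extra cleanup (assumption): expand the abbreviation "tstm".
step3 <- gsub("tstm", "thunderstorm", step2, fixed = TRUE)

print(unique(step1))
print(unique(step3))
```

After the extra steps, the leading space no longer creates a separate category and the abbreviated and spelled-out wind events share a prefix, although suffixes such as "g45" would still need their own rule.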

Dangerous Events: HEALTH

library(plyr)
## Warning: package 'plyr' was built under R version 3.4.1
casualties <- ddply(tormen, .(EVTYPE), summarize,
                    fatalities = sum(FATALITIES),
                    injuries = sum(INJURIES))

## Find events that caused most death and injury (TOP10):

## Fatalities
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
head(fatal_events[, c("EVTYPE", "fatalities")],10)
##             EVTYPE fatalities
## 741        tornado       5633
## 116 excessive heat       1903
## 138    flash flood        978
## 240           heat        937
## 410      lightning        816
## 762      tstm wind        504
## 154          flood        470
## 515    rip current        368
## 314      high wind        248
## 19       avalanche        224
## Injuries
injury_events <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
head(injury_events[, c("EVTYPE", "injuries")],10)
##                EVTYPE injuries
## 741           tornado    91346
## 762         tstm wind     6957
## 154             flood     6789
## 116    excessive heat     6525
## 410         lightning     5230
## 240              heat     2100
## 382         ice storm     1975
## 138       flash flood     1777
## 671 thunderstorm wind     1488
## 209              hail     1361
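The group, sum, and rank pattern used for these tables can also be written in base R without `plyr`. The sketch below uses a made-up toy data frame with the same column names, so the numbers are illustrative only.

```r
# Toy records (hypothetical values) with the same columns as the storm data.
toy <- data.frame(EVTYPE = c("tornado", "flood", "tornado", "heat", "flood"),
                  FATALITIES = c(3, 1, 2, 4, 0),
                  INJURIES = c(10, 2, 5, 1, 0),
                  stringsAsFactors = FALSE)

# Sum both casualty columns within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep the two event types with the most fatalities.
top_fatal <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], 2)
print(top_fatal)
```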

Dangerous Events: ECONOMIC

The first step is to express the damage amounts on a common scale. Each damage figure is stored in two fields: the amount (“..DMG”) and its exponent (“..DMGEXP”). The exponent is coded as h -> hundred, k -> thousand, m -> million, b -> billion, so every amount has to be converted to a single scale.

The next step is to calculate the damage in dollars:

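# Map a damage-exponent code to a power of ten.
# Assumes e is a single character value; if the column was read as a factor,
# as.numeric(e) returns the level code rather than the digit itself.
transfor <- function(e) {
    if (e %in% c('h', 'H'))
        return(2)                 # hundreds
    else if (e %in% c('k', 'K'))
        return(3)                 # thousands
    else if (e %in% c('m', 'M'))
        return(6)                 # millions
    else if (e %in% c('b', 'B'))
        return(9)                 # billions
    else if (!is.na(suppressWarnings(as.numeric(e))))
        return(as.numeric(e))     # a digit is used directly as the exponent
    else if (e %in% c('', '-', '?', '+'))
        return(0)                 # missing or ambiguous codes: no scaling
    else {
        stop("Invalid exponent")
    }
}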


exp_dmg <- sapply(tormen$PROPDMGEXP, FUN=transfor)
tormen$prop_dmg <- tormen$PROPDMG * (10 ** exp_dmg)

exp_crop <- sapply(tormen$CROPDMGEXP, FUN=transfor)
tormen$crop_dmg <- tormen$CROPDMG * (10 ** exp_crop)
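Calling `transfor` once per row through `sapply` can be slow on a table of this size. A vectorized lookup is one alternative. The sketch below is not the code used for the results in this report: `exp_lookup` and `to_exponent` are hypothetical names, unknown codes are mapped to 0 instead of raising an error, and the input is coerced to character first so that a factor column is not converted to its integer level codes.

```r
# Named lookup table: exponent code -> power of ten.
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

to_exponent <- function(codes) {
    codes <- tolower(as.character(codes))   # handles factor input as well
    out <- unname(exp_lookup[codes])        # NA for anything not in the table
    digit <- grepl("^[0-9]$", codes)
    out[digit] <- as.numeric(codes[digit])  # single digits used as exponents
    out[is.na(out)] <- 0                    # '', '-', '?', '+' and any other code
    out
}

# Small made-up example, deliberately passed as a factor.
codes <- factor(c("K", "m", "", "5", "B", "?"))
amounts <- c(25, 2.5, 10, 1, 1, 3)
damage <- amounts * 10^to_exponent(codes)
print(damage)
```

Because the function works on whole vectors, it could be applied to the full PROPDMGEXP and CROPDMGEXP columns in a single call each.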

After applying the function, we obtain the actual loss by event type:

library(plyr)
econ_loss <- ddply(tormen, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))

##top 10 events:

##properties
Ord_dmg_prop <- arrange(econ_loss, desc(prop_dmg))
prop<-head(Ord_dmg_prop[, c("EVTYPE", "prop_dmg")],10)

##crops
Ord_dmg_crop <- arrange(econ_loss, desc(crop_dmg))
crop<-head(Ord_dmg_crop[, c("EVTYPE", "crop_dmg")],10)

RESULTS

Health Impact of Weather Events in the US

The following plots show the weather event types with the most severe health impact in the US (1950-2011):

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.4.1
p1 <- ggplot(data=fatal_events,
             aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
    geom_bar(stat="identity") +
    coord_flip() +
    ylab("Number of Deaths") +
    xlab("Event type") +
    theme(legend.position="none")

p2 <- ggplot(data=injury_events,
             aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
    geom_bar(stat="identity") +
    coord_flip() + 
    ylab("Number of injuries") +
    xlab("Event type") +
    theme(legend.position="none")

grid.arrange(p1, p2, ncol=2, top="The most damaging climatic events in the US (1950-2011)")

Tornadoes cause the most deaths and injuries of all event types. Between 1950 and 2011, tornadoes in the US caused more than 5,600 deaths and more than 91,000 injuries. The other event types most dangerous to population health are excessive heat, flash floods, and thunderstorm (tstm) winds.
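To put the tornado figures in proportion, the sketch below recomputes the tornado share using the values printed in the two top-10 tables above. The shares are relative to the top-10 totals only, not to all event types, and the variable names are introduced here for illustration.

```r
# Values copied from the top-10 tables in the HEALTH section.
fatalities <- c(tornado = 5633, `excessive heat` = 1903, `flash flood` = 978,
                heat = 937, lightning = 816, `tstm wind` = 504, flood = 470,
                `rip current` = 368, `high wind` = 248, avalanche = 224)
injuries <- c(tornado = 91346, `tstm wind` = 6957, flood = 6789,
              `excessive heat` = 6525, lightning = 5230, heat = 2100,
              `ice storm` = 1975, `flash flood` = 1777,
              `thunderstorm wind` = 1488, hail = 1361)

# Tornado share of the top-10 totals, in percent.
share_fatal <- 100 * fatalities[["tornado"]] / sum(fatalities)
share_injury <- 100 * injuries[["tornado"]] / sum(injuries)
print(round(c(fatalities = share_fatal, injuries = share_injury), 1))
```

Within the top 10, tornadoes account for a little under half of the deaths and almost three quarters of the injuries.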

Economic Impact of Weather Events in the US

The following plots show the event types with the most severe economic impact in the US (1950-2011):

library(ggplot2)
library(gridExtra)

c1 <- ggplot(data=prop,
             aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg )) +
    geom_bar(stat="identity") +
    coord_flip() +
    xlab("Event type") +
    ylab("Property damage in dollars (log-scale)") +
    theme(legend.position="none")

c2 <- ggplot(data=crop,
             aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
    geom_bar(stat="identity") + 
    coord_flip() + 
    xlab("Event type") +
    ylab("Crop damage in dollars") + 
    theme(legend.position="none") 

grid.arrange(c1, c2, ncol=2, top="The most expensive climate events in the US (1950-2011)")

Flash floods and thunderstorm winds were the weather events with the highest property damage cost among all event types, both with more than 10 billion dollars. For crop damage, the costliest events were drought and flood.