Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Objectives

This report aims to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Loading and Processing the Raw Data

The file containing the data is available at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

The data are stored in a comma-separated file in which missing values are coded as blank fields. Each column contains one variable and each row represents one observation of all variables. A header row gives the name of each column.

library(lubridate)
library(dplyr)
data<-read.csv("FStormData.csv.bz2", header = TRUE, na.strings = "")
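
As a small illustration of what `na.strings = ""` does, the sketch below reads a tiny inline CSV whose rows are made up (they are not records from the storm database) and checks that the blank field comes back as NA. The object names `csv_text` and `toy` are hypothetical.

```r
# Toy CSV with one blank PROPDMGEXP field (made-up values, not real records).
csv_text <- "EVTYPE,PROPDMG,PROPDMGEXP
TORNADO,25,K
HAIL,0,
FLOOD,1.5,M"

# Same options as the read.csv() call above, applied to the inline text.
toy <- read.csv(text = csv_text, header = TRUE, na.strings = "",
                stringsAsFactors = FALSE)

# The blank field is read as NA rather than as an empty string.
print(is.na(toy$PROPDMGEXP))
```

Because blanks become NA, any later comparison such as `PROPDMGEXP == "M"` also yields NA for those rows.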

After reading in the file, we check its dimensions and the first few rows (there are 902,297 rows in this dataset).

dim(data)
## [1] 902297     37
head(data[, 1:13])
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME
## 1 TORNADO         0    <NA>       <NA>     <NA>     <NA>
## 2 TORNADO         0    <NA>       <NA>     <NA>     <NA>
## 3 TORNADO         0    <NA>       <NA>     <NA>     <NA>
## 4 TORNADO         0    <NA>       <NA>     <NA>     <NA>
## 5 TORNADO         0    <NA>       <NA>     <NA>     <NA>
## 6 TORNADO         0    <NA>       <NA>     <NA>     <NA>

The column names are descriptive enough to understand what each one represents. For the questions of interest, the following columns will be evaluated: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, and BGN_DATE.

Here we extract some basic information:

total.fatalities<-sum(data$FATALITIES)

Severe weather caused a total of 15145 fatalities between 1950 and 2011.

total.injuries<-sum(data$INJURIES)

Severe weather caused a total of 140528 injuries between 1950 and 2011.

library(dplyr)
## To compute the total damage, create a variable that equals 6 if
## PROPDMGEXP equals "M" and 3 otherwise (NA where PROPDMGEXP is missing).
data<-mutate(data, propexp = ifelse(PROPDMGEXP == "M", 6 ,3))
total.propdmg<-sum(data$PROPDMG*10^data$propexp/10^9, na.rm = TRUE)

Total property damage due to severe weather is 151.44 billion dollars between 1950 and 2011.

library(dplyr)
## To compute the total damage, create a variable that equals 6 if
## CROPDMGEXP equals "M" and 3 otherwise (NA where CROPDMGEXP is missing).
data<-mutate(data, cropexp = ifelse(CROPDMGEXP == "M", 6 ,3))
total.cropdmg<-sum(data$CROPDMG*10^data$cropexp/10^9, na.rm = TRUE)

Total crop damage due to severe weather is 35.48 billion dollars between 1950 and 2011.
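
The two chunks above treat every exponent code other than "M" as thousands. A fuller mapping could be sketched as follows, assuming that the codes H, K, M and B (in either case) stand for hundreds, thousands, millions and billions, and that any other or missing code means no multiplier. The helper `exp_to_power` and the vectors `codes` and `values` are hypothetical, and totals computed this way would differ from the figures reported in this document.

```r
# Hypothetical helper: map an exponent code to a power of ten.
# Assumption: H = 10^2, K = 10^3, M = 10^6, B = 10^9; anything else = 10^0.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[toupper(as.character(code))])
  power[is.na(power)] <- 0   # unknown or missing codes get no multiplier
  power
}

# Made-up damage values and codes, for illustration only.
codes  <- c("K", "m", "B", NA, "?")
values <- c(25, 2.5, 1, 10, 10)

damage_usd <- values * 10^exp_to_power(codes)
print(damage_usd)
```

Whether unknown codes should count as a multiplier of one or be dropped altogether is a judgment call; this sketch keeps them so that no reported damage is silently discarded.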

As an illustration, we also show the single event with the largest number of fatalities.

library(lubridate)
most_fatal_event<-data[which(data$FATALITIES == max(data$FATALITIES)),]
most_fatal_event_year<-year(mdy_hms(as.character(most_fatal_event$BGN_DATE)))

This event occurred in Illinois (IL) in 1995 and was due to HEAT, resulting in 583 deaths.
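
The same lookup can be sketched in base R on a toy data frame. The rows below are made up and only mimic the shape of the storm data; `toy`, `worst` and `worst_year` are hypothetical names. The sketch uses `which.max()`, which returns only the first row in case of ties, whereas `which(x == max(x))` as used above returns all tied rows. The year is extracted with `as.Date()` instead of lubridate, relying on the fact that trailing characters beyond the format are ignored.

```r
# Made-up rows shaped like the storm data (not real records).
toy <- data.frame(
  STATE      = c("AL", "IL", "TX"),
  EVTYPE     = c("TORNADO", "HEAT", "FLOOD"),
  BGN_DATE   = c("4/18/1950 0:00:00", "7/12/1995 0:00:00", "6/8/2001 0:00:00"),
  FATALITIES = c(0, 12, 2),
  stringsAsFactors = FALSE
)

# Row with the largest number of fatalities (first one if there are ties).
worst <- toy[which.max(toy$FATALITIES), ]

# Parse the date part only; the trailing " 0:00:00" is ignored by as.Date().
worst_year <- as.integer(format(as.Date(worst$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

print(worst)
print(worst_year)
```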

Results

Question #1

To answer the first question, we evaluate the 10 event types with the most accumulated deaths and the 10 with the most accumulated injuries.

library(dplyr)
fatalities<-data %>%
  group_by(EVTYPE) %>%
    summarise(FATALITIES = sum(FATALITIES))

fatal_arranged<-arrange(fatalities, desc(FATALITIES))
print(fatal_arranged[1:10,])
## # A tibble: 10 × 2
##            EVTYPE FATALITIES
##            <fctr>      <dbl>
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224

We can see that TORNADO is the event type with the most accumulated deaths, with 5633 fatalities.

injuries<-data %>%
  group_by(EVTYPE) %>%
    summarise(INJURIES = sum(INJURIES))
injuries_arranged<-arrange(injuries, desc(INJURIES))
print(injuries_arranged[1:10,])
## # A tibble: 10 × 2
##               EVTYPE INJURIES
##               <fctr>    <dbl>
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361

We can see that TORNADO is the event type with the most accumulated injuries, with 91346 injuries.

Now we determine which event types appear in the top 10 for both deaths and injuries.

death_injuries<-merge(fatal_arranged[1:10,],injuries_arranged[1:10,])
print(arrange(death_injuries, desc(FATALITIES)))
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957
## 7          FLOOD        470     6789

To check that this list captures the most harmful event types, let’s compute the share of total deaths and injuries that these selected event types account for.

death_ratio<-sum(death_injuries$FATALITIES)/sum(data$FATALITIES)
injury_ratio<-sum(death_injuries$INJURIES)/sum(data$INJURIES)

So, these 7 event types account for 74.2% of total deaths and 85.9% of injuries reported.

The answer to the first question is: the most harmful event types with respect to population health are TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT, LIGHTNING, TSTM WIND, and FLOOD.
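
The selection logic used for this question (take the top k of each ranking, keep the event types present in both, then compute their share of the totals) can be sketched in base R on made-up counts. The vectors `deaths` and `injuries`, the value of `k`, and all numbers are hypothetical and are not taken from the storm database.

```r
# Made-up totals per event type, for illustration only.
deaths   <- c(TORNADO = 50,  HEAT = 30, FLOOD = 10, HAIL = 1,  WIND = 9)
injuries <- c(TORNADO = 400, HEAT = 20, FLOOD = 90, HAIL = 60, WIND = 30)

k <- 3  # size of each "top" list (the report uses 10)

top_deaths   <- names(sort(deaths,   decreasing = TRUE))[1:k]
top_injuries <- names(sort(injuries, decreasing = TRUE))[1:k]

# Event types that appear in both top-k lists.
both <- intersect(top_deaths, top_injuries)

# Share of all deaths and injuries covered by the selected event types.
death_share  <- sum(deaths[both])   / sum(deaths)
injury_share <- sum(injuries[both]) / sum(injuries)

print(both)
print(c(death_share = death_share, injury_share = injury_share))
```

A high share on both measures suggests the intersection is a reasonable summary; a low share signals that an important event type fell outside one of the two lists.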

Question #2

To answer the second question, we evaluate the 10 event types with the largest total property damage and the 10 with the largest total crop damage.

library(dplyr)
propdmg<-data %>%
  group_by(EVTYPE) %>%
    summarise(PROPDMG = sum(PROPDMG*10^propexp/10^9, na.rm=TRUE))

propdmg_arranged<-arrange(propdmg, desc(PROPDMG))
print(propdmg_arranged[1:10,])
## # A tibble: 10 × 2
##               EVTYPE   PROPDMG
##               <fctr>     <dbl>
## 1            TORNADO 51.625973
## 2              FLOOD 22.157832
## 3        FLASH FLOOD 15.141163
## 4               HAIL 13.927644
## 5          HURRICANE  6.168325
## 6          TSTM WIND  4.484983
## 7          HIGH WIND  3.970083
## 8          ICE STORM  3.944978
## 9  HURRICANE/TYPHOON  3.805906
## 10          WILDFIRE  3.725115

We can see that TORNADO is the event type with the most accumulated property damage, at 51.63 billion dollars.

library(dplyr)
cropdmg<-data %>%
  group_by(EVTYPE) %>%
    summarise(CROPDMG = sum(CROPDMG*10^cropexp/10^9, na.rm = TRUE))

cropdmg_arranged<-arrange(cropdmg, desc(CROPDMG))
print(cropdmg_arranged[1:10,])
## # A tibble: 10 × 2
##               EVTYPE    CROPDMG
##               <fctr>      <dbl>
## 1            DROUGHT 12.4725675
## 2              FLOOD  5.6619684
## 3               HAIL  3.0259745
## 4          HURRICANE  2.7419100
## 5        FLASH FLOOD  1.4213171
## 6       EXTREME COLD  1.2929730
## 7  HURRICANE/TYPHOON  1.0978743
## 8       FROST/FREEZE  1.0940860
## 9         HEAVY RAIN  0.7333998
## 10    TROPICAL STORM  0.6783460

We can see that DROUGHT is the event type with the most accumulated crop damage, at 12.47 billion dollars.

Now we determine which event types appear in the top 10 for both crop and property damage.

prop_crop<-merge(propdmg_arranged[1:10,],cropdmg_arranged[1:10,])
print(arrange(prop_crop, desc(PROPDMG)))
##              EVTYPE   PROPDMG  CROPDMG
## 1             FLOOD 22.157832 5.661968
## 2       FLASH FLOOD 15.141163 1.421317
## 3              HAIL 13.927644 3.025974
## 4         HURRICANE  6.168325 2.741910
## 5 HURRICANE/TYPHOON  3.805906 1.097874

To check that this list captures the most costly event types, let’s compute the share of total property and crop damage that these selected event types account for.

propdmg_ratio<-sum(prop_crop$PROPDMG)/total.propdmg
cropdmg_ratio<-sum(prop_crop$CROPDMG)/total.cropdmg

So, these 5 event types account for only 40.4% of total property damage and 39.3% of crop damage reported.

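This approach does not work well here: the intersection leaves out TORNADO and DROUGHT, the leading event types for property damage and crop damage respectively. So, we instead rank event types by the sum of property damage and crop damage.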

library(dplyr)
total<-merge(propdmg, cropdmg, by = "EVTYPE")
total<-mutate(total, total = PROPDMG+CROPDMG)

totaldmg_arranged<-arrange(total, desc(total))
print(totaldmg_arranged[1:10,])
##               EVTYPE   PROPDMG    CROPDMG     total
## 1            TORNADO 51.625973  0.4151131 52.041086
## 2              FLOOD 22.157832  5.6619684 27.819801
## 3               HAIL 13.927644  3.0259745 16.953619
## 4        FLASH FLOOD 15.141163  1.4213171 16.562480
## 5            DROUGHT  1.046106 12.4725675 13.518674
## 6          HURRICANE  6.168325  2.7419100  8.910235
## 7          TSTM WIND  4.484983  0.5540074  5.038991
## 8  HURRICANE/TYPHOON  3.805906  1.0978743  4.903780
## 9          HIGH WIND  3.970083  0.6385713  4.608654
## 10          WILDFIRE  3.725115  0.2954728  4.020588

The answer to the second question is: the event types with the greatest economic consequences are TORNADO, FLOOD, HAIL, FLASH FLOOD, and DROUGHT.
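
Ranking by combined damage can also be sketched in base R with `aggregate()` instead of dplyr. The rows below are made up (amounts in billions of dollars) and the names `toy`, `by_type` and `ranked` are hypothetical. The point of the sketch is that an event type with little crop damage, or little property damage, still ranks where its total places it.

```r
# Made-up per-event damage in billions of dollars (not real records).
toy <- data.frame(
  EVTYPE  = c("TORNADO", "DROUGHT", "FLOOD", "TORNADO", "FLOOD", "DROUGHT"),
  PROPDMG = c(40, 1, 10, 12, 12, 0),
  CROPDMG = c(0.2, 8, 3, 0.2, 2.6, 4.5),
  stringsAsFactors = FALSE
)

# Sum both damage columns within each event type.
by_type <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)

# Combined damage, then sort from largest to smallest.
by_type$total <- by_type$PROPDMG + by_type$CROPDMG
ranked <- by_type[order(by_type$total, decreasing = TRUE), ]

print(ranked)
```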

Tornado through the years

As we can see, TORNADO is at the top of both lists. So, we will analyse the effects of this kind of event through the years.

First, we add one more variable to the data set that represents the year of the event.

data<-mutate(data, year = year(mdy_hms(as.character(data$BGN_DATE))))
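## Show the new column next to the date it was derived from.
head(data[1:10, c("BGN_DATE", "year")])
##             BGN_DATE year
## 1  4/18/1950 0:00:00 1950
## 2  4/18/1950 0:00:00 1950
## 3  2/20/1951 0:00:00 1951
## 4   6/8/1951 0:00:00 1951
## 5 11/15/1951 0:00:00 1951
## 6 11/15/1951 0:00:00 1951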

Now we create a data frame containing only tornado events, summarised by year.

tornado<-data %>%
  filter(EVTYPE == "TORNADO") %>%
    group_by(EVTYPE, year) %>%
      summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), PROPDMG = sum(PROPDMG*10^propexp/10^9), CROPDMG = sum(CROPDMG*10^cropexp/10^6))
head(tornado)
## Source: local data frame [6 x 6]
## Groups: EVTYPE [1]
## 
##    EVTYPE  year FATALITIES INJURIES    PROPDMG CROPDMG
##    <fctr> <dbl>      <dbl>    <dbl>      <dbl>   <dbl>
## 1 TORNADO  1950         70      659 0.03448165      NA
## 2 TORNADO  1951         34      524 0.06550599      NA
## 3 TORNADO  1952        230     1915 0.09410224      NA
## 4 TORNADO  1953        519     5131 0.59610470      NA
## 5 TORNADO  1954         36      715 0.08580532      NA
## 6 TORNADO  1955        129      926 0.08266063      NA
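
The NA values in the CROPDMG column appear to follow from how R propagates missing values: `cropexp` is NA wherever CROPDMGEXP was blank, and `sum()` without `na.rm = TRUE` returns NA as soon as one term is NA. A small illustration with made-up numbers (the vectors `cropdmg`, `cropexp` and `usd` are hypothetical):

```r
# Made-up amounts and exponents; the second exponent is missing.
cropdmg <- c(10, 5, 2.5)
cropexp <- c(3, NA, 6)

usd <- cropdmg * 10^cropexp   # the middle element becomes NA

print(sum(usd))                # NA: one missing term makes the whole sum NA
print(sum(usd, na.rm = TRUE))  # missing terms are skipped instead
```

Adding `na.rm = TRUE` to the CROPDMG sum in the chunk above would replace those NA values with the total of the non-missing records (zero for years with no usable records), which would also change how those years appear in the crop damage plot.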

Finally, we create a set of plots that show how tornado consequences behave through the years.

par(mfrow=c(2,2), oma=c(0,0,2,0))
with(tornado, plot(year, FATALITIES, pch=20, xlab="Year", ylab = "Fatalities"))
with(tornado, plot(year, INJURIES, pch=20, xlab="Year", ylab = "Injuries"))
with(tornado, plot(year, PROPDMG, pch=20, xlab="Year", ylab = "Property damage (billion $)"))
with(tornado, plot(year, CROPDMG, pch=20, xlab="Year", ylab = "Crop damage (million $)"))
title("Tornado consequences through the years", outer = TRUE )