Synopsis

This report was prepared for Peer Assessment 2 of the Reproducible Research course (Coursera repdata-0322).

The analysis addresses the following questions:

  * Across the United States, which types of events (as indicated in the EVTYPE variable) are most 
  harmful with respect to population health?

  * Across the United States, which types of events have the greatest economic consequences?

The NOAA database used contains storm events and their consequences from 1950 to 2011. Because reporting criteria and the level of detail have changed over the years, more work would be needed to obtain a consistent time series. That is outside the scope of this report.

More information at:

  https://www.ncdc.noaa.gov/stormevents/

Loading and Processing the Raw Data

The bz2 file is downloaded from the course website into the R working directory. From there it is read directly into the data frame sdf, using the file header as column names.

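# Download the compressed file in binary mode so it is not corrupted on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "repdata-data-StormData.csv.bz2", mode = "wb")
# read.table() reads the bz2-compressed CSV directly; the first row supplies the column names
sdf<-read.table("repdata-data-StormData.csv.bz2", sep = ",", header = TRUE)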
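
The direct read works because R opens compressed files transparently. A minimal sketch of that behaviour, using a small made-up data frame and a temporary file (the values and the names `toy`, `path` and `back` are illustrative assumptions, not part of the storm data):

```r
# Toy stand-in for the storm file; column names follow the report, values are invented.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write a bzip2-compressed CSV to a temporary file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.table() recognises the compression when it opens the file,
# so no manual decompression step is needed.
back <- read.table(path, sep = ",", header = TRUE)
print(dim(back))
```

The same call pattern is what the report applies to the full storm file.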

The R packages used for analysis and plotting are loaded. The number of rows and columns of the data frame is shown.

require(dplyr, quietly = TRUE)
require(ggplot2, quietly = TRUE)
dim(sdf)
## [1] 902297     37

Results

Effects on Population Health

Population health is recorded in the variables FATALITIES and INJURIES. Fatalities and injuries cannot simply be added, and giving more weight to fatalities makes injuries irrelevant. Only fatalities are therefore considered.

Fatalities and number of events are grouped by Event Type using dplyr commands. The ten most deadly events are shown:

gr<-group_by(sdf,EVTYPE)
FatbyEv<-arrange(summarize(gr, Fat=sum(FATALITIES), Count=n()), desc(Fat))
FatbyEv
## Source: local data frame [985 x 3]
## 
##            EVTYPE   Fat  Count
##            (fctr) (dbl)  (int)
## 1         TORNADO  5633  60652
## 2  EXCESSIVE HEAT  1903   1678
## 3     FLASH FLOOD   978  54277
## 4            HEAT   937    767
## 5       LIGHTNING   816  15754
## 6       TSTM WIND   504 219940
## 7           FLOOD   470  25326
## 8     RIP CURRENT   368    470
## 9       HIGH WIND   248  20212
## 10      AVALANCHE   224    386
## ..            ...   ...    ...

Tornadoes, high temperatures and floods are the deadliest events.
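
For readers without dplyr, the same group-and-sum step can be sketched in base R. The records below are invented for illustration; only the column names EVTYPE and FATALITIES come from the report, and `summary_df` is a hypothetical name:

```r
# Toy records; values are invented, column names mirror the report.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 1, 0, 4)
)

# Sum fatalities and count records per event type.
fat <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
cnt <- table(toy$EVTYPE)
summary_df <- data.frame(
  EVTYPE = names(fat),
  Fat = as.vector(fat),
  Count = as.vector(cnt[names(fat)])
)

# Order from most to least deadly, as desc(Fat) does in the report.
summary_df <- summary_df[order(-summary_df$Fat), ]
print(summary_df)
```

This is an equivalent formulation under these assumptions, not the code the report ran.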

A new variable named Intensity = Fatalities / number of events has been generated. The ten most intense events in terms of fatalities, among those with more than five occurrences, are shown:

FatbyEvI<-mutate(FatbyEv, Intensity= Fat / Count)
filter(arrange(FatbyEvI, desc(Intensity)), Count >5)
## Source: local data frame [225 x 4]
## 
##                       EVTYPE   Fat Count Intensity
##                       (fctr) (dbl) (int)     (dbl)
## 1               EXTREME HEAT    96    22 4.3636364
## 2                  HEAT WAVE   172    74 2.3243243
## 3  UNSEASONABLY WARM AND DRY    29    13 2.2307692
## 4                    TSUNAMI    33    20 1.6500000
## 5                       HEAT   937   767 1.2216428
## 6             EXCESSIVE HEAT  1903  1678 1.1340882
## 7            LOW TEMPERATURE     7     7 1.0000000
## 8             HURRICANE ERIN     6     7 0.8571429
## 9                RIP CURRENT   368   470 0.7829787
## 10         HURRICANE/TYPHOON    64    88 0.7272727
## ..                       ...   ...   ...       ...

High temperatures are clearly the most dangerous events per occurrence.
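
The contrast between total and per-event fatalities can be checked by hand. The sketch below recomputes Intensity from four rows copied from the fatalities table printed earlier in the report; the data frame name `top` is an assumption made for this example:

```r
# Rows copied from the fatalities table printed in the report.
top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "HEAT", "RIP CURRENT"),
  Fat = c(5633, 1903, 937, 368),
  Count = c(60652, 1678, 767, 470)
)

# Fatalities per recorded event.
top$Intensity <- top$Fat / top$Count

# Sort by intensity, highest first.
top <- top[order(-top$Intensity), ]
print(top)
```

Tornadoes lead in total fatalities, but their intensity is below 0.1 deaths per event, while the heat categories exceed 1.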

To see how the deadliest events evolved over the years, fatalities and number of events are plotted by year, with one panel per event type.

sdf<-mutate(sdf, Year= as.POSIXlt(as.Date(sdf$BGN_DATE, "%m/%d/%Y %H:%M:%S"))$year+1900)
Evfat<-as.vector(FatbyEv$EVTYPE[1:10])
sdff<-filter(sdf, EVTYPE %in% Evfat)
grf<-group_by(sdff, EVTYPE, Year)
FatbyEvY<-arrange(summarize(grf, Fat=sum(FATALITIES), Count=n()), desc(Fat))
p<-ggplot(FatbyEvY, aes(x=Year, y=Fat))+geom_line()+facet_wrap(~ EVTYPE, ncol=5)+scale_y_log10()
p<-p + ylab("Number of fatalities (log scale)")+ggtitle("NOAA Storm Data \nFatalities")
p<-p + annotation_logticks(base = 10)
print(p)

p<-ggplot(FatbyEvY, aes(x=Year, y=Count))+geom_line()+facet_wrap(~ EVTYPE, ncol=5)+scale_y_log10()
p<-p + ylab("Number of events (log scale)")+ggtitle("NOAA Storm Data \nNumber of events")
p<-p + annotation_logticks(base = 10)
print(p)
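
The Year column used in these plots is derived from BGN_DATE. A minimal sketch of that conversion on two sample strings, assuming the month/day/year form that the report's format string expects (the sample dates and the names `bgn`, `year_lt` and `year_fmt` are illustrative):

```r
# Sample strings in the form matched by "%m/%d/%Y %H:%M:%S".
bgn <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00")

# Parse to Date, then extract the year in two equivalent ways.
d <- as.Date(bgn, format = "%m/%d/%Y %H:%M:%S")
year_lt <- as.POSIXlt(d)$year + 1900      # POSIXlt counts years since 1900
year_fmt <- as.integer(format(d, "%Y"))   # alternative via format()
print(year_lt)
```

Both approaches give the same years; the report uses the first.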

Economic Consequences

The economic consequences of storm events are recorded in the variables PROPDMG and CROPDMG of the sdf data frame. As crops seem more difficult to protect from natural events, the study focuses only on property damage (PROPDMG).

The variable PROPDMGEXP acts as a multiplier of PROPDMG. Its values are not consistently coded. Only K (thousand), M (million) and B (billion) are used to generate a new variable PROPDMGc = PROPDMG * Multiplier; any other value is treated as a multiplier of 1.

sdf<-mutate(sdf, PROPDMGc = PROPDMG * ifelse(PROPDMGEXP == "K", 1E3, ifelse(PROPDMGEXP == "M", 
            1E6, ifelse(PROPDMGEXP == "B", 1E9, 1))))
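
The nested ifelse() can also be written as a named lookup, which may be easier to extend. This is a sketch on invented values, not the report's code; `mult_table`, `dmg` and `exp_code` are hypothetical names, and the codes "" and "5" stand in for the unused multiplier values:

```r
# Toy damage amounts and exponent codes; values are invented.
dmg <- c(25, 2.5, 1.2, 10, 5)
exp_code <- c("K", "M", "B", "", "5")

# Named lookup for the three codes the report uses.
mult_table <- c(K = 1e3, M = 1e6, B = 1e9)

# as.character() matters if the codes are stored as a factor:
# indexing by a factor would use its integer codes, not its labels.
mult <- unname(mult_table[as.character(exp_code)])

# Any code not in the table falls back to a multiplier of 1.
mult[is.na(mult)] <- 1

dmg_usd <- dmg * mult
print(dmg_usd)
```

One difference from the ifelse() chain: a missing (NA) code here also falls back to 1, whereas ifelse() would propagate NA.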

Property damage and number of events are grouped by Event Type using dplyr commands. The ten most costly events are shown:

gr<-group_by(sdf,EVTYPE)
DmgbyEv<-arrange(summarize(gr, Damage=sum(PROPDMGc), Count=n()), desc(Damage))
head(format(DmgbyEv, digits=5),10)
##               EVTYPE     Damage  Count
## 1              FLOOD 1.4466e+11  25326
## 2  HURRICANE/TYPHOON 6.9306e+10     88
## 3            TORNADO 5.6926e+10  60652
## 4        STORM SURGE 4.3324e+10    261
## 5        FLASH FLOOD 1.6141e+10  54277
## 6               HAIL 1.5727e+10 288661
## 7          HURRICANE 1.1868e+10    174
## 8     TROPICAL STORM 7.7039e+09    690
## 9       WINTER STORM 6.6885e+09  11433
## 10         HIGH WIND 5.2700e+09  20212

Floods, hurricanes and tornadoes are the most costly.

A new variable named Intensity = Damage / number of events has been generated. The ten most intense events in terms of damage, among those with more than five occurrences, are shown:

DmgbyEvI<-mutate(DmgbyEv, Intensity= Damage / Count)
head(format(filter(arrange(DmgbyEvI, desc(Intensity)), Count >5), digits=5), 10)
##                 EVTYPE     Damage  Count  Intensity
## 1    HURRICANE/TYPHOON 6.9306e+10     88 7.8757e+08
## 2       HURRICANE OPAL 3.1528e+09      9 3.5032e+08
## 3          STORM SURGE 4.3324e+10    261 1.6599e+08
## 4  SEVERE THUNDERSTORM 1.2054e+09     13 9.2720e+07
## 5            HURRICANE 1.1868e+10    174 6.8209e+07
## 6              TYPHOON 6.0023e+08     11 5.4566e+07
## 7       HURRICANE ERIN 2.5810e+08      7 3.6871e+07
## 8     STORM SURGE/TIDE 4.6412e+09    148 3.1359e+07
## 9          RIVER FLOOD 5.1189e+09    173 2.9589e+07
## 10           WILDFIRES 1.0050e+08      8 1.2562e+07

Hurricanes are clearly the most costly events per occurrence.

To see how the most costly events evolved over the years, property damage is plotted by year, with one panel per event type.

Evdmg<-as.vector(DmgbyEv$EVTYPE[1:10])
sdfd<-filter(sdf, EVTYPE %in% Evdmg)
grd<-group_by(sdfd, EVTYPE, Year)
DmgbyEvY<-arrange(summarize(grd, Damage=sum(PROPDMGc)), desc(Damage))
p<-ggplot(DmgbyEvY, aes(x=Year, y=Damage))+geom_line()+facet_wrap(~ EVTYPE, ncol=5)+scale_y_log10()
p<-p + ylab("Damage $ (log scale)")+ggtitle("NOAA Storm Data \nProperty Damage")
p<-p + annotation_logticks(base = 10)
print(p)
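
One caveat when reading the log-scale plots: any year in which an event type has a total of zero has no finite position on a log axis. The short sketch below shows the underlying arithmetic on made-up values (`vals` and `logged` are names chosen for this example):

```r
# Made-up yearly totals, including a zero.
vals <- c(0, 1, 10, 100)

# log10(0) is -Inf, so a zero total cannot be placed on a log-scaled axis.
logged <- log10(vals)
print(logged)
```

Gaps or dropped points in the panels may therefore reflect zero totals rather than missing records.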