Reproducible Research:

Analysis of the Impact of Severe Weather Events on the United States

Published By: Eric Lim B G, Published Date: 21-Oct-14


Synopsis

Storms and other severe weather events cause both public health and economic problems for communities and municipalities across the United States. This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the events that are most harmful to population health (fatalities and injuries) and those with the greatest economic consequences (property and crop damage).

The database covers weather events recorded from 1950 to November 2011, and seven variables per event were selected for the analysis.

The analysis shows that tornadoes are the most hazardous to human health, with 5633 reported fatalities and 91346 injuries, whereas floods have the greatest economic consequences, with total damages in excess of $160 billion.


Data Processing

The storm data file (*.bz2) is obtained from U.S. NOAA and placed in the “data” folder under the R working directory. The file is then decompressed and loaded into a data frame, from which the relevant variables are extracted for analysis.

The 7 variables extracted are listed below:

  1. EVTYPE : Type of weather event (e.g. tornado, flood, etc.)
  2. FATALITIES : Number of deaths related to the weather event
  3. INJURIES : Number of injuries related to the weather event
  4. PROPDMG : Property damage in USD, before applying the magnitude code in PROPDMGEXP
  5. PROPDMGEXP : Magnitude (exponent) code for property damage (e.g. K=thousands, M=millions, B=billions)
  6. CROPDMG : Crop damage in USD, before applying the magnitude code in CROPDMGEXP
  7. CROPDMGEXP : Magnitude (exponent) code for crop damage (e.g. K=thousands, M=millions, B=billions)
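The decompress-then-extract steps described above can be sketched more compactly in base R, on the assumption that `read.csv()` opens bzip2-compressed files transparently (it reads through `file()`, which detects gzip, bzip2 and xz compression). The temporary file and the two data rows below are made up for illustration and are not NOAA records; `demo_raw`, `demo_study` and `vars` are hypothetical names.

```r
# Hypothetical stand-in for data/repdata-data-StormData.csv.bz2
bz_path <- tempfile(fileext = ".csv.bz2")

# Write a tiny made-up CSV into a bzip2-compressed file (illustration only)
con <- bzfile(bz_path, open = "w")
writeLines(c("STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP",
             "AL,TORNADO,0,15,25,K,0,",
             "TX,FLOOD,1,2,2.5,M,1,K"), con)
close(con)

# read.csv() reads the compressed file directly, so no bunzip2() step is needed;
# empty strings become NA, as in the report's own read.csv() call
demo_raw <- read.csv(bz_path, header = TRUE, strip.white = TRUE,
                     na.strings = c("NA", ""))

# Keep only the seven variables of interest, in a fixed order
vars <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
demo_study <- demo_raw[, vars]
str(demo_study)
```

Selecting columns by a name vector keeps the extraction in one line and fails loudly if a column is missing, at the cost of reading the full file into memory first.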
library("R.utils")
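# Decompress the U.S. NOAA (*.bz2) data file, keeping the original archive
bunzip2(filename="data/repdata-data-StormData.csv.bz2",
        destname="data/repdata-data-StormData.csv",
        overwrite=TRUE,remove=FALSE)
# Delete any stray copy of the archive in the working directory itself
# (this path is outside "data/", so the archive used above is not touched)
unlink("repdata-data-StormData.csv.bz2")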

# Load data from CSV file
stormdata <- read.csv(file="data/repdata-data-StormData.csv",header=TRUE,sep=",",
                      strip.white=TRUE,na.strings=c("NA",""))

# Extract variables of interest
studydata <- data.frame(EVTYPE=stormdata$EVTYPE,
                        FATALITIES=stormdata$FATALITIES,INJURIES=stormdata$INJURIES,
                        PROPDMG=stormdata$PROPDMG,PROPDMGEXP=stormdata$PROPDMGEXP,
                        CROPDMG=stormdata$CROPDMG,CROPDMGEXP=stormdata$CROPDMGEXP)

The structure and summary of the extracted data are previewed to assess data format and quality.

str(studydata)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
summary(studydata)
##                EVTYPE         FATALITIES          INJURIES        
##  HAIL             :288661   Min.   :  0.0000   Min.   :   0.0000  
##  TSTM WIND        :219940   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  THUNDERSTORM WIND: 82563   Median :  0.0000   Median :   0.0000  
##  TORNADO          : 60652   Mean   :  0.0168   Mean   :   0.1557  
##  FLASH FLOOD      : 54277   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  FLOOD            : 25326   Max.   :583.0000   Max.   :1700.0000  
##  (Other)          :170878                                         
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00   K      :424665   Min.   :  0.000   K      :281832  
##  1st Qu.:   0.00   M      : 11330   1st Qu.:  0.000   M      :  1994  
##  Median :   0.00   0      :   216   Median :  0.000   k      :    21  
##  Mean   :  12.06   B      :    40   Mean   :  1.527   0      :    19  
##  3rd Qu.:   0.50   5      :    28   3rd Qu.:  0.000   B      :     9  
##  Max.   :5000.00   (Other):    84   Max.   :990.000   (Other):     9  
##                    NA's   :465934                     NA's   :618413

Page 12 of the Storm Data Documentation provided by the National Weather Service states that PROPDMGEXP and CROPDMGEXP are magnitude (exponent) codes (e.g. K=thousands, M=millions, B=billions) for PROPDMG and CROPDMG respectively. Property and crop damages are therefore recomputed with these codes to reflect their actual dollar values.
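As a sketch of the conversion just described, the magnitude codes can also be mapped through a named lookup vector instead of a chain of assignments. This is a swapped-in technique, not the report's own code; the helper names `exp_to_power()` and `actual_damage()` are hypothetical. Following the report's conversion code, digit codes ("0" to "8") are read as powers of ten and symbols, blanks and NA fall back to a multiplier of 1.

```r
# Map a magnitude code to a power of ten (hypothetical helper):
# H = hundreds, K = thousands, M = millions, B = billions
exp_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  power <- c(H = 2, K = 3, M = 6, B = 9)[code]
  # Digit codes are read as powers of ten, as in the report's conversion
  digits <- grepl("^[0-8]$", code)
  power[digits] <- as.numeric(code[digits])
  # Symbols ("+", "-", "?"), blanks and NA get 10^0 = 1
  power[is.na(power)] <- 0
  unname(power)
}

# Actual dollar value = recorded value x 10^power (hypothetical helper)
actual_damage <- function(value, code) value * 10^exp_to_power(code)

actual_damage(c(25, 2.5, 1.5, 4), c("K", "m", NA, "?"))
```

Keeping the mapping in one named vector makes the interpretation of each code visible in a single place, which helps when the judgement about digit and symbol codes needs to be revisited.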

# Convert factors to character, trim whitespace (trim() from R.utils) and upper-case
studydata$EVTYPE <- toupper(trim(as.character(studydata$EVTYPE)))
studydata$PROPDMGEXP <- toupper(trim(as.character(studydata$PROPDMGEXP)))
studydata$CROPDMGEXP <- toupper(trim(as.character(studydata$CROPDMGEXP)))

# Convert exponent codes of property damages into powers of 10;
# symbols ("+", "-", "?"), "0" and missing codes all map to 10^0 = 1
studydata$PROPDMGEXP[studydata$PROPDMGEXP %in% c("+","-","?","0") |
                     is.na(studydata$PROPDMGEXP)] <- 0
studydata$PROPDMGEXP[studydata$PROPDMGEXP=="H"] <- 2
studydata$PROPDMGEXP[studydata$PROPDMGEXP=="K"] <- 3
studydata$PROPDMGEXP[studydata$PROPDMGEXP=="M"] <- 6
studydata$PROPDMGEXP[studydata$PROPDMGEXP=="B"] <- 9
# Remaining digit codes ("1"-"8") are kept and read as powers of 10 below

# Convert exponent codes of crop damages into powers of 10 (same rules)
studydata$CROPDMGEXP[studydata$CROPDMGEXP %in% c("+","-","?","0") |
                     is.na(studydata$CROPDMGEXP)] <- 0
studydata$CROPDMGEXP[studydata$CROPDMGEXP=="H"] <- 2
studydata$CROPDMGEXP[studydata$CROPDMGEXP=="K"] <- 3
studydata$CROPDMGEXP[studydata$CROPDMGEXP=="M"] <- 6
studydata$CROPDMGEXP[studydata$CROPDMGEXP=="B"] <- 9

# Implement the power multiplier for property damages
studydata$PROPDMGEXP <- as.numeric(studydata$PROPDMGEXP)
studydata$PROPDMG <- (10^studydata$PROPDMGEXP)*studydata$PROPDMG

# Implement the power multiplier for crop damages
studydata$CROPDMGEXP <- as.numeric(studydata$CROPDMGEXP)
studydata$CROPDMG <- (10^studydata$CROPDMGEXP)*studydata$CROPDMG

# Compute and incorporate total damages into data frame
TOTALDMG <- studydata$PROPDMG+studydata$CROPDMG
studydata <- cbind(studydata,TOTALDMG)

The earlier summary reveals duplicate and inconsistently named event types (e.g. “TSTM WIND” and “THUNDERSTORM WIND”). The top 20 event types for each outcome of interest (fatalities, injuries and total damages) are therefore listed to identify similar event types that should be consolidated.

head(sort(tapply(studydata$FATALITIES,studydata$EVTYPE,sum),decreasing=TRUE),20)
##                 TORNADO          EXCESSIVE HEAT             FLASH FLOOD 
##                    5633                    1903                     978 
##                    HEAT               LIGHTNING               TSTM WIND 
##                     937                     816                     504 
##                   FLOOD             RIP CURRENT               HIGH WIND 
##                     470                     368                     248 
##               AVALANCHE            WINTER STORM            RIP CURRENTS 
##                     224                     206                     204 
##               HEAT WAVE            EXTREME COLD       THUNDERSTORM WIND 
##                     172                     162                     133 
##              HEAVY SNOW EXTREME COLD/WIND CHILL               HIGH SURF 
##                     127                     125                     104 
##             STRONG WIND                BLIZZARD 
##                     103                     101
head(sort(tapply(studydata$INJURIES,studydata$EVTYPE,sum),decreasing=TRUE),20)
##            TORNADO          TSTM WIND              FLOOD 
##              91346               6957               6789 
##     EXCESSIVE HEAT          LIGHTNING               HEAT 
##               6525               5230               2100 
##          ICE STORM        FLASH FLOOD  THUNDERSTORM WIND 
##               1975               1777               1488 
##               HAIL       WINTER STORM  HURRICANE/TYPHOON 
##               1361               1321               1275 
##          HIGH WIND         HEAVY SNOW           WILDFIRE 
##               1137               1021                911 
## THUNDERSTORM WINDS           BLIZZARD                FOG 
##                908                805                734 
##   WILD/FOREST FIRE         DUST STORM 
##                545                440
head(sort(tapply(studydata$TOTALDMG,studydata$EVTYPE,sum),decreasing=TRUE),20)
##                     FLOOD         HURRICANE/TYPHOON 
##              150319678257               71913712800 
##                   TORNADO               STORM SURGE 
##               57362333947               43323541000 
##                      HAIL               FLASH FLOOD 
##               18761221986               18244041079 
##                   DROUGHT                 HURRICANE 
##               15018672000               14610229010 
##               RIVER FLOOD                 ICE STORM 
##               10148404500                8967041360 
##            TROPICAL STORM              WINTER STORM 
##                8382236550                6715441251 
##                 HIGH WIND                  WILDFIRE 
##                5908617595                5060586800 
##                 TSTM WIND          STORM SURGE/TIDE 
##                5047065845                4642038000 
##         THUNDERSTORM WIND            HURRICANE OPAL 
##                3897965522                3191846000 
##          WILD/FOREST FIRE HEAVY RAIN/SEVERE WEATHER 
##                3108626330                2500000000

Similar event types are consolidated to reduce fragmentation. (Note: identifying duplicates involves some human judgement.)

studydata$EVTYPE[(studydata$EVTYPE=="TSTM WIND")] <- "THUNDERSTORM WIND"
studydata$EVTYPE[(studydata$EVTYPE=="RIP CURRENTS")] <- "RIP CURRENT"
studydata$EVTYPE[(studydata$EVTYPE=="STORM SURGE")] <- "STORM SURGE/TIDE"
studydata$EVTYPE[(studydata$EVTYPE %in% c("WILD FIRE","WILDFIRE"))] <- "WILD/FOREST FIRE"
studydata$EVTYPE[(studydata$EVTYPE=="HURRICANE OPAL")] <- "HURRICANE/TYPHOON"
studydata$EVTYPE[(studydata$EVTYPE=="HURRICANE")] <- "HURRICANE/TYPHOON"
studydata$EVTYPE[(studydata$EVTYPE=="RIVER FLOOD")] <- "FLOOD"
studydata$EVTYPE[(studydata$EVTYPE=="ICE STORM")] <- "WINTER STORM"
studydata$EVTYPE[(studydata$EVTYPE=="EXCESSIVE HEAT")] <- "HEAT"
studydata$EVTYPE[(studydata$EVTYPE=="HEAT WAVE")] <- "HEAT"
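The one-to-one relabelling above can also be kept in a single named lookup vector, so that the judgement calls are reviewable in one place. A minimal sketch; `evtype_map` and `consolidate_evtype()` are hypothetical names, and the map shows only a subset of the pairs used in this report.

```r
# Hypothetical lookup table (a subset of the report's pairs):
# names are raw EVTYPE labels, values are the consolidated labels
evtype_map <- c("TSTM WIND"      = "THUNDERSTORM WIND",
                "RIP CURRENTS"   = "RIP CURRENT",
                "STORM SURGE"    = "STORM SURGE/TIDE",
                "RIVER FLOOD"    = "FLOOD",
                "EXCESSIVE HEAT" = "HEAT",
                "HEAT WAVE"      = "HEAT")

# Replace every label found in the map; leave all other labels unchanged
consolidate_evtype <- function(x, map = evtype_map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

consolidate_evtype(c("TSTM WIND", "HAIL", "HEAT WAVE"))
```

A single pass is enough here because no consolidated label is itself a key in the map.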

Data processing is now complete, and the data are ready for analysis.


Results

The 15 weather event types with the most fatalities in the United States from 1950 to November 2011 are plotted.

library(ggplot2)

# Aggregate fatalities count by weather event type, sort and extract top 15
fatalities <- aggregate(FATALITIES~EVTYPE,studydata,"sum")
fatalities <- fatalities[order(fatalities$FATALITIES,decreasing=TRUE),][1:15,]

# Plot horizontal bar chart of top fatalities by weather event type
# (reorder() sorts the bars by value; red labels sit at the bar ends)
ggplot(fatalities, aes(x=reorder(EVTYPE,FATALITIES), y=FATALITIES, fill=FATALITIES)) +
        geom_col() + coord_flip() +
        xlab("Event Type") + ylab("Fatalities") +
        ggtitle("Top 15 Fatalities By Weather Event Type") +
        theme(legend.position="none") +
        geom_text(aes(label=FATALITIES), hjust=1, vjust=0.5,
                  colour="red", size=3)
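The aggregate, sort and slice steps above are repeated for injuries and damages. As a sketch, they could be wrapped in one helper; `top_events()` is a hypothetical name, and the toy data frame in the usage line is made up for illustration.

```r
# Hypothetical helper: total the column `col` by EVTYPE and keep the n largest
top_events <- function(df, col, n = 15) {
  totals <- aggregate(df[[col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- col                      # aggregate() names the value column "x"
  totals <- totals[order(totals[[col]], decreasing = TRUE), ]
  head(totals, n)                              # safe even if fewer than n event types
}

# Usage on a made-up data frame
toy <- data.frame(EVTYPE = c("A", "B", "A", "C"), FATALITIES = c(1, 5, 2, 0))
top_events(toy, "FATALITIES", n = 2)
```

Using `head()` instead of `[1:15, ]` avoids rows of NA when fewer than n event types exist.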

Tornadoes (TORNADO) pose the greatest risk to human life in the United States, with 5633 fatalities recorded from 1950 to November 2011. This exceeds the combined fatalities of the second (HEAT = 3012) and third (FLASH FLOOD = 978) deadliest event types.
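As a quick arithmetic check, the consolidated HEAT figure can be rebuilt from the unconsolidated fatality totals printed in the Data Processing section (HEAT, EXCESSIVE HEAT and HEAT WAVE) and compared with TORNADO.

```r
# Unconsolidated fatality totals copied from the top-20 listing above
heat <- c("HEAT" = 937, "EXCESSIVE HEAT" = 1903, "HEAT WAVE" = 172)
heat_total <- sum(heat)              # consolidated HEAT fatalities
runner_ups <- heat_total + 978       # HEAT plus FLASH FLOOD
tornado <- 5633
c(heat_total = heat_total, runner_ups = runner_ups, margin = tornado - runner_ups)
```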


The 15 weather event types that caused the most injuries in the United States from 1950 to November 2011 are plotted.

# Aggregate injuries count by weather event type, sort and extract top 15
injuries <- aggregate(INJURIES~EVTYPE,studydata,"sum")
injuries <- injuries[order(injuries$INJURIES,decreasing=TRUE),][1:15,]

# Plot horizontal bar chart of top injuries by weather event type
# (reorder() sorts the bars by value; red labels sit at the bar ends)
ggplot(injuries, aes(x=reorder(EVTYPE,INJURIES), y=INJURIES, fill=INJURIES)) +
        geom_col() + coord_flip() +
        xlab("Event Type") + ylab("Injuries") +
        ggtitle("Top 15 Injuries By Weather Event Type") +
        theme(legend.position="none") +
        geom_text(aes(label=INJURIES), hjust=1, vjust=0.5,
                  colour="red", size=3)

Tornadoes (TORNADO) also caused the most injuries (91346) in the United States, more than five times the combined injuries of the second (HEAT = 9004) and third (THUNDERSTORM WIND = 8445) most hazardous event types.
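The injury ratio can be checked the same way. The THUNDERSTORM WIND total is rebuilt from the TSTM WIND and THUNDERSTORM WIND rows printed earlier; the HEAT total of 9004 is taken as reported, since not all of its components appear in the top-20 listing.

```r
# THUNDERSTORM WIND injuries = TSTM WIND + THUNDERSTORM WIND (top-20 listing)
tstm <- c("TSTM WIND" = 6957, "THUNDERSTORM WIND" = 1488)
thunderstorm_total <- sum(tstm)
heat_total <- 9004                   # consolidated HEAT injuries, as reported
tornado <- 91346
ratio <- tornado / (heat_total + thunderstorm_total)
round(ratio, 2)
```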


The 15 weather event types with the greatest economic impact (total property and crop damage) in the United States from 1950 to November 2011 are plotted.

# Aggregate damages by weather event type, sort and extract top 15
damages <- aggregate(TOTALDMG~EVTYPE,studydata,"sum")
damages <- damages[order(damages$TOTALDMG,decreasing=TRUE),][1:15,]

# Plot horizontal bar chart of top damages (USD millions) by weather event type
damages$TOTALDMG_MIL <- round(damages$TOTALDMG/1000000)
ggplot(damages, aes(x=reorder(EVTYPE,TOTALDMG), y=TOTALDMG_MIL, fill=TOTALDMG)) +
        geom_col() + coord_flip() +
        xlab("Event Type") + ylab("Damages (USD millions)") +
        ggtitle("Top 15 Damages By Weather Event Type") +
        theme(legend.position="none") +
        geom_text(aes(label=TOTALDMG_MIL), hjust=1, vjust=0.5,
                  colour="red", size=3)

In terms of economic impact, however, tornadoes rank third, with total damages of about $57 billion. Floods (FLOOD, more than $160 billion) and hurricanes/typhoons (HURRICANE/TYPHOON, more than $89 billion) are the two event types with the greatest economic consequences.
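The damage totals quoted above can be rebuilt from the unconsolidated top-20 damage listing in the Data Processing section by adding the labels that the consolidation step merges. STORM SURGE/TIDE is included to confirm that tornadoes remain third.

```r
# Unconsolidated damage totals (USD) copied from the top-20 listing above
flood     <- c("FLOOD" = 150319678257, "RIVER FLOOD" = 10148404500)
hurricane <- c("HURRICANE/TYPHOON" = 71913712800, "HURRICANE" = 14610229010,
               "HURRICANE OPAL" = 3191846000)
surge     <- c("STORM SURGE" = 43323541000, "STORM SURGE/TIDE" = 4642038000)
tornado   <- 57362333947

# Consolidated totals in billions of USD
round(c(FLOOD = sum(flood), HURRICANE_TYPHOON = sum(hurricane),
        TORNADO = tornado, STORM_SURGE_TIDE = sum(surge)) / 1e9, 1)
```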