Synopsis

Severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report will explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine the type of events (Flood, Storm, Drought, etc) that cause the most economic losses and problems to public healts.

Data Processing

The data (NOAA database) was obtained from: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

if (!file.exists("./stormdata.bz2")){
        url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        download.file(url, destfile = "./stormdata.bz2")}
filedate <- file.info("./stormdata.bz2")$mtime

To create this report the data was downloaded on: 2015-04-25 18:00:46

classes = c("numeric", "character", "character", "character", "NULL", "factor", "factor", 
            "factor", "NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL",
            "NULL","NULL","NULL","NULL","NULL", "numeric", "numeric", "numeric", "factor", 
            "numeric", "factor", "NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL")
stormdata <- read.table("./stormdata.bz2", header=T, sep=",", stringsAsFactors = FALSE, 
                        colClasses=classes)
summary(stormdata)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##       COUNTYNAME         STATE                      EVTYPE      
##  JEFFERSON :  7840   TX     : 83728   HAIL             :288661  
##  WASHINGTON:  7603   KS     : 53440   TSTM WIND        :219940  
##  JACKSON   :  6660   OK     : 46802   THUNDERSTORM WIND: 82563  
##  FRANKLIN  :  6256   MO     : 35648   TORNADO          : 60652  
##  LINCOLN   :  5937   IA     : 31069   FLASH FLOOD      : 54277  
##  MADISON   :  5632   NE     : 30271   FLOOD            : 25326  
##  (Other)   :862369   (Other):621339   (Other)          :170878  
##    FATALITIES          INJURIES            PROPDMG          PROPDMGEXP    
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00          :465934  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00   K      :424665  
##  Median :  0.0000   Median :   0.0000   Median :   0.00   M      : 11330  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06   0      :   216  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50   B      :    40  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00   5      :    28  
##                                                           (Other):    84  
##     CROPDMG          CROPDMGEXP    
##  Min.   :  0.000          :618413  
##  1st Qu.:  0.000   K      :281832  
##  Median :  0.000   M      :  1994  
##  Mean   :  1.527   k      :    21  
##  3rd Qu.:  0.000   0      :    19  
##  Max.   :990.000   B      :     9  
##                    (Other):     9

The original dataset has 37 variables. However we only loaded the ones that are useful for our purposes.

The variables that were included in the dataset are:

For the purpose of this analysis, we performed the following transformations in the data frame: - Create one date field for the date the event. We didn’t take care of the time zone as we are only interested on the year in which the event occured.

Any value in the amount expression that’s not H, K, M or B will be considered an error and will be ignored.

library(stringr)
stormdata$PROPDMGEXP <- toupper(stormdata$PROPDMGEXP)
stormdata$CROPDMGEXP <- toupper(stormdata$CROPDMGEXP)
stormdata$EVTYPE <- str_trim(toupper(stormdata$EVTYPE))
calculateAmount <- function(a,exp) {ifelse (exp=="H", a*100,
        ifelse(exp=="K", a*1000,
        ifelse(exp=="M",a*1000000,
        ifelse(exp=="B",a*1000000000,a))))}
stormdata$PROPDMGAMT <- mapply(calculateAmount, stormdata$PROPDMG, stormdata$PROPDMGEXP)
stormdata$CROPDMGAMT <- mapply(calculateAmount, stormdata$CROPDMG, stormdata$CROPDMGEXP)
stormdata$begindate <- as.POSIXct(stormdata$BGN_DATE, tz ="GMT", format="%m/%d/%Y")
stormdata$year=format(stormdata$begindate, "%Y")

Our first impression on the event type field is that it’s manually input and any value can be entered (the types indicated in the documenation are not enforced). we decided to do some manual transformations on the data so that the events appear under the correct type.

stormdata$eventType <- stormdata$EVTYPE
stormdata$eventType[grepl("TSTM", stormdata$EVTYPE) 
                    & !grepl("MARINE", stormdata$EVTYPE)] <- "THUNDERSTORM WIND"
stormdata$eventType[grepl("TSTM", stormdata$EVTYPE) 
                    & grepl("MARINE", stormdata$EVTYPE)]  <- "MARINE THUNDERSTORM WIND"
stormdata$eventType[grepl("HURRICAN", stormdata$EVTYPE) 
                    | grepl("TYPHOON", stormdata$EVTYPE)] <- "HURRICANE (TYPHOON)"
stormdata$eventType[grepl("HAIL", stormdata$EVTYPE)] <- "HAIL"
stormdata$eventType[grepl("TORNADO", stormdata$EVTYPE)] <- "TORNADO"
stormdata$eventType[grepl("SURGE", stormdata$EVTYPE)
                    | grepl("TIDE", stormdata$EVTYPE)] <- "STORM SURGE/TIDE"
stormdata$eventType[grepl("FLASH", stormdata$EVTYPE)] <- "FLASH FLOOD"
stormdata$eventType[grepl("FLOOD", stormdata$EVTYPE) & 
                !grepl("FLASH", stormdata$EVTYPE) & !grepl("COASTAL", stormdata$EVTYPE)] <- "FLOOD"
stormdata$eventType[grepl("HEAT", stormdata$EVTYPE)] <- "HEAT"

Results

According to the weather events documentation (http://www.ncdc.noaa.gov/stormevents/details.jsp), there is no record for some types of events prior to 1996. So we will filter out any event that occured prior to 1996 so that we can compare the same period of time for all the event types.

To start the analysis, we calculated which type of event caused the more damage for fatalities, injuries and economic losses (property losses + crop losses).

library(knitr)
library(dplyr)
options(knitr.table.format = 'html')
stormdataFilt <- filter(stormdata, year >= 1996)
summaryByTypeFilt <- summarise(group_by(stormdataFilt,eventType), 
                           count=sum(!is.na(eventType)), 
                           total_fatalities=sum(FATALITIES), 
                           total_injuries=sum(INJURIES), 
                           total_damage_loss=(sum(PROPDMGAMT)+sum(CROPDMGAMT))/1000000000)
kable(arrange(select(top_n(summaryByTypeFilt,5,total_fatalities), eventType, count, total_fatalities),
        desc(total_fatalities)), 
        caption ="Top 5 Events that cause more fatalities")
Top 5 Events that cause more fatalities
eventType count total_fatalities
HEAT 2421 2036
TORNADO 23155 1511
FLASH FLOOD 51005 887
LIGHTNING 13204 651
FLOOD 24427 416
kable(arrange(select(top_n(summaryByTypeFilt,5,total_injuries), eventType, count, total_injuries),
        desc(total_injuries)), 
        caption ="Top 5 Events that cause more injuries")
Top 5 Events that cause more injuries
eventType count total_injuries
TORNADO 23155 20667
HEAT 2421 7683
FLOOD 24427 6760
THUNDERSTORM WIND 210137 5034
LIGHTNING 13204 4141
kable(arrange(select(top_n(summaryByTypeFilt,5,total_damage_loss), eventType, count, total_damage_loss),
        desc(total_damage_loss)), 
        caption ="Top 5 Events that cause more economic losses (in billions of dollars)")
Top 5 Events that cause more economic losses (in billions of dollars)
eventType count total_damage_loss
FLOOD 24427 149.09970
HURRICANE (TYPHOON) 271 87.06900
STORM SURGE/TIDE 680 47.84532
TORNADO 23155 24.90037
HAIL 209247 17.20109

We see that for fatalities and injuries the Top 5 event types are almost the same, but for economic losses the Top 5 event types are different. So we will analyze it separately.

Now let’s plot the fatalities, injuries and economic losses by type per year.

#Top fatalities plot 
library(ggplot2)
library(gridExtra)
theme_set(theme_grey(base_size = 6)) 
summaryByTypeYearFilt <- summarise(group_by(stormdataFilt,eventType,year), 
                               count=sum(!is.na(eventType)), 
                               total_fatalities=sum(FATALITIES), 
                               total_injuries=sum(INJURIES), 
                                total_damage_loss=(sum(PROPDMGAMT)+sum(CROPDMGAMT))/1000000)
#Top fatalities plot
p1 <-ggplot(data=summaryByTypeYearFilt[
        summaryByTypeYearFilt$eventType %in% top_n(summaryByTypeFilt,5,total_fatalities)$eventType,], 
        aes(x=year, y=total_fatalities, fill=eventType)) +
        geom_bar(stat="identity") +
        scale_x_discrete(breaks = c(1996, seq(from=2000,to=2011,by = 5)))+
        labs(x="",y="Total fatalities")+
        labs(title="Total fatalities per Event/Year")
#Top injuries plot
p2 <-ggplot(data=summaryByTypeYearFilt[
        summaryByTypeYearFilt$eventType %in% top_n(summaryByTypeFilt,5,total_injuries)$eventType,], 
        aes(x=year, y=total_fatalities, fill=eventType)) +
        geom_bar(stat="identity") +
        scale_x_discrete(breaks = c(1996, seq(from=2000,to=2011,by = 5)))+
        labs(x="",y="Total injuries")+
        labs(title="Total injuries per Event/Year")
#Top economic losses
p3 <-ggplot(data=summaryByTypeYearFilt[
        summaryByTypeYearFilt$eventType %in% top_n(summaryByTypeFilt,5,total_damage_loss)$eventType,], 
        aes(x=year, y=total_damage_loss, fill=eventType)) +
        geom_bar(stat="identity") +
        scale_x_discrete(breaks = seq(from=1950,to=2011,by = 10))+
        labs(x="",y="Total (in millions of dollars)")+
        labs(title="Total economic losses per Event/Year")
grid.arrange(p1, p2, p3, ncol=3)

For fatalities and injuries, we can see that FLOOD, FLASH FLOOD and LIGHTNING have about the same amount through the years. Hoewver, events like TORNADO and HEAT have some years with more fatalities or injuries, for example TORNADO events caused a high amount of injuries and fatalities on 2011.

For economic loss, HAIL and TORNADO events caused about the same amount of losses through the years. But TORNADO, HURRICANE and STORM SURGE had major events that elevated the amount of losses for some particular years.

Now we’d like to see which states are more affected by major weather events. We are going to calculate the top 10 states and events with the most injuries, fatalities and economic losses.

summaryByTypeState <- summarise(group_by(stormdataFilt,eventType_State=paste(STATE, " - " , eventType)), 
                           count=sum(!is.na(eventType)), 
                           total_fatalities=sum(FATALITIES), 
                           total_injuries=sum(INJURIES), 
                           total_damage_loss=(sum(PROPDMGAMT)+sum(CROPDMGAMT))/1000000)

kable(arrange(select(top_n(summaryByTypeState,10,total_fatalities), eventType_State, count, total_fatalities),
        desc(total_fatalities)), 
        caption ="Top 10 States affected by Events with more fatalities")
Top 10 States affected by Events with more fatalities
eventType_State count total_fatalities
IL - HEAT 167 357
AL - TORNADO 1079 335
PA - HEAT 96 308
TX - HEAT 151 285
MO - TORNADO 944 233
MO - HEAT 186 195
TN - TORNADO 674 184
TX - FLASH FLOOD 7739 151
FL - RIP CURRENT 195 149
CA - HEAT 147 118
kable(arrange(select(top_n(summaryByTypeState,10,total_injuries), eventType_State, count, total_injuries),
        desc(total_injuries)), 
        caption ="Top 10 States affected by Events with more injuries")
Top 10 States affected by Events with more injuries
eventType_State count total_injuries
TX - FLOOD 881 6338
MO - HEAT 186 3539
AL - TORNADO 1079 3231
TN - TORNADO 674 2071
MO - TORNADO 944 2059
OK - TORNADO 1172 1672
AR - TORNADO 985 1410
GA - TORNADO 621 1191
MS - TORNADO 919 898
FL - HURRICANE (TYPHOON) 60 811
kable(arrange(select(top_n(summaryByTypeState,10,total_damage_loss), eventType_State, count, total_damage_loss),
        desc(total_damage_loss)), 
        caption ="Top 10 States affected by Events with more economic losses (in millions of dollars)")
Top 10 States affected by Events with more economic losses (in millions of dollars)
eventType_State count total_damage_loss
CA - FLOOD 400 117041.640
LA - STORM SURGE/TIDE 48 31827.987
FL - HURRICANE (TYPHOON) 60 29765.125
LA - HURRICANE (TYPHOON) 27 22506.022
MS - HURRICANE (TYPHOON) 16 15692.906
MS - STORM SURGE/TIDE 13 11266.790
NC - HURRICANE (TYPHOON) 40 6974.851
TX - DROUGHT 432 6722.465
TX - TROPICAL STORM 57 5495.333
AL - TORNADO 1079 5032.391

From the tables above, it’s possible to determine which states require more resources to prevent/proceed in case of certain types of events. We can see that heat in Ilinois and tornados in Alabama, have caused the most fatalities. Also, floods in Texas and heat in Missouri have caused the most injuries. And finally California floods have caused the greatest economic losses during this period.