Synopsis

NOAA records weather events in the US and their impact on human health and the economy, and has made the data from 1950 to 2011 available for analysis. This report analyses that data to understand the cost of these weather events to human health and the economy.

The data contains a record of about 900,000 events (902,297) over six decades. The data is explored by year of event and event type to identify high-impact event types. The impact is measured through fatality and injury counts to assess the human health cost, and through property and crop damage to assess the economic cost.

The analysis shows that the twelve event types with the highest death counts account for over 80% of the fatalities (12,491 of 15,145). Tornado, excessive heat/heat and flash flood/flood are together responsible for nearly 10,000 (9,921) of the 15,145 deaths during the analysis period.

On the economic front, the top 9 event types account for about 90% of the recorded damage to property and crops. Tornado and flash flood are again the primary causes of damage. Hail and wind also cause significant damage.

Further analysis of the geographical/localised aspect of certain weather events is recommended. It is not included in this project due to time constraints.

Data Processing

The data is available as a CSV file, which is loaded using the read.csv function.

The variables of interest in the current analysis, including those derived after loading, are:

  1. BGN_DATE: Date the event started. This is converted to the Date datatype after loading into the dataframe.
  2. EVTYPE: The event type, such as Tornado, Hail, Flood etc.
  3. FATALITIES: Number of deaths attributable to the event
  4. INJURIES: Number of persons injured during the event
  5. PROPDMG: Property damage amount as recorded. The separate PROPDMGEXP magnitude code (K, M, B) is not applied in this analysis.
  6. CROPDMG: Crop damage amount as recorded. The separate CROPDMGEXP magnitude code is not applied in this analysis.
  7. BGN_YEAR: Year derived from the variable BGN_DATE
  8. DAMAGE: Sum of PROPDMG and CROPDMG for the event (derived)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lattice)
## Warning: package 'lattice' was built under R version 3.1.3
StormData <- read.csv("repdata_data_StormData.csv", sep=",", header=TRUE)
StormData$BGN_DATE <- as.Date(as.character(StormData$BGN_DATE), "%m/%d/%Y")
# format() is the documented way to extract the year from a Date
StormData$BGN_YEAR <- as.numeric(format(StormData$BGN_DATE, "%Y"))
StormData$DAMAGE <- StormData[,"PROPDMG"] + StormData[,"CROPDMG"]

print(paste("Total events   ", nrow(StormData), sep=":"))
## [1] "Total events   :902297"
print(paste("Fatalities     ", sum(StormData$FATALITIES), sep=":"))
## [1] "Fatalities     :15145"
print(paste("Injuries       ", sum(StormData$INJURIES), sep=":"))
## [1] "Injuries       :140528"
print(paste("Property Damage", sum(StormData$PROPDMG), sep=":"))
## [1] "Property Damage:10884500.01"
print(paste("Crop Damage    ", sum(StormData$CROPDMG), sep=":"))
## [1] "Crop Damage    :1377827.32"
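
The property and crop totals above sum the PROPDMG and CROPDMG columns as recorded. The NOAA file also carries PROPDMGEXP and CROPDMGEXP columns with magnitude codes such as K, M and B. The following is a minimal sketch of how those codes could be applied to obtain dollar amounts. The toy data frame, the helper name exp_multiplier and the choice to treat any other code as a multiplier of 1 are assumptions made for illustration, not part of the analysis in this report.

```r
# Toy rows standing in for StormData; the values are illustrative only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: codes other than K, M and B are treated as a multiplier of 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Applying such a conversion before computing DAMAGE could change the ranking of event types, so it would need to be checked against the full data.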

Exploratory analysis for human cost

The data is summarised by year and event type for exploratory analysis of human cost.

StormDataSummaryByYear <- aggregate(StormData[, c("FATALITIES","INJURIES")]
                                    , list(StormData[,"BGN_YEAR"], StormData[,"EVTYPE"] )
                                    , sum
                                    , na.rm=TRUE
                                    ) 
colnames(StormDataSummaryByYear) <- c("BGN_YEAR", "EVTYPE", "FATALITIES", "INJURIES")

StormDataSummaryHealthByType <- aggregate(StormDataSummaryByYear[, c("FATALITIES","INJURIES")]
                                    , list(StormDataSummaryByYear[,"EVTYPE"])
                                    , sum
                                    , na.rm=TRUE
                                    ) 
colnames(StormDataSummaryHealthByType) <- c("EVTYPE", "FATALITIES", "INJURIES")
StormDataSummaryHealthByType <- StormDataSummaryHealthByType[order(StormDataSummaryHealthByType$FATALITIES, decreasing=TRUE),]
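
The per-type totals can also be obtained in a single step with the formula interface of aggregate, without going through the per-year summary first. The sketch below uses a small made-up data frame (toy) rather than StormData. Note that the formula interface drops rows with missing values by default, which is not identical to passing na.rm=TRUE to sum.

```r
# Made-up rows with the same column names as the storm data.
toy <- data.frame(
  BGN_YEAR   = c(1990, 1990, 1991, 1991),
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "TORNADO"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 7),
  stringsAsFactors = FALSE
)

# Sum both measures per event type in one call.
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort so that the deadliest event type comes first.
by_type <- by_type[order(by_type$FATALITIES, decreasing = TRUE), ]
print(by_type)
```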

The top 16 event types by fatality count are:

print(StormDataSummaryHealthByType[1:16, c("EVTYPE", "FATALITIES","INJURIES") ])
##                EVTYPE FATALITIES INJURIES
## 834           TORNADO       5633    91346
## 130    EXCESSIVE HEAT       1903     6525
## 153       FLASH FLOOD        978     1777
## 275              HEAT        937     2100
## 464         LIGHTNING        816     5230
## 856         TSTM WIND        504     6957
## 170             FLOOD        470     6789
## 585       RIP CURRENT        368      232
## 359         HIGH WIND        248     1137
## 19          AVALANCHE        224      170
## 972      WINTER STORM        206     1321
## 586      RIP CURRENTS        204      297
## 278         HEAT WAVE        172      309
## 140      EXTREME COLD        160      231
## 760 THUNDERSTORM WIND        133     1488
## 310        HEAVY SNOW        127     1021
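
Several rows in this table look like variants of the same event (for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND). The sketch below shows one possible way to fold such labels together before aggregating. The function name normalise_evtype and the specific rewrite rules are assumptions chosen for illustration. Whether two labels really describe the same event type is a judgement call that this report does not make.

```r
# Hypothetical helper that folds a few label variants into one spelling.
normalise_evtype <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("^\\s+|\\s+$", "", x)        # strip leading/trailing whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)   # expand the abbreviation
  x <- sub("WINDS$", "WIND", x)          # fold plural into singular
  x <- sub("CURRENTS$", "CURRENT", x)
  x
}

labels  <- c("TSTM WIND", "THUNDERSTORM WINDS", "Rip Currents", " RIP CURRENT")
cleaned <- normalise_evtype(labels)
print(cleaned)
```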

Analysis of human cost of weather events

This analysis focuses on the top 9 event types by fatality count.

EVTYPE <- as.data.frame(StormDataSummaryHealthByType[1:9,"EVTYPE"])
colnames(EVTYPE) <- c("EVTYPE")
EvtypeStormDataHealth <- merge(EVTYPE, StormData, by="EVTYPE")
EvtypeStormDataSummaryByYear <-  aggregate(EvtypeStormDataHealth[, c("FATALITIES","INJURIES")]
                                    , list(EvtypeStormDataHealth[,"EVTYPE"], EvtypeStormDataHealth[,"BGN_YEAR"])
                                    , sum
                                    , na.rm=TRUE
                                    )
colnames(EvtypeStormDataSummaryByYear) <- c("EVTYPE", "BGN_YEAR", "FATALITIES", "INJURIES")

The top 9 event types account for 11857 out of the total 15145 fatalities.
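
As a check on this figure, the cumulative share of fatalities can be recomputed from the table of top event types. The sketch below hard-codes the nine fatality counts and the overall total printed earlier in this report. The variable names are illustrative.

```r
# Fatality counts for the top 9 event types, copied from the table above.
fatalities <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
                "HEAT" = 937, "LIGHTNING" = 816, "TSTM WIND" = 504,
                "FLOOD" = 470, "RIP CURRENT" = 368, "HIGH WIND" = 248)
total_fatalities <- 15145

# Running percentage of all fatalities covered as each type is added.
cum_share <- round(100 * cumsum(fatalities) / total_fatalities, 1)
print(cum_share)
```

The last entry is the share covered by all nine types, which works out to about 78% of the total.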

PlotH <- xyplot(FATALITIES + INJURIES ~ BGN_YEAR | EVTYPE, data = EvtypeStormDataSummaryByYear, col=c("red","blue"), type="l" ,
    ylab=list("Casualty count", cex=1.15), xlab= list("Year", cex=1.15), main=list("Casualty due to weather", cex=2),
    layout=c(3,3),
    key=simpleKey(c("Fatalities (in red)", "Injuries (in blue)"),  columns=2, points=FALSE, col=c("red","blue")),
    scales=list(
    y=list(
    log=TRUE,
    limits=c(1,3000),
    at=c(1,10,30,100,300,1000,2000),
    labels=c(1,10,30,100,300,1000,2000) 
    ) )
)

print(PlotH)

Exploratory analysis for economic cost

The data is summarised by year and event type for exploratory analysis of economic cost.

StormDataDamageSummaryByYear <- aggregate(StormData[, c("DAMAGE")]
                                    , list(StormData[,"BGN_YEAR"], StormData[,"EVTYPE"] )
                                    , sum
                                    , na.rm=TRUE
                                    ) 
colnames(StormDataDamageSummaryByYear) <- c("BGN_YEAR", "EVTYPE", "DAMAGE")

StormDataDamageSummaryByType <- aggregate(StormDataDamageSummaryByYear[, c("DAMAGE")]
                                    , list(StormDataDamageSummaryByYear[,"EVTYPE"])
                                    , sum
                                    , na.rm=TRUE
                                    ) 
colnames(StormDataDamageSummaryByType) <- c("EVTYPE", "DAMAGE")
StormDataDamageSummaryByType <- StormDataDamageSummaryByType[order(StormDataDamageSummaryByType$DAMAGE, decreasing=TRUE),]

print(StormDataDamageSummaryByType[1:15,])
##                 EVTYPE     DAMAGE
## 834            TORNADO 3312276.68
## 153        FLASH FLOOD 1599325.05
## 856          TSTM WIND 1445168.21
## 244               HAIL 1268289.66
## 170              FLOOD 1067976.36
## 760  THUNDERSTORM WIND  943635.62
## 464          LIGHTNING  606932.39
## 786 THUNDERSTORM WINDS  464978.11
## 359          HIGH WIND  342014.77
## 972       WINTER STORM  134699.58
## 310         HEAVY SNOW  124417.71
## 957           WILDFIRE   88823.54
## 427          ICE STORM   67689.62
## 676        STRONG WIND   64610.71
## 290         HEAVY RAIN   61964.94

The top 9 event types account for 11050596.85 out of the total 12262327.33 in recorded damage (PROPDMG + CROPDMG), which is about 90%.

Analysis of top 9 event types impacting economic cost

EVTYPE <- as.data.frame(StormDataDamageSummaryByType[1:9,"EVTYPE"])
colnames(EVTYPE) <- c("EVTYPE")
EvtypeStormDataDamage <- merge(EVTYPE, StormData, by="EVTYPE")
EvtypeStormDataDamageSummaryByYear <- aggregate(EvtypeStormDataDamage[, c("DAMAGE")]
                                    , list(EvtypeStormDataDamage[,"BGN_YEAR"], EvtypeStormDataDamage[,"EVTYPE"] )
                                    , sum
                                    , na.rm=TRUE
                                    ) 
colnames(EvtypeStormDataDamageSummaryByYear) <- c("BGN_YEAR", "EVTYPE", "DAMAGE")
    
PlotE <- xyplot(
    DAMAGE ~ BGN_YEAR | EVTYPE, data = EvtypeStormDataDamageSummaryByYear, col=c("red"), type="l" ,
    ylab=list("Economic Impact in US$", cex=1.15), xlab=list("Year", cex=1.15), main=list("Economic Impact of weather",cex=2),
    layout=c(3,3),
    scales=list(
        y=list(
            log=TRUE,
            limits=c(10000,700000),
            at=c(10000,30000,100000,300000,600000),
            labels=c("10K","30K","100K","300K","600K") 
            )
        )
)

print(PlotE)

Results

Impact on human health

The analysis of weather event data by event type, year and fatality/injury count shows that tornadoes account for over a third of fatalities (5,633 of 15,145) and nearly two thirds of injuries (91,346 of 140,528). Tornadoes have also been well recorded throughout the analysis period, perhaps because their destructive power has long been understood. Although the 5,633 tornado deaths are high compared to other events, they amount to only about 6 percent of the 91,346 tornado injuries. The reasons for this comparatively low death rate need to be investigated. They could include better preparation and response in the event of a tornado, or significant differences among tornadoes themselves. In any case, given the scale of the impact, even a 10 percent reduction in tornado casualties would make a large difference.

Excessive heat and heat together caused 2,840 deaths and 8,625 injuries. Data for these events is only available from the 1990s, which suggests that they have been recognised only recently. Deaths are also high relative to injuries, at roughly one death for every three injuries. A similar pattern emerges for flash flood.

Another event type that requires attention is rip current, which caused more deaths (368) than injuries (232).
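
The ratios discussed in this section can be reproduced from the counts in the fatality table. The sketch below hard-codes those counts, combining EXCESSIVE HEAT and HEAT as the text does, and computes deaths per 100 injuries. The data frame and column names are illustrative.

```r
# Counts copied from the table of top event types by fatality.
health <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT + HEAT", "FLASH FLOOD", "RIP CURRENT"),
  FATALITIES = c(5633, 1903 + 937, 978, 368),
  INJURIES   = c(91346, 6525 + 2100, 1777, 232),
  stringsAsFactors = FALSE
)

# Deaths per 100 injuries; values above 100 mean more deaths than injuries.
health$DEATHS_PER_100_INJURIES <- round(100 * health$FATALITIES / health$INJURIES, 1)
print(health)
```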

Impact on economy

The analysis of weather events for economic impact shows an increasing impact from tornadoes. Of late, winds (with or without thunderstorms) have caused as much damage as tornadoes. As expected, flood and flash flood damage is increasing at a significant pace and now almost matches that of tornadoes.

Hail, while not significantly affecting human health, has affected crops, but this damage has remained at approximately the same level over the last two decades.

Together, these event types are responsible for about 90% of all recorded damage.