Hurricanes, Tornadoes and Droughts: Nature's most destructive events

Synopsis

This is an analysis of the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which records major storms and weather events in the United States from 1950 to November 2011. The aim is to identify which types of event have caused the most harm to human life and the most damage to property and crops.

The investigation finds that hurricanes, floods and storms have caused the most damage to property and crops, followed by drought. Tornadoes have caused the most deaths and injuries, followed by excessive heat, with floods third.

Data Processing

We start by reading the file into a data frame, 'stormd'. Next we load the data.table package, convert 'stormd' to a data table and set the column 'REFNUM' as its key.

Since we are only analysing the economic and human cost of the events, we can drop every row in which fatalities, injuries, property damage and crop damage are all zero.

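# read.csv opens the bzip2-compressed file directly
stormd <- read.csv("repdata-data-StormData.csv.bz2")

library(data.table)

# Convert to a data.table keyed on the record reference number
stormd <- data.table(stormd)
setkey(stormd, REFNUM)

# Keep only records with at least one death, injury or damage figure
stormd <- stormd[FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0]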

Results

Event types have not been recorded consistently with NOAA's naming guidelines, so the same type of event is often recorded under different names, leaving the data a bit of a jumble.

For instance, thunderstorm winds have been recorded separately as 'THUNDERSTORM WIND', 'THUNDERSTORM WINDS' and 'TSTM WIND'.
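One way to merge such spelling variants is to normalise the labels before counting. The sketch below is illustrative only: the helper name normalise_evtype and its rules (upper-casing, trimming spaces, expanding 'TSTM' and turning 'WINDS' into 'WIND') are assumptions, not NOAA's official mapping, and the analysis in this report does not apply them.

```r
# Hypothetical helper: maps spelling variants of an event type to one label.
# The rules are illustrative assumptions, not NOAA's official mapping.
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))   # ignore case and surrounding spaces
  x <- gsub("[[:space:]]+", " ", x)       # collapse runs of spaces
  x <- sub("^TSTM ", "THUNDERSTORM ", x)  # expand the common abbreviation
  x <- sub("WINDS$", "WIND", x)           # treat plural and singular alike
  x
}

variants <- c("Thunderstorm Wind", "THUNDERSTORM WINDS", "TSTM WIND", " tstm  wind ")
normalise_evtype(variants)  # all four become "THUNDERSTORM WIND"
```

On the real data it could be applied with `stormd[, EVTYPE := normalise_evtype(EVTYPE)]` before grouping, although rules this crude would need checking against the full list of labels.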

Let's have a look at the 15 most frequent events to get a sense of the data. To count the occurrences of each 'EVTYPE', we add a 'count' column of 1s to 'stormd' and sum it grouped by 'EVTYPE'. The resulting data table is sorted in descending order and the first 15 rows are kept.

Before plotting, we create a palette of colours, 'cpal', and repeat it so that there are enough colours for every event type.

We then plot the bar graph using the 'ggplot2' package.

library(data.table)

# Add a column of 1s; summing it per group counts the records of each event type
stormd[, count := 1]
sumeve <- stormd[, list(sum = sum(count)), by = EVTYPE]
sumeve <- sumeve[order(sum, decreasing = TRUE)]
sumeve_t15 <- sumeve[1:15]

cpal <- c("#6B1D00", "#912700", "#593D9B", "#422091", "#087945", "#139259", "#AF970B", "#E3CC48", "#004A28", "#E37248", "#D4B91C", "#AF370B", "#240963", "#321378", "#D44E1C", "#917C00", "#006437", "#190449", "#6B5C00", "#329D6C")

# Recycle the 20 colours so that every event type in later plots gets a fill colour
cpal <- rep(cpal, 10)

library(ggplot2)
library(grid)

ggplot(sumeve_t15, aes(x = EVTYPE, y = sum)) +
  geom_bar(aes(fill = EVTYPE), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  guides(fill = "none") +
  labs(x = "Event Type", y = "Occurrence", title = "15 Most Frequent Events") +
  scale_fill_manual(values = cpal)
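The counting step above can also be written with data.table's built-in `.N` symbol, which holds the number of rows in each group. A minimal sketch on a small, hypothetical table standing in for 'stormd':

```r
library(data.table)

# Hypothetical toy rows standing in for 'stormd'
toy <- data.table(EVTYPE = c("HAIL", "TORNADO", "HAIL", "FLOOD", "HAIL", "TORNADO"))

# .N holds the number of rows in each group, so no helper column is needed;
# order(-N) sorts the counts from largest to smallest
freq <- toy[, .N, by = EVTYPE][order(-N)]
head(freq, 15)  # the 15 most frequent types (here only 3 exist)
```

The report keeps its 'count' column because the later lists tally it with `sum(count)`; `.N` would serve equally well there.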

Figure: 15 most frequent events (bar chart; x: Event Type, y: Occurrence)

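Thunderstorm winds are the most frequent severe weather events: the three spellings TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS together account for 118,975 records. Associated wet-weather events such as flash floods, lightning, floods and heavy rain also appear in the top 15.

Once the thunderstorm wind variants are combined, tornadoes are the second most frequent event type.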

List: 15 most frequent severe weather events

sumeve_t15
##                 EVTYPE   sum
##  1:          TSTM WIND 63234
##  2:  THUNDERSTORM WIND 43655
##  3:            TORNADO 39944
##  4:               HAIL 26130
##  5:        FLASH FLOOD 20967
##  6:          LIGHTNING 13293
##  7: THUNDERSTORM WINDS 12086
##  8:              FLOOD 10175
##  9:          HIGH WIND  5522
## 10:        STRONG WIND  3370
## 11:       WINTER STORM  1508
## 12:         HEAVY SNOW  1342
## 13:         HEAVY RAIN  1105
## 14:           WILDFIRE   857
## 15:          ICE STORM   708
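The statement that tornadoes rank second holds only once the thunderstorm wind spellings are treated as one event type. A quick check using just the counts in the sumeve_t15 listing (the column is named n here rather than sum, to keep it distinct from the `sum()` function):

```r
library(data.table)

# Counts copied from the sumeve_t15 listing
top15 <- data.table(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", "HAIL", "FLASH FLOOD",
             "LIGHTNING", "THUNDERSTORM WINDS", "FLOOD", "HIGH WIND", "STRONG WIND",
             "WINTER STORM", "HEAVY SNOW", "HEAVY RAIN", "WILDFIRE", "ICE STORM"),
  n = c(63234, 43655, 39944, 26130, 20967, 13293, 12086, 10175,
        5522, 3370, 1508, 1342, 1105, 857, 708)
)

# Relabel the three thunderstorm wind spellings as one event type
tstm <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")
top15[EVTYPE %in% tstm, EVTYPE := "THUNDERSTORM WIND"]

# Re-aggregate and sort from most to least frequent
merged <- top15[, list(n = sum(n)), by = EVTYPE][order(-n)]
merged
```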

Human cost of severe weather events

Deaths and injuries are recorded in the 'FATALITIES' and 'INJURIES' columns. To identify the most life-threatening event types, let's look at the 40 records with the highest fatalities and the 40 with the highest injuries.

We sort the data by fatalities in descending order and keep the top 40 rows, then do the same for injuries. Next we combine both subsets into the data table 'fatinj'; a record that ranks in both lists therefore appears twice.

Using the ggplot2 package, we plot a stacked bar chart of the event types in 'fatinj'.

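# Sort by fatalities (highest first) and keep the top 40 records
stormd <- stormd[order(FATALITIES, decreasing = TRUE)]
top_fat <- stormd[1:40]
# Do the same for injuries
stormd <- stormd[order(INJURIES, decreasing = TRUE)]
top_inj <- stormd[1:40]

# Stack both subsets; records in both top-40 lists appear twice
fatinj <- rbind(top_fat, top_inj)

# A single stacked bar: each segment is the number of records of one event type
p1 <- ggplot(fatinj, aes(x = factor(1), fill = factor(EVTYPE))) +
  geom_bar(width = 1) +
  labs(x = NULL, y = "Count", title = "The 80 most fatal and injurious events") +
  scale_fill_manual(values = cpal) +
  guides(fill = guide_legend(title = "Event Type"))

p1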

Figure: The 80 most fatal and injurious events (stacked bar; fill: Event Type, y: Count)

Tornadoes (49 of the 80 records) and excessive heat (13) are the two biggest causes of deaths and injuries, followed by floods (8).

List: event types among the 80 records with the most fatalities and injuries

fatinj.l <- fatinj[,list(Count=sum(count)), by=EVTYPE]
fatinj.l[order(Count, decreasing=TRUE)]
##                        EVTYPE Count
##  1:                   TORNADO    49
##  2:            EXCESSIVE HEAT    13
##  3:                     FLOOD     8
##  4:                      HEAT     2
##  5:         HURRICANE/TYPHOON     2
##  6:              EXTREME HEAT     1
##  7:                 HEAT WAVE     1
##  8:                   TSUNAMI     1
##  9: UNSEASONABLY WARM AND DRY     1
## 10:                 ICE STORM     1
## 11:                  BLIZZARD     1
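Counting records in the two top-40 lists shows how often each event type produces one of the single worst events, not how much harm it has caused in total. A complementary approach, not used in this report, is to total FATALITIES and INJURIES per event type over all records. A sketch on hypothetical toy data:

```r
library(data.table)

# Hypothetical toy records standing in for 'stormd'
toy <- data.table(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 20, 1, 0, 3),
  INJURIES   = c(100, 2, 10, 7, 0)
)

# Total deaths and injuries per event type across all records
harm <- toy[, list(fatalities = sum(FATALITIES),
                   injuries   = sum(INJURIES)), by = EVTYPE]
harm[, total := fatalities + injuries]
harm[order(-total)]
```

Adding deaths and injuries into one total is a simplifying assumption, since it weights a death the same as an injury; ranking on each column separately avoids that.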

Economic cost of severe weather events

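Let's have a look at the events which caused the most property and crop damage. Damage is recorded as an amount ('PROPDMG', 'CROPDMG') together with a magnitude code ('PROPDMGEXP', 'CROPDMGEXP'), where 'B' stands for billions and 'M' for millions. We keep the records coded 'B' or 'M' and sort them in descending order, with damages in the billions on top followed by millions. We then take the top 40 for property and for crop damage and combine both into 'prpcro'.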
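Sorting on the magnitude code first and the amount second ranks every 'B' record above every 'M' record, so 0.1 B (USD 100 million) would outrank 500 M (USD 500 million); filtering on exactly 'B' and 'M' would also skip any lowercase codes such as 'm'. A sketch of an alternative that converts each record to dollars first, assuming the usual NOAA codes 'K', 'M' and 'B' for thousands, millions and billions (other codes are left as NA), on hypothetical rows:

```r
library(data.table)

# Hypothetical toy rows: 0.1 B (USD 100 million) versus 500 M (USD 500 million)
toy <- data.table(
  EVTYPE     = c("FLOOD", "HURRICANE", "DROUGHT"),
  PROPDMG    = c(0.1, 500, 2),
  PROPDMGEXP = c("B", "M", "m")
)

# Assumed multipliers for the K/M/B codes; any other code maps to NA
mult <- c(K = 1e3, M = 1e6, B = 1e9)
toy[, prop_usd := PROPDMG * unname(mult[toupper(PROPDMGEXP)])]

# Rank by the dollar amount rather than by magnitude code, then amount
toy[order(-prop_usd)]
```

`toupper()` also accepts a factor column, as produced by older versions of `read.csv`. Whether this changes the top 40 records in the real data has not been checked here.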

Using the ggplot2 package, we plot a stacked bar chart.

# Keep records whose property damage is in billions ("B") or millions ("M");
# "B" sorts before "M", so billions come first, each by descending amount
top_prp <- stormd[PROPDMGEXP %in% c("B", "M")]
top_prp <- top_prp[order(PROPDMGEXP, -PROPDMG)]
top_prp <- top_prp[1:40]

# Same for crop damage
top_cro <- stormd[CROPDMGEXP %in% c("B", "M")]
top_cro <- top_cro[order(CROPDMGEXP, -CROPDMG)]
top_cro <- top_cro[1:40]

prpcro <- rbind(top_prp, top_cro)

p2 <- ggplot(prpcro, aes(x = factor(1), fill = factor(EVTYPE))) +
  geom_bar(width = 1) +
  labs(x = NULL, y = "Count", title = "The 80 most property and crop damaging events") +
  scale_fill_manual(values = cpal) +
  guides(fill = guide_legend(title = "Event Type")) +
  theme(plot.margin = unit(c(3, 2, 3, 2), "lines"),
        legend.key.height = unit(0.7, "lines"))

p2

Figure: The 80 most property and crop damaging events (stacked bar; fill: Event Type, y: Count)

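Taken together, wet event types (hurricanes, storms and floods) have caused the most damage. The hurricane labels (HURRICANE/TYPHOON, HURRICANE, HURRICANE OPAL and HURRICANE OPAL/HIGH WINDS) account for 25 of the 80 records, followed by drought, which at 18 records is the most frequent single label. Tornadoes also feature in the list, though it is mainly wet event types that have been the most harmful.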

List: event types among the 80 records with the most property and crop damage

prpcro.l <- prpcro[,list(Count=sum(count)), by=EVTYPE]
prpcro.l[order(Count, decreasing=TRUE)]
##                         EVTYPE Count
##  1:                    DROUGHT    18
##  2:          HURRICANE/TYPHOON    15
##  3:                      FLOOD    10
##  4:                  HURRICANE     7
##  5:                    TORNADO     3
##  6:                STORM SURGE     2
##  7:             TROPICAL STORM     2
##  8:                RIVER FLOOD     2
##  9:             HURRICANE OPAL     2
## 10:               EXTREME COLD     2
## 11:               WINTER STORM     1
## 12:           STORM SURGE/TIDE     1
## 13:  HEAVY RAIN/SEVERE WEATHER     1
## 14:                       HAIL     1
## 15: TORNADOES, TSTM WIND, HAIL     1
## 16:           WILD/FOREST FIRE     1
## 17:                  HIGH WIND     1
## 18:        SEVERE THUNDERSTORM     1
## 19:                   WILDFIRE     1
## 20:                FLASH FLOOD     1
## 21:  HURRICANE OPAL/HIGH WINDS     1
## 22:                  ICE STORM     1
## 23:                       HEAT     1
## 24:                     FREEZE     1
## 25:             EXCESSIVE HEAT     1
## 26:               FROST/FREEZE     1
## 27:            DAMAGING FREEZE     1
##                         EVTYPE Count
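The damage list spreads single event types across several labels, much as the thunderstorm winds were. Grouping the counts in the prpcro.l listing by keyword gives a rough check of the ranking described earlier. The keywords and group names are assumptions chosen for illustration, and `fcase()` needs data.table 1.13.0 or later.

```r
library(data.table)

# Counts copied from the prpcro.l listing
prp <- data.table(
  EVTYPE = c("DROUGHT", "HURRICANE/TYPHOON", "FLOOD", "HURRICANE", "TORNADO",
             "STORM SURGE", "TROPICAL STORM", "RIVER FLOOD", "HURRICANE OPAL",
             "EXTREME COLD", "WINTER STORM", "STORM SURGE/TIDE",
             "HEAVY RAIN/SEVERE WEATHER", "HAIL", "TORNADOES, TSTM WIND, HAIL",
             "WILD/FOREST FIRE", "HIGH WIND", "SEVERE THUNDERSTORM", "WILDFIRE",
             "FLASH FLOOD", "HURRICANE OPAL/HIGH WINDS", "ICE STORM", "HEAT",
             "FREEZE", "EXCESSIVE HEAT", "FROST/FREEZE", "DAMAGING FREEZE"),
  Count = c(18, 15, 10, 7, 3, 2, 2, 2, 2, 2, rep(1, 17))
)

# Assign each label to a broad group by keyword; the first matching rule wins
prp[, group := fcase(
  grepl("HURRICANE", EVTYPE), "Hurricane",
  grepl("FLOOD", EVTYPE),     "Flood",
  grepl("DROUGHT", EVTYPE),   "Drought",
  grepl("TORNADO", EVTYPE),   "Tornado",
  grepl("STORM", EVTYPE),     "Storm",
  default = "Other")]

grouped <- prp[, list(Count = sum(Count)), by = group][order(-Count)]
grouped
```

Keyword matching is order-sensitive: 'HURRICANE OPAL/HIGH WINDS' lands in the Hurricane group because that rule is checked first, and a label matching no rule falls into Other.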