Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The objective of the project is to identify the types of storms or weather events that are the most harmful with respect to population health, and that have the greatest economic consequences.

Data Processing

Obtaining and Loading Source Data

Raw data is obtained from the National Weather Service. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

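# Fetch and decompress the data only if the CSV is not already present
if(!file.exists("repdata-data-StormData.csv")) {
        # Download the compressed archive unless a local copy exists
        if(!file.exists("repdata-data-StormData.csv.bz2")) {
                download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                              "repdata-data-StormData.csv.bz2",
                              method = "curl")
        }

        # Decompress the archive to repdata-data-StormData.csv
        library(R.utils)
        bunzip2("repdata-data-StormData.csv.bz2")
}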

The storm data is loaded into a data frame for analysis.

wd <- read.csv("repdata-data-StormData.csv")

Cleaning EVTYPE

The EVTYPE column, upon inspection, is inconsistently capitalized and entries may have leading or trailing whitespace. A quick cleaning is possible with str_trim() and toupper().

At the same time, a YEAR column (based on the beginning of the event) is added for convenience.

library(plyr)
library(stringr)
wd_cleaned <- mutate(wd,
                     EVTYPE=toupper(str_trim(EVTYPE)),
                     YEAR=format(strptime(BGN_DATE,format="%m/%d/%Y %T"),format="%Y"))
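As a quick check of the cleaning step, the sketch below applies the same idea to a small made-up vector of labels. The sample values are illustrative and not taken from the database, the date string assumes the month/day/year layout used for BGN_DATE above, and base R's trimws() stands in for str_trim() so the example has no package dependencies.

```r
# Hypothetical raw labels showing mixed case and stray whitespace
raw_evtype <- c(" Tornado", "TORNADO", "tornado  ", "Flash Flood")

# trimws() is the base R stand-in for stringr::str_trim()
clean_evtype <- toupper(trimws(raw_evtype))
print(unique(clean_evtype))

# Assumed date layout: month/day/year followed by a time
sample_date <- "4/18/1950 0:00:00"
sample_year <- format(strptime(sample_date, format = "%m/%d/%Y %T", tz = "UTC"),
                      format = "%Y")
print(sample_year)
```

After cleaning, the three spellings of the tornado label collapse into one value, which is what allows the grouping steps later in the analysis to count them together.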

Results

Human casualties

To determine which types of events cause the greatest number of casualties, the events are grouped by event type and the fatalities and injuries for each type are totaled.

library(plyr)
evtype_total_casualties <- ddply(wd_cleaned,.(EVTYPE),
                                 summarize,
                                 totalFatalities=sum(FATALITIES),
                                 totalInjuries=sum(INJURIES),
                                 totalCasualties=sum(FATALITIES+INJURIES))
casualties_sorted <- evtype_total_casualties[order(evtype_total_casualties[,"totalCasualties"],
                                                   decreasing=TRUE),]
print(casualties_sorted[1:10,])
##                EVTYPE totalFatalities totalInjuries totalCasualties
## 750           TORNADO            5633         91346           96979
## 108    EXCESSIVE HEAT            1903          6525            8428
## 771         TSTM WIND             504          6957            7461
## 146             FLOOD             470          6789            7259
## 410         LIGHTNING             816          5230            6046
## 235              HEAT             937          2100            3037
## 130       FLASH FLOOD             978          1777            2755
## 379         ICE STORM              89          1975            2064
## 677 THUNDERSTORM WIND             133          1488            1621
## 880      WINTER STORM             206          1321            1527

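Tornadoes cause by far the most casualties of any event type (96,979 in total), with excessive heat a distant second (8,428). Note that some related events still appear under separate labels after cleaning (for example, TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT), so their totals are split across rows.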
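The group-and-total logic can be checked by hand on a tiny example. The sketch below uses a made-up data frame (the values are invented for illustration) and base R's aggregate() rather than ddply(), so it is an equivalent formulation under those assumptions, not the code used to produce the table above.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 1, 4, 0, 2)
)

# Sum fatalities and injuries within each event type
toy_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_totals$totalCasualties <- toy_totals$FATALITIES + toy_totals$INJURIES

# Sort so the event type with the most casualties comes first
toy_sorted <- toy_totals[order(toy_totals$totalCasualties, decreasing = TRUE), ]
print(toy_sorted)
```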

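To see how tornado casualties vary over time, the totals are also computed per event type and year, and the tornado figures are plotted.

annual_casualties <- ddply(wd_cleaned,.(EVTYPE,YEAR),
                           summarize,
                           totalFatalities=sum(FATALITIES),
                           totalInjuries=sum(INJURIES),
                           totalCasualties=sum(FATALITIES+INJURIES))
tornado_casualties <- annual_casualties[annual_casualties$EVTYPE == "TORNADO",]
library(ggplot2)
# YEAR is stored as text, so convert it to a number for a continuous x axis
qplot(x = as.numeric(as.character(YEAR)),y = totalCasualties,data=tornado_casualties,geom="point",xlab="Year",ylab="Casualties from Tornadoes")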

[Figure: scatter plot of annual tornado casualties. x-axis: Year; y-axis: Casualties from Tornadoes]

Damage to Property and Crops

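Property and crop damage estimates are recorded as a value (PROPDMG, CROPDMG) together with a magnitude code (PROPDMGEXP, CROPDMGEXP). A code of K scales the value by 1,000 and a code of M by 1,000,000. The code below treats any other code, including lowercase letters, as a multiplier of 1. The scaled values give the estimated damage for each event.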

wd_damages <- mutate(wd_cleaned,
                     PropDmg = PROPDMG * ifelse(PROPDMGEXP == "K",1000,
                                         ifelse(PROPDMGEXP == "M",1000000,1)),
                     CropDmg = CROPDMG * ifelse(CROPDMGEXP == "K",1000,
                                         ifelse(CROPDMGEXP == "M",1000000,1)))
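The same multiplier rule can also be written as a named lookup, which may be easier to audit than nested ifelse() calls. This is only a sketch: the helper name exp_multiplier and the sample values are assumptions introduced for illustration, and the helper applies the rule described above (K and M scaled, every other code left at 1).

```r
# Hypothetical helper: map a magnitude code to its numeric multiplier
exp_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6)
  mult <- unname(lookup[as.character(exp_code)])
  # Codes not in the lookup (empty, lowercase, anything else) get 1
  mult[is.na(mult)] <- 1
  mult
}

# Made-up damage values and codes
sample_dmg <- c(25, 2.5, 10, 3)
sample_exp <- c("K", "M", "", "k")
scaled_dmg <- sample_dmg * exp_multiplier(sample_exp)
print(scaled_dmg)
```

Because the match is case-sensitive, the lowercase "k" in the sample is left unscaled, just as it would be by the comparison PROPDMGEXP == "K".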

To determine the most damaging types of events, we sum up property and crop damage.

evtype_total_damages <- ddply(wd_damages,.(EVTYPE),summarize,
                       totalPropDmg=sum(PropDmg),
                       totalCropDmg=sum(CropDmg),
                       totalDmg = sum(PropDmg,CropDmg))
damages_sorted <- evtype_total_damages[order(evtype_total_damages[,"totalDmg"],
                                             decreasing=TRUE),]
print(damages_sorted[1:10,])
##                EVTYPE totalPropDmg totalCropDmg  totalDmg
## 750           TORNADO    5.163e+10    4.150e+08 5.204e+10
## 146             FLOOD    2.216e+10    5.662e+09 2.782e+10
## 204              HAIL    1.393e+10    3.026e+09 1.695e+10
## 130       FLASH FLOOD    1.514e+10    1.421e+09 1.656e+10
## 76            DROUGHT    1.046e+09    1.247e+10 1.352e+10
## 355         HURRICANE    6.168e+09    2.742e+09 8.910e+09
## 771         TSTM WIND    4.493e+09    5.540e+08 5.047e+09
## 364 HURRICANE/TYPHOON    3.806e+09    1.098e+09 4.904e+09
## 312         HIGH WIND    3.970e+09    6.386e+08 4.609e+09
## 867          WILDFIRE    3.725e+09    2.955e+08 4.021e+09