Synopsis

This analysis examines weather data from the National Oceanic and Atmospheric Administration (NOAA). The data list weather events, when they occurred, and several measures of the harm associated with each event. This analysis focuses on the toll that weather events take in terms of injury, death, and property damage.

To examine which weather events are the most destructive, this analysis computes the rates of injury, death, and property damage per event for all events since 1990, limited to event types with at least 500 occurrences since 1990.

Data Processing

The raw data were available from a URL provided by the Coursera project instructions, and were downloaded using the download.file() function in R:

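## Save into the working directory so that read.csv() below finds the file;
## mode="wb" keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile="storms.csv.bz2", mode="wb")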

The data were then read into R using the read.csv() function:

storms <- read.csv("storms.csv.bz2",stringsAsFactors=FALSE)

Once the data were in the data frame called “storms”, I converted the weather event descriptions to upper case and converted the weather event date to a proper Date variable:

storms$EVTYPE <- toupper(storms$EVTYPE)
storms$BGN_DATE <- as.Date(storms$BGN_DATE,format="%m/%d/%Y")
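As a quick check of the date conversion, the sketch below applies the same format string to two made-up strings. It assumes that BGN_DATE values are written month/day/year and carry a trailing time component; as.Date() parses only the part matched by the format and ignores any trailing characters. The variable names are illustrative and not part of the analysis.

```r
## Made-up sample strings in the assumed BGN_DATE layout (illustration only)
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

## Same format string as in the analysis; the trailing time text is ignored
parsed <- as.Date(sample_dates, format = "%m/%d/%Y")

## Extract the year in base R, without loading lubridate
years <- as.integer(format(parsed, "%Y"))

print(parsed)
print(years)
```

A string that does not match the format comes back as NA, so checking anyNA() on the converted column is a cheap way to confirm that every date parsed.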

Next, I subset the data to include only weather events that occurred in 1990 or later. I chose this year as a balance between recency and availability of data: records from much earlier are sparse, and conditions (infrastructure, monetary value of property) may have differed enough to skew the results.

## Keep the records 1990 and later for fair comparison
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
storms2 <- storms[year(storms$BGN_DATE)>=1990,]

As initial exploratory analyses (not reproduced here), I examined the frequencies of the FATALITIES, INJURIES, and PROPDMG fields in the raw data. There were no missing values in these fields and all values seemed logical. I also examined the EVTYPE (weather event type) field, looking at its unique values and their frequencies in the raw data (also not reproduced here). Noting similarities between many of the less frequent categories and the most frequent categories of EVTYPE, I combined similar categories using the grep() function:

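## Combine similar categories. The statements run in order on the already
## relabeled column, so a later pattern can overwrite an earlier label.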
storms2$EVTYPE[grep("AVALAN",storms2$EVTYPE)]                     <- "AVALANCHE"
storms2$EVTYPE[grep("BLIZZARD|WINTER",storms2$EVTYPE)]            <- "BLIZZARD"
storms2$EVTYPE[grep("FLOOD|FLDG|FLD|URBAN|FLOOO",storms2$EVTYPE)] <- "FLOOD"
storms2$EVTYPE[grep("RAIN",storms2$EVTYPE)]                       <- "HEAVY RAIN"
storms2$EVTYPE[grep("FREEZING RAIN",storms2$EVTYPE)]              <- "FREEZING RAIN"
storms2$EVTYPE[grep("ICE|ICY",storms2$EVTYPE)]                    <- "ICE"
storms2$EVTYPE[grep("SNOW",storms2$EVTYPE)]                       <- "SNOW"
storms2$EVTYPE[grep("HAIL",storms2$EVTYPE)]                       <- "HAIL"
storms2$EVTYPE[grep("SURF",storms2$EVTYPE)]                       <- "SURF"
storms2$EVTYPE[grep("HIGH WIND",storms2$EVTYPE)]                  <- "HIGH WIND"
storms2$EVTYPE[grep("HURRICANE|TYPHOON",storms2$EVTYPE)]          <- "HURRICANE"
storms2$EVTYPE[grep("TROPICAL",storms2$EVTYPE)]                   <- "TROP STORM"
storms2$EVTYPE[grep("THUNDER|TSTM",storms2$EVTYPE)]               <- "THUNDER STORM"
storms2$EVTYPE[grep("LIGHTN|LIGHTI|LIGNT",storms2$EVTYPE)]        <- "LIGHTNING"
storms2$EVTYPE[grep("WIND|WND",storms2$EVTYPE)]                   <- "WIND"
storms2$EVTYPE[grep("TORN|FUNNEL",storms2$EVTYPE)]                <- "TORNADO"
storms2$EVTYPE[grep("FIRE",storms2$EVTYPE)]                       <- "WILDFIRE"
storms2$EVTYPE[grep("HEAT|HOT|HIGH TEMP",storms2$EVTYPE)]         <- "HEATWAVE"
storms2$EVTYPE[grep("SURGE",storms2$EVTYPE)]                      <- "STORM SURGE"
storms2$EVTYPE[grep("RIP",storms2$EVTYPE)]                        <- "RIPTIDE"
storms2$EVTYPE[grep("SWELL|SEA|TIDE",storms2$EVTYPE)]             <- "HI SEAS"
storms2$EVTYPE[grep("SPOUT",storms2$EVTYPE)]                      <- "WATERSPOUT"
storms2$EVTYPE[grep("SLIDE",storms2$EVTYPE)]                      <- "LANDSLIDE"
storms2$EVTYPE[grep("DUST",storms2$EVTYPE)]                       <- "DUSTSTORM"
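Because each statement matches against labels that earlier statements have already rewritten, the order of the statements affects the final categories. The sketch below replays six of the statements above, in the same order, on a made-up vector of labels to show the effect; the vector ev is illustrative only and is not drawn from the data.

```r
## Toy labels (illustration only, not drawn from the data)
ev <- c("FREEZING RAIN", "HIGH WIND", "RIP CURRENT", "HEAVY RAIN")

## Same statements, in the same order, as the corresponding lines above
ev[grep("RAIN", ev)]           <- "HEAVY RAIN"
ev[grep("FREEZING RAIN", ev)]  <- "FREEZING RAIN"  # matches nothing by now
ev[grep("HIGH WIND", ev)]      <- "HIGH WIND"
ev[grep("WIND|WND", ev)]       <- "WIND"           # also relabels "HIGH WIND"
ev[grep("RIP", ev)]            <- "RIPTIDE"
ev[grep("SWELL|SEA|TIDE", ev)] <- "HI SEAS"        # "RIPTIDE" contains "TIDE"

print(ev)
```

In this toy run no FREEZING RAIN, HIGH WIND, or RIPTIDE label survives: they end up as HEAVY RAIN, WIND, and HI SEAS respectively. If separate categories were wanted, one option would be to match each pattern against an untouched copy of the original labels rather than against the column being rewritten.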

Once this was complete, I set about aggregating the total numbers of injuries and deaths by weather event and the count of weather events themselves:

## Sum injuries and deaths by event type and count the events of each type
deaths   <- as.data.frame(with(storms2,aggregate(list(Deaths=FATALITIES),list(Event=EVTYPE),sum)))
injuries <- as.data.frame(with(storms2,aggregate(list(Injuries=INJURIES),list(Event=EVTYPE),sum)))
events   <- as.data.frame(table(storms2$EVTYPE)); colnames(events) <- c("Event","Count")

I next examined the codebook to understand how the damage variables work. Each numerical damage value (for both property and crop damage) must be multiplied by the factor encoded in its companion field, PROPDMGEXP or CROPDMGEXP: H for hundreds, K for thousands, M for millions, and B for billions.

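## Compute damage amounts from raw data as the sum of property and crop damage,
## adjusted by the correct units of dollars.
## Rows with a non-zero amount and an exponent code other than H, K, M, or B
## stay NA, and sum() then returns NA for any event type containing such a row.
storms2$PDMG[storms2$PROPDMG==0] <- 0  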
storms2$PDMG[storms2$PROPDMGEXP=="H" | storms2$PROPDMGEXP=="h"] <-
    100*storms2$PROPDMG[storms2$PROPDMGEXP=="H" | storms2$PROPDMGEXP=="h"]

storms2$PDMG[storms2$PROPDMGEXP=="K" | storms2$PROPDMGEXP=="k"] <-
    1000*storms2$PROPDMG[storms2$PROPDMGEXP=="K" | storms2$PROPDMGEXP=="k"]

storms2$PDMG[storms2$PROPDMGEXP=="M" | storms2$PROPDMGEXP=="m"] <-
    1000000*storms2$PROPDMG[storms2$PROPDMGEXP=="M" | storms2$PROPDMGEXP=="m"]

storms2$PDMG[storms2$PROPDMGEXP=="B" | storms2$PROPDMGEXP=="b"] <-
    1000000000*storms2$PROPDMG[storms2$PROPDMGEXP=="B" | storms2$PROPDMGEXP=="b"]  

storms2$CDMG[storms2$CROPDMG==0] <- 0  
storms2$CDMG[storms2$CROPDMGEXP=="H" | storms2$CROPDMGEXP=="h"]  <-
    100*storms2$CROPDMG[storms2$CROPDMGEXP=="H" | storms2$CROPDMGEXP=="h"]

storms2$CDMG[storms2$CROPDMGEXP=="K" | storms2$CROPDMGEXP=="k"] <-
    1000*storms2$CROPDMG[storms2$CROPDMGEXP=="K" | storms2$CROPDMGEXP=="k"]

storms2$CDMG[storms2$CROPDMGEXP=="M" | storms2$CROPDMGEXP=="m"] <-
     1000000*storms2$CROPDMG[storms2$CROPDMGEXP=="M" | storms2$CROPDMGEXP=="m"]

storms2$CDMG[storms2$CROPDMGEXP=="B" | storms2$CROPDMGEXP=="b"] <-
     1000000000*storms2$CROPDMG[storms2$CROPDMGEXP=="B" | storms2$CROPDMGEXP=="b"]  

## Total damage per record, in thousands of dollars
storms2$TDMG <- (storms2$PDMG + storms2$CDMG)/1000

damage <- as.data.frame(with(storms2,aggregate(list(Damage=TDMG),list(Event=EVTYPE),sum)))
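The exponent handling above can also be written as a lookup in a named vector, which keeps the four multipliers in one place. This is a minimal sketch rather than the code used for the results: the helper to_dollars() and the sample values are hypothetical. As in the code above, a non-zero amount with an unrecognized code yields NA.

```r
## Hypothetical helper (not part of the original analysis): convert an amount
## and its exponent code to dollars using a named lookup vector
to_dollars <- function(amount, code) {
    multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    ## Codes that are not H, K, M, or B (in either case) index to NA
    dollars <- amount * unname(multiplier[toupper(code)])
    ## A zero amount is zero dollars whatever the code says
    dollars[amount == 0] <- 0
    dollars
}

## Made-up amounts and codes (illustration only)
amount <- c(25, 2.5, 0, 5, 3)
code   <- c("K", "m", "", "?", "B")
res <- to_dollars(amount, code)
print(res)
```

Applied to the data, the same helper would be called once with PROPDMG and PROPDMGEXP and once with CROPDMG and CROPDMGEXP.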

I then combined these counts and sums into one data frame and computed a rate per event for each outcome (injury, death, and damage):

## Combine weather events and casualty events 
casualties <- merge(deaths,injuries,by="Event") 
totaldam <- merge(casualties,damage,by="Event")
event_final <- merge(totaldam,events,by="Event")
event_final <- event_final[event_final$Count>=500,]

## Compute the rates
event_final$deathrate <- event_final$Deaths/event_final$Count
event_final$injuryrate <- event_final$Injuries/event_final$Count
event_final$damagerate <- event_final$Damage/event_final$Count
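To make the rate definition concrete, the sketch below runs the same steps (sum per event type, count per event type, merge, divide) on a small made-up data frame. The object names toy, totals, counts, and rates are illustrative, and the single formula-based aggregate() call is a compact alternative to the separate calls above, not the code used for the results.

```r
## Toy records (illustration only)
toy <- data.frame(
    EVTYPE     = c("FOG", "FOG", "ICE", "ICE", "ICE"),
    FATALITIES = c(0, 1, 0, 0, 2),
    INJURIES   = c(3, 0, 1, 0, 4),
    stringsAsFactors = FALSE
)

## Sum both outcomes per event type in a single aggregate() call
totals <- aggregate(cbind(Deaths = FATALITIES, Injuries = INJURIES) ~ EVTYPE,
                    data = toy, FUN = sum)

## Count the records per event type
counts <- as.data.frame(table(EVTYPE = toy$EVTYPE), responseName = "Count",
                        stringsAsFactors = FALSE)

## Merge sums with counts and divide to get per-event rates
rates <- merge(totals, counts, by = "EVTYPE")
rates$deathrate  <- rates$Deaths / rates$Count
rates$injuryrate <- rates$Injuries / rates$Count
print(rates)
```

Here FOG has 1 death and 3 injuries over 2 events, so its death rate is 0.5 and its injury rate is 1.5 per event.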

Results

barplot(head(event_final[order(event_final$injuryrate,decreasing=T),7]),
        names.arg=head(event_final[order(event_final$injuryrate,decreasing=T),1]),
        ylab="Injury Rate (Injuries/Event)", main="Most Injurious Weather Events",
        cex.axis=0.8, cex.names=0.5, col="red", axes=T, las=2, ylim=c(0,3.6))

head(event_final[order(event_final$injuryrate,decreasing=T),c(1,5,3,7)])
##          Event Count Injuries injuryrate
## 74    HEATWAVE  2673     9224  3.4508043
## 59         FOG   538      734  1.3643123
## 89         ICE  2215     2195  0.9909707
## 45   DUSTSTORM   586      483  0.8242321
## 215    TORNADO 36799    26738  0.7265958
## 216 TROP STORM   757      383  0.5059445

The chart and the printed output show that HEATWAVE (abnormally high heat) events cause the most injuries on average, at nearly 3.5 injuries per event. The next most injurious event type was FOG, with 1.36 injuries per event, followed by ICE, DUSTSTORM, TORNADO, and TROP STORM (tropical storm), all with injury rates between 0.5 and 1.0.

I next repeated the chart and the output call for the death rate.

barplot(head(event_final[order(event_final$deathrate,decreasing=T),6]),
        names.arg=head(event_final[order(event_final$deathrate,decreasing=T),1]),
        ylab="Mortality Rate (Deaths/Event)", main="Most Deadly Weather Events",
        cex.axis=0.8, cex.names=0.5, col="black", axes=T, las=2, ylim=c(0,1.4))

head(event_final[order(event_final$deathrate,decreasing=T),c(1,5,2,6)])
##            Event Count Deaths  deathrate
## 74      HEATWAVE  2673   3138 1.17396184
## 81       HI SEAS  1334    631 0.47301349
## 54  EXTREME COLD   657    162 0.24657534
## 212         SURF  1064    166 0.15601504
## 59           FOG   538     62 0.11524164
## 216   TROP STORM   757     66 0.08718626

HEATWAVE is not only the most injurious weather event but also the deadliest, with a death rate of 1.17 deaths per heat event. FOG remains in the top 6 but drops to #5, with 0.12 deaths per fog event. As in the injury rankings, TROP STORM is #6, with 0.09 deaths per tropical storm event. The remaining three of the top 6 differ from the injury rankings: HI SEAS, EXTREME COLD, and SURF take the #2, #3, and #4 spots respectively. Average death rates across the top 6 ranged from 0.09 to 1.17 deaths per event.

Finally, I examined what I call the damage rate: the combined property and crop damage per weather event, measured in thousands of dollars per event.

barplot(head(event_final[order(event_final$damagerate,decreasing=T),8]),
        names.arg=head(event_final[order(event_final$damagerate,decreasing=T),1]),
        ylab="Damage Rate ($k/Event)", main="Most Expensive Weather Events",
        cex.axis=0.8, cex.names=0.5, col="dark green", axes=T, las=2, ylim=c(0,14000))

head(event_final[order(event_final$damagerate,decreasing=T),c(1,5,4,8)])
##            Event Count     Damage damagerate
## 216   TROP STORM   757  8411023.6 11110.9954
## 31       DROUGHT  2488 15018672.0  6036.4437
## 54  EXTREME COLD   657  1380710.4  2101.5379
## 238     WILDFIRE  4239  8899910.1  2099.5306
## 67  FROST/FREEZE  1343  1104666.0   822.5361
## 90     LANDSLIDE   646   346843.1   536.9088
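The three ranking calls above select columns by position (for example, 7 for injuryrate), which gives wrong output without warning if the column order ever changes. A minimal sketch of a name-based alternative follows; the helper top_by() and the toy table are hypothetical and are not part of the analysis.

```r
## Hypothetical helper: top n rows of a data frame by a named rate column
top_by <- function(df, rate_col, n = 6) {
    ranked <- df[order(df[[rate_col]], decreasing = TRUE), ]
    head(ranked, n)
}

## Toy table (made-up values, illustration only)
toy <- data.frame(Event    = c("A", "B", "C"),
                  Count    = c(10, 20, 40),
                  Injuries = c(5, 40, 10),
                  stringsAsFactors = FALSE)
toy$injuryrate <- toy$Injuries / toy$Count

## Rank by the named column and print selected columns by name
top2 <- top_by(toy, "injuryrate", n = 2)
print(top2[, c("Event", "Count", "Injuries", "injuryrate")])
```

With the analysis data, top_by(event_final, "injuryrate") would be expected to return the same six rows as the positional call, and the result could feed both barplot() and the printed table.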

Final Notes

The event rates computed here do not account for two important factors: time and area. A HEATWAVE event usually takes place over several days and over a large area, as do cold weather events, tropical storms, hurricanes, and others. Some events may be missing from the rankings because they take place over a shorter time period or act over a smaller area, putting fewer people and less property at risk. A more thorough analysis should take these factors into account, potentially by computing the rankings within a higher level of weather event category, over geographic size categories, and by duration. As always, asking the right question and understanding the data are paramount to reaching good conclusions.