Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Based on our analysis, tornadoes are among the most harmful weather events: they caused the most injuries and the second-most fatalities (after excessive heat), and they rank among the top three causes of property damage.

Data Processing

Load the necessary packages.

library(ggplot2)
library(plyr)

Download the dataset and read it into R.

setwd('C:/Coursera/5. Reproducible Research/project 2')
FileName <- "./repdata-data-StormData.csv.bz2"
if (!file.exists(FileName))
{
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url = url, destfile = FileName)
}
raw <- read.csv(FileName, header = TRUE)

The events in the database start in the year 1950 and end in November 2011. In the earlier years there are generally fewer events recorded, most likely due to a lack of good records, so more recent years should be considered more complete. We therefore keep only the records from 1995 onward for our analysis.

raw$year <- as.numeric(format(as.Date(raw$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
storm <- subset(raw, year>=1995)
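
The year extraction above relies on BGN_DATE being stored as text in month/day/year form with a time suffix. As a quick sanity check, the same conversion can be tried on a few date strings. The sample values below are made up for illustration and are not taken from the dataset.

```r
# Illustrative date strings in the same "%m/%d/%Y %H:%M:%S" layout as BGN_DATE
sample_dates <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

# Parse to Date, then pull out the four-digit year as a number
sample_year <- as.numeric(format(as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Only rows with year >= 1995 would survive the subset
keep <- sample_year >= 1995
print(data.frame(date = sample_dates, year = sample_year, keep = keep))
```

If the parsed year came back as NA for any row, that row would be silently dropped by the subset, so a check like this is worth running before filtering.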

Next, we subset the dataset to find the ten most harmful events with respect to public health, measured by the fatalities and injuries these events cause.

fatalities <- subset(storm, FATALITIES>0)
total.fatalities <- aggregate(fatalities$FATALITIES, by = list(fatalities$EVTYPE), FUN = sum)
colnames(total.fatalities) <- c('Event', 'Fatalities')
total.fatalities <- total.fatalities[order(total.fatalities$Fatalities, decreasing = TRUE), ]

injuries <- subset(storm, INJURIES>0)
total.injuries <- aggregate(injuries$INJURIES, by = list(injuries$EVTYPE), FUN = sum)
colnames(total.injuries) <- c('Event', 'Injuries')
total.injuries <- total.injuries[order(total.injuries$Injuries, decreasing = TRUE), ]
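
The same filter, aggregate, and sort pattern is repeated for each measure. One way to avoid the repetition is a small helper. The name `rank_events` is hypothetical, introduced here only for illustration, and the toy data frame stands in for `storm`.

```r
# Hypothetical helper: total one numeric column per event type, largest first.
# Assumes the column has no missing values.
rank_events <- function(data, column, label) {
  positive <- data[data[[column]] > 0, ]   # keep rows with a non-zero count
  totals <- aggregate(positive[[column]], by = list(positive$EVTYPE), FUN = sum)
  colnames(totals) <- c("Event", label)
  totals[order(totals[[label]], decreasing = TRUE), ]
}

# Toy stand-in for the storm data frame (values are made up)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 3, 4, 1),
  stringsAsFactors = FALSE
)

toy.fatalities <- rank_events(toy, "FATALITIES", "Fatalities")
print(toy.fatalities)
```

With the real data, a call such as `rank_events(storm, "INJURIES", "Injuries")` should reproduce the injuries table built above.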

Events that have the greatest economic consequences

In a similar manner, we find the ten most harmful events with respect to economic consequences, measured by damage to property and crops.

property <- subset(storm, PROPDMG>0)
damage.property <- aggregate(property$PROPDMG, by = list(property$EVTYPE), FUN = sum)
colnames(damage.property) <- c('Event', 'Property_Damage')
damage.property <- damage.property[order(damage.property$Property_Damage, decreasing = TRUE),]

crop <- subset(storm, CROPDMG>0)
damage.crop <- aggregate(crop$CROPDMG, by = list(crop$EVTYPE), FUN = sum)
colnames(damage.crop) <- c('Event', 'Crop_Damage')
damage.crop <- damage.crop[order(damage.crop$Crop_Damage, decreasing = TRUE), ]
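
The totals above sum the raw PROPDMG and CROPDMG figures. In the NOAA file these columns are accompanied by magnitude columns (PROPDMGEXP and CROPDMGEXP) holding codes such as K, M, and B. The sketch below shows one way dollar amounts could be derived before aggregating. It assumes that only those three codes matter and treats any other code as a multiplier of 1, which is a simplification; `exp_to_multiplier` is a hypothetical helper and the toy values are made up.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumption: K = thousands, M = millions, B = billions; anything else -> 1.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1   # blank or unrecognised codes
  unname(mult)
}

# Toy rows standing in for the storm data frame
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars = reported figure times its magnitude multiplier
toy$prop_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Applying such a scaling would likely change the ranking of events, since a single entry coded in billions outweighs many entries coded in thousands.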

Results

Among the ten most harmful events, excessive heat caused the most fatalities, while tornadoes caused the most injuries.

total.fatalities[1:10, ]
##              Event Fatalities
## 24  EXCESSIVE HEAT       1903
## 122        TORNADO       1545
## 33     FLASH FLOOD        934
## 49            HEAT        924
## 84       LIGHTNING        729
## 36           FLOOD        423
## 98     RIP CURRENT        360
## 66       HIGH WIND        241
## 124      TSTM WIND        241
## 1        AVALANCHE        223
qplot(Event, data = total.fatalities[1:10, ], weight = Fatalities, geom = 'bar') + ggtitle('Fatalities caused by top 10 severe weather events in the US') + theme(axis.text.x = element_text(angle = 45, hjust = 1))
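
Bars in the plot above follow the alphabetical order of Event. If a ranked view is preferred, one possible alternative is to build the chart with `ggplot()` and `geom_col()`, reordering the x axis by the fatality count. The data frame below simply retypes the first five rows of the printed table so the sketch runs on its own.

```r
library(ggplot2)

# First five rows of the printed fatalities table, retyped by hand
top.fatalities <- data.frame(
  Event = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  Fatalities = c(1903, 1545, 934, 924, 729),
  stringsAsFactors = FALSE
)

# reorder() arranges the bars from the largest total to the smallest
p <- ggplot(top.fatalities, aes(x = reorder(Event, -Fatalities), y = Fatalities)) +
  geom_col() +
  labs(x = "Event", y = "Fatalities",
       title = "Fatalities caused by severe weather events in the US") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

The same layout should work for the injuries table by swapping in `total.injuries[1:10, ]` and the `Injuries` column.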

total.injuries[1:10, ]
##                 Event Injuries
## 110           TORNADO    21765
## 29              FLOOD     6769
## 19     EXCESSIVE HEAT     6525
## 71          LIGHTNING     4631
## 115         TSTM WIND     3630
## 44               HEAT     2030
## 27        FLASH FLOOD     1734
## 105 THUNDERSTORM WIND     1426
## 130      WINTER STORM     1298
## 64  HURRICANE/TYPHOON     1275
qplot(Event, data = total.injuries[1:10, ], weight = Injuries, geom = 'bar') + ggtitle('Injuries caused by top 10 severe weather events in the US') + theme(axis.text.x = element_text(angle = 45, hjust = 1))
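
The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate events, because totals are grouped on the raw EVTYPE text. A minimal sketch of merging such variants before aggregating is shown below. The helper `normalize_evtype` and its short mapping are hypothetical and far from exhaustive, and the toy values are made up.

```r
# Hypothetical helper: collapse a few spelling variants of one event type.
# The mapping is illustrative only; the real data has many more variants.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x[x %in% c("TSTM WIND", "THUNDERSTORM WINDS")] <- "THUNDERSTORM WIND"
  x
}

# Toy rows standing in for the storm data frame
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS ", "TORNADO"),
  INJURIES = c(3, 2, 1, 5),
  stringsAsFactors = FALSE
)

toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# After normalization the three wind variants are summed as one event
agg <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
print(agg)
```

Whether variants should be merged is a judgment call; doing so on the full dataset could move thunderstorm wind higher in the rankings.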

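Regarding economic consequences, the summed PROPDMG figures put thunderstorm wind (recorded as TSTM WIND) first for property damage, closely followed by flash floods and tornadoes, while hail caused the most damage to crops.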

damage.property[1:10, ]
##                  Event Property_Damage
## 254          TSTM WIND       1333343.6
## 44         FLASH FLOOD       1281800.5
## 244            TORNADO       1238637.6
## 215  THUNDERSTORM WIND        862964.7
## 51               FLOOD        838690.7
## 86                HAIL        597641.9
## 155          LIGHTNING        513832.1
## 121          HIGH WIND        315549.6
## 297       WINTER STORM        127119.6
## 226 THUNDERSTORM WINDS        107469.6