Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Based on our analysis, tornadoes are among the most harmful weather events: they caused the most injuries and rank near the top for fatalities and property damage.
Load the necessary packages.
library(ggplot2)
library(plyr)
Download the dataset and read it into R.
setwd('C:/Coursera/5. Reproducible Research/project 2')
FileName <- "./repdata-data-StormData.csv.bz2"
if (!file.exists(FileName)) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url = url, destfile = FileName)
}
# read.csv() reads the bz2-compressed file directly; reuse FileName instead of repeating the path
raw <- read.csv(FileName, header = TRUE)
The events in the database start in 1950 and end in November 2011. In the earlier years there are generally fewer events recorded, most likely due to a lack of good records, so more recent years should be considered more complete. We therefore restrict the analysis to records from 1995 onward.
raw$year <- as.numeric(format(as.Date(raw$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
storm <- subset(raw, year>=1995)
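The year extraction used for the cutoff can be checked on a few hand-written date strings. This is a minimal sketch: the three `bgn_date` values are made-up stand-ins in the same format as `BGN_DATE`, not records from the database.

```r
# Toy stand-ins for BGN_DATE values (illustrative, not real records)
bgn_date <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

# Parse date and time, then keep only the calendar date
parsed <- as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S")
year <- as.numeric(format(parsed, "%Y"))

# Same cutoff as the subset() call above
keep <- year >= 1995
data.frame(bgn_date, year, keep)

# On the full data, table(raw$year) would list how many records each year has
```

Tabulating records per year in this way is one simple way to see how sparse the early years are before settling on a cutoff.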
Next, we subset the dataset to find the ten most harmful events with respect to public health, measured by the fatalities and injuries these events cause.
fatalities <- subset(storm, FATALITIES>0)
total.fatalities <- aggregate(fatalities$FATALITIES, by = list(fatalities$EVTYPE), FUN = sum)
colnames(total.fatalities) <- c('Event', 'Fatalities')
total.fatalities <- total.fatalities[order(total.fatalities$Fatalities, decreasing = TRUE), ]
injuries <- subset(storm, INJURIES>0)
total.injuries <- aggregate(injuries$INJURIES, by = list(injuries$EVTYPE), FUN = sum)
colnames(total.injuries) <- c('Event', 'Injuries')
total.injuries <- total.injuries[order(total.injuries$Injuries, decreasing = TRUE), ]
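The same per-event totals could also be computed in a single call with the formula interface of `aggregate()`. The sketch below uses a small made-up data frame (`toy`) rather than the storm data, so the numbers are purely illustrative.

```r
# Toy data standing in for the storm table (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(3, 5, 2, 0, 1, 0),
  INJURIES   = c(10, 0, 4, 2, 0, 0)
)

# Sum both columns per event type in one call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities and keep the top rows
top_fatal <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], 2)
top_fatal
```

Unlike the code above, this variant does not drop zero-count rows first, so events with no fatalities stay in `totals` and simply sort to the bottom.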
In a similar manner, we find the ten most harmful events with respect to economic consequences, measured by damage to property and crops.
property <- subset(storm, PROPDMG>0)
damage.property <- aggregate(property$PROPDMG, by = list(property$EVTYPE), FUN = sum)
colnames(damage.property) <- c('Event', 'Property_Damage')
damage.property <- damage.property[order(damage.property$Property_Damage, decreasing = TRUE),]
crop <- subset(storm, CROPDMG>0)
damage.crop <- aggregate(crop$CROPDMG, by = list(crop$EVTYPE), FUN = sum)
colnames(damage.crop) <- c('Event', 'Crop_Damage')
damage.crop <- damage.crop[order(damage.crop$Crop_Damage, decreasing = TRUE), ]
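The sums above use `PROPDMG` and `CROPDMG` as they are stored. In the NOAA data these columns are paired with magnitude codes in `PROPDMGEXP` and `CROPDMGEXP` (for example K, M and B for thousands, millions and billions), so a dollar-based ranking would need to scale each value first. The following is a rough sketch on made-up rows; the helper `exp_to_multiplier` is hypothetical, and treating every other code as a multiplier of 1 is an assumption of this example, not a rule from the database documentation.

```r
# Toy rows; PROPDMG is the stored value and PROPDMGEXP its magnitude code
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "TORNADO"),
  PROPDMG    = c(25, 500, 1.5, 2.5),
  PROPDMGEXP = c("K", "K", "B", "M")
)

# Hypothetical helper: map K/M/B (case-insensitive) to a dollar multiplier.
# Any other code is treated as 1 here, which is an assumption of this sketch.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Scale to dollars, then total and rank per event type
toy$prop_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
usd <- aggregate(prop_usd ~ EVTYPE, data = toy, FUN = sum)
usd[order(usd$prop_usd, decreasing = TRUE), ]
```

With scaling applied, a single billion-dollar record can outweigh many small ones, so the ranking may differ from one based on the unscaled sums.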
Among the ten most harmful events, excessive heat caused the most fatalities, while tornadoes caused the most injuries.
total.fatalities[1:10, ]
##              Event Fatalities
## 24  EXCESSIVE HEAT       1903
## 122        TORNADO       1545
## 33     FLASH FLOOD        934
## 49            HEAT        924
## 84       LIGHTNING        729
## 36           FLOOD        423
## 98     RIP CURRENT        360
## 66       HIGH WIND        241
## 124      TSTM WIND        241
## 1        AVALANCHE        223
qplot(Event, data = total.fatalities[1:10, ], weight = Fatalities, geom = 'bar') +
  ggtitle('Fatalities caused by top 10 severe weather events in the US') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
total.injuries[1:10, ]
##                 Event Injuries
## 110           TORNADO    21765
## 29              FLOOD     6769
## 19     EXCESSIVE HEAT     6525
## 71          LIGHTNING     4631
## 115         TSTM WIND     3630
## 44               HEAT     2030
## 27        FLASH FLOOD     1734
## 105 THUNDERSTORM WIND     1426
## 130      WINTER STORM     1298
## 64  HURRICANE/TYPHOON     1275
qplot(Event, data = total.injuries[1:10, ], weight = Injuries, geom = 'bar') +
  ggtitle('Injuries caused by top 10 severe weather events in the US') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
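Bar charts of a character or factor variable are drawn in alphabetical order by default, which hides the ranking. One possible way to make the bars follow the ranking is to reorder the factor levels by the count before plotting. The sketch below reuses the first five rows of the injuries table above; the use of `ggplot()` with `geom_col()` in place of `qplot()` is a choice made for this example, not the original plotting code.

```r
# First five rows of the injuries table shown above
top <- data.frame(
  Event    = c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING", "TSTM WIND"),
  Injuries = c(21765, 6769, 6525, 4631, 3630)
)

# reorder() sets factor levels by descending count, so bars follow the ranking
top$Event <- reorder(top$Event, -top$Injuries)

# Plot only if ggplot2 is installed; geom_col() draws precomputed heights
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(x = Event, y = Injuries)) +
    geom_col() +
    ggtitle("Injuries caused by the top severe weather events in the US") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```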
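With regard to economic consequences, the ranking below sums the PROPDMG values as stored, without applying the PROPDMGEXP magnitude codes. By that measure, thunderstorm wind (TSTM WIND) ranks first for property damage, followed by flash flood and tornado. According to the analysis, hail caused the most damage to crops.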
damage.property[1:10, ]
##                  Event Property_Damage
## 254          TSTM WIND       1333343.6
## 44         FLASH FLOOD       1281800.5
## 244            TORNADO       1238637.6
## 215  THUNDERSTORM WIND        862964.7
## 51               FLOOD        838690.7
## 86                HAIL        597641.9
## 155          LIGHTNING        513832.1
## 121          HIGH WIND        315549.6
## 297       WINTER STORM        127119.6
## 226 THUNDERSTORM WINDS        107469.6
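The table above lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate events, although they appear to be spellings of the same event type. Merging such labels before aggregating would change the totals. The sketch below copies four rows from the table and applies a single, hypothetical cleanup rule; a full analysis would likely need more rules than this one.

```r
# Event labels and PROPDMG sums copied from the property table above
prop <- data.frame(
  Event = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "TORNADO"),
  Property_Damage = c(1333343.6, 862964.7, 107469.6, 1238637.6)
)

# Hypothetical cleanup rule: fold the thunderstorm-wind spellings into one label
prop$Event_clean <- toupper(trimws(prop$Event))
prop$Event_clean <- sub("^(TSTM|THUNDERSTORM) WINDS?$", "THUNDERSTORM WIND",
                        prop$Event_clean)

# Re-aggregate on the cleaned label and rank again
merged <- aggregate(Property_Damage ~ Event_clean, data = prop, FUN = sum)
merged[order(merged$Property_Damage, decreasing = TRUE), ]
```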