US Storm Data
Brian Baquiran
October 24, 2014
Synopsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The objective of the project is to identify the types of storms or weather events that are the most harmful with respect to population health, and that have the greatest economic consequences.
Data Processing
Obtaining and Loading Source Data
Raw data is obtained from the National Weather Service. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
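# Download the compressed data only if neither the CSV nor the archive is present.
if (!file.exists("repdata-data-StormData.csv")) {
  if (!file.exists("repdata-data-StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "repdata-data-StormData.csv.bz2",
                  method = "auto",
                  mode = "wb")  # binary mode keeps the archive intact on Windows
  }
  # bunzip2() from R.utils decompresses the archive to the CSV.
  library(R.utils)
  bunzip2("repdata-data-StormData.csv.bz2")
}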
The storm data is loaded into a data frame for analysis.
wd <- read.csv("repdata-data-StormData.csv")
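As an aside on the loading step, base R can usually read a bzip2-compressed CSV without decompressing it first, because read.csv() opens the file through file(), which detects the compression. The following is a minimal sketch on a made-up temporary file, not the storm data itself; the file contents and the name `demo` are assumptions for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (stand-in for the real data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() reads the compressed file directly; no explicit bunzip2() step is needed.
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

If this holds for the full archive as well, the explicit decompression step becomes optional, at the cost of a slower read.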
Cleaning EVTYPE
The EVTYPE column, upon inspection, is inconsistently capitalized and entries may have leading or trailing whitespace. A quick cleaning is possible with str_trim() and toupper().
At the same time, a YEAR column (based on the beginning of the event) is added for convenience.
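library(plyr)
library(stringr)
# Trim and upper-case EVTYPE; derive YEAR (as an integer) from the event's start date.
wd_cleaned <- mutate(wd,
                     EVTYPE = toupper(str_trim(EVTYPE)),
                     YEAR = as.integer(format(strptime(BGN_DATE, format = "%m/%d/%Y %T"),
                                              format = "%Y")))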
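To make the effect of the cleaning step concrete, here is a small sketch on invented rows. It uses base R's trimws() in place of str_trim(), and the data frame `toy` and its values are hypothetical; only the column names follow the storm data.

```r
# Toy rows imitating the raw EVTYPE and BGN_DATE columns (values are made up).
toy <- data.frame(
  EVTYPE   = c(" Tornado", "TORNADO ", "tstm wind"),
  BGN_DATE = c("4/18/1950 0:00:00", "5/22/2011 0:00:00", "7/4/1996 0:00:00"),
  stringsAsFactors = FALSE
)

# Normalise event names: strip surrounding whitespace, then upper-case.
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Parse the start date and keep only the year, as an integer.
bgn <- strptime(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
toy$YEAR <- as.integer(format(bgn, "%Y"))

print(toy)
```

After this step the first two rows share the label TORNADO and would be grouped together, which is the point of the cleaning.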
Results
Human casualties
To determine which types of events cause the greatest number of casualties, the events are grouped by event type and the fatalities and injuries are totaled for each type.
library(plyr)
# Total fatalities, injuries, and their sum for each event type.
evtype_total_casualties <- ddply(wd_cleaned, .(EVTYPE),
                                 summarize,
                                 totalFatalities = sum(FATALITIES),
                                 totalInjuries = sum(INJURIES),
                                 totalCasualties = sum(FATALITIES + INJURIES))
# Sort by total casualties, largest first, and show the top ten.
casualties_sorted <- evtype_total_casualties[order(evtype_total_casualties[, "totalCasualties"],
                                                   decreasing = TRUE), ]
print(casualties_sorted[1:10, ])
##                EVTYPE totalFatalities totalInjuries totalCasualties
## 750           TORNADO            5633         91346           96979
## 108    EXCESSIVE HEAT            1903          6525            8428
## 771         TSTM WIND             504          6957            7461
## 146             FLOOD             470          6789            7259
## 410         LIGHTNING             816          5230            6046
## 235              HEAT             937          2100            3037
## 130       FLASH FLOOD             978          1777            2755
## 379         ICE STORM              89          1975            2064
## 677 THUNDERSTORM WIND             133          1488            1621
## 880      WINTER STORM             206          1321            1527
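Tornadoes result in by far the most casualties among weather event types, with excessive heat a distant second. Because only capitalization and whitespace were normalized, some related labels are still counted separately, for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT.
To see how tornado casualties are distributed over time, casualties are also totaled per event type and year, and the tornado totals are plotted by year.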
# Total casualties per event type and year.
annual_casualties <- ddply(wd_cleaned, .(EVTYPE, YEAR),
                           summarize,
                           totalFatalities = sum(FATALITIES),
                           totalInjuries = sum(INJURIES),
                           totalCasualties = sum(FATALITIES + INJURIES))
tornado_casualties <- annual_casualties[annual_casualties$EVTYPE == "TORNADO", ]
library(ggplot2)
# One point per year; ggplot() is used in place of the deprecated qplot().
ggplot(tornado_casualties, aes(x = YEAR, y = totalCasualties)) +
  geom_point() +
  labs(x = "Year", y = "Casualties from Tornadoes")
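The group-and-sum pattern used for the casualty totals can also be sketched in base R with aggregate(), which may help readers who do not have plyr installed. This is an illustrative equivalent rather than the code used for the results above; the data frame `toy` and its numbers are invented.

```r
# Toy events (values are made up); column names follow the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  YEAR       = c(1950L, 1950L, 1996L, 1996L),
  FATALITIES = c(2, 1, 0, 4),
  INJURIES   = c(10, 5, 3, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
by_type$totalCasualties <- by_type$FATALITIES + by_type$INJURIES

# Sort by total casualties, largest first.
by_type <- by_type[order(by_type$totalCasualties, decreasing = TRUE), ]
print(by_type)

# The same idea with two grouping columns gives per-type, per-year totals.
by_type_year <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + YEAR,
                          data = toy, FUN = sum)
print(by_type_year)
```

Adding YEAR to the right-hand side of the formula is the only change needed to move from per-type totals to annual totals.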
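Damage to Property and Crops
Property and crop damage are recorded as an amount (PROPDMG, CROPDMG) together with a multiplier code (PROPDMGEXP, CROPDMGEXP). The K (thousands) and M (millions) multipliers are applied to obtain the estimated damage for each event. An amount with any other code, including B (billions) and lowercase variants, is taken at face value, so the totals below understate the damage for events recorded with those codes.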
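# Scale the recorded amounts: K = thousands, M = millions; any other code leaves the amount as is.
wd_damages <- mutate(wd_cleaned,
                     PropDmg = PROPDMG * ifelse(PROPDMGEXP == "K", 1000,
                                         ifelse(PROPDMGEXP == "M", 1000000, 1)),
                     CropDmg = CROPDMG * ifelse(CROPDMGEXP == "K", 1000,
                                         ifelse(CROPDMGEXP == "M", 1000000, 1)))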
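The multiplier step can also be written as a lookup table, which makes it easier to see and extend the set of codes that are honoured. The sketch below is one possible variant and not the code behind the tables in this report: the names `exp_multiplier` and `scale_damage` are hypothetical, the choice of codes (including H for hundreds and B for billions) is an assumption, and applying it to the real data would give different totals from those shown here.

```r
# Hypothetical lookup table: exponent code -> multiplier.
# Which codes to honour is an assumption; anything unlisted is left unscaled.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

scale_damage <- function(amount, code) {
  # Upper-case the code so "k" and "K" are treated alike; as.character() handles factors.
  mult <- exp_multiplier[toupper(as.character(code))]
  mult[is.na(mult)] <- 1  # blank or unrecognised codes: multiplier of 1
  unname(amount * mult)
}

# Made-up amounts and codes for illustration.
scaled <- scale_damage(c(25, 2.5, 1.2, 10, 7), c("K", "m", "B", "", "?"))
print(scaled)
```

Keeping the mapping in one named vector means a new code can be added in a single place instead of nesting another ifelse().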
To determine the most damaging types of events, we sum up property and crop damage.
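# Total property damage, crop damage, and their sum for each event type.
evtype_total_damages <- ddply(wd_damages, .(EVTYPE), summarize,
                              totalPropDmg = sum(PropDmg),
                              totalCropDmg = sum(CropDmg),
                              totalDmg = sum(PropDmg, CropDmg))
# Sort by total damage, largest first, and show the top ten.
damages_sorted <- evtype_total_damages[order(evtype_total_damages[, "totalDmg"],
                                             decreasing = TRUE), ]
print(damages_sorted[1:10, ])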
##                EVTYPE totalPropDmg totalCropDmg    totalDmg
## 750           TORNADO  51625660796    414953270 52040614066
## 146             FLOOD  22157709930   5661968450 27819678380
## 204              HAIL  13927367054   3025537890 16952904944
## 130       FLASH FLOOD  15140862068   1421317100 16562179168
## 76            DROUGHT   1046106000  12472566002 13518672002
## 355         HURRICANE   6168319016   2741910000  8910229016
## 771         TSTM WIND   4493058495    554007350  5047065845
## 364 HURRICANE/TYPHOON   3805840066   1097872802  4903712867
## 312         HIGH WIND   3970046296    638571300  4608617596
## 867          WILDFIRE   3725114001    295472800  4020586801