Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. The National Oceanic and Atmospheric Administration’s (NOAA) storm database contains information about natural disasters that occurred from 1950 to 2011. This analysis takes information from this database and identifies which event types caused the most injuries, fatalities, property damage and crop damage.
This project has two main objectives:
Determine which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health.
Determine which types of events have the greatest economic consequences.
The data is available in a bzip2 format to reduce its size. You can download the file from the course web site:
## Reads data directly from the bzip2-compressed .csv file (read.csv decompresses it while reading)
setwd("~/R/COURSERA5_STORM")
stormdata <- read.csv("repdata-data-StormData.csv.bz2")
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
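The read step above relies on `read.csv` opening a bzip2-compressed file without a separate decompression step. The following is a small, self-contained sketch of that behaviour on made-up data; the `toy` data frame and the temporary file are illustrative assumptions, not part of the storm database.

```r
# Toy round trip: write a tiny CSV through a bzip2 connection, then read it back.
# The data frame below is invented purely for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(3, 0),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")          # bzip2-compressing connection
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv is given only the path; the compression is detected when the file is opened
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

If the file had been decompressed by hand first, the same `read.csv` call would work on the plain .csv as well.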
The following lines of code create “stormdata_sums”, which contains the sums of FATALITIES, INJURIES, PROPERTY DAMAGE and CROP DAMAGE for each Event Type. Summing by Event Type makes it possible to see which Event Type accounts for the largest total in each of the four measures.
## Loads the plyr library, which provides the "ddply" command used to group the data by Event Type (EVTYPE) and apply the "sum" function to each group.
library(plyr)
## Subsets the data according to Event Type (EVTYPE) and applies the "sum" function to the following columns: FATALITIES, INJURIES, PROPERTY DAMAGE AND CROP DAMAGE.
stormdata_sums <- ddply(stormdata, ~EVTYPE, summarise, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG))
head(stormdata_sums)
## EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1 ? 0 0 5 0.00
## 2 ABNORMALLY DRY 0 0 0 0.00
## 3 ABNORMALLY WET 0 0 0 0.00
## 4 ABNORMAL WARMTH 0 0 0 0.00
## 5 ACCUMULATED SNOWFALL 0 0 0 0.00
## 6 AGRICULTURAL FREEZE 0 0 0 28.82
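To make the grouping step concrete, here is a minimal sketch of the same group-and-sum idea on a five-row toy data frame. It uses base R's `aggregate` instead of `ddply`, which is a swapped-in technique for illustration and not the call used in this analysis; the rows are invented.

```r
# Invented rows with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 3, 0, 1),
  INJURIES   = c(10, 1, 5, 0, 2),
  stringsAsFactors = FALSE
)

# One output row per EVTYPE, each numeric column summed within its group
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_sums)
```

The two TORNADO rows collapse into one row with 5 fatalities and 15 injuries, which is the per-event-type total that “stormdata_sums” holds for the full data set.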
The following code takes “stormdata_sums”, orders it in turn by FATALITIES, INJURIES, PROPERTY DAMAGE and CROP DAMAGE, and keeps the ten Event Types with the largest total for each.
fatalities <- stormdata_sums[order(stormdata_sums$FATALITIES, decreasing = TRUE), c("EVTYPE", "FATALITIES")][1:10, ]
injuries <- stormdata_sums[order(stormdata_sums$INJURIES, decreasing = TRUE), c("EVTYPE", "INJURIES")][1:10, ]
property_damage <- stormdata_sums[order(stormdata_sums$PROPDMG, decreasing = TRUE), c("EVTYPE", "PROPDMG")][1:10, ]
crop_damage <- stormdata_sums[order(stormdata_sums$CROPDMG, decreasing = TRUE), c("EVTYPE", "CROPDMG")][1:10, ]
fatalities
## EVTYPE FATALITIES
## 830 TORNADO 5633
## 123 EXCESSIVE HEAT 1903
## 147 FLASH FLOOD 978
## 269 HEAT 937
## 452 LIGHTNING 816
## 854 TSTM WIND 504
## 164 FLOOD 470
## 581 RIP CURRENT 368
## 354 HIGH WIND 248
## 11 AVALANCHE 224
injuries
## EVTYPE INJURIES
## 830 TORNADO 91346
## 854 TSTM WIND 6957
## 164 FLOOD 6789
## 123 EXCESSIVE HEAT 6525
## 452 LIGHTNING 5230
## 269 HEAT 2100
## 424 ICE STORM 1975
## 147 FLASH FLOOD 1777
## 759 THUNDERSTORM WIND 1488
## 238 HAIL 1361
property_damage
## EVTYPE PROPDMG
## 830 TORNADO 3212258.2
## 147 FLASH FLOOD 1420124.6
## 854 TSTM WIND 1335965.6
## 164 FLOOD 899938.5
## 759 THUNDERSTORM WIND 876844.2
## 238 HAIL 688693.4
## 452 LIGHTNING 603351.8
## 783 THUNDERSTORM WINDS 446293.2
## 354 HIGH WIND 324731.6
## 972 WINTER STORM 132720.6
crop_damage
## EVTYPE CROPDMG
## 238 HAIL 579596.28
## 147 FLASH FLOOD 179200.46
## 164 FLOOD 168037.88
## 854 TSTM WIND 109202.60
## 830 TORNADO 100018.52
## 759 THUNDERSTORM WIND 66791.45
## 88 DROUGHT 33898.62
## 783 THUNDERSTORM WINDS 18684.93
## 354 HIGH WIND 17283.21
## 284 HEAVY RAIN 11122.80
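The tables above list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate Event Types. Whether these should be treated as one category is a judgment call that this analysis does not make. The sketch below only illustrates how such labels could be merged before ranking; the `normalize_evtype` helper and the numbers are hypothetical.

```r
# Made-up totals that reuse three wind labels seen in the tables above
toy_sums <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HAIL"),
  INJURIES = c(7, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Hypothetical rule: map the three spellings onto a single label
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x
}

toy_sums$EVTYPE <- normalize_evtype(toy_sums$EVTYPE)

# Re-sum after merging labels, then rank from largest to smallest
merged <- aggregate(INJURIES ~ EVTYPE, data = toy_sums, FUN = sum)
ranked <- merged[order(merged$INJURIES, decreasing = TRUE), ]
print(ranked)
```

In this toy case the merged wind category (10) moves ahead of HAIL (4), which shows how label variants can change a ranking.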
These lines of code graph the results obtained in the previous section.
## Sets graph margins
par(mar = c(12,4,4,2), mgp = c(0, 2, 2))
par(mfrow = c(1,2))
## Makes barplot for Events that caused most Injuries
injuries_graph <- barplot(injuries$INJURIES,
names.arg = injuries$EVTYPE,
main = "Event Type with Most Injuries in US 1950 - 2011
(INJURIES)",
xlab = "Event Type", ylab = "Injuries", las = 2,
cex.main = 0.75, cex.axis = 0.6, cex.names = 0.6)
## Makes barplot for Events that caused most Fatalities
## Bar labels come from the fatalities table so that they match the bar heights
fatalities_graph <- barplot(fatalities$FATALITIES,
names.arg = fatalities$EVTYPE,
main = "Event Type with Most Fatalities in US 1950 - 2011
(FATALITIES)",
xlab = "Event Type", ylab = "Fatalities", las = 2,
cex.main = 0.75, cex.axis = 0.6, cex.names = 0.6)
The plots show that Tornadoes cause the most Injuries and Fatalities of all disasters by a large margin.
par(mar = c(12,4,4,2), mgp = c(0, 2, 2))
par(mfrow = c(1,1))
## Makes barplot for Events that caused most Property Damage
propertydmg_graph <- barplot(property_damage$PROPDMG,
names.arg = property_damage$EVTYPE,
main = "Event Type with Most Property Damage in US 1950 - 2011
(USD)",
xlab = "Event Type", ylab = "Property Damage", las = 2,
cex.axis = 0.6, cex.names = 0.6)
As with Injuries and Fatalities, Tornadoes account for the largest summed Property Damage (PROPDMG) of all disasters.
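The data set pairs PROPDMG with a PROPDMGEXP column (K in the rows shown by `head`), and the sums above add up PROPDMG without applying it. The following is a minimal sketch of how a multiplier could be applied before summing. It assumes that K, M and B stand for thousand, million and billion and that any other code means no scaling; both are assumptions to check against the NOAA documentation, and the `exp_to_multiplier` helper and the toy rows are hypothetical.

```r
# Invented rows; only the column names come from the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 2.5, 10, 1),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Assumed mapping: K = 1e3, M = 1e6, B = 1e9; anything else is left unscaled
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Sum the scaled values per event type
usd_sums <- aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
print(usd_sums)
```

In the toy data FLOOD has a smaller raw PROPDMG total than TORNADO (3.5 against 25) but a far larger scaled total, so a ranking on scaled values can differ from one on raw values. The same approach would apply to CROPDMG and CROPDMGEXP.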
par(mar = c(12,4,4,2), mgp = c(0, 2, 2))
## Makes barplot for Events that caused most Crop Damage
cropdmg_graph <- barplot(crop_damage$CROPDMG,
names.arg = crop_damage$EVTYPE,
main = "Event Type with Most Crop Damage in US 1950 - 2011
(USD)",
xlab = "Event Type", ylab = "Crop Damage", las = 2,
cex.axis = 0.6, cex.names = 0.6)
For Crop Damage, Hail causes the most damage of all disasters.
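The four bar plots in this section follow the same pattern, so they could be drawn through one small helper. The sketch below is one possible way to do that; the `plot_top` function and the toy data are hypothetical. Taking the bar heights and the bar labels from the same data frame keeps the two in sync.

```r
# Hypothetical helper: draws one bar per row of df, labelled by EVTYPE
plot_top <- function(df, value_col, title, ylab) {
  barplot(df[[value_col]],
          names.arg = as.character(df$EVTYPE),
          main = title, xlab = "Event Type", ylab = ylab,
          las = 2, cex.main = 0.75, cex.axis = 0.6, cex.names = 0.6)
}

# Invented values for illustration only
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "HAIL"),
  INJURIES = c(9, 4, 1),
  stringsAsFactors = FALSE
)

pdf(file = NULL)                 # null device so the sketch runs without a display
par(mar = c(12, 4, 4, 2))        # wide bottom margin for the vertical labels
mids <- plot_top(toy, "INJURIES", "Toy data: injuries by event type", "Injuries")
dev.off()
```

`barplot` returns the bar midpoints, which are stored in `mids` here and can be used to add text above the bars.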