This report analyzes storm data across the U.S. It is based on the Storm Data dataset, which contains observations of “event types” and the resulting injuries, fatalities, and property and crop damage. The goal of the report is to identify the storm events associated with the highest numbers of fatalities and injuries and with the highest property and crop damage costs.
This report contains all the code used and is fully reproducible, starting from the download of the bz2-compressed source file. Although the decompressed dataset is in CSV format, some cleanup was needed. The EVTYPE column, which holds the storm event type names, contained misspellings and spelling variants that split a single event type across several rows. This was remedied by converting the names to lowercase, using grep to match on common words, and replacing each match with one representative term (e.g. “tornado”). Some variants (e.g. “tstm wind” and “thunderstorm wind”) still appear as separate event types in the tables below. Each financial field (property damage and crop damage) has a companion field with an “EXP” suffix holding a magnitude code such as “K”, “M” or “B” for thousands, millions or billions, respectively; a few values are nonsensical (such as “+” and “?”). Each code was replaced with a numeric multiplier, and the damage amount was multiplied by it. The irrelevant fields were then discarded.
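A minimal base-R sketch of the two cleanup steps described above follows. The helper names `exp_multiplier()` and `normalize_evtype()` are hypothetical, the magnitude table (including “H” for hundreds) and the fallback multiplier of 1 for unrecognised codes such as “+” and “?” are assumptions, and the two matching rules are illustrative rather than the full list used for this report.

```r
# Sketch of the cleanup steps; helper names, the magnitude table and the
# matching rules are illustrative assumptions, not the report's own code.

# Map an "EXP" magnitude code to a numeric multiplier (case-insensitive).
# Codes not in the table ("+", "?", "", NA, ...) fall back to 1 here.
exp_multiplier <- function(exp) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp)))
  mult <- unname(multipliers[code])
  mult[is.na(mult)] <- 1
  mult
}

# Lowercase event names, then map any name matching a pattern onto one
# representative term (two illustrative rules only).
normalize_evtype <- function(evtype) {
  ev <- tolower(trimws(as.character(evtype)))
  rules <- c(tornado = "tornado", hail = "^hail")
  for (term in names(rules)) {
    ev[grepl(rules[[term]], ev)] <- term
  }
  ev
}

# Toy rows standing in for the raw data
raw <- data.frame(
  EVTYPE = c("TORNADO", " Tornadoes", "HAIL 1.75", "Flood"),
  PROPDMG = c(2.5, 10, 0, 3),
  PROPDMGEXP = c("K", "M", "", "?"),
  stringsAsFactors = FALSE
)
raw$EVTYPE <- normalize_evtype(raw$EVTYPE)
raw$PROPDMG <- raw$PROPDMG * exp_multiplier(raw$PROPDMGEXP)
raw <- raw[, c("EVTYPE", "PROPDMG")]  # discard the EXP column
print(raw)
```

A named lookup vector keeps the code-to-multiplier mapping in one place, and any code missing from it becomes `NA`, which is then replaced by the assumed default.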
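library(dplyr)  # select(), arrange(), mutate(), filter() and helpers
# Change wd to a local directory of your own before running
wd <- "/Users/Maria/Documents/Coursera/Data Science Specialization/Reproducible Research"
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dfile <- "stormdata.bz2"
setwd(wd)
# Download once; mode = "wb" keeps the binary bz2 file intact on Windows
if (!file.exists(dfile)) download.file(url, dfile, mode = "wb")
# read.csv() opens, reads and closes the bzfile() connection
sdata <- read.csv(bzfile(dfile))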
Fatalities and injuries are reported first. The top 20 event types are presented in the following tables and figure. For completeness, I included the top storm events ranked by fatalities, by injuries, and by the two combined. I used a base-10 logarithmic scale because tornado fatalities and injuries are so high that, on a linear scale, they would squash the remaining results.
# The damage amounts were already multiplied out, so drop the "EXP" columns
# and split the data into a health table and an economic table
sdata <- select(sdata, -ends_with("EXP"))
fatal <- select(sdata, EVTYPE, FATALITIES, INJURIES)
prop <- select(sdata, EVTYPE, PROPDMG, CROPDMG)

# Total fatalities and injuries per event type, ranked three ways (top 20 each)
fatal <- aggregate(. ~ EVTYPE, data = fatal, FUN = sum)
fatal <- arrange(fatal, desc(FATALITIES))
injuries <- arrange(fatal, desc(INJURIES))
fplusi <- mutate(fatal, FPLUSI = FATALITIES + INJURIES)
fplusi <- arrange(fplusi, desc(FPLUSI))
fatal <- head(fatal, n = 20)
injuries <- head(injuries, n = 20)
fplusi <- head(fplusi, n = 20)
# Combined ranking with tornadoes excluded
fplusi2 <- filter(fplusi, EVTYPE != "tornado")

# Total property and crop damage per event type, ranked three ways (top 20 each)
prop <- aggregate(. ~ EVTYPE, data = prop, FUN = sum)
prop <- arrange(prop, desc(PROPDMG))
crop <- arrange(prop, desc(CROPDMG))
cplusp <- mutate(prop, CPLUSP = PROPDMG + CROPDMG)
cplusp <- arrange(cplusp, desc(CPLUSP))
prop <- head(prop, n = 20)
crop <- head(crop, n = 20)
cplusp <- head(cplusp, n = 20)
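The sort-then-truncate pattern above is repeated six times. As a minimal base-R sketch, assuming a hypothetical helper named `top_n_by()`, the same ranking can be expressed once and reused; the sample rows are copied from the tables in this report.

```r
# Hypothetical helper: sort a data frame by one column (descending) and keep
# the first n rows, mirroring arrange(desc(col)) followed by head(n = n).
top_n_by <- function(df, col, n = 20) {
  ranked <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  rownames(ranked) <- NULL
  head(ranked, n)
}

# A few per-event totals taken from the tables in this report
totals <- data.frame(
  EVTYPE = c("tornado", "excessive heat", "tstm wind", "flood", "lightning"),
  FATALITIES = c(5633, 1903, 504, 470, 816),
  INJURIES = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)
totals$FPLUSI <- totals$FATALITIES + totals$INJURIES

by_injuries <- top_n_by(totals, "INJURIES", n = 3)
by_combined <- top_n_by(totals, "FPLUSI", n = 3)
print(by_injuries)
print(by_combined)
```

With this helper, for example, `fatal <- top_n_by(fatal, "FATALITIES")` would replace an `arrange()` and `head()` pair.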
Sorted by fatalities
##                     EVTYPE FATALITIES INJURIES
## 1                  tornado       5633    91346
## 2           excessive heat       1903     6525
## 3              flash flood        978     1777
## 4                     heat        937     2100
## 5                lightning        816     5230
## 6                tstm wind        504     6957
## 7                    flood        470     6789
## 8              rip current        368      232
## 9                high wind        248     1137
## 10               avalanche        224      170
## 11            winter storm        206     1321
## 12            rip currents        204      297
## 13               heat wave        172      379
## 14            extreme cold        162      231
## 15       thunderstorm wind        133     1488
## 16              heavy snow        127     1021
## 17 extreme cold/wind chill        125       24
## 18               high surf        104      156
## 19             strong wind        103      280
## 20                blizzard        101      805
Sorted by injuries
##                EVTYPE FATALITIES INJURIES
## 1             tornado       5633    91346
## 2           tstm wind        504     6957
## 3               flood        470     6789
## 4      excessive heat       1903     6525
## 5           lightning        816     5230
## 6                heat        937     2100
## 7           ice storm         89     1975
## 8         flash flood        978     1777
## 9   thunderstorm wind        133     1488
## 10               hail         15     1361
## 11       winter storm        206     1321
## 12  hurricane/typhoon         64     1275
## 13          high wind        248     1137
## 14         heavy snow        127     1021
## 15           wildfire         75      911
## 16 thunderstorm winds         64      908
## 17           blizzard        101      805
## 18                fog         62      734
## 19   wild/forest fire         12      545
## 20         dust storm         22      440
Fatalities and injuries summed and sorted
##                EVTYPE FATALITIES INJURIES FPLUSI
## 1             tornado       5633    91346  96979
## 2      excessive heat       1903     6525   8428
## 3           tstm wind        504     6957   7461
## 4               flood        470     6789   7259
## 5           lightning        816     5230   6046
## 6                heat        937     2100   3037
## 7         flash flood        978     1777   2755
## 8           ice storm         89     1975   2064
## 9   thunderstorm wind        133     1488   1621
## 10       winter storm        206     1321   1527
## 11          high wind        248     1137   1385
## 12               hail         15     1361   1376
## 13  hurricane/typhoon         64     1275   1339
## 14         heavy snow        127     1021   1148
## 15           wildfire         75      911    986
## 16 thunderstorm winds         64      908    972
## 17           blizzard        101      805    906
## 18                fog         62      734    796
## 19        rip current        368      232    600
## 20   wild/forest fire         12      545    557
Fig. 1: Storm Event Effects on fatalities/injuries
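These results show that tornadoes have, by far, the largest effect on both fatalities and injuries: 5633 fatalities and 91346 injuries, against 1903 fatalities for the runner-up (excessive heat) and 6957 injuries for the runner-up (tstm wind).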
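The figure code is not shown in this section, so the following is only a minimal sketch of the log10 scaling explained earlier, assuming base R `barplot()` rather than whatever plotting call produced Fig. 1. The five values are copied from the “Fatalities and injuries summed and sorted” table; on a linear axis the tornado bar is more than 11 times the next one.

```r
# Combined fatalities + injuries for the top five event types,
# copied from the "summed and sorted" table in this report
top5 <- data.frame(
  EVTYPE = c("tornado", "excessive heat", "tstm wind", "flood", "lightning"),
  FPLUSI = c(96979, 8428, 7461, 7259, 6046),
  stringsAsFactors = FALSE
)

# Ratio of the largest bar to the second largest on a linear scale
print(top5$FPLUSI[1] / top5$FPLUSI[2])
# The same values in log10 units, where the gap shrinks to about one unit
print(round(log10(top5$FPLUSI), 2))

pdf(NULL)  # null device for the sketch; use png() or the screen in practice
mids <- barplot(top5$FPLUSI, names.arg = top5$EVTYPE, log = "y",
                las = 2, ylab = "Fatalities + injuries (log scale)")
invisible(dev.off())
```

Using `log = "y"` keeps the raw counts on the axis labels while spacing them logarithmically, which is usually easier to read than plotting precomputed `log10()` values.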
Sorted by property damage cost (US dollars)
##                       EVTYPE      PROPDMG    CROPDMG
## 1                      flood 144657709807 5661968450
## 2          hurricane/typhoon  69305840000 2607872800
## 3                    tornado  56937161054  414953110
## 4                storm surge  43323536000       5000
## 5                flash flood  16140812294 1421317100
## 6                       hail  15732266932 3025954453
## 7                  hurricane  11868319010 2741910000
## 8             tropical storm   7703890550  678346000
## 9               winter storm   6688497250   26944000
## 10                 high wind   5270046295  638571300
## 11               river flood   5118945500 5029459000
## 12                  wildfire   4765114000  295472800
## 13          storm surge/tide   4641188000     850000
## 14                 tstm wind   4484958440  554007350
## 15                 ice storm   3944927810 5022113500
## 16         thunderstorm wind   3483121166  414843050
## 17            hurricane opal   3172846000   19000000
## 18          wild/forest fire   3001829500  106796830
## 19 heavy rain/severe weather   2500000000          0
## 20        thunderstorm winds   1735953834  190654708
Sorted by crop damage cost (US dollars)
##               EVTYPE      PROPDMG     CROPDMG
## 1            drought   1046106000 13972566000
## 2              flood 144657709807  5661968450
## 3        river flood   5118945500  5029459000
## 4          ice storm   3944927810  5022113500
## 5               hail  15732266932  3025954453
## 6          hurricane  11868319010  2741910000
## 7  hurricane/typhoon  69305840000  2607872800
## 8        flash flood  16140812294  1421317100
## 9       extreme cold     67737400  1312973000
## 10      frost/freeze     10480000  1094186000
## 11        heavy rain    694248090   733399800
## 12    tropical storm   7703890550   678346000
## 13         high wind   5270046295   638571300
## 14         tstm wind   4484958440   554007350
## 15    excessive heat      7753700   492402000
## 16            freeze       205000   456725000
## 17           tornado  56937161054   414953110
## 18 thunderstorm wind   3483121166   414843050
## 19              heat      1797000   401461500
## 20   damaging freeze      8000000   296230000
Sorted by sum of property and crop damage cost (US dollars)
##                       EVTYPE      PROPDMG     CROPDMG       CPLUSP
## 1                      flood 144657709807  5661968450 150319678257
## 2          hurricane/typhoon  69305840000  2607872800  71913712800
## 3                    tornado  56937161054   414953110  57352114164
## 4                storm surge  43323536000        5000  43323541000
## 5                       hail  15732266932  3025954453  18758221385
## 6                flash flood  16140812294  1421317100  17562129394
## 7                    drought   1046106000 13972566000  15018672000
## 8                  hurricane  11868319010  2741910000  14610229010
## 9                river flood   5118945500  5029459000  10148404500
## 10                 ice storm   3944927810  5022113500   8967041310
## 11            tropical storm   7703890550   678346000   8382236550
## 12              winter storm   6688497250    26944000   6715441250
## 13                 high wind   5270046295   638571300   5908617595
## 14                  wildfire   4765114000   295472800   5060586800
## 15                 tstm wind   4484958440   554007350   5038965790
## 16          storm surge/tide   4641188000      850000   4642038000
## 17         thunderstorm wind   3483121166   414843050   3897964216
## 18            hurricane opal   3172846000    19000000   3191846000
## 19          wild/forest fire   3001829500   106796830   3108626330
## 20 heavy rain/severe weather   2500000000           0   2500000000
Fig. 2: Storm Event Cost of Property and Crop Damage
These results show that drought causes the highest crop damage cost (about $14.0 billion), while flooding causes the highest property damage cost (about $144.7 billion) and the highest combined cost overall (about $150.3 billion).
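As a small cross-check of these conclusions, the sketch below picks the leading event type for each damage measure with `which.max()` and converts the leading totals to billions of dollars. It is an illustration only: the rows are a handful copied from the damage tables above, not the full aggregated data.

```r
# Per-event damage totals (US dollars) copied from the tables above;
# only a handful of rows are included for illustration
damage <- data.frame(
  EVTYPE  = c("flood", "hurricane/typhoon", "tornado", "drought", "river flood"),
  PROPDMG = c(144657709807, 69305840000, 56937161054, 1046106000, 5118945500),
  CROPDMG = c(5661968450, 2607872800, 414953110, 13972566000, 5029459000),
  stringsAsFactors = FALSE
)
damage$CPLUSP <- damage$PROPDMG + damage$CROPDMG

measures <- c("PROPDMG", "CROPDMG", "CPLUSP")

# Event type with the largest total for each damage measure
leaders <- sapply(measures, function(col) damage$EVTYPE[which.max(damage[[col]])])
print(leaders)

# Leading totals expressed in billions of dollars, rounded to one decimal
billions <- round(sapply(measures, function(col) max(damage[[col]])) / 1e9, 1)
print(billions)
```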