Hi! Thanks for taking the time to review this assessment! Knitting this R Markdown document requires the ggplot2, data.table and R.utils libraries.
library(ggplot2, quietly=T, verbose=F, warn.conflicts=F)
library(data.table, quietly=T, verbose=F, warn.conflicts=F)
library(R.utils, quietly=T, verbose=F, warn.conflicts=F)
## Loading required package: R.methodsS3
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. In particular, this data analysis aims to address two questions: which types of events have the greatest impact on population health, and which have the greatest economic impact.
As a result of this analysis, we conclude that increased assistance for excessive heat and tornado events would do the most to reduce the impact on population health. On the other hand, most of the economic impact derives from flooding and severe storm conditions.
For this analysis, we gather the data from a compressed file made available by the Coursera class. We download the file and uncompress it if this has not been done already.
filename = "repdata_peerassessment2.csv"
filename.bzipped = "repdata_peerassessment2.bz2"
if (!file.exists(filename)) {
    if (!file.exists(filename.bzipped)) {
        retval = download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                               destfile = filename.bzipped,
                               method = "curl")
    }
    bunzip2(filename.bzipped, destname=filename)
}
We load the data into a data.table object to process it.
storm = data.table(read.csv(filename, header=T, sep=",", stringsAsFactors=F))
str(storm)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
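As an aside, data.table also provides `fread()`, which reads a delimited file straight into a data.table and does not convert strings to factors by default. The following is only a minimal sketch on a small temporary file; the file contents and the `toy` name are made up for illustration and are not part of the storm data.

```r
library(data.table)

# Write a tiny illustrative CSV to a temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "HAIL,1"), tmp)

# fread() returns a data.table directly, so no data.table(read.csv(...)) wrapper is needed
toy <- fread(tmp)

str(toy)
```

For the full storm file, the equivalent call would be `fread(filename)`, assuming the uncompressed CSV from the download step is present.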
We will apply the following processing to the raw data in order to facilitate computation:
The BGN_DATE field will be converted to Date objects
invisible(storm[, BGN_DATE := as.Date(storm$BGN_DATE, format="%m/%d/%Y %H:%M:%S")])
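To check the format string in isolation, here is a minimal sketch on a two-row table. The `toy` name is hypothetical and the two strings simply mimic the raw BGN_DATE values shown in the `str()` output; `as.Date` matches the time-of-day portion and then drops it.

```r
library(data.table)

# Toy table mimicking the raw BGN_DATE strings (illustrative only)
toy <- data.table(BGN_DATE = c("4/18/1950 0:00:00", "2/20/1951 0:00:00"))

# Parse month/day/year; the hour:minute:second part is consumed by the format and discarded
toy[, BGN_DATE := as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S")]

print(class(toy$BGN_DATE))
print(toy$BGN_DATE)
```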
We will only consider the 50 most frequent event types (EVTYPE variable) for this analysis. That will address the most common event types.
event.order = table(storm$EVTYPE)[order(table(storm$EVTYPE), decreasing=T)]
event.order = event.order[1:50]
event.order
##
##                     HAIL                TSTM WIND        THUNDERSTORM WIND
##                   288661                   219940                    82563
##                  TORNADO              FLASH FLOOD                    FLOOD
##                    60652                    54277                    25326
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING
##                    20843                    20212                    15754
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM
##                    15708                    11723                    11433
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND
##                     7026                     6839                     6175
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND
##                     5812                     3796                     3566
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD
##                     3392                     2761                     2719
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT
##                     2488                     2006                     1678
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE
##                     1533                     1457                     1342
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL
##                     1293                     1104                     1028
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF
##                     1002                      767                      725
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD
##                      690                      682                      655
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD
##                      650                      636                      624
##                LANDSLIDE                     SNOW          COLD/WIND CHILL
##                      600                      587                      539
##                      FOG              RIP CURRENT              MARINE HAIL
##                      538                      470                      442
##               DUST STORM                AVALANCHE                     WIND
##                      427                      386                      340
##             RIP CURRENTS              STORM SURGE
##                      304                      261
storm = storm[EVTYPE %chin% names(event.order)]
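The top-N filter can be tried on a small example. This is a sketch only: the `toy` table, the `top_n` value of 2 (the analysis uses 50) and the `top_events` name are assumptions made for illustration.

```r
library(data.table)

# Toy event log: HAIL occurs 3 times, TORNADO twice, FOG once (made-up data)
toy <- data.table(EVTYPE = c("HAIL", "HAIL", "HAIL", "TORNADO", "TORNADO", "FOG"))
top_n <- 2

# Count each event type, sort the counts in decreasing order, keep the first top_n names
counts <- sort(table(toy$EVTYPE), decreasing = TRUE)
top_events <- names(counts)[seq_len(top_n)]

# %chin% is data.table's fast %in% for character vectors
toy_top <- toy[EVTYPE %chin% top_events]

print(top_events)
print(nrow(toy_top))
```

Note that near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND are counted as separate types by this approach, as they are in the table above.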
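PROPDMG and CROPDMG will be scaled to dollar amounts using the K (thousands), M (millions) and B (billions) multipliers; all other values of PROPDMGEXP and CROPDMGEXP are ignored
invisible(storm[PROPDMGEXP == "K", PROPDMG := PROPDMG * 1000])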
invisible(storm[PROPDMGEXP == "M", PROPDMG := PROPDMG * 1000 * 1000])
invisible(storm[PROPDMGEXP == "B", PROPDMG := PROPDMG * 1000 * 1000 * 1000])
invisible(storm[CROPDMGEXP == "K", CROPDMG := CROPDMG * 1000])
invisible(storm[CROPDMGEXP == "M", CROPDMG := CROPDMG * 1000 * 1000])
invisible(storm[CROPDMGEXP == "B", CROPDMG := CROPDMG * 1000 * 1000 * 1000])
summary(storm$PROPDMG + storm$CROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 4.04e+05 1.00e+03 1.15e+11
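The six scaling statements above can also be written with a lookup vector. The sketch below is one possible alternative, not the method used in this report; the `toy` table, the `multiplier` vector and the temporary `mult` column are hypothetical names, and the values are made up.

```r
library(data.table)

# Toy damage records with the three handled exponents and one unrecognized (empty) one
toy <- data.table(PROPDMG    = c(25, 2.5, 1.5, 10),
                  PROPDMGEXP = c("K", "M", "B", ""))

# Named lookup of the multipliers handled in the report; any other code yields NA
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

toy[, mult := unname(multiplier[PROPDMGEXP])]
# Unrecognized exponents leave the value unscaled, matching the behaviour above
toy[is.na(mult), mult := 1]
toy[, PROPDMG := PROPDMG * mult]
toy[, mult := NULL]

print(toy)
```

The same lookup would apply to CROPDMG and CROPDMGEXP. Like the original statements, it is case-sensitive, so a lowercase exponent is treated as unrecognized.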
This section describes the results from both questions.
In order to evaluate the impact on population health, we will analyse the information on fatalities and injuries (encoded in the variables FATALITIES and INJURIES).
We will sum all occurrences of both over each event type and also average them over the number of events of that type.
storm.health = storm[, list(event.count = .N,
                            fatalities.tot = sum(FATALITIES, na.rm=T),
                            fatalities.mean = sum(FATALITIES, na.rm=T)/.N,
                            injuries.tot = sum(INJURIES, na.rm=T),
                            injuries.mean = sum(INJURIES, na.rm=T)/.N), by="EVTYPE"]
invisible(storm.health[, health.tot := fatalities.tot + injuries.tot])
invisible(storm.health[, health.mean := fatalities.mean + injuries.mean])
storm.health
## EVTYPE event.count fatalities.tot fatalities.mean
## 1: TORNADO 60652 5633 9.287e-02
## 2: TSTM WIND 219940 504 2.292e-03
## 3: HAIL 288661 15 5.196e-05
## 4: SNOW 587 5 8.518e-03
## 5: WINTER STORM 11433 206 1.802e-02
## 6: THUNDERSTORM WINDS 20843 64 3.071e-03
## 7: HEAVY RAIN 11723 98 8.360e-03
## 8: LIGHTNING 15754 816 5.180e-02
## 9: THUNDERSTORM WIND 82563 133 1.611e-03
## 10: DENSE FOG 1293 18 1.392e-02
## 11: RIP CURRENT 470 368 7.830e-01
## 12: FLASH FLOOD 54277 978 1.802e-02
## 13: FLASH FLOODING 682 19 2.786e-02
## 14: HIGH WINDS 1533 35 2.283e-02
## 15: FUNNEL CLOUD 6839 0 0.000e+00
## 16: HEAT 767 937 1.222e+00
## 17: WIND 340 23 6.765e-02
## 18: FLOOD 25326 470 1.856e-02
## 19: WATERSPOUT 3796 3 7.903e-04
## 20: EXTREME COLD 655 160 2.443e-01
## 21: HIGH WIND 20212 248 1.227e-02
## 22: BLIZZARD 2719 101 3.715e-02
## 23: HEAVY SNOW 15708 127 8.085e-03
## 24: COASTAL FLOOD 650 3 4.615e-03
## 25: ICE STORM 2006 89 4.437e-02
## 26: AVALANCHE 386 224 5.803e-01
## 27: DUST STORM 427 22 5.152e-02
## 28: EXCESSIVE HEAT 1678 1903 1.134e+00
## 29: HIGH SURF 725 101 1.393e-01
## 30: FLOOD/FLASH FLOOD 624 17 2.724e-02
## 31: STRONG WIND 3566 103 2.888e-02
## 32: WINTER WEATHER 7026 33 4.697e-03
## 33: DROUGHT 2488 0 0.000e+00
## 34: STORM SURGE 261 13 4.981e-02
## 35: TROPICAL STORM 690 58 8.406e-02
## 36: WILDFIRE 2761 75 2.716e-02
## 37: WILD/FOREST FIRE 1457 12 8.236e-03
## 38: FOG 538 62 1.152e-01
## 39: RIP CURRENTS 304 204 6.711e-01
## 40: LANDSLIDE 600 38 6.333e-02
## 41: LAKE-EFFECT SNOW 636 0 0.000e+00
## 42: URBAN/SML STREAM FLD 3392 28 8.255e-03
## 43: TSTM WIND/HAIL 1028 5 4.864e-03
## 44: MARINE TSTM WIND 6175 9 1.457e-03
## 45: FROST/FREEZE 1342 0 0.000e+00
## 46: WINTER WEATHER/MIX 1104 28 2.536e-02
## 47: EXTREME COLD/WIND CHILL 1002 125 1.248e-01
## 48: MARINE HAIL 442 0 0.000e+00
## 49: COLD/WIND CHILL 539 95 1.763e-01
## 50: MARINE THUNDERSTORM WIND 5812 10 1.721e-03
## EVTYPE event.count fatalities.tot fatalities.mean
## injuries.tot injuries.mean health.tot health.mean
## 1: 91346 1.5060674 96979 1.5989415
## 2: 6957 0.0316314 7461 0.0339229
## 3: 1361 0.0047149 1376 0.0047668
## 4: 29 0.0494037 34 0.0579216
## 5: 1321 0.1155427 1527 0.1335607
## 6: 908 0.0435638 972 0.0466344
## 7: 251 0.0214109 349 0.0297705
## 8: 5230 0.3319792 6046 0.3837755
## 9: 1488 0.0180226 1621 0.0196335
## 10: 342 0.2645012 360 0.2784223
## 11: 232 0.4936170 600 1.2765957
## 12: 1777 0.0327395 2755 0.0507581
## 13: 8 0.0117302 27 0.0395894
## 14: 302 0.1969993 337 0.2198304
## 15: 3 0.0004387 3 0.0004387
## 16: 2100 2.7379400 3037 3.9595828
## 17: 86 0.2529412 109 0.3205882
## 18: 6789 0.2680644 7259 0.2866224
## 19: 29 0.0076396 32 0.0084299
## 20: 231 0.3526718 391 0.5969466
## 21: 1137 0.0562537 1385 0.0685236
## 22: 805 0.2960647 906 0.3332107
## 23: 1021 0.0649987 1148 0.0730838
## 24: 2 0.0030769 5 0.0076923
## 25: 1975 0.9845464 2064 1.0289133
## 26: 170 0.4404145 394 1.0207254
## 27: 440 1.0304450 462 1.0819672
## 28: 6525 3.8885578 8428 5.0226460
## 29: 152 0.2096552 253 0.3489655
## 30: 15 0.0240385 32 0.0512821
## 31: 280 0.0785193 383 0.1074033
## 32: 398 0.0566467 431 0.0613436
## 33: 4 0.0016077 4 0.0016077
## 34: 38 0.1455939 51 0.1954023
## 35: 340 0.4927536 398 0.5768116
## 36: 911 0.3299529 986 0.3571170
## 37: 545 0.3740563 557 0.3822924
## 38: 734 1.3643123 796 1.4795539
## 39: 297 0.9769737 501 1.6480263
## 40: 52 0.0866667 90 0.1500000
## 41: 0 0.0000000 0 0.0000000
## 42: 79 0.0232901 107 0.0315448
## 43: 95 0.0924125 100 0.0972763
## 44: 8 0.0012955 17 0.0027530
## 45: 0 0.0000000 0 0.0000000
## 46: 72 0.0652174 100 0.0905797
## 47: 24 0.0239521 149 0.1487026
## 48: 0 0.0000000 0 0.0000000
## 49: 12 0.0222635 107 0.1985158
## 50: 26 0.0044735 36 0.0061941
## injuries.tot injuries.mean health.tot health.mean
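As a worked check of how the total and mean columns relate, the TORNADO row can be recomputed by hand. The three input values are copied from the table above; the variable names mirror the column names.

```r
# Values copied from the TORNADO row of the table above
event.count    <- 60652
fatalities.tot <- 5633
injuries.tot   <- 91346

# Aggregate health incidents, then the average per recorded tornado
health.tot  <- fatalities.tot + injuries.tot
health.mean <- health.tot / event.count

print(health.tot)
print(health.mean)
```

The two measures rank events differently: TORNADO has by far the largest total because it is recorded so often, while EXCESSIVE HEAT has a higher mean (about 5.02 incidents per event in the table) from far fewer events.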
We can consider the top aggregate health incidents (number of fatalities + number of injuries) on all recorded events of a specific type to try to understand the potential impacts of the events. We select the top 5 event types for this plot here.
storm.health = storm.health[order(health.tot, decreasing=T)]
storm.health.tot = storm.health[1:5]
qplot(as.factor(EVTYPE), health.tot, data=storm.health.tot,
      geom="bar", stat="identity",
      main="Aggregate of health incidents per event type",
      xlab="Severe Weather Event Type",
      ylab="Count of Fatalities + Injuries", fill=..y..)
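Recent ggplot2 releases have, to my knowledge, deprecated `qplot()` and the `..y..` notation, so the call above may emit warnings or fail on a current installation. The following is a hedged sketch of an equivalent plot built with `ggplot()` and `geom_col()`, which draws bars whose heights are the values in the data. The `top5` data frame is an assumption for illustration: its values are the five largest health.tot entries copied from the table above.

```r
library(ggplot2)

# Five largest health.tot values, copied from the printed table
top5 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  health.tot = c(96979, 8428, 7461, 7259, 6046)
)

# geom_col() plays the role of geom="bar" with stat="identity";
# mapping fill to the plotted value replaces fill=..y..
p <- ggplot(top5, aes(x = EVTYPE, y = health.tot, fill = health.tot)) +
  geom_col() +
  labs(title = "Aggregate of health incidents per event type",
       x = "Severe Weather Event Type",
       y = "Count of Fatalities + Injuries")
```

Printing `p` renders the chart. The same pattern would apply to the other bar charts in this report by swapping the data and the y variable.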
Alternatively, we can look at the average number of such health incidents per event, which gives us an expected value of the impact of a single event of each type.
storm.health = storm.health[order(health.mean, decreasing=T)]
storm.health.mean = storm.health[1:5]
qplot(as.factor(EVTYPE), health.mean, data=storm.health.mean,
      geom="bar", stat="identity",
      main="Average of health incidents per event type",
      xlab="Severe Weather Event Type",
      ylab="Average of Fatalities + Injuries per instance", fill=..y..)
Given these results, it would seem that heat-related events (EXCESSIVE HEAT and HEAT) and TORNADOs are the ones worth investigating when deciding where to divert additional funds for assistance, if we are to reduce injuries and loss of life after severe weather events.
In order to evaluate the impact on the economy, we will analyse the information on crop and property damage (encoded in the variables PROPDMG and CROPDMG).
These variables have already been normalized in the Data Processing section.
This time, we will only consider the averages of economic damage for each event type.
storm.econ = storm[, list(event.count = .N,
                          prop.mean = sum(PROPDMG, na.rm=T)/.N,
                          crop.mean = sum(CROPDMG, na.rm=T)/.N), by="EVTYPE"]
invisible(storm.econ[, econ.mean := prop.mean + crop.mean])
storm.econ
## EVTYPE event.count prop.mean crop.mean econ.mean
## 1: TORNADO 60652 9.386e+05 6.842e+03 9.454e+05
## 2: TSTM WIND 219940 2.039e+04 2.519e+03 2.291e+04
## 3: HAIL 288661 5.448e+04 1.048e+04 6.497e+04
## 4: SNOW 587 2.515e+04 1.704e+01 2.517e+04
## 5: WINTER STORM 11433 5.850e+05 2.357e+03 5.874e+05
## 6: THUNDERSTORM WINDS 20843 8.317e+04 9.147e+03 9.231e+04
## 7: HEAVY RAIN 11723 5.922e+04 6.256e+04 1.218e+05
## 8: LIGHTNING 15754 5.895e+04 7.676e+02 5.972e+04
## 9: THUNDERSTORM WIND 82563 4.219e+04 5.025e+03 4.721e+04
## 10: DENSE FOG 1293 7.482e+03 0.000e+00 7.482e+03
## 11: RIP CURRENT 470 2.128e+00 0.000e+00 2.128e+00
## 12: FLASH FLOOD 54277 2.974e+05 2.619e+04 3.236e+05
## 13: FLASH FLOODING 682 4.513e+05 2.215e+04 4.734e+05
## 14: HIGH WINDS 1533 3.968e+05 2.656e+04 4.234e+05
## 15: FUNNEL CLOUD 6839 2.845e+01 0.000e+00 2.845e+01
## 16: HEAT 767 2.343e+03 5.234e+05 5.258e+05
## 17: WIND 340 2.554e+04 8.824e+02 2.642e+04
## 18: FLOOD 25326 5.712e+06 2.236e+05 5.935e+06
## 19: WATERSPOUT 3796 2.464e+03 0.000e+00 2.464e+03
## 20: EXTREME COLD 655 1.034e+05 1.974e+06 2.077e+06
## 21: HIGH WIND 20212 2.607e+05 3.159e+04 2.923e+05
## 22: BLIZZARD 2719 2.424e+05 4.121e+04 2.837e+05
## 23: HEAVY SNOW 15708 5.937e+04 8.572e+03 6.794e+04
## 24: COASTAL FLOOD 650 3.656e+05 0.000e+00 3.656e+05
## 25: ICE STORM 2006 1.967e+06 2.504e+06 4.470e+06
## 26: AVALANCHE 386 9.642e+03 0.000e+00 9.642e+03
## 27: DUST STORM 427 1.300e+04 7.260e+03 2.026e+04
## 28: EXCESSIVE HEAT 1678 4.621e+03 2.934e+05 2.981e+05
## 29: HIGH SURF 725 1.236e+05 0.000e+00 1.236e+05
## 30: FLOOD/FLASH FLOOD 624 2.789e+05 1.523e+05 4.312e+05
## 31: STRONG WIND 3566 4.914e+04 1.821e+04 6.736e+04
## 32: WINTER WEATHER 7026 2.970e+03 2.135e+03 5.105e+03
## 33: DROUGHT 2488 4.205e+05 5.616e+06 6.036e+06
## 34: STORM SURGE 261 1.660e+08 1.916e+01 1.660e+08
## 35: TROPICAL STORM 690 1.117e+07 9.831e+05 1.215e+07
## 36: WILDFIRE 2761 1.726e+06 1.070e+05 1.833e+06
## 37: WILD/FOREST FIRE 1457 2.060e+06 7.330e+04 2.134e+06
## 38: FOG 538 2.445e+04 0.000e+00 2.445e+04
## 39: RIP CURRENTS 304 5.329e+02 0.000e+00 5.329e+02
## 40: LANDSLIDE 600 5.410e+05 3.336e+04 5.744e+05
## 41: LAKE-EFFECT SNOW 636 6.307e+04 0.000e+00 6.307e+04
## 42: URBAN/SML STREAM FLD 3392 1.719e+04 2.502e+03 1.969e+04
## 43: TSTM WIND/HAIL 1028 4.313e+04 6.293e+04 1.061e+05
## 44: MARINE TSTM WIND 6175 8.779e+02 0.000e+00 8.779e+02
## 45: FROST/FREEZE 1342 7.064e+03 8.153e+05 8.223e+05
## 46: WINTER WEATHER/MIX 1104 5.772e+03 0.000e+00 5.772e+03
## 47: EXTREME COLD/WIND CHILL 1002 8.631e+03 4.990e+01 8.681e+03
## 48: MARINE HAIL 442 9.050e+00 0.000e+00 9.050e+00
## 49: COLD/WIND CHILL 539 3.692e+03 1.113e+03 4.805e+03
## 50: MARINE THUNDERSTORM WIND 5812 7.509e+01 8.603e+00 8.369e+01
## EVTYPE event.count prop.mean crop.mean econ.mean
Again, we select the top 5 event types by economic impact for this plot. We use a logarithmic scale so we can better visualize the impact.
storm.econ = storm.econ[order(econ.mean, decreasing=T)]
storm.econ.tot = storm.econ[1:5]
qplot(as.factor(EVTYPE), econ.mean, data=storm.econ.tot,
      geom="bar", stat="identity", log="y",
      main="Average of economic impact per event type",
      xlab="Severe Weather Event Type",
      ylab="Average of Property and Crop Damage per Incident", fill=..y..)
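To see why a logarithmic axis helps here, the sketch below compares the five largest econ.mean values. The numbers are copied from the table above; `spread` and `log.heights` are illustrative names.

```r
# Five largest econ.mean values, copied from the printed table
econ.mean <- c("STORM SURGE"    = 1.660e8,
               "TROPICAL STORM" = 1.215e7,
               "DROUGHT"        = 6.036e6,
               "FLOOD"          = 5.935e6,
               "ICE STORM"      = 4.470e6)

# Ratio between the largest and the smallest of the five averages
spread <- max(econ.mean) / min(econ.mean)

# Positions of the same values on a log10 axis
log.heights <- log10(econ.mean)

print(spread)
print(log.heights)
```

On a linear axis the STORM SURGE bar would be dozens of times taller than the ICE STORM bar, leaving the smaller bars hard to compare. On a log10 axis all five fall within about two units of each other.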
It is clear that most of the economic impact comes from flooding and storm-related severe weather events, most probably because of crop damage and flooding of residential and commercial areas.