This project explores the Storm Data database of the National Weather Service to answer two basic questions about the effects of severe weather events. Data from 1996 to 2011 will be used, since these years contain the most complete data. The first part of the project seeks to identify the severe weather events that are most harmful to population health, measured by the fatalities and injuries they cause. The second part seeks to identify the severe weather events with the greatest economic consequences, measured by the property and crop damage they cause. This project assumes that the compressed data file “repdata_data_StormData.csv.bz2” is stored in the working directory.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
The compressed file was read into the stormData variable with stringsAsFactors set to FALSE.
stormData <- read.csv("repdata_data_StormData.csv.bz2", header=TRUE, sep=",", stringsAsFactors = FALSE)
The Date variable was created by parsing the BGN_DATE variable as a date. It is used as the date of the severe weather event, on the assumption that an event's date is the day on which it began.
stormData2 <- mutate(stormData, Date = mdy_hms(BGN_DATE))
Only a subset of the data is used for the analysis. According to the National Climatic Data Center (https://www.ncdc.noaa.gov/stormevents/details.jsp), only the data from 1996 to 2011 is complete. Therefore, only these years are used in this project.
selectData <- filter(stormData2, year(Date)>= 1996 & year(Date)<=2011)
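As a small illustration of the date parsing and year filter above, the sketch below applies the same two steps to a toy data frame. The three date strings are made up for the example and only assume the "month/day/year hour:minute:second" text layout that mdy_hms() expects.

```r
library(lubridate)

# Toy records (made up) in "month/day/year hour:minute:second" layout
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the text into date-times, then keep only 1996 through 2011
toy$Date <- mdy_hms(toy$BGN_DATE)
kept <- toy[year(toy$Date) >= 1996 & year(toy$Date) <= 2011, ]
print(kept)
```

Only the 1996 and 2011 rows remain; the 1950 row is dropped by the year condition.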
It is assumed that PROPDMGEXP and CROPDMGEXP are the multipliers for the PROPDMG and CROPDMG fields respectively. The data was therefore adjusted as follows:
- K was converted to 1000
- B was converted to 1000000000
- M was converted to 1000000
- blank values, and "0" in PROPDMGEXP, were converted to 1
selectData$PROPDMGEXP[selectData$PROPDMGEXP == "K"] <- 1000
selectData$PROPDMGEXP[selectData$PROPDMGEXP == "B"] <- 1000000000
selectData$PROPDMGEXP[selectData$PROPDMGEXP == "M"] <- 1000000
selectData$PROPDMGEXP[selectData$PROPDMGEXP == ""] <- 1
selectData$PROPDMGEXP[selectData$PROPDMGEXP == "0"] <- 1
selectData$CROPDMGEXP[selectData$CROPDMGEXP == "K"] <- 1000
selectData$CROPDMGEXP[selectData$CROPDMGEXP == "B"] <- 1000000000
selectData$CROPDMGEXP[selectData$CROPDMGEXP == "M"] <- 1000000
selectData$CROPDMGEXP[selectData$CROPDMGEXP == ""] <- 1
The PROPDMGEXP and CROPDMGEXP fields were then converted to numeric.
selectData$PROPDMGEXP <- as.numeric(selectData$PROPDMGEXP)
selectData$CROPDMGEXP <- as.numeric(selectData$CROPDMGEXP)
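The recoding above could also be written as a single lookup. The sketch below is one possible way to do that, not the code used in this analysis: exp_to_multiplier is a hypothetical helper name, and it assumes that any code other than K, M or B (including lower-case variants being treated as upper case) should fall back to a multiplier of 1.

```r
# Hypothetical helper: map a damage-exponent code to its numeric multiplier
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multiplier <- unname(lookup[toupper(code)])
  # Assumption: blank, "0" and any unrecognised code fall back to 1
  multiplier[is.na(multiplier)] <- 1
  multiplier
}

exp_to_multiplier(c("K", "M", "B", "", "0"))
```

Because the fallback is applied to every unmatched code, this version never produces NA values, whereas as.numeric() on an unhandled text code would.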
The following formula will be used to represent population health:
fatalities + injuries
newData <- mutate(selectData, health = FATALITIES + INJURIES)
All records with a health value of 0 are removed:
healthData <- filter(newData, health != 0)
The data is then cleaned by manually mapping some key event labels to those officially recognised by the data collection agency. Once these labels are combined, the events with the greatest effect on population health can be identified. Note that only events with a high impact on health were considered for relabelling.
healthData$EVENT <- healthData$EVTYPE
healthData$EVENT <- toupper(healthData$EVENT)
healthData$EVENT[grepl("TSTM", healthData$EVENT)] <- "THUNDERSTORM WIND"
healthData$EVENT[grepl("THUNDERSTORM", healthData$EVENT)] <- "THUNDERSTORM WIND"
healthData$EVENT[grepl("RIP CURRENTS", healthData$EVENT)] <- "RIP CURRENT"
healthData$EVENT[grepl("EXTREME COLD", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("EXTREME WINDCHILL", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("COLD/WIND CHILL", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("FOG", healthData$EVENT)] <- "DENSE FOG"
healthData$EVENT[grepl("HURRICANE", healthData$EVENT)] <- "HURRICANE (TYPHOON)"
healthData$EVENT[grepl("WILD/FOREST FIRE", healthData$EVENT)] <- "WILDFIRE"
The following formula will be used to represent economic consequences: PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP
newData <- mutate(newData, economic = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
All records with an economic value of 0 are removed:
economicData <- filter(newData, economic !=0)
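As a worked instance of the economic formula, the sketch below uses a single made-up record: 2.5 units of property damage with a multiplier of 1,000,000 and 10 units of crop damage with a multiplier of 1,000. The numbers are illustrative only and do not come from the data set.

```r
# Made-up record with multipliers already converted to numeric
toy <- data.frame(PROPDMG = 2.5, PROPDMGEXP = 1e6, CROPDMG = 10, CROPDMGEXP = 1e3)

# Same formula as above: property damage plus crop damage, both in dollars
toy$economic <- toy$PROPDMG * toy$PROPDMGEXP + toy$CROPDMG * toy$CROPDMGEXP
print(toy$economic)  # 2,500,000 + 10,000 = 2,510,000
```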
The data is then cleaned by manually mapping some key event labels to those officially recognised by the data collection agency. Once these labels are combined, the events with the greatest effect on the economy can be identified. Note that only events with a high economic consequence were considered for relabelling.
economicData$EVENT <- economicData$EVTYPE
economicData$EVENT <- toupper(economicData$EVENT)
economicData$EVENT[grepl("TSTM", economicData$EVENT)] <- "THUNDERSTORM WIND"
economicData$EVENT[grepl("THUNDERSTORM", economicData$EVENT)] <- "THUNDERSTORM WIND"
economicData$EVENT[grepl("RIP CURRENTS", economicData$EVENT)] <- "RIP CURRENT"
economicData$EVENT[grepl("EXTREME COLD", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("EXTREME WINDCHILL", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("COLD/WIND CHILL", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("FOG", economicData$EVENT)] <- "DENSE FOG"
economicData$EVENT[grepl("HURRICANE", economicData$EVENT)] <- "HURRICANE (TYPHOON)"
economicData$EVENT[grepl("TYPHOON", economicData$EVENT)] <- "HURRICANE (TYPHOON)"
economicData$EVENT[grepl("WILD/FOREST FIRE", economicData$EVENT)] <- "WILDFIRE"
economicData$EVENT[grepl("STORM SURGE", economicData$EVENT)] <- "STORM SURGE/TIDE"
economicData$EVENT[grepl("FREEZE", economicData$EVENT)] <- "FROST/FREEZE"
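The relabelling steps for the health and economic data follow the same pattern, so they could be collected into one function. The sketch below is a possible refactoring under that assumption, not the code that produced the results in this report: normalize_event is a hypothetical name, and only a subset of the rules is shown.

```r
# Hypothetical helper: upper-case raw labels, then collapse variants by pattern.
# Rules are applied in order, so later patterns see the results of earlier ones.
normalize_event <- function(evtype) {
  rules <- c(
    "TSTM"         = "THUNDERSTORM WIND",
    "THUNDERSTORM" = "THUNDERSTORM WIND",
    "RIP CURRENTS" = "RIP CURRENT",
    "HURRICANE"    = "HURRICANE (TYPHOON)",
    "TYPHOON"      = "HURRICANE (TYPHOON)",
    "FOG"          = "DENSE FOG"
  )
  event <- toupper(evtype)
  for (pattern in names(rules)) {
    # fixed = TRUE matches the pattern as literal text, not as a regular expression
    event[grepl(pattern, event, fixed = TRUE)] <- rules[[pattern]]
  }
  event
}

normalize_event(c("Tstm Wind", "RIP CURRENTS", "Hurricane Edouard", "FREEZING FOG", "TORNADO"))
```

Labels that match no rule, such as TORNADO, pass through unchanged apart from upper-casing. Sharing one helper between the two data frames would keep the two rule sets from drifting apart.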
healthData$EVENT <- as.factor(healthData$EVENT)
healthSummary <- aggregate(healthData$health, by=list(Category=healthData$EVENT), FUN=sum, na.rm=TRUE)
print(healthSummary)
## Category x
## 1 AVALANCHE 379
## 2 BLACK ICE 25
## 3 BLIZZARD 455
## 4 BLOWING SNOW 2
## 5 BRUSH FIRE 2
## 6 COASTAL FLOOD 5
## 7 COASTAL FLOODING 3
## 8 COASTAL FLOODING/EROSION 5
## 9 COASTAL STORM 5
## 10 COASTALSTORM 1
## 11 COLD 30
## 12 COLD AND SNOW 14
## 13 COLD TEMPERATURE 2
## 14 COLD WEATHER 2
## 15 DENSE FOG 924
## 16 DROUGHT 4
## 17 DROWNING 1
## 18 DRY MICROBURST 28
## 19 DUST DEVIL 41
## 20 DUST STORM 387
## 21 EXCESSIVE HEAT 8188
## 22 EXCESSIVE SNOW 2
## 23 EXTENDED COLD 1
## 24 EXTREME COLD/WIND CHILL 472
## 25 FALLING SNOW/ICE 2
## 26 FLASH FLOOD 2561
## 27 FLOOD 7172
## 28 FREEZING DRIZZLE 15
## 29 FREEZING RAIN 2
## 30 FREEZING SPRAY 1
## 31 FROST 4
## 32 FUNNEL CLOUD 1
## 33 GLAZE 213
## 34 GUSTY WIND 2
## 35 GUSTY WINDS 14
## 36 HAIL 720
## 37 HAZARDOUS SURF 1
## 38 HEAT 1459
## 39 HEAT WAVE 70
## 40 HEAVY RAIN 324
## 41 HEAVY SEAS 1
## 42 HEAVY SNOW 805
## 43 HEAVY SNOW SHOWER 2
## 44 HEAVY SURF 46
## 45 HEAVY SURF AND WIND 3
## 46 HEAVY SURF/HIGH SURF 90
## 47 HIGH SEAS 10
## 48 HIGH SURF 240
## 49 HIGH SWELLS 1
## 50 HIGH WATER 3
## 51 HIGH WIND 1318
## 52 HURRICANE (TYPHOON) 1448
## 53 HYPERTHERMIA/EXPOSURE 1
## 54 HYPOTHERMIA/EXPOSURE 7
## 55 ICE ON ROAD 1
## 56 ICE ROADS 1
## 57 ICE STORM 400
## 58 ICY ROADS 26
## 59 LANDSLIDE 89
## 60 LANDSLIDES 2
## 61 LIGHT SNOW 3
## 62 LIGHTNING 4792
## 63 MARINE ACCIDENT 3
## 64 MARINE HIGH WIND 2
## 65 MARINE STRONG WIND 36
## 66 MIXED PRECIP 28
## 67 MUDSLIDE 6
## 68 MUDSLIDES 1
## 69 NON-SEVERE WIND DAMAGE 7
## 70 OTHER 4
## 71 RAIN/SNOW 6
## 72 RECORD HEAT 2
## 73 RIP CURRENT 1045
## 74 RIVER FLOOD 1
## 75 RIVER FLOODING 2
## 76 ROGUE WAVE 2
## 77 ROUGH SEAS 13
## 78 ROUGH SURF 5
## 79 SMALL HAIL 10
## 80 SNOW 14
## 81 SNOW AND ICE 1
## 82 SNOW SQUALL 37
## 83 SNOW SQUALLS 1
## 84 STORM SURGE 39
## 85 STORM SURGE/TIDE 16
## 86 STRONG WIND 381
## 87 STRONG WINDS 28
## 88 THUNDERSTORM WIND 5562
## 89 TIDAL FLOODING 1
## 90 TORNADO 22178
## 91 TORRENTIAL RAINFALL 4
## 92 TROPICAL STORM 395
## 93 TSUNAMI 162
## 94 TYPHOON 5
## 95 UNSEASONABLY WARM 17
## 96 URBAN/SML STREAM FLD 107
## 97 WARM WEATHER 2
## 98 WATERSPOUT 4
## 99 WHIRLWIND 1
## 100 WILDFIRE 1543
## 101 WIND 102
## 102 WINDS 1
## 103 WINTER STORM 1483
## 104 WINTER WEATHER 376
## 105 WINTER WEATHER MIX 68
## 106 WINTER WEATHER/MIX 100
## 107 WINTRY MIX 78
healthSummarySorted <- arrange(healthSummary, desc(x))
selectedHealthSummary <- healthSummarySorted[1:10,]
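The aggregate, sort and select steps above can be seen on a small scale in the sketch below. It uses base R only and a toy data frame with made-up casualty counts, and keeps the top two events rather than the top ten.

```r
# Made-up casualty counts, already labelled with a cleaned event name
toy <- data.frame(
  EVENT  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "LIGHTNING"),
  health = c(10, 4, 5, 1, 3, 2)
)

# Sum per event, sort from largest to smallest, keep the top two
totals <- aggregate(health ~ EVENT, data = toy, FUN = sum)
totals <- totals[order(-totals$health), ]
top2 <- head(totals, 2)
print(top2)
```

Here TORNADO (15) and FLOOD (7) come out on top, in that order.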
The graph shows the ten events with the highest impact on health.
par(mar=c(15,6,4,2))
mgp=c(2,5,1)
barplot(selectedHealthSummary$x, las=2,col="green", main="Top Ten Events that affect health",
xlab="", ylab="", names.arg=selectedHealthSummary$Category)
mtext("Event", side=1, line = 10)
mtext("Number of Injuries and Fatalities", side=2, line=4)
economicData$EVENT <- as.factor(economicData$EVENT)
economicSummary <- aggregate(economicData$economic, by=list(Category=economicData$EVENT), FUN=sum, na.rm=TRUE)
print(economicSummary)
## Category x
## 1 HIGH SURF ADVISORY 200000
## 2 FLASH FLOOD 50000
## 3 ASTRONOMICAL HIGH TIDE 9425000
## 4 ASTRONOMICAL LOW TIDE 320000
## 5 AVALANCHE 3711800
## 6 BEACH EROSION 100000
## 7 BLIZZARD 532718950
## 8 BLOWING DUST 20000
## 9 BLOWING SNOW 15000
## 10 COASTAL FLOODING/EROSION 15000000
## 11 COASTAL EROSION 766000
## 12 COASTAL FLOOD 251400560
## 13 COASTAL FLOODING 103809000
## 14 COASTAL FLOODING/EROSION 20030000
## 15 COASTAL STORM 50000
## 16 COLD 554000
## 17 DAM BREAK 1002000
## 18 DENSE FOG 22646500
## 19 DENSE SMOKE 100000
## 20 DOWNBURST 2000
## 21 DROUGHT 14413667000
## 22 DRY MICROBURST 1747600
## 23 DUST DEVIL 663630
## 24 DUST STORM 8574000
## 25 EARLY FROST 42000000
## 26 EROSION/CSTL FLOOD 16200000
## 27 EXCESSIVE HEAT 500125700
## 28 EXCESSIVE SNOW 1935000
## 29 EXTENDED COLD 100000
## 30 EXTREME COLD/WIND CHILL 1357776400
## 31 FLASH FLOOD 16557105610
## 32 FLASH FLOOD/FLOOD 5000
## 33 FLOOD 148919611950
## 34 FLOOD/FLASH/FLOOD 10000
## 35 FREEZING DRIZZLE 105000
## 36 FREEZING RAIN 626000
## 37 FROST 15000
## 38 FROST/FREEZE 1345441000
## 39 FUNNEL CLOUD 134100
## 40 GLAZE 150000
## 41 GRADIENT WIND 37000
## 42 GUSTY WIND 370000
## 43 GUSTY WIND/HAIL 20000
## 44 GUSTY WIND/HVY RAIN 2000
## 45 GUSTY WIND/RAIN 2000
## 46 GUSTY WINDS 1476000
## 47 HAIL 17071172870
## 48 HEAT 1696500
## 49 HEAVY RAIN 1313034240
## 50 HEAVY RAIN/HIGH SURF 15000000
## 51 HEAVY SNOW 705539640
## 52 HEAVY SNOW SHOWER 10000
## 53 HEAVY SURF 1390000
## 54 HEAVY SURF/HIGH SURF 9870000
## 55 HIGH SEAS 15000
## 56 HIGH SURF 83904500
## 57 HIGH SWELLS 5000
## 58 HIGH WIND 5881421660
## 59 HIGH WIND (G40) 18000
## 60 HIGH WINDS 500000
## 61 HURRICANE (TYPHOON) 87068996810
## 62 ICE JAM FLOOD (MINOR 1000
## 63 ICE ROADS 12000
## 64 ICE STORM 3657908810
## 65 ICY ROADS 331200
## 66 LAKE-EFFECT SNOW 40115000
## 67 LAKE EFFECT SNOW 67000
## 68 LAKESHORE FLOOD 7540000
## 69 LANDSLIDE 344595000
## 70 LANDSLIDES 5000
## 71 LANDSLUMP 570000
## 72 LANDSPOUT 7000
## 73 LATE SEASON SNOW 180000
## 74 LIGHT FREEZING RAIN 451000
## 75 LIGHT SNOW 2513000
## 76 LIGHT SNOWFALL 85000
## 77 LIGHTNING 749975520
## 78 MARINE ACCIDENT 50000
## 79 MARINE HAIL 4000
## 80 MARINE HIGH WIND 1297010
## 81 MARINE STRONG WIND 418330
## 82 MICROBURST 20000
## 83 MIXED PRECIPITATION 790000
## 84 MUD SLIDE 100100
## 85 MUDSLIDE 1225000
## 86 NON-SEVERE WIND DAMAGE 5000
## 87 OTHER 1089900
## 88 RAIN 550000
## 89 RIP CURRENT 163000
## 90 RIVER FLOOD 22157000
## 91 RIVER FLOODING 134175000
## 92 ROCK SLIDE 150000
## 93 ROUGH SURF 10000
## 94 SEICHE 980000
## 95 SMALL HAIL 20863000
## 96 SNOW 2554000
## 97 SNOW SQUALL 30000
## 98 SNOW SQUALLS 70000
## 99 STORM SURGE/TIDE 47835579000
## 100 STRONG WIND 239712950
## 101 STRONG WINDS 2234790
## 102 THUNDERSTORM WIND 8936445880
## 103 TIDAL FLOODING 13000
## 104 TORNADO 24900370720
## 105 TROPICAL DEPRESSION 1737000
## 106 TROPICAL STORM 8320186550
## 107 TSUNAMI 144082000
## 108 UNSEASONABLE COLD 5100000
## 109 UNSEASONABLY COLD 25042500
## 110 UNSEASONABLY WARM 10000
## 111 UNSEASONAL RAIN 10000000
## 112 URBAN/SML STREAM FLD 66797750
## 113 VOLCANIC ASH 500000
## 114 WATERSPOUT 5730200
## 115 WET MICROBURST 35000
## 116 WHIRLWIND 12000
## 117 WILDFIRE 8162704630
## 118 WIND 2589500
## 119 WIND AND WAVE 1000000
## 120 WIND DAMAGE 10000
## 121 WINTER STORM 1544687250
## 122 WINTER WEATHER 35866000
## 123 WINTER WEATHER MIX 60000
## 124 WINTER WEATHER/MIX 6372000
## 125 WINTRY MIX 12500
economicSummarySorted <- arrange(economicSummary, desc(x))
selectedEconomicSummary <- economicSummarySorted[1:10,]
selectedEconomicSummary$y <- selectedEconomicSummary$x/10000000000
The graph shows the ten events with the highest impact on the economy.
par(mar=c(15,10,4,2))
mgp=c(2,5,1)
options(scipen=5)
barplot(selectedEconomicSummary$y, las=2,col="green", main="Top Ten Events that affect the economy",
xlab="", ylab="", names.arg=selectedEconomicSummary$Category)
mtext("Event", side=1, line = 10)
mtext("Economic Effect (in 10,000,000,000 dollars)", side=2, line=4)