Severe weather events pose a significant threat to public safety and to the economy, particularly in the United States. Using the storm database provided by NOAA, this study examines the harm that storms and other severe weather events cause to people and to private property.
The results could inform policies to mitigate the effects of these events. The database covers events from the early 1950s through November 2011; the more recent records are the most complete and reliable.
This study, which covers the United States, addresses two basic questions:
1. Which types of events are most harmful to the health of the population?
2. Which types of events cause the greatest economic losses?

The fields needed to answer these questions are:
| FIELD | Description |
|---|---|
| EVTYPE | Type of severe weather event |
| FATALITIES | Number of direct fatalities per event |
| INJURIES | Number of direct injuries per event |
| PROPDMG | Estimated value of property damage to three significant figures |
| PROPDMGEXP | Exponent for PROPDMG, where K=thousands, M=millions, B=billions |
| CROPDMG | Estimated value of crop damage to three significant figures |
| CROPDMGEXP | Exponent for CROPDMG, where K=thousands, M=millions, B=billions |
Data reading and basic processing
StormData <- read.csv("/data/mapologo/projects/RepData_PeerAssessment2/StormData.csv")
StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, format="%m/%d/%Y")
StormData$EVTYPE <- as.factor(StormData$EVTYPE)
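Because the more recent records are described as the most complete, one way to check coverage is to count records per year from the parsed BGN_DATE and, if desired, keep only the rows from a cutoff year onward. The following is a minimal sketch on a toy data frame standing in for StormData; `toy`, `events_per_year`, `recent`, and the 1996 cutoff are illustrative assumptions and are not part of the original analysis.

```r
# Toy stand-in for StormData (hypothetical values, for illustration only)
toy <- data.frame(
  BGN_DATE = c("4/18/1950", "6/1/1996", "11/28/2011"),
  EVTYPE   = c("TORNADO", "HAIL", "FLOOD"),
  stringsAsFactors = FALSE
)
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Number of records per year, to see how coverage changes over time
toy$YEAR <- as.integer(format(toy$BGN_DATE, "%Y"))
events_per_year <- table(toy$YEAR)

# Keep only records from an assumed cutoff year onward
cutoff_year <- 1996
recent <- toy[toy$YEAR >= cutoff_year, ]
```

The rest of this report does not apply such a filter; all totals below are computed over the full period.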
The fields available in the database are:
names(StormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
data <- StormData[c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
summary(data$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
summary(data$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
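The “*EXP” fields were interpreted as powers of 10: h/H = 2, k/K = 3, m/M = 6, b/B = 9, and a digit from 1 to 9 is used directly as the power. Any other value (blank, "-", "?", "+", "0") is treated as a power of 0, which leaves the amount unchanged.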
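# Convert a damage figure and its magnitude code into an amount in US$
calc_dmg <- function(x, x_exp){
  aux <- numeric(length(x))
  # seq_along() also handles a zero-length input safely
  for (i in seq_along(x)){
    if (x[i] == 0){
      aux[i] <- 0
    }else{
      # Letter codes and digits give the power of 10; anything else falls back to 0
      power <- switch(as.character(x_exp[i]),
                      "h"=2, "H"=2, "k"=3, "K"=3, "m"=6, "M"=6, "b"=9, "B"=9,
                      "1"=1, "2"=2, "3"=3, "4"=4, "5"=5, "6"=6, "7"=7, "8"=8, "9"=9, 0)
      aux[i] <- x[i] * 10^power
    }
  }
  return(aux)
}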
data$PROPDMGAMOUNT <- calc_dmg(data$PROPDMG, data$PROPDMGEXP)
data$CROPDMGAMOUNT <- calc_dmg(data$CROPDMG, data$CROPDMGEXP)
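The row-by-row loop in calc_dmg can be slow on a table of this size. As a hedged alternative, the same mapping can be expressed as a named lookup vector and applied to the whole column at once. The sketch below assumes the same interpretation of the codes as calc_dmg; the name `calc_dmg_vec` is hypothetical and this is not the function used to produce the results in this report.

```r
# Hypothetical vectorized variant of calc_dmg(); the name is illustrative.
calc_dmg_vec <- function(x, x_exp) {
  # Letter codes (case-insensitive) and the digits 1-9 map to a power of 10
  powers <- c(h = 2, k = 3, m = 6, b = 9, setNames(1:9, as.character(1:9)))
  p <- powers[tolower(as.character(x_exp))]
  # Blank, "-", "?", "+", "0" and any other unrecognized code: power of 0
  p[is.na(p)] <- 0
  unname(x * 10^p)
}

# Small usage example with made-up values
calc_dmg_vec(c(25, 1.5, 0, 3, 2), c("K", "M", "B", "", "5"))
```

A lookup vector keeps the code-to-power table in one place, so adding or correcting a code means changing a single entry.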
Inspecting the EVTYPE field reveals several coding problems:
summary(data$EVTYPE)
##                      HAIL                TSTM WIND        THUNDERSTORM WIND
##                    288661                   219940                    82563
##                   TORNADO              FLASH FLOOD                    FLOOD
##                     60652                    54277                    25326
##        THUNDERSTORM WINDS                HIGH WIND                LIGHTNING
##                     20843                    20212                    15754
##                HEAVY SNOW               HEAVY RAIN             WINTER STORM
##                     15708                    11723                    11433
##            WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND
##                      7026                     6839                     6175
##  MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND
##                      5812                     3796                     3566
##      URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD
##                      3392                     2761                     2719
##                   DROUGHT                ICE STORM           EXCESSIVE HEAT
##                      2488                     2006                     1678
##                HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE
##                      1533                     1457                     1342
##                 DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL
##                      1293                     1104                     1028
##   EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF
##                      1002                      767                      725
##            TROPICAL STORM           FLASH FLOODING             EXTREME COLD
##                       690                      682                      655
##             COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD
##                       650                      636                      624
##                 LANDSLIDE                     SNOW          COLD/WIND CHILL
##                       600                      587                      539
##                       FOG              RIP CURRENT              MARINE HAIL
##                       538                      470                      442
##                DUST STORM                AVALANCHE                     WIND
##                       427                      386                      340
##              RIP CURRENTS              STORM SURGE            FREEZING RAIN
##                       304                      261                      250
##               URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL
##                       249                      228                      204
##              STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE
##                       196                      186                      174
##                 HURRICANE              RIVER FLOOD               LIGHT SNOW
##                       174                      173                      154
##          STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING
##                       148                      146                      143
##                DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM
##                       141                      135                      126
##                  FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL
##                       120                      103                      101
##            URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON
##                        98                       90                       88
##             FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT
##                        87                       84                       81
##                    FREEZE                HEAT WAVE                     COLD
##                        74                       74                       72
##               RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL
##                        64                       61                       61
##       TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY
##                        60                       59                       56
##                     FROST              GUSTY WINDS      THUNDERSTORM WINDSS
##                        53                       53                       51
##        MARINE STRONG WIND                    OTHER               SMALL HAIL
##                        48                       48                       47
##                    FUNNEL             FREEZING FOG             THUNDERSTORM
##                        46                       45                       45
##        Temperature record          TSTM WIND (G45)         Coastal Flooding
##                        43                       39                       38
##               WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS
##                        37                       36                       36
##                   (Other)
##                      2940
data$EVTYPE <- toupper(data$EVTYPE)
data$EVTYPE[data$EVTYPE == "AVALANCE"] <- "AVALANCHE"
data$EVTYPE[data$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
data$EVTYPE[data$EVTYPE == "FLASH FLOOD/FLOOD"] <- "FLASH FLOOD"
# and so on
Most harmful events
In terms of fatalities
evt_fat <- aggregate(data$FATALITIES, by=list(data$EVTYPE), FUN=sum)
names(evt_fat) <- c("EventType", "Fatalities")
head(evt_fat[order(evt_fat$Fatalities, decreasing=TRUE),], n=15)
## EventType Fatalities
## 752 TORNADO 5633
## 108 EXCESSIVE HEAT 1903
## 131 FLASH FLOOD 992
## 235 HEAT 937
## 406 LIGHTNING 816
## 682 THUNDERSTORM WIND 637
## 146 FLOOD 470
## 518 RIP CURRENT 368
## 313 HIGH WIND 248
## 10 AVALANCHE 225
## 885 WINTER STORM 206
## 519 RIP CURRENTS 204
## 239 HEAT WAVE 172
## 117 EXTREME COLD 162
## 266 HEAVY SNOW 127
Graphical
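library(ggplot2)
# Keep the five event types with the most fatalities, matching the plot title
top_fat <- head(evt_fat[order(evt_fat$Fatalities, decreasing=TRUE),], n=5)
# geom_col() draws the precomputed totals directly
ggplot(top_fat, aes(x=reorder(EventType, Fatalities), y=Fatalities, fill=EventType)) +
  geom_col() +
  labs(x="Event type", y="# Fatalities", title="5 most harmful events in terms of fatalities")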
In terms of injuries
evt_inj <- aggregate(data$INJURIES, by=list(data$EVTYPE), FUN=sum)
names(evt_inj) <- c("EventType", "Injuries")
head(evt_inj[order(evt_inj$Injuries, decreasing=TRUE),], n=15)
## EventType Injuries
## 752 TORNADO 91346
## 682 THUNDERSTORM WIND 8445
## 146 FLOOD 6789
## 108 EXCESSIVE HEAT 6525
## 406 LIGHTNING 5230
## 235 HEAT 2100
## 381 ICE STORM 1975
## 131 FLASH FLOOD 1777
## 204 HAIL 1361
## 885 WINTER STORM 1321
## 365 HURRICANE/TYPHOON 1275
## 313 HIGH WIND 1137
## 266 HEAVY SNOW 1021
## 868 WILDFIRE 911
## 706 THUNDERSTORM WINDS 908
Graphical
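# Keep the five event types with the most injuries, matching the plot title
top_inj <- head(evt_inj[order(evt_inj$Injuries, decreasing=TRUE),], n=5)
# geom_col() draws the precomputed totals directly
ggplot(top_inj, aes(x=reorder(EventType, Injuries), y=Injuries, fill=EventType)) +
  geom_col() +
  labs(x="Event type", y="# Injuries", title="5 most harmful events in terms of injuries")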
Most costly events
evt_cost <- aggregate(data$PROPDMGAMOUNT, by=list(data$EVTYPE), FUN=sum)
names(evt_cost) <- c("EventType", "Amount")
evt_cost$Amount <- evt_cost$Amount / 100000
head(evt_cost[order(evt_cost$Amount, decreasing=TRUE),],n=15)
## EventType Amount
## 146 FLOOD 1446577
## 365 HURRICANE/TYPHOON 693058
## 752 TORNADO 569474
## 593 STORM SURGE 433235
## 131 FLASH FLOOD 170951
## 204 HAIL 157353
## 356 HURRICANE 118683
## 682 THUNDERSTORM WIND 79681
## 766 TROPICAL STORM 77039
## 885 WINTER STORM 66885
## 313 HIGH WIND 52700
## 523 RIVER FLOOD 51189
## 868 WILDFIRE 47651
## 594 STORM SURGE/TIDE 46412
## 381 ICE STORM 39449
Graphical
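# Keep the five event types with the highest property damage, matching the plot title
top_cost <- head(evt_cost[order(evt_cost$Amount, decreasing=TRUE),], n=5)
# Amount was divided by 100000 above, so the axis is in hundreds of thousands of US$
ggplot(top_cost, aes(x=reorder(EventType, Amount), y=Amount, fill=EventType)) +
  geom_col() +
  labs(x="Event type", y="Amount in hundreds of thousands of US$", title="5 most costly events (property damage)")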
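The ranking above aggregates PROPDMGAMOUNT only, although CROPDMGAMOUNT was also computed. If the question is about total economic loss, one option is to sum both amounts per record before aggregating. The sketch below shows this on made-up numbers; `toy`, `TOTALDMG`, and `evt_total` are hypothetical names, and the values are not taken from the storm database.

```r
# Toy data with hypothetical amounts in US$ (not from the storm database)
toy <- data.frame(
  EVTYPE        = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGAMOUNT = c(5e6, 2e6, 1e5, 3e5),
  CROPDMGAMOUNT = c(1e6, 0, 9e6, 2e5),
  stringsAsFactors = FALSE
)

# Total damage per record, then summed per event type
toy$TOTALDMG <- toy$PROPDMGAMOUNT + toy$CROPDMGAMOUNT
evt_total <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to least costly
evt_total <- evt_total[order(evt_total$TOTALDMG, decreasing = TRUE), ]
```

In this toy example, DROUGHT ranks first only once crop damage is included, which shows how the choice of damage fields can change the ordering.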
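The EVTYPE field needs to be reviewed and standardized, as it has major inconsistencies in coding and transcription. Variants such as THUNDERSTORM WIND and THUNDERSTORM WINDS, RIP CURRENT and RIP CURRENTS, or HURRICANE and HURRICANE/TYPHOON still appear as separate rows in the rankings above, so the totals for those event types are split across several labels.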
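One possible starting point for this standardization is a small set of pattern-based rules applied to the whole column. The sketch below is only an illustration: `normalize_evtype` is a hypothetical helper, its rules cover just a few of the variants visible in the tables above, and the results in this report were not produced with it.

```r
# Hypothetical helper: the name and the rule set are illustrative only.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  x <- gsub("[[:space:]]+", " ", x)               # collapse repeated whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)            # expand the TSTM abbreviation
  x <- sub("^(THUNDERSTORM WIND)S+$", "\\1", x)   # WINDS, WINDSS -> WIND
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)    # plural -> singular
  x
}

# Small usage example
normalize_evtype(c(" tstm wind", "THUNDERSTORM WINDSS", "Rip Currents", "HAIL"))
```

A complete cleanup would need many more rules, ideally checked against the official list of event types in the NOAA documentation.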