In this report we determine the weather events most harmful to population health, in terms of fatalities and injuries, and the weather events with the greatest economic consequences, in terms of property damage and crop damage, over the 5 years ending 30 November 2011. Data was obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We found that tornadoes are the most harmful to population health in terms of both fatalities and injuries, followed by flash floods and rip currents in terms of fatalities, and by thunderstorm wind and lightning in terms of injuries. As for economic consequences, tornadoes, floods and hail (in decreasing order) cause the greatest property damage, while floods, frost/freeze events and hail cause the greatest crop damage.
The required libraries are loaded.
library(dplyr)
library(lattice)
Data for all available years is loaded.
noaa <- read.csv("repdata-data-StormData.csv.bz2", header=TRUE, na.strings="")
Note that the data file contains interspersed comments, but comparing the tail of the loaded dataset with the tail of the CSV file in a text editor suggests that the comments are handled correctly. The tail also shows that dates are in M/D/Y format and that there are 902297 rows.
tail(noaa[, c("BGN_DATE", "COUNTYNAME", "EVTYPE")])
## BGN_DATE COUNTYNAME
## 902292 11/28/2011 0:00:00 TNZ001>004 - 019>021 - 048>055 - 088
## 902293 11/30/2011 0:00:00 WYZ007 - 017
## 902294 11/10/2011 0:00:00 MTZ009 - 010
## 902295 11/8/2011 0:00:00 AKZ213
## 902296 11/9/2011 0:00:00 AKZ202
## 902297 11/28/2011 0:00:00 ALZ006
## EVTYPE
## 902292 WINTER WEATHER
## 902293 HIGH WIND
## 902294 HIGH WIND
## 902295 HIGH WIND
## 902296 BLIZZARD
## 902297 HEAVY SNOW
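Since this report is only about the effects of weather events on population health (fatalities and injuries) and their economic consequences (property damage and crop damage), only the following columns are relevant.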
relevant.col <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
The BGN_TIME column is not included since this report summarizes by year and therefore does not go to the level of granularity represented by BGN_TIME. As for events that cross year boundaries, we assume that Rule 2.3.2 is followed, in which events that span months will have an entry for each month [1, pg. 7].
For the relevant columns, the inferred data types are as follows:
str(noaa[, relevant.col])
## 'data.frame': 902297 obs. of 8 variables:
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
The BGN_DATE column is more appropriately stored as type Date, so it is converted.
noaa$BGN_DATE <- as.Date(noaa$BGN_DATE, "%m/%d/%Y %H:%M:%S")
str(noaa[, "BGN_DATE"])
## Date[1:902297], format: "1950-04-18" "1950-04-18" "1951-02-20" "1951-06-08" ...
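One way to confirm that such a conversion did not silently produce missing dates is to count the NA results afterwards. The following is a minimal sketch on made-up sample strings rather than the NOAA data; the names `sample.dates`, `parsed` and `n.failed` are hypothetical.

```r
# Hypothetical sample values in the same M/D/Y H:M:S layout as BGN_DATE,
# plus one deliberately malformed entry.
sample.dates <- factor(c("4/18/1950 0:00:00", "11/30/2011 0:00:00", "not a date"))

# as.Date() on a factor goes through character; with an explicit format,
# strings that do not match become NA instead of raising an error.
parsed <- as.Date(sample.dates, format = "%m/%d/%Y %H:%M:%S")

# Number of values that failed to parse.
n.failed <- sum(is.na(parsed))
print(parsed)
print(n.failed)
```

Applied to the real column, a non-zero count would indicate rows whose dates do not follow the assumed format.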
A summary of the relevant columns shows that there are no negative values, and that (only) the multiplier exponents for property damage and crop damage (PROPDMGEXP and CROPDMGEXP respectively) have missing values. Also, the last begin date on which there is a recorded weather event is 2011-11-30.
summary(noaa[, relevant.col])
## BGN_DATE EVTYPE FATALITIES
## Min. :1950-01-03 HAIL :288661 Min. : 0.0000
## 1st Qu.:1995-04-20 TSTM WIND :219940 1st Qu.: 0.0000
## Median :2002-03-18 THUNDERSTORM WIND: 82563 Median : 0.0000
## Mean :1998-12-27 TORNADO : 60652 Mean : 0.0168
## 3rd Qu.:2007-07-28 FLASH FLOOD : 54277 3rd Qu.: 0.0000
## Max. :2011-11-30 FLOOD : 25326 Max. :583.0000
## (Other) :170878
## INJURIES PROPDMG PROPDMGEXP CROPDMG
## Min. : 0.0000 Min. : 0.00 K :424665 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.00 M : 11330 1st Qu.: 0.000
## Median : 0.0000 Median : 0.00 0 : 216 Median : 0.000
## Mean : 0.1557 Mean : 12.06 B : 40 Mean : 1.527
## 3rd Qu.: 0.0000 3rd Qu.: 0.50 5 : 28 3rd Qu.: 0.000
## Max. :1700.0000 Max. :5000.00 (Other): 84 Max. :990.000
## NA's :465934
## CROPDMGEXP
## K :281832
## M : 1994
## k : 21
## 0 : 19
## B : 9
## (Other): 9
## NA's :618413
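The same two checks (missing values and negative values) can also be made programmatically instead of by reading the summary. This is a minimal base R sketch on a small made-up data frame; `toy` and its values are illustrative only and are not taken from the NOAA data.

```r
# Illustrative data frame with one missing multiplier exponent.
toy <- data.frame(
  FATALITIES = c(0, 1, 0),
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", NA, "M"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na.counts <- colSums(is.na(toy))

# TRUE if any numeric column contains a negative value.
numeric.cols <- sapply(toy, is.numeric)
any.negative <- any(toy[, numeric.cols] < 0, na.rm = TRUE)

print(na.counts)
print(any.negative)
```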
To simplify the calculations downstream, the damage multiplier exponents will be converted from character to numeric. The set of characters for PROPDMGEXP and CROPDMGEXP are as follows:
levels(noaa$PROPDMGEXP)
## [1] "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m"
## [18] "M"
levels(noaa$CROPDMGEXP)
## [1] "?" "0" "2" "B" "k" "K" "m" "M"
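It is decided, pending further information, that a numeric character is taken as the exponent itself; that B or b stands for an exponent of 9, M or m for 6, K or k for 3, and H or h for 2; and that any other character, as well as a missing value, is given the default exponent of 0.
These rules are captured in the following function: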
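mult.exp.char2num <- function(mult.exp) {
  mult.exp <- as.character(mult.exp)
  # as.integer() warns and returns NA for non-numeric input; that is the
  # expected path for letter codes, so the warning is suppressed.
  mult.exp.int <- suppressWarnings(as.integer(mult.exp))
  if (!is.na(mult.exp.int)) {
    return (mult.exp.int)  # Numeric character: use it as the exponent.
  } else {
    return (switch(mult.exp,
                   B=, b=9,
                   M=, m=6,
                   K=, k=3,
                   H=, h=2,
                   0))  # Default multiplier exponent.
  }
}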
Using the above function, the damage multiplier exponents are converted. The summary of the associated columns shows that there are no missing values.
noaa$PROPDMGEXP <- sapply(noaa$PROPDMGEXP, mult.exp.char2num)
noaa$CROPDMGEXP <- sapply(noaa$CROPDMGEXP, mult.exp.char2num)
summary(noaa[, c("PROPDMGEXP", "CROPDMGEXP")])
## PROPDMGEXP CROPDMGEXP
## Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.000 Median :0.0000
## Mean :1.488 Mean :0.9505
## 3rd Qu.:3.000 3rd Qu.:3.0000
## Max. :9.000 Max. :9.0000
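Applying the conversion one element at a time can be slow on a column of about 900,000 values. As an alternative, the same rules can be expressed on whole vectors with a named lookup. The sketch below is not the function used in this report; `mult.exp.vectorized` and `letter.exp` are hypothetical names, and the sketch is only intended to give the same results for the level values listed earlier.

```r
# Hypothetical vectorized version of the exponent conversion rules.
mult.exp.vectorized <- function(mult.exp) {
  mult.exp <- as.character(mult.exp)
  # Letter codes and the exponents they stand for.
  letter.exp <- c(B = 9, b = 9, M = 6, m = 6, K = 3, k = 3, H = 2, h = 2)
  # Digits are used as the exponent itself; everything else becomes NA here.
  result <- suppressWarnings(as.integer(mult.exp))
  # Fill in the exponents for the recognised letter codes.
  is.letter <- mult.exp %in% names(letter.exp)
  result[is.letter] <- letter.exp[mult.exp[is.letter]]
  # Remaining values ("-", "?", "+", missing) get the default exponent of 0.
  result[is.na(result)] <- 0
  unname(result)
}

converted <- mult.exp.vectorized(factor(c("K", "m", "5", "?", NA, "B")))
print(converted)
```

For the six sample inputs this yields the exponents 3, 6, 5, 0, 0 and 9.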
Firstly, the number of recorded events per year is plotted.
# Extract the year as an integer so that the x axis is numeric.
noaa.year <- noaa %>% mutate(year = as.integer(format(BGN_DATE, "%Y"))) %>%
  group_by(year) %>% summarize(count = n())
plot(x=noaa.year$year, y=noaa.year$count, type="b",
     main="Number of weather events recorded per year", xlab="Year",
     ylab="Event count")
Figure caption: The plot shows a sharp increase in the number of recorded events around 1993-1995. However, for the reasons given below, we are only interested in the last 5 years (2006-2011), for which there appears to be a considerable number of recorded events.
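For readers without dplyr, the per-year count behind the plot can also be obtained in base R with table(). A minimal sketch on made-up dates (the vector `dates` is illustrative and not taken from the dataset):

```r
# Illustrative begin dates: two events in 1993 and one in 1994.
dates <- as.Date(c("1993-05-01", "1993-07-12", "1994-01-03"))

# Extract the year of each date and count occurrences per year.
year.counts <- table(as.integer(format(dates, "%Y")))
print(year.counts)
```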
We focus only on the last 5 years because the climate is not static and conditions vary between years, so recent events are more representative of current risk. Since the last begin date with a recorded event is 2011-11-30, the 5-year window begins on 2006-12-01, the day directly after 2006-11-30.
noaa.last5 <- noaa %>% filter(BGN_DATE > as.Date("2006-11-30"))
with(noaa.last5, c(`Min.` = min(BGN_DATE), `Max.` = max(BGN_DATE)))
## Min. Max.
## "2006-12-01" "2011-11-30"
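The cutoff date does not have to be typed in by hand; it can be derived from the last recorded date. The following base R sketch assumes the last date is already known, and the names `last.date`, `cutoff` and `kept` are hypothetical.

```r
# Last begin date with a recorded event, as reported in the summary.
last.date <- as.Date("2011-11-30")

# seq() with a negative "years" step walks back in calendar years;
# the second element is the date exactly 5 years earlier.
cutoff <- seq(last.date, by = "-5 years", length.out = 2)[2]

# Keep only dates strictly after the cutoff.
dates <- as.Date(c("2006-11-30", "2006-12-01", "2011-11-30"))
kept <- dates[dates > cutoff]

print(cutoff)
print(kept)
```

The strict comparison excludes the cutoff day itself, so the window starts on the following day.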
To determine the weather events most harmful to human health over the last 5 years, we obtain the total number of fatalities and injuries for each weather event and sort in descending order.
(noaa.last5.fatal <- noaa.last5 %>% group_by(EVTYPE) %>%
summarize(total = sum(FATALITIES)) %>%
arrange(desc(total)))
## Source: local data frame [48 x 2]
##
## EVTYPE total
## 1 TORNADO 865
## 2 FLASH FLOOD 296
## 3 RIP CURRENT 208
## 4 HEAT 182
## 5 FLOOD 161
## 6 LIGHTNING 159
## 7 THUNDERSTORM WIND 130
## 8 EXCESSIVE HEAT 119
## 9 COLD/WIND CHILL 94
## 10 AVALANCHE 83
## .. ... ...
(noaa.last5.injur <- noaa.last5 %>% group_by(EVTYPE) %>%
summarize(total = sum(INJURIES)) %>%
arrange(desc(total)))
## Source: local data frame [48 x 2]
##
## EVTYPE total
## 1 TORNADO 9666
## 2 THUNDERSTORM WIND 1393
## 3 LIGHTNING 923
## 4 EXCESSIVE HEAT 880
## 5 HEAT 702
## 6 WILDFIRE 425
## 7 WINTER WEATHER 324
## 8 FLASH FLOOD 316
## 9 HAIL 180
## 10 FLOOD 171
## .. ... ...
As shown in the output above, the weather event in the last 5 years that is most harmful to human health in terms of both fatalities and injuries is tornadoes. In terms of fatalities, the next most harmful events are flash floods and rip currents, while in terms of injuries, the next most harmful events are thunderstorm wind and lightning.
To determine the weather events that have the greatest economic consequences over the last 5 years, we obtain the total amount of property damage and crop damage for each weather event and sort in descending order.
(noaa.last5.propdmg <- noaa.last5 %>% group_by(EVTYPE) %>%
summarize(total = sum(PROPDMG*(10^PROPDMGEXP))) %>%
arrange(desc(total)))
## Source: local data frame [48 x 2]
##
## EVTYPE total
## 1 TORNADO 14699413740
## 2 FLOOD 13969813800
## 3 HAIL 6099123600
## 4 FLASH FLOOD 5041609130
## 5 STORM SURGE/TIDE 4640643000
## 6 THUNDERSTORM WIND 3375990190
## 7 HURRICANE 2467600000
## 8 WILDFIRE 2200413470
## 9 HIGH WIND 1221896040
## 10 WINTER STORM 974440000
## .. ... ...
(noaa.last5.cropdmg <- noaa.last5 %>% group_by(EVTYPE) %>%
summarize(total = sum(CROPDMG*(10^CROPDMGEXP))) %>%
arrange(desc(total)))
## Source: local data frame [48 x 2]
##
## EVTYPE total
## 1 FLOOD 2886110000
## 2 FROST/FREEZE 931801000
## 3 HAIL 868793000
## 4 FLASH FLOOD 711942000
## 5 DROUGHT 426441000
## 6 THUNDERSTORM WIND 398102000
## 7 TROPICAL STORM 180921000
## 8 HURRICANE 180510000
## 9 HIGH WIND 106571000
## 10 TORNADO 103210000
## .. ... ...
As shown in the output above, the weather event that had the greatest economic consequence in terms of property damage is tornadoes, followed by floods and hail. In terms of crop damage, the weather event that had the greatest economic consequence is floods, followed by frost/freeze events and hail.
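As a small worked example of the damage calculation, a recorded value of 25 with a multiplier exponent of 3 stands for 25 × 10^3 = 25,000. The base R sketch below repeats the per-event totalling on made-up rows; the data frame `toy` and its values are illustrative only.

```r
# Illustrative rows: damage value and numeric multiplier exponent.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c(3, 6, 9)
)

# Scale each value by its power of ten.
toy$damage <- toy$PROPDMG * 10^toy$PROPDMGEXP

# Total per event type, largest first.
totals <- aggregate(damage ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$damage), ]
print(totals)
```

Here the single flood row (1.2 × 10^9) outweighs the two tornado rows combined (25,000 + 2,500,000), which shows how strongly the exponent dominates the totals.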
unique(noaa.last5$EVTYPE)
## [1] DENSE FOG HIGH WIND
## [3] HEAVY SNOW BLIZZARD
## [5] THUNDERSTORM WIND HEAVY RAIN
## [7] FLOOD FROST/FREEZE
## [9] WILDFIRE STRONG WIND
## [11] HEAT DUST STORM
## [13] FUNNEL CLOUD WINTER STORM
## [15] HIGH SURF WINTER WEATHER
## [17] HAIL LIGHTNING
## [19] DROUGHT AVALANCHE
## [21] EXTREME COLD/WIND CHILL FLASH FLOOD
## [23] TORNADO RIP CURRENT
## [25] LAKE-EFFECT SNOW ICE STORM
## [27] SLEET SEICHE
## [29] COLD/WIND CHILL WATERSPOUT
## [31] MARINE THUNDERSTORM WIND MARINE HAIL
## [33] MARINE STRONG WIND DUST DEVIL
## [35] ASTRONOMICAL LOW TIDE LANDSLIDE
## [37] COASTAL FLOOD STORM SURGE/TIDE
## [39] EXCESSIVE HEAT DENSE SMOKE
## [41] TROPICAL STORM TROPICAL DEPRESSION
## [43] HURRICANE FREEZING FOG
## [45] LAKESHORE FLOOD MARINE HIGH WIND
## [47] VOLCANIC ASHFALL TSUNAMI
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
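The output still reports 985 levels because filtering rows does not remove unused factor levels, even though only 48 event types occur in the last 5 years. If only the levels actually present are wanted, droplevels() can be applied. A minimal sketch on a toy factor (the names `evtype` and `recent` are hypothetical):

```r
# Toy factor with three levels.
evtype <- factor(c("HAIL", "TORNADO", "FLOOD", "HAIL"))

# Subsetting keeps the unused level "FLOOD" in the level set.
recent <- evtype[evtype != "FLOOD"]
levels.before <- nlevels(recent)

# droplevels() removes levels that no longer occur in the data.
recent <- droplevels(recent)
levels.after <- nlevels(recent)

print(c(before = levels.before, after = levels.after))
```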
[1] National Oceanic & Atmospheric Administration (2007), “National Weather Service Instruction (NWSI) 10-1605, dated Aug. 17, 2007”.