In this report I investigate the socio-economic effects of weather events in the United States from 1950 to 2011. I examined which types of events were most harmful in terms of economic cost and in terms of human injuries and fatalities. I found that during this period tornadoes were, by far, the most harmful to population health. Economically, floods, hurricanes/typhoons and storm surges were the most costly.
From the National Weather Service I obtained data about storms that occurred between 1950 and 2011. The compressed (.bz2) file was read directly with the read.csv function. I checked whether the data contained missing values for either fatalities or injuries; it did not.
library(dplyr)
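## read.csv() reads the bzip2-compressed file directly;
## nrows sets the maximum number of rows to read
NOAA <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE,
    nrows = 902297)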
mean(is.na(NOAA$FATALITIES))
## [1] 0
mean(is.na(NOAA$INJURIES))
## [1] 0
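The loading step and the missing-value check can be reproduced on a small synthetic file. This is a minimal sketch assuming only base R: `read.csv()` can open a bzip2-compressed file by name because `file()` detects the compression when reading. The file name `toy_storm.csv.bz2` and its three rows are hypothetical stand-ins for `repdata_data_StormData.csv.bz2`.

```r
# Hypothetical stand-in for the NOAA storm file (made-up values)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(2, 0, 1),
                  INJURIES = c(10, 1, 0))

# Write it as a bzip2-compressed CSV in the session's temporary directory
path <- file.path(tempdir(), "toy_storm.csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses the .bz2 file transparently while reading
storm <- read.csv(path, header = TRUE)

# Count missing values in the two health columns (0 means none are missing)
colSums(is.na(storm[, c("FATALITIES", "INJURIES")]))
```

Counting with `colSums(is.na(...))` checks both columns in one call; `mean(is.na(...))`, as used above, gives the fraction missing instead of the count.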
To analyse the effects of the various types of weather events on population health, I summed injuries and fatalities by event type. I kept the 20 event types with the highest totals in each category; the cutoff of 20 is arbitrary.
## Sum injuries and fatalities by event type, sort each table in
## descending order and keep the top 20 rows of each
Inj <- NOAA %>%
group_by(EVTYPE) %>%
summarise(Injuries = sum(INJURIES)) %>%
arrange(desc(Injuries))
InjTop20 <- Inj[1:20,]
Fat <- NOAA %>%
group_by(EVTYPE) %>%
summarise(Fatalities = sum(FATALITIES)) %>%
arrange(desc(Fatalities))
FatTop20 <- Fat[1:20,]
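For readers without dplyr, the same group, sum and rank pattern can be sketched in base R with `aggregate()` and `order()`. The toy data frame and the cutoff of 2 (instead of 20) are illustrative assumptions, not values from the NOAA data.

```r
# Toy event records (hypothetical counts)
events <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                     INJURIES = c(30, 2, 15, 8, 1))

# Sum injuries within each event type
inj <- aggregate(INJURIES ~ EVTYPE, data = events, FUN = sum)

# Sort by the total in descending order and keep the top n rows
inj <- inj[order(inj$INJURIES, decreasing = TRUE), ]
n <- 2
injTop <- head(inj, n)
injTop
```

Here TORNADO (45) and FLOOD (8) are kept and HAIL (3) is dropped, mirroring what `InjTop20` and `FatTop20` do with a cutoff of 20.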
To answer the question, I plotted the top 20 weather events causing injuries and fatalities, and listed the corresponding totals below each plot. In both cases tornadoes were the top event, with a total far higher than that of any other event. To keep the remaining events distinguishable, I limited the x axis of each plot to a range that excludes the tornado total.
library(ggplot2)
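gi <- ggplot(InjTop20, aes(x = Injuries, y = reorder(EVTYPE, Injuries)))
## Named injPlot rather than pi so the built-in constant pi is not masked;
## coord_cartesian() zooms in without dropping the tornado row from the data
injPlot <- gi + geom_point(size = 3) +
    coord_cartesian(xlim = c(0, 7500)) +
    labs(title = "Top 20 Weather Events Causing Injuries") +
    labs(y = "Event Type")
print(injPlot)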
InjTop20
## Source: local data frame [20 x 2]
##
## EVTYPE Injuries
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## 11 WINTER STORM 1321
## 12 HURRICANE/TYPHOON 1275
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
gf <- ggplot(FatTop20, aes(x = Fatalities, y = reorder(EVTYPE, Fatalities)))
## Named fatPlot rather than pf, which is the name of stats::pf
fatPlot <- gf + geom_point(size = 3) +
    coord_cartesian(xlim = c(0, 2000)) +
    labs(title = "Top 20 Weather Events Causing Fatalities") +
    labs(y = "Event Type")
print(fatPlot)
FatTop20
## Source: local data frame [20 x 2]
##
## EVTYPE Fatalities
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
These data show that tornadoes are by far the most harmful weather events with respect to population health, with 91,346 injuries and 5,633 fatalities over the period.
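The cropped x axes (upper limits of 7,500 injuries and 2,000 fatalities) are meant to hide only the tornado point. A small check using only the totals printed in the two tables confirms this and measures the gap; because both tables are sorted in descending order, the top three entries of each are enough.

```r
# Top totals copied from the InjTop20 and FatTop20 tables
injuries   <- c(TORNADO = 91346, "TSTM WIND" = 6957, FLOOD = 6789)
fatalities <- c(TORNADO = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978)

# Events whose totals lie beyond the plotted x range
names(injuries)[injuries > 7500]      # "TORNADO"
names(fatalities)[fatalities > 2000]  # "TORNADO"

# How many times larger the tornado total is than the runner-up
round(injuries[["TORNADO"]] / injuries[[2]], 1)      # 13.1
round(fatalities[["TORNADO"]] / fatalities[[2]], 1)  # 3
```

Tornadoes caused about 13 times as many injuries as the next event type and about 3 times as many fatalities.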
To determine which types of events have the greatest economic consequences, I examined property damages and crop damages.
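## Extract the columns for event type and for property and crop damages
## (columns 8 and 25 to 28 of the raw data), selected by name
damage <- NOAA[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]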
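Because damage amounts are recorded as a number plus an exponent code (PROPDMGEXP or CROPDMGEXP) indicating thousands, millions or billions of dollars, I selected only the records whose damages were reported in billions (code "B"). This keeps the largest single events but leaves out damages reported in thousands or millions, so the totals below cover billion-dollar records only.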
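A sketch of how an exponent code scales the stored number. It assumes that `K`, `M` and `B` (in either case) mean thousands, millions and billions of dollars, and it treats any other code as unknown (`NA`); the helper name `expToMultiplier` and the toy records are hypothetical.

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
# Assumption: K/k = 1e3, M/m = 1e6, B/b = 1e9; any other code gives NA.
expToMultiplier <- function(code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(multipliers[toupper(as.character(code))])
}

# Toy damage records (hypothetical values)
dmg <- data.frame(PROPDMG = c(5, 250, 2.5, 10),
                  PROPDMGEXP = c("B", "K", "M", "?"))

# Damage in dollars, and the same amount expressed in billions
dmg$dollars  <- dmg$PROPDMG * expToMultiplier(dmg$PROPDMGEXP)
dmg$billions <- dmg$dollars / 1e9
dmg
```

Under this mapping, a `B` record with `PROPDMG` of 5 is 5 billion dollars, which is why the report can add the raw `B` values directly and label the result in billions.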
##Select events in which property or crop damage costs are reported
##in the billions of dollars
propDamBill <- damage[damage$PROPDMGEXP == "B",1:3]
cropDamBill <- damage[damage$CROPDMGEXP == "B",c(1, 4:5)]
## Full outer join of the two lists on their shared EVTYPE column
totDamBill <- merge(propDamBill, cropDamBill, all = T)
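`merge()` joins its inputs on their shared columns, here `EVTYPE`, rather than stacking them. The toy sketch below (hypothetical values, in billions of dollars) shows that when an event type has several rows in one list and at least one row in the other, every pairing becomes its own row, so a value can be counted more than once. Summing each damage type per event type before merging is one way to avoid this; it is an alternative, not the method used in this report.

```r
# Toy billion-dollar records (hypothetical values, in billions of dollars)
prop <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
                   PROPDMG = c(2, 3, 1))
crop <- data.frame(EVTYPE = "FLOOD", CROPDMG = 4)

# merge() joins on EVTYPE: both FLOOD rows in `prop` are paired with
# the single FLOOD row in `crop`, so the crop value 4 appears twice
joined <- merge(prop, crop, all = TRUE)
joined[is.na(joined)] <- 0
naive <- tapply(joined$PROPDMG + joined$CROPDMG, joined$EVTYPE, sum)
naive  # FLOOD 13, HAIL 1 (the true FLOOD total is 2 + 3 + 4 = 9)

# Summing each damage type per event type first leaves one row per
# event type on each side, so the join repeats nothing
propSum <- aggregate(PROPDMG ~ EVTYPE, data = prop, FUN = sum)
cropSum <- aggregate(CROPDMG ~ EVTYPE, data = crop, FUN = sum)
perEvent <- merge(propSum, cropSum, all = TRUE)
perEvent[is.na(perEvent)] <- 0
perEvent$cost <- perEvent$PROPDMG + perEvent$CROPDMG
perEvent  # FLOOD 9, HAIL 1
```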
To calculate the total cost for each record, I added the property damage and crop damage columns together into a variable called cost. Because the merge leaves NA wherever an event type appears in only one of the two lists, NA values were replaced with 0 before this operation was performed.
## Replace NA with 0 so that missing damages do not propagate into the sum
totDamBill$PROPDMG[is.na(totDamBill$PROPDMG)] <- 0
totDamBill$CROPDMG[is.na(totDamBill$CROPDMG)] <- 0
##Add together costs for each category for each event
totDamBill$cost <- totDamBill$PROPDMG + totDamBill$CROPDMG
I then grouped the data by event type and summed the costs for each.
##Group the event types and sum the costs for each event
econCost <- totDamBill %>%
group_by(EVTYPE) %>%
summarise(cost = sum(cost)) %>%
arrange(desc(cost))
The following plot and table show the data.
ge <- ggplot(econCost, aes(x = cost, y = reorder(EVTYPE, cost)))
pe <- ge + geom_point(size = 3) +
labs(title = "Most expensive types of weather events") +
labs(x = "Cost (billions of dollars)") +
labs(y = "Event Type")
print(pe)
as.data.frame(econCost)
## EVTYPE cost
## 1 FLOOD 122.50
## 2 HURRICANE/TYPHOON 83.62
## 3 STORM SURGE 42.56
## 4 RIVER FLOOD 10.00
## 5 HURRICANE 5.70
## 6 TORNADO 5.30
## 7 TROPICAL STORM 5.15
## 8 ICE STORM 5.00
## 9 WINTER STORM 5.00
## 10 STORM SURGE/TIDE 4.00
## 11 HURRICANE OPAL 3.10
## 12 HEAVY RAIN/SEVERE WEATHER 2.50
## 13 HAIL 1.80
## 14 TORNADOES, TSTM WIND, HAIL 1.60
## 15 DROUGHT 1.50
## 16 WILD/FOREST FIRE 1.50
## 17 HIGH WIND 1.30
## 18 SEVERE THUNDERSTORM 1.20
## 19 WILDFIRE 1.04
## 20 FLASH FLOOD 1.00
## 21 HEAT 0.40
## 22 FREEZE 0.20
## 23 HURRICANE OPAL/HIGH WINDS 0.10
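The summary claim that floods, hurricanes/typhoons and storm surges dominate the economic costs can be checked against this table. The printed costs are rounded to two decimals, so the share computed below is approximate, and it covers only the billion-dollar records selected above.

```r
# Costs (billions of dollars) copied from the econCost table, in table order
cost <- c(122.50, 83.62, 42.56, 10.00, 5.70, 5.30, 5.15, 5.00, 5.00, 4.00,
          3.10, 2.50, 1.80, 1.60, 1.50, 1.50, 1.30, 1.20, 1.04, 1.00,
          0.40, 0.20, 0.10)

total <- sum(cost)          # 306.07
top3  <- sum(cost[1:3])     # FLOOD + HURRICANE/TYPHOON + STORM SURGE = 248.68
round(100 * top3 / total, 1)  # 81.2 (percent of the total)
```

By this estimate, the three costliest event types account for roughly four fifths of the billion-dollar damages.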