The purpose of this study is to determine the effect that weather events have on public health and economic activity. To that end, the National Weather Service's storm data was analyzed.
The National Weather Service storm data was originally downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 on May 25, 2014. It consists of a CSV file compressed in the bzip2 (.bz2) format. The following code decompresses the file and reads the data into memory.
data = read.csv(bzfile("repdata-data-StormData.csv.bz2"))
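As a small-scale check of the decompress-and-read step, the sketch below writes a tiny CSV through a bzip2-compressing connection to a temporary file and reads it back with the same `read.csv(bzfile(...))` call. The rows and values are made up for illustration; only the base R functions `bzfile`, `write.csv` and `read.csv` are used.

```r
# Made-up rows standing in for the storm data
sampleRows <- data.frame(
    EVTYPE = c("TORNADO", "HAIL"),
    FATALITIES = c(2, 0),
    INJURIES = c(10, 1),
    stringsAsFactors = FALSE
)

# Write them through a bzip2-compressing connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(sampleRows, con, row.names = FALSE)
close(con)

# Read the compressed file back the same way the report reads the full data set
roundTrip <- read.csv(bzfile(tmp))
print(roundTrip)
```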
The following R code finds the top 10 weather event types by their effect on public health. For each event type, the number of injuries and the number of fatalities are summed to form the total effect score assigned to that event type.
healthData = aggregate(INJURIES ~ EVTYPE, data = data, sum)
healthData$deaths = aggregate(FATALITIES ~ EVTYPE, data = data, sum)$FATALITIES
healthData$totalEffect = healthData$INJURIES + healthData$deaths
top10HealthThreats <- healthData[order(healthData$totalEffect, decreasing = TRUE), ][1:10, ]
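The two separate `aggregate` calls above rely on both results listing the event types in the same order. A sketch of an alternative, assuming the same `EVTYPE`, `INJURIES` and `FATALITIES` columns, sums both columns in one call with `cbind`, so the injury and fatality counts cannot drift out of alignment. The toy data frame is invented for illustration.

```r
# Made-up rows standing in for the storm data
toy <- data.frame(
    EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    INJURIES = c(10, 3, 5, 2, 0),
    FATALITIES = c(1, 4, 0, 1, 2),
    stringsAsFactors = FALSE
)

# One aggregate call sums both columns for each event type
health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
health$totalEffect <- health$INJURIES + health$FATALITIES

# Sort by the combined score and keep at most the top 10 rows
ranked <- head(health[order(health$totalEffect, decreasing = TRUE), ], 10)
print(ranked)
```

Here TORNADO scores 16 (15 injuries plus 1 death), HEAT scores 9 and FLOOD scores 3.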
The following R code finds the top 10 weather event types in terms of economic damage.
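First, we scale the damage values by the multipliers indicated in the exponent columns (PROPDMGEXP for property damage and CROPDMGEXP for crop damage), where H, K, M and B stand for hundreds, thousands, millions and billions. The calls that apply this scaling are commented out, so the totals reported below use the unscaled values.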
inflateDamageValues <- function(data, expColname, colname) {
    # Normalize the exponent codes (they may be read as factors, and some are lowercase)
    exps <- toupper(as.character(data[[expColname]]))
    # Default power of ten is 0, i.e. no scaling; this covers "", "-", "?" and "+"
    power <- rep(0, length(exps))
    power[exps == "H"] <- 2  # hundreds
    power[exps == "K"] <- 3  # thousands
    power[exps == "M"] <- 6  # millions
    power[exps == "B"] <- 9  # billions
    # Single-digit codes are treated as the power of ten itself
    isDigit <- grepl("^[0-9]$", exps)
    power[isDigit] <- as.numeric(exps[isDigit])
    # Multiply by 10^power (rather than raising the value to a power)
    data[[colname]] <- data[[colname]] * 10^power
    # R passes data frames by value, so the modified copy must be returned
    data
}
# data <- inflateDamageValues(data, "PROPDMGEXP", "PROPDMG")
# data <- inflateDamageValues(data, "CROPDMGEXP", "CROPDMG")
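An alternative sketch of the exponent mapping uses a named lookup vector instead of a chain of `if` blocks. It assumes the same mapping the function above is written to use: H, K, M and B mean powers of ten 2, 3, 6 and 9, single digits are themselves the power of ten, and blank, `-`, `?` and `+` mean no scaling. The helper name `expMultiplier` and the sample values are hypothetical.

```r
# Hypothetical helper: convert exponent codes into numeric multipliers
expMultiplier <- function(codes) {
    codes <- toupper(trimws(as.character(codes)))
    powers <- c(H = 2, K = 3, M = 6, B = 9)
    power <- unname(powers[codes])        # NA for codes not in the lookup table
    isDigit <- grepl("^[0-9]$", codes)
    power[isDigit] <- as.numeric(codes[isDigit])
    power[is.na(power)] <- 0              # "", "-", "?", "+" and anything else: no scaling
    10^power
}

# Made-up damage values and their exponent codes
dmg <- c(25, 1.5, 3, 40, 7)
exps <- c("K", "m", "", "B", "2")
print(dmg * expMultiplier(exps))
```

Indexing a named vector with an unknown name (including `""`) yields `NA`, which is why the default of zero is applied last.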
The next section of code sums crop damage and property damage by event type to compute the total economic effect of each event type.
economicData = aggregate(CROPDMG ~ EVTYPE, data = data, sum)
economicData$PROPDMG = aggregate(PROPDMG ~ EVTYPE, data = data, sum)$PROPDMG
economicData$totalEffect = economicData$PROPDMG + economicData$CROPDMG
top10EconomicThreats <- economicData[order(economicData$totalEffect, decreasing = TRUE), ][1:10, ]
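Because the health and economic rankings repeat the same sum-then-sort pattern, it could be factored into one helper. The sketch below is hypothetical (`topEventTypes` is not part of the original analysis) and assumes an `EVTYPE` column plus the numeric columns to add together; the toy values are invented.

```r
# Hypothetical helper: sum the given columns per EVTYPE and return the top n event types
topEventTypes <- function(df, cols, n = 10) {
    sums <- aggregate(df[cols], by = list(EVTYPE = df$EVTYPE), FUN = sum)
    sums$totalEffect <- rowSums(sums[cols])
    head(sums[order(sums$totalEffect, decreasing = TRUE), ], n)
}

# Made-up damage rows
toy <- data.frame(
    EVTYPE = c("HAIL", "TORNADO", "HAIL", "FLOOD"),
    PROPDMG = c(100, 900, 50, 300),
    CROPDMG = c(400, 10, 20, 0),
    stringsAsFactors = FALSE
)
print(topEventTypes(toy, c("PROPDMG", "CROPDMG"), n = 2))
```

The same helper would produce the health ranking with `cols = c("INJURIES", "FATALITIES")`.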
top10HealthThreats
## EVTYPE INJURIES deaths totalEffect
## 834 TORNADO 91346 5633 96979
## 130 EXCESSIVE HEAT 6525 1903 8428
## 856 TSTM WIND 6957 504 7461
## 170 FLOOD 6789 470 7259
## 464 LIGHTNING 5230 816 6046
## 275 HEAT 2100 937 3037
## 153 FLASH FLOOD 1777 978 2755
## 427 ICE STORM 1975 89 2064
## 760 THUNDERSTORM WIND 1488 133 1621
## 972 WINTER STORM 1321 206 1527
EVTYPE - Event Type (type of severe weather)
INJURIES - Total number of injuries
deaths - Total number of deaths
totalEffect - Total number of injuries and deaths combined
top10EconomicThreats
## EVTYPE CROPDMG PROPDMG totalEffect
## 834 TORNADO 100019 3212258 3312277
## 153 FLASH FLOOD 179200 1420125 1599325
## 856 TSTM WIND 109203 1335966 1445168
## 244 HAIL 579596 688693 1268290
## 170 FLOOD 168038 899938 1067976
## 760 THUNDERSTORM WIND 66791 876844 943636
## 464 LIGHTNING 3581 603352 606932
## 786 THUNDERSTORM WINDS 18685 446293 464978
## 359 HIGH WIND 17283 324732 342015
## 972 WINTER STORM 1979 132721 134700
EVTYPE - Event Type (type of severe weather)
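CROPDMG - Sum of the crop damage values for the weather event type, unscaled (the CROPDMGEXP multipliers were not applied, so the summed values mix dollars, thousands, millions, etc.)
PROPDMG - Sum of the property damage values for the weather event type, unscaled (the PROPDMGEXP multipliers were not applied, so the summed values mix dollars, thousands, millions, etc.)
totalEffect - CROPDMG + PROPDMG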
library(ggplot2)
qplot(1:10, top10HealthThreats$totalEffect, col = top10HealthThreats$EVTYPE,
    ylab = "Deaths + Injuries", xlab = "Rank", main = "Public Health Impact of Weather Events (Top 10)")
qplot(1:10, top10EconomicThreats$totalEffect, col = top10EconomicThreats$EVTYPE,
    ylab = "Property + Crop Damage", xlab = "Rank", main = "Economic Impact of Weather Events (Top 10)")
The data seems to indicate that tornadoes posed by far the largest health and economic threat nationwide over the years covered by the data. The impact of tornadoes is especially startling when compared with the next most harmful event type for public health, Excessive Heat: tornadoes caused 3,730 more deaths and 84,821 more injuries.
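The gap between tornadoes and excessive heat can be recomputed directly from the health table; the values below are copied from the `top10HealthThreats` output.

```r
# Totals copied from the top10HealthThreats output
tornado <- c(injuries = 91346, deaths = 5633, total = 96979)
excessiveHeat <- c(injuries = 6525, deaths = 1903, total = 8428)

# Element-wise difference keeps the names, so each gap is labeled
print(tornado - excessiveHeat)
```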
Unfortunately, I was not able to get the scaling of monetary damage working in time to submit with this report, so the damage totals above are unscaled and the economic ranking may be off. If these values were scaled, tornadoes might not be the leading cause of economic loss due to weather (or their lead might be much smaller).
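To illustrate why unscaled sums can mislead, the made-up example below has one event type with several damages recorded in thousands (K) and another with a single damage recorded in billions (B). Summing the raw values ranks them one way; applying the multipliers first reverses the order. The event names and numbers are invented and do not come from the storm data.

```r
# Made-up damage rows with exponent codes
toy <- data.frame(
    EVTYPE = c("TYPE A", "TYPE A", "TYPE A", "TYPE B"),
    PROPDMG = c(500, 400, 300, 2),
    PROPDMGEXP = c("K", "K", "K", "B"),
    stringsAsFactors = FALSE
)

# Look up each row's multiplier by its exponent code and compute dollar amounts
multiplier <- unname(c(K = 1e3, M = 1e6, B = 1e9)[toy$PROPDMGEXP])
toy$dollars <- toy$PROPDMG * multiplier

rawTotals <- aggregate(PROPDMG ~ EVTYPE, data = toy, FUN = sum)
scaledTotals <- aggregate(dollars ~ EVTYPE, data = toy, FUN = sum)
print(rawTotals[order(rawTotals$PROPDMG, decreasing = TRUE), ])
print(scaledTotals[order(scaledTotals$dollars, decreasing = TRUE), ])
```

Unscaled, TYPE A leads (1,200 versus 2); scaled, TYPE B leads ($2 billion versus $1.2 million).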
Another assumption is that injuries and deaths are weighted equally. A more detailed analysis would likely handle deaths and injuries separately, or weight them so that an injury counts for less than a fatality.
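One way to weight the two outcomes differently is a score such as `INJURIES + w * deaths`. The sketch below uses a hypothetical weight `w = 10`, chosen only for illustration, on the first five rows copied from the health table; with `w = 1` it reduces to the report's `totalEffect`. The function name `weightedScore` is also hypothetical.

```r
# First five rows copied from the top10HealthThreats output
health <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
    INJURIES = c(91346, 6525, 6957, 6789, 5230),
    deaths = c(5633, 1903, 504, 470, 816),
    stringsAsFactors = FALSE
)

# Hypothetical weighting: each death counts as w injuries
weightedScore <- function(injuries, deaths, w = 10) injuries + w * deaths

health$weighted <- weightedScore(health$INJURIES, health$deaths)
print(health[order(health$weighted, decreasing = TRUE), ])
```

With `w = 10`, LIGHTNING (13,390) moves ahead of TSTM WIND (11,997) and FLOOD (11,489) within this five-row subset, showing that the ranking depends on the chosen weight.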