In this report, we analyze the impact of different weather events on public health and the economy, based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. We use the estimates of fatalities, injuries, property damage, and crop damage to decide which types of event are most harmful to population health and the economy. From these data, we found that excessive heat and tornadoes are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.
library(knitr)
opts_chunk$set(cache=T,echo=T,message=T,comment = NA)
library(R.utils) #use function bunzip2
library(ggplot2)
library(plyr)
require(gridExtra)
First, we download the data file and unzip it.
setwd("D:\\git\\RepData_PeerAssessment2")
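# Create the data directory if it is missing, so download.file() has a valid target
if (!dir.exists("data")) dir.create("data")
if (!"stormData.csv.bz2" %in% dir("./data/")) {
print("Downloading storm data")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "data/stormData.csv.bz2")
# Decompress to data/stormData.csv and keep the original archive
bunzip2("data/stormData.csv.bz2", overwrite=T, remove=F)
}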
Then, we read the generated csv file. If the data already exists in the working environment, we skip this step to avoid loading it again.
if (!"stormData" %in% ls()) {
stormData <- read.csv("data/stormData.csv", sep = ",")
}
Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, : EOF within quoted string
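The explicit decompression step is not strictly required, because `read.csv()` can read a bzip2-compressed file directly. The warning above also indicates that the parser reached the end of the file while still inside a quoted field, so it may be worth comparing the row count against the source. A minimal sketch on toy data (the temporary file and the `toy` data frame are illustrative assumptions, not part of the original analysis):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not the NOAA file).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path through file(), which detects the compression on read,
# so no separate bunzip2() call is needed.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toy)

unlink(tmp)  # remove the temporary file
```

Reading the archive directly avoids keeping a second, uncompressed copy of the data on disk.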
dim(stormData)
[1] 425873 37
head(stormData, n = 2)
STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
1 TORNADO 0 0
2 TORNADO 0 0
COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
1 NA 0 14 100 3 0 0
2 NA 0 2 150 2 0 0
INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
1 15 25.0 K 0
2 0 2.5 K 0
LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
1 3040 8812 3051 8806 1
2 3042 8755 0 0 2
There are 425,873 rows and 37 columns in total. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
if (dim(stormData)[2] == 37) {
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(stormData$year, breaks = 30)
Based on the above histogram, we see that the number of events tracked starts to increase significantly around 1995. So, we use the subset of the data from 1995 to 2011 to get the most out of the good records.
storm <- stormData[stormData$year >= 1995, ]
dim(storm)
[1] 205076 38
Now, there are 205,076 rows and 38 columns in total.
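The year extraction and the subsetting step can be checked on a few hand-written dates. This is a small sketch on a hypothetical `toy` data frame that mimics the `BGN_DATE` format, not a replacement for the code above:

```r
# Toy data frame with dates in the same "month/day/year time" layout as BGN_DATE.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1995 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() parses the leading date part; the trailing time is ignored.
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep only the events from 1995 onwards, as in the analysis.
recent <- toy[toy$year >= 1995, ]
recent$year
```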
In this section, we check the number of fatalities and injuries caused by severe weather events. We extract the 15 event types with the highest totals for each measure.
sortHelper <- function(fieldName, top = 15, dataset = stormData) {
index <- which(colnames(dataset) == fieldName)
field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
names(field) <- c("EVTYPE", fieldName)
field <- arrange(field, field[, 2], decreasing = T)
field <- head(field, n = top)
field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
return(field)
}
fatalities <- sortHelper("FATALITIES", dataset = storm)
injuries <- sortHelper("INJURIES", dataset = storm)
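The aggregate, sort, and truncate logic inside `sortHelper` can also be written with base R alone. The following is a sketch on made-up numbers (the `toy` data frame and the `top2` name are assumptions for illustration), showing the same three steps without `plyr`:

```r
# Toy events: several records per event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  FATALITIES = c(2, 1, 3, 4, 0, 1)
)

# 1. Sum fatalities per event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# 2. Sort from largest to smallest total.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# 3. Keep the top entries and fix the factor levels in that order,
#    so that later plots keep the ranking instead of sorting alphabetically.
top2 <- head(totals, n = 2)
top2$EVTYPE <- factor(top2$EVTYPE, levels = top2$EVTYPE)
top2
```

Fixing the factor levels after sorting is what keeps the bars in ranked order when the result is plotted.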
We will convert the property damage and crop damage data into comparable numerical forms according to the meaning of units described in the code book (Storm Events). Both PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation where we have Hundred (H), Thousand (K), Million (M) and Billion (B).
convertHelper <- function(dataset = storm, fieldName, newFieldName) {
totalLen <- dim(dataset)[2]
index <- which(colnames(dataset) == fieldName)
dataset[, index] <- as.character(dataset[, index])
logic <- !is.na(toupper(dataset[, index]))
dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
dataset[, index] <- as.numeric(dataset[, index])
dataset[is.na(dataset[, index]), index] <- 0
dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
names(dataset)[totalLen + 1] <- newFieldName
return(dataset)
}
storm <- convertHelper(storm, "PROPDMGEXP", "propertyDamage")
Warning in convertHelper(storm, "PROPDMGEXP", "propertyDamage"): NAs introduced by coercion
storm <- convertHelper(storm, "CROPDMGEXP", "cropDamage")
Warning in convertHelper(storm, "CROPDMGEXP", "cropDamage"): NAs introduced by coercion
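The coercion warnings come from exponent codes that are neither one of the four letters nor a number. A named lookup vector is one way to express the same mapping without triggering them. This is only a sketch on toy data: `exponent_lookup` and `toy` are hypothetical names, and unlike `convertHelper`, which keeps numeric codes as exponents, the sketch treats every unrecognized code as exponent 0.

```r
# Letter codes and the power of ten each one stands for.
exponent_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Toy records, including a lowercase code, an empty code, and an invalid code.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10, 7),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Look up each code; anything not in the table (including "") yields NA.
exponent <- unname(exponent_lookup[toupper(toy$PROPDMGEXP)])

# Assumption: unknown or missing codes mean "no multiplier" (10^0 = 1).
exponent[is.na(exponent)] <- 0

toy$propertyDamage <- toy$PROPDMG * 10^exponent
toy
```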
names(storm)
[1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE"
[5] "COUNTY" "COUNTYNAME" "STATE" "EVTYPE"
[9] "BGN_RANGE" "BGN_AZI" "BGN_LOCATI" "END_DATE"
[13] "END_TIME" "COUNTY_END" "COUNTYENDN" "END_RANGE"
[17] "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
[21] "F" "MAG" "FATALITIES" "INJURIES"
[25] "PROPDMG" "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
[29] "WFO" "STATEOFFIC" "ZONENAMES" "LATITUDE"
[33] "LONGITUDE" "LATITUDE_E" "LONGITUDE_" "REMARKS"
[37] "REFNUM" "year" "propertyDamage" "cropDamage"
options(scipen=999)
property <- sortHelper("propertyDamage", dataset = storm)
crop <- sortHelper("cropDamage", dataset = storm)
As for the impact on public health, the two lists below rank the severe weather events by total fatalities and by total injuries.
fatalities
EVTYPE FATALITIES
1 EXCESSIVE HEAT 1088
2 HEAT 694
3 TORNADO 411
4 FLASH FLOOD 369
5 LIGHTNING 337
6 FLOOD 167
7 HEAT WAVE 161
8 TSTM WIND 154
9 RIP CURRENTS 143
10 HIGH WIND 138
11 WINTER STORM 120
12 EXTREME COLD 111
13 HEAVY SNOW 92
14 EXTREME HEAT 91
15 AVALANCHE 69
injuries
EVTYPE INJURIES
1 TORNADO 7712
2 FLOOD 6460
3 EXCESSIVE HEAT 3309
4 TSTM WIND 2278
5 LIGHTNING 2168
6 WINTER STORM 1035
7 FLASH FLOOD 962
8 HEAT 808
9 HIGH WIND 569
10 FOG 529
11 HEAVY SNOW 501
12 THUNDERSTORM WINDS 444
13 HAIL 436
14 WILD/FOREST FIRE 395
15 BLIZZARD 365
The following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.
fatalitiesPlot <- qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
injuriesPlot <- qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Injuries") +
theme(axis.text.x = element_text(angle = 45,
hjust = 1)) + xlab("Severe Weather Type") +
ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)
Based on the above bar charts, we find that excessive heat and tornadoes caused the most fatalities, and tornadoes caused the most injuries, in the United States from 1995 to 2011.
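Since `qplot()` is deprecated in recent ggplot2 releases, the same kind of bar chart can be built with `ggplot()` and `geom_col()`. The sketch below uses the top three rows of the fatalities table above; the `toy` data frame and the object name `p` are assumptions for illustration.

```r
library(ggplot2)

# Top three rows of the fatalities table, with levels fixed in ranked order.
toy <- data.frame(
  EVTYPE = factor(c("EXCESSIVE HEAT", "HEAT", "TORNADO"),
                  levels = c("EXCESSIVE HEAT", "HEAT", "TORNADO")),
  FATALITIES = c(1088, 694, 411)
)

# geom_col() draws one bar per row with height taken from the y aesthetic.
p <- ggplot(toy, aes(x = EVTYPE, y = FATALITIES)) +
  geom_col() +
  labs(x = "Severe Weather Type", y = "Number of Fatalities",
       title = "Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```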
As for the impact on the economy, the two lists below rank the events by the dollar cost of property damage and of crop damage.
property
EVTYPE propertyDamage
1 FLOOD 10109702527
2 HURRICANE 8775364000
3 TORNADO 6028722585
4 FLASH FLOOD 4889132861
5 HAIL 3603203473
6 HURRICANE OPAL 3172846000
7 WILD/FOREST FIRE 2795268500
8 TSTM WIND 2631852030
9 HEAVY RAIN/SEVERE WEATHER 2500000000
10 ICE STORM 1673611010
11 SEVERE THUNDERSTORM 1200310000
12 THUNDERSTORM WINDS 924962745
13 TYPHOON 600230000
14 TROPICAL STORM 488405000
15 BLIZZARD 413910950
crop
EVTYPE cropDamage
1 DROUGHT 7903431000
2 HURRICANE 2292450000
3 FLOOD 1813403000
4 EXTREME COLD 1222063000
5 HAIL 972614370
6 FLASH FLOOD 508313500
7 HEAT 401235000
8 FREEZE 396225000
9 TSTM WIND 347955000
10 HEAVY RAIN 325854800
11 TROPICAL STORM 265575000
12 DAMAGING FREEZE 262100000
13 EXCESSIVE WETNESS 142000000
14 HIGH WIND 138819300
15 HURRICANE ERIN 136010000
The following pair of graphs shows the total property damage and total crop damage caused by these severe weather events.
propertyPlot <- qplot(EVTYPE, data = property, weight = propertyDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Property Damage in US dollars")+
xlab("Severe Weather Type") + ggtitle("Total Property Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")
cropPlot<- qplot(EVTYPE, data = crop, weight = cropDamage, geom = "bar", binwidth = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous("Crop Damage in US dollars") +
xlab("Severe Weather Type") + ggtitle("Total Crop Damage by \nSevere Weather Events in\n the U.S. from 1995 - 2011")
grid.arrange(propertyPlot, cropPlot, ncol = 2)
Based on the above bar charts, we find that floods and hurricanes/typhoons caused the most property damage, while drought, followed by hurricanes and floods, caused the most crop damage in the United States from 1995 to 2011.
From these data, we found that excessive heat and tornadoes are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.