This report analyzes the U.S. National Oceanic and Atmospheric Administration (NOAA)’s storm database, which records storms and other severe weather events between 1950 and 2011. These events affect human health through fatalities and injuries, and their damage to property and crops can cause significant economic loss. It is important to understand which types of weather event are most harmful in these respects. This study analyzes the subset of the original dataset from the 21st century, since people tend to be most interested in recent events.
The Storm Data dataset is obtained from NOAA for analysis.
1. Read the data into RStudio
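# setInternet2() was Windows-only and is not needed in current versions of R, so it is omitted
temp <- tempfile()
# mode="wb" keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", temp, mode="wb")
# bzfile() takes the path of the compressed file; its second argument is an open mode, not a file name
storm <- read.csv(bzfile(temp))
unlink(temp)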
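As a small offline check of the read step, the sketch below writes a toy CSV through a bzip2 connection and reads it back. The toy table and its values are invented for illustration; only `bzfile()` and `read.csv()` are taken from the step above.

```r
# Toy data standing in for the storm file (values are invented for illustration)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write the toy table as a bzip2-compressed CSV
toy.path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy.path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the connection, decompresses the text, and closes it again
toy.back <- read.csv(bzfile(toy.path), stringsAsFactors = FALSE)
unlink(toy.path)
print(toy.back)
```

If the round trip works on a toy file, the same call should work on the downloaded file, which is also a bzip2-compressed CSV.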
2. Subset the data in the 21st century
Since the climate changes over time, the current situation is probably not similar to that of the last century. Besides, older records are often less complete than recent ones. Thus, we focus our study on the data collected in the 21st century, that is, from 2000-01-01 until the most recent data available.
storm$BeginDate <- as.Date(storm$BGN_DATE, format="%m/%d/%Y")
# Compare against an explicit Date and refer to the column by name inside subset()
storm2k <- subset(storm, BeginDate >= as.Date("2000-01-01"))
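The date handling can be tried on a few rows first. This is a minimal sketch that assumes BGN_DATE strings have a month/day/year date followed by a time; the three strings below are invented examples of that layout, and `toy` and `toy2k` are hypothetical names.

```r
# Toy BGN_DATE strings in the assumed month/day/year layout (invented values)
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/1/2000 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() parses the leading date and ignores the trailing time text
toy$BeginDate <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Keep rows on or after the cutoff date
toy2k <- subset(toy, BeginDate >= as.Date("2000-01-01"))
print(toy2k$BeginDate)
```

The row dated exactly 2000-01-01 is kept because the comparison is inclusive.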
3. Tidy the variable EVTYPE
The EVTYPE variable in the original dataset is not clean: the subset contains 196 distinct event types, most of which are just different spellings of the same natural events. The Storm Data Event Table in the NATIONAL WEATHER SERVICE INSTRUCTION provided by NOAA lists only 48 events. I classified the 196 types one by one according to the descriptions in that document.
storm2k$EVTYPE <- tolower(storm2k$EVTYPE)
original <- sort(unique(storm2k$EVTYPE))
rp.0 <- c("high surf", "flash flood", "thunderstorm wind", "waterspout","drought",
"flood", "heavy snow", rep("astronomical tide", 2), "avalanche")
rp.1 <- c("high surf", "ice storm", "blizzard", "dust storm", "wildfire",
"coastal flood", "coastal flood", rep("cold/wind chill",3) )
rp.2 <- c("cold/wind chill", "coastal flood", "debris flow", "dense fog", "dense smoke",
"drought", "drowning", rep("drought",3) )
rp.3 <- c("drought", rep("dust devil",2), "dust storm", "heavy snow",
rep("excessive heat",2), rep("extreme cold/wind chill",3) )
rp.4 <- c("extreme cold/wind chill", "flood", rep("heavy snow",2), "flash flood",
"flood", rep("frost/freeze", 3), "freezing fog" )
rp.5 <- c(rep("frost/freeze",4), rep("funnel cloud",2),
"winter weather", "strong wind", "strong wind", "thunderstorm wind")
rp.6 <- c("thunderstorm wind", "strong wind", "strong wind", "hail", "frost/freeze",
"high surf", "heat", "heavy rain", "heavy rain", "heavy snow")
rp.7 <- c(rep("high surf", 7), "high wind", rep("hurricane/typhoon",2) )
rp.8 <- c(rep("ice storm",4), rep("lake-effect snow",2),
rep("lakeshore flood",2), "avalanche","heavy snow" )
rp.9 <- c("frost/freeze", "heavy snow", "lightning", "heavy rain", "marine hail",
"marine high wind", "marine strong wind", rep("marine thunderstorm wind",2), "mixed precipitation")
rp.10 <- c("heavy snow", "mixed precipitation", rep("debris flow",2), "strong wind",
"strong wind", "hail", "strong wind", "northern lights", "other" )
rp.11 <- c("ice storm", "cold/wind chill", "heat", "heavy rain", "extreme cold/wind chill",
"excessive heat", "drought", "heavy rain", rep("heavy snow",2) )
rp.12 <- c("excessive heat", "wildfire", rep("rip current",4),
"seiche", "thunderstorm wind", "sleet", "sleet" )
rp.13 <- c("hail", "dense smoke", rep("heavy snow",8) )
rp.14 <- c("sleet", "heavy snow", rep("storm surge/tide",2),
rep("strong wind",2), rep("thunderstorm wind",4) )
rp.15 <- c("storm surge/tide", "tornado", "tornado", "tropical depression",
"tropical storm", rep("thunderstorm wind",5) )
rp.16 <- c("tsunami", rep("cold/wind chill",3), "drought",
rep("heat",4), "flood" )
rp.17 <- c(rep("cold/wind chill",2), "heavy snow", "heat", "flood",
"drought", "excessive heat", rep("volcanic ash",2), "thunderstorm wind" )
rp.18 <- c("heat", "waterspout", "tornado", rep("wildfire",2),
rep("strong wind",2), "cold/wind chill", rep("strong wind",2) )
rp.19 <- c("winter storm", rep("winter weather",4), "strong wind" )
rp <- c(rp.0,rp.1,rp.2,rp.3,rp.4,rp.5,rp.6,rp.7,rp.8,rp.9,rp.10,rp.11,rp.12,
rp.13,rp.14,rp.15,rp.16,rp.17,rp.18,rp.19)
# Look up each record's EVTYPE in `original` and take the label at the same position in `rp`
storm2k$newEVTYPE <- rp[match(storm2k$EVTYPE, original)]
sort( unique(storm2k$newEVTYPE) )
Now there are 52 event categories, of which 48 are exactly the ones described in the reference document. Four categories cannot be classified: for “drowning”, the cause of the accident is unknown; for “mixed precipitation”, the specific cause of the deaths and injuries is also unclear; and “northern lights” and “other” are completely new categories. These four are therefore kept as they are. Fortunately, they account for a trivial share of the dataset (66 of 523,163 records), as the row indices below show.
which(storm2k$newEVTYPE=="drowning")
## [1] 94931
which(storm2k$newEVTYPE=="mixed precipitation")
## [1] 12093 12115 12972 19459 20022 23510 23534 31426 32105 32573
## [11] 32578 32804 33101 33126 33129 46769 47259 48148 48156 48168
## [21] 48182 55686 55693 55702 55713 56273 56282 56297 56317 56757
## [31] 58300 58789 66719 66720 66728 66736 66746 66825 66827 66967
## [41] 67510 67853 68067 68070 68079 80838 80870 81557 81626 89843
## [51] 92106 92123 92860 93016 101696 101711 102901 102909 103130 103159
which(storm2k$newEVTYPE=="northern lights")
## [1] 68088
which(storm2k$newEVTYPE=="other")
## [1] 2773 19409 19978 56225
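Rather than listing row indices, the size of the leftover categories can be summarized directly. The sketch below uses a hypothetical vector of recoded labels and a shortened reference list (not the full NOAA table) to show the idea.

```r
# Hypothetical recoded labels and a shortened reference list (not the full NOAA table)
new.labels <- c("flood", "tornado", "other", "flood", "drowning", "hail")
official   <- c("flood", "tornado", "hail")

# Labels that are not in the reference list
leftover <- setdiff(unique(new.labels), official)

# Number of records per leftover label, and their share of all records
leftover.count <- table(new.labels[new.labels %in% leftover])
leftover.share <- sum(leftover.count) / length(new.labels)
print(leftover.count)
print(leftover.share)
```

Applied to storm2k$newEVTYPE with the full 48-entry list, the same steps would give the count and share of the unclassified records.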
Below I show all the classifications that I made. Since I am not an expert in climate, there are probably some misclassifications, which can easily be spotted in the following table.
df <- cbind(original,rp)
library(knitr)
print(kable(df,col.names = c("Original Event Type", "Classified Event Type")))
##
##
## Original Event Type Classified Event Type
## ------------------------------- -------------------------
## high surf advisory high surf
## flash flood flash flood
## tstm wind thunderstorm wind
## waterspout waterspout
## abnormally dry drought
## abnormally wet flood
## accumulated snowfall heavy snow
## astronomical high tide astronomical tide
## astronomical low tide astronomical tide
## avalanche avalanche
## beach erosion high surf
## black ice ice storm
## blizzard blizzard
## blowing dust dust storm
## brush fire wildfire
## coastal flood coastal flood
## coastal flooding coastal flood
## cold cold/wind chill
## cold weather cold/wind chill
## cold wind chill temperatures cold/wind chill
## cold/wind chill cold/wind chill
## cstl flooding/erosion coastal flood
## dam break debris flow
## dense fog dense fog
## dense smoke dense smoke
## drought drought
## drowning drowning
## dry drought
## dry conditions drought
## dry microburst drought
## dry spell drought
## dust devel dust devil
## dust devil dust devil
## dust storm dust storm
## early snowfall heavy snow
## excessive heat excessive heat
## excessive heat/drought excessive heat
## extreme cold extreme cold/wind chill
## extreme cold/wind chill extreme cold/wind chill
## extreme windchill extreme cold/wind chill
## extreme windchill temperatures extreme cold/wind chill
## extremely wet flood
## falling snow/ice heavy snow
## first snow heavy snow
## flash flood flash flood
## flood flood
## fog frost/freeze
## freeze frost/freeze
## freezing drizzle frost/freeze
## freezing fog freezing fog
## freezing rain frost/freeze
## freezing rain/sleet frost/freeze
## frost frost/freeze
## frost/freeze frost/freeze
## funnel cloud funnel cloud
## funnel clouds funnel cloud
## glaze winter weather
## gradient wind strong wind
## gusty lake wind strong wind
## gusty thunderstorm wind thunderstorm wind
## gusty thunderstorm winds thunderstorm wind
## gusty wind strong wind
## gusty winds strong wind
## hail hail
## hard freeze frost/freeze
## hazardous surf high surf
## heat heat
## heavy rain heavy rain
## heavy rain effects heavy rain
## heavy snow heavy snow
## heavy surf high surf
## heavy surf/high surf high surf
## high seas high surf
## high surf high surf
## high surf advisories high surf
## high surf advisory high surf
## high water high surf
## high wind high wind
## hurricane hurricane/typhoon
## hurricane/typhoon hurricane/typhoon
## ice on road ice storm
## ice storm ice storm
## ice/snow ice storm
## icy roads ice storm
## lake-effect snow lake-effect snow
## lake effect snow lake-effect snow
## lakeshore flood lakeshore flood
## landslide lakeshore flood
## landslump avalanche
## late season snow heavy snow
## light freezing rain frost/freeze
## light snow heavy snow
## lightning lightning
## locally heavy rain heavy rain
## marine hail marine hail
## marine high wind marine high wind
## marine strong wind marine strong wind
## marine thunderstorm wind marine thunderstorm wind
## marine tstm wind marine thunderstorm wind
## mixed precipitation mixed precipitation
## moderate snowfall heavy snow
## monthly precipitation mixed precipitation
## mud slide debris flow
## mudslide debris flow
## non-severe wind damage strong wind
## non-tstm wind strong wind
## non severe hail hail
## non tstm wind strong wind
## northern lights northern lights
## other other
## patchy ice ice storm
## prolong cold cold/wind chill
## prolong warmth heat
## rain heavy rain
## record cold extreme cold/wind chill
## record heat excessive heat
## record low rainfall drought
## record rainfall heavy rain
## record snow heavy snow
## record snowfall heavy snow
## record warmth excessive heat
## red flag criteria wildfire
## rip current rip current
## rip currents rip current
## rogue wave rip current
## rough seas rip current
## seiche seiche
## severe thunderstorms thunderstorm wind
## sleet sleet
## sleet storm sleet
## small hail hail
## smoke dense smoke
## snow heavy snow
## snow advisory heavy snow
## snow and ice heavy snow
## snow drought heavy snow
## snow showers heavy snow
## snow squalls heavy snow
## snow/blowing snow heavy snow
## snow/freezing rain heavy snow
## snow/sleet sleet
## snowmelt flooding heavy snow
## storm surge storm surge/tide
## storm surge/tide storm surge/tide
## strong wind strong wind
## strong winds strong wind
## thunderstorm thunderstorm wind
## thunderstorm wind thunderstorm wind
## thunderstorm wind (g40) thunderstorm wind
## thunderstorms thunderstorm wind
## tidal flooding storm surge/tide
## tornado tornado
## tornado debris tornado
## tropical depression tropical depression
## tropical storm tropical storm
## tstm wind thunderstorm wind
## tstm wind (g40) thunderstorm wind
## tstm wind (g45) thunderstorm wind
## tstm wind g45 thunderstorm wind
## tstm wind/hail thunderstorm wind
## tsunami tsunami
## unseasonably cold cold/wind chill
## unseasonably cool cold/wind chill
## unseasonably cool & wet cold/wind chill
## unseasonably dry drought
## unseasonably hot heat
## unseasonably warm heat
## unseasonably warm & wet heat
## unseasonably warm/wet heat
## unseasonably wet flood
## unseasonal low temp cold/wind chill
## unusually cold cold/wind chill
## unusually late snow heavy snow
## unusually warm heat
## urban/sml stream fld flood
## very dry drought
## very warm excessive heat
## volcanic ash volcanic ash
## volcanic ashfall volcanic ash
## wall cloud thunderstorm wind
## warm weather heat
## waterspout waterspout
## whirlwind tornado
## wild/forest fire wildfire
## wildfire wildfire
## wind strong wind
## wind advisory strong wind
## wind chill cold/wind chill
## wind damage strong wind
## wind gusts strong wind
## winter storm winter storm
## winter weather winter weather
## winter weather mix winter weather
## winter weather/mix winter weather
## wintry mix winter weather
## wnd strong wind
4. Tidy the Variables PROPDMG and CROPDMG
Variables related to economic loss are PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. They are interpreted as follows:
1. PROPDMG: The numeric value of the property damage estimate.
2. PROPDMGEXP: The magnitude code (thousands, millions or billions of dollars) that scales PROPDMG.
3. CROPDMG: The numeric value of the crop damage estimate.
4. CROPDMGEXP: The magnitude code (thousands, millions or billions of dollars) that scales CROPDMG.
The new variables newPROPDMG and newCROPDMG are created to express the property damage estimates and crop damage estimates in a common unit.
4.1 Tidy the Property Damage Estimates
summary(storm2k$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5
## 189121      0      0      0      1      0      0      0      0      0
##      6      7      8      B      h      H      K      m      M
##      0      0      0     29      0      0 328461      0   5551
According to the NATIONAL WEATHER SERVICE INSTRUCTION, in PROPDMGEXP and CROPDMGEXP the character “B” means billions of dollars, “M” means millions of dollars and “K” means thousands of dollars; a blank (“”) or “0” is treated here as plain dollars. In the following step, all property damage estimates are converted to dollars.
# Number of records
m <- length(storm2k$PROPDMG)
# Vectorized conversion: scale PROPDMG by the multiplier of its exponent code;
# blank, "0" and any other code leave the value unchanged
newPROPDMG <- with(storm2k, PROPDMG * ifelse(PROPDMGEXP=="B", 10^9,
                                      ifelse(PROPDMGEXP=="M", 10^6,
                                      ifelse(PROPDMGEXP=="K", 10^3, 1))))
length(newPROPDMG)
## [1] 523163
summary(newPROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 6.323e+05 1.000e+03 1.150e+11
Now all property damage estimates are expressed in dollars.
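An alternative way to write the conversion above is a lookup table from exponent code to multiplier. The following is a minimal sketch on invented values; `dmg`, `code`, `multiplier` and `scale` are hypothetical names, and treating blank or unrecognized codes as a multiplier of 1 mirrors the choice made in this report rather than an official rule.

```r
# Hypothetical damage figures and exponent codes (invented for illustration)
dmg  <- c(2.5, 10, 1, 7)
code <- c("K", "M", "B", "")

# Dollar multiplier for each recognized code
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# match() returns NA for blank or unrecognized codes; treat those as a multiplier of 1
scale <- multiplier[match(code, names(multiplier))]
scale[is.na(scale)] <- 1

dollars <- dmg * unname(scale)
print(dollars)
```

A lookup table keeps the code-to-multiplier mapping in one place, so adding or changing a code means editing a single vector.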
4.2 Tidy the Crop Damage Estimates
summary(storm2k$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M
## 250613      0      0      0      4      0 271351      0   1195
The strategy for tidying the crop damage estimates is similar to the one used for property damage.
# Vectorized conversion of crop damage to dollars;
# blank, "0" and any other code leave the value unchanged
newCROPDMG <- with(storm2k, CROPDMG * ifelse(CROPDMGEXP=="B", 10^9,
                                      ifelse(CROPDMGEXP=="M", 10^6,
                                      ifelse(CROPDMGEXP=="K", 10^3, 1))))
length(newCROPDMG)
## [1] 523163
summary(newCROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 4.51e+04 0.00e+00 1.51e+09
Now all crop damage estimates are expressed in dollars.
This analysis addresses two questions:
1. Across the United States in the 21st century, which types of events are most harmful with respect to population health?
2. Across the United States in the 21st century, which types of events have the greatest economic consequences?
First, I will study which events cause the most fatalities.
fatality.mean <- sapply(split(storm2k$FATALITIES, storm2k$newEVTYPE), mean)
most.fatality <- head(sort(fatality.mean, decreasing=TRUE ),5)
barplot(most.fatality, main="Top 5 Events Causing Fatalities",
cex.main=1.5, ylab="Average Deaths per Event" )
From the above plot, we can see that, measured by average deaths per recorded event, the most harmful event type with respect to fatalities is tsunami.
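Because the ranking uses the mean per recorded event, a rare event type with a few severe records can outrank a frequent one that causes more deaths in total. The toy numbers below are invented purely to illustrate that difference; they are not taken from the storm data.

```r
# Toy records (invented): one rare but deadly event type, one frequent but milder one
toy <- data.frame(
  type   = c("tsunami", "tsunami", "tornado", "tornado", "tornado", "tornado"),
  deaths = c(30, 0, 10, 10, 10, 10),
  stringsAsFactors = FALSE
)

# Average deaths per recorded event, total deaths, and number of records per type
avg.deaths   <- sapply(split(toy$deaths, toy$type), mean)
total.deaths <- sapply(split(toy$deaths, toy$type), sum)
n.records    <- sapply(split(toy$deaths, toy$type), length)

print(cbind(avg.deaths, total.deaths, n.records))
```

In this toy case the rare type leads on the average while the frequent type leads on the total, so reporting the record count next to the mean helps in reading the plot.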
Second, I will study which events cause the most injuries.
injury.mean <- sapply(split(storm2k$INJURIES, storm2k$newEVTYPE), mean)
most.injury <- head(sort(injury.mean, decreasing=TRUE ),5)
barplot(most.injury, main="Top 5 Events Causing Injuries",
cex.main=1.5, ylab="Average Injuries per Event")
From the above plot, we can see that, measured by average injuries per recorded event, the most harmful event type with respect to injuries is hurricane/typhoon, while tsunami ranks second.
Third, I will study which events cause the greatest economic losses, defined as the sum of the estimated property losses and crop losses.
property.mean <- sapply(split(newPROPDMG, storm2k$newEVTYPE), mean)
crop.mean <- sapply(split(newCROPDMG, storm2k$newEVTYPE), mean)
# Transform the units to Million Dollars for plotting
economic.mean <- (property.mean+crop.mean)/(10^6)
most.economic <- head(sort(economic.mean, decreasing=TRUE ),5)
barplot(most.economic, main="Top 5 Events Causing Economic Losses (Million Dollars)",
cex.main=1.5, ylab="Average Economic Losses per Event (Million Dollars)" )
From the above plot, we can see that, measured by average loss per recorded event, the most harmful event type with respect to economic consequences is hurricane/typhoon, while storm surge/tide ranks second.
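Adding the two per-type means gives the same result as averaging the combined loss of each record, because the mean is linear. The sketch below checks this on invented figures; `toy`, `loss.musd`, `loss.mean` and `alt` are hypothetical names.

```r
# Toy damage records in dollars (invented values)
toy <- data.frame(
  type = c("hurricane/typhoon", "hurricane/typhoon", "hail", "hail"),
  prop = c(4e9, 2e9, 1e4, 3e4),
  crop = c(1e8, 3e8, 2e4, 0),
  stringsAsFactors = FALSE
)

# Combined loss per record, in millions of dollars
toy$loss.musd <- (toy$prop + toy$crop) / 1e6

# Mean combined loss per event type, largest first
loss.mean <- sort(sapply(split(toy$loss.musd, toy$type), mean), decreasing = TRUE)

# The same quantity computed as mean(property) + mean(crop)
alt <- (sapply(split(toy$prop, toy$type), mean) +
        sapply(split(toy$crop, toy$type), mean)) / 1e6

print(loss.mean)
print(all.equal(loss.mean[names(alt)], alt))
```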
BIN FANG
July, 2015