This report quantifies and analyzes storm event observations to identify the event types with the highest potential for severe consequences. The processed data include events recorded between 1996 and 2011 (the incomplete data from 1950-1995 were omitted), and the analysis is based on the following fields of the processed data: evtype, year, fatalities, injuries, propdmg, cropdmg (propdmg and cropdmg are in million dollars), and remarks. The data were processed to keep the relevant data, omit unused variables, filter observations, and deal with noise such as inconsistent recordings and errors. Damage values that NOAA has not yet finalized are left blank, although the other fields of those observations are filled in; this report ignores such blank values. Event locations are also ignored, since the analysis focuses on total numbers rather than on geographic distribution. The report indicates that excessive heat and tornado are the major causes of loss of life (around 3,300 of the ~8,700 fatalities across the 48 event types) and tornado the major cause of injuries (around 20,650 of ~58,000), while hurricane (typhoon) is responsible for the highest property damage (around 82 billion dollars of ~250 billion) and drought for the highest crop damage (around 13 billion dollars of ~35 billion).
readTest <- read.csv("repdata-data-StormData.csv.bz2", sep=",", nrows=4, header=TRUE,
na.strings=c("NA","N/A",""))
ncol(readTest)
## [1] 37
names(readTest)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
| Variable | Description | Comments |
|---|---|---|
| BGN_DATE | date of event | |
| EVTYPE | type of event | |
| FATALITIES | number of deaths | health-related damage |
| INJURIES | number of injuries | health-related damage |
| PROPDMG | property damage (dollars, before applying the multiplier) | economic damage |
| PROPDMGEXP | multiplier for PROPDMG | K = thousand, M = million, B = billion |
| CROPDMG | crop damage (dollars, before applying the multiplier) | economic damage |
| CROPDMGEXP | multiplier for CROPDMG | K = thousand, M = million, B = billion |
| REMARKS | comments | |
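The multiplier columns imply a simple conversion: damage in million dollars = value × multiplier / 10^6. The following is a minimal base-R sketch of that conversion, not the report's own code. The helper name `toMillions` is hypothetical, and it assumes that only the codes K, M and B are defined: any other code (such as the single "0" found in the data) or a missing code yields NA here, whereas the processing code below leaves such a value unscaled.

```r
# Hypothetical helper (not part of the report's code): convert damage
# values and their multiplier codes to million dollars.
# Assumption: only "K", "M" and "B" are defined; any other code,
# including a missing one, yields NA.
toMillions <- function(dmg, code) {
  # value of each multiplier code expressed in million dollars
  scale <- c(K = 1e-3, M = 1, B = 1e3)
  # look up the multiplier of every observation (NA if undefined)
  mult <- unname(scale[toupper(as.character(code))])
  dmg * mult
}

# 25 thousand, 2.5 million and 1.2 billion dollars
toMillions(c(25, 2.5, 1.2), c("K", "M", "B"))  # returns c(0.025, 2.5, 1200)
```

A named lookup vector replaces one subassignment per code, so adding a code means adding one entry to `scale`.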
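library(plyr) ##ddply(); load plyr before dplyr so that dplyr functions are not masked
library(dplyr) ##select(), filter(), arrange()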
readSet <- select(read.csv("repdata-data-StormData.csv.bz2",header=TRUE,
na.strings=c("NA","N/A","")), EVTYPE, FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, BGN_DATE, REMARKS)
nrow(readSet)
## [1] 902297
names(readSet) <- tolower(names(readSet)) ##change column names to lower case
readSet$evtype <- tolower(readSet$evtype) ##change event names to lower case
readSet$year <- as.integer(format(as.Date(readSet$bgn_date, "%m/%d/%Y %H:%M:%S"), "%Y")) ##extract the year of each event
ListAllYearsObsv <- readSet$year ##keep frequencies of all years for a later plot
readSet <- filter(readSet, year>1995)
nrow(readSet)
## [1] 653530
plot(table(ListAllYearsObsv), type="p", xlab="Years", ylab="Number of observations (yearly)", main="Number of observations from 1950 to 2011")
Figure 1: Number of observations for each year (1950-2011). The yearly number of observations until the early 1990s is much lower than in the following years.
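The year extraction and the filter to 1996 onward can be sketched on a few toy date strings. This assumes the "month/day/year hour:minute:second" layout of BGN_DATE; keeping the year as an integer makes the comparison with 1995 numeric rather than a string comparison.

```r
# Toy BGN_DATE strings (assumed to follow the dataset's
# "month/day/year hour:minute:second" layout).
bgnDate <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/30/2011 0:00:00")

# Parse the date part only (the parser ignores the trailing time text)
# and keep the year as an integer, which compares numerically.
year <- as.integer(format(as.Date(bgnDate, format = "%m/%d/%Y"), "%Y"))

# Keep the observations from 1996 on, as the report does.
keep <- year > 1995
year[keep]  # 1996 2011
```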
##before processing property damages, count the observations for each multiplier:
readSet$propdmgexp <- as.character(readSet$propdmgexp)
table(readSet$propdmgexp) ##there are about 377,000 observations with a multiplier
##
## 0 B K M
## 1 32 369938 7374
sum(is.na(readSet$propdmgexp)) ##the remaining multiplier values are NA
## [1] 276185
readSet$propdmg <- as.numeric(readSet$propdmg)
readSet$propdmg[is.na(readSet$propdmgexp)] <- NA ##set property damages without a multiplier to NA
readSet$cropdmgexp <- as.character(readSet$cropdmgexp)
readSet$cropdmg <- as.numeric(readSet$cropdmg)
readSet$cropdmg[is.na(readSet$cropdmgexp)] <- NA ##set crop damages without a multiplier to NA
##convert all values to units=million dollar
readSet$propdmg[readSet$propdmgexp == "K" & !is.na(readSet$propdmg)] <- readSet$propdmg[readSet$propdmgexp == "K" & !is.na(readSet$propdmg)] / 1000
readSet$propdmg[readSet$propdmgexp == "B" & !is.na(readSet$propdmg)] <- readSet$propdmg[readSet$propdmgexp == "B" & !is.na(readSet$propdmg)] * 1000
readSet$cropdmg[readSet$cropdmgexp == "K" & !is.na(readSet$cropdmg)] <- readSet$cropdmg[readSet$cropdmgexp == "K" & !is.na(readSet$cropdmg)] / 1000
readSet$cropdmg[readSet$cropdmgexp == "B" & !is.na(readSet$cropdmg)] <- readSet$cropdmg[readSet$cropdmgexp == "B" & !is.na(readSet$cropdmg)] * 1000
##drop the multiplier columns, which are no longer needed
readSet <- select(readSet, evtype, fatalities, injuries, propdmg, cropdmg, bgn_date, year, remarks)
##remove observations without any recorded damage (fatalities and injuries zero, propdmg and cropdmg zero or NA)
readSet <- readSet[!((readSet$fatalities==0 & readSet$injuries==0) & (is.na(readSet$propdmg) | readSet$propdmg==0) & (is.na(readSet$cropdmg) | readSet$cropdmg==0)),]
sort(unique(readSet$evtype[readSet$fatalities>0 | readSet$injuries>0 | (readSet$propdmg>0 & !is.na(readSet$propdmg)) | (readSet$cropdmg>0 & !is.na(readSet$cropdmg))]))
## [1] " high surf advisory" " flash flood"
## [3] " tstm wind" " tstm wind (g45)"
## [5] "agricultural freeze" "astronomical high tide"
## [7] "astronomical low tide" "avalanche"
## [9] "beach erosion" "black ice"
## [11] "blizzard" "blowing dust"
## [13] "blowing snow" "brush fire"
## [15] "coastal flooding/erosion" "coastal erosion"
## [17] "coastal flood" "coastal flooding"
## [19] "coastal flooding/erosion" "coastal storm"
## [21] "coastalstorm" "cold"
## [23] "cold and snow" "cold temperature"
## [25] "cold weather" "cold/wind chill"
## [27] "dam break" "damaging freeze"
## [29] "dense fog" "dense smoke"
## [31] "downburst" "drought"
## [33] "drowning" "dry microburst"
## [35] "dust devil" "dust storm"
## [37] "early frost" "erosion/cstl flood"
## [39] "excessive heat" "excessive snow"
## [41] "extended cold" "extreme cold"
## [43] "extreme cold/wind chill" "extreme windchill"
## [45] "falling snow/ice" "flash flood"
## [47] "flash flood/flood" "flood"
## [49] "flood/flash/flood" "fog"
## [51] "freeze" "freezing drizzle"
## [53] "freezing fog" "freezing rain"
## [55] "freezing spray" "frost"
## [57] "frost/freeze" "funnel cloud"
## [59] "glaze" "gradient wind"
## [61] "gusty wind" "gusty wind/hail"
## [63] "gusty wind/hvy rain" "gusty wind/rain"
## [65] "gusty winds" "hail"
## [67] "hard freeze" "hazardous surf"
## [69] "heat" "heat wave"
## [71] "heavy rain" "heavy rain/high surf"
## [73] "heavy seas" "heavy snow"
## [75] "heavy snow shower" "heavy surf"
## [77] "heavy surf and wind" "heavy surf/high surf"
## [79] "high seas" "high surf"
## [81] "high swells" "high water"
## [83] "high wind" "high wind (g40)"
## [85] "high winds" "hurricane"
## [87] "hurricane edouard" "hurricane/typhoon"
## [89] "hyperthermia/exposure" "hypothermia/exposure"
## [91] "ice jam flood (minor" "ice on road"
## [93] "ice roads" "ice storm"
## [95] "icy roads" "lake-effect snow"
## [97] "lake effect snow" "lakeshore flood"
## [99] "landslide" "landslides"
## [101] "landslump" "landspout"
## [103] "late season snow" "light freezing rain"
## [105] "light snow" "light snowfall"
## [107] "lightning" "marine accident"
## [109] "marine hail" "marine high wind"
## [111] "marine strong wind" "marine thunderstorm wind"
## [113] "marine tstm wind" "microburst"
## [115] "mixed precip" "mixed precipitation"
## [117] "mud slide" "mudslide"
## [119] "mudslides" "non-severe wind damage"
## [121] "non-tstm wind" "non tstm wind"
## [123] "other" "rain"
## [125] "rain/snow" "record heat"
## [127] "rip current" "rip currents"
## [129] "river flood" "river flooding"
## [131] "rock slide" "rogue wave"
## [133] "rough seas" "rough surf"
## [135] "seiche" "small hail"
## [137] "snow" "snow and ice"
## [139] "snow squall" "snow squalls"
## [141] "storm surge" "storm surge/tide"
## [143] "strong wind" "strong winds"
## [145] "thunderstorm" "thunderstorm wind"
## [147] "thunderstorm wind (g40)" "tidal flooding"
## [149] "tornado" "torrential rainfall"
## [151] "tropical depression" "tropical storm"
## [153] "tstm wind" "tstm wind (g45)"
## [155] "tstm wind (41)" "tstm wind (g35)"
## [157] "tstm wind (g40)" "tstm wind (g45)"
## [159] "tstm wind 40" "tstm wind 45"
## [161] "tstm wind and lightning" "tstm wind g45"
## [163] "tstm wind/hail" "tsunami"
## [165] "typhoon" "unseasonable cold"
## [167] "unseasonably cold" "unseasonably warm"
## [169] "unseasonal rain" "urban/sml stream fld"
## [171] "volcanic ash" "warm weather"
## [173] "waterspout" "wet microburst"
## [175] "whirlwind" "wild/forest fire"
## [177] "wildfire" "wind"
## [179] "wind and wave" "wind damage"
## [181] "winds" "winter storm"
## [183] "winter weather" "winter weather mix"
## [185] "winter weather/mix" "wintry mix"
windObservations <- nrow(readSet[readSet$evtype=="wind",])
windFatal <- sum(readSet$fatalities[readSet$evtype=="wind"], na.rm=TRUE)
windInjur <- sum(readSet$injuries[readSet$evtype=="wind"], na.rm=TRUE)
windProp <- sum(readSet$propdmg[readSet$evtype=="wind"], na.rm=TRUE)
windCrop <- sum(readSet$cropdmg[readSet$evtype=="wind"], na.rm=TRUE)
print(paste0("number of WIND observations: " , windObservations , ", total fatalities: " , windFatal , ", total injuries: " , windInjur , ", total propdmg: " , windProp , " million dollar, total cropdmg: " , windCrop , " million dollar."))
## [1] "number of WIND observations: 67, total fatalities: 18, total injuries: 84, total propdmg: 2.2895 million dollar, total cropdmg: 0.3 million dollar."
##readSet[readSet$evtype=="wind",][1:70,]
For these 67 observations, the totals are 18 fatalities, 84 injuries, 2.2895 million dollars of property damage, and 0.3 million dollars of crop damage. Because the generic name “wind” does not identify a specific event type, the remarks field was checked to identify these events better. Most of the remarks mention thunderstorm wind or a wind speed higher than 40 mph, which matches the definition of thunderstorm wind; therefore, the event name was changed to “thunderstorm wind”. A comment in the code lists the fatalities, injuries, propdmg, and cropdmg that were originally associated with the event name “wind”.
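The report renamed all “wind” observations after a manual review of their remarks. A per-row rule could automate a similar check; the following is only a sketch, assuming that a remark supports “thunderstorm wind” when it mentions a thunderstorm or reports a speed above 40 mph. The remarks below are made up for illustration and are not records from the dataset.

```r
# Toy remarks for observations recorded as "wind" (made up for illustration).
remarks <- c("Thunderstorm winds downed trees near the town.",
             "Winds gusted to 55 mph and damaged a roof.",
             "Gusts of 35 mph were reported.",
             "Strong gradient winds behind a cold front.")

# Rule (an assumption modelled on the manual check described above):
# reclassify when a remark mentions a thunderstorm or a speed above 40 mph.
mentionsStorm <- grepl("thunderstorm|tstm", remarks, ignore.case = TRUE)

# Extract the first "<number> mph" value of each remark; NA if there is none.
hasSpeed <- grepl("[0-9]+ ?mph", remarks, ignore.case = TRUE)
speed <- rep(NA_real_, length(remarks))
speed[hasSpeed] <- as.numeric(sub("(?s).*?([0-9]+) ?mph.*", "\\1",
                                  remarks[hasSpeed],
                                  ignore.case = TRUE, perl = TRUE))

reclassify <- mentionsStorm | (!is.na(speed) & speed > 40)
newType <- ifelse(reclassify, "thunderstorm wind", "wind")
newType
```

Remarks that match neither condition keep the name “wind”, so they can still be reviewed by hand.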
The following code assigns uniform names to the event types in the evtype variable.
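The renaming below is a long sequence of `grep()` assignments whose order matters: each rule sees the names produced by the rules before it. A table-driven version makes that order explicit. This is a sketch with a small, hypothetical subset of the rules; `rules` and `renameEvents` are illustrative names, not part of the report's code.

```r
# Hypothetical rename table (a small subset of the report's rules):
# patterns are applied in order, so later rules see earlier results.
rules <- data.frame(
  pattern = c("^m.*tstm", "non.?tstm", "tstm", "^fog"),
  newName = c("marine thunderstorm wind", "strong wind",
              "thunderstorm wind", "dense fog"),
  stringsAsFactors = FALSE
)

renameEvents <- function(evtype, rules) {
  for (i in seq_len(nrow(rules))) {
    # every name matching the i-th pattern gets the i-th new name
    hit <- grepl(rules$pattern[i], evtype)
    evtype[hit] <- rules$newName[i]
  }
  evtype
}

rawNames <- c("marine tstm wind", "non-tstm wind", "tstm wind (g45)", "fog", "hail")
renameEvents(rawNames, rules)
```

Moving the generic "tstm" rule to the top would turn "marine tstm wind" into "thunderstorm wind" before the marine rule could match it, which is why the specific patterns come first.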
##cleaning the data: renaming event types
##wind different types
readSet$evtype[grep("^[m].*tstm.*", readSet$evtype)] <- "marine thunderstorm wind"
readSet$evtype[grep("non.?tstm wind*", readSet$evtype)] <- "strong wind"
readSet$evtype[grep("^[^m].*tstm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("tstm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("thunderstorm.*g40.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("high.*g40.*", readSet$evtype)] <- "high wind"
readSet$evtype <- gsub("winds", "wind", readSet$evtype)
readSet$evtype[grep("extreme windchill", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep(".*gusty.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("non-severe wind damage", readSet$evtype)] <- "high wind"
readSet$evtype[grep("gradient wind", readSet$evtype)] <- "strong wind"
readSet$evtype[grep("heavy surf and wind", readSet$evtype)] <- "high surf"
readSet$evtype[grep("wind and wave", readSet$evtype)] <- "marine thunderstorm wind"
##cold
readSet$evtype[grep(".*extended cold.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep(".*cold and snow.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep("^[e].*cold.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep("^[^e].*cold.*", readSet$evtype)] <- "cold/wind chill"
readSet$evtype[grep("^cold.*", readSet$evtype)] <- "cold/wind chill"
##heat
readSet$evtype[grep(".*heat wave.*", readSet$evtype)] <- "heat"
readSet$evtype[grep(".*record heat.*", readSet$evtype)] <- "excessive heat"
##flood
readSet$evtype[grep(".*coastal.*flood.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*cstl.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*tidal.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*flash.*flood.*", readSet$evtype)] <- "flash flood"
readSet$evtype[grep(".*river.*", readSet$evtype)] <- "flood" ##p=126.437M
readSet$evtype[grep(".*ice jam.*", readSet$evtype)] <- "flood"
##fog
readSet$evtype[grep("^fog.*", readSet$evtype)] <- "dense fog" ##f=60 i=712 p=13.15M
##freez
readSet$evtype[grep(".*freezing rain.*", readSet$evtype)] <- "temp1"
readSet$evtype[grep(".*freezing fog.*", readSet$evtype)] <- "temp2"
readSet$evtype[grep(".*freez.*", readSet$evtype)] <- "frost/freeze"
readSet$evtype[grep(".*temp1.*", readSet$evtype)] <- "sleet" ##temp1 = freezing rain
readSet$evtype[grep(".*temp2.*", readSet$evtype)] <- "freezing fog" ##temp2 = freezing fog
#snow
readSet$evtype[grep(".*heavy snow shower.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*excessive snow.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*falling snow/ice.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*light snow.*", readSet$evtype)] <- "heavy snow" ##f=1, i=2, p=2.598M
readSet$evtype[grep(".*snow squall.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*snow and ice.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep("^snow?", readSet$evtype)] <- "heavy snow" ##f=2, i=12, p=2.554M
readSet$evtype[grep(".*rain/snow.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*blowing snow.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*lake.*snow.*", readSet$evtype)] <- "lake-effect snow"
readSet$evtype[grep(".*late season snow.*", readSet$evtype)] <- "heavy snow"
##various types
readSet$evtype[grep(".*surf.*", readSet$evtype)] <- "high surf"
readSet$evtype[grep(".*wild.*", readSet$evtype)] <- "wildfire"
readSet$evtype[grep(".*wint.*mix.*", readSet$evtype)] <- "winter weather" ##f=29 i=217 p=6M
readSet$evtype[grep(".*rip.*", readSet$evtype)] <- "rip current"
readSet$evtype[grep(".*fld.*", readSet$evtype)] <- "flood" ##f=28 i=79 p=58M
readSet$evtype[grep(".*dry microburst.*", readSet$evtype)] <- "thunderstorm wind" ##f=3 i=25 p=1.7M
readSet$evtype[grep(".*coastal storm.*", readSet$evtype)] <- "marine thunderstorm wind"
readSet$evtype[grep(".*hurricane|typhoon.*", readSet$evtype)] <- "hurricane (typhoon)"
readSet$evtype[grep(".*coastalstorm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep(".*frost.*", readSet$evtype)] <- "frost/freeze"
readSet$evtype[grep(".*torrential rainfall.*", readSet$evtype)] <- "heavy rain"
readSet$evtype[grep(".*ic.*road.*", readSet$evtype)] <- "frost/freeze" ##f=1 i=1 p=0.012M
readSet$evtype[grep(".*glaze.*", readSet$evtype)] <- "frost/freeze" ##f=1 i=212 p=0.15M
readSet$evtype[grep(".*exposure.*", readSet$evtype)] <- "extreme cold/wind chill" ##f=8
readSet$evtype[grep(".*land.*", readSet$evtype)] <- "debris flow" ##f=38 i=53 p=325 c=20.017
readSet$evtype[grep(".*mudslide.*", readSet$evtype)] <- "debris flow" ##f=5 i=2 p=1.225
readSet$evtype[grep(".*mixed precip.*", readSet$evtype)] <- "frost/freeze" ##f=2 i=26 p=0.79
readSet$evtype[grep(".*rough seas.*", readSet$evtype)] <- "marine strong wind"
readSet$evtype[grep(".*small hail.*", readSet$evtype)] <- "hail"
readSet$evtype[grep(".*storm surge.*", readSet$evtype)] <- "storm surge/tide"
readSet$evtype[grep("^thunderstorm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep(".*whirlwind.*", readSet$evtype)] <- "dust devil" ##f=1 i=0 p=0.012
readSet$evtype[grep(".*warm.*", readSet$evtype)] <- "excessive heat" ##f=0 i=16 c=0.01
readSet$evtype[grep("^wind.*", readSet$evtype)] <- "thunderstorm wind" ##f=19 i=85 p=2.33M c=0.3
readSet$evtype[grep(".*astronomical high tide.*", readSet$evtype)] <- "coastal flood" ##f=0 i=0 p=9.425M
readSet$evtype[grep(".*dam break.*", readSet$evtype)] <- "flash flood" ##p=1.002M
readSet$evtype[grep(".*mud slide.*", readSet$evtype)] <- "debris flow"
readSet$evtype[grep(".*rock slide.*", readSet$evtype)] <- "debris flow"
readSet$evtype[grep(".*unseasonal rain.*", readSet$evtype)] <- "heavy rain" ##c=10M
##The following event types (except for "other") were set to "unused":
##unused type: fatalities injuries propdmg(M-dollar) cropdmg(M-dollar)
##other: 7 4 0.055 1.034
##black ice: 1 24 0
##brush fire: 0 2 0
##drowning: 1 0 0
##heavy seas: 1 0 0
##high seas: 3 7 0.015
##high swells: 1 0 0.005
##high water: 3 0 0
##marine accident: 1 2 0.05
##wind damage: 0 0 0.01
##beach erosion: 0 0 0.1
##blowing dust: 0 0 0.02
##downburst: 0 0 0.002
##microburst: 0 0 0.055
##rain: 0 0 0.3 0.25
##coastal erosion: 0 0 0.766
readSet$evtype[grep(".*black ice.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*brush fire.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*drowning.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*heavy seas.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high seas.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high swells.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high water.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*marine accident.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*rogue wave.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*wind damage.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*beach erosion.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*blowing dust.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*downburst.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*microburst.*", readSet$evtype)] <- "unused"
readSet$evtype[readSet$evtype=="rain"] <- "unused"
readSet$evtype[grep(".*coastal erosion.*", readSet$evtype)] <- "unused"
sort(unique(readSet$evtype[readSet$fatalities>0 | readSet$injuries>0 | (readSet$propdmg>0 & !is.na(readSet$propdmg)) | (readSet$cropdmg>0 & !is.na(readSet$cropdmg))]))
## [1] "astronomical low tide" "avalanche"
## [3] "blizzard" "coastal flood"
## [5] "cold/wind chill" "debris flow"
## [7] "dense fog" "dense smoke"
## [9] "drought" "dust devil"
## [11] "dust storm" "excessive heat"
## [13] "extreme cold/wind chill" "flash flood"
## [15] "flood" "freezing fog"
## [17] "frost/freeze" "funnel cloud"
## [19] "hail" "heat"
## [21] "heavy rain" "heavy snow"
## [23] "high surf" "high wind"
## [25] "hurricane (typhoon)" "ice storm"
## [27] "lake-effect snow" "lakeshore flood"
## [29] "lightning" "marine hail"
## [31] "marine high wind" "marine strong wind"
## [33] "marine thunderstorm wind" "other"
## [35] "rip current" "seiche"
## [37] "sleet" "storm surge/tide"
## [39] "strong wind" "thunderstorm wind"
## [41] "tornado" "tropical depression"
## [43] "tropical storm" "tsunami"
## [45] "unused" "volcanic ash"
## [47] "waterspout" "wildfire"
## [49] "winter storm" "winter weather"
isUnused <- readSet$evtype == "unused"
isOther <- readSet$evtype == "other"
unusedRows <- sum(isUnused, na.rm=TRUE)
unusedFatal <- sum(readSet$fatalities[isUnused], na.rm=TRUE)
unusedInjur <- sum(readSet$injuries[isUnused], na.rm=TRUE)
unusedProp <- sum(readSet$propdmg[isUnused], na.rm=TRUE)
unusedCrop <- sum(readSet$cropdmg[isUnused], na.rm=TRUE)
otherRows <- sum(isOther, na.rm=TRUE)
otherFatal <- sum(readSet$fatalities[isOther], na.rm=TRUE)
otherInjur <- sum(readSet$injuries[isOther], na.rm=TRUE)
otherProp <- sum(readSet$propdmg[isOther], na.rm=TRUE)
otherCrop <- sum(readSet$cropdmg[isOther], na.rm=TRUE)
print(paste0("The number of unused observations: " , unusedRows , ", total fatalities: " , unusedFatal , ", total injuries: " , unusedInjur , ", total propdmg: " , unusedProp , " million dollar, total cropdmg: " , unusedCrop , " million dollar. The number of other observations: " , otherRows , ", total fatalities: " , otherFatal , ", total injuries: " , otherInjur , ", total propdmg: " , otherProp , " million dollar, total cropdmg: " , otherCrop , " million dollar."))
## [1] "The number of unused observations: 27, total fatalities: 11, total injuries: 37, total propdmg: 1.313 million dollar, total cropdmg: 1.313 million dollar. The number of other observations: 34, total fatalities: 0, total injuries: 4, total propdmg: 0.0555 million dollar, total cropdmg: 1.0344 million dollar."
NOTE: The dataset contains a significant error. There are two records of a flood event that refer to the Napa River (CA): on 12/31/2005 the propdmg is 115 M, and on 1/1/2006 it is 115 B. The correct value is 115 M, so the value of 115 B was set to NA. These records are corrected in the updated data (2014), available at the following link:
http://www.ncdc.noaa.gov/stormevents/textsearch.jsp?q=Napa+River+City+and+Parks+Department
readSet$propdmg[which(readSet$propdmg == 115000)] <- NA ##fix the 115 B (= 115,000 million dollar) error in the dataset
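The report does not say how this error was found. One simple way to spot such entries is to list the largest values and check them by hand against the remarks or the NOAA online database. The sketch below uses made-up records (only the two Napa River figures, 115 and 115000 million dollars, mirror the error described above), and `largestRecords` is a hypothetical helper name.

```r
# Made-up damage records in million dollars; only the two Napa River
# figures (115 and 115000) mirror the error described above.
dmg <- data.frame(
  evtype  = c("flood", "flood", "hurricane (typhoon)", "tornado", "hail"),
  bgnDate = c("12/31/2005", "1/1/2006", "8/1/2005", "5/1/1999", "4/1/2001"),
  propdmg = c(115, 115000, 900, 40, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n records with the largest values in
# `column` (missing values are dropped) for a manual plausibility check.
largestRecords <- function(df, column, n = 3) {
  ord <- order(df[[column]], decreasing = TRUE, na.last = NA)
  head(df[ord, , drop = FALSE], n)
}

largestRecords(dmg, "propdmg", n = 2)
```

Here the top record is a factor of 1,000 larger than its twin record of the same flood, which is the pattern that flags a wrong multiplier (B instead of M).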
The following code sums each of the four damage variables over the entire processed dataset, giving a rough estimate of the overall totals (the 100% reference for the shares quoted in this report):
print(paste0("total fatalities: " , sum(readSet$fatalities, na.rm=TRUE) , ", total injuries: " , sum(readSet$injuries, na.rm=TRUE) , ", total property damage in million dollar: " , round(sum(readSet$propdmg, na.rm=TRUE),2) , ", total crop damage in million dollar: " , round(sum(readSet$cropdmg, na.rm=TRUE),2)))
## [1] "total fatalities: 8732, total injuries: 57975, total property damage in million dollar: 251767.62, total crop damage in million dollar: 34752.73"
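The shares quoted in the summary can be checked with plain arithmetic on these totals and on the leading values of the top-15 tables in the results. This sketch only copies numbers that appear in the report's output; nothing is recomputed from the raw data.

```r
# Overall totals for 1996-2011 (from the output above; damages in
# million dollars).
overall <- c(fatalities = 8732, injuries = 57975,
             propdmg = 251767.62, cropdmg = 34752.73)

# Leading contributors, taken from the top-15 tables in the results:
# excessive heat + tornado (fatalities), tornado (injuries),
# hurricane/typhoon (property) and drought (crops).
leader <- c(fatalities = 1799 + 1511, injuries = 20667,
            propdmg = 81718.8890, cropdmg = 13367.5660)

# Share of each overall total, in percent.
sharePercent <- round(100 * leader / overall, 1)
sharePercent  # fatalities 37.9, injuries 35.6, propdmg 32.5, cropdmg 38.5
```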
readSet <- select(readSet, evtype, year, fatalities, injuries, propdmg, cropdmg)
readSet <- arrange(readSet, evtype, year)
readSet[1:5,]
## evtype year fatalities injuries propdmg cropdmg
## 1 astronomical low tide 2007 0 0 0.12 0
## 2 astronomical low tide 2008 0 0 0.20 0
## 3 avalanche 1996 0 2 NA NA
## 4 avalanche 1996 1 1 NA NA
## 5 avalanche 1996 2 0 NA NA
The analysis aims to identify the events with the highest impact on health and on the economy, where the health-related categories are fatalities and injuries, and the economic categories are property damage and crop damage.
Therefore, two separate tables with the relevant calculations were created, as follows.
For the following code, the packages “reshape”, “data.table”, and “ggplot2” are required.
library(data.table) ##data.table package is required for fast binding (rbindlist)
library(reshape) ##reshape package is required for changing the data structure (melt); loaded after data.table so that its melt() is used
library(ggplot2) ##ggplot2 package is required for the following plots
##calculate total fatalities, injuries, propdmg, cropdmg for each event
allTotals <- ddply(readSet, .(evtype), summarize, totalFatalities=sum(fatalities), totalInjuries=sum(injuries), totalPropdmg=sum(propdmg, na.rm=TRUE), totalCropdmg=sum(cropdmg, na.rm=TRUE))
##restructure the totals: column 1 evtype, column 2 categories, column 3 totals
allTotals <- reshape::melt(allTotals, id.vars="evtype") ##all other columns become measure variables
names(allTotals)[2:3] <- c("categoryFactor", "totals")
##calculate yearly fatalities, injuries, propdmg, cropdmg, for each event (1996-2011)
allYearly <- ddply(readSet, .(evtype, year), summarize, yearlyFatalities=sum(fatalities), yearlyInjuries=sum(injuries), yearlyPropdmg=sum(propdmg, na.rm=TRUE), yearlyCropdmg=sum(cropdmg, na.rm=TRUE))
##restructure the totals: col1 evtype, col2 year, col3 categories, col4 yearly totals
allYearly <- reshape::melt(allYearly, id.vars=c("evtype", "year")) ##all other columns become measure variables
names(allYearly)[3:4] <- c("categoryFactor", "yearlyTotals")
tableAll <- rbindlist(list(select(allTotals, evtype, categoryFactor, totals), select(allYearly, evtype, categoryFactor, yearlyTotals)))
##health table
health <- tableAll[grep("Fatalities|Injuries", tableAll$categoryFactor)] ##subset health categories
healthLevels <- arrange(health[health$categoryFactor=="totalFatalities", ], desc(totals))[,"evtype", with=FALSE]
healthLevels$evtype <- factor(healthLevels$evtype, levels=healthLevels$evtype)
health$evtype <- factor(health$evtype, levels=levels(healthLevels$evtype)) ##sort event levels by totals of fatalities
##economics table
economic <- tableAll[grep("Propdmg|Cropdmg", tableAll$categoryFactor)] ##subset economic categories
totalEconomi <- economic[economic$categoryFactor=="totalPropdmg"]
totalEconomi$categoryFactor <- "totalPropAndCropDmg"
totalEconomi$totals <- economic$totals[economic$categoryFactor=="totalPropdmg"] + economic$totals[economic$categoryFactor=="totalCropdmg"]
totalEconomi <- arrange(totalEconomi, desc(totals))
totalEconomi$evtype <- factor(totalEconomi$evtype, levels=totalEconomi$evtype)
economic$evtype <- factor(economic$evtype, levels=levels(totalEconomi$evtype)) ##economic table: sort the event levels by the sum of propdmg+cropdmg
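For readers without plyr, reshape and data.table, the same steps (totals per event type, a long table with one row per event and category, and an event order by combined damage) can be sketched in base R. This is an alternative under stated assumptions, not the report's code: the records are made up, and `aggregate()` with `na.action = na.pass` stands in for `ddply()`.

```r
# Toy event records (made-up values; damages in million dollars).
events <- data.frame(
  evtype     = c("tornado", "tornado", "hail", "hail", "flood"),
  fatalities = c(2, 1, 0, 0, 3),
  injuries   = c(10, 4, 1, 0, 2),
  propdmg    = c(5.0, NA, 0.2, 0.1, 7.5),
  cropdmg    = c(0, 0, 1.0, NA, 0.5),
  stringsAsFactors = FALSE
)

# Totals per event type. na.action = na.pass keeps rows with NA damages,
# and na.rm = TRUE (passed on to sum) then skips those NAs.
totals <- aggregate(cbind(fatalities, injuries, propdmg, cropdmg) ~ evtype,
                    data = events, FUN = sum, na.rm = TRUE,
                    na.action = na.pass)

# Long format: one row per event type and category, like the melt() output.
categories <- c(fatalities = "totalFatalities", injuries = "totalInjuries",
                propdmg = "totalPropdmg", cropdmg = "totalCropdmg")
long <- data.frame(
  evtype         = rep(totals$evtype, times = length(categories)),
  categoryFactor = rep(unname(categories), each = nrow(totals)),
  totals         = unlist(totals[names(categories)], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Event order for the economic plot: descending property + crop damage.
damageOrder <- totals$evtype[order(totals$propdmg + totals$cropdmg,
                                   decreasing = TRUE)]
damageOrder  # "flood" "tornado" "hail"
```

Without `na.action = na.pass`, the formula interface would drop every row containing an NA before summing, which would silently remove the fatalities and injuries of those rows as well.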
The summary tables of health and economic consequences prepared in the data processing section contain, for each event type, totals over all years and totals for each year (1996-2011). The data for 2011 originally extended only through November.
Figures 2 and 3 below compare the impact of each event type on population health and on the economy, respectively. The box plots give an idea of the year-to-year variation: each box and its points show the distribution of the yearly values of an event type.
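By default, ggplot2's `geom_boxplot()` draws each box from the first to the third quartile with a line at the median, extends the whiskers to the most extreme values within 1.5 × IQR of the box, and plots values beyond that as separate points. The sketch below computes these quantities for one made-up series of yearly totals, to show what a box in Figures 2 and 3 summarizes.

```r
# Made-up yearly totals of one event type (one value per year).
yearlyTotals <- c(10, 20, 30, 40, 200)

# Box edges and middle line: first quartile, median and third quartile
# (R's default quantile type 7).
q <- quantile(yearlyTotals, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Years lying more than 1.5 * IQR beyond the box are drawn as single points.
isOutlier <- yearlyTotals < q[1] - 1.5 * iqr | yearlyTotals > q[3] + 1.5 * iqr
yearlyTotals[isOutlier]  # 200
```

A single exceptional year (such as a major heat wave) therefore shows up as an isolated point above a compact box.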
The top 15 most harmful event types in each category are listed below.
A. Causes for fatalities
head(arrange(health[health$categoryFactor=="totalFatalities", ], desc(totals)), 15)
## evtype categoryFactor totals
## 1: excessive heat totalFatalities 1799
## 2: tornado totalFatalities 1511
## 3: flash flood totalFatalities 887
## 4: lightning totalFatalities 651
## 5: rip current totalFatalities 542
## 6: flood totalFatalities 444
## 7: thunderstorm wind totalFatalities 406
## 8: extreme cold/wind chill totalFatalities 280
## 9: heat totalFatalities 237
## 10: high wind totalFatalities 235
## 11: avalanche totalFatalities 223
## 12: winter storm totalFatalities 191
## 13: high surf totalFatalities 145
## 14: hurricane (typhoon) totalFatalities 125
## 15: cold/wind chill totalFatalities 117
B. Causes for injuries
head(arrange(health[health$categoryFactor=="totalInjuries", ], desc(totals)), 15)
## evtype categoryFactor totals
## 1: tornado totalInjuries 20667
## 2: flood totalInjuries 6838
## 3: excessive heat totalInjuries 6410
## 4: thunderstorm wind totalInjuries 5250
## 5: lightning totalInjuries 4141
## 6: flash flood totalInjuries 1674
## 7: wildfire totalInjuries 1456
## 8: hurricane (typhoon) totalInjuries 1328
## 9: heat totalInjuries 1292
## 10: winter storm totalInjuries 1292
## 11: high wind totalInjuries 1090
## 12: dense fog totalInjuries 855
## 13: hail totalInjuries 723
## 14: heavy snow totalInjuries 717
## 15: winter weather totalInjuries 560
C. Graphs
printHealth <- qplot(evtype, totals, data=health, facets=categoryFactor~.)
printHealth + geom_boxplot() + theme(axis.text.x = element_text(angle=90, vjust=0.5, size=11, face="bold", color="black")) + facet_grid(categoryFactor~., scales="free_y") + theme(strip.text=element_text(size=14)) + labs(x="Event Types", y="Fatalities or Injuries (free scale)", title="Weather harmful events with respect to population health \n (fatalities or injuries)")
Figure 2: Weather events harmful to population health. The events are ordered (descending, from left to right) by total fatalities over the years 1996-2011. The vertical scale indicates the number of fatalities or injuries. From top to bottom: the first and second panels show the total fatalities and injuries, respectively; the third and fourth panels show the yearly fatalities and injuries, respectively.
The tables above and Figure 2 indicate that the event types associated with the highest numbers of fatalities are excessive heat and tornado, and the one associated with the highest number of injuries is tornado. Four other event types are also associated with large numbers of injuries: flood, excessive heat, thunderstorm wind, and lightning.
A. Causes for property damages (in million dollars)
head(arrange(economic[economic$categoryFactor=="totalPropdmg", ], desc(totals)), 15)
## evtype categoryFactor totals
## 1: hurricane (typhoon) totalPropdmg 81718.8890
## 2: storm surge/tide totalPropdmg 47834.7240
## 3: flood totalPropdmg 29129.5812
## 4: tornado totalPropdmg 24616.9457
## 5: flash flood totalPropdmg 15223.2709
## 6: hail totalPropdmg 14595.2134
## 7: thunderstorm wind totalPropdmg 7919.2480
## 8: wildfire totalPropdmg 7760.4495
## 9: tropical storm totalPropdmg 7642.4756
## 10: high wind totalPropdmg 5248.3834
## 11: ice storm totalPropdmg 3642.2488
## 12: winter storm totalPropdmg 1532.7432
## 13: drought totalPropdmg 1046.1010
## 14: lightning totalPropdmg 743.0771
## 15: heavy snow totalPropdmg 641.6945
B. Causes for crop damages (in million dollars)
head(arrange(economic[economic$categoryFactor=="totalCropdmg", ], desc(totals)), 15)
## evtype categoryFactor totals
## 1: drought totalCropdmg 13367.5660
## 2: hurricane (typhoon) totalCropdmg 5350.1078
## 3: flood totalCropdmg 5013.1615
## 4: hail totalCropdmg 2496.8225
## 5: frost/freeze totalCropdmg 1368.7610
## 6: flash flood totalCropdmg 1334.9017
## 7: extreme cold/wind chill totalCropdmg 1326.0230
## 8: thunderstorm wind totalCropdmg 1017.4676
## 9: heavy rain totalCropdmg 738.1698
## 10: tropical storm totalCropdmg 677.7110
## 11: high wind totalCropdmg 633.5613
## 12: excessive heat totalCropdmg 492.4120
## 13: wildfire totalCropdmg 402.2551
## 14: tornado totalCropdmg 283.4250
## 15: heavy snow totalCropdmg 71.1221
C. Graphs
printEconomics <- qplot(evtype, totals, data=economic, facets=categoryFactor~.)
printEconomics + geom_boxplot() + theme(axis.text.x = element_text(angle=90, vjust=0.5, size=11, face="bold", color="black")) + facet_grid(categoryFactor~., scales="free_y") + theme(strip.text=element_text(size=14)) + labs(x="Event Types", y="Property and Crop Damages in million dollars (free scale)", title="Weather harmful events with respect to property and crop damages \n (in million dollars)")
Figure 3: Weather events harmful with respect to property and crop damage, in million dollars. The events are ordered (descending, from left to right) by combined total property and crop damage over the years 1996-2011. The vertical scale indicates the amount of damage in million dollars. From top to bottom: the first and second panels show the total property damage and crop damage, respectively; the third and fourth panels show the yearly property damage and crop damage, respectively.
The tables above and Figure 3 indicate that the event type associated with the greatest property damage is hurricane (typhoon), and the one associated with the greatest crop damage is drought. Five other event types are also associated with high property damage (from higher to lower): storm surge/tide, flood, tornado, flash flood, and hail.
The report indicates that excessive heat and tornado are the major causes of loss of life (around 1,800 and 1,500 fatalities, respectively) and tornado the major cause of injuries (around 20,650 injuries), while hurricane (typhoon) is responsible for the highest property damage (around 82 billion dollars) and drought for the highest crop damage (around 13 billion dollars). Over the years 1996-2011, hurricane (typhoon) and tornado together are associated with property and crop damage of around 112 billion dollars, and storm surge/tide, flood, and flash flood together with around 99 billion dollars.
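The combined damage figures for the two groups of event types follow from the property and crop tables of the results. One assumption is needed: storm surge/tide does not appear in the crop top 15, so its crop damage is at most 71.1221 million dollars (the 15th value); the sketch sets it to 0, which makes that group's sum a lower bound.

```r
# Damage totals in million dollars, from the top-15 tables above.
prop <- c(hurricane = 81718.8890, tornado = 24616.9457, surge = 47834.7240,
          flood = 29129.5812, flashFlood = 15223.2709)
# Storm surge/tide is not in the crop top 15, so its crop damage is at
# most 71.1221; it is set to 0 here, which makes that sum a lower bound.
crop <- c(hurricane = 5350.1078, tornado = 283.4250, surge = 0,
          flood = 5013.1615, flashFlood = 1334.9017)

damageBillions <- (prop + crop) / 1000
hurricaneTornado <- sum(damageBillions[c("hurricane", "tornado")])
surgeAndFloods <- sum(damageBillions[c("surge", "flood", "flashFlood")])
round(c(hurricaneTornado = hurricaneTornado, surgeAndFloods = surgeAndFloods), 1)
# hurricaneTornado 112.0, surgeAndFloods 98.5
```

With the upper bound for storm surge crop damage, the second sum rises only to about 98.6 billion dollars.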