Determination of weather event types in the US that have caused the greatest number of fatalities and injuries, and had the greatest economic impact on property and crops
This study identifies the 10 weather event types that are most harmful to human health, in terms of number of fatalities and injuries caused in the US since 1950, and the 10 weather event types that have had the greatest economic impact on property and crops, in terms of value of damage caused. The weather event types are those defined by the NOAA here
National Weather Service Storm Data Documentation
The data were provided by the NOAA and were sourced from here:
Storm data [47Mb]
The data contained many errors, in that many events had not been categorised or had not been recorded in the required manner, either in respect of recorded weather type, or by value multipler.
A procedure was carried out to map actual recorded data to one of the correct categories, and to omit data that could not be mapped. The final data set captured 94% of total fatalities and injuries, and 97% of total damage value to property and crops.
The top three most damaging weather types in terms of impact on human health have been (in descending order) tornados, heat, and thuderstorms, whiile the top three in terms of economic impact on property and crops have been floods, hurricanes and tornados
rm(list=ls())
setwd("C:/Users/Mike/Rspace/JHU_RR/PA2") # amend pathway as required
#setwd("H:/Rspace/JHU_Data_Science/JHU_RR/PA2")
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:dplyr':
##
## between, last
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:data.table':
##
## hour, mday, month, quarter, wday, week, yday, year
library(pander)
panderOptions('round', 2)
The NOAA data for this analysis is taken from the link provided on the JHU Reproducible Research Peer Assignment 2 Coursera site:
Storm data [47Mb]
Information on this data is available here
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
We save it into a csv.bz2 file on our local machine
if(!file.exists("data")){
dir.create("data")
}
if(!file.exists("./data/stormData.csv.bz2")){
fileURL<-"http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,destfile="./data/stormData.csv.bz2")
#include date of download
datedownloaded<-date()
datedownloaded
}
# using cache here causes memory overload in my computer!
# To do: find a faster way to read the data into R.
file<- bzfile(description = "./data/stormData.csv.bz2", open = 'r', encoding =
getOption('encoding'), compression = 9)
stormdata <- read.csv(file, header = TRUE, stringsAsFactors = FALSE)
#closeAllConnections()
First we look at the structure of the data:
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
#check for NAs
mean(is.na(stormdata$FATALITIES))
## [1] 0
mean(is.na(stormdata$INJURIES))
## [1] 0
mean(is.na(stormdata$PROPDMG))
## [1] 0
mean(is.na(stormdata$CROPDMG))
## [1] 0
# There are no NAs in these relevant data columns. Yay!
This suggests that some data removal can be carried out:
All rows of the data set for which fatalities, injuries, crop damage and property damage values were all zero are removed from the data set, as are columns that will not be used in the subsequent analysis..
# select columns required
library(dplyr)
df1 <-select(stormdata,STATE__:EVTYPE,FATALITIES:CROPDMGEXP)
rm(stormdata)
str(df1)
## 'data.frame': 902297 obs. of 14 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
# filter out null rows
df2<-filter(df1,FATALITIES+INJURIES+PROPDMG+CROPDMG>0 )
rm(df1)
nrow(df2)
## [1] 254633
Hence about 70% of the events reported and recorded caused neither fatalities nor injuries, nor was any economic damage to property or crops recorded.
We inspect the data by year to see the time periods from when the data mainly arises:
# extract year from the date column
library(lubridate)
year<-as.POSIXlt(mdy_hms(df2$BGN_DATE))
year<-year$year+1900
summary(year) # check this looks OK.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1950 1997 2002 2000 2008 2011
# include this column in the reduced data set df2
sd.red<-data.frame(year,df2)
rm(df2,year)
fit.i<-aggregate(sd.red$INJURIES,by=list(sd.red$year),FUN="sum")
fit.f<-aggregate(sd.red$FATALITIES,by=list(sd.red$year),FUN="sum")
fit<-data.frame(fit.i$Group.1,fit.i$x+fit.f$x)
summary(fit)
## fit.i.Group.1 fit.i.x...fit.f.x
## Min. :1950 Min. : 558
## 1st Qu.:1965 1st Qu.: 1098
## Median :1980 Median : 1864
## Mean :1980 Mean : 2511
## 3rd Qu.:1996 3rd Qu.: 3167
## Max. :2011 Max. :11864
names(fit)[names(fit)=="fit.i.Group.1"] <- "Year"
names(fit)[names(fit)=="fit.i.x...fit.f.x"] <- "Fatalities.Injuries"
pt<-aggregate(sd.red$PROPDMG,by=list(sd.red$year),FUN="sum")
ct<-aggregate(sd.red$CROPDMG,by=list(sd.red$year),FUN="sum")
pct<-data.frame(pt$Group.1,pt$x+ct$x)
summary(pct)
## pt.Group.1 pt.x...ct.x
## Min. :1950 Min. : 10561
## 1st Qu.:1965 1st Qu.: 33248
## Median :1980 Median : 53566
## Mean :1980 Mean :197780
## 3rd Qu.:1996 3rd Qu.:423508
## Max. :2011 Max. :918602
names(pct)[names(pct)=="pt.Group.1"] <- "Year"
names(pct)[names(pct)=="pt.x...ct.x"] <- "Totals"
Merge these two totals together
annual.totals<-fit
annual.totals$Damage<-pct$Totals
rm(fit.i,fit.f,fit,pt,ct,pct) # clean up
#names(annual.totals)[names(annual.totals)=="pct$Total.Damage"] <- "Total.Damage"
str(annual.totals)
## 'data.frame': 62 obs. of 3 variables:
## $ Year : num 1950 1951 1952 1953 1954 ...
## $ Fatalities.Injuries: num 729 558 2145 5650 751 ...
## $ Damage : num 16999 10561 16680 19182 23368 ...
From this analysis, we find that the annual number of reported events and the value of the damage caused is shown below in Figure 1.
library(ggplot2)
par(mfrow = c(1, 2))
par(mar = c(4, 4,3,1))
par(bg="white")
with(annual.totals,plot(Year,Fatalities.Injuries,
xlim=c(1950,2020),
ylim=c(0,max(annual.totals$Fatalities.Injuries)),
xlab="Year",
ylab="Annual fatalities and Injuries",
pch=19,
col="red",
#main="Fatalities / Injuries"
)
)
with(annual.totals,plot(Year,Damage,
xlim=c(1950,2020),
ylim=c(0,max(annual.totals$Damage)),
xlab="Year",
ylab="Annual property and crop damage value ($) ",
pch=19,
col="blue",
#main="Property and crop damage ($)"
)
)
Figure 1 : Variation by year of the number of weather events recorded and the economic value of their damage to property and crops.
We note a huge rise in fatalities and injuries from about 1990 onwards. This probably reflects a rise in the available volume of data, rather than an actual rise’ in the number of events, but does suggest that the results of our subsequent analysis are more reflective of the last 25 years than of the first years since 1950. Further analysis could verify this.
We now to turn to the main goal of our analysis, which is to fin the most damaging type of weather events in the US since 1950, as measured by impact on human health and by damage caused to property and crops. To achieve this, the data set must be cleaned. The meaning of this is explained below.
We wish to identify those weather types that have been most damaging to people, in terms of numbers of fatalities and injuries caused, and to crops and property in economic terms. The categories used will be those recognised by NOAA:
evtype<-c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill","Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat","Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze","Funnel Cloud","Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind","Hurricane","Ice Storm","Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail","Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather")
evtype
## [1] "Astronomical Low Tide" "Avalanche"
## [3] "Blizzard" "Coastal Flood"
## [5] "Cold/Wind Chill" "Debris Flow"
## [7] "Dense Fog" "Dense Smoke"
## [9] "Drought" "Dust Devil"
## [11] "Dust Storm" "Excessive Heat"
## [13] "Extreme Cold/Wind Chill" "Flash Flood"
## [15] "Flood" "Frost/Freeze"
## [17] "Funnel Cloud" "Freezing Fog"
## [19] "Hail" "Heat"
## [21] "Heavy Rain" "Heavy Snow"
## [23] "High Surf" "High Wind"
## [25] "Hurricane" "Ice Storm"
## [27] "Lake-Effect Snow" "Lakeshore Flood"
## [29] "Lightning" "Marine Hail"
## [31] "Marine High Wind" "Marine Strong Wind"
## [33] "Marine Thunderstorm Wind" "Rip Current"
## [35] "Seiche" "Sleet"
## [37] "Storm Surge/Tide" "Strong Wind"
## [39] "Thunderstorm Wind" "Tornado"
## [41] "Tropical Depression" "Tropical Storm"
## [43] "Tsunami" "Volcanic Ash"
## [45] "Waterspout" "Wildfire"
## [47] "Winter Storm" "Winter Weather"
levtypes<-length(evtype)
which number 48 in all
evrep<-sd.red$EVTYPE
nevrec<-length(unique(evrep))
Howeever, 488 event types wereactually recorded which means that many events must have been recorded with non-standard terms. Hence before final analysis, the recorded events have had to be mapped to one or other of the recognised categories. This has been achieved by a process of matching terms, identifying common alternative terms (eg TSTM for thunderstorm, precipitation for rain etc), and correction of obvious typographical errors eg TORNDAo etc.
The matching process is described below and may be skipped if not of interest. The criterion for this process was that at least 95% of the reported events should be mapped to one NOAA category. Those that could not be mapped were not included in the final analysis.
This is an iterative process.
First, we make some substitutions to remove obvious typographical errors (eg THUNERSTORM) and to substitute NOAA category terms for other equivalent terms used in the data set (eg “RAIN” for “precipitation”)
evrep<-sd.red$EVTYPE
evrep<-gsub("Precipitation|drizzle|shower","RAIN",evrep,ignore.case=TRUE)
evrep<-gsub("TSTM|THUNDERESTORM|THUNERSTORM|THUNDEERSTORM|THUNDERTORM",
"THUNDERSTORM",evrep,ignore.case=TRUE)
evrep<-gsub("STROM","STORM",evrep,ignore.case=TRUE)
evrep<-gsub("TUN","THUN",evrep,ignore.case=TRUE)
evrep<-gsub("THUNDERSTORM\\S","THUNDERSTORM\\s",evrep,ignore.case=TRUE)
evrep<-gsub("STORMS","STORM",evrep,ignore.case=TRUE)
evrep<-gsub("WINDS|WINS","WIND",evrep,ignore.case=TRUE)
evrep<-gsub("Typhoon","Hurricane",evrep,ignore.case=TRUE)
evrep<-gsub("Hurricane\\s[A-z]|Hurricane[A-z]","Hurricane",evrep,ignore.case=TRUE)
evrep<-gsub("ASTRONOMICAL HIGH TIDE","Storm Surge/Tide",evrep,ignore.case=TRUE)
evrep<-gsub("Whirlwind|Torndao","TORNADO",evrep,ignore.case=TRUE)
evrep<-gsub("Lighting|LIGNTNING","Lightning",evrep,ignore.case=TRUE)
evrep<-gsub("Excessive|Hvy|Severe","HEAVY",evrep,ignore.case=TRUE)
evred<-sapply(1:levtypes,function(x){
grep(evtype[x],evrep,ignore.case=TRUE)
})
event<-c(rep("Other",nrow(sd.red)))
for(i in 1:47){
event[evred[[i]]]<-evtype[i]
}
The number of terms usewd that were not captured and mapped by this process is:
missed<-evrep[grep("Other",event)]
unique(missed)
## [1] "WIND" "COLD"
## [3] "EXTREME COLD" "FREEZE"
## [5] "MARINE MISHAP" "HIGH TIDES"
## [7] "HIGH SEAS" "HEAVY TURBULENCE"
## [9] "RECORD RAINFALL" "APACHE COUNTY"
## [11] "GUSTY WIND" "WILD FIRES"
## [13] "HIGH" "MUDSLIDES"
## [15] "RAINSTORM" "HEAVY THUNDERSTORM"
## [17] "THUNDERSTORM" "HEAVY SURF"
## [19] "DRY MIRCOBURST WIND" "DRY MICROBURST"
## [21] "MICROBURST WIND" "UNSEASONABLY WARM"
## [23] "STORM SURGE" "DAMAGING FREEZE"
## [25] "SNOW" "FROST"
## [27] "FREEZING RAIN/SNOW" "THUNDERSNOW"
## [29] "COOL AND WET" "GLAZE ICE"
## [31] "MUD SLIDE" "HIGH WIND"
## [33] "MUD SLIDES" "COLD AND WET CONDITIONS"
## [35] "HEAVY WETNESS" "GUSTNADO"
## [37] "FREEZING RAIN" "EXTREME WIND CHILL"
## [39] "WIND DAMAGE" "FOG"
## [41] "SNOW AND ICE" "WIND STORM"
## [43] "GRASS FIRES" "ICE"
## [45] "THUNDERSTORM WIND" "WINTER WEATHER"
## [47] "RAIN" "ICE FLOES"
## [49] "HEAVY LAKE SNOW" "RECORD COLD"
## [51] "COLD WAVE" "GLAZE"
## [53] "MICROBURST" "AVALANCE"
## [55] "ICE JAM" "FOREST FIRES"
## [57] "FROST\\FREEZE" "HARD FREEZE"
## [59] "URBAN AND SMALL" "FOG AND COLD TEMPERATURES"
## [61] "SNOW/COLD" "MUDSLIDE"
## [63] "SNOW SQUALL" "ICY ROADS"
## [65] "HEAVY MIX" "SNOW FREEZING RAIN"
## [67] "SNOW/FREEZING RAIN" "SNOW SQUALLS"
## [69] "RECORD SNOW" "COASTAL SURGE"
## [71] "THUNDERSTORM DAMAGE TO" "BLOWING SNOW"
## [73] "THUDERSTORM WIND" "ICE AND SNOW"
## [75] "STORM FORCE WIND" "RAIN/WIND"
## [77] "SNOW/ICE" "UNSEASONABLY WARM AND DRY"
## [79] "UNSEASONABLY COLD" "HIGH WAVES"
## [81] "LOW TEMPERATURE" "HYPOTHERMIA"
## [83] "COLD/WIND" "SNOW/ BITTER COLD"
## [85] "COLD WEATHER" "RAPIDLY RISING WATER"
## [87] "WILD/FOREST FIRE" "SNOW ACCUMULATION"
## [89] "SNOW/ ICE" "SNOW/BLOWING SNOW"
## [91] "LANDSLIDE" "THUNDERSTORMINDS"
## [93] "WILD/FOREST FIRES" "HEAVY SEAS"
## [95] "?" "HIGH WATER"
## [97] "LANDSLIDES" "URBAN/SMALL STREAM"
## [99] "BRUSH FIRE" "HEAVY SWELLS"
## [101] "URBAN SMALL" "Other"
## [103] "URBAN/SML STREAM FLD" "ROUGH SURF"
## [105] "Heavy Surf" "Marine Accident"
## [107] "Freeze" "COASTAL STORM"
## [109] "Damaging Freeze" "Beach Erosion"
## [111] "Unseasonable Cold" "Early Frost"
## [113] "Wintry Mix" "Extreme Cold"
## [115] "Torrential Rainfall" "Landslump"
## [117] "Coastal Storm" "EXTREME WINDCHILL"
## [119] "Glaze" "Extended Cold"
## [121] "Light snow" "Light Snow"
## [123] "MIXED PRECIP" "Freezing Spray"
## [125] "DOWNBURST" "Mudslides"
## [127] "Microburst" "Mudslide"
## [129] "Cold" "Snow Squalls"
## [131] "Wind Damage" "Light Snowfall"
## [133] "Freezing RAIN" "Gusty wind/rain"
## [135] "Wind" "Cold Temperature"
## [137] "Snow" "COLD AND SNOW"
## [139] "RAIN/SNOW" "Gusty WIND"
## [141] "AGRICULTURAL FREEZE" "OTHER"
## [143] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [145] "Lake Effect Snow" "Freezing Rain"
## [147] "Mixed RAIN" "BLACK ICE"
## [149] "COASTALSTORM" "LIGHT SNOW"
## [151] "DAM BREAK" "blowing snow"
## [153] "GRADIENT WIND" "gradient wind"
## [155] "Gradient wind" "WET MICROBURST"
## [157] "Heavy surf and wind" "HIGH SWELLS"
## [159] "UNSEASONAL RAIN" "COASTAL EROSION"
## [161] "HYPERTHERMIA/EXPOSURE" "WINTRY MIX"
## [163] "ROCK SLIDE" "LANDSPOUT"
## [165] "LAKE EFFECT SNOW" "MIXED RAIN"
## [167] "WIND AND WAVE" "LIGHT FREEZING RAIN"
## [169] "ICE ROADS" "ROUGH SEAS"
## [171] "NON-HEAVY WIND DAMAGE" "WARM WEATHER"
## [173] "LATE SEASON SNOW" "WINTER WEATHER MIX"
## [175] "ROGUE WAVE" "FALLING SNOW/ICE"
## [177] "BLOWING DUST" "HAZARDOUS SURF"
## [179] "ICE ON ROAD" "DROWNING"
## [181] "WINTER WEATHER/MIX"
length(missed)
## [1] 3357
length(missed)/length(event)
## [1] 0.01318368
The capture rate is 99 %
This exceeds the threshold of 95% of recorded events set for completeness of capture, and so we include the mapped event set in the original set..
# Add the mapped weather type vector to the reduced data set
sd.red<-data.frame(select(sd.red,year:EVTYPE),
event,
select(sd.red,FATALITIES:CROPDMGEXP),
stringsAsFactors=FALSE)
Damage values to property or crops should be recorded as a number of dollars together with a multipler, that is either “” (blank), denoting “as is”, or “K” for a thousand, “M” for a million or “B” for a billion 1000. Since these are human estimates, there are likely to be errors, even if they had all been entered according to the scheme just described In fact, they have not.
We include in our analysis only those values that have either no signifier or one of the recognised signifiers in either lower of upper case, then multiply the damage values by the appropriate factor, to give final values in millions of dollars.
# select for valid value multipliers
# select for valid value multipliers
sd.redd<-sd.red[grepl("[kK]|[mM]|[bB]|[^3]|[^5]",sd.red$PROPDMGEXP),]
sd.redd$PROPDMGEXP<-gsub("[^kK][^mM][^bB]",'0',sd.redd$PROPDMGEXP)
sd.redd<-sd.redd[grepl("[kK]|[mM]|[bB]",sd.redd$CROPDMGEXP),]
sd.redd$PROPDMGEXP<-gsub("3|5","0.000001",sd.redd$PROPDMGEXP) #remove pesky 3 and 5
# express damage mulipliers in billions of dollars
sd.redd$PROPDMGEXP<-gsub("[kK]","0.000001",sd.redd$PROPDMGEXP)
sd.redd$PROPDMGEXP<-gsub("[mM]","0.001",sd.redd$PROPDMGEXP)
sd.redd$PROPDMGEXP<-gsub("[bB]","1",sd.redd$PROPDMGEXP)
sd.redd$PROPDMGEXP<-as.numeric(sd.redd$PROPDMGEXP)
sd.redd$CROPDMGEXP<-gsub("[kK]","0.000001",sd.redd$CROPDMGEXP)
sd.redd$CROPDMGEXP<-gsub("[mM]","0.001",sd.redd$CROPDMGEXP)
sd.redd$CROPDMGEXP<-gsub("[bB]","1",sd.redd$CROPDMGEXP)
sd.redd$CROPDMGEXP<-as.numeric(sd.redd$CROPDMGEXP)
# check replacements are complete
# should all be 1e-6, 1e-3 or 1
table(sd.redd$PROPDMGEXP)
##
## 0 1e-06 0.001 1
## 4 93707 3925 24
table(sd.redd$CROPDMGEXP)
##
## 1e-06 0.001 1
## 96113 1544 3
sd.redd$PROPDMG<-sd.redd$PROPDMG*sd.redd$PROPDMGEXP
sd.redd$CROPDMG<-sd.redd$CROPDMG*sd.redd$CROPDMGEXP
tfatalities <-sum(sd.red$FATALITIES)
tinjuries<-sum(sd.red$INJURIES)
ttotal<-tfatalities+tinjuries
tpropdmg <-sum(sd.redd$PROPDMG)
tcropdmg <-sum(sd.redd$CROPDMG)
dtotal<-tpropdmg+tcropdmg
#find fatality and injury number for each event type
fatalities<-sd.red %>% group_by(event) %>% summarise(fatalities=sum(FATALITIES))
injuries<-sd.red %>% group_by(event) %>% summarise(injuries=sum(INJURIES))
human<-left_join(fatalities,injuries,by="event")
human<-mutate(human,total=fatalities+injuries)
# add a totals row at the top
human<-arrange(human,desc(total))
totals<-data.frame("Total",tfatalities,tinjuries,ttotal)
names(totals)<-names(human)
human<-rbind(totals,human)
# calculate each row as % of total
fatfrac<-100*human$fatalities/ttotal
injfrac<-100*human$injuries/ttotal
totfrac<-100*human$total/ttotal
# calculate sum contribution of top 5.
top10frac<-sum(fatfrac[2:11])
top10inj<-sum(injfrac[2:11])
top10tot<-sum(totfrac[2:11])
# combine for reults table
human<-cbind(human,fatfrac,injfrac,totfrac)
human$event<-as.character(human$event)
humantop10<-human[1:11,]
names(humantop10)[names(humantop10)=="fatfrac"] <- "Fatalities"
names(humantop10)[names(humantop10)=="injfrac"] <- "Injuries"
# melt data for plot
library(reshape2)
hmelt<-melt(humantop10,id="event",measure.vars=c("Fatalities","Injuries"))
hmelt$event<-as.factor(hmelt$event)
hmelt$variable<-as.factor(hmelt$variable)
hmelt<-arrange(hmelt,desc(variable),desc(value))
## set the levels in order we want
hmelt <- transform(hmelt,
event.ord = factor(
event ,
levels=c( 'Hurricane','Winter Storm','High Wind','Ice Storm','Other','Lightning','Flood','Thunderstorm Wind', 'Heat','Tornado','Total'),
ordered =TRUE))
#find property and crop damage value for each event type
propdmg<-sd.redd %>% group_by(event) %>% summarise(propdmg=sum(PROPDMG))
cropdmg<-sd.redd %>% group_by(event) %>% summarise(cropdmg=sum(CROPDMG))
damage<-left_join(propdmg,cropdmg,by="event")
damage<-mutate(damage,totald=propdmg+cropdmg)
# add a totals row at the top
damage<-arrange(damage,desc(totald))
totalsd<-data.frame("Total",tpropdmg,tcropdmg,dtotal)
names(totalsd)<-names(damage)
damage<-rbind(totalsd,damage)
# calculate each row as % of total
propfrac<-100*damage$propdmg/dtotal
cropfrac<-100*damage$cropdmg/dtotal
dtotfrac<-100*damage$totald/dtotal
# calculate sum contribution of top 5.
dtop10prop<-sum(propfrac[2:11])
dtop10crop<-sum(cropfrac[2:11])
dtop10tot<-sum(dtotfrac[2:11])
# combine for reults table
damage<-cbind(damage,propfrac,cropfrac,dtotfrac)
damage$event<-as.character(damage$event)
damagetop10<-damage[1:11,]
names(damagetop10)[names(damagetop10)=="propfrac"] <- "Property"
names(damagetop10)[names(damagetop10)=="cropfrac"] <- "Crops"
# melt data for plot
library(reshape2)
dmelt<-melt(damagetop10,id="event",measure.vars=c("Property","Crops"))
dmelt$event<-as.factor(dmelt$event)
dmelt$variable<-as.factor(dmelt$variable)
dmelt<-arrange(dmelt,desc(variable),desc(value))
## set the levels in order we want
dmelt <- transform(dmelt,
eventd.ord = factor(
event ,
levels=c( 'Drought','High Wind','Wildfire','Storm Surge/Tide', 'Thunderstorm Wind','Ice Storm','Hail','Tornado','Hurricane','Flood','Total'),
ordered =TRUE))
The top 10 weather event types in respect of total fatalities and injuries caused in the US since 1950 are shown in Figure 2 below.
#g<-ggplot(data=hmelt, aes(x=event, y=value, fill=variable)) +
# geom_bar(stat="identity", position=position_dodge())
g<-ggplot(data=hmelt,
aes(x=event.ord,y=value,fill=variable,order=event.ord))+
geom_bar(stat="identity")+
#facet_wrap(~variable)+
coord_flip()+
scale_y_continuous(breaks = seq(0, 100, 10))+
theme(axis.text.x = element_text(size=14),
axis.text.y=element_text(size=14))+
labs(y = "% of total fatalities and injuries")+
theme(axis.title.x = element_text(size=14,vjust=-.5),
axis.title.y=element_blank())+
theme(legend.text=element_text(size=12),
legend.title = element_blank())+
theme(legend.position=c(.8, .2))
g
Figure 2: The top 10 categories for relative impact of weather events on human health by weather type, as measured by fatalities and injuries caused.
Of the 48 weather event types recognised by NOAA, 94 % of all fatalities and injuries have been caused by these 10 categories.
The top 10 weather event types in respect of total economic impact caused in the US since 1950 are shown in Figure 3 below, subdivided by value of damage to property and by damage to crops.
d<-ggplot(data=dmelt,
aes(x=eventd.ord, y=value,fill=variable,order=eventd.ord))+
geom_bar(stat="identity")+
#facet_wrap(~variable)+
coord_flip()+
scale_y_continuous(breaks = seq(0, 100, 10))+
theme(axis.text.x = element_text(size=14),
axis.text.y=element_text(size=14))+
labs(y = "% of total value of damage caused")+
theme(axis.title.x = element_text(size=14,vjust=-.5),
axis.title.y=element_blank())+
theme(legend.text=element_text(size=12),
legend.title = element_blank())+
theme(legend.position=c(.8, .2))
d
Figure 3: The top 10 categories for relative conomic damage to property and crops caused by weather events, by weather type.
Of the 48 weather event types recognised by NOAA, 97 % of all damage value to property and crops have been caused by these 10 categories.
# clean workspace
rm(list=ls())