This report analyses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) database of storm and weather events recorded in the United States from 1950 to 2011. It categorises each event and then reports, for each category, the impact on population health and the damage to property and crops across the entire country. It makes no attempt to break the results down by state; that will be performed in a later analysis.
Using the categories generated by the program, we aggregate population health impacts and property and crop damage across all events. The results are presented in graphical and tabular form.
It can be concluded from this analysis that wind-based events have the most negative impact on population health, whereas damage to property and crops is more evenly spread across wind-based and flood-based events. Property damage outweighs crop damage by a factor of roughly 9:1 in terms of cost.
The data contained in repdata-data-StormData.csv was downloaded from the NOAA web site (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) and decompressed into the working folder. We load the CSV file into a dataframe and then use pattern matching (grepl and agrepl) to categorise the events based on the event type recorded in the database. Some information regarding the contents of the file can be gleaned from (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf).
The event types stored in the database are somewhat fuzzy in nature and, being hand generated, in some cases contain misspellings and other inaccuracies. We clean the dataset by re-categorising events using the information embedded in the event types.
This code specifically categorises events into wind, hot weather, cold weather, flood, maritime, meteorological, volcanic, and miscellaneous, as follows:
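As a minimal sketch of the loading step: read.csv can read a bzip2-compressed file through a bzfile() connection, so decompressing by hand is optional. The helper name read_storm_data and the tiny demo file are hypothetical and stand in for the real NOAA download.

```r
# Hypothetical helper: read a bzip2-compressed CSV into a data frame.
read_storm_data <- function(path) {
  # bzfile() opens a bzip2 connection; read.csv() decompresses while reading
  read.csv(bzfile(path), stringsAsFactors = FALSE)
}

# Self-contained demo on a tiny made-up file (not the NOAA data)
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,3",
             "FLASH FLOOD,0,2"), con)
close(con)

demo <- read_storm_data(demo_path)
str(demo)
```

For the real data, the same helper could be pointed at the downloaded .csv.bz2 file instead of the demo path.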
Event Category 1: Wind events (any event associated with high wind speeds, no matter how generated) - wind, storm, tornado, hurricane, dust devil, typhoon, waterspout, funnel cloud, microburst, etc.
Event Category 2: Hot weather events (any event associated with hot temperatures) - heat, fire, drought, warm, dry, etc.
Event Category 3: Cold weather events (any event associated with cold temperatures) - cold, snow, ice, hypothermia, freez, frost, low temp, winter, wintry, hail, blizzard, etc.
Event Category 4: Flood events - flood, drowning, water, slides, slumps, rain, stream fld, etc.
Event Category 5: Maritime events - any event occurring at sea/coast which is not storm/wind generated - marine, current, surf, seas, wave, swell, tide, tsunami, etc.
Event Category 6: Meteorological events (events which are not directly associated with wind, cold, or heat) - fog, mist, precip, torrential, lightning, showers, etc.
Event Category 7: Volcanic events - ash, vog, etc.
Event Category 8: Miscellaneous events - all other events that could not be categorised as above
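The categorisation scheme above can be sketched as an ordered list of patterns in which the first match decides the category. This is an illustration only: the function name categorise, the rules vector, and the shortened pattern lists are assumptions and do not reproduce the full set of patterns used in this report.

```r
# Ordered rules: the first pattern that matches decides the category.
# Illustrative subset of patterns only (hypothetical, shortened).
rules <- c(
  WIND  = "tornado|storm|hurricane|wind",
  HEAT  = "heat|fire|drought",
  COLD  = "cold|snow|hail|blizzard",
  FLOOD = "flood|rain"
)

categorise <- function(evtype, rules) {
  # everything starts as miscellaneous and is overwritten on first match
  category <- rep("MISCELLANEOUS", length(evtype))
  unassigned <- rep(TRUE, length(evtype))
  for (name in names(rules)) {
    hit <- unassigned & grepl(rules[[name]], evtype, ignore.case = TRUE)
    category[hit] <- name
    unassigned[hit] <- FALSE
  }
  category
}

events <- c("TORNADO", "Flash Flood", "EXCESSIVE HEAT", "Heavy Snow", "?")
categorise(events, rules)
```

Because earlier rules take precedence, the order of the rules matters: an event type such as "THUNDERSTORM WIND/HAIL" would be classed as WIND here, since the wind rule is checked before the cold rule.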
Once categorised, the population health impacts and the property and crop damage are summed for each category.
Extensive use is made of the dataframe functionality provided in the dplyr package.
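A minimal sketch of the per-category aggregation with dplyr, using a few made-up rows. The column names follow the storm data, but the values and the object names events and casualty_summary are purely illustrative.

```r
library(dplyr)

# Made-up example rows; not taken from the NOAA database
events <- data.frame(
  EventCategory = c("WIND", "WIND", "FLOOD", "HEAT"),
  FATALITIES    = c(2, 0, 1, 3),
  INJURIES      = c(10, 5, 0, 4)
)

casualty_summary <- events %>%
  group_by(EventCategory) %>%                       # one group per category
  summarise(Total_Fatalities = sum(FATALITIES),
            Total_Injuries   = sum(INJURIES)) %>%
  mutate(Total_Casualties = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Casualties))                   # largest impact first

print(as.data.frame(casualty_summary))
```

The result has one row per category, sorted so that the category with the most casualties comes first.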
The summed data is tabulated and plotted as bar graphs using the ggplot2 package. Bar graphs were selected as one of the easiest ways to read the quantitative data.
Table of population health impacts, in thousands of casualties.
##    EventCategory Total_Casualties Total_Fatalities Total_Injuries
## 1           WIND          115.844            7.411        108.433
## 2           HEAT           14.123            3.268         10.855
## 3          FLOOD           10.888            1.736          9.152
## 4 METEOROLOGICAL            7.232            0.899          6.333
## 5           COLD            6.003            1.058          4.945
## 6         MARINE            1.578            0.773          0.805
## 7  MISCELLANEOUS            0.005            0.000          0.005
Across the United States, wind-based events are by far the most significant in terms of negative impact on population health, in both fatalities and non-fatal injuries.
Table of Damages in billions of dollars.
##    EventCategory Total_Damages Total_Property_Damages Total_Crop_Damages
## 1           WIND  241.38870091           227.71988865        13.66881226
## 2          FLOOD  184.63718976           171.28039776        13.35679200
## 3           HEAT   24.84848716             9.56816025        15.28032691
## 4           COLD   24.46125156            17.67611731         6.78513425
## 5 METEOROLOGICAL    0.96499137             0.95289928         0.01209209
## 6         MARINE    0.11925550             0.11925550         0.00000000
## 7  MISCELLANEOUS    0.00246045             0.00142605         0.00103440
## 8       VOLCANIC    0.00050000             0.00050000         0.00000000
Wind and flood events cause the greatest property damage. For crops, wind, flood, and heat events are all major contributors, with heat (fire and drought) playing the most significant part. Overall, property damage outweighs crop damage by close to an order of magnitude in dollar terms.
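The ratio of property damage to crop damage can be checked directly from the damages table. The sketch below copies the per-category figures (in billions of dollars) from that table; the vector names property and crop are only illustrative.

```r
# Per-category totals in billions of dollars, copied from the damages table
property <- c(WIND = 227.71988865, FLOOD = 171.28039776, HEAT = 9.56816025,
              COLD = 17.67611731, METEOROLOGICAL = 0.95289928,
              MARINE = 0.11925550, MISCELLANEOUS = 0.00142605,
              VOLCANIC = 0.00050000)
crop <- c(WIND = 13.66881226, FLOOD = 13.35679200, HEAT = 15.28032691,
          COLD = 6.78513425, METEOROLOGICAL = 0.01209209,
          MARINE = 0, MISCELLANEOUS = 0.00103440, VOLCANIC = 0)

total_property <- sum(property)          # about 427.3
total_crop     <- sum(crop)              # about 49.1
ratio <- total_property / total_crop     # about 8.7
round(c(property = total_property, crop = total_crop, ratio = ratio), 1)
```

On these figures, property damage is about 8.7 times crop damage, which is slightly below a full order of magnitude.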
#load packages
library(Rmisc)
library(ggplot2)
library(dplyr)
#read data into system
StormData <- read.csv("repdata-data-StormData.csv")
#select a subset of data to process for casualties & damages
SD<-select(StormData,COUNTY,COUNTYNAME,STATE,FATALITIES,INJURIES,EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS)
SD$EventCategory<-as.character(SD$EVTYPE)
#Clean event types and categorise events cleanly
#use grep to categorise events as WIND, COLD WEATHER, HOT WEATHER, FLOOD, MARINE, METEOROLOGICAL, VOLCANIC, MISCELLANEOUS
SD$EventCategory[agrepl("tornado", SD$EventCategory, ignore.case=TRUE)] <- "1"
SD$EventCategory[agrepl("hypothermia", SD$EventCategory, ignore.case=TRUE)] <- "3"
SD$EventCategory[grepl("volc|vog", SD$EventCategory, ignore.case=TRUE)] <- "7"
SD$EventCategory[agrepl("avalanche", SD$EventCategory, ignore.case=TRUE)] <- "3"
SD$EventCategory[grepl("gustnado|turbulence|dust|storm|depression|hurricane|cloud|typhoon|high wind|strong wind|burst|spout|funnel",
SD$EventCategory, ignore.case=TRUE)] <- "1"
SD$EventCategory[grepl("hot|high temp|heat|fire|drought|warm|smoke",
SD$EventCategory, ignore.case=TRUE)] <- "2"
SD$EventCategory[grepl("cool|cold|snow|sleet|Freez|glaze|Frost|low temp|winter|wintry|hail|blizzard",
SD$EventCategory, ignore.case=TRUE)] <- "3"
SD$EventCategory[grepl("small str|Dam|Drown|Water|slide|slump|rain|stream fld|wet|seiche",
SD$EventCategory, ignore.case=TRUE)] <- "4"
SD$EventCategory[agrepl("flood",SD$EventCategory, ignore.case=TRUE)] <- "4"
SD$EventCategory[grepl("Marine|Current|SURF|TSUNAMI|SEAS|Wave|Swell|Tide|Coast|beach",
SD$EventCategory, ignore.case=TRUE)] <- "5"
SD$EventCategory[grepl("fog|mist|precip|torrent|lig|shower",
SD$EventCategory, ignore.case=TRUE)] <- "6"
SD$EventCategory[agrepl("wind", SD$EventCategory, ignore.case=TRUE)] <- "1"
SD$EventCategory[agrepl("dry", SD$EventCategory, ignore.case=TRUE)] <- "2"
SD$EventCategory[grepl("ice|icy", SD$EventCategory, ignore.case=TRUE)] <- "3"
#All events that could not be categorised as above
SD$EventCategory[grepl("other", SD$EventCategory, ignore.case=TRUE)] <- "8"
#any event type that has not been reduced to a category code is miscellaneous
#(match the whole string so event types that merely contain a digit are not mistaken for codes)
SD$EventCategory[!grepl("^[1-8]$", SD$EventCategory)] <- "8"
#name event categories (exact match on the category code)
SD$EventCategory[SD$EventCategory == "1"] <- "WIND"
SD$EventCategory[SD$EventCategory == "2"] <- "HEAT"
SD$EventCategory[SD$EventCategory == "3"] <- "COLD"
SD$EventCategory[SD$EventCategory == "4"] <- "FLOOD"
SD$EventCategory[SD$EventCategory == "5"] <- "MARINE"
SD$EventCategory[SD$EventCategory == "6"] <- "METEOROLOGICAL"
SD$EventCategory[SD$EventCategory == "7"] <- "VOLCANIC"
SD$EventCategory[SD$EventCategory == "8"] <- "MISCELLANEOUS"
SD$EventCategory<-as.factor(SD$EventCategory)
SD<-group_by(SD,EventCategory)
#total casualties per event (fatalities plus injuries)
SD$TotalCasualties<-SD$FATALITIES+SD$INJURIES
#only consider events with at least one casualty
ReducedCasualtyData<-SD[SD$TotalCasualties>0,]
ReducedCasualtyData<-group_by(ReducedCasualtyData,EventCategory)
#sum casualties per event category, in thousands
ReducedCasSDsummary<-summarise(ReducedCasualtyData,Total_Casualties = sum(TotalCasualties)/1000,
Total_Fatalities = sum(FATALITIES)/1000, Total_Injuries =sum(INJURIES)/1000)
#sort from largest to smallest
ReducedCasSDsummary<-arrange(ReducedCasSDsummary,desc(Total_Casualties))
#only consider data where there is actual property or crop damage recorded
ReducedData<-SD[(SD$CROPDMG+SD$PROPDMG)>0,]
#convert prop damage and crop damage exponents to a value for that exponent
#convert to character first so the assignments below also work if the column was read as a factor
ReducedData$PROPDMGEXP<-as.character(ReducedData$PROPDMGEXP)
#reduce the misentries (digits and symbols) to returning 0
ReducedData$PROPDMGEXP[grepl("1|2|3|4|5|6|7|8|\\?|-| |\\+|]",ReducedData$PROPDMGEXP)]<-"0"
#convert exponent codes to actual values for property damage
ReducedData$PROPDMGEXP[grepl("h",ReducedData$PROPDMGEXP,ignore.case=TRUE)]<-"100"
ReducedData$PROPDMGEXP[grepl("k",ReducedData$PROPDMGEXP,ignore.case=TRUE)]<-"1000"
ReducedData$PROPDMGEXP[grepl("m",ReducedData$PROPDMGEXP,ignore.case=TRUE)]<-"1000000"
ReducedData$PROPDMGEXP[grepl("b",ReducedData$PROPDMGEXP,ignore.case=TRUE)]<-"1000000000"
ReducedData$PROPDMGEXP<-as.numeric(ReducedData$PROPDMGEXP)
ReducedData$PROPDMGEXP[is.na(ReducedData$PROPDMGEXP)]<-0
#convert exponent codes to actual values for crop damage
#convert to character first, then reduce the misentries to returning 0
ReducedData$CROPDMGEXP<-as.character(ReducedData$CROPDMGEXP)
ReducedData$CROPDMGEXP[grepl("1|2|3|4|5|6|7|8|\\?|-| |\\+|]",ReducedData$CROPDMGEXP)]<-"0"
ReducedData$CROPDMGEXP[grepl("h",ReducedData$CROPDMGEXP,ignore.case=TRUE)]<-"100"
ReducedData$CROPDMGEXP[grepl("k",ReducedData$CROPDMGEXP,ignore.case=TRUE)]<-"1000"
ReducedData$CROPDMGEXP[grepl("m",ReducedData$CROPDMGEXP,ignore.case=TRUE)]<-"1000000"
ReducedData$CROPDMGEXP[grepl("b",ReducedData$CROPDMGEXP,ignore.case=TRUE)]<-"1000000000"
ReducedData$CROPDMGEXP<-as.numeric(ReducedData$CROPDMGEXP)
ReducedData$CROPDMGEXP[is.na(ReducedData$CROPDMGEXP)]<-0
#multiply value by its exponent multiplier to get the actual value for each event, in billions of dollars
ReducedData$PropertyDamages<-ReducedData$PROPDMG*ReducedData$PROPDMGEXP/1000000000
ReducedData$CropDamages<-ReducedData$CROPDMG*ReducedData$CROPDMGEXP/1000000000
#Calculate total damages
ReducedData$TotalDamages<-ReducedData$PropertyDamages+ReducedData$CropDamages
#Group by Event Category
ReducedData<-group_by(ReducedData,EventCategory)
#sum damages across all event types
ReducedSDsummary<-summarise(ReducedData,Total_Damages = sum(TotalDamages),
Total_Property_Damages = sum(PropertyDamages), Total_Crop_Damages =sum(CropDamages))
#sort from largest to smallest
ReducedSDsummary<-arrange(ReducedSDsummary,desc(Total_Damages))
#Population health impacts
#order the categories by total casualties, largest first
ReducedCasSDsummary <- transform(ReducedCasSDsummary,
EventCategory = reorder(EventCategory, -Total_Casualties))
p1<-ggplot(ReducedCasSDsummary,aes(EventCategory, y=Total_Casualties))+geom_bar(stat = "identity", fill="lightblue",colour = "darkblue")+coord_flip()+
ylab("Total Casualties (1000's)") + xlab("Event Category")
p2<-ggplot(ReducedCasSDsummary,aes(EventCategory, y=Total_Fatalities))+geom_bar(stat = "identity", fill="lightblue",colour = "darkblue")+coord_flip()+
ylab("Total Fatalities (1000's)") + xlab("Event Category")
p3<-ggplot(ReducedCasSDsummary,aes(EventCategory, y=Total_Injuries))+geom_bar(stat = "identity", fill="lightblue",colour = "darkblue")+coord_flip()+
ylab("Total Injuries (1000's)") + xlab("Event Category")
#draw all three plots on one page (multiplot draws the plots itself; wrapping it in print() only prints NULL)
multiplot(p1,p2,p3)
#print the casualty table
print.data.frame(ReducedCasSDsummary)
#Damages
#order the categories by total damages, largest first
ReducedSDsummary <- transform(ReducedSDsummary,
EventCategory = reorder(EventCategory, -Total_Damages))
p1<-ggplot(ReducedSDsummary,aes(EventCategory, y=Total_Damages))+geom_bar(colour = "darkblue", stat = "identity", fill="lightblue")+coord_flip()+
ylab("Total Damages (Billion $)") + xlab("Event Category")
p2<-ggplot(ReducedSDsummary,aes(EventCategory, y=Total_Property_Damages))+geom_bar(colour = "darkblue", stat = "identity", fill="lightblue")+coord_flip()+
ylab("Total Property Damages (Billion $)") + xlab("Event Category")
p3<-ggplot(ReducedSDsummary,aes(EventCategory, y=Total_Crop_Damages))+geom_bar(colour = "darkblue", stat = "identity", fill="lightblue")+coord_flip()+
ylab("Total Crop Damages (Billion $)") + xlab("Event Category")
#draw all three plots on one page (multiplot draws the plots itself; wrapping it in print() only prints NULL)
multiplot(p1,p2,p3)
#print the damages table
print.data.frame(ReducedSDsummary)
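The exponent handling in the listing above can also be expressed as a single lookup table. This is a sketch of an alternative formulation, not the code used to produce the tables in this report; the function name exp_multiplier and the sample values are hypothetical. As in the listing, any code other than H, K, M, or B (in either case) is treated as a mis-entry and given a multiplier of 0.

```r
# Map exponent codes to multipliers; any code not listed counts as 0,
# mirroring the listing's choice to treat other codes as mis-entries.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 0   # empty strings, digits and symbols end up here
  mult
}

# Made-up sample values, for illustration only
codes   <- c("K", "m", "B", "", "5", "?")
amounts <- c(25, 1.5, 2, 10, 3, 7)
damage_usd <- amounts * exp_multiplier(codes)
damage_usd
```

A lookup table keeps the code-to-multiplier mapping in one place, so the same function could be applied to both PROPDMGEXP and CROPDMGEXP.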