Weather data for NOAA-tracked events dating back to 1950 was analyzed to identify the largest potential hazards. Hazards were grouped into 21 weather event categories, and each category was compared for fatalities, injuries, property damage and crop damage. While different types of events are worst for different types of damage, hurricanes ranked in the top 5 on every measure. Tsunamis, which are more of a geological than a weather issue, are the single largest cause of fatalities and injuries per event.
This section will cover the creation of a tidy data subset from the original NOAA weather data. The code is written in R.
Downloading the data: The data originally came from cloudfront.net as a bzip2-compressed CSV file (StormData.csv.bz2). The commented-out lines below download that file into a subfolder called “data” and read it directly; read.csv decompresses .bz2 files on the fly, so no separate unzip step is needed. Those lines are commented out so the file is not downloaded again on every run. The uncommented line instead reads a copy of the CSV that was previously extracted into the same “data” folder.
knitr::opts_chunk$set(cache=TRUE)
# Columns to keep: STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
storm_classes<-c(rep("NULL",6),"character","character",rep("NULL",14),rep("numeric",3),"character","numeric","character",rep("NULL",9))
# Uncomment to download the compressed file and read it directly (read.csv decompresses .bz2 on the fly)
#if (!dir.exists("data")) dir.create("data")
#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="./data/stormdata.csv.bz2",method="curl")
#data<-read.csv("./data/stormdata.csv.bz2",header=TRUE,na.strings="NA",colClasses=storm_classes)
# Read a previously extracted copy of the CSV
data<-read.csv("./data/repdata-data-StormData.csv",header=TRUE,na.strings="NA",colClasses=storm_classes)
Note that the code above does not read in the entire file; instead, colClasses keeps only 8 of the file's 37 columns so that the data takes up less memory. These columns are STATE, EVTYPE (the type of weather event), FATALITIES, INJURIES, PROPDMG (the numeric part of the property damage), PROPDMGEXP (a letter code, B, M or K, indicating whether PROPDMG is in billions, millions or thousands of dollars), CROPDMG (the numeric part of the crop damage) and CROPDMGEXP (the equivalent of PROPDMGEXP for crop damage).
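To see how colClasses selects those 8 columns, the sketch below rebuilds the same vector and checks its length and which positions are kept. It also writes a tiny, made-up bzip2-compressed CSV to a temporary file and reads it back with read.csv, dropping one column with "NULL", which is the same mechanism the commented-out lines rely on when they read stormdata.csv.bz2 directly. The toy file contents and the name storm_classes are illustrative only.

```r
# Same colClasses vector as in the read.csv call above (stored under an illustrative name)
storm_classes <- c(rep("NULL", 6), "character", "character", rep("NULL", 14),
                   rep("numeric", 3), "character", "numeric", "character",
                   rep("NULL", 9))
length(storm_classes)           # 37: one entry per column in the file
which(storm_classes != "NULL")  # 7 8 23 24 25 26 27 28: the columns actually kept

# Made-up three-column file, compressed with bzip2 like the NOAA download
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("STATE__,EVTYPE,FATALITIES",
             "1,TORNADO,2",
             "48,HAIL,0"), con)
close(con)

# read.csv opens the file through file(), which detects bzip2 compression itself;
# "NULL" skips the first column entirely
toy <- read.csv(tmp, header = TRUE,
                colClasses = c("NULL", "character", "numeric"))
str(toy)
unlink(tmp)
```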
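The data was then preprocessed to remove incomplete rows. Blank strings are first converted to NA so that complete.cases() treats them as missing, and any row with a missing value is dropped.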
# this fills all blanks with NA
data[data==""]=NA
# this drops the number of rows from 902297 to 279566
data2<-complete.cases(data)
data<-data[data2,]
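The toy example below, with made-up rows, shows what the two steps above do: blank strings become NA, and complete.cases() then drops any row containing an NA. Note that a row is dropped even when only its exponent code is blank, as with the FLOOD row here, which has zero damage and an empty PROPDMGEXP.

```r
# Made-up rows in the same shape as the storm data columns used here
toy <- data.frame(EVTYPE     = c("HAIL", "FLOOD", "TORNADO"),
                  PROPDMG    = c(5, 0, 2),
                  PROPDMGEXP = c("K", "", "M"),
                  stringsAsFactors = FALSE)

# Step 1: turn empty strings into NA (the numeric column is compared as text, never "")
toy[toy == ""] <- NA

# Step 2: keep only rows with no missing values in any column
keep <- complete.cases(toy)
toy <- toy[keep, ]
toy$EVTYPE  # "HAIL" "TORNADO"
```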
Initially, the event type field records over 100 different “types” of events. For example, every hurricane and tropical storm is listed individually by name. In addition, many similar events have been labeled slightly differently over the years, so several categories refer to essentially the same kind of event.
The next preprocessing step was to group similar events into fewer categories that could be analyzed and understood more easily. The goal of these groupings was to combine like events so that the resulting statistics describe the outcome of each TYPE of event. Because the grep() replacements run in sequence, their order matters: once an event has been relabeled, the later patterns no longer match it.
# Hail is kept separate from thunderstorms because it causes a specific type of damage
w<-grep("HAIL",data$EVTYPE)
data$EVTYPE[w]<-c("HAIL")
# Winter storms include blizzards, things that bring ice, snow, sleet or other winter storms
ws<-grep("WINTER|SNOW|BLIZZARD|ICE STORM|GLAZE ICE|ICY ROADS|FREEZING FOG|SLEET",data$EVTYPE)
data$EVTYPE[ws]<-c("WINTER STORM")
# putting this before Wind so that wind chill gets classified as cold not wind
cold<- grep("COLD|WIND CHILL",data$EVTYPE)
data$EVTYPE[cold]<-c("COLD")
# make sure thunderstorm is before wind
thunderstorm<-grep("THUNDERSTORM|LIGHTNING|RAIN|RAINS|TSTM|MICROBURST",data$EVTYPE)
data$EVTYPE[thunderstorm]<-c("THUNDERSTORM")
# note this will also catch Marine Strong Wind and Marine High Wind
wind<-grep("WINDS|WIND DAMAGE|STRONG WIND|GUSTY WINDS|HIGH WIND",data$EVTYPE)
data$EVTYPE[wind]<-c("WIND")
# includes tropical storms of all names and hurricanes of all names
hurricane<-grep("HURRICANE|TYPHOON|TROPICAL STORM|TROPICAL DEPRESSION|STORM SURGE",data$EVTYPE)
data$EVTYPE[hurricane]<-c("HURRICANE")
# Frost is kept separate from extreme cold because it is a crop-damage problem rather than a health problem
frost<-grep("FROST|FREEZE|Frost", data$EVTYPE)
data$EVTYPE[frost]<-c("FROST")
# Tornadoes have several names as well
tornado<-grep("TORNADO|FUNNEL CLOUD|WATERSPOUT|GUSTNADO",data$EVTYPE)
data$EVTYPE[tornado]<-c("TORNADO")
# Floods
flood<-grep("FLOOD|River Flooding|URBAN",data$EVTYPE)
data$EVTYPE[flood]<-c("FLOOD")
# fog
fog<-grep("FOG",data$EVTYPE)
data$EVTYPE[fog]<-c("FOG")
#heat
heat<-grep("HEAT",data$EVTYPE)
data$EVTYPE[heat]<-c("HEAT")
# Fires and forest fires
fire<-grep("WILD/FOREST FIRE|WILDFIRES|FOREST FIRES|WILD FIRE|DENSE SMOKE|WILDFIRE",data$EVTYPE)
data$EVTYPE[fire]<-c("FIRE")
# These are hazards encountered at the beach
beach<-grep("ASTRONOMICAL|SURF|RIP|Surf",data$EVTYPE)
data$EVTYPE[beach]<-c("BEACH HAZARD")
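Because each grep() call above relabels matches before the next pattern runs, the rules form an ordered list. The sketch below expresses a simplified subset of those rules (the patterns are shortened, and the names rules and regroup are hypothetical) as a named vector applied in a loop, and shows why the comments insist that cold and thunderstorm come before wind: reversing the order files "THUNDERSTORM WINDS" under WIND.

```r
# Simplified subset of the grouping rules above, in the same order; names are labels,
# values are the regular expressions that map onto them
rules <- c(
  COLD         = "COLD|WIND CHILL",
  THUNDERSTORM = "THUNDERSTORM|TSTM|LIGHTNING",
  WIND         = "WINDS|HIGH WIND|STRONG WIND"
)

# Apply each rule in turn; an event relabeled by an earlier rule keeps that label
# unless its new label happens to match a later pattern
regroup <- function(evtype, rules) {
  for (label in names(rules)) {
    hits <- grep(rules[[label]], evtype)
    evtype[hits] <- label
  }
  evtype
}

x <- c("EXTREME WIND CHILL", "THUNDERSTORM WINDS", "HIGH WINDS", "TSTM WIND")
regroup(x, rules)       # "COLD" "THUNDERSTORM" "WIND" "THUNDERSTORM"
regroup(x, rev(rules))  # "COLD" "WIND" "WIND" "THUNDERSTORM"
```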
These groupings result in 21 event categories. The following code converts the grouped labels to a factor and lists the resulting categories, shown below.
data$event<-as.factor(data$EVTYPE)
unique(data$event)
## [1] WIND THUNDERSTORM HURRICANE TORNADO
## [5] HAIL FLOOD WINTER STORM HEAT
## [9] COLD FIRE DROUGHT BEACH HAZARD
## [13] FROST FOG DUST STORM LANDSLIDE
## [17] TSUNAMI AVALANCHE SEICHE DUST DEVIL
## [21] VOLCANIC ASHFALL
## 21 Levels: AVALANCHE BEACH HAZARD COLD DROUGHT DUST DEVIL ... WINTER STORM
The next step was to combine each damage value with its exponent code, turning the four damage columns into two columns (PROPERTY and CROP) that hold the dollar value of each event. This was complicated by some exponent codes that were not the expected B, M or K values; the code below replaces those codes with 0, so the corresponding damage is counted as zero.
# now get the exponents converted to numbers
prop<-grep("B|b",data$PROPDMGEXP)
data$PROPDMGEXP[prop]<-c("1000000000")
prop1<-grep("M|m",data$PROPDMGEXP)
data$PROPDMGEXP[prop1]<-c("1000000")
prop2<-grep("K|k",data$PROPDMGEXP)
data$PROPDMGEXP[prop2]<-c("1000")
prop3<-grep("5|3",data$PROPDMGEXP)
data$PROPDMGEXP[prop3]<-c("0")
crop<-grep("B|b", data$CROPDMGEXP)
data$CROPDMGEXP[crop]<-c("1000000000")
crop1<-grep("M|m",data$CROPDMGEXP)
data$CROPDMGEXP[crop1]<-c("1000000")
crop2<-grep("K|k",data$CROPDMGEXP)
data$CROPDMGEXP[crop2]<-c("1000")
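# Zero out the remaining crop exponents that were not B, M or K.
# These are row positions in the filtered data, so they must be rechecked if the filtering changes.
data$CROPDMGEXP[144]<-c("0")
data$CROPDMGEXP[155]<-c("0")
data$CROPDMGEXP[1988]<-c("0")
data$CROPDMGEXP[2159]<-c("0")
data$CROPDMGEXP[2609]<-c("0")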
data$CROPDMGEXP=as.numeric(data$CROPDMGEXP)
data$PROPDMGEXP=as.numeric(data$PROPDMGEXP)
data$PROPERTY<-data$PROPDMG * data$PROPDMGEXP
data$CROP<-data$CROPDMG * data$CROPDMGEXP
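The grep() replacements above can also be written as a lookup table. The sketch below is a hypothetical alternative, not the code used for the results: exp_multiplier() maps B, M and K (in either case) to their dollar multipliers and, mirroring the choice above to zero out codes other than B, M and K, returns 0 for any other code. The damage values are made up.

```r
# Hypothetical helper: convert exponent codes to dollar multipliers via a named lookup
exp_multiplier <- function(code) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3)
  m <- unname(lookup[toupper(code)])  # unknown codes index to NA
  m[is.na(m)] <- 0                    # treat unrecognized codes as zero damage
  m
}

# Made-up damage values and codes
propdmg    <- c(2.5, 10, 7, 1)
propdmgexp <- c("B", "k", "5", "?")
propdmg * exp_multiplier(propdmgexp)  # 2.5 billion, 10 thousand, 0, 0
```

Applied to the raw exponent columns in place of the grep() steps, this would read `data$PROPERTY <- data$PROPDMG * exp_multiplier(data$PROPDMGEXP)`, with no dependence on hard-coded row positions.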
The last preprocessing step was to split the data by event type and calculate, for each type, the average property damage, crop damage, fatalities and injuries per event.
# Split and create means by event type for crop, property damage, deaths and injuries
dataEVENT<-split(data,data$event)
PropertyDamage<-sapply(dataEVENT,function(x) colMeans(x[c("PROPERTY")]))
CropDamage<-sapply(dataEVENT,function(x) colMeans(x[c("CROP")]))
deaths<-sapply(dataEVENT,function(x) colMeans(x[c("FATALITIES")]))
injuries<-sapply(dataEVENT,function(x) colMeans(x[c("INJURIES")]))
EVENT<-names(dataEVENT)
summarydf=data.frame(EVENT,PropertyDamage,CropDamage,deaths,injuries)
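The split()/sapply() pattern above computes one mean per column per event type. As a sketch of an equivalent, more compact approach, base R's aggregate() with a cbind() formula produces the same kind of per-event means in a single call; the toy values below are made up.

```r
# Made-up events with already-computed dollar damages
toy <- data.frame(
  event      = factor(c("HAIL", "HAIL", "FLOOD")),
  PROPERTY   = c(1000, 3000, 5e6),
  CROP       = c(0, 500, 0),
  FATALITIES = c(0, 0, 1),
  INJURIES   = c(1, 0, 2)
)

# One row per event type, one mean per damage/casualty column
summary_toy <- aggregate(cbind(PROPERTY, CROP, FATALITIES, INJURIES) ~ event,
                         data = toy, FUN = mean)
summary_toy
```

On the real data the call would use `data = data`. One difference to keep in mind: the formula interface drops rows with NA in any listed column by default, whereas colMeans() would return NA for that column.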
The economic impact of weather events can justify spending money to predict, prepare for and mitigate them. Because there are more types of events than there is money to address them, it is important to prioritize spending on the most significant risks.
Hurricanes and related events, such as tropical storms and depressions that can strengthen into hurricanes, cause the highest average property damage per event. With global warming affecting the strength and frequency of these events, learning to predict storm paths and strengths and developing hurricane-proof shorelines should be a priority. Tsunamis rank second and floods third, but both cause far less average property damage than hurricanes. The following graph shows the top 5 “weather”-related events.
# sort on the summary table's own column rather than the separate PropertyDamage vector
worstProp<-summarydf[order(-summarydf$PropertyDamage),]
worstProptop<-worstProp[1:5,]
barplot(worstProptop$PropertyDamage,names.arg=worstProptop$EVENT,xlab="Type of Event",ylab="Average property damage by event ($)", main="Average Property Damage in Dollars")
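The same sort-then-take-five step is repeated for every chart in this section. A small helper, here called top_events() (a hypothetical name, not part of the original code), sorts on a named column of the summary table itself rather than on a separate vector, so the ordering always refers to the rows being plotted.

```r
# Hypothetical helper: the n rows with the largest values in one column, largest first
top_events <- function(df, col, n = 5) {
  head(df[order(df[[col]], decreasing = TRUE), ], n)
}

# Made-up summary table, used only to show the helper's behavior
toy <- data.frame(EVENT = c("HAIL", "FLOOD", "HURRICANE", "FOG"),
                  PropertyDamage = c(2e4, 1e6, 3e8, 0),
                  stringsAsFactors = FALSE)
top_events(toy, "PropertyDamage", n = 2)$EVENT  # "HURRICANE" "FLOOD"
```

For example, `worstProptop <- top_events(summarydf, "PropertyDamage")` would replace the two sorting lines above, and the same call with "CropDamage", "deaths" or "injuries" covers the other charts.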
Crop damage is a special form of property damage that affects not only the farmer but the wider economy. Food production is particularly sensitive to damage at certain times of year, and events that are insignificant to the rest of the population, such as a late frost, can be devastating to crops. The following chart shows the top 5 causes of crop damage.
par(cex.axis=0.5)
worstCrop<-summarydf[order(-summarydf$CropDamage),]
worstCroptop<-worstCrop[1:5,]
barplot(worstCroptop$CropDamage,names.arg=worstCroptop$EVENT,xlab="Type of Event",ylab="Average crop damage by event ($)", main="Average Crop Damage in Dollars")
While drought and frost are classic crop risks, the highest average dollar loss comes from hurricanes. Drought and frost rank second and third, respectively, and are dwarfed by hurricane damage. Again, the ability to predict and protect against hurricane damage would be the best way to prevent economic damage.
Human loss of life falls into a different category. It is hard to put a monetary value on a life, and injuries are similarly hard to quantify economically. What is easy to understand is the need to prevent loss of life. The following figure stacks two bar charts, one above the other, showing the top 5 events for fatalities and for injuries.
worstdeath<-summarydf[order(-summarydf$deaths),]
worstdeathtop<-worstdeath[1:5,]
worstinjury<-summarydf[order(-summarydf$injuries),]
worstinjurytop<-worstinjury[1:5,]
# two panels stacked vertically, with small axis labels so the event names fit
par(cex.axis=0.5, mfrow=c(2,1))
barplot(worstdeathtop$deaths,names.arg=worstdeathtop$EVENT,ylab="deaths/event", main="Average Fatalities Per Event", col="red")
barplot(worstinjurytop$injuries,names.arg=worstinjurytop$EVENT,xlab="Type of Event",ylab="injuries/event", main="Average Injuries Per Event",col="orange")
For fatalities and injuries, the results differ surprisingly from those for property and crop damage. Tsunamis, which are more of a geological event than a weather event, claim the most lives on average. The next two leading causes of death on average are avalanches and beach hazards such as rip currents. Heat ranks fourth, and hurricanes, the number one cause of property damage, come in fifth.
Tsunamis also cause the highest average number of injuries, followed in this case by hurricanes, heat and tornadoes. Avalanches and drownings are much more likely to kill than to injure a person, whereas debris from hurricanes and tornadoes leaves more injuries in its wake.
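One way to check the claim that some events are more likely to kill than to injure is to compute, for each event type, the share of casualties that are deaths. The sketch below does this on a made-up two-row table in the shape of summarydf (the values are illustrative, not taken from the NOAA data); on the real table the same line works with summarydf in place of toy. Event types with no recorded casualties give NaN, since both counts are zero.

```r
# Made-up per-event averages in the shape of summarydf; NOT the NOAA results
toy <- data.frame(EVENT    = c("AVALANCHE", "TORNADO"),
                  deaths   = c(0.6, 0.1),
                  injuries = c(0.4, 1.5),
                  stringsAsFactors = FALSE)

# Fraction of casualties (deaths + injuries) that are deaths;
# 0/0 gives NaN for event types with no casualties at all
toy$death_share <- toy$deaths / (toy$deaths + toy$injuries)
toy[order(toy$death_share, decreasing = TRUE), ]
```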
Hurricanes and tsunamis are both coastal events that bring large amounts of water ashore. Between them, they cause the highest average property damage (hurricanes) and the highest average fatalities and injuries (tsunamis). Systems that help lessen the destruction, such as offshore wetlands, undersea sensors and better prediction models, can save money and lives.