This analysis covers the potential damaging effects of weather in terms of both economic damages and population health, as reported by the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We seek to identify which weather events lead to the most damage so that municiple planners can have a better idea of which kinds of events should be prioritized for their potential effects.
In general I found that wind and water events such as hurricanes, tornadoes, and tunderstorms, along with their associated flooding and storm serge, were the primary causes of economic damage. These causes also were among the highest causes of damage to population health, although heat and cold events were also significant factors in population health.
Summary plots of the top 10 events for each type are located at the end of this report.
The data provided from the course website is in the repdata_data_StormData.csv.bz2 file, linked here.
Checking the size of the database, I see that the number of records is over 900,000. This is unmanagable for my implementation so I need to find a way to reduce the complexity.
NOAAdata<-read.csv("repdata_data_StormData.csv.bz2",stringsAsFactors = FALSE)
dim(NOAAdata)
## [1] 902297 37
Checking the list of Exponentials I see there are a few records in the billions (“B”), thousands of records in the millions (“M”, ignoring “m” due to low relative size), and hundreds of thousands of records in the thousands (“K”, again ignoring “k” due to low relative size). If I can get good results from just the billion and million dollar entries, this will greatly simplify the work. Since I am looking for “events of greatest consequence”, this should make sense.
table(NOAAdata$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
table(NOAAdata$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
Property Damage Values:
# Find only values with the Million or Billion dollar marker
PropDmg<-filter(NOAAdata,PROPDMGEXP=="B"|PROPDMGEXP=="M")
# Remove any 0 values
PropDmg<-filter(PropDmg,PROPDMG>0)
dim(PropDmg)
## [1] 11359 37
Crop Damage Values:
# Find only values with the Million or Billion dollar marker
CropDmg<-filter(NOAAdata,CROPDMGEXP=="B"|CROPDMGEXP=="M")
# Remove any 0 values
CropDmg<-filter(CropDmg,CROPDMG>0)
dim(CropDmg)
## [1] 1925 37
# Takes a data frame of type NOAAdata and returns the total Crop Damage for all entries
CropDmgSumer<-function(x){
sum(x$CROPDMG[x$CROPDMGEXP=="B"])*1e9+
sum(x$CROPDMG[x$CROPDMGEXP=="M"])*1e6+
sum(x$CROPDMG[x$CROPDMGEXP=="K"])*1e3
}
# Takes a data frame of type NOAAdata and returns the total Property Damage for all entries
PropDmgSumer<-function(x){
sum(x$PROPDMG[x$PROPDMGEXP=="B"])*1e9+
sum(x$PROPDMG[x$PROPDMGEXP=="M"])*1e6+
sum(x$PROPDMG[x$PROPDMGEXP=="K"])*1e3
}
# Takes the value and Exponential of a line and returns the value in long form.
IndSummer<-function(val,exp){
x<-0
if(exp=="B"){x<-val*1e9}
if(exp=="M"){x<-val*1e6}
if(exp=="K"){x<-val*1e3}
x
}
I see below that 97.26% of all Crop damage comes from the Million and Billion dollar loss events.
# Calculate the total amount of Crop damage from events greater than 1 Million.
Total_CropDmg<-CropDmgSumer(CropDmg)
# Calculate the total amount of Crop damage
Total_CropDmg_Inc<-CropDmgSumer(NOAAdata)
# What percent of the total are the events greater than 1 million?
Total_CropDmg/Total_CropDmg_Inc
## [1] 0.9726451
Similarly, I see that 97.49% of all Property damage comes from the Million and Billion dollar loss events.
# Calculate the total amount of Property damage from events greater than 1 Million.
Total_PropDmg<-PropDmgSumer(PropDmg)
# Calculate the total amount of Property damage
Total_PropDmg_Inc<-PropDmgSumer(NOAAdata)
# What percent of the total are the events greater than 1 million?
Total_PropDmg/Total_PropDmg_Inc
## [1] 0.9748753
Over 97% of the total value should be enough to work with. This leaves me with 11,359 Property Damage events, and 1925 Crop Damage events (not mutually exclusive) out of the 902,297 original records.
With this information, since the question is on total economic damage, I will now extract the events which fall into either economic category and conbine them into one data set.
# Filter out events greater than 1 million, and save for later
AllDmgOrig<-filter(NOAAdata,PROPDMGEXP=="B"|PROPDMGEXP=="M"|CROPDMGEXP=="B"|CROPDMGEXP=="M")
AllDmg<-AllDmgOrig
#Find the values Damage by line
AllDmgLines<-{
x<-vector(mode="integer",length = dim(AllDmg)[1])
for(i in 1:dim(AllDmg)[1]){
x[i]<-sum(IndSummer(AllDmg$PROPDMG[i],AllDmg$PROPDMGEXP[i]),IndSummer(AllDmg$CROPDMG[i],AllDmg$CROPDMGEXP[i]))
}
x
}
#Insert Damage values per line into Data Frame
AllDmg<-mutate(AllDmg,VALUE=AllDmgLines,COUNT=1)
# Remove any 0 values
AllDmg<-filter(AllDmg,VALUE>0)
# Change EVTYPE into factors now to avoid factors with 0 occurences
AllDmg<-mutate(AllDmg,EVTYPE=as.factor(EVTYPE))
To determine the “events of greatest consequence” (by EVTYPE), we now group the data by EVTYPE and summarize it. I’ve included the “Average_loss” per event, the “Count” of events of that type, and the “Max_Loss” for an event of that type.
#Group By EVTYPE
AllDmgEVTYPES<-group_by(AllDmg,EVTYPE)
# Summarize based on EVTYPE
df1<-summarize(AllDmgEVTYPES, Average_Loss = mean(VALUE),Count = sum(COUNT),Max_Loss = max(VALUE))
# Order the data by Average loss amount in decending order
df2<-df1[order(df1$Average_Loss,decreasing=TRUE),]
df2
## Source: local data frame [153 x 4]
##
## EVTYPE Average_Loss Count Max_Loss
## (fctr) (dbl) (dbl) (dbl)
## 1 HEAVY RAIN/SEVERE WEATHER 2500000000 1 2.5000e+09
## 2 TORNADOES, TSTM WIND, HAIL 1602500000 1 1.6025e+09
## 3 HURRICANE/TYPHOON 1198527097 60 1.6930e+10
## 4 STORM SURGE 984203068 44 3.1300e+10
## 5 HURRICANE OPAL 790250000 4 2.1050e+09
## 6 RIVER FLOOD 596055882 17 1.0000e+10
## 7 STORM SURGE/TIDE 463505000 10 4.0000e+09
## 8 SEVERE THUNDERSTORM 401733333 3 1.2000e+09
## 9 WILD FIRES 312000000 2 6.1900e+08
## 10 HURRICANE 175887831 83 3.5000e+09
## .. ... ... ... ...
Noticing 2 potential outliers, we observer the remarks on these:
AllDmg$REMARKS[AllDmg$EVTYPE=="HEAVY RAIN/SEVERE WEATHER"]
## [1] "A potent weather system stalled over southeast Louisiana from the evening of May 8 through mid day of May 10 producing two bouts of heavy rain and severe thunderstorms. The first event occurred the evening of May 8 and early morning of May 9 producing several tornadoes and widespread heavy rain of 8 to 15 inches across the greater New Orleans metro area into southeast St. Tammany Parish. The second bout of severe weather struck during the evening of May 9 and continued into the morning of May 10, producing rainfall of 10 to 15 inches primarily from St. Tammany Parish into south Mississippi. Drainage capacity was overwhelmed by the torrential rainfall on each of these nights and water flooded tens of thousands of homes in southeast Louisiana. By early June, the Federal Emergency Management Agency (FEMA) reported approximately 60,000 addresses representing single family units, multifamily units, and businesses had filed for assistance due to weather damage. In late May, the Red Cross estimated 36,000 homes in southeast Louisiana were affected by flood water. Newspaper accounts indicated weather related damage to reach the $2.5 to $3.0 billion range. Parish by parish detail follows. "
AllDmg$REMARKS[AllDmg$EVTYPE=="TORNADOES, TSTM WIND, HAIL"]
## [1] "Tornadoes, Thunderstorm Winds, Hail, Storm Surge, Flooding, Beach Erosion, Snow\n\nThe \"Storm Of The Century\" roared across Florida producing a variety of severe and unusual weather conditions for a period of about 18 hours from late Friday the 12th through late afternoon Saturday the 13th. A severe squall line raced eastward at 50 mph ahead of an intense low producing several tornadoes and strong downbursts as it moved through the state and directly caused seven fatalities. This was followed by a hitherto unknown (for the Gulf Coast) winter storm surge of 9 to 12 feet onto the gulf coast in Taylor County, with storm surges and/or tidal and wind driven flooding of 5 to 9 feet elsewhere along the gulf coast of the state and into the lower Keys. This was followed by a period of 8 to 12 hours of high sustained winds of up to 50 mph with gusts to 70 mph, keeping tides much above normal along the west coast and causing severe beach erosion in many areas. Total number of fatalities from the storm was 47, including 14 from storm surge and flooding, seven from tornadoes and/or strong downbursts, and four from high winds in the aftermath of the squall line. Eleven people drowned offshore in the Gulf of Mexico after strong winds swamped or capsized ships (including seven crewmen from a 200-foot Honduran freighter). Eleven others died during rescue operations and cleanup activities. Meanwhile, colder air poured in behind the intense low with up to four inches of snow falling in the panhandle from north of Pensacola to Crestview, and a trace to 3 inches elsewhere across north Florida. Record or near record low temperatures occurred over much of the state the following two nights. Total property damage was estimated at $1.6 billion. "
Based on these remarks, I reclassified “HEAVY RAIN/SEVERE WEATHER” as “THUNDERSTORMS”, and I reclassified “TORNADOES, TSTM WIND, HAIL” as “STORM SURGE”.
Additionally, the following items had several variants that I grouped up for clarity: Hurricanes/Typhoons, Tornados, Floods, Storm Serge, Heat, Hail, Ice, Wild Fires and Thunderstorms.
#Reset AllDmg so we can simplify the EVTYPES before we group EVTYPES.
AllDmg<-AllDmgOrig
#Re-classify several EVTYPES to remove outliers without eliminating important data,
# and to create consistency for some minor variations of spelling and case.
replist<-grep("HEAVY RAIN/SEVERE WEATHER",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("TORNADOES, TSTM WIND, HAIL",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("flood",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"FLOODS"
replist<-grep("hurric",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("typho",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("tropic",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("torna",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"TORNADO"
replist<-grep("storm surge",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("heat",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HEAT"
replist<-grep("hail",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HAIL"
replist<-grep("freeze",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("ice",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("thunder",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("wildfire",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"WILD/FOREST FIRE"
replist<-grep("wild fire",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"WILD/FOREST FIRE"
# Change EVTYPE into factors now to avoid factors with 0 occurences
AllDmg<-mutate(AllDmg,EVTYPE=as.factor(EVTYPE))
# Re Calculate to see effect of consolidation
#Find the damage values by line
AllDmgLines<-{
x<-vector(mode="integer",length = dim(AllDmg)[1])
for(i in 1:dim(AllDmg)[1]){
x[i]<-sum(IndSummer(AllDmg$PROPDMG[i],AllDmg$PROPDMGEXP[i]),IndSummer(AllDmg$CROPDMG[i],AllDmg$CROPDMGEXP[i]))
}
x
}
#Insert Damage values per line into Data Frame
AllDmg<-mutate(AllDmg,VALUE=AllDmgLines,COUNT=1)
#Group By EVTYPE
AllDmgEVTYPES<-group_by(AllDmg,EVTYPE)
# Summarize based on EVTYPE
df1<-summarize(AllDmgEVTYPES,Total_Loss = sum(VALUE), Average_Loss = mean(VALUE),Count = sum(COUNT),Max_Loss = max(VALUE))
# Order the data by Average loss amount in decending order
df2<-df1[order(df1$Average_Loss,decreasing=TRUE),]
df2
## Source: local data frame [80 x 5]
##
## EVTYPE Total_Loss Average_Loss Count Max_Loss
## (fctr) (dbl) (dbl) (dbl) (dbl)
## 1 STORM SURGE 49542485000 900772455 55 31300000000
## 2 HURRICANE/TYPHOON 99185616200 349245127 284 16930000000
## 3 EXCESSIVE WETNESS 142000000 142000000 1 142000000
## 4 DROUGHT 14997822000 96760142 155 1000000000
## 5 COLD AND WET CONDITIONS 66000000 66000000 1 66000000
## 6 WINTER STORM HIGH WINDS 65000000 65000000 1 65000000
## 7 HEAT 920905000 54170882 17 492570000
## 8 FLOODS 177289098850 51582513 3437 115032500000
## 9 WINTER STORM 6583500000 46362676 142 5000000000
## 10 EXTREME COLD 1348910000 44963667 30 596000000
## .. ... ... ... ... ...
The Average Loss still has several low count items, what else can we do? All of the low count items do not have high max damage numbers so let’s just filter them out.
df3<-filter(df2,Count>2)
df3
## Source: local data frame [35 x 5]
##
## EVTYPE Total_Loss Average_Loss Count Max_Loss
## (fctr) (dbl) (dbl) (dbl) (dbl)
## 1 STORM SURGE 49542485000 900772455 55 31300000000
## 2 HURRICANE/TYPHOON 99185616200 349245127 284 16930000000
## 3 DROUGHT 14997822000 96760142 155 1000000000
## 4 HEAT 920905000 54170882 17 492570000
## 5 FLOODS 177289098850 51582513 3437 115032500000
## 6 WINTER STORM 6583500000 46362676 142 5000000000
## 7 EXTREME COLD 1348910000 44963667 30 596000000
## 8 ICE/FREEZE 10818255500 43273022 250 5000500000
## 9 WILD/FOREST FIRE 8768930000 35791551 245 1500000000
## 10 HIGH WINDS/COLD 116500000 29125000 4 55000000
## .. ... ... ... ... ...
Now we have a nice data set that is informative and significant.
FatalOrInj<-filter(NOAAdata, FATALITIES>0|INJURIES>0)
#Group fatalities and injuries together for simplicity
FatalOrInj<-mutate(FatalOrInj, ForI = FATALITIES + INJURIES,Count=1)
#Re-classify several EVTYPES to remove outliers without eliminating important data,
# and to create consistency for some minor variations of spelling and case.
replist<-grep("HEAVY RAIN/SEVERE WEATHER",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("TORNADOES, TSTM WIND, HAIL",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("flood",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"FLOODS"
replist<-grep("hurric",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("typho",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("tropic",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("torna",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"TORNADO"
replist<-grep("storm surge",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("heat",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HEAT"
replist<-grep("hail",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HAIL"
replist<-grep("freeze",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("ice",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("thunder",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("wildfire",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"WILD/FOREST FIRE"
replist<-grep("wild fire",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"WILD/FOREST FIRE"
# Change EVTYPE into factors now to avoid factors with 0 occurences
FatalOrInj<-mutate(FatalOrInj,EVTYPE=as.factor(EVTYPE))
# Group by EVTYPE
FatalorInjTypes<-group_by(FatalOrInj,EVTYPE)
# Summarize by EVTYPE
df4<-summarize(FatalorInjTypes,Avg_inc = mean(ForI), Max_inc = max(ForI),Count=sum(Count))
Show summary of human casualty data by average loss per event.
# Sort by Average Human casualty per event
df5<-df4[order(df4$Avg_inc, decreasing = TRUE),]
df5
## Source: local data frame [147 x 4]
##
## EVTYPE Avg_inc Max_inc Count
## (fctr) (dbl) (dbl) (dbl)
## 1 TSUNAMI 81.00000 161 2
## 2 WINTER WEATHER MIX 34.00000 65 2
## 3 UNSEASONABLY WARM AND DRY 29.00000 29 1
## 4 WINTER STORMS 27.00000 27 1
## 5 WINTRY MIX 26.00000 72 3
## 6 EXCESSIVE RAINFALL 23.00000 23 1
## 7 HIGH WIND AND SEAS 23.00000 23 1
## 8 ICE/FREEZE 19.54310 1569 116
## 9 SNOW/HIGH WINDS 18.00000 34 2
## 10 HURRICANE/TYPHOON 16.81579 787 114
## .. ... ... ... ...
Again, there are a lot of low count events. What happens when we remove these?
Show summary of human casualty data by average loss per event after filtering.
# Remove low event counts.
df5<-filter(df5,Count>3)
df5
## Source: local data frame [55 x 4]
##
## EVTYPE Avg_inc Max_inc Count
## (fctr) (dbl) (dbl) (dbl)
## 1 ICE/FREEZE 19.543103 1569 116
## 2 HURRICANE/TYPHOON 16.815789 787 114
## 3 GLAZE 13.937500 61 16
## 4 HEAT 13.179104 583 938
## 5 TORNADO 12.231283 1742 7934
## 6 DENSE FOG 10.588235 72 34
## 7 BLIZZARD 10.534884 390 86
## 8 DUST STORM 10.266667 40 45
## 9 WINTER WEATHER 8.795918 138 49
## 10 FOG 8.652174 80 92
## .. ... ... ... ...
These results seem clean and usable.
I notice that several of these results were also in the Economic top 10 as well, namely HURRICANE/TYPHOON, HEAT, and ICE/FREEZE.
#Select top 10 damage causes by Average Loss.
df3a<-df3[1:10,]
#Reset Factors so they stay in desending order
df3a$EVTYPE<-as.character(df3a$EVTYPE)
df3a$EVTYPE<-factor(df3a$EVTYPE, levels = unique(df3a$EVTYPE))
# Generate the plot with ggplot2
qplot(x = EVTYPE, y = log10(Average_Loss), data = df3a)+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ggtitle("Average Economic Losses by Event Type from 1950-2011")
#Select top 10 damage causes by Average Loss.
df5a<-df5[1:10,]
#Reset Factors so they stay in desending order
df5a$EVTYPE<-as.character(df5a$EVTYPE)
df5a$EVTYPE<-factor(df5a$EVTYPE, levels = unique(df5a$EVTYPE))
# Generate the plot with ggplot2
qplot(x = EVTYPE, y = Avg_inc, data = df5a)+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ggtitle("Average Human Casualty by Event Type from 1950-2011")