Synopsis

This analysis covers the potential damaging effects of weather in terms of both economic damages and population health, as reported by the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We seek to identify which weather events lead to the most damage so that municiple planners can have a better idea of which kinds of events should be prioritized for their potential effects.

In general I found that wind and water events such as hurricanes, tornadoes, and tunderstorms, along with their associated flooding and storm serge, were the primary causes of economic damage. These causes also were among the highest causes of damage to population health, although heat and cold events were also significant factors in population health.

Summary plots of the top 10 events for each type are located at the end of this report.

Data Processing

The data provided from the course website is in the repdata_data_StormData.csv.bz2 file, linked here.

Checking the size of the database, I see that the number of records is over 900,000. This is unmanagable for my implementation so I need to find a way to reduce the complexity.

NOAAdata<-read.csv("repdata_data_StormData.csv.bz2",stringsAsFactors = FALSE)
dim(NOAAdata)
## [1] 902297     37

Checking the list of Exponentials I see there are a few records in the billions (“B”), thousands of records in the millions (“M”, ignoring “m” due to low relative size), and hundreds of thousands of records in the thousands (“K”, again ignoring “k” due to low relative size). If I can get good results from just the billion and million dollar entries, this will greatly simplify the work. Since I am looking for “events of greatest consequence”, this should make sense.

table(NOAAdata$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
table(NOAAdata$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

Justify decision to remove events under 1 Million for Crop and Property Damage.

Extract events where Damage exceeds 1 Million

Property Damage Values:

# Find only values with the Million or Billion dollar marker
PropDmg<-filter(NOAAdata,PROPDMGEXP=="B"|PROPDMGEXP=="M")

# Remove any 0 values
PropDmg<-filter(PropDmg,PROPDMG>0)
dim(PropDmg)
## [1] 11359    37

Crop Damage Values:

# Find only values with the Million or Billion dollar marker
CropDmg<-filter(NOAAdata,CROPDMGEXP=="B"|CROPDMGEXP=="M")

# Remove any 0 values
CropDmg<-filter(CropDmg,CROPDMG>0)
dim(CropDmg)
## [1] 1925   37

Create custom functions

# Takes a data frame of type NOAAdata and returns the total Crop Damage for all entries
CropDmgSumer<-function(x){
     sum(x$CROPDMG[x$CROPDMGEXP=="B"])*1e9+
     sum(x$CROPDMG[x$CROPDMGEXP=="M"])*1e6+
     sum(x$CROPDMG[x$CROPDMGEXP=="K"])*1e3
     
}

# Takes a data frame of type NOAAdata and returns the total Property Damage for all entries
PropDmgSumer<-function(x){
     sum(x$PROPDMG[x$PROPDMGEXP=="B"])*1e9+
     sum(x$PROPDMG[x$PROPDMGEXP=="M"])*1e6+
     sum(x$PROPDMG[x$PROPDMGEXP=="K"])*1e3
     
}

# Takes the value and Exponential of a line and returns the value in long form.
IndSummer<-function(val,exp){
     x<-0
     if(exp=="B"){x<-val*1e9}
     if(exp=="M"){x<-val*1e6}
     if(exp=="K"){x<-val*1e3}
     x
}

Analyze effects of removing events under 1 million from data set.

I see below that 97.26% of all Crop damage comes from the Million and Billion dollar loss events.

# Calculate the total amount of Crop damage from events greater than 1 Million.
Total_CropDmg<-CropDmgSumer(CropDmg)

# Calculate the total amount of Crop damage
Total_CropDmg_Inc<-CropDmgSumer(NOAAdata)

# What percent of the total are the events greater than 1 million?
Total_CropDmg/Total_CropDmg_Inc    
## [1] 0.9726451

Similarly, I see that 97.49% of all Property damage comes from the Million and Billion dollar loss events.

# Calculate the total amount of Property damage from events greater than 1 Million.
Total_PropDmg<-PropDmgSumer(PropDmg)

# Calculate the total amount of Property damage
Total_PropDmg_Inc<-PropDmgSumer(NOAAdata)

# What percent of the total are the events greater than 1 million?
Total_PropDmg/Total_PropDmg_Inc
## [1] 0.9748753

Over 97% of the total value should be enough to work with. This leaves me with 11,359 Property Damage events, and 1925 Crop Damage events (not mutually exclusive) out of the 902,297 original records.

With this information, since the question is on total economic damage, I will now extract the events which fall into either economic category and conbine them into one data set.

# Filter out events greater than 1 million, and save for later
AllDmgOrig<-filter(NOAAdata,PROPDMGEXP=="B"|PROPDMGEXP=="M"|CROPDMGEXP=="B"|CROPDMGEXP=="M")
AllDmg<-AllDmgOrig

#Find the values Damage by line
AllDmgLines<-{
     x<-vector(mode="integer",length = dim(AllDmg)[1])
     for(i in 1:dim(AllDmg)[1]){
          x[i]<-sum(IndSummer(AllDmg$PROPDMG[i],AllDmg$PROPDMGEXP[i]),IndSummer(AllDmg$CROPDMG[i],AllDmg$CROPDMGEXP[i]))
     }
     x
}

#Insert  Damage values per line into Data Frame
AllDmg<-mutate(AllDmg,VALUE=AllDmgLines,COUNT=1)

# Remove any 0 values
AllDmg<-filter(AllDmg,VALUE>0)

# Change EVTYPE into factors now to avoid factors with 0 occurences
AllDmg<-mutate(AllDmg,EVTYPE=as.factor(EVTYPE))

Results

Analyze Economic Loss data

To determine the “events of greatest consequence” (by EVTYPE), we now group the data by EVTYPE and summarize it. I’ve included the “Average_loss” per event, the “Count” of events of that type, and the “Max_Loss” for an event of that type.

#Group By EVTYPE
AllDmgEVTYPES<-group_by(AllDmg,EVTYPE)

# Summarize based on EVTYPE
df1<-summarize(AllDmgEVTYPES, Average_Loss = mean(VALUE),Count = sum(COUNT),Max_Loss = max(VALUE))

# Order the data by Average loss amount in decending order
df2<-df1[order(df1$Average_Loss,decreasing=TRUE),]
df2
## Source: local data frame [153 x 4]
## 
##                        EVTYPE Average_Loss Count   Max_Loss
##                        (fctr)        (dbl) (dbl)      (dbl)
## 1   HEAVY RAIN/SEVERE WEATHER   2500000000     1 2.5000e+09
## 2  TORNADOES, TSTM WIND, HAIL   1602500000     1 1.6025e+09
## 3           HURRICANE/TYPHOON   1198527097    60 1.6930e+10
## 4                 STORM SURGE    984203068    44 3.1300e+10
## 5              HURRICANE OPAL    790250000     4 2.1050e+09
## 6                 RIVER FLOOD    596055882    17 1.0000e+10
## 7            STORM SURGE/TIDE    463505000    10 4.0000e+09
## 8         SEVERE THUNDERSTORM    401733333     3 1.2000e+09
## 9                  WILD FIRES    312000000     2 6.1900e+08
## 10                  HURRICANE    175887831    83 3.5000e+09
## ..                        ...          ...   ...        ...

Noticing 2 potential outliers, we observer the remarks on these:

AllDmg$REMARKS[AllDmg$EVTYPE=="HEAVY RAIN/SEVERE WEATHER"]
## [1] "A potent weather system stalled over southeast Louisiana from the evening of May 8 through mid day of May 10 producing two bouts of heavy rain and severe thunderstorms.  The first event occurred the evening of May 8 and early morning of May 9 producing several tornadoes and widespread heavy rain of 8 to 15 inches across the greater New Orleans metro area into southeast St. Tammany Parish.  The second bout of severe weather struck during the evening of May 9 and continued into the morning of May 10, producing rainfall of 10 to 15 inches primarily from St. Tammany Parish into south Mississippi.  Drainage capacity was overwhelmed by the torrential rainfall on each of these nights and water flooded tens of thousands of homes in southeast Louisiana.  By early June, the Federal Emergency Management Agency (FEMA) reported approximately 60,000 addresses representing single family units,  multifamily units, and businesses had filed for assistance due to weather damage.  In late May, the Red Cross estimated 36,000 homes in southeast Louisiana were affected by flood water.  Newspaper accounts indicated weather related damage to reach the $2.5 to $3.0 billion range.  Parish by parish detail follows. "
AllDmg$REMARKS[AllDmg$EVTYPE=="TORNADOES, TSTM WIND, HAIL"]
## [1] "Tornadoes, Thunderstorm Winds, Hail, Storm Surge, Flooding, Beach Erosion, Snow\n\nThe \"Storm Of The Century\" roared across Florida producing a variety of severe and unusual weather conditions for a period of about 18 hours from late Friday the 12th through late afternoon Saturday the 13th.  A severe squall line raced eastward at 50 mph ahead of an intense low producing several tornadoes and strong downbursts as it moved through the state and directly caused seven fatalities.  This was followed by a hitherto unknown  (for the Gulf Coast) winter storm surge of 9 to 12 feet onto the gulf coast in Taylor County, with storm surges and/or tidal and wind driven flooding of 5 to 9 feet elsewhere along the gulf coast of the state and into the lower Keys.  This was followed by a period of 8 to 12 hours of high sustained winds of up to 50 mph with gusts to 70 mph, keeping tides much above normal along the west coast and causing severe beach erosion in many areas.  Total number of fatalities from the storm was 47, including 14 from storm surge and flooding, seven from tornadoes and/or strong downbursts, and four from high winds in the aftermath of the squall line.  Eleven people drowned offshore in the Gulf of Mexico after strong winds swamped or capsized ships  (including seven crewmen from a 200-foot Honduran freighter).  Eleven others died during rescue operations and cleanup activities.  Meanwhile, colder air poured in behind the intense low with up to four inches of snow falling in the panhandle from north of Pensacola to Crestview, and a trace to 3 inches elsewhere across north Florida.  Record or near record low temperatures occurred over much of the state the following two nights.  Total property damage was estimated at $1.6 billion. "

Based on these remarks, I reclassified “HEAVY RAIN/SEVERE WEATHER” as “THUNDERSTORMS”, and I reclassified “TORNADOES, TSTM WIND, HAIL” as “STORM SURGE”.

Additionally, the following items had several variants that I grouped up for clarity: Hurricanes/Typhoons, Tornados, Floods, Storm Serge, Heat, Hail, Ice, Wild Fires and Thunderstorms.

#Reset AllDmg so we can simplify the EVTYPES before we group EVTYPES.
AllDmg<-AllDmgOrig

#Re-classify several EVTYPES to remove outliers without eliminating important data,
#  and to create consistency for some minor variations of spelling and case.
replist<-grep("HEAVY RAIN/SEVERE WEATHER",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("TORNADOES, TSTM WIND, HAIL",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("flood",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"FLOODS"
replist<-grep("hurric",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("typho",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("tropic",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("torna",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"TORNADO"
replist<-grep("storm surge",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("heat",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HEAT"
replist<-grep("hail",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"HAIL"
replist<-grep("freeze",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("ice",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("thunder",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("wildfire",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"WILD/FOREST FIRE"
replist<-grep("wild fire",AllDmg$EVTYPE,ignore.case=TRUE)
AllDmg$EVTYPE[replist]<-"WILD/FOREST FIRE"

# Change EVTYPE into factors now to avoid factors with 0 occurences
AllDmg<-mutate(AllDmg,EVTYPE=as.factor(EVTYPE))

# Re Calculate to see effect of consolidation
#Find the damage values by line
AllDmgLines<-{
     x<-vector(mode="integer",length = dim(AllDmg)[1])
     for(i in 1:dim(AllDmg)[1]){
          x[i]<-sum(IndSummer(AllDmg$PROPDMG[i],AllDmg$PROPDMGEXP[i]),IndSummer(AllDmg$CROPDMG[i],AllDmg$CROPDMGEXP[i]))
     }
     x
}

#Insert  Damage values per line into Data Frame
AllDmg<-mutate(AllDmg,VALUE=AllDmgLines,COUNT=1)

#Group By EVTYPE
AllDmgEVTYPES<-group_by(AllDmg,EVTYPE)

# Summarize based on EVTYPE
df1<-summarize(AllDmgEVTYPES,Total_Loss = sum(VALUE), Average_Loss = mean(VALUE),Count = sum(COUNT),Max_Loss = max(VALUE))

# Order the data by Average loss amount in decending order
df2<-df1[order(df1$Average_Loss,decreasing=TRUE),]
df2
## Source: local data frame [80 x 5]
## 
##                     EVTYPE   Total_Loss Average_Loss Count     Max_Loss
##                     (fctr)        (dbl)        (dbl) (dbl)        (dbl)
## 1              STORM SURGE  49542485000    900772455    55  31300000000
## 2        HURRICANE/TYPHOON  99185616200    349245127   284  16930000000
## 3        EXCESSIVE WETNESS    142000000    142000000     1    142000000
## 4                  DROUGHT  14997822000     96760142   155   1000000000
## 5  COLD AND WET CONDITIONS     66000000     66000000     1     66000000
## 6  WINTER STORM HIGH WINDS     65000000     65000000     1     65000000
## 7                     HEAT    920905000     54170882    17    492570000
## 8                   FLOODS 177289098850     51582513  3437 115032500000
## 9             WINTER STORM   6583500000     46362676   142   5000000000
## 10            EXTREME COLD   1348910000     44963667    30    596000000
## ..                     ...          ...          ...   ...          ...

The Average Loss still has several low count items, what else can we do? All of the low count items do not have high max damage numbers so let’s just filter them out.

df3<-filter(df2,Count>2)
df3
## Source: local data frame [35 x 5]
## 
##               EVTYPE   Total_Loss Average_Loss Count     Max_Loss
##               (fctr)        (dbl)        (dbl) (dbl)        (dbl)
## 1        STORM SURGE  49542485000    900772455    55  31300000000
## 2  HURRICANE/TYPHOON  99185616200    349245127   284  16930000000
## 3            DROUGHT  14997822000     96760142   155   1000000000
## 4               HEAT    920905000     54170882    17    492570000
## 5             FLOODS 177289098850     51582513  3437 115032500000
## 6       WINTER STORM   6583500000     46362676   142   5000000000
## 7       EXTREME COLD   1348910000     44963667    30    596000000
## 8         ICE/FREEZE  10818255500     43273022   250   5000500000
## 9   WILD/FOREST FIRE   8768930000     35791551   245   1500000000
## 10   HIGH WINDS/COLD    116500000     29125000     4     55000000
## ..               ...          ...          ...   ...          ...

Now we have a nice data set that is informative and significant.

Analyze human casuality data

Extract all events causing injuries or fatalaties

FatalOrInj<-filter(NOAAdata, FATALITIES>0|INJURIES>0)

#Group fatalities and injuries together for simplicity
FatalOrInj<-mutate(FatalOrInj, ForI = FATALITIES + INJURIES,Count=1)

#Re-classify several EVTYPES to remove outliers without eliminating important data,
#  and to create consistency for some minor variations of spelling and case.
replist<-grep("HEAVY RAIN/SEVERE WEATHER",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("TORNADOES, TSTM WIND, HAIL",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("flood",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"FLOODS"
replist<-grep("hurric",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("typho",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("tropic",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HURRICANE/TYPHOON"
replist<-grep("torna",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"TORNADO"
replist<-grep("storm surge",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"STORM SURGE"
replist<-grep("heat",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HEAT"
replist<-grep("hail",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"HAIL"
replist<-grep("freeze",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("ice",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"ICE/FREEZE"
replist<-grep("thunder",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"THUNDERSTORMS"
replist<-grep("wildfire",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"WILD/FOREST FIRE"
replist<-grep("wild fire",FatalOrInj$EVTYPE,ignore.case=TRUE)
FatalOrInj$EVTYPE[replist]<-"WILD/FOREST FIRE"

# Change EVTYPE into factors now to avoid factors with 0 occurences
FatalOrInj<-mutate(FatalOrInj,EVTYPE=as.factor(EVTYPE))

# Group by EVTYPE
FatalorInjTypes<-group_by(FatalOrInj,EVTYPE)

# Summarize by EVTYPE
df4<-summarize(FatalorInjTypes,Avg_inc = mean(ForI), Max_inc = max(ForI),Count=sum(Count))

Show summary of human casualty data by average loss per event.

# Sort by Average Human casualty per event
df5<-df4[order(df4$Avg_inc, decreasing = TRUE),]
df5
## Source: local data frame [147 x 4]
## 
##                       EVTYPE  Avg_inc Max_inc Count
##                       (fctr)    (dbl)   (dbl) (dbl)
## 1                    TSUNAMI 81.00000     161     2
## 2         WINTER WEATHER MIX 34.00000      65     2
## 3  UNSEASONABLY WARM AND DRY 29.00000      29     1
## 4              WINTER STORMS 27.00000      27     1
## 5                 WINTRY MIX 26.00000      72     3
## 6         EXCESSIVE RAINFALL 23.00000      23     1
## 7         HIGH WIND AND SEAS 23.00000      23     1
## 8                 ICE/FREEZE 19.54310    1569   116
## 9            SNOW/HIGH WINDS 18.00000      34     2
## 10         HURRICANE/TYPHOON 16.81579     787   114
## ..                       ...      ...     ...   ...

Again, there are a lot of low count events. What happens when we remove these?

Show summary of human casualty data by average loss per event after filtering.

# Remove low event counts.
df5<-filter(df5,Count>3)
df5
## Source: local data frame [55 x 4]
## 
##               EVTYPE   Avg_inc Max_inc Count
##               (fctr)     (dbl)   (dbl) (dbl)
## 1         ICE/FREEZE 19.543103    1569   116
## 2  HURRICANE/TYPHOON 16.815789     787   114
## 3              GLAZE 13.937500      61    16
## 4               HEAT 13.179104     583   938
## 5            TORNADO 12.231283    1742  7934
## 6          DENSE FOG 10.588235      72    34
## 7           BLIZZARD 10.534884     390    86
## 8         DUST STORM 10.266667      40    45
## 9     WINTER WEATHER  8.795918     138    49
## 10               FOG  8.652174      80    92
## ..               ...       ...     ...   ...

These results seem clean and usable.

I notice that several of these results were also in the Economic top 10 as well, namely HURRICANE/TYPHOON, HEAT, and ICE/FREEZE.

Summary Plots of the results

Plot results of Economic damage.

#Select top 10 damage causes by Average Loss.
df3a<-df3[1:10,]

#Reset Factors so they stay in desending order
df3a$EVTYPE<-as.character(df3a$EVTYPE)
df3a$EVTYPE<-factor(df3a$EVTYPE, levels = unique(df3a$EVTYPE))

# Generate the plot with ggplot2
qplot(x = EVTYPE, y = log10(Average_Loss), data = df3a)+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ggtitle("Average Economic Losses by Event Type from 1950-2011")

Plot results of human damage.

#Select top 10 damage causes by Average Loss.
df5a<-df5[1:10,]

#Reset Factors so they stay in desending order
df5a$EVTYPE<-as.character(df5a$EVTYPE)
df5a$EVTYPE<-factor(df5a$EVTYPE, levels = unique(df5a$EVTYPE))

# Generate the plot with ggplot2
qplot(x = EVTYPE, y = Avg_inc, data = df5a)+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ggtitle("Average Human Casualty by Event Type from 1950-2011")