The Cost of Storms in the United States

Synopsis

This analysis looked at NOAA storm data since 1950 to find out what sorts of storms cause the most problems with human health and wealth. We found that tornadoes are by far the worst sorts of storms for human health, and heat is second. We did not finish our analysis on the economic impact.

Data Processing

From the Coursera course website we obtained data on storms and severe weather events that were sourced from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.

Reading in the data

We first read in the file from the original bz2 file, and then create the storm data frame.

storm <- read.csv(bzfile('repdata-data-StormData.csv.bz2', 'r'), header = TRUE)

Now we check the storm data frame and its structure to confirm we can work with the data as loaded.

dim(storm)
## [1] 902297     37
head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We will only be using a small handful of the variables for this analysis – including EVTYPE (event type), FATALITIES, INJURIES, and the various damage-related variables.

Before moving into the analysis, let's take a look at which types of events are most reported. We'll take the top 20.

head(sort(table(storm$EVTYPE), decreasing = TRUE), 20)
## 
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE 
##                     3392                     2761

So thunderstorm-related storms are by far the highest number of reports, between hail, wind, tornados, and flash floods. This doesn't tell us how damaging these storms are, just that they have the most reports.

Cleaning up the data

Making EVTYPE more Uniform

As we were inspecting the data, we discovered that one major problem that we were seeing was somewhat inconsistent reporting for weather events. For example, heat is alternatively described as 'heat', 'excessive heat', or 'extreme heat.' As another example, hail is often reported just as 'hail' but sometimes as hail with a size of the pellets. In order to ease our analysis, we will clean up the EVTYPE variable.

unique(grep('HEAT', storm$EVTYPE, value = TRUE))
##  [1] "HEAT"                   "EXTREME HEAT"          
##  [3] "EXCESSIVE HEAT"         "RECORD HEAT"           
##  [5] "HEAT WAVE"              "DROUGHT/EXCESSIVE HEAT"
##  [7] "RECORD HEAT WAVE"       "RECORD/EXCESSIVE HEAT" 
##  [9] "HEAT WAVES"             "HEAT WAVE DROUGHT"     
## [11] "HEAT/DROUGHT"           "HEAT DROUGHT"          
## [13] "EXCESSIVE HEAT/DROUGHT"

So there are many ways to express heat, but we want to disinclude reasons of drought, because those aren't only related to heat.

heat <- unique(intersect(grep('HEAT', storm$EVTYPE), 
                         grep('DROUGHT', storm$EVTYPE, invert = TRUE)))
storm$EVTYPE <- replace(storm$EVTYPE, heat, 'HEAT')

There are a few other variables that need fixing likewise, in particular those regarding tornadoes, hurricanes, snow, ice, flooding, wind, hail, and thunderstorms.

# Fix Tornadoes
unique(grep('TORN', storm$EVTYPE, value = TRUE))
##  [1] "TORNADO"                    "TORNADO F0"                
##  [3] "TORNADOS"                   "WATERSPOUT/TORNADO"        
##  [5] "WATERSPOUT TORNADO"         "WATERSPOUT-TORNADO"        
##  [7] "TORNADOES, TSTM WIND, HAIL" "COLD AIR TORNADO"          
##  [9] "WATERSPOUT/ TORNADO"        "TORNADO F3"                
## [11] "TORNDAO"                    "TORNADO F1"                
## [13] "TORNADO/WATERSPOUT"         "TORNADO F2"                
## [15] "TORNADOES"                  "TORNADO DEBRIS"
tornadoes <- grep('TORN', storm$EVTYPE)
storm$EVTYPE <- replace(storm$EVTYPE, tornadoes, 'TORNADO')
# Fix hurricanes
unique(grep('HURR', storm$EVTYPE, value = TRUE))
## [1] "HURRICANE OPAL/HIGH WINDS"  "HURRICANE ERIN"            
## [3] "HURRICANE OPAL"             "HURRICANE"                 
## [5] "HURRICANE-GENERATED SWELLS" "HURRICANE EMILY"           
## [7] "HURRICANE GORDON"           "HURRICANE FELIX"           
## [9] "HURRICANE/TYPHOON"
hurricane <- grep('HURR', storm$EVTYPE)
storm$EVTYPE <- replace(storm$EVTYPE, hurricane, 'HURRICANE')
# Fix Ice Storm
unique(grep('ICE STORM', storm$EVTYPE, value = TRUE))
## [1] "ICE STORM/FLASH FLOOD"    "ICE STORM"               
## [3] "SLEET/ICE STORM"          "SNOW AND ICE STORM"      
## [5] "HEAVY SNOW/ICE STORM"     "HEAVY SNOW AND ICE STORM"
## [7] "ICE STORM AND SNOW"       "GLAZE/ICE STORM"         
## [9] "SNOW/ICE STORM"
icestorm <- grep('ICE STORM', storm$EVTYPE)
storm$EVTYPE <- replace(storm$EVTYPE, icestorm, 'ICE STORM')
# Fix Ice
unique(grep('ICE', storm$EVTYPE, value = TRUE))
##  [1] "ICE STORM"                 "SNOW/ICE"                 
##  [3] "ICE"                       "GLAZE ICE"                
##  [5] "ICE JAM FLOODING"          "ICE/SNOW"                 
##  [7] "SNOW AND ICE"              "ICE FLOES"                
##  [9] "HEAVY SNOW AND ICE"        "HEAVY SNOW/ICE"           
## [11] "ICE JAM"                   "FLASH FLOOD FROM ICE JAMS"
## [13] "ICE AND SNOW"              "HEAVY SNOW & ICE"         
## [15] "ICE/STRONG WINDS"          "SNOW/ ICE"                
## [17] "BLACK ICE"                 "ICE PELLETS"              
## [19] "ICE ROADS"                 "FALLING SNOW/ICE"         
## [21] "PATCHY ICE"                "ICE ON ROAD"
ice <- unique(intersect(grep('ICE', storm$EVTYPE), 
                         grep('ICE STORM', storm$EVTYPE, invert = TRUE)))
storm$EVTYPE <- replace(storm$EVTYPE, ice, 'ICE')
# Fix Snow
unique(grep('SNOW', storm$EVTYPE, value = TRUE))
##  [1] "SNOW"                           "HEAVY SNOW"                    
##  [3] "HIGH WIND AND HEAVY SNOW"       "HEAVY SNOW/HIGH"               
##  [5] "HEAVY SNOW/HIGH WINDS/FREEZING" "HIGH WIND/HEAVY SNOW"          
##  [7] "RECORD SNOWFALL"                "HEAVY SNOW/WIND"               
##  [9] "SNOW AND WIND"                  "HEAVY SNOWPACK"                
## [11] "BLIZZARD/HEAVY SNOW"            "HEAVY SNOW/HIGH WINDS"         
## [13] "BLOWING SNOW"                   "LIGHT SNOW AND SLEET"          
## [15] "FIRST SNOW"                     "SLEET/RAIN/SNOW"               
## [17] "RAIN/SNOW"                      "SNOW/RAIN/SLEET"               
## [19] "HEAVY SNOW/HIGH WIND"           "FREEZING RAIN/SNOW"            
## [21] "THUNDERSNOW"                    "HEAVY RAIN/SNOW"               
## [23] "SNOW/SLEET/FREEZING RAIN"       "EARLY SNOW"                    
## [25] "HEAVY SNOW/BLOWING SNOW"        "BLOWING SNOW- EXTREME WIND CHI"
## [27] "SNOW AND HEAVY SNOW"            "SNOW/HEAVY SNOW"               
## [29] "SNOW- HIGH WIND- WIND CHILL"    "HEAVY SNOW/BLIZZARD"           
## [31] "SNOWSTORM"                      "HEAVY SNOW/SLEET"              
## [33] "SNOW/RAIN"                      "SNOW SQUALLS"                  
## [35] "SNOW SQUALL"                    "HEAVY LAKE SNOW"               
## [37] "HEAVY SNOW/FREEZING RAIN"       "LAKE EFFECT SNOW"              
## [39] "HEAVY WET SNOW"                 "BLIZZARD AND HEAVY SNOW"       
## [41] "HEAVY SNOW ANDBLOWING SNOW"     "BLOWING SNOW & EXTREME WIND CH"
## [43] "HEAVY SNOW/WINTER STORM"        "HEAVY SNOW AND HIGH WINDS"     
## [45] "HEAVY SNOW/HIGH WINDS & FLOOD"  "WET SNOW"                      
## [47] "LIGHT SNOW"                     "RECORD SNOW"                   
## [49] "SNOW/COLD"                      "HEAVY SNOW SQUALLS"            
## [51] "HEAVY SNOW/SQUALLS"             "HEAVY SNOW-SQUALLS"            
## [53] "SNOW FREEZING RAIN"             "LACK OF SNOW"                  
## [55] "SNOW/SLEET"                     "SNOW/FREEZING RAIN"            
## [57] "SNOW DROUGHT"                   "HEAVY SNOW   FREEZING RAIN"    
## [59] "FREEZING RAIN AND SNOW"         "SNOW SHOWERS"                  
## [61] "HEAVY SNOW/BLIZZARD/AVALANCHE"  "RECORD SNOW/COLD"              
## [63] "SNOW/ BITTER COLD"              "SNOW SLEET"                    
## [65] "SNOW/HIGH WINDS"                "HIGH WINDS/SNOW"               
## [67] "SNOWMELT FLOODING"              "HEAVY SNOW AND STRONG WINDS"   
## [69] "SNOW ACCUMULATION"              "BLOWING SNOW/EXTREME WIND CHIL"
## [71] "SNOW/BLOWING SNOW"              "NEAR RECORD SNOW"              
## [73] "SLEET/SNOW"                     "SNOW/SLEET/RAIN"               
## [75] "SNOW AND COLD"                  "PROLONG COLD/SNOW"             
## [77] "SNOW\\COLD"                     "SNOWFALL RECORD"               
## [79] "HEAVY SNOW AND"                 "LAKE-EFFECT SNOW"              
## [81] "LATE SNOW"                      "COLD AND SNOW"                 
## [83] "MODERATE SNOW"                  "MODERATE SNOWFALL"             
## [85] "SNOW AND SLEET"                 "LIGHT SNOW/FREEZING PRECIP"    
## [87] "EARLY SNOWFALL"                 "EXCESSIVE SNOW"                
## [89] "MONTHLY SNOWFALL"               "LATE SEASON SNOW"              
## [91] "SNOW ADVISORY"                  "UNUSUALLY LATE SNOW"           
## [93] "ACCUMULATED SNOWFALL"
snow <- unique(grep('SNOW', storm$EVTYPE))
storm$EVTYPE <- replace(storm$EVTYPE, snow, 'SNOW')
# Fix Hail
unique(grep('HAIL', storm$EVTYPE, value = TRUE))
##  [1] "HAIL"                     "THUNDERSTORM WINDS/HAIL" 
##  [3] "THUNDERSTORM WINDS HAIL"  "HAIL 1.75)"              
##  [5] "HAIL STORM"               "HAIL 75"                 
##  [7] "SMALL HAIL"               "HAIL 80"                 
##  [9] "FUNNEL CLOUD/HAIL"        "HAIL 0.75"               
## [11] "HAIL 1.00"                "HAIL/WINDS"              
## [13] "HAIL/WIND"                "HAIL 1.75"               
## [15] "WIND/HAIL"                "THUNDERSTORM WINDS/ HAIL"
## [17] "HAIL 225"                 "HAIL 0.88"               
## [19] "DEEP HAIL"                "HAIL 88"                 
## [21] "HAIL 175"                 "HAIL 100"                
## [23] "HAIL 150"                 "HAIL 075"                
## [25] "HAIL 125"                 "HAIL 200"                
## [27] "HAIL FLOODING"            "HAIL DAMAGE"             
## [29] "THUNDERSTORM HAIL"        "HAIL 088"                
## [31] "THUNDERSTORM WINDSHAIL"   "HAIL/ICY ROADS"          
## [33] "HAIL ALOFT"               "THUNDERSTORM WIND/HAIL"  
## [35] "HAIL 275"                 "HAIL 450"                
## [37] "HAILSTORM"                "HAILSTORMS"              
## [39] "TSTM WIND/HAIL"           "GUSTY WIND/HAIL"         
## [41] "LATE SEASON HAIL"         "NON SEVERE HAIL"         
## [43] "MARINE HAIL"
hail <- unique(grep('HAIL', storm$EVTYPE))
storm$EVTYPE <- replace(storm$EVTYPE, hail, 'HAIL')
# Fix Flash Flooding
unique(grep('FLASH FLOOD', storm$EVTYPE, value = TRUE))
##  [1] "FLASH FLOOD"                    "FLASH FLOODING"                
##  [3] "FLASH FLOODING/THUNDERSTORM WI" "FLASH FLOODS"                  
##  [5] "FLOOD/FLASH FLOOD"              "FLASH FLOOD WINDS"             
##  [7] "FLASH FLOOD/"                   "THUNDERSTORM WINDS/FLASH FLOOD"
##  [9] "LOCAL FLASH FLOOD"              "FLOOD/FLASH FLOODING"          
## [11] "FLASH FLOOD/FLOOD"              "FLASH FLOOD - HEAVY RAIN"      
## [13] "FLASH FLOOD/ STREET"            "FLASH FLOOD/HEAVY RAIN"        
## [15] "FLASH FLOOD/ FLOOD"             "FLASH FLOODING/FLOOD"          
## [17] "FLASH FLOOD/LANDSLIDE"          "FLASH FLOOD LANDSLIDES"        
## [19] " FLASH FLOOD"
ff <- unique(grep('FLASH FLOOD', storm$EVTYPE))
storm$EVTYPE <- replace(storm$EVTYPE, ff, 'FLASH FLOOD')
# Combine all other types of flooding into one label "FLOOD"
unique(grep('FLOOD', storm$EVTYPE, value = TRUE))
##  [1] "FLASH FLOOD"                    "FLOODING"                      
##  [3] "FLOOD"                          "BREAKUP FLOODING"              
##  [5] "RIVER FLOOD"                    "COASTAL FLOOD"                 
##  [7] "FLOOD WATCH/"                   "FLOODING/HEAVY RAIN"           
##  [9] "HEAVY SURF COASTAL FLOODING"    "URBAN FLOODING"                
## [11] "URBAN/SMALL FLOODING"           "LOCAL FLOOD"                   
## [13] "FLOOD/RAIN/WINDS"               "URBAN/SMALL STREAM FLOODING"   
## [15] "STREAM FLOODING"                "FLOOD/RAIN/WIND"               
## [17] "SMALL STREAM URBAN FLOOD"       "URBAN FLOOD"                   
## [19] "HEAVY RAIN/FLOODING"            "COASTAL FLOODING"              
## [21] "HIGH WINDS/FLOODING"            "URBAN/SMALL STREAM FLOOD"      
## [23] "MINOR FLOODING"                 "URBAN/SMALL STREAM  FLOOD"     
## [25] "URBAN AND SMALL STREAM FLOOD"   "SMALL STREAM FLOODING"         
## [27] "FLOODS"                         "SMALL STREAM AND URBAN FLOODIN"
## [29] "SMALL STREAM/URBAN FLOOD"       "SMALL STREAM AND URBAN FLOOD"  
## [31] "RURAL FLOOD"                    "THUNDERSTORM WINDS URBAN FLOOD"
## [33] "MAJOR FLOOD"                    "STREET FLOOD"                  
## [35] "SMALL STREAM FLOOD"             "LAKE FLOOD"                    
## [37] "URBAN AND SMALL STREAM FLOODIN" "RIVER AND STREAM FLOOD"        
## [39] "MINOR FLOOD"                    "HIGH WINDS/COASTAL FLOOD"      
## [41] "RIVER FLOODING"                 "FLOOD/RIVER FLOOD"             
## [43] "MUD SLIDES URBAN FLOODING"      "HEAVY RAIN AND FLOOD"          
## [45] "COASTAL/TIDAL FLOOD"            "HEAVY RAIN; URBAN FLOOD WINDS;"
## [47] "FLOOD FLASH"                    "FLOOD FLOOD/FLASH"             
## [49] "TIDAL FLOOD"                    "FLOOD/FLASH"                   
## [51] "HEAVY RAINS/FLOODING"           "THUNDERSTORM WINDS/FLOODING"   
## [53] "HIGHWAY FLOODING"               "HEAVY RAIN/MUDSLIDES/FLOOD"    
## [55] "BEACH EROSION/COASTAL FLOOD"    "BEACH FLOOD"                   
## [57] "THUNDERSTORM WINDS/ FLOOD"      "FLOOD & HEAVY RAIN"            
## [59] "FLOOD/FLASHFLOOD"               "URBAN SMALL STREAM FLOOD"      
## [61] "URBAN FLOOD LANDSLIDE"          "URBAN FLOODS"                  
## [63] "HEAVY RAIN/URBAN FLOOD"         "LANDSLIDE/URBAN FLOOD"         
## [65] "COASTALFLOOD"                   "STREET FLOODING"               
## [67] "TIDAL FLOODING"                 " COASTAL FLOOD"                
## [69] "COASTAL FLOODING/EROSION"       "URBAN/STREET FLOODING"         
## [71] "COASTAL  FLOODING/EROSION"      "FLOOD/FLASH/FLOOD"             
## [73] "CSTL FLOODING/EROSION"          "LAKESHORE FLOOD"
unique(grep('FLD', storm$EVTYPE, value = TRUE))
## [1] "URBAN/SML STREAM FLD"  "URBAN/SML STREAM FLDG" "URBAN/SMALL STRM FLDG"
flood <- unique(intersect(union(grep('FLOOD', storm$EVTYPE), 
                                grep('FLD', storm$EVTYPE)),
                         grep('FLASH FLOOD', storm$EVTYPE, invert = TRUE)))
storm$EVTYPE <- replace(storm$EVTYPE, flood, 'FLOOD')
# Fix Thunderstorm Wind
unique(grep('TSTM WIND', storm$EVTYPE, value = TRUE))
##  [1] "TSTM WIND"               "TSTM WIND 51"           
##  [3] "TSTM WIND 50"            "TSTM WIND 52"           
##  [5] "TSTM WIND 55"            "TSTM WIND G58"          
##  [7] "TSTM WIND DAMAGE"        "TSTM WINDS"             
##  [9] "TSTM WIND 65)"           "TSTM WIND (G45)"        
## [11] "TSTM WIND 40"            "TSTM WIND 45"           
## [13] "TSTM WIND (41)"          "TSTM WIND (G40)"        
## [15] " TSTM WIND"              "TSTM WIND AND LIGHTNING"
## [17] " TSTM WIND (G45)"        "TSTM WIND  (G45)"       
## [19] "TSTM WIND (G35)"         "TSTM WIND G45"          
## [21] "NON-TSTM WIND"           "NON TSTM WIND"          
## [23] "MARINE TSTM WIND"
unique(grep('THUNDERSTORM WIND', storm$EVTYPE, value = TRUE))
##  [1] "THUNDERSTORM WINDS"             "THUNDERSTORM WIND"             
##  [3] "THUNDERSTORM WINDS LIGHTNING"   "THUNDERSTORM WINDS/FUNNEL CLOU"
##  [5] "SEVERE THUNDERSTORM WINDS"      "LIGHTNING THUNDERSTORM WINDSS" 
##  [7] "THUNDERSTORM WINDS 60"          "THUNDERSTORM WINDSS"           
##  [9] "LIGHTNING THUNDERSTORM WINDS"   "THUNDERSTORM WINDS53"          
## [11] "THUNDERSTORM WINDS 13"          "THUNDERSTORM WINDS SMALL STREA"
## [13] "THUNDERSTORM WINDS 2"           "THUNDERSTORM WINDS 61"         
## [15] "THUNDERSTORM WIND/LIGHTNING"    "THUNDERSTORM WIND G50"         
## [17] "THUNDERSTORM WINDS/HEAVY RAIN"  "THUNDERSTORM WINDS      LE CEN"
## [19] "THUNDERSTORM WINDS G"           "THUNDERSTORM WIND G60"         
## [21] "THUNDERSTORM WINDS."            "THUNDERSTORM WIND G55"         
## [23] "THUNDERSTORM WINDS G60"         "THUNDERSTORM WINDS FUNNEL CLOU"
## [25] "THUNDERSTORM WINDS 62"          "THUNDERSTORM WINDS 53"         
## [27] "THUNDERSTORM WIND 59"           "THUNDERSTORM WIND 52"          
## [29] "THUNDERSTORM WIND 69"           "THUNDERSTORM WIND 60 MPH"      
## [31] "THUNDERSTORM WIND 65MPH"        "THUNDERSTORM WIND/ TREES"      
## [33] "THUNDERSTORM WIND/AWNING"       "THUNDERSTORM WIND 98 MPH"      
## [35] "THUNDERSTORM WIND TREES"        "THUNDERSTORM WIND 59 MPH"      
## [37] "THUNDERSTORM WINDS 63 MPH"      "THUNDERSTORM WIND/ TREE"       
## [39] "THUNDERSTORM WIND 65 MPH"       "THUNDERSTORM WIND."            
## [41] "THUNDERSTORM WIND 59 MPH."      "THUNDERSTORM WINDS AND"        
## [43] "THUNDERSTORM WINDS 50"          "THUNDERSTORM WIND G52"         
## [45] "THUNDERSTORM WINDS 52"          "THUNDERSTORM WIND G51"         
## [47] "THUNDERSTORM WIND G61"          "THUNDERSTORM WIND 50"          
## [49] "THUNDERSTORM WIND 56"           "THUNDERSTORM WINDS HEAVY RAIN" 
## [51] "THUNDERSTORM WIND (G40)"        "GUSTY THUNDERSTORM WINDS"      
## [53] "GUSTY THUNDERSTORM WIND"        "MARINE THUNDERSTORM WIND"
tstorm <- unique(intersect(union(grep('TSTM WIND', storm$EVTYPE),
                           grep('THUNDERSTORM WIND', storm$EVTYPE)),
                    grep('NON', storm$EVTYPE, invert = TRUE)))
storm$EVTYPE <- replace(storm$EVTYPE, tstorm, 'TSTM WIND')
head(sort(table(storm$EVTYPE), decreasing = TRUE), 20)
## 
##      TSTM WIND           HAIL        TORNADO    FLASH FLOOD          FLOOD 
##         335548         290393          60701          55661          30380 
##      HIGH WIND           SNOW      LIGHTNING     HEAVY RAIN   WINTER STORM 
##          20212          17539          15754          11723          11433 
## WINTER WEATHER   FUNNEL CLOUD     WATERSPOUT    STRONG WIND       WILDFIRE 
##           7026           6839           3796           3566           2761 
##       BLIZZARD           HEAT        DROUGHT      ICE STORM     HIGH WINDS 
##           2719           2628           2488           2032           1533

Based on this final table, most of the highest-occurring storm types have been cleaned up, and this should be sufficient to answer our questions. We could achieve more fidelity with our analysis if we combined the various forms of snow/winter weather/winter storm, etc; but sometimes these various terms mean slightly different things so we are going to leave them as-is.

Economic Impact

When looking at economic impact of the storms, we are looking at Property Damage and Crop Damage. The data are presented as a number in one column and a multiplier in another. The multiplier columns look like what's shown below.

table(storm$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(storm$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

It is clear what is mean by the exp's such as 'K', 'M', and 'B', but it is not clear what is meant by exp's such as 'H', '0', '2'. Since those unclear columns are such a small part of the overall total we'll basically ignore them.

storm$PROPDMGEXP.number <- as.character(storm$PROPDMGEXP)
proptozero <- grep('\\?|\\+|\\-|0|1|2|3|4|5|6|7|8|h|H', storm$PROPDMGEXP.number)
proptok <- grep('k|K', storm$PROPDMGEXP.number)
proptom <- grep('m|M', storm$PROPDMGEXP.number)
proptob <- grep('B', storm$PROPDMGEXP.number)
storm$PROPDMGEXP.number <- replace(storm$PROPDMGEXP.number, proptozero, 0)
storm$PROPDMGEXP.number <- replace(storm$PROPDMGEXP.number, proptok, 1000)
storm$PROPDMGEXP.number <- replace(storm$PROPDMGEXP.number, proptom, 1000000)
storm$PROPDMGEXP.number <- replace(storm$PROPDMGEXP.number, proptob, 1000000000)
storm$PROPDMGEXP.number <- as.numeric(storm$PROPDMGEXP.number)

# Fix CROPDMGEXP
storm$CROPDMGEXP.number <- as.character(storm$CROPDMGEXP)
croptozero <- grep('\\?|0|2', storm$CROPDMGEXP.number)
croptok <- grep('k|K', storm$CROPDMGEXP.number)
croptom <- grep('m|M', storm$CROPDMGEXP.number)
croptob <- grep('B', storm$CROPDMGEXP.number)
storm$CROPDMGEXP.number <- replace(storm$CROPDMGEXP.number, croptozero, 0)
storm$CROPDMGEXP.number <- replace(storm$CROPDMGEXP.number, croptok, 1000)
storm$CROPDMGEXP.number <- replace(storm$CROPDMGEXP.number, croptom, 1000000)
storm$CROPDMGEXP.number <- replace(storm$CROPDMGEXP.number, croptob, 1000000000)
storm$CROPDMGEXP.number <- as.numeric(storm$CROPDMGEXP.number)

Now we create two extra columns with total Property Damage and total Crop Damage and a final column with total economic impact.

storm$TOTALPROPDMG <- storm$PROPDMG * storm$PROPDMGEXP.number
storm$TOTALCROPDMG <- storm$CROPDMG * storm$CROPDMGEXP.number
storm$ECONIMPACT <- storm$TOTALPROPDMG + storm$TOTALCROPDMG
summary(storm$ECONIMPACT)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## 0.00e+00 0.00e+00 0.00e+00 9.35e+05 4.00e+03 1.15e+11   622731

Results

We have two main questions that we are trying to answer in this analysis. First, which types of storms have the most effect on human health? Secondly, what sorts of storms have the most economic impact?

Storms and Human Health

Let's first look a summary for how the storms affect human health overall. The two health-related variables are INJURIES and FATALITIES.

# Display a summary of INJURY data from storm data
summary(storm$INJURIES)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     0.0     0.2     0.0  1700.0
# Display a summary of FATALITIES data from storm data
summary(storm$FATALITIES)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0     583

So far and away most storms have little documented effect on human health. It's interesting that in both cases the mean is above the 3rd quartile (which is 0 in both cases), so that means most weather events do not hurt or kill people, but that a few have a very large effect.

We'll table these two variables and look at the top 50 or so to see what the upper end of the injuries and fatalities spectrum looks like.

head(sort(storm$INJURIES, decreasing = TRUE), 50)
##  [1] 1700 1568 1228 1150 1150  800  800  785  780  750  700  600  597  560
## [15]  550  519  504  500  500  500  500  500  500  500  463  450  450  450
## [29]  437  411  410  397  385  350  350  350  350  342  325  316  306  300
## [43]  300  300  300  300  293  280  280  275
head(sort(storm$FATALITIES, decreasing = TRUE), 50)
##  [1] 583 158 116 114  99  90  75  74  67  57  57  50  49  46  44  42  42
## [18]  42  38  37  36  34  33  33  33  32  32  32  31  31  31  30  30  30
## [35]  29  29  29  27  27  27  26  25  25  25  25  25  24  24  24  24

Looking at the above, we can tell that injuries data are in most cases likely rough estimates – there are too many round numbers for it to be a coincidence – but that fatalities information is likely much more accurate.

We'll create a smaller dataframe consisting of just a few variables from the storm data set to make it easier to sort.

storm.health <- subset(storm, select = c(BGN_DATE, EVTYPE, INJURIES, FATALITIES))
head(storm.health)
##             BGN_DATE  EVTYPE INJURIES FATALITIES
## 1  4/18/1950 0:00:00 TORNADO       15          0
## 2  4/18/1950 0:00:00 TORNADO        0          0
## 3  2/20/1951 0:00:00 TORNADO        2          0
## 4   6/8/1951 0:00:00 TORNADO        2          0
## 5 11/15/1951 0:00:00 TORNADO        2          0
## 6 11/15/1951 0:00:00 TORNADO        6          0

Now we'll create two separate data frames, sorting on injuries and fatalities, respectively.

# Create df to sort by injuries
storm.health.injuries <- storm.health[ order(storm.health$INJURIES,
                                             decreasing = TRUE), ]
head(storm.health.injuries, 20)
##                  BGN_DATE    EVTYPE INJURIES FATALITIES
## 157885  4/10/1979 0:00:00   TORNADO     1700         42
## 223449   2/8/1994 0:00:00 ICE STORM     1568          1
## 67884    6/9/1953 0:00:00   TORNADO     1228         90
## 116011   4/3/1974 0:00:00   TORNADO     1150         36
## 862634  5/22/2011 0:00:00   TORNADO     1150        158
## 344159 10/17/1998 0:00:00     FLOOD      800          2
## 860386  4/27/2011 0:00:00   TORNADO      800         44
## 68670    6/8/1953 0:00:00   TORNADO      785        116
## 529351  8/13/2004 0:00:00 HURRICANE      780          7
## 344178 10/17/1998 0:00:00     FLOOD      750          0
## 858228  4/27/2011 0:00:00   TORNADO      700         20
## 344158 10/17/1998 0:00:00     FLOOD      600         11
## 148852  5/11/1953 0:00:00   TORNADO      597        114
## 35124   4/11/1965 0:00:00   TORNADO      560         17
## 344163 10/17/1998 0:00:00     FLOOD      550          0
## 667233   8/4/2007 0:00:00      HEAT      519          2
## 78567    3/3/1966 0:00:00   TORNADO      504         57
## 16503   10/3/1979 0:00:00   TORNADO      500          3
## 29566   4/21/1967 0:00:00   TORNADO      500         33
## 153749  5/11/1970 0:00:00   TORNADO      500         26
# Create df to sort by fatalities
storm.health.fatalities <- storm.health[ order(storm.health$FATALITIES,
                                             decreasing = TRUE), ]
head(storm.health.fatalities, 20)
##                 BGN_DATE  EVTYPE INJURIES FATALITIES
## 198704 7/12/1995 0:00:00    HEAT        0        583
## 862634 5/22/2011 0:00:00 TORNADO     1150        158
## 68670   6/8/1953 0:00:00 TORNADO      785        116
## 148852 5/11/1953 0:00:00 TORNADO      597        114
## 355128 7/28/1999 0:00:00    HEAT        0         99
## 67884   6/9/1953 0:00:00 TORNADO     1228         90
## 46309  5/25/1955 0:00:00 TORNADO      270         75
## 371112  7/4/1999 0:00:00    HEAT      135         74
## 230927  7/1/1995 0:00:00    HEAT        0         67
## 78567   3/3/1966 0:00:00 TORNADO      504         57
## 247938 7/13/1995 0:00:00    HEAT        0         57
## 6370   3/21/1952 0:00:00 TORNADO      325         50
## 598500 9/21/2005 0:00:00    HEAT        0         49
## 606363 7/16/2006 0:00:00    HEAT       18         46
## 860386 4/27/2011 0:00:00 TORNADO      800         44
## 157885 4/10/1979 0:00:00 TORNADO     1700         42
## 362850 7/18/1999 0:00:00    HEAT      397         42
## 629242  8/1/2006 0:00:00    HEAT        0         42
## 78123  12/5/1953 0:00:00 TORNADO      270         38
## 83578  5/20/1957 0:00:00 TORNADO      176         37

Now we're going to see what are the types of storms that have the most effect on human health, and order to get the top 6 to plot.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stormsgrp <- group_by(storm, EVTYPE)
storm.inj.top <- summarise(stormsgrp, sum = sum(INJURIES),
                                       median = median(INJURIES),
                                       mean = mean(INJURIES),
                                       n = n())
storm.inj.top <- storm.inj.top[ order(storm.inj.top$sum, decreasing = T), ]
head(storm.inj.top)
## Source: local data frame [6 x 5]
## 
##        EVTYPE   sum median   mean      n
## 525   TORNADO 91407      0 1.5059  60701
## 537 TSTM WIND  9396      0 0.0280 335548
## 179      HEAT  9139      0 3.4775   2628
## 124     FLOOD  6880      0 0.2265  30380
## 292 LIGHTNING  5230      0 0.3320  15754
## 271 ICE STORM  1992      0 0.9803   2032
storm.fat.top <- summarise(stormsgrp, sum = sum(FATALITIES),
                                        median = median(FATALITIES),
                                        mean = mean(FATALITIES),
                                        n = n())
storm.fat.top <- storm.fat.top[ order(storm.fat.top$sum, decreasing = T), ]
head(storm.fat.top)
## Source: local data frame [6 x 5]
## 
##          EVTYPE  sum median     mean      n
## 525     TORNADO 5661      0 0.093260  60701
## 179        HEAT 3132      0 1.191781   2628
## 121 FLASH FLOOD 1035      0 0.018595  55661
## 292   LIGHTNING  816      0 0.051796  15754
## 537   TSTM WIND  723      0 0.002155 335548
## 124       FLOOD  516      0 0.016985  30380

We chose to order by total sum of injuries and fatalities, because this seems to be the best way of reporting total human toll over time and many events.

Now we create vectors with the top 6 event types for each type of human toll.

topinjuries <- head(storm.inj.top$EVTYPE, 6)
topfatalities <- head(storm.fat.top$EVTYPE, 6)
print(topinjuries)
## [1] TORNADO   TSTM WIND HEAT      FLOOD     LIGHTNING ICE STORM
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
print(topfatalities)
## [1] TORNADO     HEAT        FLASH FLOOD LIGHTNING   TSTM WIND   FLOOD      
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

The above are the worst sorts of storms.

** At this point I had to quit my work due to a family emergency. I did not leave enough time to come back to it. I might have to re-take this class next month. Thanks for your time grading it and looking at it. **