Synopsis

This is an analysis of storm data between 1950 and November of 2011 from the NOAA (National Oceanic and Atmospheric Administration) Storm Database. This analysis aims to answer two important questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The analysis showed that Floods, Hail, High Winds, and Tornadoes are the most harmful in regards to population health and economic damages. The deadliest natural event is the Tornado, which also caused the most injuries. Floods cause the most property damage and Drought causes the most crop damage. All of these events have become more frequent in the years leading up to 2011.

Some important links are below:

Data Processing

Downloading and reading in the data.

download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="./stormdata.csv.bz2",method="curl")

stormdata=read.csv("./stormdata.csv.bz2")
      
print(object.size(stormdata),units="Mb")
## 409.4 Mb

We can cut out a few columns that we don’t need to trim down the data set. We only really need the following variables for our analysis:

We can also cut down the data set to where either FATALITIES, INJURIES, CROPDMG, or PROPDMG are greater than zero. Then we can extract the easy variables (from the list above) first.

stormdata2<-stormdata[stormdata$FATALITIES>0|stormdata$INJURIES>0|stormdata$CROPDMG>0|stormdata$PROPDMG>0,]

REFNUM<-stormdata2$REFNUM

MONTH<-as.factor(unclass(as.POSIXlt(stormdata2$BGN_DATE,format="%m/%d/%Y"))$mon+1)

YEAR<-as.factor(unclass(as.POSIXlt(stormdata2$BGN_DATE,format="%m/%d/%Y"))$year+1900)

FATALITIES<-as.numeric(stormdata2$FATALITIES)

INJURIES<-as.numeric(stormdata2$INJURIES)

Next, we need to figure out what the inflation adjustments needs to be and how to eventually work that into the damage numbers. Inflation is defined as a sustained increase in the general level of prices of goods and services. That means that a dollar fifty years ago had much more purchasing power than a dollar does today, assuming positive cumulative inflation. This analysis will be incomplete if historical dollar figures are unadjusted and compared to more recent dollar figures. A good proxy for inflation is the Consumer Price Index.

The following data set could be downloaded from The BLS Website. The “From” box has been set to 1950 and the “To” box has been set to 2011. Unfortunately, there is no automated way to get this exact dataset from the BLS (U.S. Bureau of Labor Statistics). It is worth sticking to the BLS website to get the most accurate inflation estimates.

First, let’s import the CPI data and get it ready for use.

if("xlsx" %in% rownames(installed.packages()) == FALSE) {install.packages("xlsx")}
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
CPIdata<-read.xlsx("./BLS CPI Data 1950-2011.xlsx",rowIndex = 11:73, sheetName = "BLS Data Series")

names(CPIdata)[2:13]<-c(1:12)

if("reshape2" %in% rownames(installed.packages()) == FALSE) {install.packages("reshape2")}
library(reshape2)

CPIdatamelt<-melt(CPIdata,measure.vars = names(CPIdata)[2:16])

CPIdatamelt$Year<-as.factor(CPIdatamelt$Year)

names(CPIdatamelt)[2]<-"Year.Part"

Now let’s prepare both data sets for the CPI matching loop.

MONTH.YEAR.CPI<-paste(CPIdatamelt$Year.Part,CPIdatamelt$Year)

CPIdatamelt<-cbind(MONTH.YEAR.CPI,CPIdatamelt)

MONTH.YEAR.STORM<-paste(MONTH,YEAR)

stormdata2<-cbind(MONTH.YEAR.STORM,stormdata2)

Finally, let’s write a loop to match the correct CPI numbers to the stormdata2 time periods.

CPInumbers<-vector()

i=1

for(i in i:length(stormdata2$MONTH.YEAR.STORM))
{
    q<-which(as.character(CPIdatamelt$MONTH.YEAR.CPI) ==            as.character(stormdata2$MONTH.YEAR.STORM[i]))
    
    CPInumbers<-c(CPInumbers,CPIdatamelt[q,4])
}

At this point, we have a vector that has the correct CPI number for each month in each year of the data set. This will be used in a later calculation when adjusting the damage figures.

Now we need to see what the orders of magnitude are for PROPDMGEXP so we can bring the numbers to the correct scale. Let’s do a spot check on one. Let’s also see how many instances we have where the damage is 0 but there is a scale.

levels(stormdata2$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
q<-which(stormdata2$PROPDMGEXP=="3")

stormdata2[q,]
##        MONTH.YEAR.STORM STATE__          BGN_DATE BGN_TIME TIME_ZONE
## 214375           5 1995      29 5/16/1995 0:00:00     1750       CST
##        COUNTY COUNTYNAME STATE             EVTYPE BGN_RANGE BGN_AZI
## 214375    205     SHELBY    MO THUNDERSTORM WINDS         2       S
##         BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE
## 214375 Shelbyville                            0         NA         0
##        END_AZI END_LOCATI LENGTH WIDTH  F MAG FATALITIES INJURIES PROPDMG
## 214375                         0     0 NA   0          0        0      20
##        PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE
## 214375          3     2.5          K                                 0
##        LONGITUDE LATITUDE_E LONGITUDE_
## 214375         0          0          0
##                                                                                                                                                                                                                                                                REMARKS
## 214375 The county sheriff reported numerous windows broken and large tree limbs down in Shelbyville from high wind gusts.  As the storms moved south, four barns were heavily damaged south of town.  Numerous large trees across the county were blown over as well. 
##        REFNUM
## 214375 214307
sum(stormdata2$PROPDMG=="0" & stormdata2$PROPDMGEXP!="")
## [1] 3950

We can see that there are different ways to call the same order of magnitude and all of these need to be accounted for. Some of the PROPDMG numbers are recorded as “0” with an EXP most likely because there was no clear estimate. We need to keep all of these levels in mind as we are going through the loop and adjust for the 0s. 1 through 9 will be sampled randomly every time there is a property damage order of magnitude without a base number in PROPDMG. This figure will be multiplied by the order of magnitude to produce a pseudo-random estimate of property damage.

set.seed(123)
zero.random.adj<-sample(1:9,size=sum(stormdata2$PROPDMG=="0" & stormdata2$PROPDMGEXP!=""),replace=TRUE)

i=1

j=1

propdmgfull<-vector()

stormdata2$PROPDMGEXP<-as.character(stormdata2$PROPDMGEXP)

for(i in i:length(stormdata2$PROPDMGEXP))
{
  if(stormdata2$PROPDMGEXP[i]==""|stormdata2$PROPDMGEXP[i]=="-"|stormdata2$PROPDMGEXP[i]=="?"|stormdata2$PROPDMGEXP[i]=="+"|stormdata2$PROPDMGEXP[i]=="0")
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i])} 
    
  if(stormdata2$PROPDMGEXP[i]=="1" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e1)}  
    
  if(stormdata2$PROPDMGEXP[i]=="1" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e1))
   j<-j+1
  } 
    
  if(stormdata2$PROPDMGEXP[i]=="2" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e2)}  
    
  if(stormdata2$PROPDMGEXP[i]=="2" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  } 
    
  if(stormdata2$PROPDMGEXP[i]=="3" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e3)}  
    
  if(stormdata2$PROPDMGEXP[i]=="3" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e3))
   j<-j+1
  }     
  
   if(stormdata2$PROPDMGEXP[i]=="4" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e4)}  
    
  if(stormdata2$PROPDMGEXP[i]=="4" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e4))
   j<-j+1
  }     
    
  if(stormdata2$PROPDMGEXP[i]=="5" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e5)}  
    
  if(stormdata2$PROPDMGEXP[i]=="5" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e5))
   j<-j+1
  }
    
  if(stormdata2$PROPDMGEXP[i]=="6" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e6)}  
    
  if(stormdata2$PROPDMGEXP[i]=="6" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  }
    
  if(stormdata2$PROPDMGEXP[i]=="7" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e7)}  
    
  if(stormdata2$PROPDMGEXP[i]=="7" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e7))
   j<-j+1
  }
    
  if(stormdata2$PROPDMGEXP[i]=="8" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e8)}  
    
  if(stormdata2$PROPDMGEXP[i]=="8" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e8))
   j<-j+1
  } 
    
   if(stormdata2$PROPDMGEXP[i]=="B" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e9)}  
    
  if(stormdata2$PROPDMGEXP[i]=="B" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e9))
   j<-j+1
  } 
    
   if(stormdata2$PROPDMGEXP[i]=="h" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e2)}  
    
  if(stormdata2$PROPDMGEXP[i]=="h" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  }     
  
  if(stormdata2$PROPDMGEXP[i]=="H" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e2)}  
    
  if(stormdata2$PROPDMGEXP[i]=="H" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  } 
    
  if(stormdata2$PROPDMGEXP[i]=="K" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e3)}  
    
  if(stormdata2$PROPDMGEXP[i]=="K" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e3))
   j<-j+1
  }  
    
  if(stormdata2$PROPDMGEXP[i]=="m" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e6)}  
    
  if(stormdata2$PROPDMGEXP[i]=="m" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  }
    
  if(stormdata2$PROPDMGEXP[i]=="M" & stormdata2$PROPDMG[i]!=0)  
  {propdmgfull<-c(propdmgfull,stormdata2$PROPDMG[i]*1e6)}  
    
  if(stormdata2$PROPDMGEXP[i]=="M" & stormdata2$PROPDMG[i]==0)  
  {propdmgfull<-c(propdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  } 
     
}

Now we can do the same thing for crop damage.

set.seed(123)
zero.random.adj<-sample(1:9,size=sum(stormdata2$CROPDMG=="0" & stormdata2$CROPDMGEXP!=""),replace=TRUE)

i=1

j=1

CROPdmgfull<-vector()

stormdata2$CROPDMGEXP<-as.character(stormdata2$CROPDMGEXP)

for(i in i:length(stormdata2$CROPDMGEXP))
{
  if(stormdata2$CROPDMGEXP[i]==""|stormdata2$CROPDMGEXP[i]=="-"|stormdata2$CROPDMGEXP[i]=="?"|stormdata2$CROPDMGEXP[i]=="+"|stormdata2$CROPDMGEXP[i]=="0")
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i])}
  
  if(stormdata2$CROPDMGEXP[i]=="1" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e1)}  
    
  if(stormdata2$CROPDMGEXP[i]=="1" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e1))
   j<-j+1
  } 
    
  if(stormdata2$CROPDMGEXP[i]=="2" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e2)}  
    
  if(stormdata2$CROPDMGEXP[i]=="2" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  } 
    
  if(stormdata2$CROPDMGEXP[i]=="3" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e3)}  
    
  if(stormdata2$CROPDMGEXP[i]=="3" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e3))
   j<-j+1
  }     
  
  if(stormdata2$CROPDMGEXP[i]=="4" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e4)}  
    
  if(stormdata2$CROPDMGEXP[i]=="4" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e4))
   j<-j+1
  }     
    
   if(stormdata2$CROPDMGEXP[i]=="5" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e5)}  
    
  if(stormdata2$CROPDMGEXP[i]=="5" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e5))
   j<-j+1
  }
    
   if(stormdata2$CROPDMGEXP[i]=="6" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e6)}  
    
  if(stormdata2$CROPDMGEXP[i]=="6" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  }
    
   if(stormdata2$CROPDMGEXP[i]=="7" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e7)}  
    
  if(stormdata2$CROPDMGEXP[i]=="7" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e7))
   j<-j+1
  }
    
   if(stormdata2$CROPDMGEXP[i]=="8" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e8)}  
    
  if(stormdata2$CROPDMGEXP[i]=="8" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e8))
   j<-j+1
  } 
    
   if(stormdata2$CROPDMGEXP[i]=="B" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e9)}  
    
  if(stormdata2$CROPDMGEXP[i]=="B" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e9))
   j<-j+1
  } 
    
   if(stormdata2$CROPDMGEXP[i]=="h" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e2)}  
    
  if(stormdata2$CROPDMGEXP[i]=="h" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  }     
   if(stormdata2$CROPDMGEXP[i]=="H" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e2)}  
    
  if(stormdata2$CROPDMGEXP[i]=="H" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e2))
   j<-j+1
  } 
    
   if(stormdata2$CROPDMGEXP[i]=="K" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e3)}  
    
  if(stormdata2$CROPDMGEXP[i]=="K" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e3))}
   j<-j+1
   
   if(stormdata2$CROPDMGEXP[i]=="k" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e3)}  
    
  if(stormdata2$CROPDMGEXP[i]=="k" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e3))
   j<-j+1
  }  
    
   if(stormdata2$CROPDMGEXP[i]=="m" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e6)}  
    
  if(stormdata2$CROPDMGEXP[i]=="m" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  }
    
   if(stormdata2$CROPDMGEXP[i]=="M" & stormdata2$CROPDMG[i]!=0)  
  {CROPdmgfull<-c(CROPdmgfull,stormdata2$CROPDMG[i]*1e6)}  
    
  if(stormdata2$CROPDMGEXP[i]=="M" & stormdata2$CROPDMG[i]==0)  
  {CROPdmgfull<-c(CROPdmgfull,(zero.random.adj[j]*1e6))
   j<-j+1
  }   
}

At this point we have the full non-inflation adjusted PROPDMG and CROPDMG numbers. Now all we need to do is apply the inflation adjustment. The adjustment works as follows:

i=1

inf.numerator<-tail(CPInumbers,1)

PROPDMG.INF.ADJ<-vector()

for(i in i:length(propdmgfull))
{
    PROPDMGADJ<-(inf.numerator/CPInumbers[i])*propdmgfull[i]
    
    PROPDMG.INF.ADJ<-c(PROPDMG.INF.ADJ,PROPDMGADJ)
}


i=1

CROPDMG.INF.ADJ<-vector()

for(i in i:length(CROPdmgfull))
{
    CROPDMGADJ<-(inf.numerator/CPInumbers[i])*CROPdmgfull[i]
    
    CROPDMG.INF.ADJ<-c(CROPDMG.INF.ADJ,CROPDMGADJ)
}

Now the CPI adjustment is completed and it is time to move on to the last part of tidying up the data.

Finally, the last step is to clean up EVTYPE. There are many levels to this variable. The documentation lists 48 different event names but the EVTYPE variable has many more levels.

summary(stormdata2$EVTYPE)
##                TSTM WIND        THUNDERSTORM WIND                  TORNADO 
##                    63234                    43655                    39944 
##                     HAIL              FLASH FLOOD                LIGHTNING 
##                    26130                    20967                    13293 
##       THUNDERSTORM WINDS                    FLOOD                HIGH WIND 
##                    12086                    10175                     5522 
##              STRONG WIND             WINTER STORM               HEAVY SNOW 
##                     3370                     1508                     1342 
##               HEAVY RAIN                 WILDFIRE                ICE STORM 
##                     1105                      857                      708 
##     URBAN/SML STREAM FLD           EXCESSIVE HEAT               HIGH WINDS 
##                      702                      698                      657 
##           TSTM WIND/HAIL           TROPICAL STORM           WINTER WEATHER 
##                      441                      416                      407 
##              RIP CURRENT         WILD/FOREST FIRE           FLASH FLOODING 
##                      400                      388                      302 
##        FLOOD/FLASH FLOOD                AVALANCHE                  DROUGHT 
##                      279                      268                      266 
##                 BLIZZARD             RIP CURRENTS                     HEAT 
##                      253                      241                      215 
##             EXTREME COLD         LAKE-EFFECT SNOW                LANDSLIDE 
##                      197                      194                      193 
##              STORM SURGE            COASTAL FLOOD              URBAN FLOOD 
##                      177                      164                      139 
##       WINTER WEATHER/MIX                HIGH SURF                HURRICANE 
##                      139                      130                      129 
##               LIGHT SNOW             FROST/FREEZE  EXTREME COLD/WIND CHILL 
##                      119                      116                      111 
##         MARINE TSTM WIND                      FOG              RIVER FLOOD 
##                      109                      107                      107 
##               DUST STORM          COLD/WIND CHILL               DUST DEVIL 
##                      103                       90                       89 
##                     WIND           URBAN FLOODING           DRY MICROBURST 
##                       83                       80                       78 
##                DENSE FOG        HURRICANE/TYPHOON                 FLOODING 
##                       74                       72                       58 
##         COASTAL FLOODING                     SNOW     HEAVY SURF/HIGH SURF 
##                       55                       52                       50 
##             STRONG WINDS         STORM SURGE/TIDE               WATERSPOUT 
##                       50                       47                       47 
##       MARINE STRONG WIND  THUNDERSTORM WINDS HAIL          TSTM WIND (G45) 
##                       46                       40                       36 
##            FREEZING RAIN      TROPICAL DEPRESSION                HEAT WAVE 
##                       35                       35                       34 
##      THUNDERSTORM WINDSS MARINE THUNDERSTORM WIND                    OTHER 
##                       34                       33                       33 
##                     COLD               HEAVY SURF           EXCESSIVE SNOW 
##                       32                       30                       25 
##                      ICE                ICY ROADS      LIGHT FREEZING RAIN 
##                       24                       22                       22 
##  THUNDERSTORM WINDS/HAIL              GUSTY WINDS       HEAVY SNOW SQUALLS 
##                       22                       21                       21 
##               Light Snow              HEAVY RAINS        EXTREME WINDCHILL 
##                       21                       20                       19 
##                    GLAZE         MARINE HIGH WIND             THUNDERSTORM 
##                       19                       19                       19 
##                    WINDS             EXTREME HEAT        FLASH FLOOD/FLOOD 
##                       18                       17                       16 
##                   FREEZE              SNOW SQUALL       HEAVY SNOW-SQUALLS 
##                       16                       16                       15 
##      MIXED PRECIPITATION                  TSUNAMI             FLASH FLOODS 
##                       15                       14                       13 
##             FUNNEL CLOUD               GUSTY WIND           RIVER FLOODING 
##                       13                       13                       13 
##               SMALL HAIL   DROUGHT/EXCESSIVE HEAT              Gusty Winds 
##                       11                       10                       10 
##                  (Other) 
##                      773

The stringdist package has some great tools to help with this sort of problem. The amatch function, which stands for “approximate match”, will help us with partial string matching. First, we need to manually input the correct 48 names for EVTYPE. Then we will use amatch to force the old EVTYPE variables into one of the new 48 levels. This will not be perfect for every case but it will handle the vast majority of cases correctly.

Label = c("Astronomical Low Tide", "Avalanche", "Blizzard", 
                     "Coastal Flood", "Cold/Wind Chill", "Debris Flow", 
                    "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", 
                     "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", 
                     "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", 
                     "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", 
                     "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", 
                     "Lake-Effect Snow", "Lakeshore Flood", "Lightning", 
                     "Marine Hail", "Marine High Wind", "Marine Strong Wind", 
                     "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", 
                     "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", 
                     "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", 
                     "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", 
                     "Winter Weather")

if("stringdist" %in% rownames(installed.packages()) == FALSE) {install.packages("stringdist")}
library(stringdist)

ClosestMatch2 = function(string, stringVector){
    
    stringVector[amatch(string, stringVector, maxDist=Inf)]
    
}

EVTYPE<-stormdata2$EVTYPE

EVTYPE<-as.character(EVTYPE)

i=1

for (i in i:length(EVTYPE))
{
    EVTYPE[i]<-ClosestMatch2(string=stormdata2$EVTYPE[i],stringVector=Label)
}

Now we can join together all of the relevant variables into a much smaller and tidier data set.

tidystormdata<-data.frame(REFNUM,MONTH,YEAR,EVTYPE,FATALITIES,INJURIES,PROPDMG.INF.ADJ,CROPDMG.INF.ADJ)

Results

Now we can perform some exploratory analysis. Let’s try to answer the first question:

par(mar=c(7,4.5,2,2))
par(mfrow=c(1,2))

plotdata.fat<-tapply(tidystormdata$FATALITIES,tidystormdata$EVTYPE, FUN = sum,na.rm=TRUE)

barplot(plotdata.fat[plotdata.fat>500],col = "blue", main = "Total Fatalities By Event",las=2)

plotdata.inj<-tapply(tidystormdata$INJURIES,tidystormdata$EVTYPE, FUN = sum,na.rm=TRUE)

barplot(plotdata.inj[plotdata.inj>5000],col = "blue", main = "Total Injuries By Event",las=2)

We can see that Tornadoes cause the most fatalities and injuries. This could be because Tornadoes are usually the hardest to prepare for.

Let’s take a look at the frequency of these events over time. We will start the plot at 1991 and end it at 2011 for each event.

par(mar=c(3,4.5,2,2))
Wind.Count<-table(tidystormdata$YEAR,tidystormdata$EVTYPE=="High Wind")[,1]
Tornado.Count<-table(tidystormdata$YEAR,tidystormdata$EVTYPE=="Tornado")[,1]
Hail.Count<-table(tidystormdata$YEAR,tidystormdata$EVTYPE=="Hail")[,1]
Flood.Count<-table(tidystormdata$YEAR,tidystormdata$EVTYPE=="Flood")[,1]


counts<-cbind(Wind.Count,Tornado.Count,Hail.Count,Flood.Count)

par(mfrow=c(1,1))

barplot(counts[42:62,],beside=TRUE,col=c("Blue"), main = "Counts Of Event Per Year 1991-2011")

We can see that there has been an abnormally high number of tornadoes in the years leading up to 2011. High Winds and Hail have been getting progressively more frequent.

Finally, let’s move on to answer question number two.

par(mar=c(7,4.5,2,2))
par(mfrow=c(1,2))

plotdata.prop<-tapply(tidystormdata$PROPDMG.INF.ADJ,tidystormdata$EVTYPE, FUN = sum, na.rm=TRUE)

barplot(plotdata.prop[plotdata.prop>10000000000], col = "blue", main = "Total Prop Damage in 2011 $",las=2)

plotdata.crop<-tapply(tidystormdata$CROPDMG.INF.ADJ,tidystormdata$EVTYPE, FUN = sum, na.rm=TRUE)

barplot(plotdata.crop[plotdata.crop>1000000000], col = "blue", main = "Total Crop Damage in 2011 $",las=2)

We can see that Floods and Tornadoes cause the most property damage overall. Drought causes the most crop damage by far.

The End

Thank you for your time.