INTRODUCTION

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

Reading the data set

setwd("S:/Software/R/Data Science Specialization/Reproducible Research/week4")
stormdata<-read.csv("repdata_data_StormData.csv.bz2",sep=",")

Summarizing the data

dim(stormdata)
## [1] 902297     37
summary(stormdata)
##     STATE__                  BGN_DATE             BGN_TIME     
##  Min.   : 1.0   5/25/2011 0:00:00:  1202   12:00:00 AM: 10163  
##  1st Qu.:19.0   4/27/2011 0:00:00:  1193   06:00:00 PM:  7350  
##  Median :30.0   6/9/2011 0:00:00 :  1030   04:00:00 PM:  7261  
##  Mean   :31.2   5/30/2004 0:00:00:  1016   05:00:00 PM:  6891  
##  3rd Qu.:45.0   4/4/2011 0:00:00 :  1009   12:00:00 PM:  6703  
##  Max.   :95.0   4/2/2006 0:00:00 :   981   03:00:00 PM:  6700  
##                 (Other)          :895866   (Other)    :857229  
##    TIME_ZONE          COUNTY           COUNTYNAME         STATE       
##  CST    :547493   Min.   :  0.0   JEFFERSON :  7840   TX     : 83728  
##  EST    :245558   1st Qu.: 31.0   WASHINGTON:  7603   KS     : 53440  
##  MST    : 68390   Median : 75.0   JACKSON   :  6660   OK     : 46802  
##  PST    : 28302   Mean   :100.6   FRANKLIN  :  6256   MO     : 35648  
##  AST    :  6360   3rd Qu.:131.0   LINCOLN   :  5937   IA     : 31069  
##  HST    :  2563   Max.   :873.0   MADISON   :  5632   NE     : 30271  
##  (Other):  3631                   (Other)   :862369   (Other):621339  
##                EVTYPE         BGN_RANGE           BGN_AZI      
##  HAIL             :288661   Min.   :   0.000          :547332  
##  TSTM WIND        :219940   1st Qu.:   0.000   N      : 86752  
##  THUNDERSTORM WIND: 82563   Median :   0.000   W      : 38446  
##  TORNADO          : 60652   Mean   :   1.484   S      : 37558  
##  FLASH FLOOD      : 54277   3rd Qu.:   1.000   E      : 33178  
##  FLOOD            : 25326   Max.   :3749.000   NW     : 24041  
##  (Other)          :170878                      (Other):134990  
##          BGN_LOCATI                  END_DATE             END_TIME     
##               :287743                    :243411              :238978  
##  COUNTYWIDE   : 19680   4/27/2011 0:00:00:  1214   06:00:00 PM:  9802  
##  Countywide   :   993   5/25/2011 0:00:00:  1196   05:00:00 PM:  8314  
##  SPRINGFIELD  :   843   6/9/2011 0:00:00 :  1021   04:00:00 PM:  8104  
##  SOUTH PORTION:   810   4/4/2011 0:00:00 :  1007   12:00:00 PM:  7483  
##  NORTH PORTION:   784   5/30/2004 0:00:00:   998   11:59:00 PM:  7184  
##  (Other)      :591444   (Other)          :653450   (Other)    :622432  
##    COUNTY_END COUNTYENDN       END_RANGE           END_AZI      
##  Min.   :0    Mode:logical   Min.   :  0.0000          :724837  
##  1st Qu.:0    NA's:902297    1st Qu.:  0.0000   N      : 28082  
##  Median :0                   Median :  0.0000   S      : 22510  
##  Mean   :0                   Mean   :  0.9862   W      : 20119  
##  3rd Qu.:0                   3rd Qu.:  0.0000   E      : 20047  
##  Max.   :0                   Max.   :925.0000   NE     : 14606  
##                                                 (Other): 72096  
##            END_LOCATI         LENGTH              WIDTH         
##                 :499225   Min.   :   0.0000   Min.   :   0.000  
##  COUNTYWIDE     : 19731   1st Qu.:   0.0000   1st Qu.:   0.000  
##  SOUTH PORTION  :   833   Median :   0.0000   Median :   0.000  
##  NORTH PORTION  :   780   Mean   :   0.2301   Mean   :   7.503  
##  CENTRAL PORTION:   617   3rd Qu.:   0.0000   3rd Qu.:   0.000  
##  SPRINGFIELD    :   575   Max.   :2315.0000   Max.   :4400.000  
##  (Other)        :380536                                         
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9  
##       WFO                                       STATEOFFIC    
##         :142069                                      :248769  
##  OUN    : 17393   TEXAS, North                       : 12193  
##  JAN    : 13889   ARKANSAS, Central and North Central: 11738  
##  LWX    : 13174   IOWA, Central                      : 11345  
##  PHI    : 12551   KANSAS, Southwest                  : 11212  
##  TSA    : 12483   GEORGIA, North and Central         : 11120  
##  (Other):690738   (Other)                            :595920  
##                                                                                                                                                                                                     ZONENAMES     
##                                                                                                                                                                                                          :594029  
##                                                                                                                                                                                                          :205988  
##  GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M                                                                                                                                         :   639  
##  GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA                                                                                                                                                       :   592  
##  JEFFERSON - JEFFERSON                                                                                                                                                                                   :   303  
##  MADISON - MADISON                                                                                                                                                                                       :   302  
##  (Other)                                                                                                                                                                                                 :100444  
##     LATITUDE      LONGITUDE        LATITUDE_E     LONGITUDE_    
##  Min.   :   0   Min.   :-14451   Min.   :   0   Min.   :-14455  
##  1st Qu.:2802   1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0  
##  Median :3540   Median :  8707   Median :   0   Median :     0  
##  Mean   :2875   Mean   :  6940   Mean   :1452   Mean   :  3509  
##  3rd Qu.:4019   3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735  
##  Max.   :9706   Max.   : 17124   Max.   :9706   Max.   :106220  
##  NA's   :47                      NA's   :40                     
##                                            REMARKS           REFNUM      
##                                                :287433   Min.   :     1  
##                                                : 24013   1st Qu.:225575  
##  Trees down.\n                                 :  1110   Median :451149  
##  Several trees were blown down.\n              :   568   Mean   :451149  
##  Trees were downed.\n                          :   446   3rd Qu.:676723  
##  Large trees and power lines were blown down.\n:   432   Max.   :902297  
##  (Other)                                       :588295

DATA PROCESSING

The data set is large: 902,297 rows and 37 columns. Only the columns relevant to each question are kept for the analysis.

The events most harmful to population health can be identified from the two fields that record deaths and injuries, FATALITIES and INJURIES.

We stack the injury and fatality counts into one table and sum them by event type to see which events caused the most harm.

library(data.table)

# Records with at least one fatality, and records with at least one injury
stormag<-stormdata[stormdata$FATALITIES>0,c("EVTYPE","FATALITIES")]
injury<-stormdata[stormdata$INJURIES>0,c("EVTYPE","INJURIES")]
names(injury)<-c("Event","Total")
names(stormag)<-c("Event","Total")
# Stack both tables and sum the counts per event type
storm<-rbind(stormag,injury)
fatdata<-aggregate(storm$Total,by=list(storm$Event),FUN="sum")
names(fatdata)<-c("Event","Total")
# Sort in decreasing order, keep the top 10, and pool the rest as "Others"
ftj <-fatdata[order(fatdata$Total,decreasing = TRUE),]
fatinjtop10<-ftj[1:10,]
le<-length(ftj[,2])
fatinjothers<-data.frame("Others",sum(ftj[11:le,2]))
names(fatinjothers)<-c("Event","Total")
fatinjtop<-rbind(fatinjtop10,fatinjothers)
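
The same totals can also be reached in one aggregation step by adding the two columns before summing. The sketch below is only an illustration on a small made-up data frame (the `toy` rows and the `HARM` column are assumptions, not part of the NOAA file); it should give the same per-event totals as the stack-and-sum approach above, since rows with zero counts contribute nothing to a sum.

```r
# Toy stand-in for stormdata (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 4, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Combined harm per record, then summed per event type
toy$HARM <- toy$FATALITIES + toy$INJURIES
harm <- aggregate(HARM ~ EVTYPE, data = toy, FUN = sum)

# Drop event types with no casualties and sort in decreasing order
harm <- harm[harm$HARM > 0, ]
harm <- harm[order(harm$HARM, decreasing = TRUE), ]
print(harm)
```

In this toy example TORNADO ranks first and HAIL is dropped because it has no casualties.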

RESULTS OF EVENTS WHICH CAUSED MOST HARM

print(fatinjtop)
##                 Event Total
## 184           TORNADO 96979
## 32     EXCESSIVE HEAT  8428
## 191         TSTM WIND  7461
## 47              FLOOD  7259
## 123         LIGHTNING  6046
## 69               HEAT  3037
## 42        FLASH FLOOD  2755
## 117         ICE STORM  2064
## 173 THUNDERSTORM WIND  1621
## 214      WINTER STORM  1527
## 1              Others 18496

Tornadoes are by far the largest health hazard, with 96,979 of the 155,673 recorded fatalities and injuries (about 62%). Excessive heat, thunderstorm wind, and floods follow.
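
To put the table in proportion, the share of each event type can be computed directly from the printed totals. This is a small sketch; the vector name `totals` is an assumption, and the values are copied from the output above.

```r
# Totals copied from the printed fatinjtop table above
totals <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527, "Others" = 18496
)

# Percentage of all recorded fatalities and injuries per event type
share <- round(100 * totals / sum(totals), 1)
print(share)
```

Tornadoes come out at roughly 62% of the combined total.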

PLOT OF THE DATA

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
# Bars are identified by the fill legend because the x-axis labels are hidden
ggplot(fatinjtop,aes(Event,Total,fill=Event))+
  geom_bar(stat="identity")+
  geom_text(aes(label=Total),vjust=-2)+
  theme(axis.ticks=element_blank(),axis.text.x=element_blank())+
  xlab("Events")+ylab("Total Fatalities & Injuries")+
  ggtitle("Top Weather Events Which Caused Most Harm")+
  ylim(c(0,100500))

Tornadoes, excessive heat, thunderstorm wind, and floods caused the most fatalities and injuries.

stormeco<-subset(stormdata,select=c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
stormeco$BGN_DATE<-format(as.Date(stormeco$BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),"%Y")
stormeco$BGN_DATE<-as.numeric(stormeco$BGN_DATE)
#hist(stormeco$BGN_DATE,col="Blue",main="Histogram of Storm Data",xlab="Year",ylab="No. of Records",xlim=c(1950,2015),ylim=c(0,250000))

Because the earlier years hold fewer events, most likely for lack of good records, we use only the more recent years of data when analyzing which events had the greatest economic impact.

Based on a histogram of record counts by year (the commented-out hist() call above), the data after 1990 are well populated, so I keep only records from 1991 onward. Before doing that, I keep only the records whose damage value is greater than 0.
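
As an illustration of the year extraction and the two filters, here is a minimal sketch on a few made-up rows. The `toy` data frame and the `YEAR` column are assumptions for the example; the date text follows the same "m/d/Y H:M:S" layout as BGN_DATE.

```r
# Toy records in the same text format as BGN_DATE (hypothetical rows)
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/9/1990 0:00:00",
               "5/25/2011 0:00:00", "4/27/2011 0:00:00"),
  PROPDMG  = c(25, 0, 2.5, 0),
  stringsAsFactors = FALSE
)

# Parse the date text and keep only the four-digit year as a number
parsed <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
toy$YEAR <- as.numeric(format(parsed, "%Y"))

# Records per year: the counts a histogram of YEAR would display
print(table(toy$YEAR))

# Keep records after 1990 that report non-zero property damage
recent <- toy[toy$YEAR > 1990 & toy$PROPDMG > 0, ]
print(recent)
```

Only the 2011 row with non-zero damage survives both filters in this toy example.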

ppdmg<-stormeco[stormeco$PROPDMG>0,c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP")]
ccdmg<-stormeco[stormeco$CROPDMG>0,c("BGN_DATE","EVTYPE","CROPDMG","CROPDMGEXP")]
propdmg<-ppdmg[ppdmg$BGN_DATE>1990,]
cropdmg<-ccdmg[ccdmg$BGN_DATE>1990,]
head(propdmg,10)
##      BGN_DATE  EVTYPE PROPDMG PROPDMGEXP
## 4889     1991 TORNADO   250.0          K
## 4891     1991 TORNADO     2.5          M
## 4944     1991 TORNADO   250.0          K
## 5005     1991 TORNADO   250.0          K
## 5107     1992 TORNADO     2.5          M
## 5129     1992 TORNADO   250.0          K
## 5130     1992 TORNADO   250.0          K
## 5163     1992 TORNADO    25.0          K
## 5312     1992 TORNADO    25.0          K
## 5315     1992 TORNADO   250.0          K
head(cropdmg,10)
##        BGN_DATE                    EVTYPE CROPDMG CROPDMGEXP
## 187566     1995 HURRICANE OPAL/HIGH WINDS      10          M
## 187571     1994        THUNDERSTORM WINDS     500          K
## 187581     1995            HURRICANE ERIN       1          M
## 187583     1995            HURRICANE OPAL       4          M
## 187584     1995            HURRICANE OPAL      10          m
## 187653     1994        THUNDERSTORM WINDS      50          K
## 187654     1994        THUNDERSTORM WINDS      50          K
## 187680     1994                   TORNADO       5          K
## 187723     1994                   TORNADO      50          K
## 187750     1995   THUNDERSTORM WINDS/HAIL      15          K
print(length(propdmg$PROPDMG))
## [1] 215060
print(length(cropdmg$CROPDMG))
## [1] 22099

The code below converts each damage value to dollars by identifying its multiplier symbol (B for billion, M for million, K for thousand, or H for hundred) and multiplying the damage value by it. Values with any other symbol are left unchanged.

Splitting the data into separate crop and property tables also keeps each loop smaller, which reduces the run time.

DATA PROCESSING

options(scipen=0)
symb<-c("k","K","b","B","h","H","m","M")
val<-c(1000,1000,1e+09,1e+09,100,100,1e+06,1e+06)
for(i in 1:length(cropdmg$CROPDMG))
{
  if(cropdmg$CROPDMGEXP[i] %in% symb){
    index<-grep(cropdmg$CROPDMGEXP[i],symb)
    cropdmg$CROPDMG[i]<-val[index]*cropdmg$CROPDMG[i]
  }
}
head(cropdmg,20)
##        BGN_DATE                    EVTYPE CROPDMG CROPDMGEXP
## 187566     1995 HURRICANE OPAL/HIGH WINDS 1.0e+07          M
## 187571     1994        THUNDERSTORM WINDS 5.0e+05          K
## 187581     1995            HURRICANE ERIN 1.0e+06          M
## 187583     1995            HURRICANE OPAL 4.0e+06          M
## 187584     1995            HURRICANE OPAL 1.0e+07          m
## 187653     1994        THUNDERSTORM WINDS 5.0e+04          K
## 187654     1994        THUNDERSTORM WINDS 5.0e+04          K
## 187680     1994                   TORNADO 5.0e+03          K
## 187723     1994                   TORNADO 5.0e+04          K
## 187750     1995   THUNDERSTORM WINDS/HAIL 1.5e+04          K
## 187772     1994        THUNDERSTORM WINDS 5.0e+03          K
## 187796     1994                   TORNADO 5.0e+04          K
## 187848     1994               FLASH FLOOD 5.0e+04          K
## 187889     1995                   TORNADO 5.0e+02          K
## 187910     1995                   TORNADO 5.0e+03          K
## 188055     1995                   TORNADO 5.0e+02          K
## 188061     1994            FLASH FLOODING 5.0e+04          K
## 188062     1994               FLASH FLOOD 5.0e+04          K
## 188071     1995   THUNDERSTORM WINDS HAIL 1.0e+04          K
## 188199     1994                      HAIL 5.0e+04          K
tail(cropdmg,20)
##        BGN_DATE            EVTYPE CROPDMG CROPDMGEXP
## 899274     2011          WILDFIRE   4e+05          K
## 899546     2011          WILDFIRE   3e+03          K
## 899692     2011              HAIL   1e+04          K
## 900520     2011          WILDFIRE   3e+03          K
## 900610     2011       STRONG WIND   5e+03          K
## 900757     2011           DROUGHT   5e+03          K
## 900762     2011           DROUGHT   5e+03          K
## 900838     2011          WILDFIRE   5e+06          M
## 900882     2011           TORNADO   2e+03          K
## 900913     2011 THUNDERSTORM WIND   2e+03          K
## 900936     2011             FLOOD   1e+03          K
## 900947     2011             FLOOD   1e+03          K
## 901033     2011             FLOOD   1e+03          K
## 901034     2011             FLOOD   1e+03          K
## 901035     2011             FLOOD   1e+03          K
## 901036     2011             FLOOD   1e+03          K
## 901056     2011             FLOOD   1e+03          K
## 901566     2011       STRONG WIND   2e+04          K
## 901567     2011       STRONG WIND   2e+03          K
## 901684     2011       STRONG WIND   1e+03          K
for(i in 1:length(propdmg$PROPDMG))
{
  if(propdmg$PROPDMGEXP[i] %in% symb){
    index<-grep(propdmg$PROPDMGEXP[i],symb)
    propdmg$PROPDMG[i]<-round(val[index]*propdmg$PROPDMG[i],0)
  }
}
head(propdmg,20)
##      BGN_DATE  EVTYPE PROPDMG PROPDMGEXP
## 4889     1991 TORNADO  250000          K
## 4891     1991 TORNADO 2500000          M
## 4944     1991 TORNADO  250000          K
## 5005     1991 TORNADO  250000          K
## 5107     1992 TORNADO 2500000          M
## 5129     1992 TORNADO  250000          K
## 5130     1992 TORNADO  250000          K
## 5163     1992 TORNADO   25000          K
## 5312     1992 TORNADO   25000          K
## 5315     1992 TORNADO  250000          K
## 5318     1992 TORNADO  250000          K
## 5320     1992 TORNADO  250000          K
## 5345     1992 TORNADO 2500000          M
## 5355     1992 TORNADO   25000          K
## 5356     1992 TORNADO   25000          K
## 5358     1992 TORNADO 2500000          M
## 5359     1992 TORNADO 2500000          M
## 5363     1992 TORNADO 2500000          M
## 5366     1992 TORNADO  250000          K
## 5368     1992 TORNADO 2500000          M
tail(propdmg,20)
##        BGN_DATE         EVTYPE PROPDMG PROPDMGEXP
## 902107     2011   WINTER STORM 1000000          M
## 902187     2011 WINTER WEATHER    5000          K
## 902199     2011       WILDFIRE    1500          K
## 902201     2011    STRONG WIND   60000          K
## 902204     2011      HIGH WIND   15000          K
## 902208     2011      HIGH WIND  100000          K
## 902209     2011      HIGH WIND   10000          K
## 902210     2011    STRONG WIND   50000          K
## 902220     2011   WINTER STORM   10000          K
## 902224     2011    STRONG WIND   10000          K
## 902236     2011    STRONG WIND   25000          K
## 902243     2011    STRONG WIND    2000          K
## 902247     2011 WINTER WEATHER    5000          K
## 902248     2011 WINTER WEATHER    5000          K
## 902249     2011   WINTER STORM    2000          K
## 902250     2011   WINTER STORM    5000          K
## 902255     2011    STRONG WIND     600          K
## 902257     2011    STRONG WIND    1000          K
## 902259     2011        DROUGHT    2000          K
## 902260     2011      HIGH WIND    7500          K
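
The row-by-row loops above can also be written as a vectorized lookup, which tends to be much faster on tables of this size. The following is a sketch under stated assumptions, not the code used for the results: the `toy` data frame, the `mult` lookup vector, and the `DMG_USD` column are hypothetical names, and unrecognized symbols are given a multiplier of 1 to mirror the loops leaving those rows unchanged.

```r
# Named lookup: exponent symbol -> multiplier (same pairs as symb/val above)
mult <- c(h = 1e2, H = 1e2, k = 1e3, K = 1e3,
          m = 1e6, M = 1e6, b = 1e9, B = 1e9)

# Toy rows (hypothetical values); "" and "0" stand for symbols outside the lookup
toy <- data.frame(
  DMG = c(250, 2.5, 10, 5, 3),
  EXP = c("K", "M", "m", "", "0"),
  stringsAsFactors = FALSE
)

# match() returns NA for symbols outside the lookup; those rows get multiplier 1
idx <- match(as.character(toy$EXP), names(mult))
multiplier <- ifelse(is.na(idx), 1, mult[idx])

# Damage in dollars, computed for all rows at once
toy$DMG_USD <- toy$DMG * multiplier
print(toy)
```

Using match() instead of grep() also avoids treating the symbol as a regular expression, and as.character() keeps the lookup correct when the column is stored as a factor.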

The economic impact is evaluated separately for property damage and crop damage, and the top 10 events in each category are listed and plotted below.

RESULTS

  fpropdmg<-aggregate(propdmg$PROPDMG,by=list(propdmg$EVTYPE),FUN="sum")
  fcropdmg<-aggregate(cropdmg$CROPDMG,by=list(cropdmg$EVTYPE),FUN="sum")
  names(fpropdmg)<-c("Event","DmgValue")
  names(fcropdmg)<-c("Event","DmgValue")
  
  fpropdmg<-fpropdmg[order(fpropdmg$DmgValue,decreasing=TRUE),]
  fcropdmg<-fcropdmg[order(fcropdmg$DmgValue,decreasing=TRUE),]
  
  print(fpropdmg[1:10,])
##                 Event     DmgValue
## 64              FLOOD 144657709807
## 182 HURRICANE/TYPHOON  69305840000
## 282       STORM SURGE  43323536000
## 334           TORNADO  28897947539
## 51        FLASH FLOOD  16140812067
## 106              HAIL  15732267543
## 174         HURRICANE  11868319010
## 342    TROPICAL STORM   7703890550
## 399      WINTER STORM   6688497251
## 159         HIGH WIND   5270046295
    ggplot(fpropdmg[1:10,],aes(Event,DmgValue/1000000000,fill=Event))+geom_bar(stat="identity")+geom_text(aes(label=round(DmgValue/1000000000,1)),vjust=-2)+theme(axis.ticks=element_blank(),axis.text.x=element_blank())+xlab("Events")+ylab("Total Damage in $B")+ggtitle("Top Weather Events Which Caused Most Amount Of Property Damage")+ylim(c(0,175))

    print(fcropdmg[1:10,])  
##                Event    DmgValue
## 10           DROUGHT 13972566000
## 27             FLOOD  5661968450
## 78       RIVER FLOOD  5029459000
## 72         ICE STORM  5022113500
## 42              HAIL  3025954473
## 64         HURRICANE  2741910000
## 69 HURRICANE/TYPHOON  2607872800
## 23       FLASH FLOOD  1421317100
## 19      EXTREME COLD  1292973000
## 37      FROST/FREEZE  1094086000
    ggplot(fcropdmg[1:10,],aes(Event,DmgValue/1000000000,fill=Event))+geom_bar(stat="identity")+geom_text(aes(label=round(DmgValue/1000000000,1)),vjust=-2)+theme(axis.ticks=element_blank(),axis.text.x=element_blank())+xlab("Events")+ylab("Total Damage in $B")+ggtitle("Top Weather Events Which Caused Most Amount Of Crop Damage")+ylim(c(0,20))

The plots show that floods and hurricanes/typhoons were the most damaging events for property: together they account for about 214 billion dollars (144.7 + 69.3), more than the other eight events in the top 10 combined.

Drought and floods are the two largest sources of crop damage across the US, at about 19.6 billion dollars combined (14.0 + 5.7); counting river floods together with floods raises this to about 24.7 billion.
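
The combined figures for the leading events can be checked against the printed top-10 tables. The sketch below simply copies those values into two named vectors (`prop` and `crop` are names assumed for the example) and adds up the leading events in billions of dollars.

```r
# Top-10 values copied from the printed tables above, in dollars
prop <- c("FLOOD" = 144657709807, "HURRICANE/TYPHOON" = 69305840000,
          "STORM SURGE" = 43323536000, "TORNADO" = 28897947539,
          "FLASH FLOOD" = 16140812067, "HAIL" = 15732267543,
          "HURRICANE" = 11868319010, "TROPICAL STORM" = 7703890550,
          "WINTER STORM" = 6688497251, "HIGH WIND" = 5270046295)
crop <- c("DROUGHT" = 13972566000, "FLOOD" = 5661968450,
          "RIVER FLOOD" = 5029459000, "ICE STORM" = 5022113500,
          "HAIL" = 3025954473, "HURRICANE" = 2741910000,
          "HURRICANE/TYPHOON" = 2607872800, "FLASH FLOOD" = 1421317100,
          "EXTREME COLD" = 1292973000, "FROST/FREEZE" = 1094086000)

# Combined damage of the two leading events in each table, in $B
prop_top2 <- sum(prop[c("FLOOD", "HURRICANE/TYPHOON")]) / 1e9
crop_top2 <- sum(crop[c("DROUGHT", "FLOOD")]) / 1e9

# Share of the top-10 total carried by those two events
prop_share <- 100 * prop_top2 / (sum(prop) / 1e9)
crop_share <- 100 * crop_top2 / (sum(crop) / 1e9)

print(round(c(prop_top2 = prop_top2, crop_top2 = crop_top2), 1))
print(round(c(prop_share = prop_share, crop_share = crop_share), 1))
```

Note that the shares are relative to the top-10 totals only, since the tables above do not list the remaining event types.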

CONCLUSION

Based on the results, tornadoes caused by far the most fatalities and injuries, followed by excessive heat, thunderstorm wind, and floods (including flash floods).

Floods and hurricanes/typhoons were the most economically damaging events for property over the period analyzed, 1991 to 2011, while drought caused the most crop damage.