INTRODUCTION
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
Reading the data set
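# Author's local folder containing the downloaded data file; change this to your own path
setwd("S:/Software/R/Data Science Specialization/Reproducible Research/week4")
# read.csv reads the bzip2-compressed file directly; factors match the R 3.x default used for the output below
stormdata<-read.csv("repdata_data_StormData.csv.bz2",stringsAsFactors=TRUE)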
Summarizing the data
dim(stormdata)
## [1] 902297 37
summary(stormdata)
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0.0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31.0 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75.0 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :100.6 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131.0 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873.0 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0.000 :547332
## TSTM WIND :219940 1st Qu.: 0.000 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0.000 W : 38446
## TORNADO : 60652 Mean : 1.484 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1.000 E : 33178
## FLOOD : 25326 Max. :3749.000 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0.0000 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0.0000 N : 28082
## Median :0 Median : 0.0000 S : 22510
## Mean :0 Mean : 0.9862 W : 20119
## 3rd Qu.:0 3rd Qu.: 0.0000 E : 20047
## Max. :0 Max. :925.0000 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH
## :499225 Min. : 0.0000 Min. : 0.000
## COUNTYWIDE : 19731 1st Qu.: 0.0000 1st Qu.: 0.000
## SOUTH PORTION : 833 Median : 0.0000 Median : 0.000
## NORTH PORTION : 780 Mean : 0.2301 Mean : 7.503
## CENTRAL PORTION: 617 3rd Qu.: 0.0000 3rd Qu.: 0.000
## SPRINGFIELD : 575 Max. :2315.0000 Max. :4400.000
## (Other) :380536
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 :465934 Min. : 0.000 :618413
## 1st Qu.: 0.00 K :424665 1st Qu.: 0.000 K :281832
## Median : 0.00 M : 11330 Median : 0.000 M : 1994
## Mean : 12.06 0 : 216 Mean : 1.527 k : 21
## 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000 0 : 19
## Max. :5000.00 5 : 28 Max. :990.000 B : 9
## (Other): 84 (Other): 9
## WFO STATEOFFIC
## :142069 :248769
## OUN : 17393 TEXAS, North : 12193
## JAN : 13889 ARKANSAS, Central and North Central: 11738
## LWX : 13174 IOWA, Central : 11345
## PHI : 12551 KANSAS, Southwest : 11212
## TSA : 12483 GEORGIA, North and Central : 11120
## (Other):690738 (Other) :595920
## ZONENAMES
## :594029
## :205988
## GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639
## GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592
## JEFFERSON - JEFFERSON : 303
## MADISON - MADISON : 302
## (Other) :100444
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## Min. : 0 Min. :-14451 Min. : 0 Min. :-14455
## 1st Qu.:2802 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0
## Median :3540 Median : 8707 Median : 0 Median : 0
## Mean :2875 Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.:4019 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. :9706 Max. : 17124 Max. :9706 Max. :106220
## NA's :47 NA's :40
## REMARKS REFNUM
## :287433 Min. : 1
## : 24013 1st Qu.:225575
## Trees down.\n : 1110 Median :451149
## Several trees were blown down.\n : 568 Mean :451149
## Trees were downed.\n : 446 3rd Qu.:676723
## Large trees and power lines were blown down.\n: 432 Max. :902297
## (Other) :588295
DATA PROCESSING
The data set is large, with 902,297 rows and 37 columns, so we keep only the columns relevant to each question.
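Since only a handful of the 37 columns are needed, they could also be dropped while the file is being read, which saves memory on a file of this size. The sketch below assumes base R's `read.csv()`, where a `colClasses` entry of `"NULL"` skips a column and `NA` lets R guess the type; the tiny CSV written to a temporary file is hypothetical stand-in data that reuses some of the real column names.

```r
# Hypothetical stand-in for repdata_data_StormData.csv.bz2 (same idea, far fewer columns/rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE__,EVTYPE,FATALITIES,INJURIES,REMARKS",
  "1,TORNADO,0,15,Trees down.",
  "1,HAIL,0,0,"
), tmp)

wanted <- c("EVTYPE", "FATALITIES", "INJURIES")

# Read just the header (plus one row) to learn the column names
all_cols <- names(read.csv(tmp, nrows = 1))

# "NULL" skips a column; NA keeps it and lets read.csv guess its type
classes <- ifelse(all_cols %in% wanted, NA, "NULL")

small <- read.csv(tmp, colClasses = classes)
print(names(small))
```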
The events most harmful to population health can be identified from the two fields that record the number of deaths and injuries per event, FATALITIES and INJURIES.
We stack the fatality and injury records with rbind() and sum them by event type, counting each fatality and each injury equally, to see which events caused the most harm.
# Keep only the rows with at least one fatality (or injury)
stormag<-stormdata[stormdata$FATALITIES>0,c("EVTYPE","FATALITIES")]
injury<-stormdata[stormdata$INJURIES>0,c("EVTYPE","INJURIES")]
# Give both subsets the same column names so they can be stacked with rbind()
names(injury)<-c("Event","Total")
names(stormag)<-c("Event","Total")
storm<-rbind(stormag,injury)
# Sum fatalities + injuries for each event type
fatdata<-aggregate(storm$Total,by=list(storm$Event),FUN="sum")
names(fatdata)<-c("Event","Total")
# Sort from most to least harmful and keep the top 10
ftj <-fatdata[order(fatdata$Total,decreasing = TRUE),]
fatinjtop10<-ftj[1:10,]
# Collapse every remaining event type into a single "Others" row
le<-length(ftj[,2])
fatinjothers<-data.frame("Others",sum(ftj[11:le,2]))
names(fatinjothers)<-c("Event","Total")
fatinjtop<-rbind(fatinjtop10,fatinjothers)
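The same per-event totals can also be computed without stacking two subsets, by summing both columns in one `aggregate()` call with its formula interface and adding them afterwards. The sketch below runs on a small hypothetical data frame; `top_with_others()` is a hypothetical helper written for illustration that reproduces the "top 10 plus Others" step.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HAIL", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 0, 0, 3, 0),
  INJURIES   = c(10, 4, 5, 1, 0, 0)
)

# Sum fatalities and injuries by event type in a single call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
harm$Total <- harm$FATALITIES + harm$INJURIES

# Hypothetical helper: keep the n largest totals, collapse the rest into "Others"
top_with_others <- function(df, n) {
  df <- df[order(df$Total, decreasing = TRUE), c("EVTYPE", "Total")]
  if (nrow(df) <= n) return(df)
  rbind(df[seq_len(n), ],
        data.frame(EVTYPE = "Others", Total = sum(df$Total[-seq_len(n)])))
}

print(top_with_others(harm, 2))
```

Rows where both counts are zero are harmless here, since they add nothing to the sums.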
RESULTS OF EVENTS WHICH CAUSED MOST HARM
print(fatinjtop)
## Event Total
## 184 TORNADO 96979
## 32 EXCESSIVE HEAT 8428
## 191 TSTM WIND 7461
## 47 FLOOD 7259
## 123 LIGHTNING 6046
## 69 HEAT 3037
## 42 FLASH FLOOD 2755
## 117 ICE STORM 2064
## 173 THUNDERSTORM WIND 1621
## 214 WINTER STORM 1527
## 1 Others 18496
Tornadoes are by far the leading cause of combined fatalities and injuries (96,979, about 62% of the 155,673 total), followed by excessive heat, thunderstorm wind (TSTM WIND) and floods.
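The share quoted above is plain arithmetic on the printed table. A short check, with the totals copied by hand from the `fatinjtop` output (so it only restates those values, not the full data set):

```r
# Totals copied from the fatinjtop table printed above
totals <- c(TORNADO = 96979, `EXCESSIVE HEAT` = 8428, `TSTM WIND` = 7461,
            FLOOD = 7259, LIGHTNING = 6046, HEAT = 3037, `FLASH FLOOD` = 2755,
            `ICE STORM` = 2064, `THUNDERSTORM WIND` = 1621,
            `WINTER STORM` = 1527, Others = 18496)

sum(totals)                          # 155673 fatalities and injuries in total
round(100 * prop.table(totals), 1)   # percentage of the total for each event
```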
PLOT OF THE DATA
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
# Bars show fatalities + injuries per event; event names appear in the legend
ggplot(fatinjtop,aes(Event,Total,fill=Event))+geom_bar(stat="identity")+
  geom_text(aes(label=Total),vjust=-2)+theme(axis.ticks=element_blank(),axis.text.x=element_blank())+
  xlab("Events")+ylab("Total Fatalities & Injuries")+
  ggtitle("Top Weather Events Which Caused Most Harm")+ylim(c(0,100500))
The plot shows the same ranking: tornadoes dwarf every other event type, while all event types outside the top 10 (Others) together account for 18,496 fatalities and injuries.
# Keep the begin date, event type and the damage/multiplier columns
stormeco<-subset(stormdata,select=c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
# Replace each begin date with its year, e.g. "5/25/2011 0:00:00" becomes 2011
stormeco$BGN_DATE<-format(as.Date(stormeco$BGN_DATE,format = "%m/%d/%Y %H:%M:%S"),"%Y")
stormeco$BGN_DATE<-as.numeric(stormeco$BGN_DATE)
# Number of records per year
hist(stormeco$BGN_DATE,col="Blue",main="Histogram of Storm Data",xlab="Year",ylab="No Of Records",xlim=c(1950,2015),ylim=c(0,250000))
Because the earlier years contain fewer recorded events (record keeping was less complete), we use only the more recent years of data for the analysis of economic impact.
Based on the histogram, the records are reasonably complete after 1990, so I keep only events that began after 1990 (1991 onward). Before applying the year filter, I keep only records whose damage value is greater than 0.
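The year extraction and the "after 1990" rule can be seen on a few values. The sketch below uses hypothetical BGN_DATE strings in the same `month/day/year hour:min:sec` layout as the data set and counts records per year with `table()`, which is the quantity the histogram summarizes.

```r
# Hypothetical BGN_DATE strings in the data set's "m/d/Y H:M:S" layout
bgn <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "5/25/2011 0:00:00",
         "4/27/2011 0:00:00", "6/9/2011 0:00:00", "1/3/1991 0:00:00")

# Parse the date and keep only the year, as an integer
yr <- as.integer(format(as.Date(bgn, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Records per year
print(table(yr))

# The filter keeps years strictly greater than 1990, i.e. 1991 onward
keep <- yr > 1990
print(sum(keep))
```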
# Keep only records with nonzero property / crop damage
ppdmg<-stormeco[stormeco$PROPDMG>0,c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP")]
ccdmg<-stormeco[stormeco$CROPDMG>0,c("BGN_DATE","EVTYPE","CROPDMG","CROPDMGEXP")]
# Keep events that began after 1990 (1991 onward)
propdmg<-ppdmg[ppdmg$BGN_DATE>1990,]
cropdmg<-ccdmg[ccdmg$BGN_DATE>1990,]
head(propdmg,10)
## BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## 4889 1991 TORNADO 250.0 K
## 4891 1991 TORNADO 2.5 M
## 4944 1991 TORNADO 250.0 K
## 5005 1991 TORNADO 250.0 K
## 5107 1992 TORNADO 2.5 M
## 5129 1992 TORNADO 250.0 K
## 5130 1992 TORNADO 250.0 K
## 5163 1992 TORNADO 25.0 K
## 5312 1992 TORNADO 25.0 K
## 5315 1992 TORNADO 250.0 K
head(cropdmg,10)
## BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## 187566 1995 HURRICANE OPAL/HIGH WINDS 10 M
## 187571 1994 THUNDERSTORM WINDS 500 K
## 187581 1995 HURRICANE ERIN 1 M
## 187583 1995 HURRICANE OPAL 4 M
## 187584 1995 HURRICANE OPAL 10 m
## 187653 1994 THUNDERSTORM WINDS 50 K
## 187654 1994 THUNDERSTORM WINDS 50 K
## 187680 1994 TORNADO 5 K
## 187723 1994 TORNADO 50 K
## 187750 1995 THUNDERSTORM WINDS/HAIL 15 K
print(length(propdmg$PROPDMG))
## [1] 215060
print(length(cropdmg$CROPDMG))
## [1] 22099
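In the code below, we convert each damage value to dollars by reading its multiplier code (h/H = hundred, k/K = thousand, m/M = million, b/B = billion) and multiplying the damage value by the corresponding factor. Records with any other code, such as a digit or a blank, are left unscaled.
Splitting the data into separate crop and property subsets, each holding only nonzero damage after 1990, also speeds up the computation because the loops process far fewer rows.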
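Since each code is looked up independently, the per-row loop can also be written as a single vectorized lookup with `match()`. A minimal sketch on hypothetical sample values; with the real data the codes would come from `as.character(cropdmg$CROPDMGEXP)` or `as.character(propdmg$PROPDMGEXP)` (the property version would additionally wrap the product in `round(..., 0)`).

```r
# Multiplier codes and values, as used in the loops below
symb <- c("k", "K", "b", "B", "h", "H", "m", "M")
val  <- c(1e3, 1e3, 1e9, 1e9, 1e2, 1e2, 1e6, 1e6)

# Hypothetical damage values and multiplier codes
dmg      <- c(250, 2.5, 10, 5, 3, 7)
exp_code <- c("K", "M", "m", "", "5", "B")

# match() gives each code's position in symb, or NA when the code is not listed
mult <- val[match(exp_code, symb)]
mult[is.na(mult)] <- 1   # unlisted or blank codes: keep the raw value

dollars <- dmg * mult
print(dollars)

# Codes that fell through the lookup, useful for checking what is ignored
print(table(exp_code[!exp_code %in% symb]))
```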
DATA PROCESSING
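options(scipen=0)   # default: allow scientific notation when printing
# Multiplier codes and their values: k/K = thousand, b/B = billion, h/H = hundred, m/M = million
symb<-c("k","K","b","B","h","H","m","M")
val<-c(1000,1000,1e+09,1e+09,100,100,1e+06,1e+06)
# Scale each crop damage value by its multiplier; other codes are left unchanged
for(i in seq_along(cropdmg$CROPDMG))
{
if(cropdmg$CROPDMGEXP[i] %in% symb){
# Exact position of the code in symb
index<-match(as.character(cropdmg$CROPDMGEXP[i]),symb)
cropdmg$CROPDMG[i]<-val[index]*cropdmg$CROPDMG[i]
}
}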
head(cropdmg,20)
## BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## 187566 1995 HURRICANE OPAL/HIGH WINDS 1.0e+07 M
## 187571 1994 THUNDERSTORM WINDS 5.0e+05 K
## 187581 1995 HURRICANE ERIN 1.0e+06 M
## 187583 1995 HURRICANE OPAL 4.0e+06 M
## 187584 1995 HURRICANE OPAL 1.0e+07 m
## 187653 1994 THUNDERSTORM WINDS 5.0e+04 K
## 187654 1994 THUNDERSTORM WINDS 5.0e+04 K
## 187680 1994 TORNADO 5.0e+03 K
## 187723 1994 TORNADO 5.0e+04 K
## 187750 1995 THUNDERSTORM WINDS/HAIL 1.5e+04 K
## 187772 1994 THUNDERSTORM WINDS 5.0e+03 K
## 187796 1994 TORNADO 5.0e+04 K
## 187848 1994 FLASH FLOOD 5.0e+04 K
## 187889 1995 TORNADO 5.0e+02 K
## 187910 1995 TORNADO 5.0e+03 K
## 188055 1995 TORNADO 5.0e+02 K
## 188061 1994 FLASH FLOODING 5.0e+04 K
## 188062 1994 FLASH FLOOD 5.0e+04 K
## 188071 1995 THUNDERSTORM WINDS HAIL 1.0e+04 K
## 188199 1994 HAIL 5.0e+04 K
tail(cropdmg,20)
## BGN_DATE EVTYPE CROPDMG CROPDMGEXP
## 899274 2011 WILDFIRE 4e+05 K
## 899546 2011 WILDFIRE 3e+03 K
## 899692 2011 HAIL 1e+04 K
## 900520 2011 WILDFIRE 3e+03 K
## 900610 2011 STRONG WIND 5e+03 K
## 900757 2011 DROUGHT 5e+03 K
## 900762 2011 DROUGHT 5e+03 K
## 900838 2011 WILDFIRE 5e+06 M
## 900882 2011 TORNADO 2e+03 K
## 900913 2011 THUNDERSTORM WIND 2e+03 K
## 900936 2011 FLOOD 1e+03 K
## 900947 2011 FLOOD 1e+03 K
## 901033 2011 FLOOD 1e+03 K
## 901034 2011 FLOOD 1e+03 K
## 901035 2011 FLOOD 1e+03 K
## 901036 2011 FLOOD 1e+03 K
## 901056 2011 FLOOD 1e+03 K
## 901566 2011 STRONG WIND 2e+04 K
## 901567 2011 STRONG WIND 2e+03 K
## 901684 2011 STRONG WIND 1e+03 K
# Same conversion for property damage, rounded to whole dollars
for(i in seq_along(propdmg$PROPDMG))
{
if(propdmg$PROPDMGEXP[i] %in% symb){
index<-match(as.character(propdmg$PROPDMGEXP[i]),symb)
propdmg$PROPDMG[i]<-round(val[index]*propdmg$PROPDMG[i],0)
}
}
head(propdmg,20)
## BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## 4889 1991 TORNADO 250000 K
## 4891 1991 TORNADO 2500000 M
## 4944 1991 TORNADO 250000 K
## 5005 1991 TORNADO 250000 K
## 5107 1992 TORNADO 2500000 M
## 5129 1992 TORNADO 250000 K
## 5130 1992 TORNADO 250000 K
## 5163 1992 TORNADO 25000 K
## 5312 1992 TORNADO 25000 K
## 5315 1992 TORNADO 250000 K
## 5318 1992 TORNADO 250000 K
## 5320 1992 TORNADO 250000 K
## 5345 1992 TORNADO 2500000 M
## 5355 1992 TORNADO 25000 K
## 5356 1992 TORNADO 25000 K
## 5358 1992 TORNADO 2500000 M
## 5359 1992 TORNADO 2500000 M
## 5363 1992 TORNADO 2500000 M
## 5366 1992 TORNADO 250000 K
## 5368 1992 TORNADO 2500000 M
tail(propdmg,20)
## BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## 902107 2011 WINTER STORM 1000000 M
## 902187 2011 WINTER WEATHER 5000 K
## 902199 2011 WILDFIRE 1500 K
## 902201 2011 STRONG WIND 60000 K
## 902204 2011 HIGH WIND 15000 K
## 902208 2011 HIGH WIND 100000 K
## 902209 2011 HIGH WIND 10000 K
## 902210 2011 STRONG WIND 50000 K
## 902220 2011 WINTER STORM 10000 K
## 902224 2011 STRONG WIND 10000 K
## 902236 2011 STRONG WIND 25000 K
## 902243 2011 STRONG WIND 2000 K
## 902247 2011 WINTER WEATHER 5000 K
## 902248 2011 WINTER WEATHER 5000 K
## 902249 2011 WINTER STORM 2000 K
## 902250 2011 WINTER STORM 5000 K
## 902255 2011 STRONG WIND 600 K
## 902257 2011 STRONG WIND 1000 K
## 902259 2011 DROUGHT 2000 K
## 902260 2011 HIGH WIND 7500 K
The economic impact is then totalled by event type, and the 10 events with the largest totals are listed and plotted, separately for property damage and for crop damage.
RESULTS
# Total damage in dollars for each event type
fpropdmg<-aggregate(propdmg$PROPDMG,by=list(propdmg$EVTYPE),FUN="sum")
fcropdmg<-aggregate(cropdmg$CROPDMG,by=list(cropdmg$EVTYPE),FUN="sum")
names(fpropdmg)<-c("Event","DmgValue")
names(fcropdmg)<-c("Event","DmgValue")
# Sort from largest to smallest total
fpropdmg<-fpropdmg[order(fpropdmg$DmgValue,decreasing=TRUE),]
fcropdmg<-fcropdmg[order(fcropdmg$DmgValue,decreasing=TRUE),]
print(fpropdmg[1:10,])
## Event DmgValue
## 64 FLOOD 144657709807
## 182 HURRICANE/TYPHOON 69305840000
## 282 STORM SURGE 43323536000
## 334 TORNADO 28897947539
## 51 FLASH FLOOD 16140812067
## 106 HAIL 15732267543
## 174 HURRICANE 11868319010
## 342 TROPICAL STORM 7703890550
## 399 WINTER STORM 6688497251
## 159 HIGH WIND 5270046295
# Property damage in billions of dollars for the 10 most damaging event types
ggplot(fpropdmg[1:10,],aes(Event,DmgValue/1000000000,fill=Event))+geom_bar(stat="identity")+
  geom_text(aes(label=round(DmgValue/1000000000,1)),vjust=-2)+theme(axis.ticks=element_blank(),axis.text.x=element_blank())+
  xlab("Events")+ylab("Total Damage in $B")+
  ggtitle("Top Weather Events Which Caused Most Amount Of Property Damage")+ylim(c(0,175))
print(fcropdmg[1:10,])
## Event DmgValue
## 10 DROUGHT 13972566000
## 27 FLOOD 5661968450
## 78 RIVER FLOOD 5029459000
## 72 ICE STORM 5022113500
## 42 HAIL 3025954473
## 64 HURRICANE 2741910000
## 69 HURRICANE/TYPHOON 2607872800
## 23 FLASH FLOOD 1421317100
## 19 EXTREME COLD 1292973000
## 37 FROST/FREEZE 1094086000
# Crop damage in billions of dollars for the 10 most damaging event types
ggplot(fcropdmg[1:10,],aes(Event,DmgValue/1000000000,fill=Event))+geom_bar(stat="identity")+
  geom_text(aes(label=round(DmgValue/1000000000,1)),vjust=-2)+theme(axis.ticks=element_blank(),axis.text.x=element_blank())+
  xlab("Events")+ylab("Total Damage in $B")+
  ggtitle("Top Weather Events Which Caused Most Amount Of Crop Damage")+ylim(c(0,20))
The plots show that floods and hurricanes/typhoons were the most devastating events for property: flood damage totals about $144.7 billion and hurricane/typhoon damage about $69.3 billion, roughly $214 billion combined.
For crops, drought is the leading cause across the US (about $14.0 billion), followed by flood ($5.7 billion) and river flood ($5.0 billion), about $24.7 billion combined.
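The dollar figures above follow directly from the printed tables. A short check, with the totals copied by hand from the `fpropdmg` and `fcropdmg` output:

```r
# Totals in US dollars, copied from the fpropdmg and fcropdmg tables printed above
prop <- c(FLOOD = 144657709807, `HURRICANE/TYPHOON` = 69305840000)
crop <- c(DROUGHT = 13972566000, FLOOD = 5661968450, `RIVER FLOOD` = 5029459000)

round(prop / 1e9, 1)       # 144.7 and 69.3 billion
round(sum(prop) / 1e9, 1)  # about 214 billion combined
round(crop / 1e9, 1)       # 14.0, 5.7 and 5.0 billion
round(sum(crop) / 1e9, 1)  # 24.7 billion combined
```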
Conclusion:
Based on the plots, tornadoes are by far the leading cause of fatalities and injuries, followed by excessive heat, thunderstorm wind and floods (including flash floods).
Floods and hurricanes/typhoons were the most economically devastating events for property, and drought for crops, over the period 1991 to 2011.