The basic goal of this project is to explore the NOAA Storm Database and answer some basic questions about severe weather events. This analysis uses tables, figures, and other summaries to help readers understand the findings. The program is written in R; however, readers do not need to know R to follow this study.
Introduction: Severe weather events have become more frequent in recent years. An overwhelming number of studies point to global warming as the cause of such extreme events and of the loss of property and life associated with them. This study examines how storms and other severe weather events affect public health and the economy.
The data for this study came from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks when and where major storms and other weather events took place in the United States, along with estimates of any fatalities, injuries, and property damage.
This study addresses the following research questions:
Research Question 1: Across the United States, which types of events are most harmful with respect to population health?
Research Question 2: Across the United States, which types of events have the greatest economic consequences?
This report is prepared for government or municipal managers who are responsible for preparing for severe weather events and who need to prioritize resources for different types of events. However, it refrains from making specific recommendations.
library(knitr)
library(markdown)
library(rmarkdown)
# plyr provides mapvalues(), used below to decode the damage exponent codes
library(plyr)
library(stats)
getwd()
## [1] "C:/Users/nirma/Documents/Coursera/Data Science/Reproducible Research"
# read.csv() decompresses the .bz2 archive transparently
stormdata<-read.csv("C:/Users/nirma/Documents/GitHub/RepRes_Project2/repdata_data_StormData.csv.bz2")
dim(stormdata)
## [1] 902297 37
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
names(stormdata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
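Variables of Interest and Their Meanings
CROPDMG: Crop damage amount, scaled by CROPDMGEXP
CROPDMGEXP: Magnitude code for crop damage (e.g. K = thousands, M = millions, B = billions of dollars)
EVTYPE: Event type (storm, tornado, flood, etc.)
FATALITIES: Number of fatalities
INJURIES: Number of injuries
PROPDMG: Property damage amount, scaled by PROPDMGEXP
PROPDMGEXP: Magnitude code for property damage (e.g. K = thousands, M = millions, B = billions of dollars)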
myVar<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP",
"CROPDMG","CROPDMGEXP")
newdata<-stormdata[myVar]
dim(newdata)
## [1] 902297 7
names(newdata)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
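Property damage is recorded in two columns: ‘PROPDMG’ holds the amount and ‘PROPDMGEXP’ holds a code for its magnitude. To compute the damage in dollars, the following steps were taken:
List the distinct magnitude codes with ‘unique()’
Treat invalid codes such as “+”, “-”, and “?” as zero damage
Map every remaining code to its multiplier and multiply it by ‘PROPDMG’, expressing the result in billions of dollars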
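The three steps above can be expressed as a single lookup. The following is a minimal sketch in base R, assuming a named numeric vector as the lookup table in place of plyr’s ‘mapvalues()’; the helper name exp_multiplier and the toy values are hypothetical and only illustrate the idea. Blank codes are treated as plain dollars, and any code missing from the table is treated as invalid (multiplier 0), which is an assumption of this sketch.

```r
# Hypothetical lookup mirroring the property-damage mapping used in this analysis.
# A blank code cannot be a name in c(), so it is handled inside the helper.
prop_multipliers <- c(
  "K" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "h" = 1e2, "H" = 1e2,
  "0" = 1, "1" = 10, "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0, "-" = 0, "?" = 0
)

# Hypothetical helper: convert magnitude codes to numeric multipliers.
exp_multiplier <- function(codes, lookup) {
  # Blank codes mean the amount is already in dollars.
  mult <- ifelse(codes == "", 1, lookup[codes])
  # Codes not in the lookup (assumption) are treated as invalid.
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy rows standing in for newdata (hypothetical values).
toy <- data.frame(PROPDMG    = c(25, 2.5, 10, 4),
                  PROPDMGEXP = c("K", "M", "", "?"),
                  stringsAsFactors = FALSE)
# Damage in billions of US dollars, as in the analysis.
toy$PROPDMGTOTAL <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP, prop_multipliers) / 1e9
toy
```

Setting stringsAsFactors = FALSE matters on older R versions: indexing a named vector with a factor uses its integer codes, not its labels.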
unique(newdata$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
# Map each exponent code to a numeric multiplier; the invalid codes "+", "-" and "?" become 0
newdata$PROPDMGEXP<-mapvalues(newdata$PROPDMGEXP,
    from=c("K","M","","B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"),
    to=c(10^3,10^6,1,10^9,10^6,0,1,10^5,10^6,0,10^4,10^2,10^3,10^2,10^7,10^2,0,10,10^8))
newdata$PROPDMGEXP<-as.numeric(as.character(newdata$PROPDMGEXP))
# Property damage in billions of US dollars
newdata$PROPDMGETOTAL<-(newdata$PROPDMG*newdata$PROPDMGEXP)/1000000000
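Crop damage is recorded the same way: ‘CROPDMG’ holds the amount and ‘CROPDMGEXP’ holds a code for its magnitude. To compute the damage in dollars, the following steps were taken:
List the distinct magnitude codes with ‘unique()’
Treat the invalid code “?” as zero damage
Map every remaining code to its multiplier and multiply it by ‘CROPDMG’, expressing the result in billions of dollars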
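The crop codes include lower-case variants of the same units (“m”, “k”). Assuming case carries no meaning in these codes, which is how the mapping in this analysis treats them, one way to shrink the lookup is to upper-case each code before matching. A minimal sketch; crop_multipliers and crop_multiplier are hypothetical names.

```r
# Hypothetical lookup for upper-cased CROPDMGEXP codes; "?" is invalid (0).
crop_multipliers <- c("K" = 1e3, "M" = 1e6, "B" = 1e9, "0" = 1, "2" = 1e2, "?" = 0)

crop_multiplier <- function(codes) {
  # Fold "k" -> "K" and "m" -> "M" so each unit needs only one entry.
  codes <- toupper(trimws(codes))
  # Blank codes mean the amount is already in dollars.
  mult <- ifelse(codes == "", 1, crop_multipliers[codes])
  # Unknown codes are treated as invalid (assumption of this sketch).
  mult[is.na(mult)] <- 0
  unname(mult)
}

# The nine codes reported by unique(newdata$CROPDMGEXP) above.
crop_multiplier(c("", "M", "K", "m", "B", "?", "0", "k", "2"))
```

For these nine codes the result matches the explicit mapvalues() mapping used in this analysis.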
unique(newdata$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Map each exponent code to a numeric multiplier; the invalid code "?" becomes 0
newdata$CROPDMGEXP<-mapvalues(newdata$CROPDMGEXP,
    from=c("","M","K","m","B","?","0","k","2"),
    to=c(1,10^6,10^3,10^6,10^9,0,1,10^3,10^2))
newdata$CROPDMGEXP<-as.numeric(as.character(newdata$CROPDMGEXP))
# Crop damage in billions of US dollars
newdata$CROPDMGETOTAL<-(newdata$CROPDMG*newdata$CROPDMGEXP)/1000000000
To assess the population health consequences of severe weather events, we sum the fatalities and injuries recorded for each event type.
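The two aggregate() calls in this analysis could also be combined into one. The following is a minimal sketch on hypothetical toy data: a cbind() on the left-hand side of the formula sums both columns per event type in a single pass.

```r
# Toy data standing in for newdata (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 0, 1, 3),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health
```

One caveat of the formula interface: its default na.action drops any row with a missing value in either column. That is harmless when neither column contains NA, but separate calls would be safer otherwise.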
TotFatalities<-aggregate(FATALITIES~EVTYPE, data=newdata, FUN="sum")
TotInjuries<-aggregate(INJURIES~EVTYPE,data=newdata, FUN="sum")
dim(TotFatalities)
## [1] 985 2
dim(TotInjuries)
## [1] 985 2
# Sort by total fatalities and by total injuries, keeping the top 10 event types
dedly10fatal<-TotFatalities[order(-TotFatalities$FATALITIES), ][1:10,]
dim(dedly10fatal)
## [1] 10 2
most10injuries<-TotInjuries[order(-TotInjuries$INJURIES), ][1:10,]
dim(most10injuries)
## [1] 10 2
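Indexing with ‘[1:10,]’ assumes the table has at least ten rows; on a smaller table it pads the result with NA rows. A hedged alternative is to combine order() with head(), which simply returns fewer rows when fewer exist. The helper name top_n_rows is hypothetical; the three totals are copied from the fatalities table below.

```r
# Hypothetical helper: the n rows with the largest values in column `col`.
top_n_rows <- function(df, col, n = 10) {
  head(df[order(-df[[col]]), , drop = FALSE], n)
}

# Three totals copied from the fatalities table in this report.
totals <- data.frame(EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
                     FATALITIES = c(470, 937, 5633),
                     stringsAsFactors = FALSE)

top_n_rows(totals, "FATALITIES", n = 10)   # 3 rows, no NA padding
```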
dedly10fatal
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
most10injuries
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
# Plot the 10 deadliest and the 10 most injury-causing weather events in the United States
par(mfrow=c(1,2),mar=c(12,4,3,2),mgp=c(2.4, 0.8, 0),cex=0.6,bg='gray')
barplot(dedly10fatal$FATALITIES, names.arg=dedly10fatal$EVTYPE, las=2, main="Top Ten Deadliest Weather Events in the U.S.", ylab="Number of Fatalities", col="blue")
barplot(most10injuries$INJURIES, names.arg=most10injuries$EVTYPE, las=2, main="Top Ten Weather Events In Terms of Total Injuries", ylab="Number of Injuries", col="blue")
Tornadoes cause the most fatalities and injuries in the United States. Excessive heat causes the second-highest number of deaths but ranks fourth in injuries.
The direct economic consequences of severe weather events can be assessed by computing the total property damage and crop damage for each event type and adding the two.
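The code that follows keeps property and crop damage as separate totals. The adding step could be sketched in base R as shown below, using three event types that appear in both top-10 tables reported later in this section (totals in billions of USD). The data frame names prop, crop, econ and ranked are illustrative only.

```r
# Totals (billions of USD) copied from the property- and crop-damage tables below.
prop <- data.frame(EVTYPE        = c("FLOOD", "HURRICANE/TYPHOON", "HAIL"),
                   PROPDMGETOTAL = c(144.657710, 69.305840, 15.735268),
                   stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE        = c("FLOOD", "HURRICANE/TYPHOON", "HAIL"),
                   CROPDMGETOTAL = c(5.661968, 2.607873, 3.025954),
                   stringsAsFactors = FALSE)

# Join on event type; all = TRUE keeps events present in only one table.
econ <- merge(prop, crop, by = "EVTYPE", all = TRUE)
# Assumption: an event missing from one table had no recorded damage there.
econ[is.na(econ)] <- 0
# Combined economic damage, sorted from largest to smallest.
econ$TOTALDMG <- econ$PROPDMGETOTAL + econ$CROPDMGETOTAL
ranked <- econ[order(-econ$TOTALDMG), ]
ranked
```

With the full data, the same merge could be applied to the propertydamage and cropdamage data frames computed in this analysis.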
# Sum property and crop damage (billions of USD) by event type
propertydamage<-aggregate(PROPDMGETOTAL~EVTYPE,data=newdata, FUN="sum")
dim(propertydamage)
## [1] 985 2
cropdamage<-aggregate(CROPDMGETOTAL~EVTYPE,data=newdata, FUN="sum")
dim(cropdamage)
## [1] 985 2
# Sort each total in decreasing order and keep the top 10 event types
top10propdamage<-propertydamage[order(-propertydamage$PROPDMGETOTAL), ][1:10,]
top10propdamage
## EVTYPE PROPDMGETOTAL
## 170 FLOOD 144.657710
## 411 HURRICANE/TYPHOON 69.305840
## 834 TORNADO 56.947381
## 670 STORM SURGE 43.323536
## 153 FLASH FLOOD 16.822674
## 244 HAIL 15.735268
## 402 HURRICANE 11.868319
## 848 TROPICAL STORM 7.703891
## 972 WINTER STORM 6.688497
## 359 HIGH WIND 5.270046
top10cropdamage<-cropdamage[order(-cropdamage$CROPDMGETOTAL), ][1:10,]
top10cropdamage
## EVTYPE CROPDMGETOTAL
## 95 DROUGHT 13.972566
## 170 FLOOD 5.661968
## 590 RIVER FLOOD 5.029459
## 427 ICE STORM 5.022113
## 244 HAIL 3.025954
## 402 HURRICANE 2.741910
## 411 HURRICANE/TYPHOON 2.607873
## 153 FLASH FLOOD 1.421317
## 140 EXTREME COLD 1.292973
## 212 FROST/FREEZE 1.094086
par(mfrow=c(1,2),mar=c(12,4,3,2),mgp=c(2.4, 0.8, 0),cex=0.6,bg='gray')
barplot(top10propdamage$PROPDMGETOTAL, names.arg=top10propdamage$EVTYPE, las=2, main="Top Ten Weather Events Causing Property Damage", ylab="Total Property Damage (billions of USD)", col="blue")
barplot(top10cropdamage$CROPDMGETOTAL, names.arg=top10cropdamage$EVTYPE, las=2, main="Top Ten Weather Events Causing Crop Damage", ylab="Total Crop Damage (billions of USD)", col="blue")
**Floods cause by far the most property damage in the United States, followed by hurricanes/typhoons, tornadoes, storm surges, and flash floods. Drought, however, is the biggest cause of crop loss: crop damage from floods (about 5.7 billion USD) is roughly 40% of that from drought (about 14.0 billion USD).**
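The comparisons in this summary can be checked directly from the printed totals. Here is a minimal sketch using values copied from the two top-10 damage tables above (billions of USD). It shows flood property damage at roughly twice that of HURRICANE/TYPHOON, and flood crop damage at roughly 40% of drought crop damage.

```r
# Totals in billions of USD, copied from the top-10 tables above.
flood_prop     <- 144.657710
hurricane_prop <- 69.305840
flood_crop     <- 5.661968
drought_crop   <- 13.972566

# Flood property damage relative to the runner-up, HURRICANE/TYPHOON
round(flood_prop / hurricane_prop, 2)
# Flood crop damage as a share of drought crop damage
round(flood_crop / drought_crop, 3)
```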