Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This assignment examines and analyses the database to answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
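Loading the required packages. library() is used rather than require() so that a missing package stops the script with an error instead of a warning.
library(plyr)
library(data.table)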
Setting the working directory and preparing the database to be used for the analysis.
setwd("C:/Users/rohidah/Google Drive/Data Science/Reproducible Research/Assignment 2")
datas <- read.csv("repdata-data-StormData.csv.bz2")
datas$EVTYPE = toupper(datas$EVTYPE)
Checking the dimensions and the first rows of the data set gives an idea of the information it contains.
dim(datas)
## [1] 902297 37
head(datas)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
From the table above, only the columns needed for the analysis are kept: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
newdataset <- datas[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")]
Some event types are recorded under different labels even though they describe the same kind of event. This section tidies the event type data so that these variants are counted under a single, consistent label.
newdataset[newdataset$EVTYPE == "TSTM WIND", ]$EVTYPE = "THUNDERSTORM WIND"
newdataset[newdataset$EVTYPE == "THUNDERSTORM WINDS", ]$EVTYPE = "THUNDERSTORM WIND"
newdataset[newdataset$EVTYPE == "RIVER FLOOD", ]$EVTYPE = "FLOOD"
newdataset[newdataset$EVTYPE == "FLASH FLOODING", ]$EVTYPE = "FLASH FLOOD"
newdataset[newdataset$EVTYPE == "FLOOD/FLASH FLOOD",]$EVTYPE = "FLASH FLOOD"
newdataset[newdataset$EVTYPE == "HURRICANE/TYPHOON", ]$EVTYPE = "HURRICANE-TYPHOON"
newdataset[newdataset$EVTYPE == "HURRICANE", ]$EVTYPE = "HURRICANE-TYPHOON"
head(newdataset)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
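The recoding of event types above can also be expressed as a single lookup table, which keeps all label variants in one place. The following is a minimal sketch on toy data, not the method used in this report: the names evtype_map and recode_evtype are hypothetical, and the trimws() call is an extra step that the code above does not perform.

```r
# Toy event types standing in for the EVTYPE column (illustrative values only)
evtype <- c("TSTM WIND", "thunderstorm winds", "RIVER FLOOD",
            "FLASH FLOODING", "HURRICANE", "TORNADO")

# Hypothetical lookup: names are the variants, values are the canonical labels
evtype_map <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "FLASH FLOODING"     = "FLASH FLOOD",
  "FLOOD/FLASH FLOOD"  = "FLASH FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

# Hypothetical helper: upper-case, trim whitespace (an added assumption),
# then replace only the labels that appear in the lookup
recode_evtype <- function(x, map) {
  x <- toupper(trimws(as.character(x)))
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

cleaned <- recode_evtype(evtype, evtype_map)
print(cleaned)
```

Labels that are not in the lookup, such as TORNADO here, pass through unchanged, so new variants can be handled by adding one entry to the table.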
Damage values are recorded with an exponent indicator (H = hundreds, K = thousands, M = millions, B = billions), so each amount must be multiplied by the corresponding factor to obtain the actual cost. Any other indicator is treated as a multiplier of 1.
calcdmg <- function(dmg, dmgexp) dmg * switch(toupper(dmgexp), H=100, K=1000, M=1000000, B=1000000000, 1)
newdataset$pdmg <- mapply(calcdmg, newdataset$PROPDMG, newdataset$PROPDMGEXP)
newdataset$cdmg <- mapply(calcdmg, newdataset$CROPDMG, newdataset$CROPDMGEXP)
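Because mapply() calls calcdmg once per row, the conversion can be slow on roughly 900,000 records. A vectorised lookup is one possible alternative. The sketch below uses toy inputs and is not the code used in this report: the names dmg_multiplier and calcdmg_vec are hypothetical. It follows the same rule as calcdmg, in that any indicator other than H, K, M or B (including a blank) gets a multiplier of 1.

```r
# Hypothetical lookup of multipliers, keyed by the upper-cased indicator
dmg_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorised counterpart of calcdmg
calcdmg_vec <- function(dmg, dmgexp) {
  key  <- toupper(as.character(dmgexp))
  mult <- unname(dmg_multiplier[key])  # NA where the indicator is not H/K/M/B
  mult[is.na(mult)] <- 1               # same default as calcdmg
  dmg * mult
}

# Toy values only: thousands, millions (lower case), billions, blank, unknown
costs <- calcdmg_vec(c(25, 2.5, 1.2, 3, 10), c("K", "m", "B", "", "?"))
print(costs)
```

On the real data this would be called as calcdmg_vec(newdataset$PROPDMG, newdataset$PROPDMGEXP), with the result assigned to pdmg, and likewise for the crop columns.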
The code below is a checkpoint to confirm that the data set contains no missing (NA) values.
sum(is.na(newdataset))
## [1] 0
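A single total of zero confirms that there are no NA values, but it does not show where gaps would be if the total were non-zero, and it does not count blank strings, which read.csv() keeps as "" rather than NA in text columns. The sketch below shows a per-column check on a toy data frame; the data and the names toy, na_per_column and blank_exp are illustrative assumptions.

```r
# Toy data frame with one NA and one blank exponent (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, NA, 1),
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", "M", "")
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of blank exponent indicators, which is.na() does not detect
blank_exp <- sum(toy$PROPDMGEXP == "")
print(blank_exp)
```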
To analyse the impact of the events on human health and safety, a new column called totalfatin is created to record the sum of fatalities and injuries.
fatin <- mutate(newdataset, totalfatin = FATALITIES + INJURIES)
The data set is then aggregated by event type into a subset called fatin2, which contains only the event type and its total fatalities and injuries, for ease of reference.
fatin2 <- aggregate(totalfatin ~ EVTYPE, fatin, sum)
The subset fatin2 is then sorted to display the five event types with the highest combined fatalities and injuries.
top5fatin <- head(fatin2[order(fatin2$totalfatin, decreasing=TRUE), ], 5)
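The aggregate-then-rank steps above can be illustrated on a small data frame. The sketch below is an assumption-labelled variant rather than the code used in this report: it sums fatalities and injuries separately with cbind() before adding them, so the split between the two remains visible in the ranked table. All values and the names toy, by_event and top2 are invented for illustration.

```r
# Toy records (illustrative values only, not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 3, 4, 0, 0),
  INJURIES   = c(10, 2, 20, 1, 3, 1)
)

# Sum both measures per event type in one call
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined total, matching the totalfatin definition used above
by_event$totalfatin <- by_event$FATALITIES + by_event$INJURIES

# Rank in decreasing order and keep the top two rows
top2 <- head(by_event[order(by_event$totalfatin, decreasing = TRUE), ], 2)
print(top2)
```

Keeping the two components alongside the total makes it easier to see whether an event type ranks highly because of deaths, injuries, or both.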
To analyse the economic impact of severe weather, a new column called totalcpdmg is created in the data frame cpdmg to record the combined damage to crops and property.
cpdmg <-mutate(newdataset, totalcpdmg = cdmg + pdmg)
The data set is then aggregated by event type into a subset called cpdmg2, which contains only the event type and its total damage, for ease of reference.
cpdmg2 <- aggregate(totalcpdmg ~ EVTYPE, cpdmg, sum)
The subset cpdmg2 is then sorted to display the five severe weather event types with the greatest economic consequences.
top5cpdmg <- head(cpdmg2[order(cpdmg2$totalcpdmg, decreasing=TRUE), ], 5)
Two bar charts are plotted from the results above: one shows the five event types most harmful to human health, and the other shows the five event types with the greatest economic consequences.
barplot(top5fatin$totalfatin, main="Top 5 Most Harmful Events to Human Health", xlab="events", ylab="total fatalities and injuries", col=rainbow(5))
legend("topright", top5fatin$EVTYPE, cex = 0.6, fill=rainbow(5))
barplot(top5cpdmg$totalcpdmg, main="Top 5 Events with the Greatest Economic Consequences", xlab="events", ylab="total damages (USD)", col=rainbow(5))
legend("topright", top5cpdmg$EVTYPE, cex = 0.6, fill=rainbow(5))
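The bar charts above identify each bar through a colour legend. A possible alternative is to label the bars directly with names.arg, so that the reader does not have to match colours. The sketch below uses placeholder event names and values that are not results from this analysis; the data frame top5 is hypothetical.

```r
# Placeholder data (not results from the NOAA analysis)
top5 <- data.frame(
  EVTYPE = c("EVENT A", "EVENT B", "EVENT C", "EVENT D", "EVENT E"),
  total  = c(50, 40, 30, 20, 10)
)

# Widen the bottom margin so rotated labels fit, and remember the old settings
op <- par(mar = c(7, 5, 3, 1))

# names.arg puts the event type under each bar; las = 2 rotates the labels
mids <- barplot(top5$total,
                names.arg = as.character(top5$EVTYPE),
                las = 2, cex.names = 0.7,
                main = "Top 5 events (placeholder data)",
                ylab = "total")

# Restore the previous graphical parameters
par(op)
```

barplot() returns the x positions of the bar midpoints, stored here in mids, which can be passed to text() if value labels are wanted above the bars.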
As such, the following conclusions can be made: