This is a project under the Reproducible Research Course. The analysis aims to answer the following questions: a) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? b) Across the United States, which types of events have the greatest economic consequences?
The analysis uses historical data on severe weather events in the U.S. from 1950 to November 2011, which is available from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
The results show that although tornadoes caused more injuries and fatalities than any other type of weather event, floods caused the most economic damage in the United States between 1950 and 2011.
The following code sets the working directory to the location of the data file and loads the package used in the analysis.
setwd("E:/ROEL/JOHNS HOPKINS UNIVERSITY/REPRODUCIBLE RESEARCH")
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.22.0 (2018-04-21) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
## R.utils v2.7.0 successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
The following code extracts the data from the bz2 archive and then reads the resulting CSV file into R.
bunzip2("repdata_data_StormData.csv.bz2", "dataset.csv", remove = FALSE, skip = TRUE)
## [1] "dataset.csv"
## attr(,"temporary")
## [1] FALSE
dataset <- read.csv("dataset.csv")
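As a possible alternative, base R's read.csv can usually read a bzip2-compressed file directly, which avoids the intermediate dataset.csv. The sketch below illustrates this on a small temporary file rather than on the storm data; the object names toy, tmp and toy_read, and the values in them, are hypothetical.

```r
# Write a tiny data frame (made-up values) to a temporary bzip2-compressed CSV.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the compression.
toy_read <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_read)

# Remove the temporary file.
unlink(tmp)
```

If this works in your R version, the same call on repdata_data_StormData.csv.bz2 should make the bunzip2 step optional.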
We will take a look at the dimensions of the dataset.
dim(dataset)
## [1] 902297 37
There are 37 fields in the dataset. We will look at 8 of them: “STATE”, “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”. The code below also checks the selected columns for missing values.
data <- dataset[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")]
sum (is.na (data))
## [1] 0
summary(data)
##      STATE                     EVTYPE         FATALITIES
##  TX     : 83728   HAIL             :288661   Min.   :  0.0000
##  KS     : 53440   TSTM WIND        :219940   1st Qu.:  0.0000
##  OK     : 46802   THUNDERSTORM WIND: 82563   Median :  0.0000
##  MO     : 35648   TORNADO          : 60652   Mean   :  0.0168
##  IA     : 31069   FLASH FLOOD      : 54277   3rd Qu.:  0.0000
##  NE     : 30271   FLOOD            : 25326   Max.   :583.0000
##  (Other):621339   (Other)          :170878
##     INJURIES            PROPDMG          PROPDMGEXP        CROPDMG
##  Min.   :   0.0000   Min.   :   0.00          :465934   Min.   :  0.000
##  1st Qu.:   0.0000   1st Qu.:   0.00   K      :424665   1st Qu.:  0.000
##  Median :   0.0000   Median :   0.00   M      : 11330   Median :  0.000
##  Mean   :   0.1557   Mean   :  12.06   0      :   216   Mean   :  1.527
##  3rd Qu.:   0.0000   3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000
##  Max.   :1700.0000   Max.   :5000.00   5      :    28   Max.   :990.000
##                                        (Other):    84
##    CROPDMGEXP
##         :618413
##  K      :281832
##  M      :  1994
##  k      :    21
##  0      :    19
##  B      :     9
##  (Other):     9
head(data)
##   STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1    AL TORNADO          0       15    25.0          K       0
## 2    AL TORNADO          0        0     2.5          K       0
## 3    AL TORNADO          0        2    25.0          K       0
## 4    AL TORNADO          0        2     2.5          K       0
## 5    AL TORNADO          0        2     2.5          K       0
## 6    AL TORNADO          0        6     2.5          K       0
The bar plot below shows the 20 weather event types that caused the most deaths in the United States between 1950 and 2011.
death <- aggregate (FATALITIES~EVTYPE, data, sum)
death <- death [order(death$FATALITIES, decreasing=TRUE),]
par(mar=c(15, 5, 1, 2))
barplot (height = death$FATALITIES[1:20], names.arg = death$EVTYPE[1:20], las = 2, cex.names= 0.8,
col = 'blue')
title (main = "Top 20 Events that cause death", line=-1)
title (ylab = "Number of deaths", line=3)
The results show that tornadoes and excessive heat are the leading causes of weather-related death in the United States.
The bar plot below shows the 20 weather event types that caused the most injuries in the United States between 1950 and 2011.
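The aggregate-and-sort pattern used for the fatality counts can be checked on a small example. The sketch below uses made-up values; the names toy, toy_death and top2 are hypothetical and not part of the analysis.

```r
# Toy data standing in for the storm records (values are made up).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then sort from highest to lowest.
toy_death <- aggregate(FATALITIES ~ EVTYPE, toy, sum)
toy_death <- toy_death[order(toy_death$FATALITIES, decreasing = TRUE), ]

# Keep at most the top 2 rows; head() also copes with fewer rows than asked for.
top2 <- head(toy_death, 2)
print(top2)
```

Using head() instead of indexing with 1:20 is a small design choice: it avoids NA rows if a dataset happens to contain fewer event types than the number requested.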
injury <- aggregate(INJURIES~EVTYPE, data, sum)
injury <- injury[order(injury$INJURIES, decreasing=TRUE),]
par(mar=c(15, 5, 1, 2))
barplot (height = injury$INJURIES[1:20], names.arg = injury$EVTYPE[1:20], las = 2, cex.names= 0.8,
col = 'yellow')
title (main = "Top 20 Events that cause injury", line=-1)
title (ylab = "Number of injuries", line=4)
The results show that tornadoes are the leading cause of weather-related injury in the United States.
The total economic damage (property plus crop damage) of each event type is analyzed to determine which events cause the most economic damage.
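# Map each exponent symbol to a power of ten: h = 2, k = 3, m = 6, b = 9; digits map to themselves.
# Blank, "+", "-" and "?" are given power 0, i.e. a multiplier of 1.
symbol <- c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B")
factor <- c(rep(0,4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9)
multiplier <- data.frame (symbol, factor)
# Convert the recorded amounts to US dollars, then add property and crop damage.
data$damage.prop <- data$PROPDMG*10^multiplier[match(data$PROPDMGEXP,multiplier$symbol),2]
data$damage.crop <- data$CROPDMG*10^multiplier[match(data$CROPDMGEXP,multiplier$symbol),2]
data$damage <- data$damage.prop + data$damage.crop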
damage <- aggregate (damage~EVTYPE, data, sum)
damage$billion <- damage$damage / 1e9
damage <- damage [order(damage$billion, decreasing=TRUE),]
par(mar=c(12, 6, 1, 1))
barplot (height = damage$billion[1:20], names.arg = damage$EVTYPE[1:20], las = 2, cex.names = 0.8,
col = 'red')
title ("Top 20 Events that Cause Economic Damages", line=-5)
title (ylab = "Total damage (billion USD)")
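The exponent lookup used in the damage calculation can be verified on a few hand-checkable records. This is a minimal sketch with made-up amounts; the names lookup, power, toy and idx are hypothetical, and the symbol-to-power mapping simply repeats the one assumed in the analysis.

```r
# Lookup table: exponent symbol -> power of ten (same mapping as the analysis).
lookup <- data.frame(
  symbol = c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B"),
  power = c(rep(0, 4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9),
  stringsAsFactors = FALSE
)

# Toy records (made-up values): 25 K, 2.5 M, 1 B, and 10 with no exponent.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# match() returns the row of each symbol in the lookup table.
idx <- match(toy$PROPDMGEXP, lookup$symbol)
toy$damage.prop <- toy$PROPDMG * 10^lookup$power[idx]
print(toy)

# A symbol missing from the table would give NA, and aggregate() with a
# formula drops NA rows by default, so it is worth checking for them.
stopifnot(!anyNA(toy$damage.prop))
```

Here 25 with exponent K becomes 25,000 and 2.5 with exponent M becomes 2,500,000, which matches what the symbols are assumed to mean.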
Floods are the weather event type that caused the highest economic damage, followed by hurricanes/typhoons and tornadoes.
Although tornadoes caused more injuries and fatalities, floods caused more economic damage.