Synopsis

This is a project for the Reproducible Research course. The analysis aims to answer the following questions: (a) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? (b) Across the United States, which types of events have the greatest economic consequences?

The analysis uses historical data on severe weather events in the U.S. from 1950 to November 2011, available from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

The results show that although tornadoes cause more injuries and fatalities than any other weather event, floods have caused the most economic damage in the United States between 1950 and 2011.

Setting Up the R Session

The following code sets the working directory to the location of the data file and loads the packages used in the analysis.

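# Machine-specific path: change this to the folder that contains the data file
setwd("E:/ROEL/JOHNS HOPKINS UNIVERSITY/REPRODUCIBLE RESEARCH")
library(R.utils)  # provides bunzip2()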

Loading the Data

The following code decompresses the bz2 archive into a CSV file and then reads it into R.

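# Decompress the archive, keeping the .bz2 file; skip = TRUE reuses dataset.csv if it already exists
bunzip2("repdata_data_StormData.csv.bz2", "dataset.csv", remove = FALSE, skip = TRUE)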
## [1] "dataset.csv"
## attr(,"temporary")
## [1] FALSE
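# stringsAsFactors = TRUE reproduces the factor-based summary below (the default became FALSE in R 4.0.0)
dataset <- read.csv("dataset.csv", stringsAsFactors = TRUE)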
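As an aside, `read.csv()` can also read the compressed file directly, because base R's `file()` detects bzip2 compression when opening a file for reading. The sketch below demonstrates this on a tiny temporary file that stands in for `repdata_data_StormData.csv.bz2`; the toy columns and values are made up. Reading directly skips the separate `bunzip2()` step, at the cost of decompressing again on every read.

```r
# Minimal sketch: read.csv() on a .bz2 file without decompressing it first.
# A two-row temporary file stands in for the real storm data archive.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(5, 0)),
          con, row.names = FALSE)
close(con)

# file() (used internally by read.csv) auto-detects the bzip2 compression
storm <- read.csv(tmp)
str(storm)
```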

Processing the data

First, we check the dimensions of the dataset.

dim(dataset)
## [1] 902297     37

The dataset has 902,297 records and 37 fields. We keep 8 of them, namely “STATE”, “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” (STATE is kept for reference but not used below), and check for missing values.

data <- dataset[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")]
sum (is.na (data))
## [1] 0
summary(data)
##      STATE                      EVTYPE         FATALITIES      
##  TX     : 83728   HAIL             :288661   Min.   :  0.0000  
##  KS     : 53440   TSTM WIND        :219940   1st Qu.:  0.0000  
##  OK     : 46802   THUNDERSTORM WIND: 82563   Median :  0.0000  
##  MO     : 35648   TORNADO          : 60652   Mean   :  0.0168  
##  IA     : 31069   FLASH FLOOD      : 54277   3rd Qu.:  0.0000  
##  NE     : 30271   FLOOD            : 25326   Max.   :583.0000  
##  (Other):621339   (Other)          :170878                     
##     INJURIES            PROPDMG          PROPDMGEXP        CROPDMG       
##  Min.   :   0.0000   Min.   :   0.00          :465934   Min.   :  0.000  
##  1st Qu.:   0.0000   1st Qu.:   0.00   K      :424665   1st Qu.:  0.000  
##  Median :   0.0000   Median :   0.00   M      : 11330   Median :  0.000  
##  Mean   :   0.1557   Mean   :  12.06   0      :   216   Mean   :  1.527  
##  3rd Qu.:   0.0000   3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000  
##  Max.   :1700.0000   Max.   :5000.00   5      :    28   Max.   :990.000  
##                                        (Other):    84                    
##    CROPDMGEXP    
##         :618413  
##  K      :281832  
##  M      :  1994  
##  k      :    21  
##  0      :    19  
##  B      :     9  
##  (Other):     9
head(data)
##   STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1    AL TORNADO          0       15    25.0          K       0           
## 2    AL TORNADO          0        0     2.5          K       0           
## 3    AL TORNADO          0        2    25.0          K       0           
## 4    AL TORNADO          0        2     2.5          K       0           
## 5    AL TORNADO          0        2     2.5          K       0           
## 6    AL TORNADO          0        6     2.5          K       0
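The summary above lists both “TSTM WIND” and “THUNDERSTORM WIND” as separate EVTYPE values, so totals for what is likely the same kind of event end up split across labels. The analysis below uses the labels as recorded. If one wanted to merge such variants, a minimal sketch could look like this; the function `normalize_evtype` and its rules (upper-casing, trimming, expanding a leading “TSTM”) are hypothetical assumptions, not part of the original analysis, and applying them would change the totals reported here.

```r
# Hypothetical cleanup rules (an assumption, not the original analysis):
# upper-case, trim and collapse whitespace, expand a leading "TSTM".
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)                 # collapse repeated spaces
  sub("^TSTM ", "THUNDERSTORM ", x)         # "TSTM WIND" -> "THUNDERSTORM WIND"
}

ev <- c("TSTM WIND", " Thunderstorm Wind", "TORNADO", "tstm wind  ")
normalize_evtype(ev)
```

It would be applied before aggregating, for example `data$EVTYPE <- normalize_evtype(data$EVTYPE)`.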

Results

Deaths

The bar plot below shows the 20 event types that caused the most deaths in the United States between 1950 and 2011.

# Total fatalities per event type, sorted from highest to lowest
death <- aggregate(FATALITIES ~ EVTYPE, data, sum)
death <- death[order(death$FATALITIES, decreasing = TRUE), ]
# Wide bottom margin leaves room for the rotated event names
par(mar = c(15, 5, 1, 2))
barplot(height = death$FATALITIES[1:20], names.arg = death$EVTYPE[1:20], las = 2,
        cex.names = 0.8, col = "blue")
title(main = "Top 20 Events that Cause Death", line = -1)
title(ylab = "Number of deaths", line = 3)
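The aggregate-and-sort step above is repeated almost verbatim for injuries, so it could be wrapped in a small helper. This is a minimal sketch under that assumption; `top_events` and the toy data frame are hypothetical and the numbers are made up, not taken from the NOAA data.

```r
# Sum a numeric column by event type and return the n largest totals.
top_events <- function(df, value_col, n = 20) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col                       # aggregate() names it "x"
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(totals, n)                                     # copes with fewer than n types
}

toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(10, 3, 5, 1, 4)
)
res <- top_events(toy, "FATALITIES", n = 2)
res
```

Using `head()` instead of `[1:20]` avoids `NA` bars if fewer than 20 event types remain, for example after filtering.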

The results show that tornadoes and excessive heat are the leading causes of weather-related death in the United States.

Injuries

The bar plot below shows the 20 event types that caused the most injuries in the United States between 1950 and 2011.

# Total injuries per event type, sorted from highest to lowest
injury <- aggregate(INJURIES ~ EVTYPE, data, sum)
injury <- injury[order(injury$INJURIES, decreasing = TRUE), ]
par(mar = c(15, 5, 1, 2))
barplot(height = injury$INJURIES[1:20], names.arg = injury$EVTYPE[1:20], las = 2,
        cex.names = 0.8, col = "yellow")
title(main = "Top 20 Events that Cause Injury", line = -1)
title(ylab = "Number of injuries", line = 4)
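To put a number on “leading cause”, one option is to compute each event type's share of all injuries. This is a minimal sketch on made-up values; the function name `injury_share` is hypothetical and the toy numbers do not come from the NOAA data.

```r
# Fraction of all injuries attributable to each event type, largest first.
injury_share <- function(events, injuries) {
  totals <- tapply(injuries, events, sum)        # total injuries per event type
  sort(totals / sum(totals), decreasing = TRUE)  # share of the grand total
}

share <- injury_share(
  events   = c("TORNADO", "TSTM WIND", "TORNADO", "FLOOD"),
  injuries = c(80, 10, 5, 5)
)
round(100 * share, 1)  # percentages
```

On the real data this would be called as `injury_share(data$EVTYPE, data$INJURIES)`.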

The results show that tornadoes are the leading cause of weather-related injury in the United States.

Economic Damages

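To determine which events cause the greatest economic losses, total economic damage (property plus crop damage) is computed for each event type. PROPDMGEXP and CROPDMGEXP hold a magnitude symbol for PROPDMG and CROPDMG. In the code below, H/h, K/k, M/m and B/b are treated as powers of ten 2, 3, 6 and 9 (hundreds, thousands, millions, billions), a digit 0 to 9 is used as the exponent itself, and a blank, “+”, “-” or “?” is treated as exponent 0, so the value is used unscaled.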

# Map each exponent symbol to a power of ten
symbol <- c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B")
exponent <- c(rep(0, 4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9)
multiplier <- data.frame(symbol, exponent, stringsAsFactors = FALSE)

# Property and crop damage in USD, and their sum
data$damage.prop <- data$PROPDMG * 10^multiplier$exponent[match(data$PROPDMGEXP, multiplier$symbol)]
data$damage.crop <- data$CROPDMG * 10^multiplier$exponent[match(data$CROPDMGEXP, multiplier$symbol)]
data$damage <- data$damage.prop + data$damage.crop

# Total damage per event type in billions of USD, sorted from highest to lowest
damage <- aggregate(damage ~ EVTYPE, data, sum)
damage$billion <- damage$damage / 1e9
damage <- damage[order(damage$billion, decreasing = TRUE), ]

par(mar = c(12, 6, 1, 1))
barplot(height = damage$billion[1:20], names.arg = damage$EVTYPE[1:20], las = 2,
        cex.names = 0.8, col = "red")
title("Top 20 Events that Cause Economic Damage", line = -5)
title(ylab = "Total damage (billion USD)")
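The lookup above lists upper- and lower-case letters separately, and any symbol it does not list yields `NA` damage, which `aggregate()` then drops silently through its default `na.action = na.omit`. A sketch of an alternative, assuming case-insensitive matching and a warning for unmapped symbols are wanted: `decode_damage`, `exp_symbols` and `exp_exponents` are hypothetical names, and the mapping is copied from the code above rather than from official NOAA documentation.

```r
# Exponent for each magnitude symbol, mirroring the mapping used above:
# "", "+", "-", "?" -> 0; digits -> themselves; H/K/M/B -> 2/3/6/9.
exp_symbols   <- c("", "+", "-", "?", as.character(0:9), "H", "K", "M", "B")
exp_exponents <- c(0, 0, 0, 0, 0:9, 2, 3, 6, 9)

# Convert damage values plus their magnitude symbols into USD.
decode_damage <- function(value, symbol) {
  key <- toupper(trimws(as.character(symbol)))   # treat "k" and "K" alike
  idx <- match(key, exp_symbols)                 # match() also matches ""
  if (anyNA(idx)) {
    warning("unmapped exponent symbols: ",
            paste(unique(key[is.na(idx)]), collapse = ", "))
  }
  value * 10^exp_exponents[idx]                  # NA where the symbol is unknown
}

decode_damage(c(25, 2.5, 1.2, 3), c("K", "k", "B", ""))
```

`match()` is used rather than indexing a named vector because an empty-string name never matches in R's `[` subsetting.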

Floods cause the highest economic damage, followed by hurricanes/typhoons and tornadoes.

Conclusion

Although tornadoes have caused more injuries and fatalities than any other event type, floods have caused the most economic damage.