knitr::opts_chunk$set(warning=FALSE)
Storms and other weather-related events can damage property and crops, and they can also cause significant injuries and fatalities. It is therefore essential to prevent such outcomes as far as possible. Information on the occurrence of major storms and weather-related events, together with their economic and health costs, is provided in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events listed start in 1950 and end in November 2011. The data are cleaned and the major weather-related events are classified into twelve categories. Tornadoes have the greatest impact in terms of injuries and fatalities; floods, however, cause the most economic damage in terms of their impact on property and crops. This report describes the data processing and the calculation of the health and economic costs, and presents the results as figures.
# plyr is loaded before dplyr, as plyr itself recommends, so that dplyr's verbs are not masked
library(plyr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
The URL is provided on the course assignment page. The file is a bzip2-compressed CSV file, which is downloaded to the local disk. Because this code chunk is time-consuming to run, the cache = TRUE option is used.
setwd("E:/Coursera/Reproducible Research/Week 4 Peer Review")
# Download the compressed file only if it is not already on disk
if (!file.exists("./data/stormdata.csv.bz2")) {
  if (!dir.exists("./data")) dir.create("./data")
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, destfile = "./data/stormdata.csv.bz2", mode = "wb")
}
# bzfile() lets read.csv() read the compressed file directly, so no separate decompression step is needed
if (!exists("stormdata")) {
  stormdata <- read.csv(bzfile("./data/stormdata.csv.bz2"), header = TRUE)
}
The structure of the dataset is examined to better understand the contents of the relevant variables.
str(stormdata)
## 'data.frame': 816264 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 15819 levels "1/1/1966 0:00:00",..: 6273 6273 4048 10786 2121 2121 2155 369 3794 3794 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 27114 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 12382 1608 4232 9662 4006 9214 1708 21855 22302 4232 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 50886 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6145 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 32363 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 22765 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 370154 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The variables relevant to the study are selected. They give the number of fatalities and injuries, as well as the damage to property and crops, for each type of weather event.
storm <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(storm)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
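Before the exponent columns are recoded, it may help to check which codes actually occur in them. The sketch below tabulates the codes in a small made-up data frame; `toy` and its values are illustrative assumptions, not the NOAA data. The same `table()` call could be applied to `storm$PROPDMGEXP` and `storm$CROPDMGEXP`.

```r
# Toy stand-in for the storm data; the values are invented for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 1, 3, 7),
  PROPDMGEXP = c("K", "K", "M", "", "m", "?"),
  stringsAsFactors = FALSE
)

# Count every exponent code, including empty strings
exp_counts <- table(toy$PROPDMGEXP, useNA = "ifany")
print(exp_counts)

# Codes other than K, M or B (in either case) need an explicit decision
nonstandard <- setdiff(unique(toy$PROPDMGEXP), c("K", "M", "B", "k", "m", "b"))
print(nonstandard)
```

Any code that shows up in `nonstandard` has no obvious dollar multiplier, so how it is treated is a choice the analysis has to state.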
The PROPDMGEXP and CROPDMGEXP columns hold a magnitude code: K for thousands, M for millions and B for billions. Once B, M and K are replaced by their numerical values, the damage in dollars is calculated by multiplying the value in the preceding column (PROPDMG or CROPDMG) by the replaced numerical value. For example, the first record has PROPDMG = 25.0 and PROPDMGEXP = K, which gives 25 × 1,000 = 25,000 dollars of property damage. This yields one variable each for property and crop damage, which are aggregated later in the study to obtain the economic loss.
# Replace K, M and B in PROPDMGEXP with numeric multipliers; any other code is treated as 1
storm$PROPDMGEXP <- toupper(as.character(storm$PROPDMGEXP))  # upper-case first so lowercase k/m are recognised
storm$PROPDMGEXP[!grepl("K|M|B", storm$PROPDMGEXP)] <- "1"
storm$PROPDMGEXP[grep("K", storm$PROPDMGEXP)] <- "1000"
storm$PROPDMGEXP[grep("M", storm$PROPDMGEXP)] <- "1000000"
storm$PROPDMGEXP[grep("B", storm$PROPDMGEXP)] <- "1000000000"
storm$PROPDMGEXP <- as.numeric(storm$PROPDMGEXP)
# Property damage in dollars
storm$propdmg <- storm$PROPDMG * storm$PROPDMGEXP
# Replace K, M and B in CROPDMGEXP with numeric multipliers; any other code is treated as 1
storm$CROPDMGEXP <- toupper(as.character(storm$CROPDMGEXP))
storm$CROPDMGEXP[!grepl("K|M|B", storm$CROPDMGEXP)] <- "1"
storm$CROPDMGEXP[grep("K", storm$CROPDMGEXP)] <- "1000"
storm$CROPDMGEXP[grep("M", storm$CROPDMGEXP)] <- "1000000"
storm$CROPDMGEXP[grep("B", storm$CROPDMGEXP)] <- "1000000000"
storm$CROPDMGEXP <- as.numeric(storm$CROPDMGEXP)
# Crop damage in dollars
storm$cropdmg <- storm$CROPDMG * storm$CROPDMGEXP
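The same kind of conversion can also be written as a lookup in a named vector, which avoids repeating the replacement steps for each column. This is only a sketch under stated assumptions: `exp_to_multiplier` is a hypothetical helper, the toy data frame is invented, the helper upper-cases codes first so that lowercase k and m are scaled as well, and any code other than K, M or B is assumed to mean a multiplier of 1.

```r
# Hypothetical helper: map exponent codes to numeric multipliers
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look each code up by name; codes not in the lookup come back as NA
  mult <- unname(lookup[toupper(as.character(code))])
  # Assumption: blank or unrecognised codes mean "no scaling"
  mult[is.na(mult)] <- 1
  mult
}

# Toy data; the values are invented for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 50),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars = reported value * multiplier
toy$propdmg <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

With a helper of this shape, the crop column could be handled by the same call, for example `storm$CROPDMG * exp_to_multiplier(storm$CROPDMGEXP)`.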