Author: D Elia
Sunday 26th October 2014

Reproducible Research - Assignment 2

Assessment of the impact of severe weather events in the United States

Synopsis

The aim of this report is to assess the impact of severe weather events across the United States. Using data from the National Oceanic and Atmospheric Administration (NOAA) storm database, the analysis shows which type of event is most harmful to the population and which causes the greatest economic damage.
The results indicate that, across the United States, the event most harmful to the population is the tornado.
In terms of economic damage, the most harmful event across the United States is the flood.
It is recommended that municipal managers develop a policy to limit economic damage and prevent casualties in the states and areas most exposed to these extreme weather events.

Data Processing

The database is first downloaded into the current working directory as a compressed .csv file and then read into a data frame.

# 1. Downloading file and reading data

# 1.1 Setting working directory
setwd("C:/Users/Daniela/Coursera/Reproducible Research/RepRes_Assessment2")

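# 1.2 Download file (only if it is not already in the working directory)
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url, destfile = "repdata-data-StormData.csv.bz2", mode = "wb")
}

# 1.3 Read data (outside the if block, so stormdata exists even when the download is skipped)
stormdata <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header = TRUE, sep = ",", na.strings = "NA", stringsAsFactors = FALSE)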

Of the entire dataset, only a few variables are needed to conduct the analysis.
The code below creates a dataset with only the variables used to determine which weather event is most harmful to the population in the United States and which one causes more economic damage.

## Create subset of the variables to be used in the analysis
dataset <- subset(stormdata, select=c(EVTYPE, FATALITIES:WFO))

The code below is used to format and clean the data. EVTYPE values are converted to lower case. Weather event descriptions are standardised. Data for crop and property damages are converted to the same unit of measurement.
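As a small illustration of the standardisation described above, the sketch below applies the same idea to a few made-up labels. It assumes that non-letter characters are removed before matching, which the space-free patterns used in this report suggest but do not state. The labels in `raw` are invented for the example and are not taken from the dataset.

```r
# Toy event labels (made up for illustration, not taken from the dataset)
raw <- c("TSTM WIND/HAIL", "Thunderstorm Winds", "WILD/FOREST FIRE", "Flash Flooding")

# Lower-case, then drop everything that is not a letter (assumed step)
clean <- gsub("[^a-z]", "", tolower(raw))

# Apply a few substitutions, each one on the result of the previous step
clean <- gsub("^(thunderstormwind|thunderstormwinds|tstmwindhail)$", "tstmwind", clean)
clean <- gsub("wildforestfire", "wildfire", clean, fixed = TRUE)
clean <- gsub("^(floodflashflood|flashflooding)$", "flashflood", clean)

print(clean)
```

Chaining every substitution on the same vector is what lets several spellings collapse into one event name.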

# 2. Cleaning data

# 2.3.1 Creating new cleaned dataset with EVTYPE converted to lower case
data<-dataset
data$EVTYPE <- tolower(stormdata$EVTYPE)
# The patterns below contain no spaces or punctuation, so non-letter
# characters are assumed to be stripped before matching
data$EVTYPE <- gsub("[^a-z]", "", data$EVTYPE)

# 2.3.2 Standardising EVTYPE values
# Each substitution works on data$EVTYPE so that earlier replacements are kept
data$EVTYPE<-gsub("^(thunderstormwind|thunderstormwinds|tstmwindhail)$", "tstmwind", data$EVTYPE)
data$EVTYPE<-gsub("highwinds", "highwind", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("marinethunderstormwind", "marinetstmwind", data$EVTYPE)
data$EVTYPE<-gsub("coastalflooding", "coastalflood", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("^(extremecold|extremewindchill)$", "extremecoldwindchill", data$EVTYPE)
data$EVTYPE<-gsub("^(floodflashflood|flashflooding)$", "flashflood", data$EVTYPE)
data$EVTYPE<-gsub("flooding", "flood", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("ripcurrents", "ripcurrent", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("strongwinds", "strongwind", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("winterweathermix", "winterweather", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("heavysurfhighsurf", "highsurf", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("stormsurgetide", "stormsurge", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("urbanflood", "urbansmlstreamfld", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("wildforestfire", "wildfire", data$EVTYPE, fixed=TRUE)
data$EVTYPE<-gsub("^wind$", "strongwind", data$EVTYPE)

# 2.3.3 Standardising unit values for CROPDMG

# 2.3.3.1 Setting multipliers for CROPDMG
data$cropMult <- 1.0
data$cropMult[data$CROPDMGEXP == "B"] <- 1000000000.0
data$cropMult[data$CROPDMGEXP == "K"] <- 1000.0
data$cropMult[data$CROPDMGEXP == "M"] <- 1000000.0
data$cropMult[data$CROPDMGEXP == "m"] <- 1000000.0
data$cropMult[data$CROPDMGEXP == "k"] <- 1000.0

# 2.3.3.2 Standardising unit values for CROPDMG
data$crop <- data$CROPDMG * data$cropMult

# 2.3.4 Standardising unit values for PROPDMG

# 2.3.4.1 Setting multipliers for PROPDMG
data$propMult <- 1.0
data$propMult[data$PROPDMGEXP == "B"] <- 1000000000.0
data$propMult[data$PROPDMGEXP == "K"] <- 1000.0
data$propMult[data$PROPDMGEXP == "M"] <- 1000000.0
data$propMult[data$PROPDMGEXP == "m"] <- 1000000.0
data$propMult[data$PROPDMGEXP == "k"] <- 1000.0

# 2.3.4.2 Formatting PROPDMG so that all values are in the same unit
data$prop <- data$PROPDMG * data$propMult
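The multiplier assignments above could also be written once and reused for both damage columns. The sketch below is one possible way to do this with a named lookup vector. The helper `to_multiplier()` and the sample values are hypothetical and not part of the original analysis. Like the code above, it treats any code other than K, M or B (in either case) as a multiplier of 1.

```r
# Named lookup vector: exponent code -> multiplier
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# to_multiplier() is a hypothetical helper, not part of the original analysis
to_multiplier <- function(exp_code) {
  mult <- unname(exp_lookup[toupper(exp_code)])
  mult[is.na(mult)] <- 1  # any other code (blank, digits, symbols) is treated as 1
  mult
}

# Made-up sample values for illustration
codes  <- c("K", "m", "B", "", "?")
amount <- c(25, 2.5, 1, 10, 3)

usd <- amount * to_multiplier(codes)
print(usd)
```

One difference from the code above is that `toupper()` also maps a lower-case "b" to one billion, which the original assignments do not handle.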

The code below summarises the data and creates the graphs used to support the results of the analysis.

# 3. Analyse data

# 3.1.1 Define two variables: one for the total number of casualties
# (fatalities plus injuries), the other for the total damage in USD.

data$casualties<- data$FATALITIES+data$INJURIES
data$damages<-data$prop+data$crop

library(plyr)  # provides ddply(), arrange() and desc()
#3.2 Casualties

# 3.2.1 Create summary of number of casualties by weather event
casualties_by_event<-ddply(data, "EVTYPE", summarize, cas=sum(casualties))

# 3.2.2 Order the data in descending order and select top 10 events by casualties
data_plot_casualties_order<-arrange(casualties_by_event, desc(cas))
data_plot_casualties<-data_plot_casualties_order[1:10,]

# 3.2.3 Create plot
plot1<- barplot(data_plot_casualties$cas, space=0.2, axes = FALSE, axisnames=FALSE, main = "CASUALTIES BY WEATHER EVENT ACROSS THE U.S. (1950-2011)"
, ylab="Total number of casualties", col="dark blue")
text(plot1, par("usr")[3], labels = data_plot_casualties$EVTYPE, srt=45, adj=c(1.1, 1.1), xpd=TRUE, cex=.9)
axis(2)

The plot above shows that tornadoes cause the most casualties, counting injuries and fatalities together, in the United States for the period considered.
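The ranking behind this plot (sum by event type, sort in descending order, keep the top rows) can be illustrated on a toy table. The sketch below uses base R's `aggregate()` instead of plyr, purely as an alternative for illustration. The numbers are made up and do not come from the storm data.

```r
# Toy data (made-up numbers, for illustration only)
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "heat", "flood", "tornado"),
  casualties = c(10, 3, 5, 7, 1, 2),
  stringsAsFactors = FALSE
)

# Sum casualties per event type
totals <- aggregate(casualties ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the top n rows (n = 2 here)
top <- head(totals[order(-totals$casualties), ], 2)
print(top)
```

Using `head()` rather than a fixed index range such as `[1:10,]` avoids rows of NA when there are fewer event types than requested.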

#3.3 Economic Damages

# 3.3.1 Create summary of total damages by weather event, in US$ billions
damages_by_event<-ddply(data, "EVTYPE", summarize, dam=sum(damages/1000000000))

# 3.3.2 Order the data in descending order and select top 10 events by damages
data_plot_damages_order<-arrange(damages_by_event, desc(dam))
data_plot_damages<-data_plot_damages_order[1:10,]

# 3.3.3 Create plot
plot2<- barplot(data_plot_damages$dam, space=0.2, axes = FALSE, axisnames=FALSE, main = "DAMAGES BY WEATHER EVENT ACROSS THE U.S. (1950-2011)"
                , ylab="US$ bn", col="dark blue")
text(plot2, par("usr")[3], labels = data_plot_damages$EVTYPE, srt=45, adj=c(1.1, 1.1), xpd=TRUE, cex=.9)
axis(2)

The picture above shows that floods are the most harmful weather event in terms of economic damages to properties and crops combined in the United States for the period considered.

Results

The weather event that caused the largest number of casualties in the United States between 1950 and 2011 is the tornado. The weather event that caused the largest amount of economic damage in the United States between 1950 and 2011 is the flood.

*Data Source: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2