Based on data provided by NOAA, we identify the event types that caused the most economic and health damage between 1950 and November 2011.
As there are more than 900 distinct event types, we report only the 20 most damaging ones.
We download the NOAA data and extract the fields describing the health and economic impact of severe weather events.
As economic damage is recorded as a number plus an exponent code (such as 10 k, but also 10 7), we first have to convert the exponent codes into a usable numeric form.
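According to the documentation, k stands for one thousand, M for one million and B for one billion. We have decided to convert every code to a numerical power of ten. As some codes present in the data are not documented, we have made the following assumptions: a single digit (0 to 8) is used directly as the power of ten; an empty code and the codes -, ? and + are given a power of 0 (a multiplier of 1), as are h and H; lower-case m is treated like M.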
The data contain two types of economic damage: damage to property and damage to crops. We sum the dollar values of the two categories.
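The conversion and the property/crop total can be illustrated on a few made-up rows. This is only a sketch: it uses a named-vector lookup rather than the merge used later in this report, and the names `exp_lookup`, `to_power` and `toy` are hypothetical. Codes that are neither a documented letter nor a digit are assumed to mean a multiplier of 1.

```r
# Hypothetical lookup: documented letter codes -> power of ten
exp_lookup <- c(K = 3, k = 3, M = 6, m = 6, B = 9)

# Made-up rows for illustration only (not taken from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(10, 2.5, 1),
  PROPDMGEXP = c("K", "M", "7"),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "K", "k"),
  stringsAsFactors = FALSE
)

to_power <- function(code) {
  power <- unname(exp_lookup[code])              # NA for codes not in the lookup
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])  # a digit is used as the power itself
  power[is.na(power)] <- 0                       # assumption: anything else means 10^0
  power
}

toy$property <- toy$PROPDMG * 10^to_power(toy$PROPDMGEXP)
toy$crop     <- toy$CROPDMG * 10^to_power(toy$CROPDMGEXP)
toy$total    <- toy$property + toy$crop          # total economic damage per row
print(toy[, c("property", "crop", "total")])
```

A vector lookup keeps every row, whereas a merge against a lookup table drops rows whose code is missing from the table.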
There are also two types of health damage: fatalities and injuries. As a fatality and an injury cannot be weighed against each other, we have decided to count the number of persons affected by adding the two numbers.
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.20.0 (2016-02-17) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
## R.utils v2.4.0 (2016-09-13) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(data.table)
InDirectory <- "Z:/Professionnel/Cours/R Code/Assignment 8"
ZipFile <- "repdata_data_StormData.csv.bz2"
FullZip <- file.path(InDirectory, ZipFile)
CSVFile <- "StormData.csv"
FullCSV <- file.path(InDirectory, CSVFile)
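# Download the archive only if it is not already present
if (!file.exists(FullZip)) {
  download.file(
    "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
    FullZip, mode = "wb")  # binary mode, so the archive is not altered on Windows
}
# Decompress the bz2 archive (skipped if the CSV already exists)
bunzip2(FullZip, FullCSV, remove = FALSE, skip = TRUE)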
## [1] "Z:/Professionnel/Cours/R Code/Assignment 8/StormData.csv"
## attr(,"temporary")
## [1] FALSE
# Read the CSV into a data.table
data <- fread(FullCSV)
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:01:10
## Warning in fread(FullCSV): Read less rows (902297) than were allocated
## (967216). Run again with verbose=TRUE and please report.
# Convert to a plain data frame and treat the categorical columns as factors
impact <- as.data.frame(data)
rm(data)
impact$EVTYPE <- as.factor(impact$EVTYPE)
impact$PROPDMGEXP <- as.factor(impact$PROPDMGEXP)
impact$CROPDMGEXP <- as.factor(impact$CROPDMGEXP)
# Convert literal exponent codes to numeric powers of ten
# ("k" is included because the documentation defines it as one thousand)
rawexp <- c("","-","?","+","0","1","2","3","4","5","6","7","8","B","h","H","k","K","m","M")
numexp <- c(0,0,0,0,0,1,2,3,4,5,6,7,8,9,0,0,3,3,6,6)
convertexp <- data.frame(rawexp, numexp)
# merge() is an inner join: rows whose code is missing from rawexp are dropped
merged <- merge(impact, convertexp, by.x = "PROPDMGEXP", by.y = "rawexp")
merged$property <- merged$PROPDMG * (10^merged$numexp)
merged <- merged[, !(names(merged) == "numexp")]
impact <- merge(merged, convertexp, by.x = "CROPDMGEXP", by.y = "rawexp")
impact$crop <- impact$CROPDMG * (10^impact$numexp)
# just keep the relevant columns
keep <- c("EVTYPE", "FATALITIES", "INJURIES", "property", "crop")
impact <- impact[keep]
# sum by event type
aggimpact <-aggregate(.~EVTYPE, data=impact, sum)
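Because `merge()` performs an inner join by default, any row whose exponent code is absent from the lookup table disappears before aggregation. The sketch below shows one way to detect such codes and keep the rows. All data and names (`events`, `lookup`, the code "x") are made up for illustration, and treating an unmapped code as a power of 0 is an assumption.

```r
# Made-up data: one row carries an exponent code missing from the lookup table
events <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
  DMG    = c(1, 2, 3),
  EXP    = c("K", "M", "x"),   # "x" is a hypothetical unmapped code
  stringsAsFactors = FALSE
)
lookup <- data.frame(rawexp = c("K", "M"), numexp = c(3, 6),
                     stringsAsFactors = FALSE)

# Codes present in the data but absent from the lookup table
unmapped <- setdiff(unique(events$EXP), lookup$rawexp)

# Default merge() is an inner join: the "x" row is dropped
inner <- merge(events, lookup, by.x = "EXP", by.y = "rawexp")

# all.x = TRUE keeps every event; unmatched rows get NA for numexp
left <- merge(events, lookup, by.x = "EXP", by.y = "rawexp", all.x = TRUE)
left$numexp[is.na(left$numexp)] <- 0   # assumption: unmapped codes mean 10^0
left$value <- left$DMG * 10^left$numexp

# Sum the converted values by event type
totals <- aggregate(value ~ EVTYPE, data = left, sum)
print(unmapped)
print(nrow(inner))
print(totals)
```

Comparing `nrow(inner)` with `nrow(events)` is a quick check that no events were lost in the join.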
Economic damage is split between damage to property and damage to crops, both expressed in US dollars.
# Sort in decreasing order of total economic damage
aggimpact <- aggimpact[order(aggimpact$property + aggimpact$crop, decreasing = TRUE), ]
par(mar = c(15, 5, 1, 1))  # wide bottom margin for the rotated event-type labels
barplot(t(as.matrix(aggimpact[1:20, c("property", "crop")])),
        col = c("red", "blue"),
        names.arg = aggimpact$EVTYPE[1:20],
        las = 2,
        cex.names = 0.5,
        main = "Economic Damages (USD)",
        legend.text = c("Property", "Crop"))
The most damaging event type is Flood, followed by Hurricane/Typhoon and Tornado.
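A ranking like this one can also be produced as a small table rather than read off a chart. The following is a minimal sketch on invented totals (the event names and values are placeholders, not the NOAA results), and the `share` column is an illustrative addition.

```r
# Invented totals for illustration; not the NOAA figures
agg <- data.frame(
  EVTYPE   = c("A", "B", "C", "D"),
  property = c(5, 100, 20, 1),
  crop     = c(50, 10, 0, 2),
  stringsAsFactors = FALSE
)

# Total damage per event type, ranked from largest to smallest
agg$total <- agg$property + agg$crop
ranked <- agg[order(agg$total, decreasing = TRUE), ]

# Keep the top 3 and express each as a percentage of the overall total
top_n <- head(ranked, 3)
top_n$share <- round(100 * top_n$total / sum(agg$total), 1)
print(top_n, row.names = FALSE)
```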
Health damage is split between injuries and fatalities, expressed as the number of persons affected.
# Sort in decreasing order of total health damage
aggimpact <- aggimpact[order(aggimpact$FATALITIES + aggimpact$INJURIES,
                             decreasing = TRUE), ]
barplot(t(as.matrix(aggimpact[1:20, c("FATALITIES", "INJURIES")])),
        col = c("red", "blue"),
        names.arg = aggimpact$EVTYPE[1:20],
        las = 2,
        cex.names = 0.5,
        main = "Health Damages (Persons)",
        legend.text = c("Fatalities", "Injuries"))
The most damaging event type is Tornado, followed by Excessive Heat and TSTM Wind.