The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The data analysis must address the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The results show that, over the past 60 years, tornadoes have been the most harmful events with respect to population health, and floods have had the greatest economic consequences.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plyr)
## Warning: package 'plyr' was built under R version 3.3.2
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
library(grid)
library(car)
## Warning: package 'car' was built under R version 3.3.2
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
Downloading data from the website:
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileZip <- "repdata-data-StormData.csv.bz2"
download.file(fileUrl, fileZip, mode = "wb")
# read.csv reads the bzip2-compressed file directly
data <- read.csv(fileZip, header = TRUE, sep = ",")
tidydata <- data[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
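The loading step relies on read.csv being able to read a bzip2-compressed file without manual decompression. A minimal sketch of that behaviour, using a tiny made-up file (the data frame `toy` and its rows are hypothetical and not taken from the Storm Database):

```r
# Hypothetical toy data standing in for the storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1),
                  INJURIES = c(10, 0))

# Write the toy data as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the file through a connection that detects the compression
toy_back <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
toy_back
```

The same call works on the downloaded storm file, which is why no separate unzip step appears in the analysis.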
First we sum injuries and fatalities per event type, sort the totals in descending order, and show the six most harmful event types:
casualties <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = data, FUN = sum)
names(casualties)[2] <- "Totalcasualties"
# Order the number of casualties descending
ordered_casualties <- casualties[order(-casualties$Totalcasualties),]
# Keep the top six rows using head()
Top6 <- head(ordered_casualties)
barplot(Top6$Totalcasualties, main = "Event types most harmful to population health", ylab = "Total number of casualties", names.arg = Top6$EVTYPE, cex.names = 0.6)
The following table shows the most harmful events and numbers of casualties:
Top6
##             EVTYPE Totalcasualties
## 834        TORNADO           96979
## 130 EXCESSIVE HEAT            8428
## 856      TSTM WIND            7461
## 170          FLOOD            7259
## 464      LIGHTNING            6046
## 275           HEAT            3037
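The aggregation and ordering steps used for the casualty table can be checked on a small example. The sketch below applies the same formula-based aggregate call to a hypothetical data frame `toy` (made-up numbers, not Storm Database values):

```r
# Hypothetical toy data: five events across three event types
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 3, 5, 0, 1)
)

# Sum injuries plus fatalities within each event type
toy_casualties <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(toy_casualties)[2] <- "Totalcasualties"

# Sort in descending order of total casualties
toy_ordered <- toy_casualties[order(-toy_casualties$Totalcasualties), ]
toy_ordered
```

Here TORNADO totals 17 (12 + 5), FLOOD totals 5 (4 + 1), and HEAT totals 4, so the ordered result lists them in that sequence.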
To answer the second question, we need only the variables EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
EconomicCons <- data[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
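To obtain the actual damage values, the -DMG variables must be multiplied by the multipliers encoded in the corresponding -DMGEXP variables. PROPDMGEXP and CROPDMGEXP give the unit in which the damage is expressed: h = hundred, k = thousand, m = million, b = billion. A listed digit is read as a power of ten, and unknown symbols ('', '+', '-', '?') are recoded as 1. Codes that are not listed in the recode specification are left unchanged by Recode and become NA in as.numeric, which is what the coercion warning reports.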
EconomicCons$PROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$PROPDMGEXP),"'K'=10^3;'M'=10^6;''=1;'B'=10^9;'+'=1;'0'=1;'5'=10^5;'6'=10^6; '?'=1;'4'=10^4;'2'=10^2;'3'=10^3;'h'=10^2;'7'=10^7;'H'=10^2;'-'=1;'8'=10^8"))
## Warning: NAs introduced by coercion
EconomicCons$CROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$CROPDMGEXP),"''=1;
'M'=10^6;'K'=10^3;'m'=10^6;'B'=10^9;'?'=1;'0'=1;
'k'=10^3;'2'=10^2"))
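The unit-code mapping can also be written as a lookup that never yields NA. The following is only a sketch under stated assumptions, not the recoding used above: the helper `exp_multiplier` is hypothetical, it treats letters case-insensitively, reads every single digit as a power of ten, and maps anything else to 1.

```r
# Hypothetical helper: convert a -DMGEXP code into a numeric multiplier
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[code])            # NA for codes not in the lookup
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])
  out[is.na(out)] <- 1                   # assumption: unknown codes count as 1
  out
}

exp_multiplier(c("K", "m", "", "?", "5", "B"))
```

Under these assumptions the call returns 1000, 1e6, 1, 1, 1e5, and 1e9. Because lowercase letters are matched as well, no code is left unconverted and no coercion warning arises.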
The total economic damage is the sum of the property damage and the crop damage, each multiplied by its exponent variable. We are interested only in the top ten event types by economic impact.
EconomicConsTotal <- mutate(EconomicCons, TOTALDMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
EconomicConsAggr <- aggregate(EconomicConsTotal$TOTALDMG, by=list(EconomicConsTotal$EVTYPE),FUN=sum)
names(EconomicConsAggr) <- c("EVTYPE","SUMEVENTS")
Economic10 <- (EconomicConsAggr[order(EconomicConsAggr$SUMEVENTS,decreasing = TRUE),])[1:10,]
rownames(Economic10) <- NULL
barplot(Economic10$SUMEVENTS,main = "Most harmful events and economic consequences", ylab = "Damages ($)", names.arg = Economic10$EVTYPE,
cex.axis = 0.5,cex.names = 0.5, las = 2)
The following table lists the ten event types with the greatest economic damage, in US dollars:
Economic10
##               EVTYPE    SUMEVENTS
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3        STORM SURGE  43323541000
## 4        FLASH FLOOD  18243991079
## 5            DROUGHT  15018672000
## 6          HURRICANE  14610229010
## 7        RIVER FLOOD  10148404500
## 8          ICE STORM   8967041360
## 9     TROPICAL STORM   8382236550
## 10      WINTER STORM   6715441251
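Since the recoding step reported NAs introduced by coercion, it may be worth checking how missing values behave in the by-group sum used for the ranking. The sketch below uses a hypothetical data frame `toy` with made-up damage values. Whether this affects the table above depends on which rows carried unmatched codes, which the output shown here does not reveal.

```r
# Hypothetical toy data; the NA stands in for a row with an unmatched unit code
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  TOTALDMG = c(1000, 2500, 400, NA)
)

# Default sum(): a single NA makes the whole group total NA
agg_default <- aggregate(toy$TOTALDMG, by = list(toy$EVTYPE), FUN = sum)

# With na.rm = TRUE the remaining rows of the group are still summed
agg_narm <- aggregate(toy$TOTALDMG, by = list(toy$EVTYPE), FUN = sum, na.rm = TRUE)

agg_default
agg_narm
```

In the first result the HAIL total is NA, and order() places NA values last by default, so such an event type would drop out of a top-ten selection. In the second result HAIL totals 400.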