The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data. There is also some documentation of the database available, which describes how some of the variables are constructed and defined.
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Your data analysis must address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
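# Download the compressed CSV only if it is not already in the working directory
if (!file.exists("stormdata.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "stormdata.bz2", method = "curl")
}
# read.csv() reads the bzip2-compressed file through a bzfile() connection
StormData <- read.csv(bzfile("stormdata.bz2"), header = TRUE)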
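As a small, self-contained check that read.csv() can read a bzip2-compressed CSV through a bzfile() connection, the sketch below writes a tiny hypothetical file (its name and contents are invented for illustration, not taken from the NOAA data) and reads it back the same way.

```r
# Write a tiny CSV through a bzip2-compressing connection (hypothetical file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# Read it back as the report does: read.csv() on a bzfile() connection
small <- read.csv(bzfile(tmp), header = TRUE)
print(small)
```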
First, we convert the BGN_DATE variable into a date by removing its constant time component and parsing the remaining month/day/year string.
library("lubridate")
# BGN_DATE looks like "4/18/1950 0:00:00": drop the time part, then parse month/day/year
StormData$DATE <- mdy(sub(" 0:00:00$", "", StormData$BGN_DATE))
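The same conversion can be sketched in base R with as.Date(), here on a few hypothetical BGN_DATE strings written in the database's month/day/year format. Extracting the year also makes it easy to count records per year, which is one way to look at how sparsely the early years of the database are covered.

```r
# Hypothetical BGN_DATE values in the "m/d/Y 0:00:00" format used by the database
bgn <- c("4/18/1950 0:00:00", "11/15/1995 0:00:00", "11/30/2011 0:00:00",
         "1/3/2011 0:00:00")

# Drop the constant time part, then parse month/day/year into Date objects
dates <- as.Date(sub(" 0:00:00$", "", bgn), format = "%m/%d/%Y")

# Count records per year; on StormData$DATE this shows the uneven coverage over time
events_per_year <- table(format(dates, "%Y"))
print(dates)
print(events_per_year)
```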
To answer the first question, we need only the variables EVTYPE, FATALITIES and INJURIES. We add FATALITIES and INJURIES for each record, sum the result by event type, sort the totals in descending order and keep the ten most harmful event types.
# Keep only the columns needed for the health analysis
Harmful <- StormData[, c("EVTYPE", "FATALITIES", "INJURIES")]
# Sum fatalities plus injuries for each event type
SumHarmful <- aggregate(Harmful$FATALITIES + Harmful$INJURIES,
                        by = list(Harmful$EVTYPE), FUN = sum)
names(SumHarmful) <- c("EVTYPE", "SUMEVENTS")
# Sort in descending order and keep the ten most harmful event types
Harmful10 <- head(SumHarmful[order(SumHarmful$SUMEVENTS, decreasing = TRUE), ], 10)
rownames(Harmful10) <- NULL
barplot(Harmful10$SUMEVENTS, names.arg = Harmful10$EVTYPE,
        cex.axis = 0.6, cex.names = 0.5, las = 2)
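The aggregate, sort and truncate pattern can be tried on a small data frame whose numbers are invented for illustration (only the column names mirror the NOAA data). This sketch uses aggregate()'s formula interface, which keeps readable column names, and head() to keep the top n rows; n_top is a hypothetical parameter.

```r
# Invented toy records; only the column names mirror the NOAA data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "FLOOD"),
  FATALITIES = c(5, 0, 2, 1, 0, 0),
  INJURIES   = c(20, 1, 3, 4, 2, 0)
)

# Combined harm per record, then summed per event type
toy$HARM <- toy$FATALITIES + toy$INJURIES
harm_by_type <- aggregate(HARM ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the n_top most harmful event types
n_top <- 2
top_harm <- head(harm_by_type[order(harm_by_type$HARM, decreasing = TRUE), ], n_top)
rownames(top_harm) <- NULL
print(top_harm)
```

Here TORNADO totals 30 and FLOOD 5, so those two are kept while HAIL (3) is dropped.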
The top 10 most harmful events with respect to population health are:
Harmful10
## EVTYPE SUMEVENTS
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
To answer the second question, we need only the variables EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
EconomicCons <- StormData[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
To get the actual damage values, each -DMG variable must be multiplied by the factor encoded in the corresponding -DMGEXP variable. PROPDMGEXP and CROPDMGEXP give the units in which the damage is expressed: h/H for hundreds, k/K for thousands, m/M for millions and b/B for billions. Numeric codes are treated as powers of ten, and empty or unknown codes (such as ?, + and -) are recoded as 1.
library(car)
# Replace each PROPDMGEXP code with its multiplier; lowercase 'm' must be listed too,
# otherwise it is left as text and as.numeric() turns it into NA
EconomicCons$PROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$PROPDMGEXP),
  "''=1; '-'=1; '?'=1; '+'=1; '0'=1; '1'=10; '2'=10^2; '3'=10^3; '4'=10^4;
   '5'=10^5; '6'=10^6; '7'=10^7; '8'=10^8; 'h'=10^2; 'H'=10^2; 'K'=10^3;
   'm'=10^6; 'M'=10^6; 'B'=10^9"))
# Same for CROPDMGEXP, which uses fewer distinct codes
EconomicCons$CROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$CROPDMGEXP),
  "''=1; '?'=1; '0'=1; '2'=10^2; 'k'=10^3; 'K'=10^3;
   'm'=10^6; 'M'=10^6; 'B'=10^9"))
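An alternative to listing every code in Recode() is a named lookup vector. The sketch below assumes the same conventions (h, k, m, b for hundreds to billions, digits as powers of ten, anything else as 1). Upper-casing the codes first means a lowercase code cannot slip through unmapped. The names multipliers and exp_to_multiplier are hypothetical.

```r
# Multipliers for the letter codes; digits 0-8 map to powers of ten
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                 setNames(10^(0:8), as.character(0:8)))

# Hypothetical helper: translate a vector of -DMGEXP codes into multipliers.
# Codes are upper-cased and trimmed; codes not in the table ("", "?", "+", "-") become 1.
exp_to_multiplier <- function(code) {
  m <- unname(multipliers[toupper(trimws(as.character(code)))])
  m[is.na(m)] <- 1
  m
}

codes <- c("K", "m", "B", "", "?", "5", "h")
print(exp_to_multiplier(codes))
```

Indexing a named vector with "" or an unknown name yields NA, which is why the helper replaces NA with 1 rather than relying on a complete list of codes.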
The total economic damage of a record is its property damage plus its crop damage, each multiplied by the corresponding factor. We are interested only in the ten event types with the largest total economic damage.
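For example, a record with PROPDMG = 2.5, PROPDMGEXP = 'M', CROPDMG = 10 and CROPDMGEXP = 'K' contributes 2.5 × 10^6 + 10 × 10^3 = 2,510,000. The sketch below reproduces this arithmetic in base R with transform(), which does the same job here as plyr's mutate(); the records are invented, and their -DMGEXP columns are assumed to be already recoded to numbers.

```r
# Invented records whose -DMGEXP columns are already numeric multipliers
dmg <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(2.5, 50, 1),
  PROPDMGEXP = c(1e6, 1e3, 1e9),
  CROPDMG    = c(10, 0, 0),
  CROPDMGEXP = c(1e3, 1, 1)
)

# Total damage per record = scaled property damage + scaled crop damage
dmg <- transform(dmg, TOTALDMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
print(dmg$TOTALDMG)

# Sum per event type, as done for the full data set
print(aggregate(TOTALDMG ~ EVTYPE, data = dmg, FUN = sum))
```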
library(plyr)
EconomicConsTotal <- mutate(EconomicCons, TOTALDMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
EconomicConsAggr <- aggregate(EconomicConsTotal$TOTALDMG, by=list(EconomicConsTotal$EVTYPE),FUN=sum)
names(EconomicConsAggr) <- c("EVTYPE","SUMEVENTS")
Economic10 <- (EconomicConsAggr[order(EconomicConsAggr$SUMEVENTS,decreasing = TRUE),])[1:10,]
rownames(Economic10) <- NULL
barplot(Economic10$SUMEVENTS,names.arg = Economic10$EVTYPE,
cex.axis = 0.5,cex.names = 0.5, las = 2)
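One pitfall is worth checking for at this step: if any multiplier is NA (for instance because a code was left out of the recoding and as.numeric() could not convert it), sum() returns NA for that event type, and order() places NA values last by default, so the event type silently drops out of the top ten. The toy sketch below uses invented values to show the effect and a simple check for it.

```r
# Invented damage values; one HAIL record is NA, as if its code had not been recoded
totals <- data.frame(EVTYPE = c("FLOOD", "HAIL", "HAIL", "WIND"),
                     TOTALDMG = c(100, 80, NA, 5))

# aggregate() with 'by' passes the NA to sum(), so HAIL's total becomes NA
agg <- aggregate(totals$TOTALDMG, by = list(totals$EVTYPE), FUN = sum)
names(agg) <- c("EVTYPE", "SUMEVENTS")

# order() puts the NA total last, so HAIL falls below WIND despite its larger damage
ranked <- agg[order(agg$SUMEVENTS, decreasing = TRUE), ]
print(ranked)

# A simple guard: count event types whose total is missing
print(sum(is.na(agg$SUMEVENTS)))
```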
The top 10 event types with the greatest economic consequences (total property and crop damage) are:
Economic10
## EVTYPE SUMEVENTS
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 STORM SURGE 43323541000
## 4 FLASH FLOOD 18243991078
## 5 DROUGHT 15018672000
## 6 HURRICANE 14610229010
## 7 RIVER FLOOD 10148404500
## 8 ICE STORM 8967041360
## 9 TROPICAL STORM 8382236550
## 10 WINTER STORM 6715441251