Data source

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions

Your data analysis must address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Downloading and processing the data

if(!file.exists("stormdata.bz2")){
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormdata.bz2", method="curl")
}

StormData <- read.csv(bzfile("stormdata.bz2"), sep=",", header=TRUE)

First, we strip the time component from the BGN_DATE variable and convert the result to a date.

library("lubridate")
# Drop the constant " 0:00:00" suffix, then parse month/day/year into a Date
StormData$DATE <- gsub(" 0:00:00", "", StormData$BGN_DATE, fixed = TRUE)
StormData$DATE <- as.Date(StormData$DATE, format = "%m/%d/%Y")
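
With the date parsed, the year can be extracted to check how the number of recorded events changes over time, which is relevant because the earlier years are less complete. The following is only a minimal sketch on made-up strings in the BGN_DATE layout; the names bgn_date, event_year and events_per_year are hypothetical and do not appear in the analysis.

```r
# Hypothetical sample in the same layout as BGN_DATE (not real records)
bgn_date <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
              "11/15/2011 0:00:00", "11/28/2011 0:00:00")

# Remove the time suffix and parse the remainder as month/day/year
date_only <- sub(" 0:00:00", "", bgn_date, fixed = TRUE)
parsed <- as.Date(date_only, format = "%m/%d/%Y")

# Extract the year and count events per year
event_year <- as.integer(format(parsed, "%Y"))
events_per_year <- table(event_year)
print(events_per_year)
```

On the full data set the same idea would use StormData$DATE in place of the sample vector.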

Results

1. Across the United States, which types of events are most harmful with respect to population health?

To answer this question we need only three variables: EVTYPE, FATALITIES and INJURIES. We sum FATALITIES and INJURIES for each type of weather event, sort the totals in descending order and keep the ten event types with the highest totals.

Harmful <- StormData[,c("EVTYPE","FATALITIES","INJURIES")]
SumHarmful <- aggregate(Harmful$FATALITIES + Harmful$INJURIES,
    by=list(Harmful$EVTYPE),FUN=sum)
names(SumHarmful) <- c("EVTYPE","SUMEVENTS")
Harmful10 <- (SumHarmful[order(SumHarmful$SUMEVENTS,decreasing = TRUE),])[1:10,]
rownames(Harmful10) <- NULL

barplot(Harmful10$SUMEVENTS, names.arg = Harmful10$EVTYPE, cex.axis = 0.6, cex.names = 0.5, las = 2)
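
Adding FATALITIES and INJURIES together weighs a death and an injury equally. As a possible alternative view, the sketch below keeps the two counts in separate columns and ranks by fatalities first. It runs on a small made-up data frame (toy), not on the storm data, and the ranking rule is an assumption made for illustration.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 0)
)

# Sum both outcomes per event type, keeping them as separate columns
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, breaking ties by injuries (both descending)
by_type <- by_type[order(-by_type$FATALITIES, -by_type$INJURIES), ]
rownames(by_type) <- NULL
print(by_type)
```

The same two-column summary could be computed on the Harmful data frame to see whether the ranking changes when fatalities are considered on their own.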

The top 10 most harmful events with respect to population health are:

Harmful10
##               EVTYPE SUMEVENTS
## 1            TORNADO     96979
## 2     EXCESSIVE HEAT      8428
## 3          TSTM WIND      7461
## 4              FLOOD      7259
## 5          LIGHTNING      6046
## 6               HEAT      3037
## 7        FLASH FLOOD      2755
## 8          ICE STORM      2064
## 9  THUNDERSTORM WIND      1621
## 10      WINTER STORM      1527

2. Across the United States, which types of events have the greatest economic consequences?

To answer this question we need only the variables EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

EconomicCons <- StormData[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

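To get the real values of the damage, the -DMG variables must be multiplied by the multipliers encoded in the corresponding -DMGEXP variables. CROPDMGEXP and PROPDMGEXP give the units in which the damage is expressed: h - hundred, k - thousand, m - million, b - billion. Blank values and the symbols +, - and ? are recoded as 1, and digit codes are read as powers of ten. Codes that the recode string does not list, such as a lowercase m in PROPDMGEXP, are left unchanged and become NA when converted to numeric, which produces the coercion warning in the output.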

library(car)

EconomicCons$PROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$PROPDMGEXP),"'K'=10^3;'M'=10^6;''=1;'B'=10^9;'+'=1;'0'=1;'5'=10^5;'6'=10^6; '?'=1;'4'=10^4;'2'=10^2;'3'=10^3;'h'=10^2;'7'=10^7;'H'=10^2;'-'=1;'8'=10^8"))
## Warning: NAs introduced by coercion
EconomicCons$CROPDMGEXP <- as.numeric(Recode(as.character(EconomicCons$CROPDMGEXP),"''=1;
    'M'=10^6;'K'=10^3;'m'=10^6;'B'=10^9;'?'=1;'0'=1;
    'k'=10^3;'2'=10^2"))
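
Listing every code by hand makes it easy to miss one. A minimal sketch of a case-insensitive alternative is shown below; exp_to_multiplier is a hypothetical helper, not part of the analysis above. It assumes that a single digit d means 10^d and that anything else unrecognized means 1. Under that assumption the code '1' maps to 10, whereas the Recode call above leaves it at 1.

```r
# Hypothetical helper: convert a damage exponent code to a numeric multiplier
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))     # treat 'k' and 'K' alike

  mult <- rep(1, length(code))            # default: blank, '+', '-', '?'

  is_letter <- code %in% names(lookup)
  mult[is_letter] <- unname(lookup[code[is_letter]])

  is_digit <- grepl("^[0-9]$", code)      # assumption: digit d means 10^d
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  mult
}

multipliers <- exp_to_multiplier(c("K", "m", "", "?", "5", "B", "h"))
print(multipliers)
```

Because this mapping leaves no NA values, totals computed with it would not necessarily match the table printed in this report.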

The total economic damage is the sum of the property damage and the crop damage, each multiplied by its exponent variable. We are interested only in the top ten event types by economic impact.

library(plyr)
## 
## Attaching package: 'plyr'
## 
## The following object is masked from 'package:lubridate':
## 
##     here
EconomicConsTotal <- mutate(EconomicCons, TOTALDMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
EconomicConsAggr <- aggregate(EconomicConsTotal$TOTALDMG, by=list(EconomicConsTotal$EVTYPE),FUN=sum)
names(EconomicConsAggr) <- c("EVTYPE","SUMEVENTS")
Economic10 <- (EconomicConsAggr[order(EconomicConsAggr$SUMEVENTS,decreasing = TRUE),])[1:10,]
rownames(Economic10) <- NULL
barplot(Economic10$SUMEVENTS,names.arg = Economic10$EVTYPE,
        cex.axis = 0.5,cex.names = 0.5, las = 2)
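
Since the recoding step emitted a coercion warning, it is worth knowing how missing values behave in this kind of aggregation. The sketch below uses a made-up data frame (toy) to show that a single NA makes the whole group total NA, that order() then places that group last, and that passing na.rm = TRUE through aggregate() drops the missing rows instead. The names are hypothetical.

```r
# Hypothetical toy data: event type "A" has one missing damage value
toy <- data.frame(EVTYPE = c("A", "A", "B"),
                  TOTALDMG = c(10, NA, 5))

# Default behaviour: the NA propagates into the total for "A"
with_na <- aggregate(toy$TOTALDMG, by = list(toy$EVTYPE), FUN = sum)
names(with_na) <- c("EVTYPE", "SUMEVENTS")

# order() puts NA last, so "A" falls to the bottom of the ranking
ranked <- with_na[order(with_na$SUMEVENTS, decreasing = TRUE), ]

# Extra arguments are passed on to FUN, so na.rm = TRUE ignores the NA
without_na <- aggregate(toy$TOTALDMG, by = list(toy$EVTYPE),
                        FUN = sum, na.rm = TRUE)
names(without_na) <- c("EVTYPE", "SUMEVENTS")

print(ranked)
print(without_na)
```

A quick check such as sum(is.na(EconomicConsTotal$TOTALDMG)) would show how many rows are affected in the real data.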

The top 10 event types with the greatest economic consequences are:

Economic10
##               EVTYPE    SUMEVENTS
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3        STORM SURGE  43323541000
## 4        FLASH FLOOD  18243991078
## 5            DROUGHT  15018672000
## 6          HURRICANE  14610229010
## 7        RIVER FLOOD  10148404500
## 8          ICE STORM   8967041360
## 9     TROPICAL STORM   8382236550
## 10      WINTER STORM   6715441251