Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

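The goal of this analysis is to explore the NOAA storm database and answer two questions about severe weather events: which event types are most harmful to population health, and which have the greatest economic consequences.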

Data Processing

Downloading and importing data

The data are downloaded from the URL provided by the Reproducible Research course on Coursera.

# Download the compressed file only if it is not already in the working directory
file.name <- "repdata-data-StormData.csv.bz2"
if(!file.exists(file.name)) {
        url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        # mode = "wb" keeps the binary archive intact on Windows
        download.file(url, destfile = file.name, mode = "wb")
}

data<-read.csv(bzfile("repdata-data-StormData.csv.bz2"))

This analysis assumes that the dplyr package is installed in your R environment.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Processing data

Only a few columns of the original data are needed. The column holding the event date is converted to POSIXct format.

data <- select(data, BGN_DATE, EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
data$BGN_DATE <- as.POSIXct(data$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
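
Before applying a format string to the whole column, it can help to try it on a single value. The sketch below is only an illustration: the sample string is an assumed example of how a BGN_DATE entry looks (month/day/year followed by a time), and tz = "UTC" is added here so the result does not depend on the local time zone.

```r
# Hypothetical sample value in month/day/year hour:minute:second layout
sample_date <- "4/18/1950 0:00:00"

# Parse with the same format string that is used for BGN_DATE
parsed <- as.POSIXct(sample_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# A format mismatch yields NA rather than an error, so check for it explicitly
stopifnot(!is.na(parsed))

# Show the parsed date in ISO year-month-day form
format(parsed, "%Y-%m-%d")
```

Because a mismatched format silently produces NA, counting sum(is.na(data$BGN_DATE)) after the conversion is a cheap way to confirm that every row was parsed.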

Columns PROPDMGEXP and CROPDMGEXP contain multipliers for the values in PROPDMG and CROPDMG, so the values need to be converted.
The available multipliers are:

unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
  • H or h - 100
  • K or k - 1,000
  • M or m - 1,000,000
  • B - 1,000,000,000
  • a digit from 0 to 8 - 10 raised to the power of that digit
  • anything else (empty, ?, +, -) - 1

Let’s convert the values. We add two new columns holding the numeric multipliers, then multiply them by the values in the PROPDMG and CROPDMG columns.

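# Digit codes are matched with grepl(), because is.numeric() is always FALSE for a factor or character column.
# as.character() is applied first so that a factor yields its labels rather than its internal level codes.
data<-mutate(data, PROPDMG_MULT = ifelse(toupper(PROPDMGEXP) == 'B',1E9, ifelse(toupper(PROPDMGEXP) == 'K', 1E3, ifelse(toupper(PROPDMGEXP) == 'M', 1E6, ifelse(toupper(PROPDMGEXP)=="H",1e2, ifelse(grepl("^[0-9]$", PROPDMGEXP), 10^suppressWarnings(as.numeric(as.character(PROPDMGEXP))), 1))))))

data<-mutate(data, CROPDMG_MULT = ifelse(toupper(CROPDMGEXP) == 'B',1E9, ifelse(toupper(CROPDMGEXP) == 'K', 1E3, ifelse(toupper(CROPDMGEXP) == 'M', 1E6, ifelse(toupper(CROPDMGEXP)=="H",1e2, ifelse(grepl("^[0-9]$", CROPDMGEXP), 10^suppressWarnings(as.numeric(as.character(CROPDMGEXP))), 1))))))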

data<-mutate(data, PROPDMG_TOTAL = PROPDMG*PROPDMG_MULT, CROPDMG_TOTAL = CROPDMG*CROPDMG_MULT)
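
The mapping from exponent codes to multipliers can also be expressed as a small lookup function, which may be easier to read and test than nested ifelse() calls. This is a minimal sketch, not the code used for the results in this report: exp_to_multiplier is a hypothetical helper name, and it assumes that any code outside the list above (empty, ?, +, -) should map to 1.

```r
# exp_to_multiplier is a hypothetical helper, not part of the original analysis.
# It maps exponent codes such as "K", "m", "3" or "?" to numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))

  # Letter codes and their multipliers
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  # Assumed default of 1 for empty or unrecognised codes
  mult <- rep(1, length(code))

  # Look up letter codes by name
  is_letter <- code %in% names(letter_map)
  mult[is_letter] <- unname(letter_map[code[is_letter]])

  # Single digits are read as powers of ten
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  mult
}

exp_to_multiplier(c("K", "m", "B", "h", "2", "?", ""))
```

With such a helper, a multiplier column could be built as exp_to_multiplier(data$PROPDMGEXP), and the function can be checked on a handful of codes before it touches the full data set.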

We group rows by event type, then summarise total fatalities and injuries, and total property and crop damage (in billions of US dollars).

answer1<-group_by(data,EVTYPE)
answer1sum<-summarise(answer1, TotalFatalities=sum(FATALITIES), TotalInjuries=sum(INJURIES))
answer1sum<-filter(answer1sum, TotalFatalities>0 | TotalInjuries>0)

answer2<-group_by(data,EVTYPE)
answer2sum<-summarise(answer2, TotalProp=sum(PROPDMG_TOTAL)/1e9, TotalCrop=sum(CROPDMG_TOTAL)/1e9)
answer2sum<-filter(answer2sum, TotalProp>0 | TotalCrop>0)
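
As a sanity check, the grouping and summing steps above can be reproduced in base R on a small table. The sketch below uses made-up numbers in a toy data frame (not values from the storm database) and aggregate() in place of group_by() and summarise().

```r
# Toy data with hypothetical values, standing in for the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 0, 5, 0),
  stringsAsFactors = FALSE
)

# Sum both columns within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep only event types with at least one fatality or injury
totals <- totals[totals$FATALITIES > 0 | totals$INJURIES > 0, ]
totals
```

Running the same aggregate() call on the real columns and comparing a few event types against answer1sum is one way to confirm that the dplyr pipeline does what is intended.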

Results

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

In this section we answer two questions.

Across the United States, which types of events are most harmful with respect to population health?

The plots below show the 10 event types with the most fatalities and the 10 with the most injuries:

par(mfrow = c(1, 2),
    mar = c(12,5,2,1),
    mgp = c(2, 1, 0),
    cex = 0.8,
    font= 1
) 

answer1print<-arrange(answer1sum, desc(TotalFatalities))
answer1print = answer1print[1:10,]

barplot(answer1print$TotalFatalities, names.arg=answer1print$EVTYPE, main="Top 10 Fatal Storm Types", ylab="number of fatalities", las=3, width=1)

answer1print<-arrange(answer1sum, desc(TotalInjuries))
answer1print = answer1print[1:10,]

barplot(answer1print$TotalInjuries, names.arg=answer1print$EVTYPE, main="Top 10 Injurious Storm Types", ylab="number of injuries", las=3, width=1)

Across the United States, which types of events have the greatest economic consequences?

par(mfrow = c(1, 2),
    mar = c(12,5,2,1),
    mgp = c(2, 1, 0),
    cex = 0.8,
    font= 1
) 

answer2print<-arrange(answer2sum, desc(TotalProp))
answer2print = answer2print[1:10,]

barplot(answer2print$TotalProp, names.arg=answer2print$EVTYPE, main="Top 10 Storm Types by Property Damage", ylab="US Dollars, Billions", las=3, width=1)

answer2print<-arrange(answer2sum, desc(TotalCrop))
answer2print = answer2print[1:10,]

barplot(answer2print$TotalCrop, names.arg=answer2print$EVTYPE, main="Top 10 Storm Types by Crop Damage", ylab="US Dollars, Billions", las=3, width=1)
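
The four plots above repeat the same two steps: sort a summary table by one column and keep the first 10 rows. That pattern could be wrapped in a small function. The sketch below is one possible way to do it, with top_events as a hypothetical helper name and a toy table of made-up values.

```r
# top_events is a hypothetical helper: order a summary table by one column,
# largest values first, and return at most n rows.
top_events <- function(df, column, n = 10) {
  ordered <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Toy summary table with hypothetical values
toy_sum <- data.frame(
  EVTYPE          = c("A", "B", "C"),
  TotalFatalities = c(5, 20, 1),
  stringsAsFactors = FALSE
)

top_events(toy_sum, "TotalFatalities", n = 2)$EVTYPE
```

Unlike indexing with [1:10,], head() does not pad the result with NA rows when the table has fewer than n rows.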

Conclusion

Tornadoes are the most dangerous event type, causing the greatest number of both fatalities and injuries. Excessive heat ranks second by fatalities and fourth by injuries, while thunderstorm wind ranks second by injuries, behind tornadoes.

Floods cause the most property damage and rank second for crop damage. Drought is the leading cause of crop damage.