The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The basic goal of this research is to explore the NOAA Storm Database and answer two questions about severe weather events.
We download the data from the URL provided by the Reproducible Research course on Coursera.
file.name <- "repdata-data-StormData.csv.bz2"
# Download the compressed file only if it is not already present
if(!file.exists(file.name)) {
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the binary archive intact on Windows
download.file(url, destfile = file.name, mode = "wb")
}
data<-read.csv(bzfile(file.name))
This analysis assumes that the dplyr package is installed in your R environment.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
We need only a few columns from the original data. The column holding the event date is converted to POSIXct format.
data <- select(data, BGN_DATE, EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
data$BGN_DATE <- as.POSIXct(data$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
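As a quick check of the format string used above, the following sketch parses a single timestamp written in the month/day/year hour:minute:second layout. The sample value and the `tz = "UTC"` argument are assumptions added for illustration; the original call relies on the session's default time zone.

```r
# Hypothetical sample value in the layout matched by the format string
sample_date <- "4/18/1950 0:00:00"

# tz = "UTC" is an assumption, so the result does not depend on the local time zone
parsed <- as.POSIXct(sample_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the year, e.g. to restrict an analysis to a range of years
event_year <- as.integer(format(parsed, "%Y"))
print(event_year)

# A string that does not match the format yields NA rather than an error
bad <- as.POSIXct("not a date", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
print(is.na(bad))
```

Checking for `NA` values after the conversion is a cheap way to confirm that every row matched the expected layout.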
The columns PROPDMGEXP and CROPDMGEXP contain multipliers for the values in PROPDMG and CROPDMG, so we need to convert these values.
The available multipliers are:
unique(data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
unique(data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
Let’s convert the values. We add two new columns holding the numeric multiplier for each row, then multiply them by the values in the PROPDMG and CROPDMG columns.
# is.numeric() is always FALSE for a factor or character column, so digit codes are detected with grepl()
data<-mutate(data, PROPDMG_MULT = ifelse(toupper(PROPDMGEXP) == 'B', 1E9, ifelse(toupper(PROPDMGEXP) == 'K', 1E3, ifelse(toupper(PROPDMGEXP) == 'M', 1E6, ifelse(toupper(PROPDMGEXP) == 'H', 1E2, ifelse(grepl("^[0-9]$", PROPDMGEXP), 10^suppressWarnings(as.numeric(as.character(PROPDMGEXP))), 1))))))
data<-mutate(data, CROPDMG_MULT = ifelse(toupper(CROPDMGEXP) == 'B', 1E9, ifelse(toupper(CROPDMGEXP) == 'K', 1E3, ifelse(toupper(CROPDMGEXP) == 'M', 1E6, ifelse(toupper(CROPDMGEXP) == 'H', 1E2, ifelse(grepl("^[0-9]$", CROPDMGEXP), 10^suppressWarnings(as.numeric(as.character(CROPDMGEXP))), 1))))))
data<-mutate(data, PROPDMG_TOTAL = PROPDMG*PROPDMG_MULT, CROPDMG_TOTAL = CROPDMG*CROPDMG_MULT)
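The mapping from exponent codes to numeric multipliers can be checked on a few hand-made values. The helper `exp_to_multiplier` below is hypothetical (it is not part of the original analysis) and uses base R only. It assumes that `H`, `K`, `M` and `B` mean hundred, thousand, million and billion, that a single digit `d` means 10^d, and that any other symbol (`?`, `+`, `-` or an empty string) leaves the value unscaled.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))   # handles factors and lower-case codes
  mult <- rep(1, length(code))          # default: leave the value unscaled
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letter_map)
  mult[is_letter] <- letter_map[code[is_letter]]
  is_digit <- grepl("^[0-9]$", code)    # a single digit d is read as 10^d
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Invented sample values covering each kind of code
codes  <- factor(c("K", "m", "B", "h", "5", "?", ""))
damage <- c(25, 2.5, 1, 3, 2, 10, 7)
totals <- damage * exp_to_multiplier(codes)
print(totals)
```

Converting the factor with `as.character()` before any numeric conversion matters: calling `as.numeric()` directly on a factor returns the internal level codes, not the digits shown in the labels.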
We group the rows by event type, then summarise total fatalities and injuries, as well as total property and crop damage (in billions of US dollars), for each type.
answer1<-group_by(data,EVTYPE)
answer1sum<-summarise(answer1, TotalFatalities=sum(FATALITIES), TotalInjuries=sum(INJURIES))
answer1sum<-filter(answer1sum, TotalFatalities>0 | TotalInjuries>0)
answer2<-group_by(data,EVTYPE)
answer2sum<-summarise(answer2, TotalProp=sum(PROPDMG_TOTAL)/1e9, TotalCrop=sum(CROPDMG_TOTAL)/1e9)
answer2sum<-filter(answer2sum, TotalProp>0 | TotalCrop>0)
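The group-and-sum step can be illustrated on a small hand-made table. This sketch uses base R's `aggregate` instead of dplyr so that it runs without extra packages; the data frame `toy` and all of its values are invented for illustration.

```r
# Invented toy data: per-event damage in US dollars for three event types
toy <- data.frame(
  EVTYPE        = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG_TOTAL = c(2e9, 5e8, 1e9, 0, 2.5e8),
  CROPDMG_TOTAL = c(5e8, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Sum both damage columns within each event type
sums <- aggregate(cbind(PROPDMG_TOTAL, CROPDMG_TOTAL) ~ EVTYPE, data = toy, FUN = sum)

# Express the totals in billions of dollars
sums$TotalProp <- sums$PROPDMG_TOTAL / 1e9
sums$TotalCrop <- sums$CROPDMG_TOTAL / 1e9

# Keep only event types with some recorded damage
sums <- sums[sums$TotalProp > 0 | sums$TotalCrop > 0, c("EVTYPE", "TotalProp", "TotalCrop")]
print(sums)
```

In this toy table the `HAIL` row drops out because both of its totals are zero, which mirrors the purpose of the filtering step.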
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
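In this section we answer two questions: which types of events are most harmful to population health, and which have the greatest economic consequences.
Let’s look at plots of the 10 event types with the most fatalities and the 10 with the most injuries: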
par(mfrow = c(1, 2),
mar = c(12,5,2,1),
mgp = c(2, 1, 0),
cex = 0.8,
font= 1
)
answer1print<-arrange(answer1sum, desc(TotalFatalities))
answer1print = answer1print[1:10,]
barplot(answer1print$TotalFatalities, names.arg=answer1print$EVTYPE, main="Top 10 Fatal Storm Types", ylab="number of fatalities", las=3, width=1)
answer1print<-arrange(answer1sum, desc(TotalInjuries))
answer1print = answer1print[1:10,]
barplot(answer1print$TotalInjuries, names.arg=answer1print$EVTYPE, main="Top 10 Injurious Storm Types", ylab="number of injuries", las=3, width=1)
par(mfrow = c(1, 2),
mar = c(12,5,2,1),
mgp = c(2, 1, 0),
cex = 0.8,
font= 1
)
answer2print<-arrange(answer2sum, desc(TotalProp))
answer2print = answer2print[1:10,]
barplot(answer2print$TotalProp, names.arg=answer2print$EVTYPE, main="Top 10 Storm Types by Property Damage", ylab="US Dollars, Billions", las=3, width=1)
answer2print<-arrange(answer2sum, desc(TotalCrop))
answer2print = answer2print[1:10,]
barplot(answer2print$TotalCrop, names.arg=answer2print$EVTYPE, main="Top 10 Storm Types by Crop Damage", ylab="US Dollars, Billions", las=3, width=1)
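Selecting the top entries before plotting amounts to sorting in descending order and keeping the first rows. The hypothetical helper `top_by` below sketches this in base R with invented values. Unlike indexing with `[1:10,]`, `head()` does not pad the result with `NA` rows when fewer than `n` rows are available.

```r
# Hypothetical helper: the n rows with the largest values in column `col`
top_by <- function(df, col, n = 10) {
  ordered <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Invented totals for illustration only
toy <- data.frame(
  EVTYPE          = c("HAIL", "TORNADO", "FLOOD"),
  TotalFatalities = c(2, 50, 7),
  stringsAsFactors = FALSE
)

top2 <- top_by(toy, "TotalFatalities", n = 2)
print(top2$EVTYPE)

# With the default n = 10 and only three rows, all three rows are returned
print(nrow(top_by(toy, "TotalFatalities")))
```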
Tornadoes are the most dangerous storm events, causing the greatest number of both fatalities and injuries. Excessive heat ranks second by fatalities and fourth by injuries, whereas thunderstorm wind ranks second by injuries, right after tornadoes.
Floods cause the most property damage and rank second for crop damage. Drought is the leading cause of crop damage.