This document uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities/injuries, and property/crop damage. We are examining the most harmful types of storm events in terms of fatalities/injuries and economic impact.
Coursera Reproducible Research Week 4: Case Studies & Commentaries: Introduciton
Assignment
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database (1950-2011)
Download the data file, and place it in “./data” folder.
library(reshape2)
library(ggplot2)
## Load data
myfile <- "./data/repdata%2Fdata%2FStormData.csv.bz2"
mydata <- read.csv(myfile, stringsAsFactors=FALSE)
## Observe data
dim(mydata)
## [1] 902297 37
names(mydata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
## Aggregate fatalities and injuries
fata <- aggregate(FATALITIES ~ EVTYPE, mydata, sum)
inju <- aggregate(INJURIES ~ EVTYPE, mydata, sum)
## Merge and calculate the total numbers
fata.inju <- merge(fata, inju, by = "EVTYPE")
fata.inju$TOTAL <- fata.inju$FATALITIES + fata.inju$INJURIES
## Order by total number descending
fata.inju <- fata.inju[order(-fata.inju$TOTAL), ]
## To display TOTAL top 10
fata.inju <- melt(fata.inju[1:10,], id = c("EVTYPE", "TOTAL"))
## Make it to be displayed in TOTAL order
fata.inju$EVTYPE <- factor(fata.inju$EVTYPE,
levels = fata.inju$EVTYPE[order(-fata.inju$TOTAL)])
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
fata.inju
## EVTYPE TOTAL variable value
## 1 TORNADO 96979 FATALITIES 5633
## 2 EXCESSIVE HEAT 8428 FATALITIES 1903
## 3 TSTM WIND 7461 FATALITIES 504
## 4 FLOOD 7259 FATALITIES 470
## 5 LIGHTNING 6046 FATALITIES 816
## 6 HEAT 3037 FATALITIES 937
## 7 FLASH FLOOD 2755 FATALITIES 978
## 8 ICE STORM 2064 FATALITIES 89
## 9 THUNDERSTORM WIND 1621 FATALITIES 133
## 10 WINTER STORM 1527 FATALITIES 206
## 11 TORNADO 96979 INJURIES 91346
## 12 EXCESSIVE HEAT 8428 INJURIES 6525
## 13 TSTM WIND 7461 INJURIES 6957
## 14 FLOOD 7259 INJURIES 6789
## 15 LIGHTNING 6046 INJURIES 5230
## 16 HEAT 3037 INJURIES 2100
## 17 FLASH FLOOD 2755 INJURIES 1777
## 18 ICE STORM 2064 INJURIES 1975
## 19 THUNDERSTORM WIND 1621 INJURIES 1488
## 20 WINTER STORM 1527 INJURIES 1321
ggplot(data = fata.inju,
aes(x = EVTYPE, y = value, fill = variable)) +
geom_bar(stat = "identity") +
ggtitle("Top 10 Harmful Events to Population Health") +
labs(x = "Type of Event", y = "Number") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position=c(0.8, 0.8), legend.title=element_blank()) +
scale_fill_discrete(labels=c("Fatalities", "Injuries"))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
Across the United States, which types of events have the greatest economic consequences?
## Observe property damage and crop damage data
table(mydata$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(mydata$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
## Subset data
dmg <- mydata[, c("PROPDMG","PROPDMGEXP",
"CROPDMG","CROPDMGEXP",
"EVTYPE"),]
These are possible values of CROPDMGEXP and PROPDMGEXP:
H,h,K,k,M,m,B,b,+,-,?,0,1,2,3,4,5,6,7,8, and blank-character
## Covernt property damage data exponent
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == " ")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "+")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "-")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "?")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "0")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "1")] <- 10
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "2")] <- 100
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "3")] <- 1e3
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "4")] <- 1e4
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "5")] <- 1e5
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "6")] <- 1e6
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "7")] <- 1e7
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "8")] <- 1e8
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "h") | (dmg$PROPDMGEXP == "H")] <- 100
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "k") | (dmg$PROPDMGEXP == "K")] <- 1e3
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "m") | (dmg$PROPDMGEXP == "M")] <- 1e6
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "b") | (dmg$PROPDMGEXP == "B")] <- 1e9
dmg$PROPDMGEXP <- as.numeric(dmg$PROPDMGEXP)
table(dmg$PROPDMGEXP)
##
## 10 100 1000 10000 1e+05 1e+06 1e+07 1e+08 1e+09
## 255 20 424669 4 28 11341 5 1 40
## Covernt corp damage data exponent
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == " ")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "+")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "-")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "?")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "0")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "1")] <- 10
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "2")] <- 100
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "3")] <- 1e3
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "4")] <- 1e4
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "5")] <- 1e5
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "6")] <- 1e6
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "7")] <- 1e7
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "8")] <- 1e8
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "h") | (dmg$CROPDMGEXP == "H")] <- 100
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "k") | (dmg$CROPDMGEXP == "K")] <- 1e3
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "m") | (dmg$CROPDMGEXP == "M")] <- 1e6
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "b") | (dmg$CROPDMGEXP == "B")] <- 1e9
dmg$CROPDMGEXP <- as.numeric(dmg$CROPDMGEXP)
table(dmg$CROPDMGEXP)
##
## 10 100 1000 1e+06 1e+09
## 26 1 281853 1995 9
## Convert data to numeric form
dmg$PROPDMG <- dmg$PROPDMG * dmg$PROPDMGEXP
dmg$CROPDMG <- dmg$CROPDMG * dmg$CROPDMGEXP
## Aggregate property and crop damage data
propdmg <- aggregate(PROPDMG ~ EVTYPE, dmg, sum)
cropdmg <- aggregate(CROPDMG ~ EVTYPE, dmg, sum)
totaldmg <- merge(propdmg, cropdmg, by = "EVTYPE")
## Calculate total damage
totaldmg$TOTAL <- totaldmg$PROPDMG + totaldmg$CROPDMG
## Order data by TOTAL descending
totaldmg <- totaldmg[order(-totaldmg$TOTAL), ]
## To display TOTAL top 10
totaldmg <- melt(totaldmg[1:10,], id = c("EVTYPE", "TOTAL"))
## Make it to be displayed in TOTAL descending order
totaldmg$EVTYPE <- factor(totaldmg$EVTYPE,
levels = totaldmg$EVTYPE[order(-totaldmg$TOTAL)])
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Set Magnitude to Billion
totaldmg$TOTAL <- totaldmg$TOTAL / 1e9
totaldmg$value <- totaldmg$value / 1e9
totaldmg
## EVTYPE TOTAL variable value
## 1 FLOOD 150.319678 PROPDMG 144.6577098
## 2 HURRICANE/TYPHOON 71.913713 PROPDMG 69.3058400
## 3 TORNADO 57.362337 PROPDMG 56.9473824
## 4 STORM SURGE 43.323541 PROPDMG 43.3235360
## 5 HAIL 18.761224 PROPDMG 15.7352696
## 6 FLASH FLOOD 18.243993 PROPDMG 16.8226761
## 7 DROUGHT 15.018672 PROPDMG 1.0461060
## 8 HURRICANE 14.610229 PROPDMG 11.8683190
## 9 RIVER FLOOD 10.148404 PROPDMG 5.1189455
## 10 ICE STORM 8.967042 PROPDMG 3.9449283
## 11 FLOOD 150.319678 CROPDMG 5.6619684
## 12 HURRICANE/TYPHOON 71.913713 CROPDMG 2.6078728
## 13 TORNADO 57.362337 CROPDMG 0.4149547
## 14 STORM SURGE 43.323541 CROPDMG 0.0000050
## 15 HAIL 18.761224 CROPDMG 3.0259547
## 16 FLASH FLOOD 18.243993 CROPDMG 1.4213171
## 17 DROUGHT 15.018672 CROPDMG 13.9725660
## 18 HURRICANE 14.610229 CROPDMG 2.7419100
## 19 RIVER FLOOD 10.148404 CROPDMG 5.0294590
## 20 ICE STORM 8.967042 CROPDMG 5.0221135
ggplot(data = totaldmg,
aes(x = EVTYPE, y = value, fill = variable)) +
geom_bar(stat = "identity") +
ggtitle("Top 10 Events with Largest Economic Impact") +
labs(x = "Type of Event", y = "Total Cost (Billion USD)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position=c(0.8, 0.8), legend.title=element_blank()) +
scale_fill_discrete(labels=c("Property Damage", "Crop Damage"))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated