0. Synopsis

This document uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities/injuries, and property/crop damage. We are examining the most harmful types of storm events in terms of fatalities/injuries and economic impact.

1. Introduction

1.1. Project

Coursera Reproducible Research Week 4: Case Studies & Commentaries: Introduciton

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

1.2. Data Resourse

U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database (1950-2011)

2. Data Preparation

Download the data file, and place it in “./data” folder.

library(reshape2)
library(ggplot2)

## Load data
myfile <- "./data/repdata%2Fdata%2FStormData.csv.bz2"
mydata <- read.csv(myfile, stringsAsFactors=FALSE)
## Observe data
dim(mydata)
## [1] 902297     37
names(mydata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

3. Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

3.1. Data Processing

## Aggregate fatalities and injuries
fata <- aggregate(FATALITIES ~ EVTYPE, mydata, sum)
inju <- aggregate(INJURIES ~ EVTYPE, mydata, sum)
## Merge and calculate the total numbers
fata.inju <- merge(fata, inju, by = "EVTYPE")
fata.inju$TOTAL <- fata.inju$FATALITIES + fata.inju$INJURIES
## Order by total number descending
fata.inju <- fata.inju[order(-fata.inju$TOTAL), ]
## To display TOTAL top 10
fata.inju <- melt(fata.inju[1:10,], id = c("EVTYPE", "TOTAL"))
## Make it to be displayed in TOTAL order
fata.inju$EVTYPE <- factor(fata.inju$EVTYPE,
                           levels = fata.inju$EVTYPE[order(-fata.inju$TOTAL)])
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

3.2. Result

fata.inju
##               EVTYPE TOTAL   variable value
## 1            TORNADO 96979 FATALITIES  5633
## 2     EXCESSIVE HEAT  8428 FATALITIES  1903
## 3          TSTM WIND  7461 FATALITIES   504
## 4              FLOOD  7259 FATALITIES   470
## 5          LIGHTNING  6046 FATALITIES   816
## 6               HEAT  3037 FATALITIES   937
## 7        FLASH FLOOD  2755 FATALITIES   978
## 8          ICE STORM  2064 FATALITIES    89
## 9  THUNDERSTORM WIND  1621 FATALITIES   133
## 10      WINTER STORM  1527 FATALITIES   206
## 11           TORNADO 96979   INJURIES 91346
## 12    EXCESSIVE HEAT  8428   INJURIES  6525
## 13         TSTM WIND  7461   INJURIES  6957
## 14             FLOOD  7259   INJURIES  6789
## 15         LIGHTNING  6046   INJURIES  5230
## 16              HEAT  3037   INJURIES  2100
## 17       FLASH FLOOD  2755   INJURIES  1777
## 18         ICE STORM  2064   INJURIES  1975
## 19 THUNDERSTORM WIND  1621   INJURIES  1488
## 20      WINTER STORM  1527   INJURIES  1321
ggplot(data = fata.inju, 
       aes(x = EVTYPE, y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    ggtitle("Top 10 Harmful Events to Population Health") +
    labs(x = "Type of Event", y = "Number") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position=c(0.8, 0.8), legend.title=element_blank()) +
    scale_fill_discrete(labels=c("Fatalities", "Injuries"))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

4. Question 2

Across the United States, which types of events have the greatest economic consequences?

4.1. Data Processing, Part 1

## Observe property damage and crop damage data
table(mydata$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(mydata$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
## Subset data
dmg <- mydata[, c("PROPDMG","PROPDMGEXP",
                  "CROPDMG","CROPDMGEXP",
                  "EVTYPE"),]

These are possible values of CROPDMGEXP and PROPDMGEXP:

H,h,K,k,M,m,B,b,+,-,?,0,1,2,3,4,5,6,7,8, and blank-character

  • H,h = hundred = 100
  • K,k = kilo = thousand = 1,000
  • M,m = million = 1,000,000
  • B,b = billion = 1,000,000,000
  • (+) = 1
  • (-) = 1
  • (?) = 1
  • black/empty character = 1
  • numeric 0…8 = 10^0…8
## Covernt property damage data exponent
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == " ")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "+")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "-")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "?")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "0")] <- 1
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "1")] <- 10
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "2")] <- 100
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "3")] <- 1e3
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "4")] <- 1e4
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "5")] <- 1e5
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "6")] <- 1e6
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "7")] <- 1e7
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "8")] <- 1e8
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "h") | (dmg$PROPDMGEXP == "H")] <- 100
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "k") | (dmg$PROPDMGEXP == "K")] <- 1e3
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "m") | (dmg$PROPDMGEXP == "M")] <- 1e6
dmg$PROPDMGEXP[(dmg$PROPDMGEXP == "b") | (dmg$PROPDMGEXP == "B")] <- 1e9
dmg$PROPDMGEXP <- as.numeric(dmg$PROPDMGEXP)
table(dmg$PROPDMGEXP)
## 
##     10    100   1000  10000  1e+05  1e+06  1e+07  1e+08  1e+09 
##    255     20 424669      4     28  11341      5      1     40
## Covernt corp damage data exponent
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == " ")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "+")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "-")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "?")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "0")] <- 1
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "1")] <- 10
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "2")] <- 100
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "3")] <- 1e3
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "4")] <- 1e4
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "5")] <- 1e5
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "6")] <- 1e6
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "7")] <- 1e7
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "8")] <- 1e8
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "h") | (dmg$CROPDMGEXP == "H")] <- 100
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "k") | (dmg$CROPDMGEXP == "K")] <- 1e3
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "m") | (dmg$CROPDMGEXP == "M")] <- 1e6
dmg$CROPDMGEXP[(dmg$CROPDMGEXP == "b") | (dmg$CROPDMGEXP == "B")] <- 1e9
dmg$CROPDMGEXP <- as.numeric(dmg$CROPDMGEXP)
table(dmg$CROPDMGEXP)
## 
##     10    100   1000  1e+06  1e+09 
##     26      1 281853   1995      9

4.2. Data Process, Part 2

## Convert data to numeric form
dmg$PROPDMG <- dmg$PROPDMG * dmg$PROPDMGEXP
dmg$CROPDMG <- dmg$CROPDMG * dmg$CROPDMGEXP
## Aggregate property and crop damage data
propdmg <- aggregate(PROPDMG ~ EVTYPE, dmg, sum)
cropdmg <- aggregate(CROPDMG ~ EVTYPE, dmg, sum)
totaldmg <- merge(propdmg, cropdmg, by = "EVTYPE")
## Calculate total damage
totaldmg$TOTAL <- totaldmg$PROPDMG + totaldmg$CROPDMG
## Order data by TOTAL descending
totaldmg <- totaldmg[order(-totaldmg$TOTAL), ]
## To display TOTAL top 10
totaldmg <- melt(totaldmg[1:10,], id = c("EVTYPE", "TOTAL"))
## Make it to be displayed in TOTAL descending order
totaldmg$EVTYPE <- factor(totaldmg$EVTYPE,  
                           levels = totaldmg$EVTYPE[order(-totaldmg$TOTAL)])
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Set Magnitude to Billion
totaldmg$TOTAL <- totaldmg$TOTAL / 1e9
totaldmg$value <- totaldmg$value / 1e9

4.3. Result

totaldmg
##               EVTYPE      TOTAL variable       value
## 1              FLOOD 150.319678  PROPDMG 144.6577098
## 2  HURRICANE/TYPHOON  71.913713  PROPDMG  69.3058400
## 3            TORNADO  57.362337  PROPDMG  56.9473824
## 4        STORM SURGE  43.323541  PROPDMG  43.3235360
## 5               HAIL  18.761224  PROPDMG  15.7352696
## 6        FLASH FLOOD  18.243993  PROPDMG  16.8226761
## 7            DROUGHT  15.018672  PROPDMG   1.0461060
## 8          HURRICANE  14.610229  PROPDMG  11.8683190
## 9        RIVER FLOOD  10.148404  PROPDMG   5.1189455
## 10         ICE STORM   8.967042  PROPDMG   3.9449283
## 11             FLOOD 150.319678  CROPDMG   5.6619684
## 12 HURRICANE/TYPHOON  71.913713  CROPDMG   2.6078728
## 13           TORNADO  57.362337  CROPDMG   0.4149547
## 14       STORM SURGE  43.323541  CROPDMG   0.0000050
## 15              HAIL  18.761224  CROPDMG   3.0259547
## 16       FLASH FLOOD  18.243993  CROPDMG   1.4213171
## 17           DROUGHT  15.018672  CROPDMG  13.9725660
## 18         HURRICANE  14.610229  CROPDMG   2.7419100
## 19       RIVER FLOOD  10.148404  CROPDMG   5.0294590
## 20         ICE STORM   8.967042  CROPDMG   5.0221135
ggplot(data = totaldmg, 
       aes(x = EVTYPE, y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    ggtitle("Top 10 Events with Largest Economic Impact") +
    labs(x = "Type of Event", y = "Total Cost (Billion USD)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position=c(0.8, 0.8), legend.title=element_blank()) +
    scale_fill_discrete(labels=c("Property Damage", "Crop Damage"))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated