Most Harmful Severe Weather Events: Tornado and Flood

Data Processing

Software Environment

sessionInfo()

## R version 3.2.2 (2015-08-14)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
## 
## locale:
## [1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 
## [2] LC_CTYPE=Chinese (Traditional)_Taiwan.950   
## [3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
## [4] LC_NUMERIC=C                                
## [5] LC_TIME=Chinese (Traditional)_Taiwan.950    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.2     tools_3.2.2     htmltools_0.2.6
##  [5] yaml_2.1.13     stringi_0.5-5   rmarkdown_0.8   knitr_1.11     
##  [9] stringr_1.0.0   digest_0.6.8    evaluate_0.7.2

Loading Data

The NOAA storm database can be downloaded from the URL used in the code below.

Note that the file must be read with ASCII encoding; otherwise some characters are interpreted as EOF and the data is read incompletely.

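library(R.utils)  # for bunzip2()

# Download the compressed file once; mode = "wb" keeps the binary intact on Windows
if (!file.exists("repdata_data_StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "repdata_data_StormData.csv.bz2", mode = "wb")
}
# Decompress once, keeping the .bz2 file
if (!file.exists("repdata_data_StormData.csv")) {
    bunzip2("repdata_data_StormData.csv.bz2", remove = FALSE)
}
# stringsAsFactors = TRUE is the default in R 3.2.2 but not in R >= 4.0;
# the levels() calls below rely on factor columns
storm <- read.csv("repdata_data_StormData.csv", fileEncoding = "ascii",
                  stringsAsFactors = TRUE)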

Since we only care about harm to population health and economic damage, only the following columns are kept; all other columns are dropped.

Column Name	Type	Meaning
EVTYPE	Factor	Event type
FATALITIES	num	Number of fatalities
INJURIES	num	Number of injuries
PROPDMG	num	Property damage
PROPDMGEXP	Factor	Property damage exponent
CROPDMG	num	Crop damage
CROPDMGEXP	Factor	Crop damage exponent

library(dplyr)  # for select, arrange, mutate, summarise

storm <- storm %>% select(EVTYPE, FATALITIES, INJURIES,
                        PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Preprocessing Data

The meanings of PROPDMGEXP and CROPDMGEXP are explained in the article “How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP”.

PROPDMGEXP or CROPDMGEXP	Multiplying factor
-, ?, +, empty	0
digits 0-8	10
H,h	100
K,k	1,000
M,m	1,000,000
B,b	1,000,000,000
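The table above can also be read as a lookup rule. The following is a minimal base R sketch of that rule, not the method used in this report; the function name exp_to_multiplier is hypothetical, and returning NA for codes outside the table is an assumption made for illustration.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to its multiplying factor, following the table above.
exp_to_multiplier <- function(code) {
    code <- toupper(as.character(code))          # treat "k" and "K" alike
    mult <- rep(NA_real_, length(code))          # NA marks codes not in the table
    mult[code %in% c("", "-", "?", "+")] <- 0    # empty or symbol: factor 0
    mult[grepl("^[0-8]$", code)] <- 10           # single digit 0-8: factor 10
    mult[code %in% "H"] <- 1e2
    mult[code %in% "K"] <- 1e3
    mult[code %in% "M"] <- 1e6
    mult[code %in% "B"] <- 1e9
    mult
}

multipliers <- exp_to_multiplier(c("", "+", "5", "h", "K", "m", "B"))
```

Leaving unknown codes as NA, rather than silently mapping them to 0, makes unexpected values easy to spot before any damage totals are computed.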

The original levels of PROPDMGEXP are:

levels(storm$PROPDMGEXP)

##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"

The modified levels of PROPDMGEXP are:

tmpLvl <- levels(storm$PROPDMGEXP)
tmpLvl <- sub("[0-8]", "10", tmpLvl)
tmpLvl <- sub("h|H", "100", tmpLvl)
tmpLvl <- sub("k|K", "1000", tmpLvl)
tmpLvl <- sub("m|M", "1000000", tmpLvl)
tmpLvl <- sub("b|B", "1000000000", tmpLvl)
tmpLvl <- sub("^$|-|\\?|\\+", "0", tmpLvl)
levels(storm$PROPDMGEXP) <- tmpLvl
levels(storm$PROPDMGEXP)

## [1] "0"          "10"         "1000000000" "100"        "1000"      
## [6] "1000000"

The original levels of CROPDMGEXP are:

levels(storm$CROPDMGEXP)

## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

The modified levels of CROPDMGEXP are:

tmpLvl <- levels(storm$CROPDMGEXP)
tmpLvl <- sub("[0-8]", "10", tmpLvl)
tmpLvl <- sub("h|H", "100", tmpLvl)
tmpLvl <- sub("k|K", "1000", tmpLvl)
tmpLvl <- sub("m|M", "1000000", tmpLvl)
tmpLvl <- sub("b|B", "1000000000", tmpLvl)
tmpLvl <- sub("^$|-|\\?|\\+", "0", tmpLvl)
levels(storm$CROPDMGEXP) <- tmpLvl
levels(storm$CROPDMGEXP)

## [1] "0"          "10"         "1000000000" "1000"       "1000000"

Multiply each damage value by its multiplying factor to obtain the damage amount, then drop the exponent columns.

storm <- storm %>% mutate(
    PROPDMG = PROPDMG * as.numeric(as.character(PROPDMGEXP)),
    CROPDMG = CROPDMG * as.numeric(as.character(CROPDMGEXP))) %>%
    select(-PROPDMGEXP, -CROPDMGEXP)
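The conversion as.numeric(as.character(...)) in the step above matters because calling as.numeric directly on a factor returns its internal level codes rather than the numbers spelled out in the labels. The toy vectors below are made up for illustration and are not taken from the storm data.

```r
# Toy multiplier factor; levels sort as "0", "1000", "1000000"
exp_factor <- factor(c("1000", "0", "1000000", "1000"))

codes  <- as.numeric(exp_factor)                 # level codes: 2 1 3 2
values <- as.numeric(as.character(exp_factor))   # 1000 0 1000000 1000

# Toy damage figures multiplied by their factors
damage <- c(25, 3, 1.5, 2)
total  <- damage * values                        # 25000 0 1500000 2000
```

Multiplying by codes instead of values would run without error but give meaningless totals, which is why the intermediate as.character step is worth keeping.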

Summarize Data

For each event type, sum the fatalities, injuries, property damage, and crop damage. Then use tidyr::gather to reshape each result into long format. ans1 holds the fatality and injury counts for the ten event types with the most fatalities and is used to answer the first question. ans2 holds the property and crop damage for the ten event types with the most property damage and is used to answer the second question.

library(tidyr)  # for gather()

# Totals per event type
storm2 <- storm %>% group_by(EVTYPE) %>% summarise(
    TOTAL_FATAL = sum(FATALITIES),
    TOTAL_INJURY = sum(INJURIES),
    TOTAL_PROP = sum(PROPDMG),
    TOTAL_CROP = sum(CROPDMG))

# Top 10 event types by fatalities, in long format
ans1 <- storm2 %>%
        arrange(desc(TOTAL_FATAL)) %>%
        select(-TOTAL_PROP, -TOTAL_CROP) %>% slice(1:10) %>%
        gather(TYPE, VALUE, TOTAL_FATAL:TOTAL_INJURY)

# Top 10 event types by property damage, in long format
ans2 <- storm2 %>%
        arrange(desc(TOTAL_PROP)) %>%
        select(-TOTAL_FATAL, -TOTAL_INJURY) %>% slice(1:10) %>%
        gather(TYPE, VALUE, TOTAL_PROP:TOTAL_CROP)
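To make the reshaping step concrete, here is a small sketch of the same summarise, arrange, slice, and gather pipeline on a made-up data frame. The data frame toy and its values are invented for illustration, and only the top two event types are kept instead of ten.

```r
library(dplyr)
library(tidyr)

# Invented toy data, not from the NOAA database
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
    FATALITIES = c(3, 1, 2, 0),
    INJURIES   = c(10, 0, 5, 1),
    stringsAsFactors = FALSE
)

toy_long <- toy %>%
    group_by(EVTYPE) %>%
    summarise(TOTAL_FATAL = sum(FATALITIES),
              TOTAL_INJURY = sum(INJURIES)) %>%
    arrange(desc(TOTAL_FATAL)) %>%     # TORNADO, FLOOD, HAIL
    slice(1:2) %>%                     # keep the top two event types
    gather(TYPE, VALUE, TOTAL_FATAL:TOTAL_INJURY)
```

The result has one row per event type and measure (four rows here), which is the shape that lets a plotting function treat TYPE as a grouping variable.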

Ping Chu Hung

2015-09-13

Synopsis

Results

Across the United States, which types of events are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?