Synopsis

We analyze the Storm Data from the National Weather Service to determine how severe the consequences of each type of storm are. We use scripts to label the records with the proper event type, compute the total damage, and identify the storm types most harmful to population health and to the economy, respectively.

Data Processing

We download the data from the link listed on the course project page,

fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(destfile)){
    download.file(fileurl, destfile, mode = "wb")
}

and read the data into R.

dat <- read.csv("StormData.csv.bz2")

The event type in the original data is quite dirty and noisy. To better assess the consequences of different event types, we need to clean up the data and assign the events to their proper types.

According to the National Weather Service, only the types listed in Table 2.1.1 of the Storm Data Documentation are allowed. We copied the proper event types into ‘eventTypes.txt’. Here are the proper event types:

properTypes <- toupper(readLines("eventTypes.txt"))
print(properTypes)
##  [1] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##  [3] "BLIZZARD"                 "COASTAL FLOOD"           
##  [5] "COLD/WIND CHILL"          "DEBRIS FLOW"             
##  [7] "DENSE FOG"                "DENSE SMOKE"             
##  [9] "DROUGHT"                  "DUST DEVIL"              
## [11] "DUST STORM"               "EXCESSIVE HEAT"          
## [13] "EXTREME COLD/WIND CHILL"  "FLASH FLOOD"             
## [15] "FLOOD"                    "FROST/FREEZE"            
## [17] "FUNNEL CLOUD"             "FREEZING FOG"            
## [19] "HAIL"                     "HEAT"                    
## [21] "HEAVY RAIN"               "HEAVY SNOW"              
## [23] "HIGH SURF"                "HIGH WIND"               
## [25] "HURRICANE (TYPHOON)"      "ICE STORM"               
## [27] "LAKE-EFFECT SNOW"         "LAKESHORE FLOOD"         
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASH"            
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"

We first try to match the events where dat$EVTYPE contains the proper type name.

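dat$event.type <- NA_character_
dat$UPEVTYPE <- toupper(dat$EVTYPE)
for (typename in properTypes){
    # fixed = TRUE matches the name literally, so the parentheses in
    # "HURRICANE (TYPHOON)" are not read as a regular-expression group.
    # Later types overwrite earlier ones when both match the same record.
    dat$event.type[grep(typename, dat$UPEVTYPE, fixed = TRUE)] <- typename
}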
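
To illustrate how this substring matching behaves, here is a minimal sketch on made-up values. The names toyTypes, toyEvents and assigned are hypothetical and not part of the analysis, and fixed = TRUE is our assumption so that type names are matched literally. It shows that a record matching two type names keeps the one that comes later in the list, and that a label containing no proper type name stays NA.

```r
# Toy stand-ins for properTypes and dat$UPEVTYPE (illustrative values only).
toyTypes  <- c("FLASH FLOOD", "FLOOD", "HURRICANE (TYPHOON)")
toyEvents <- c("FLASH FLOOD", "RIVER FLOOD", "HURRICANE (TYPHOON)", "TSTM WIND")

assigned <- rep(NA_character_, length(toyEvents))
for (typename in toyTypes) {
    # fixed = TRUE treats the type name as a literal string, not a regular expression
    assigned[grep(typename, toyEvents, fixed = TRUE)] <- typename
}

# "FLASH FLOOD" ends up labelled "FLOOD" because "FLOOD" is processed later;
# "TSTM WIND" contains no proper type name and stays NA.
print(data.frame(event = toyEvents, assigned = assigned))
```

Both effects are the reason a second, hand-built mapping pass is useful.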

By inspecting the records that were not yet assigned a proper type, and the records whose proper event type was overwritten by another, we defined a table that associates EVTYPE patterns with their proper names and stored it in additionalTypes.txt.

additionalTypes = read.table("additionalTypes.txt", sep = ',')
additionalTypes$V2 <- as.character(additionalTypes$V2)
print(additionalTypes)
##                          V1                       V2
## 1                 TSTM WIND        THUNDERSTORM WIND
## 2                 HURRICANE      HURRICANE (TYPHOON)
## 3                   TYPHOON      HURRICANE (TYPHOON)
## 4                      COLD          COLD/WIND CHILL
## 5               FOREST FIRE                 WILDFIRE
## 6                 LANDSLIDE              DEBRIS FLOW
## 7          MARINE TSTM WIND MARINE THUNDERSTORM WIND
## 8                 WINDCHILL          COLD/WIND CHILL
## 9                       FLD                    FLOOD
## 10                    FROST             FROST/FREEZE
## 11                   FREEZE             FROST/FREEZE
## 12                MUDSLIDES              DEBRIS FLOW
## 13 MARINE THUNDERSTORM WIND MARINE THUNDERSTORM WIND
## 14         MARINE TSTM WIND MARINE THUNDERSTORM WIND
## 15              FLASH FLOOD              FLASH FLOOD
## 16             EXTREME COLD  EXTREME COLD/WIND CHILL
for (i in seq_along(additionalTypes$V1)){
    # Rows are applied in order, so a later row overwrites an earlier match
    dat$event.type[grep(as.character(additionalTypes$V1[i]), dat$UPEVTYPE)] <- additionalTypes$V2[i]
}

After these operations, the fraction of records that are not yet assigned a proper type is:

mean(is.na(dat$event.type))
## [1] 0.005364087

Only about 0.5% of the records remain unassigned, so we exclude them from the summaries below and assume they do not contribute significantly to our results.
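
One way to check which labels are still unassigned is to tabulate them by frequency. The following is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical; only the column names UPEVTYPE and event.type come from the analysis above.

```r
# Toy stand-in for dat with the two columns used above (values are made up).
toy <- data.frame(
    UPEVTYPE   = c("LABEL A", "LABEL A", "LABEL B", "HAIL", "LABEL C"),
    event.type = c(NA, NA, NA, "HAIL", NA),
    stringsAsFactors = FALSE
)

# Labels that have no proper type yet, most frequent first
unmatched <- toy$UPEVTYPE[is.na(toy$event.type)]
counts <- sort(table(unmatched), decreasing = TRUE)
print(counts)

# Share of records that are still unassigned
unassigned.share <- mean(is.na(toy$event.type))
print(unassigned.share)
```

Running the same two steps on dat would show which raw labels account for most of the remaining records.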

We then calculate the damage to population health (fatalities and injuries) for each event type.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
# Total fatalities and injuries per event type, largest combined count first
mytab <- dat %>%
    filter(!is.na(event.type)) %>%
    group_by(event.type) %>%
    summarise(fatalities = sum(FATALITIES), injuries = sum(INJURIES)) %>%
    arrange(desc(fatalities + injuries))
top10 <- mytab[1:10,]
top10$event.type = factor(top10$event.type, top10$event.type)
top10 <- gather(top10, variable, value, - event.type)
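
The last two steps prepare the table for a stacked bar chart: the factor levels fix the order of the bars, and gather() reshapes the table to one row per event type and measure. Here is a minimal sketch on made-up values (the names toy and long are hypothetical):

```r
library(tidyr)

# Toy summary table in the same shape as the per-type summary (values are made up).
toy <- data.frame(event.type = c("A", "B"),
                  fatalities = c(10, 2),
                  injuries   = c(100, 30),
                  stringsAsFactors = FALSE)

# Setting the factor levels in the current row order keeps the bars sorted in the plot.
toy$event.type <- factor(toy$event.type, levels = toy$event.type)

# One row per (event type, measure) pair, with the measure name in `variable`.
long <- gather(toy, variable, value, -event.type)
print(long)
```

The long table has a single value column, which can be mapped to the bar height while variable is mapped to the fill colour.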

We also calculate the economic consequences (PROPDMG + CROPDMG) for each event type.

# Total property and crop damage per event type, largest combined value first
mytab2 <- dat %>%
    filter(!is.na(event.type)) %>%
    group_by(event.type) %>%
    summarise(propdmg = sum(PROPDMG), cropdmg = sum(CROPDMG)) %>%
    arrange(desc(propdmg + cropdmg))
eco10 <- mytab2[1:10,]
eco10$event.type = factor(eco10$event.type, eco10$event.type)
eco10 <- gather(eco10, variable, value, - event.type)
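
The sums above use the PROPDMG and CROPDMG values as recorded. The Storm Data file also has PROPDMGEXP and CROPDMGEXP columns that give the magnitude of each figure (for example K for thousands), which this analysis does not use. If dollar amounts were wanted, a minimal sketch could look like the following. The helper scale_damage is hypothetical, and treating every code other than K, M and B as a multiplier of 1 is our assumption.

```r
# Hypothetical helper: scale a damage figure by its magnitude code.
# Assumption: only K, M and B are interpreted; any other code leaves the value unscaled.
scale_damage <- function(value, exp_code) {
    multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
    m <- unname(multiplier[toupper(as.character(exp_code))])
    m[is.na(m)] <- 1
    value * m
}

# Toy records (values are made up)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)
toy$propdmg.usd <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

How to handle the remaining codes is a judgment call that would need to be checked against the Storm Data Documentation.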

Results

Here we plot the top 10 most dangerous storm types for population health.

library(ggplot2)
g <- ggplot(top10, aes(x = event.type, y = value, fill = variable)) +
    geom_bar(stat = "identity", position = "stack") +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    labs(x = "Storm Type", y = "Fatalities and Injuries", title = "Damage to Population Health by Storm Event Type")
print(g)

The figure shows that tornadoes are by far the most dangerous to human health, with the highest combined injury and fatality counts.

Here we plot the 10 costliest storm types.

library(ggplot2)
g2 <- ggplot(eco10, aes(x = event.type, y = value, fill = variable)) +
    geom_bar(stat = "identity", position = "stack") +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    labs(x = "Storm Type", y = "Economic Damage", title = "Economic Consequences by Storm Event Type")
print(g2)

Here we see that tornadoes also cause the most economic damage, with thunderstorm winds and flash floods following closely behind. Hail causes the most crop damage.

Contact me

If you find any mistakes, typos, or grammar issues, feel free to leave a comment!