Synopsis

The National Weather Service storm dataset recorded the types and impact of severe weather events from 1950 - 2011. This analysis explores the storm dataset and answer 2 questions about these weather events:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

1. Set directory, download and load data.

setwd("/Users/yingjiang/Dropbox/Education/Coursera/data_science_spec/data_science_c5/Projects/Project2")
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "stormdata.csv",
              method = "curl")
stormdata <- read.csv("stormdata.csv")

2. Clean the variable EVTYPE.

2.1 Get reference list of “allowed types of storms”.

From the following document:

NATIONAL WEATHER SERVICE INSTRUCTION 10-1605; AUGUST 17, 2007; Operations and Services; Performance, NWSPD 10-16; STORM DATA PREPARATION; Pg 6: Table 1 (Table of allowed event types)

… copied the allowed event types from Table, into Microsoft Word (for Mac 2011, Version 14.1.0(130310)).

Copied the text into Microsoft Excel (for Mac 2011, Version 14.1.0(130310)).

Saved as csv into current folder.

Cleaned the table:

  • removed white space and extraneous letter at the end

  • changed to upper case

eventtype <- read.csv("Eventtype.csv", header = F, stringsAsFactors = F)
eventtype <- substr(eventtype$V1, 1, nchar(eventtype$V1) - 2)
eventtype <- toupper(eventtype)
eventtype
##  [1] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##  [3] "BLIZZARD"                 "COASTAL FLOOD"           
##  [5] "COLD/WIND CHILL"          "DEBRIS FLOW"             
##  [7] "DENSE FOG"                "DENSE SMOKE"             
##  [9] "DROUGHT"                  "DUST DEVIL"              
## [11] "DUST STORM"               "EXCESSIVE HEAT"          
## [13] "EXTREME COLD/WIND CHILL"  "FLASH FLOOD"             
## [15] "FLOOD"                    "FROST/FREEZE"            
## [17] "FUNNEL CLOUD"             "FREEZING FOG"            
## [19] "HAIL"                     "HEAT"                    
## [21] "HEAVY RAIN"               "HEAVY SNOW"              
## [23] "HIGH SURF"                "HIGH WIND"               
## [25] "HURRICANE (TYPHOON)"      "ICE STORM"               
## [27] "LAKE-EFFECT SNOW"         "LAKESHORE FLOOD"         
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASH"            
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"

There are 48 allowed events.

2.2 Clean EVTYPE column

Attached an empty column to stormdata to later store the tidy events.

stormdata1 <- cbind(stormdata, character(length(stormdata$EVTYPE)))
colnames(stormdata1)[ncol(stormdata1)] <- "EVTYPE_TIDY"
stormdata1$EVTYPE <- factor(toupper(stormdata1$EVTYPE))
stormdata1$EVTYPE_TIDY <- character(length(stormdata1$EVTYPE_TIDY))

Populated rows that correspond exactly to the allowed event types.

for(i in 1:length(eventtype)) {
    stormdata1$EVTYPE_TIDY[stormdata1$EVTYPE == eventtype[i]] <- eventtype[i]
}
sum(stormdata1$EVTYPE_TIDY == "")
## [1] 266946

There are still 266946, out of 902297, cases of unclassified EVTYPE, which didn’t correspond to the 48 allowed events.

NOTE: Certain observations contained keywords that corresponded to multiple allowed-events (e.g. “blizzard” and “heavy snow” are classified differently in the allowed-events list, but could appear in the description for the same observation.) This effect was ignored, and the alphabetically latter occurence (in this case, “heavy snow”) was taken to be the tidy event name.

2.3 Clean the unclassfied EVTYPE.

The unclassified EVTYPE were extracted into a separate vector for clarity.

b <- stormdata1$EVTYPE[stormdata1$EVTYPE_TIDY == ""]

Tidied up these event types further, as much as possible, by inspection. The goal is to:

  1. Remove duplicates.

  2. Correct spelling.

  3. Group similar meanings together.

tofind <- c("(.*)WIND(.*)|(.*)GUST(.*)", # into "STRONG WIND"
            "(.*)DRY(.*)|DRIEST|LOW RAIN|BELOW NORMAL PRECI(.*)", #into "DROUGHT"
            "HOT|HIGH TEMPERATURE", # INTO "HEAT"
            "(.*)WARMTH(.*)|(.*)WARM(.*)", # into "HEAT"
            "LAKE(.*)EFFECT", # into "LAKE-EFFECT SNOW"
            "(.*)SNOW(.*)", # into "HEAVY SNOW"
            "(.*)WET(.*)",
            "(.*)RAIN(.*)",
            "(.*)PRECIPITATION(.*)",
            "(.*)PRECIPATATION(.*)",
            "(.*)SHOWER(.*)", # into "HEAVY RAIN"
            "(.*)FREEZE(.*)|(.*)FREEZING(.*)|LOW TEMP(.*)|(.*)ICE(.*)|(.*)THERMIA(.*)|(.*)FROST(.*)", # into "FROST/FREEZE"
            "(.*)DUST(.*)", # into "DUST STORM"
            "AVALAN(.*)", # into "AVALANCHE"
            "(.*)WIND(.*)CHILL(.*)|(.*)WIND(.*)COLD(.*)", # Into "COLD/WIND CHILL"
            "(.*)CHILL(.*)WIND(.*)|(.*)COLD(.*)WIND(.*)", # Into "COLD/WIND CHILL"
            "(.*)FIRE(.*)", # Into "WILDFIRE"
            "COASTAL(.*)", # INTO "COASTAL FLOOD"
            "(.*)FLOOD(.*)|(.*)URBAN(.*)", # into "FLOOD"
            "(.*)FOG(.*)", # into "FREEZING FOG"
            "(.*)FUNNEL(.*)", # into "FUNNEL CLOUD"
            "(.*)SURF(.*)", # into "HIGH SURF"
            "HIGH WAVES|HIGH TIDES|HIGH SEAS|HIGH SWELLS|ROGUE WAVES|ROUGH SEAS", # into "HIGH SURF"
            "HURRICANE|TYPHOON", # into "HURRICANE (TYPHOON)"
            "LIGHTING|LIGNTNING|LIGHTNING(.*)", # into "LIGHTNING"
            "SMOKE", # into "DENSE SMOKE"
            "STORM SURGE", # into "STORM SURGE/TIDE"
            "TORNDAO|TORNADOS|TORNADOES|TORNADO(.*)", # into "TORNADO"
            "(.*)WATER(.*)SPOUT(.*)|WAYTERSPOUT", # into "WATERSPOUT
            "TSTM(.*)|THUDERSTORM|TUNDERSTORM|(.*)THUNDERSTORM(.*)", # into "THUNDERSTORM WIND". Comprises of "THUNDERSTORMS", "THUNDERSTORMW", "THUNDERSTROM"
            "TROPICAL STORM(.*)", # into "TROPICAL STORM"
            "VOLCANIC(.*)", # into "VOLCANIC ASH"
            "(.*)WINTER(.*)|(.*)WINTRY(.*)|(.*)WINTERY(.*)" # into "WINTER WEATHER"
)

toreplace <- c("STRONG WIND",
               "DROUGHT",
               "HEAT", "HEAT",
               "LAKE-EFFECT SNOW",
               "HEAVY SNOW",
               "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN",
               "FROST/FREEZE",
               "DUST STORM",
               "AVALANCHE",
               "COLD/WIND CHILL", "COLD/WIND CHILL",
               "WILDFIRE",
               "COASTAL FLOOD", "FLOOD",
               "FREEZING FOG",
               "FUNNEL CLOUD",
               "HIGH SURF", "HIGH SURF",
               "HURRICANE (TYPHOON)",
               "LIGHTNING",
               "DENSE SMOKE",
               "STORM SURGE/TIDE",
               "TORNADO",
               "WATERSPOUT",
               "THUNDERSTORM WIND",
               "TROPICAL STORM",
               "VOLCANIC ASH",
               "WINTER WEATHER"
)

for(i in 1:length(tofind)) {
    b <- gsub(tofind[i], toreplace[i], b)
}

Created another empty vector that correspond to the unclassified EVTYPE (all empty entries from EVTYPE_TIDY), and filled it with the corresponding allowed event keywords.

EVTYPE_TIDY_unfilled <- stormdata1$EVTYPE_TIDY[stormdata1$EVTYPE_TIDY == ""]
for(i in 1:length(eventtype)) {
    EVTYPE_TIDY_unfilled[b == eventtype[i]] <- eventtype[i]    
}
stormdata1$EVTYPE_TIDY[which(stormdata1$EVTYPE_TIDY == "")] <- EVTYPE_TIDY_unfilled
levels(factor(stormdata1$EVTYPE_TIDY))
##  [1] ""                         "ASTRONOMICAL LOW TIDE"   
##  [3] "AVALANCHE"                "BLIZZARD"                
##  [5] "COASTAL FLOOD"            "COLD/WIND CHILL"         
##  [7] "DENSE FOG"                "DENSE SMOKE"             
##  [9] "DROUGHT"                  "DUST DEVIL"              
## [11] "DUST STORM"               "EXCESSIVE HEAT"          
## [13] "EXTREME COLD/WIND CHILL"  "FLASH FLOOD"             
## [15] "FLOOD"                    "FREEZING FOG"            
## [17] "FROST/FREEZE"             "FUNNEL CLOUD"            
## [19] "HAIL"                     "HEAT"                    
## [21] "HEAVY RAIN"               "HEAVY SNOW"              
## [23] "HIGH SURF"                "HIGH WIND"               
## [25] "HURRICANE (TYPHOON)"      "ICE STORM"               
## [27] "LAKE-EFFECT SNOW"         "LAKESHORE FLOOD"         
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASH"            
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"

After this operation, there are still 2848 unclassified observations. Forget about these, as there are entries that did not correspond to identifiable disaster events (Reports, summaries, regular phenomena).

3. Compute total damages to population health and property.

Subsetted data to contain only allowed events (removed empty EVTYPE_TIDY observations).

stormdata2 <- stormdata1[stormdata1$EVTYPE_TIDY != "", ]
stormdata2$EVTYPE_TIDY <- factor(stormdata2$EVTYPE_TIDY)

Health damage: Taking population health damage to be fatalities and injuries, calculated fatalities, injuries and their sums for each type of event.

fatalsum <- as.numeric(by(stormdata2$FATALITIES, stormdata2$EVTYPE_TIDY, sum))
injsum <- as.numeric(by(stormdata2$INJURIES, stormdata2$EVTYPE_TIDY, sum))
health_dmg <- fatalsum + injsum

Property damage: Absolute property damage dollar values were given by PROPDMG * PROPDMGEXP, where PROPDMGEXP give exponents of 10^3, 10^6 and 10^9 for “K”, “M” and “B” respectively. Crop damage was not included, for CROPDMGEXP values are missing.

property_dmg <- numeric(length(stormdata2$PROPDMG))

for(i in 1:length(stormdata2$PROPDMG)) {
    if(stormdata2$PROPDMGEXP[i] == "") {
        property_dmg[i] <- stormdata2$PROPDMG[i] * 1
    }
    if(stormdata2$PROPDMGEXP[i] == "K") {
        property_dmg[i] <- stormdata2$PROPDMG[i] * 1000
    }
    if(stormdata2$PROPDMGEXP[i] == "") {
        property_dmg[i] <- stormdata2$PROPDMG[i] * 1000000
    }
    if(stormdata2$PROPDMGEXP[i] == "") {
        property_dmg[i] <- stormdata2$PROPDMG[i] * 1000000000
    }
}

stormdata2 <- cbind(stormdata2, length(property_dmg))
colnames(stormdata2)[ncol(stormdata2)] <- "PROPDMG_FULL"
stormdata2$PROPDMG_FULL <- property_dmg
propsum <- as.numeric(by(stormdata2$PROPDMG_FULL, stormdata2$EVTYPE_TIDY, sum))

Combined health and property damage into a single dataframe.

est_dmg <- as.data.frame(cbind(levels(stormdata2$EVTYPE_TIDY),
                               fatalsum, injsum, health_dmg,
                               propsum))
colnames(est_dmg) <- c("Storm_type", "Fatalities", "Injuries", "Total_health_damage", "Total_property_dmg")
est_dmg$Storm_type <- factor(tolower(est_dmg$Storm_type)) # Note: can't plot "character" as x-axis
est_dmg$Fatalities <- as.numeric(as.character(est_dmg$Fatalities))
est_dmg$Injuries <- as.numeric(as.character(est_dmg$Injuries))
est_dmg$Total_health_damage <- as.numeric(as.character(est_dmg$Total_health_damage))
est_dmg$Total_property_dmg <- as.numeric(as.character(est_dmg$Total_property_dmg))
head(est_dmg)
##              Storm_type Fatalities Injuries Total_health_damage
## 1 astronomical low tide          0        0                   0
## 2             avalanche        225      170                 395
## 3              blizzard        101      805                 906
## 4         coastal flood          3        2                   5
## 5       cold/wind chill         95       12                 107
## 6             dense fog         18      342                 360
##   Total_property_dmg
## 1             320000
## 2            1621800
## 3           24683950
## 4           13290560
## 5            1990000
## 6            8224000

Results

1. Analyze damages to human health.

Plotted Fatalities, Injuries and their sums on the same axes.

par(las = 2, mar = c(12, 5, 4, 2)) # "las = 2" rotates x-axis 90o.
with(est_dmg, plot.default(Storm_type, Fatalities, 
                   type = "p",
                   pch = 15,
                   xlab = "",
                   ylab = "Human health damages",
                   main = "Storm damages to human health")) # Make first plot

with(est_dmg, points(Storm_type, Injuries,
                     type = "p",
                     pch = 16,
                     col = "red")) # Add plot

with(est_dmg, points(Storm_type, Total_health_damage,
                     type = "p",
                     pch = 17,
                     col = "blue")) # Add plot

legend("topleft",
       col = c("black", "red", "blue"),
       lty = 1,
       pch = c(15, 16, 17),
       cex = 0.8,
       legend = c("Fatalities", "Injuries", "Total")) # Annotate

axis(side = 1, at = 1:47, labels = as.character(est_dmg$Storm_type)) # Label tickmarks on x-axis

2. Results from exploring the health damage data:

  1. Higher fatalities roughly correspond to higher injuries.

  2. Adding fatalities and injuries number is a good estimator for considering total health damage.

  3. Make several categories for damages to human health:

  1. events that cause < 1000 total fatalities and injuries:
est_dmg$Storm_type[est_dmg$Total_health_damage <= 1000]
##  [1] astronomical low tide    avalanche               
##  [3] blizzard                 coastal flood           
##  [5] cold/wind chill          dense fog               
##  [7] dense smoke              drought                 
##  [9] dust devil               dust storm              
## [11] extreme cold/wind chill  freezing fog            
## [13] frost/freeze             funnel cloud            
## [15] heavy rain               high surf               
## [17] hurricane (typhoon)      lake-effect snow        
## [19] lakeshore flood          marine hail             
## [21] marine high wind         marine strong wind      
## [23] marine thunderstorm wind rip current             
## [25] seiche                   sleet                   
## [27] storm surge/tide         tropical depression     
## [29] tropical storm           tsunami                 
## [31] volcanic ash             waterspout              
## [33] winter weather          
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather
  1. events that cause between 1000 and 5000 total fatalities and injuries:
est_dmg$Storm_type[est_dmg$Total_health_damage > 1000 &
                   est_dmg$Total_health_damage < 5000]
## [1] flash flood       hail              heat              heavy snow       
## [5] high wind         ice storm         thunderstorm wind wildfire         
## [9] winter storm     
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather
  1. events that cause > 5000 total fatalities and injuries:
est_dmg$Storm_type[est_dmg$Total_health_damage >= 5000]
## [1] excessive heat flood          lightning      strong wind   
## [5] tornado       
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather

The type of storm that causes the most amount of fatalities and injuries is:

est_dmg$Storm_type[est_dmg$Total_health_damage == max(est_dmg$Total_health_damage)]
## [1] tornado
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather

3. Analyze damages to property

Note that plotting property damage on a linear scale does not create an effective visual representation of the distribution of property damage for different event types. This is because the difference between property damage caused by different event types are of several orders of magnitude. Therefore used a logarithmic scale on the y-axis.

par(mar = c(2, 6, 4, 2))

with(est_dmg, plot.default(xy.coords(Storm_type, Total_property_dmg/1000000),
                           xlim = c(1,60),
                           ylim = c(.001, max(Total_property_dmg/1000000)),
                           log = "y",
                           pch = 15,
                           xlab = "", xaxt = "n", # choose not to label the tickmarks here.
                           ylab = "",
                           main = "Storm damages to properties"))
## Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0 omitted
## from logarithmic plot
text(est_dmg$Storm_type, est_dmg$Total_property_dmg/1000000, est_dmg$Storm_type,
     cex=0.8, pos = 4, col="blue") # Labelled each point in the plot to effectively visualize which event caused more damages.

title(ylab = "Property damage (million $)",
      mgp = c(4, 1, 0)) # Labelled y-axis

4. Results from exploring the property damage data:

Again several categories could be created for damages to property:

  1. events that cause < 0.5 million dollars of damage:
est_dmg$Storm_type[log10(est_dmg$Total_property_dmg) <= 5]
## [1] dense smoke     lakeshore flood marine hail     rip current    
## [5] sleet          
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather
  1. events that cause between 0.5 million and 1 billion dollars of damage:
est_dmg$Storm_type[log10(est_dmg$Total_property_dmg) > 5 &
                   log10(est_dmg$Total_property_dmg) < 9]
##  [1] astronomical low tide    avalanche               
##  [3] blizzard                 coastal flood           
##  [5] cold/wind chill          dense fog               
##  [7] drought                  dust devil              
##  [9] dust storm               excessive heat          
## [11] extreme cold/wind chill  freezing fog            
## [13] frost/freeze             funnel cloud            
## [15] heat                     heavy rain              
## [17] high surf                high wind               
## [19] hurricane (typhoon)      ice storm               
## [21] lake-effect snow         marine high wind        
## [23] marine strong wind       marine thunderstorm wind
## [25] seiche                   storm surge/tide        
## [27] tropical depression      tropical storm          
## [29] tsunami                  volcanic ash            
## [31] waterspout               wildfire                
## [33] winter storm             winter weather          
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather
  1. events that cause > 1 billion dollars of damage:
est_dmg$Storm_type[log10(est_dmg$Total_property_dmg) >= 9]
## [1] flash flood       flood             hail              heavy snow       
## [5] lightning         strong wind       thunderstorm wind tornado          
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather

The type of storm that causes the most amount of property damage is:

est_dmg$Storm_type[log10(est_dmg$Total_property_dmg) == max(log10(est_dmg$Total_property_dmg))]
## [1] strong wind
## 47 Levels: astronomical low tide avalanche blizzard ... winter weather

Conclusion

To answer the questions,

  1. Across the United States, tornadoes and winds, lightning, floods and excessive heat cause the most harm to population health, in terms of generating the highest number of fatalities and injuries.

  2. Across the United States, tornadoes and winds, lightning, floods, snow and hail have the greatest economic consequences, in terms of generating the great amount of property damage in dollars.