Impact of Severe Weather Events on Public Health and Economy of the United States

Amy Jiang
November 21, 2014

Problem Statement

The goal of this assignment is to explore the NOAA Storm Database and answer two basic questions about severe weather events: which types of events are most harmful to public health, and which have the greatest economic consequences. The code for the entire analysis is shown.

Basic settings

echo = TRUE  # Always make code visible
library(ggplot2)
library(plyr)


Data Processing

The NOAA storm data file is downloaded from the internet and unzipped into the Coursera folder on the Desktop.

library(R.utils)  # provides bunzip2()
setwd("~/Desktop/Coursera/")
cache = TRUE
# Download and decompress the data only if the archive is not already present
if (!file.exists("stormData.csv.bz2")) {
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "stormData.csv.bz2")
    bunzip2("stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}
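
As a side note, base R's `read.csv` can read a bzip2-compressed file directly, so the separate decompression step is optional. The following is a minimal sketch of that behavior on a small temporary file; the file and its two columns are made up for illustration and are not the NOAA data.

```r
# Hypothetical toy data standing in for the storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)

# Write the toy data to a bzip2-compressed CSV in a temporary location
toy.path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy.path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression when reading
toy.read <- read.csv(toy.path, stringsAsFactors = FALSE)
dim(toy.read)
```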

The dimensions and content of the NOAA data are examined.

data.storm <- read.csv("~/Desktop/Coursera/stormData.csv", sep = ",")
dim(data.storm)
## [1] 902297     37
head(data.storm, n=3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3

A histogram of storm events by year shows that the storm data are more complete in recent years.

# Only the year of the event date is needed, so extract it as a number
data.storm$year <- as.numeric(format(as.Date(data.storm$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
# A quick histogram to see how data was distributed across the years
hist(data.storm$year, xlab= "Year", breaks = 30, main = "Distribution of Weather Events Data over Years", col="steel blue")

The histogram above shows that the number of weather events recorded since 1995 is almost double that of earlier years. We believe this increase is due to advances in technology, which have enabled scientists to record events that they could not capture before. Since this study focuses on the total impact of weather events over the years, we have chosen not to filter the data based on this observation.
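
The comparison of recording volume before and after 1995 can be checked with a simple tabulation. The sketch below uses a handful of dates in the same format as `BGN_DATE`; the last three dates are made up, and the names `toy.dates`, `toy.year` and `toy.period` are illustrative assumptions, not part of the original analysis.

```r
# The first two dates appear in the data preview above; the rest are invented
toy.dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
               "6/1/1996 0:00:00", "7/4/2003 0:00:00", "8/9/2011 0:00:00")

# Same year extraction as in the analysis
toy.year <- as.numeric(format(as.Date(toy.dates, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Label each event by period and count the events in each period
toy.period <- ifelse(toy.year >= 1995, "1995 and later", "before 1995")
period.counts <- table(toy.period)
period.counts
```

Applied to `data.storm$year`, the same tabulation would give the actual counts behind the histogram.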

Study of Severe Weather Events’ Impact on Public Health

We are interested in the top 15 types of weather events that resulted in fatalities and injuries.

Top 15 types of weather events with the highest total fatalities:

# Subtotal of fatalities by event type
data.storm.fatality <- aggregate(data.storm[, "FATALITIES"], by = list(data.storm$EVTYPE), FUN = "sum", na.rm=T)
# Sorting the data by numbers of fatalities in descending order
data.storm.fatality  <- arrange(data.storm.fatality, data.storm.fatality[,2], decreasing=T)
# Subsetting the top 15
data.storm.fatality <- data.storm.fatality[1:15,]
# Adding column headers
colnames(data.storm.fatality)  <- c("Event.Type", "Fatalities")
# Setting levels of Event.Type
data.storm.fatality[,"Event.Type"]  <- factor(data.storm.fatality[,"Event.Type"], levels = data.storm.fatality[,"Event.Type"])

# Show the top 15 fatality data
data.storm.fatality
##           Event.Type Fatalities
## 1            TORNADO       5633
## 2     EXCESSIVE HEAT       1903
## 3        FLASH FLOOD        978
## 4               HEAT        937
## 5          LIGHTNING        816
## 6          TSTM WIND        504
## 7              FLOOD        470
## 8        RIP CURRENT        368
## 9          HIGH WIND        248
## 10         AVALANCHE        224
## 11      WINTER STORM        206
## 12      RIP CURRENTS        204
## 13         HEAT WAVE        172
## 14      EXTREME COLD        160
## 15 THUNDERSTORM WIND        133
# Plotting the figure
ggplot(data.storm.fatality, aes(Event.Type, Fatalities)) + geom_bar(stat = "identity", color="steel blue", fill="steel blue", width = 0.7) + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + labs(x="Event Type", y="Total No. of Fatalities", title="Total Fatalities by Severe Weather\n Events in the U.S.\n from 1950 - 2011\n")

Top 15 types of weather events with the highest total injuries:

# Subtotal of injuries by event type
data.storm.injury <- aggregate(data.storm[, "INJURIES"], by = list(data.storm$EVTYPE), FUN = "sum", na.rm=T)
# Sorting the data by numbers of injuries in descending order
data.storm.injury  <- arrange(data.storm.injury, data.storm.injury[,2], decreasing=T)
# Subsetting the top 15
data.storm.injury <- data.storm.injury[1:15,]
# Adding column headers
colnames(data.storm.injury)  <- c("Event.Type", "Injuries")
# Setting levels of Event.Type
data.storm.injury[,"Event.Type"]  <- factor(data.storm.injury[,"Event.Type"], levels = data.storm.injury[,"Event.Type"])

# Show the top 15 injury data
data.storm.injury
##           Event.Type Injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
## 11      WINTER STORM     1321
## 12 HURRICANE/TYPHOON     1275
## 13         HIGH WIND     1137
## 14        HEAVY SNOW     1021
## 15          WILDFIRE      911
# Plotting the figure
ggplot(data.storm.injury, aes(Event.Type, Injuries)) + geom_bar(stat = "identity", color="steel blue", fill="steel blue", width = 0.7) + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + labs(x="Event Type", y="Total No. of Injuries", title="Total Injuries by Severe Weather\n Events in the U.S.\n from 1950 - 2011\n")
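
The fatality and injury tables follow the same steps: sum a column by event type, sort in descending order, and keep the first rows. A minimal sketch of those steps as one function is shown below. The function name `top.events` and the toy data frame are hypothetical, and the sketch uses base R `order` rather than the `arrange` call from `plyr` used above.

```r
# Hypothetical helper: total of one column per event type, top n rows
top.events <- function(data, value.col, n = 15) {
    totals <- aggregate(data[[value.col]], by = list(data$EVTYPE), FUN = sum, na.rm = TRUE)
    colnames(totals) <- c("Event.Type", value.col)
    totals <- totals[order(totals[[value.col]], decreasing = TRUE), ]
    totals <- head(totals, n)
    rownames(totals) <- NULL
    # Fix the factor levels in sorted order so a bar plot keeps the ranking on the x axis
    totals$Event.Type <- factor(totals$Event.Type, levels = totals$Event.Type)
    totals
}

# Made-up records, including a missing value that na.rm = TRUE drops
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(5, 2, 3, 1, NA))
top2 <- top.events(toy, "FATALITIES", n = 2)
top2
```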


Study of Severe Weather Events’ Impact on Economy

We are interested in the top 15 types of weather events that resulted in losses from property damage and crop damage.

Top 15 types of weather events that have caused the highest property damage:

# Remove unnecessary columns from data
data.storm.property.damage <- subset(data.storm, select = c("EVTYPE", "PROPDMG", "PROPDMGEXP"))
# Convert the damage amounts to dollars using the exponent codes (H, K, M, B)
data.storm.property.damage <- mutate(data.storm.property.damage, 
                      PROPDMGTTL = ifelse(PROPDMGEXP=="k" | PROPDMGEXP=="K", 
                                          PROPDMG * 1000, 
                                          ifelse(PROPDMGEXP=="m" | PROPDMGEXP=="M",
                                                 PROPDMG * 1000000,
                                                 ifelse(PROPDMGEXP=="b" | PROPDMGEXP=="B",
                                                        PROPDMG * 1000000000,
                                                        ifelse(PROPDMGEXP=="h" | PROPDMGEXP=="H",
                                                                PROPDMG * 100, PROPDMG)))))
# Calculating total by event types
data.storm.property.damage <- aggregate(data.storm.property.damage$PROPDMGTTL, by = list(data.storm.property.damage$EVTYPE), FUN = "sum", na.rm=T)
# Sort
data.storm.property.damage <- arrange(data.storm.property.damage, data.storm.property.damage[,2], decreasing=T)
# Select top 15
data.storm.property.damage <- data.storm.property.damage[1:15,]
# Adding column names
colnames(data.storm.property.damage)  <- c("Event.Type", "Property.Damage.Total")
# Setting levels of Event Type
data.storm.property.damage[,"Event.Type"]  <- factor(data.storm.property.damage[,"Event.Type"], levels = data.storm.property.damage[,"Event.Type"])

# Display the top 15 property damage table
data.storm.property.damage
##           Event.Type Property.Damage.Total
## 1              FLOOD          144657709807
## 2  HURRICANE/TYPHOON           69305840000
## 3            TORNADO           56937160779
## 4        STORM SURGE           43323536000
## 5        FLASH FLOOD           16140812067
## 6               HAIL           15732267543
## 7          HURRICANE           11868319010
## 8     TROPICAL STORM            7703890550
## 9       WINTER STORM            6688497251
## 10         HIGH WIND            5270046295
## 11       RIVER FLOOD            5118945500
## 12          WILDFIRE            4765114000
## 13  STORM SURGE/TIDE            4641188000
## 14         TSTM WIND            4484928495
## 15         ICE STORM            3944927860
# Plotting the figure
ggplot(data.storm.property.damage, aes(Event.Type, Property.Damage.Total/1e9)) + geom_bar(stat = "identity", color="steel blue", fill="steel blue", width = 0.7) + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + labs(x="Event Type", y="Property Damage (in billion dollars)", title="Total Property Damages by Severe Weather\n Events in the U.S.\n from 1950 - 2011\n")
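
The nested `ifelse` calls above map each exponent code to a multiplier. The same mapping can be written as a named lookup vector, sketched below under the assumption (matching the code above) that any code other than H, K, M or B leaves the amount unchanged. The helper name `damage.multiplier` and the sample values are hypothetical.

```r
# Hypothetical helper: multiplier for each exponent code, case-insensitive
damage.multiplier <- function(exp.code) {
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    mult <- unname(lookup[toupper(as.character(exp.code))])
    # Codes not in the lookup (including the empty string) keep the raw amount
    mult[is.na(mult)] <- 1
    mult
}

# Made-up amounts and codes
toy.dmg <- c(25, 2.5, 1.2, 3, 7)
toy.exp <- c("K", "m", "B", "", "h")
toy.total <- toy.dmg * damage.multiplier(toy.exp)
toy.total
```

A lookup vector keeps the code-to-multiplier table in one place, which makes it easier to extend if other codes need handling.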

Top 15 types of weather events that have caused the highest crop damage:

# Remove unnecessary columns from data
data.storm.crop.damage <- subset(data.storm, select = c("EVTYPE", "CROPDMG", "CROPDMGEXP"))
# Convert the damage amounts to dollars using the exponent codes (H, K, M, B)
data.storm.crop.damage <- mutate(data.storm.crop.damage, 
                      CROPDMGTTL = ifelse(CROPDMGEXP=="k" | CROPDMGEXP=="K", 
                                          CROPDMG * 1000, 
                                          ifelse(CROPDMGEXP=="m" | CROPDMGEXP=="M",
                                                 CROPDMG * 1000000,
                                                 ifelse(CROPDMGEXP=="b" | CROPDMGEXP=="B",
                                                        CROPDMG * 1000000000,
                                                        ifelse(CROPDMGEXP=="h" | CROPDMGEXP=="H",
                                                                CROPDMG * 100, CROPDMG)))))
# Calculating total by event types
data.storm.crop.damage <- aggregate(data.storm.crop.damage$CROPDMGTTL, by = list(data.storm.crop.damage$EVTYPE), FUN = "sum", na.rm=T)
# Sort
data.storm.crop.damage <- arrange(data.storm.crop.damage, data.storm.crop.damage[,2], decreasing=T)
# Select top 15
data.storm.crop.damage <- data.storm.crop.damage[1:15,]
# Adding column names
colnames(data.storm.crop.damage)  <- c("Event.Type", "Crop.Damage.Total")
# Setting levels of Event Type
data.storm.crop.damage[,"Event.Type"]  <- factor(data.storm.crop.damage[,"Event.Type"], levels = data.storm.crop.damage[,"Event.Type"])

# Display the top 15 crop damage table
data.storm.crop.damage
##           Event.Type Crop.Damage.Total
## 1            DROUGHT       13972566000
## 2              FLOOD        5661968450
## 3        RIVER FLOOD        5029459000
## 4          ICE STORM        5022113500
## 5               HAIL        3025954473
## 6          HURRICANE        2741910000
## 7  HURRICANE/TYPHOON        2607872800
## 8        FLASH FLOOD        1421317100
## 9       EXTREME COLD        1292973000
## 10      FROST/FREEZE        1094086000
## 11        HEAVY RAIN         733399800
## 12    TROPICAL STORM         678346000
## 13         HIGH WIND         638571300
## 14         TSTM WIND         554007350
## 15    EXCESSIVE HEAT         492402000
# Plotting the figure
ggplot(data.storm.crop.damage, aes(Event.Type, Crop.Damage.Total/1e9)) + geom_bar(stat = "identity", color="steel blue", fill="steel blue", width = 0.7) + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1)) + labs(x="Event Type", y="Crop Damage (in billion dollars)", title="Total Crop Damages by Severe Weather\n Events in the U.S.\n from 1950 - 2011\n")
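
To compare overall economic loss, property and crop totals can be added per event type. The sketch below does this for four event types that appear in both top 15 tables above, using the totals printed there. Event types that appear in only one table (such as DROUGHT) are left out because their other total is not shown in this report. The names `prop`, `crop` and `combined` are illustrative.

```r
# Totals copied from the property and crop damage tables above
prop <- data.frame(Event.Type = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL"),
                   Property = c(144657709807, 69305840000, 16140812067, 15732267543))
crop <- data.frame(Event.Type = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL"),
                   Crop = c(5661968450, 2607872800, 1421317100, 3025954473))

# Join on event type, add the two totals, and sort in descending order
combined <- merge(prop, crop, by = "Event.Type")
combined$Total <- combined$Property + combined$Crop
combined <- combined[order(combined$Total, decreasing = TRUE), ]
combined[, c("Event.Type", "Total")]
```

Among these four types, hail moves ahead of flash flood once crop damage is included.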


Conclusion

Based on our analysis of the provided data, Tornado and Excessive Heat are the two types of severe weather events that have been most harmful to public health in terms of total fatalities, with Tornado also causing by far the most injuries. Flood and Drought have caused the greatest economic losses in the United States: Flood leads in property damage and Drought leads in crop damage.