The Impact of Severe Weather Events in the USA Between 1950 and 2011

Synopsis

In this report we describe the impact of severe weather events on public health and the economy in the United States between 1950 and 2011. To investigate this question, we obtained data from the NOAA Storm Database; related documentation is available from the National Weather Service. The data can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. We calculate the total fatalities and total injuries for each type of severe weather event and find that tornadoes have the largest negative impact on both fatalities and injuries. We also compute the total economic damage caused by severe weather events by combining crop damage and property damage. By this measure, floods cause the most damage of all severe weather events in the USA. All results are shown in bar charts, making it easy to identify the most harmful severe weather events.

Data Processing

Loading and Processing the Raw Data

2.1 Set knitr options: cache results in a folder called cache and cache all calculations by default.

library(knitr)
opts_chunk$set(echo = TRUE, results = "show", cache = TRUE)
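
The chunk above enables caching but does not name the cache folder. A minimal sketch of how the folder called cache could be set explicitly, using knitr's cache.path option:

# Store cached chunk results in a folder named "cache/" (the trailing slash
# makes knitr treat it as a directory prefix).
opts_chunk$set(cache.path = "cache/")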

2.2 Install and load packages

# Install and load packages:
# packages <- c('data.table', 'sqldf', 'ggplot2', 'xtable')
# install.packages(packages)
# sapply(packages, require, character.only = TRUE, quietly = TRUE)
library(data.table)
library(ggplot2)
library(xtable)
library(sqldf)

2.3 Download and read data

From the course web site, we obtained the Storm Data file, a comma-separated-value file compressed via the bzip2 algorithm.

# download the data and save it into the data subfolder

if (!file.exists("./data")) {
    dir.create("./data")
}

Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./data/swe.csv.bz2")) {
    download.file(Url, destfile = "./data/swe.csv.bz2")
}

Read the data. First decompress the file, then read the first 5 rows to inspect its structure, and finally read only the 8 columns we need.

# Decompress the raw CSV.bz2 file containing the data. unz() only handles zip
# archives, so bunzip2 from the R.utils package is used to extract the file.
if (!file.exists("./swe.csv")) {
    library(R.utils)
    bunzip2("./data/swe.csv.bz2", destname = "./swe.csv", remove = FALSE)
}
# Reading the full file with read.csv is terribly slow, so it is commented out:
# swe0 <- read.csv('./data/swe.csv.bz2')
# swe0 <- read.csv('./swe.csv')

# First read 5 rows to see the structure, then choose what we need.
tab5rows <- read.csv("./swe.csv", header = TRUE, sep = ",", nrows = 5)
head(tab5rows)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00      130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00      145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00      900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0      NA         NA       NA       NA          0
## 2 TORNADO         0      NA         NA       NA       NA          0
## 3 TORNADO         0      NA         NA       NA       NA          0
## 4 TORNADO         0      NA         NA       NA       NA          0
## 5 TORNADO         0      NA         NA       NA       NA          0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0      NA         NA   14.0   100 3   0          0
## 2         NA         0      NA         NA    2.0   150 2   0          0
## 3         NA         0      NA         NA    0.1   123 2   0          0
## 4         NA         0      NA         NA    0.0   100 2   0          0
## 5         NA         0      NA         NA    0.0   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0         NA  NA         NA        NA
## 2        0     2.5          K       0         NA  NA         NA        NA
## 3        2    25.0          K       0         NA  NA         NA        NA
## 4        2     2.5          K       0         NA  NA         NA        NA
## 5        2     2.5          K       0         NA  NA         NA        NA
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806      NA      1
## 2     3042      8755          0          0      NA      2
## 3     3340      8742          0          0      NA      3
## 4     3458      8626          0          0      NA      4
## 5     3412      8642          0          0      NA      5
# classes <- sapply(tab5rows, class)
# swe0 <- read.table('./swe.csv')
# Only read the columns we need.
mycols <- rep("NULL", 37)
mycols[c(8, 22:28)] <- NA
swe1 <- read.csv("./swe.csv", header = TRUE, sep = ",", colClasses = mycols)
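
As an alternative (a sketch only, not how this report was generated), data.table's fread can read just the selected columns considerably faster; its select argument takes the same column indices, here into a hypothetical swe1_alt:

# Hypothetical faster read of only columns 8 and 22:28 with data.table::fread.
swe1_alt <- fread("./swe.csv", select = c(8, 22:28))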

Checking and cleaning the data.

After reading in the data, we check its dimensions: there are 902,297 rows and 8 columns in this dataset.

dim(swe1)
## [1] 902297      8

Subset and label the data, keeping only the columns needed for the analysis (the columns were already selected at read time via colClasses).

# The most important columns relate to event type, health impact and economic
# impact: head(swe0[, c(8, 22:28)])
# Generate the subset for analysis (already done at read time via colClasses):
# swe1 <- swe0[, c(8, 22:28)]

# Label the subset
names(swe1) <- make.names(c("eventtype", "Eventmagnitude", "fatalities", "injuries", 
    "propertydamage", "propertyunit", "cropdamage", "cropunit"))
# unique(swe1$eventtype)

Since the recorded event types are messy, we need to clean up typos and spelling variants to get a tidy set of categories. There are many ways to group the types; with help from the Coursera forum, we arrived at the following list.

f_events <- c("astronom", "avalan", "blizz", "flood", "wind chill", "debris", 
    "dense fog", "dense smoke", "drought", "dust devil", "dust", "heat", "frost", 
    "funnel", "freezing fog", "hail", "rain", "snow", "surf", "wind", "hurricane", 
    "typhoon", "ice storm", "lightning", "marine", "rip current", "seiche", 
    "sleet", "storm surge", "thunderstorm", "tornado", "tropical", "tsunami", 
    "volcanic", "waterspout", "wildfire", "winter storm", "winter weather")

x <- swe1$eventtype
# Lowercase the labels and strip spaces so spelling variants collapse together;
# note that multi-word patterns such as "wind chill" will not match once spaces
# have been removed.
evtypes <- gsub(" ", "", tolower(x))

Replace each event type with the matching name from the list.

for (i in f_events) evtypes[grepl(i, evtypes)] = i
# table(evtypes)
swe1$eventtypeold <- swe1$eventtype
swe1$eventtype <- evtypes
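
As a quick check on the effect of this grouping (not shown in the original output), one can compare the number of distinct labels before and after cleaning:

# Distinct event-type labels before and after the cleanup.
length(unique(swe1$eventtypeold))
length(unique(swe1$eventtype))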

Missing values are a common problem with environmental data and so we check to see what proportion of the observations are missing (i.e. coded as NA).

mean(is.na(swe1))  ## Are missing values important here?
## [1] 0

Because the proportion of missing values in the retained columns is zero, no special handling of missing values is needed.
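
For a per-column view (a quick check, output not reproduced here), the missing values can also be counted by column:

# Count NA values in each retained column; all counts should be zero given the
# overall proportion above.
colSums(is.na(swe1))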

Results

Note that monitoring of all event types only started in January 1996, so earlier years cover a narrower range of events.
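
A sketch of how this could be verified, assuming the full dataset swe0 (which contains BGN_DATE) were loaded rather than left commented out above:

# Number of distinct event types recorded per year; the jump around 1996
# reflects the start of monitoring for all event types.
year <- format(as.Date(swe0$BGN_DATE, format = "%m/%d/%Y"), "%Y")
tapply(swe0$EVTYPE, year, function(x) length(unique(x)))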

3.1 Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Generate a summary with the total fatalities and total injuries for each event type.

# convert Event Type to factor
swe1$eventtype <- as.factor(swe1$eventtype)

library(plyr)
health <- ddply(swe1, .(eventtype), summarise, total.fatalities = sum(fatalities), 
    total.injuries = sum(injuries))

dim(health)
## [1] 315   3

# summary(health)

Find the most harmful events. We order the event types by total fatalities (and, separately, by total injuries) and plot the top twenty in each case.
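
Before plotting, the leading events can also be listed directly (a quick look, output not reproduced here):

# Top 10 event types by total fatalities and by total injuries.
head(arrange(health, desc(total.fatalities)), 10)
head(arrange(health, desc(total.injuries)), 10)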

Top 20 fatal severe weather events

ggplot(arrange(health, desc(total.fatalities))[1:20, ], aes(x = reorder(eventtype, 
    total.fatalities), total.fatalities, fill = total.fatalities)) + geom_bar(stat = "identity", 
    width = 0.75) + ylab("fatalities") + coord_flip() + xlab("") + ggtitle("Impact on Fatalities by Event Type")

[Figure: Impact on Fatalities by Event Type]

Top 20 severe weather events with injuries

ggplot(arrange(health, desc(total.injuries))[1:20, ], aes(x = reorder(eventtype, 
    total.injuries), total.injuries, fill = total.injuries)) + geom_bar(stat = "identity", 
    width = 0.75) + ylab("injuries") + coord_flip() + xlab("") + ggtitle("Impact on Injuries by Event Type")

[Figure: Impact on Injuries by Event Type]

According to the diagrams, tornadoes cause the most fatalities and injuries among all severe weather events.

3.2 Across the United States, which types of events have the greatest economic consequences?

Crop Damage and Property Damage are reported as a number plus a unit code, so we scale each amount according to its unit (h = hundreds, k = thousands, m = millions, b = billions) to express it in dollars.

# Scale the amounts of Crop Damage and Property Damage according to their unit
# codes (h = hundreds, k = thousands, m = millions, b = billions).
swe1$propertyunit <- tolower(swe1$propertyunit)
swe1$cropunit <- tolower(swe1$cropunit)

swe1 <- data.table(swe1)
swe1[, PropertyDamage := ifelse(propertyunit == "b", propertydamage * 1e+09,
                         ifelse(propertyunit == "m", propertydamage * 1e+06,
                         ifelse(propertyunit == "k", propertydamage * 1e+03,
                         ifelse(propertyunit == "h", propertydamage * 100,
                                propertydamage))))]

swe1[, CropDamage := ifelse(cropunit == "b", cropdamage * 1e+09,
                     ifelse(cropunit == "m", cropdamage * 1e+06,
                     ifelse(cropunit == "k", cropdamage * 1e+03,
                     ifelse(cropunit == "h", cropdamage * 100,
                            cropdamage))))]
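
As a sanity check on the unit codes (not shown in the original output), the values of the unit columns can be tabulated; any code other than h, k, m or b falls through the ifelse chains above and is left unscaled:

# Distinct unit codes and their frequencies after lowercasing.
table(swe1$propertyunit, useNA = "ifany")
table(swe1$cropunit, useNA = "ifany")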

We combine Crop Damage and Property Damage into a single total damage amount.


swe1$TotalDamage <- swe1$CropDamage + swe1$PropertyDamage

damage <- ddply(swe1, .(eventtype), summarise, total.damage = sum(TotalDamage))

# Order by total damage (largest first) and keep the top 20 event types.
damageorder <- order(damage$total.damage, decreasing = TRUE)
damage <- damage[damageorder, ]
damagetop20 <- damage[1:20, ]
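
Before plotting, the leading rows can be inspected directly (output omitted here):

# Event types with the largest combined property and crop damage.
head(damagetop20, 10)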

Top 20 severe weather events with the greatest economic damage

ggplot(damagetop20, aes(x = reorder(eventtype, total.damage), total.damage, 
    fill = total.damage)) + geom_bar(stat = "identity") + ylab("Total damage (USD)") + 
    coord_flip() + xlab("") + ggtitle("Impact on Total Economic Damage by Event Type")

[Figure: Impact on Total Economic Damage by Event Type]

According to the diagram, we conclude that floods have the greatest economic impact among all severe weather events.

Conclusion

According to the diagrams, we conclude that tornadoes and floods were the most harmful severe weather events in the USA between 1950 and 2011: tornadoes for population health and floods for economic damage.