Reproducible Research: Peer Assessment 2

Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

In this report, we analyze the impact of different types of severe weather events on public health and the economy. The analysis is based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers 1950 to 2011, and it uses the more complete records from 1995 to 2011. We use the estimates of fatalities, injuries, and property and crop damage to decide which types of events are most harmful to population health and the economy. From these data, we found that excessive heat and tornado are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.

Basic settings

library(knitr)
opts_chunk$set(cache = TRUE, echo = TRUE, message = TRUE, comment = NA)
library(R.utils)    # bunzip2() for decompressing the data file
library(ggplot2)    # bar charts
library(plyr)
library(gridExtra)  # grid.arrange() for side-by-side plots

Data Processing

First, we download the data file and unzip it.

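# Author's local project folder; change or drop this line on another machine
setwd("D:\\git\\RepData_PeerAssessment2")

if (!dir.exists("data")) dir.create("data")
if (!file.exists("data/stormData.csv.bz2")) {
    message("Downloading data")
    # mode = "wb" keeps the compressed file intact on Windows
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "data/stormData.csv.bz2", mode = "wb")
}
if (!file.exists("data/stormData.csv")) {
    bunzip2("data/stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}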

Then we read the decompressed CSV file, skipping this step if stormData is already loaded in the working environment.

if (!"stormData" %in% ls()) {
    stormData <- read.csv("data/stormData.csv", sep = ",")
}
Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, : EOF within quoted string
dim(stormData)
[1] 425873     37
head(stormData, n = 2)
  STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
   EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
1 TORNADO         0                                               0
2 TORNADO         0                                               0
  COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
1         NA         0                        14   100 3   0          0
2         NA         0                         2   150 2   0          0
  INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
1       15    25.0          K       0                                    
2        0     2.5          K       0                                    
  LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
1     3040      8812       3051       8806              1
2     3042      8755          0          0              2

The data frame has 425,873 rows and 37 columns. The "EOF within quoted string" warning means that read.csv met an unmatched quote character, so some records after that point may have been merged or lost. The events in the database start in 1950 and end in November 2011. Earlier years generally have fewer recorded events, most likely because of incomplete record keeping, so more recent years should be considered more complete.

if (dim(stormData)[2] == 37) {
    stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
}
hist(stormData$year, breaks = 30)
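The year extraction above relies on as.Date() reading only the leading month/day/year part of BGN_DATE and ignoring the trailing time of day. A minimal sketch on two date strings in the same layout as the data; the helper name extractYear is hypothetical, not part of the original analysis:

```r
# Hypothetical helper: pull the year out of BGN_DATE strings such as
# "4/18/1950 0:00:00". as.Date() parses the month/day/year prefix and
# discards the rest of the string; unparseable strings give NA.
extractYear <- function(bgnDate) {
    parsed <- as.Date(as.character(bgnDate), format = "%m/%d/%Y")
    as.numeric(format(parsed, "%Y"))
}

years <- extractYear(c("4/18/1950 0:00:00", "11/30/2011 0:00:00"))
print(years)
```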

Based on the above histogram, the number of recorded events starts to increase sharply around 1995. We therefore use the subset of the data from 1995 to 2011 to make the most of the more complete records.

storm <- stormData[stormData$year >= 1995, ]
dim(storm)
[1] 205076     38

The subset has 205,076 rows and 38 columns (the 37 original columns plus year).
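The 1995 cut-off was read off the histogram. As a small sketch of checking the same thing numerically, table() counts events per year, and the same logical filter used for storm keeps the later years. The year vector below is made up for illustration; on the real data one would pass stormData$year instead.

```r
# Made-up year values standing in for stormData$year
year <- c(1950, 1950, 1994, 1995, 1995, 1995, 2011)

# Number of recorded events per year
eventsPerYear <- table(year)
print(eventsPerYear)

# Keep only events from 1995 onward, as done for the storm subset
recent <- year[year >= 1995]
print(length(recent))
```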

Impact on Public Health

In this section, we total the fatalities and injuries caused by each type of severe weather event and keep the 15 most harmful event types for each measure.

sortHelper <- function(fieldName, top = 15, dataset = stormData) {
    # Total of the requested column within each event type
    field <- aggregate(dataset[[fieldName]], by = list(dataset$EVTYPE), FUN = sum)
    names(field) <- c("EVTYPE", fieldName)
    # Sort by the total in decreasing order and keep the first `top` rows
    field <- field[order(field[[fieldName]], decreasing = TRUE), ]
    field <- head(field, n = top)
    rownames(field) <- NULL
    # Fix the factor levels in rank order so the bar charts keep this order
    field$EVTYPE <- factor(field$EVTYPE, levels = field$EVTYPE)
    return(field)
}

fatalities <- sortHelper("FATALITIES", dataset = storm)
injuries <- sortHelper("INJURIES", dataset = storm)
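To illustrate what sortHelper() does, here is the same aggregate-then-sort logic on a tiny data frame whose event types and counts are made up for illustration. It sums a column within each EVTYPE, sorts the totals in decreasing order, keeps the first top rows, and sets the factor levels in that order so that ggplot2 draws bars in rank order rather than alphabetically. The function name topByEvent is hypothetical; it is a base-R restatement, not the original helper.

```r
# Hypothetical base-R restatement of sortHelper()
topByEvent <- function(dataset, fieldName, top = 15) {
    # Sum the column within each event type
    totals <- aggregate(dataset[[fieldName]], by = list(dataset$EVTYPE), FUN = sum)
    names(totals) <- c("EVTYPE", fieldName)
    # Largest totals first, then keep the first `top` rows
    totals <- totals[order(totals[[fieldName]], decreasing = TRUE), ]
    totals <- head(totals, n = top)
    rownames(totals) <- NULL
    # Levels in rank order, so bar charts keep the sorted order
    totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
    totals
}

# Made-up example data
toy <- data.frame(
    EVTYPE = c("HAIL", "TORNADO", "HAIL", "FLOOD", "TORNADO"),
    FATALITIES = c(0, 5, 1, 2, 3),
    stringsAsFactors = FALSE
)
res <- topByEvent(toy, "FATALITIES", top = 2)
print(res)
```

With these values the totals are TORNADO 8, FLOOD 2 and HAIL 1, so the top two rows are TORNADO and FLOOD.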

Impact on Economy

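We convert the property and crop damage data into comparable dollar amounts using the units described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns hold a multiplier code for the PROPDMG and CROPDMG values: H (hundred, 10^2), K (thousand, 10^3), M (million, 10^6) and B (billion, 10^9). In the code below, letter codes are matched regardless of case, a blank code means no multiplier, numeric codes are used directly as powers of ten, and any other symbol is treated as a multiplier of 1.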

convertHelper <- function(dataset = storm, fieldName, newFieldName) {
    # Column holding the exponent code (e.g. PROPDMGEXP); the matching value
    # column (e.g. PROPDMG) sits immediately before it in this data set
    index <- which(colnames(dataset) == fieldName)
    code <- toupper(as.character(dataset[, index]))
    # Map letter codes to powers of ten; a blank code means 10^0
    code[code %in% "B"] <- "9"
    code[code %in% "M"] <- "6"
    code[code %in% "K"] <- "3"
    code[code %in% "H"] <- "2"
    code[code %in% ""] <- "0"
    # Digits are kept as exponents; other symbols (e.g. "+", "?") become NA
    # here, with a warning, and are then treated as 0 (a multiplier of 1)
    dataset[, index] <- as.numeric(code)
    dataset[is.na(dataset[, index]), index] <- 0
    # Append the damage in US dollars as a new column
    dataset[[newFieldName]] <- dataset[, index - 1] * 10^dataset[, index]
    return(dataset)
}

storm <- convertHelper(storm, "PROPDMGEXP", "propertyDamage")
Warning in convertHelper(storm, "PROPDMGEXP", "propertyDamage"): NAs introduced by coercion
storm <- convertHelper(storm, "CROPDMGEXP", "cropDamage")
Warning in convertHelper(storm, "CROPDMGEXP", "cropDamage"): NAs introduced by coercion
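The coercion warnings come from as.numeric() inside convertHelper(): after the letter codes are mapped to exponents, symbols such as "+", "-" or "?" cannot be parsed as numbers and become NA, which the function then replaces by 0, i.e. a multiplier of 1. A minimal sketch of the same mapping on a made-up vector of codes; the helper name expToMultiplier is hypothetical, and unlike convertHelper() it silences the warning:

```r
# Hypothetical helper mirroring convertHelper()'s mapping of exponent codes
expToMultiplier <- function(code) {
    code <- toupper(as.character(code))
    code[code %in% "B"] <- "9"
    code[code %in% "M"] <- "6"
    code[code %in% "K"] <- "3"
    code[code %in% "H"] <- "2"
    code[code %in% ""] <- "0"
    # Digits stay as exponents; other symbols become NA (warning silenced here)
    exponent <- suppressWarnings(as.numeric(code))
    exponent[is.na(exponent)] <- 0
    10^exponent
}

codes <- c("K", "m", "B", "", "5", "+", "?")
print(expToMultiplier(codes))
# First row of the data: PROPDMG 25.0 with code K
print(25 * expToMultiplier("K"))
```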
names(storm)
 [1] "STATE__"        "BGN_DATE"       "BGN_TIME"       "TIME_ZONE"     
 [5] "COUNTY"         "COUNTYNAME"     "STATE"          "EVTYPE"        
 [9] "BGN_RANGE"      "BGN_AZI"        "BGN_LOCATI"     "END_DATE"      
[13] "END_TIME"       "COUNTY_END"     "COUNTYENDN"     "END_RANGE"     
[17] "END_AZI"        "END_LOCATI"     "LENGTH"         "WIDTH"         
[21] "F"              "MAG"            "FATALITIES"     "INJURIES"      
[25] "PROPDMG"        "PROPDMGEXP"     "CROPDMG"        "CROPDMGEXP"    
[29] "WFO"            "STATEOFFIC"     "ZONENAMES"      "LATITUDE"      
[33] "LONGITUDE"      "LATITUDE_E"     "LONGITUDE_"     "REMARKS"       
[37] "REFNUM"         "year"           "propertyDamage" "cropDamage"    
options(scipen=999)
property <- sortHelper("propertyDamage", dataset = storm)
crop <- sortHelper("cropDamage", dataset = storm)

Results

For public health, the two lists below rank the event types by total fatalities and total injuries.

fatalities
           EVTYPE FATALITIES
1  EXCESSIVE HEAT       1088
2            HEAT        694
3         TORNADO        411
4     FLASH FLOOD        369
5       LIGHTNING        337
6           FLOOD        167
7       HEAT WAVE        161
8       TSTM WIND        154
9    RIP CURRENTS        143
10      HIGH WIND        138
11   WINTER STORM        120
12   EXTREME COLD        111
13     HEAVY SNOW         92
14   EXTREME HEAT         91
15      AVALANCHE         69
injuries
               EVTYPE INJURIES
1             TORNADO     7712
2               FLOOD     6460
3      EXCESSIVE HEAT     3309
4           TSTM WIND     2278
5           LIGHTNING     2168
6        WINTER STORM     1035
7         FLASH FLOOD      962
8                HEAT      808
9           HIGH WIND      569
10                FOG      529
11         HEAVY SNOW      501
12 THUNDERSTORM WINDS      444
13               HAIL      436
14   WILD/FOREST FIRE      395
15           BLIZZARD      365

The following pair of bar charts shows the total fatalities and total injuries caused by these event types.

fatalitiesPlot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col() +
    scale_y_continuous("Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Severe Weather Type") +
    ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
injuriesPlot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
    geom_col() +
    scale_y_continuous("Number of Injuries") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    xlab("Severe Weather Type") +
    ggtitle("Total Injuries by Severe Weather\n Events in the U.S.\n from 1995 - 2011")
# Draw both charts side by side
grid.arrange(fatalitiesPlot, injuriesPlot, ncol = 2)

Based on the bar charts above, excessive heat causes the most fatalities, followed by heat and tornado, while tornado causes the most injuries in the United States from 1995 to 2011.

For the economy, the two lists below rank the event types by total property damage and total crop damage in US dollars.

property
                      EVTYPE propertyDamage
1                      FLOOD    10109702527
2                  HURRICANE     8775364000
3                    TORNADO     6028722585
4                FLASH FLOOD     4889132861
5                       HAIL     3603203473
6             HURRICANE OPAL     3172846000
7           WILD/FOREST FIRE     2795268500
8                  TSTM WIND     2631852030
9  HEAVY RAIN/SEVERE WEATHER     2500000000
10                 ICE STORM     1673611010
11       SEVERE THUNDERSTORM     1200310000
12        THUNDERSTORM WINDS      924962745
13                   TYPHOON      600230000
14            TROPICAL STORM      488405000
15                  BLIZZARD      413910950
crop
              EVTYPE cropDamage
1            DROUGHT 7903431000
2          HURRICANE 2292450000
3              FLOOD 1813403000
4       EXTREME COLD 1222063000
5               HAIL  972614370
6        FLASH FLOOD  508313500
7               HEAT  401235000
8             FREEZE  396225000
9          TSTM WIND  347955000
10        HEAVY RAIN  325854800
11    TROPICAL STORM  265575000
12   DAMAGING FREEZE  262100000
13 EXCESSIVE WETNESS  142000000
14         HIGH WIND  138819300
15    HURRICANE ERIN  136010000

The following pair of bar charts shows the total property damage and total crop damage caused by these event types.

propertyPlot <- ggplot(property, aes(x = EVTYPE, y = propertyDamage)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous("Property Damage in US dollars") +
    xlab("Severe Weather Type") +
    ggtitle("Total Property Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")
cropPlot <- ggplot(crop, aes(x = EVTYPE, y = cropDamage)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous("Crop Damage in US dollars") +
    xlab("Severe Weather Type") +
    ggtitle("Total Crop Damage by\n Severe Weather Events in\n the U.S. from 1995 - 2011")
# Draw both charts side by side
grid.arrange(propertyPlot, cropPlot, ncol = 2)

Based on the bar charts above, flood and hurricane/typhoon cause the most property damage, while drought causes the most crop damage, followed by hurricane and flood, in the United States from 1995 to 2011.

Conclusion

From these data, we found that excessive heat and tornado are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.