Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

This report analyzes the impact of different types of severe weather events on public health and the economy, based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA) covering 1950 to 2011. It uses the recorded fatalities, injuries, and property and crop damage to determine which types of event are most harmful to population health and which have the greatest economic consequences. From these data, we found that tornado, excessive heat, flood, flash flood and heat are the most harmful with respect to population health, while flood, drought and hurricane/typhoon have the greatest economic consequences.

First, load the packages used in this report.

library(R.utils)    # also attaches R.oo and R.methodsS3
library(dplyr)      # filter, group_by, summarize, arrange
library(ggplot2)    # bar charts
require(gridExtra)  # also attaches grid

Download and read the data

# Download the data into 'RepData_PeerAssessment2' and unzip the file first
setwd('~/Desktop/Data Science/datasciencecoursera/RepData_PeerAssessment2')
# Read the data only if it is not already in the workspace
if (!exists("storm")) {
    storm <- read.csv("stormData.csv", header = TRUE, sep = ",")
}
dim(storm)
## [1] 902297     37
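The histogram and filter steps that follow use a data frame `storm_sub` with a `year` column, but the code that creates it is not shown. A minimal sketch of one way to build it, assuming the raw data has a `BGN_DATE` column holding dates such as `4/18/1950 0:00:00` and keeping the ten columns that appear in the printed output. The helper `make_storm_sub` and the toy rows are hypothetical.

```r
# Hypothetical helper: derive a `year` column from BGN_DATE (assumed to look
# like "4/18/1950 0:00:00") and keep only the columns used in this report.
make_storm_sub <- function(storm) {
  # strptime-style parsing ignores the trailing time part of the string
  begin <- as.Date(as.character(storm$BGN_DATE), format = "%m/%d/%Y")
  storm$year <- as.numeric(format(begin, "%Y"))
  keep <- c("year", "EVTYPE", "F", "MAG", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
  storm[, keep]
}

# Toy rows with the same column names (illustrative values only)
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1995 0:00:00"),
  EVTYPE = c("TORNADO", "HAIL"), F = c(3, NA), MAG = c(0, 175),
  FATALITIES = c(0, 0), INJURIES = c(15, 0),
  PROPDMG = c(25, 0), PROPDMGEXP = c("K", ""),
  CROPDMG = c(0, 0), CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)
toy_sub <- make_storm_sub(toy)
toy_sub$year
```

On the full data set this would be called as `storm_sub <- make_storm_sub(storm)`.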

Explore the data by plotting a histogram of the number of records per year.

hist(storm_sub$year,breaks=30)

The histogram shows that many more events were recorded from 1980 onward, so this report uses only the records from 1980 to 2011.
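The choice of 1980 as a cut-off can also be checked numerically by counting records per decade. A small base R sketch; `records_per_decade` is a hypothetical helper, and the toy years are illustrative only, not drawn from the data.

```r
# Hypothetical helper: count records per decade (e.g. 1982 -> 1980).
records_per_decade <- function(years) {
  decade <- 10 * (years %/% 10)
  table(decade)
}

# Illustrative input standing in for storm_sub$year
toy_years <- c(1950, 1955, 1982, 1987, 1991, 1999, 2005, 2011)
counts <- records_per_decade(toy_years)
counts
```

On the real data this would be called as `records_per_decade(storm_sub$year)`.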

storm_sub_y <- filter(storm_sub,year >= 1980)
head(storm_sub_y)
##   year EVTYPE  F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1980   HAIL NA 200          0        0       0                  0
## 2 1980   HAIL NA  75          0        0       0                  0
## 3 1980   HAIL NA  75          0        0       0                  0
## 4 1980   HAIL NA 175          0        0       0                  0
## 5 1980   HAIL NA 175          0        0       0                  0
## 6 1980   HAIL NA 175          0        0       0                  0
##   CROPDMGEXP
## 1           
## 2           
## 3           
## 4           
## 5           
## 6

Find the most severe events in terms of Impact on Health

In this section, we count the fatalities and injuries caused by each type of severe weather event and rank the 10 most severe event types by each measure.

storm_sub_yfa <- summarize(group_by(storm_sub_y,EVTYPE),total_fatalities = sum(FATALITIES))
rank_fa <- arrange(storm_sub_yfa,desc(total_fatalities))
result_fa <- head(rank_fa,n=10)
result_fa
## Source: local data frame [10 x 2]
## 
##            EVTYPE total_fatalities
## 1         TORNADO             2274
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224
storm_sub_yj <- summarize(group_by(storm_sub_y,EVTYPE),total_injuries = sum(INJURIES))
rank_j <-arrange(storm_sub_yj,desc(total_injuries))
result_j <- head(rank_j,n=10)
result_j
## Source: local data frame [10 x 2]
## 
##               EVTYPE total_injuries
## 1            TORNADO          37971
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361
# Bar chart of total fatalities for the 10 deadliest event types, tallest bar first
fatalitiesPlot <- ggplot(result_fa, aes(x = reorder(EVTYPE, -total_fatalities),
                                        y = total_fatalities)) +
        geom_col() +
        scale_y_continuous("Number of Fatalities") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        xlab("Severe Weather Type") +
        ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1980 - 2011")

fatalitiesPlot
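Both rankings above follow the same steps: group by event type, sum, sort in decreasing order and keep the top 10. A base R sketch of that pattern as one reusable helper, so the pipeline is not repeated for each measure. The name `rank_event_types` and the toy rows are hypothetical.

```r
# Hypothetical helper: total `column` per EVTYPE, sorted in decreasing order,
# keeping the top `n` event types (same steps as group_by/summarize/arrange/head).
rank_event_types <- function(data, column, n = 10) {
  totals <- tapply(data[[column]], data$EVTYPE, sum)
  totals <- sort(totals, decreasing = TRUE)
  head(data.frame(EVTYPE = names(totals), total = as.vector(totals),
                  stringsAsFactors = FALSE), n)
}

# Toy rows (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "HAIL"),
                  FATALITIES = c(3, 2, 4, 0),
                  INJURIES = c(10, 1, 5, 2),
                  stringsAsFactors = FALSE)
rank_event_types(toy, "FATALITIES", n = 2)
```

For example, `rank_event_types(storm_sub_y, "INJURIES")` would give a table with the same shape as the injuries ranking above.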

The two tables show that tornado, excessive heat, flood, flash flood and heat are the most harmful with respect to population health.

Find the most severe events in terms of Impact on Economy

We convert the property damage and crop damage values into comparable numbers according to the units described in the code book. The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M) and billion (B).
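As a worked example of the conversion, a recorded value of 25 with code K means 25 × 10^3 = 25,000 dollars. A minimal base R sketch of the letter-to-multiplier lookup; the names `exp_multiplier` and `to_dollars` are hypothetical. Unlike the `convertData` function in this report, it leaves unrecognised codes (symbols such as `+` or `?`, and digits) as NA instead of treating them as a multiplier.

```r
# Code-book letters mapped to multipliers (H = 10^2, K = 10^3, M = 10^6, B = 10^9)
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert base values to dollars; a blank code means "no multiplier" (x 1).
# Codes not in the lookup produce NA.
to_dollars <- function(value, code) {
  mult <- unname(exp_multiplier[toupper(code)])
  mult[code == ""] <- 1
  value * mult
}

to_dollars(c(25, 2.5, 1.5, 10), c("K", "M", "b", ""))
```

Keeping unknown codes as NA makes them easy to count and inspect before deciding how to treat them.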

The following function converts these letter codes into powers of ten and appends a new column holding the damage value multiplied by the corresponding multiplier.

# Convert a damage-exponent column (e.g. PROPDMGEXP) into a power of ten and
# append a new column holding value * 10^exponent.
# Assumes the damage value column (e.g. PROPDMG) sits immediately before the
# exponent column, as it does in storm_sub_y.
convertData <- function(dataset = storm_sub_y, fieldName, newFieldName) {
        totalLen <- ncol(dataset)
        index <- which(colnames(dataset) == fieldName)
        codes <- toupper(as.character(dataset[, index]))
        # Letter codes from the code book: B = 10^9, M = 10^6, K = 10^3, H = 10^2
        codes[codes %in% "B"] <- "9"
        codes[codes %in% "M"] <- "6"
        codes[codes %in% "K"] <- "3"
        codes[codes %in% "H"] <- "2"
        codes[codes %in% ""] <- "0"
        # Digits are kept as exponents; other symbols ("+", "-", "?") become NA
        # here (hence the coercion warning) and are then treated as 10^0 = 1
        dataset[, index] <- as.numeric(codes)
        dataset[is.na(dataset[, index]), index] <- 0
        # Damage = base value * 10^exponent
        dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
        names(dataset)[totalLen + 1] <- newFieldName
        return(dataset)
}

storm_prop <- convertData(storm_sub_y, "PROPDMGEXP", "propertyDamage")
## Warning in convertData(storm_sub_y, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storm_crop <- convertData(storm_sub_y, "CROPDMGEXP", "cropDamage")
## Warning in convertData(storm_sub_y, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
head(storm_prop) 
##   year EVTYPE  F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1980   HAIL NA 200          0        0       0          0       0
## 2 1980   HAIL NA  75          0        0       0          0       0
## 3 1980   HAIL NA  75          0        0       0          0       0
## 4 1980   HAIL NA 175          0        0       0          0       0
## 5 1980   HAIL NA 175          0        0       0          0       0
## 6 1980   HAIL NA 175          0        0       0          0       0
##   CROPDMGEXP propertyDamage
## 1                         0
## 2                         0
## 3                         0
## 4                         0
## 5                         0
## 6                         0

Compute the total property damage and total crop damage for each event type and keep the 10 highest.

prop_damage <- summarize(group_by(storm_prop, EVTYPE), total_property_damage = sum(propertyDamage))
prop_rank <- arrange(prop_damage,desc(total_property_damage))
prop_rank <- head(prop_rank,n=10)
prop_rank 
## Source: local data frame [10 x 2]
## 
##                EVTYPE total_property_damage
## 1                HAIL                278573
## 2   THUNDERSTORM WIND                246629
## 3           TSTM WIND                190143
## 4         FLASH FLOOD                103389
## 5             TORNADO                 99603
## 6               FLOOD                 56790
## 7           HIGH WIND                 43278
## 8  THUNDERSTORM WINDS                 35907
## 9           LIGHTNING                 33456
## 10       WINTER STORM                 22959
crop_damage <- summarize(group_by(storm_crop, EVTYPE), total_crop_damage = sum(cropDamage))
crop_rank <- arrange(crop_damage,desc(total_crop_damage))
crop_rank <- head(crop_rank,n=10)
crop_rank 
## Source: local data frame [10 x 2]
## 
##               EVTYPE total_crop_damage
## 1               HAIL            248610
## 2  THUNDERSTORM WIND            244488
## 3        FLASH FLOOD             65580
## 4              FLOOD             41916
## 5          HIGH WIND             34644
## 6            TORNADO             29037
## 7          TSTM WIND             20226
## 8       WINTER STORM             20166
## 9     WINTER WEATHER             19980
## 10        HEAVY SNOW             18084
# Bar chart of total crop damage for the top 10 event types, tallest bar first
damagePlot <- ggplot(crop_rank, aes(x = reorder(EVTYPE, -total_crop_damage),
                                    y = total_crop_damage)) +
        geom_col() +
        scale_y_continuous("Total crop damage") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        xlab("Severe Weather Type") +
        ggtitle("Total Crop Damage by Severe Weather\n Events in the U.S.\n from 1980 - 2011")
damagePlot
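Property and crop damage are ranked separately above. A sketch of ranking event types by their combined economic damage, assuming a single data frame that already holds both converted dollar columns `propertyDamage` and `cropDamage`. The helper `rank_total_damage` and the toy rows are hypothetical.

```r
# Hypothetical helper: add property and crop damage, total them per EVTYPE,
# and return the top `n` event types by combined damage.
rank_total_damage <- function(data, n = 10) {
  data$totalDamage <- data$propertyDamage + data$cropDamage
  totals <- aggregate(totalDamage ~ EVTYPE, data = data, FUN = sum)
  totals <- totals[order(totals$totalDamage, decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy rows (illustrative values only, not results from the NOAA data)
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
                  propertyDamage = c(2e6, 5e4, 1e6, 0),
                  cropDamage = c(0, 1e4, 5e5, 3e6),
                  stringsAsFactors = FALSE)
rank_total_damage(toy, n = 3)
```

Because `convertData` keeps the existing columns, applying it twice in a row would give one data frame with both dollar columns, e.g. `rank_total_damage(convertData(convertData(storm_sub_y, "PROPDMGEXP", "propertyDamage"), "CROPDMGEXP", "cropDamage"))`.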

Based on the tables, hail, thunderstorm wind and flash flood have the greatest economic consequences.

Final result

Based on the tables above, we find that tornado, excessive heat, flood, flash flood and heat are the most harmful with respect to population health, while hail, thunderstorm wind and flash flood have the greatest economic consequences.