This report analyzes the impact of different weather events on public health and the economy, based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. It uses the estimates of fatalities, injuries, and damage to decide which types of event are most harmful to population health and the economy. From these data, we found that tornado, excessive heat, flood, flash flood, and heat are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.
# Load the packages used in this analysis
library(R.utils)    # depends on R.oo and R.methodsS3
library(dplyr)      # filter(), group_by(), summarize(), arrange()
library(ggplot2)    # qplot()
library(gridExtra)  # depends on grid
# Download the data into 'RepData_PeerAssessment2' and unzip the file
setwd('~/Desktop/Data Science/datasciencecoursera/RepData_PeerAssessment2')
# Read the data only if it has not already been loaded into `storm`
if (!"storm" %in% ls()) {
  storm <- read.csv("stormData.csv", header = TRUE, sep = ",")
}
dim(storm)
## [1] 902297 37
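The steps that follow work on a data frame named `storm_sub` that carries a `year` column; its construction is not shown in this report. A minimal sketch of one way such a subset could be built, assuming the raw file stores the event start in a `BGN_DATE` column formatted like `4/18/1950 0:00:00`. The toy `storm_raw` frame and the name `storm_sub_demo` are made up for illustration:

```r
# Toy stand-in for the raw storm data (values are illustrative only)
storm_raw <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/8/1980 0:00:00", "11/28/2011 0:00:00"),
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(0, 0, 1),
  INJURIES   = c(15, 0, 2),
  stringsAsFactors = FALSE
)

# Parse the date part of BGN_DATE (month/day/year layout assumed);
# the trailing time-of-day text is ignored by the parser
event_date <- as.Date(storm_raw$BGN_DATE, format = "%m/%d/%Y")

# Extract the four-digit year as a number
storm_raw$year <- as.numeric(format(event_date, "%Y"))

# Keep only the columns needed for the analysis
storm_sub_demo <- storm_raw[, c("year", "EVTYPE", "FATALITIES", "INJURIES")]
storm_sub_demo
```

On the full data the same two steps would be applied to `storm`, keeping the damage columns as well.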
hist(storm_sub$year,breaks=30)
The histogram shows that far more events are recorded after 1980, so this report uses only the records from 1980 to 2011.
storm_sub_y <- filter(storm_sub,year >= 1980)
head(storm_sub_y)
##   year EVTYPE  F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1980   HAIL NA 200          0        0       0                  0
## 2 1980   HAIL NA  75          0        0       0                  0
## 3 1980   HAIL NA  75          0        0       0                  0
## 4 1980   HAIL NA 175          0        0       0                  0
## 5 1980   HAIL NA 175          0        0       0                  0
## 6 1980   HAIL NA 175          0        0       0                  0
##   CROPDMGEXP
## 1
## 2
## 3
## 4
## 5
## 6
This section examines the number of fatalities and injuries caused by severe weather events. The code ranks the 10 most harmful event types, first by total fatalities and then by total injuries.
storm_sub_yfa <- summarize(group_by(storm_sub_y,EVTYPE),total_fatalities = sum(FATALITIES))
rank_fa <- arrange(storm_sub_yfa,desc(total_fatalities))
result_fa <- head(rank_fa,n=10)
result_fa
## Source: local data frame [10 x 2]
##
##            EVTYPE total_fatalities
## 1         TORNADO             2274
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224
storm_sub_yj <- summarize(group_by(storm_sub_y,EVTYPE),total_injuries = sum(INJURIES))
rank_j <-arrange(storm_sub_yj,desc(total_injuries))
result_j <- head(rank_j,n=10)
result_j
## Source: local data frame [10 x 2]
##
##               EVTYPE total_injuries
## 1            TORNADO          37971
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361
# Plot a bar chart of the fatalities ranking
fatalitiesPlot <- qplot(EVTYPE, data = result_fa, weight = total_fatalities, geom = "bar") +
  scale_y_continuous("Number of Fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.\n from 1980 - 2011")
fatalitiesPlot
The two tables show that tornado, excessive heat, flood, flash flood, and heat are the most harmful event types with respect to population health.
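To compare the two health measures side by side, both totals can be computed in one table. The following is a minimal base-R sketch on made-up records (the `health_demo` frame and its values are illustrative, not taken from the storm data); it is an alternative to the two separate dplyr rankings, not the method used in this report:

```r
# Toy records (illustrative values only)
health_demo <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 5, 2, 1),
  INJURIES   = c(40, 10, 8, 6, 0),
  stringsAsFactors = FALSE
)

# Sum both measures per event type in a single call
health_total <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                          data = health_demo, FUN = sum)

# Rank by fatalities, breaking ties by injuries
health_total <- health_total[order(-health_total$FATALITIES,
                                   -health_total$INJURIES), ]
health_total
```

Keeping both columns in one frame makes it easier to see event types that rank high on one measure but not the other.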
We convert the property damage and crop damage data into comparable numerical form according to the units described in the code book. The PROPDMGEXP and CROPDMGEXP columns each record a multiplier for the corresponding damage value: hundred (H), thousand (K), million (M), or billion (B).
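As a worked example of this conversion, a record with PROPDMG = 25 and PROPDMGEXP = "K" represents 25 × 10^3 = 25,000 dollars. A minimal sketch of the same idea using a named lookup vector; the `multiplier` vector and the toy values are illustrative and are not part of the report's own code:

```r
# Power-of-ten exponent for each unit code described in the code book
multiplier <- c(H = 2, K = 3, M = 6, B = 9)

# Toy damage figures and their unit codes (illustrative values only)
dmg     <- c(25, 2.5, 1, 10)
dmg_exp <- c("K", "m", "B", "")

# Look up each exponent (case-insensitive); codes that are not in the
# table, including blank, give NA and are treated here as exponent 0
exponent <- unname(multiplier[toupper(dmg_exp)])
exponent[is.na(exponent)] <- 0

# Damage in dollars = value * 10^exponent
dollars <- dmg * 10^exponent
dollars
```

Treating unknown codes as exponent 0 is an assumption made for this sketch; it leaves the recorded value unscaled.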
convertData <- function(dataset = storm_sub_y, fieldName, newFieldName) {
  totalLen <- dim(dataset)[2]
  # Locate the exponent column; the damage value is assumed to sit in the column just before it
  index <- which(colnames(dataset) == fieldName)
  dataset[, index] <- as.character(dataset[, index])
  logic <- !is.na(toupper(dataset[, index]))
  # Map the unit codes to powers of ten (blank means no multiplier)
  dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
  dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
  dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
  dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
  dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
  # Any remaining non-numeric code becomes NA (with a coercion warning) and is set to 0
  dataset[, index] <- as.numeric(dataset[, index])
  dataset[is.na(dataset[, index]), index] <- 0
  # Damage in dollars = value * 10^exponent, appended as a new column
  dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
  names(dataset)[totalLen + 1] <- newFieldName
  return(dataset)
}
storm_prop <- convertData(storm_sub_y, "PROPDMGEXP", "propertyDamage")
## Warning in convertData(storm_sub_y, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storm_crop <- convertData(storm_sub_y, "CROPDMGEXP", "cropDamage")
## Warning in convertData(storm_sub_y, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
head(storm_prop)
##   year EVTYPE  F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 1980   HAIL NA 200          0        0       0          0       0
## 2 1980   HAIL NA  75          0        0       0          0       0
## 3 1980   HAIL NA  75          0        0       0          0       0
## 4 1980   HAIL NA 175          0        0       0          0       0
## 5 1980   HAIL NA 175          0        0       0          0       0
## 6 1980   HAIL NA 175          0        0       0          0       0
##   CROPDMGEXP propertyDamage
## 1                         0
## 2                         0
## 3                         0
## 4                         0
## 5                         0
## 6                         0
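# Note: after convertData(), PROPDMGEXP holds the power-of-ten exponent, so this totals exponents; the dollar amounts are in propertyDamage
prop_damage <- summarize(group_by(storm_prop,EVTYPE),total_property_damage = sum(PROPDMGEXP))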
prop_rank <- arrange(prop_damage,desc(total_property_damage))
prop_rank <- head(prop_rank,n=10)
prop_rank
## Source: local data frame [10 x 2]
##
##                EVTYPE total_property_damage
## 1                HAIL                278573
## 2   THUNDERSTORM WIND                246629
## 3           TSTM WIND                190143
## 4         FLASH FLOOD                103389
## 5             TORNADO                 99603
## 6               FLOOD                 56790
## 7           HIGH WIND                 43278
## 8  THUNDERSTORM WINDS                 35907
## 9           LIGHTNING                 33456
## 10       WINTER STORM                 22959
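# Note: after convertData(), CROPDMGEXP holds the power-of-ten exponent, so this totals exponents; the dollar amounts are in cropDamage
crop_damage <- summarize(group_by(storm_crop,EVTYPE),total_crop_damage = sum(CROPDMGEXP))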
crop_rank <- arrange(crop_damage,desc(total_crop_damage))
crop_rank <- head(crop_rank,n=10)
crop_rank
## Source: local data frame [10 x 2]
##
##               EVTYPE total_crop_damage
## 1               HAIL            248610
## 2  THUNDERSTORM WIND            244488
## 3        FLASH FLOOD             65580
## 4              FLOOD             41916
## 5          HIGH WIND             34644
## 6            TORNADO             29037
## 7          TSTM WIND             20226
## 8       WINTER STORM             20166
## 9     WINTER WEATHER             19980
## 10        HEAVY SNOW             18084
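The two rankings above total the `PROPDMGEXP` and `CROPDMGEXP` columns, which after `convertData` hold the power-of-ten exponent rather than a dollar amount. A minimal base-R sketch of totalling the dollar-valued column instead, on made-up data shaped like the output of `convertData` (the `storm_crop_demo` frame and its values are illustrative only):

```r
# Toy data in the shape produced by convertData(): the exponent column is
# already numeric and cropDamage holds the damage in dollars
storm_crop_demo <- data.frame(
  EVTYPE     = c("HAIL", "HAIL", "DROUGHT", "FLOOD"),
  CROPDMG    = c(5, 10, 2, 1.5),
  CROPDMGEXP = c(3, 3, 9, 6),
  stringsAsFactors = FALSE
)
storm_crop_demo$cropDamage <-
  storm_crop_demo$CROPDMG * 10^storm_crop_demo$CROPDMGEXP

# Total the dollar-valued column per event type
crop_total <- aggregate(cropDamage ~ EVTYPE, data = storm_crop_demo, FUN = sum)

# Rank event types from largest to smallest total
crop_total <- crop_total[order(-crop_total$cropDamage), ]
crop_total
```

With dplyr, the equivalent change would be to pass `sum(cropDamage)` (or `sum(propertyDamage)`) to `summarize` in place of the exponent column. The resulting ranking on the real data is not shown here because it has not been recomputed.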
# Plot a bar chart of the crop damage ranking
damagePlot <- qplot(EVTYPE, data = crop_rank, weight = total_crop_damage, geom = "bar") +
  scale_y_continuous("Sum of Crop Damage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe Weather Type") +
  ggtitle("Total Crop Damage by Severe Weather\n Events in the U.S.\n from 1980 - 2011")
damagePlot