In this report, we analyze the impact of different weather events on public health and the economy, based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers 1950 to 2011. We use the estimates of fatalities, injuries, property damage, and crop damage to decide which types of events are most harmful to population health and the economy. From these data, we found that excessive heat and tornadoes are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
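First, we download the compressed data file, unless it is already in the working directory. No separate unzip step is needed, because read.csv can read the bz2-compressed file directly.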
if (!"repdata_data_StormData.csv.bz2" %in% dir("."))
{
print("Downloading Data")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata_data_StormData.csv.bz2")
}
## [1] "Downloading Data"
if (!"stormData" %in% ls())
{
stormData <- read.csv("repdata_data_StormData.csv.bz2")
}
dim(stormData)
## [1] 902297 37
Then, we read the CSV file. If the data already exists in the working environment, we do not load it again; otherwise, we read it from the file.
fatalitiesAndInjuries <- stormData %>% group_by (EVTYPE) %>% summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
fatalitiesAndInjuries <- mutate(fatalitiesAndInjuries, total = Fatalities + Injuries)
topTen <- fatalitiesAndInjuries[order(fatalitiesAndInjuries$total, decreasing = TRUE),][1:10,]
topTen
## # A tibble: 10 x 4
##    EVTYPE            Fatalities Injuries total
##    <chr>                  <dbl>    <dbl> <dbl>
##  1 TORNADO                 5633    91346 96979
##  2 EXCESSIVE HEAT          1903     6525  8428
##  3 TSTM WIND                504     6957  7461
##  4 FLOOD                    470     6789  7259
##  5 LIGHTNING                816     5230  6046
##  6 HEAT                     937     2100  3037
##  7 FLASH FLOOD              978     1777  2755
##  8 ICE STORM                 89     1975  2064
##  9 THUNDERSTORM WIND        133     1488  1621
## 10 WINTER STORM             206     1321  1527
There are 902297 rows and 37 columns in total. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
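# Select the event type and damage columns by name (columns 8 and 25 to 28)
propertyAndCropDmgData <- stormData[ , c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]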
table(propertyAndCropDmgData$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
Based on the above histogram, we see that the number of events tracked starts to increase significantly around 1995. So, we use the subset of the data from 1990 to 2011 to make the most of the better records.
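The year-based subset described above can be sketched as follows. This is a minimal illustration on invented rows, not the report's own code. It assumes the event date is stored in a BGN_DATE column as text in month/day/year form, and the names toyStorm, year, and recentStorm are hypothetical.

```r
# Toy rows invented for illustration; the BGN_DATE column name and its
# "month/day/year time" text format are assumptions about the storm data
toyStorm <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1989 0:00:00",
               "1/1/1990 0:00:00", "11/28/2011 0:00:00"),
  EVTYPE = c("TORNADO", "HAIL", "FLOOD", "WINTER STORM"),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year
toyStorm$year <- as.numeric(format(as.Date(toyStorm$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep only events from 1990 onwards
recentStorm <- toyStorm[toyStorm$year >= 1990, ]
nrow(recentStorm)
```

Applied to the full data set, the same filter would be written against stormData instead of toyStorm.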
table(propertyAndCropDmgData$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
Now, there are 681500 rows and 38 columns in total.
In this section, we examine the number of fatalities and injuries caused by severe weather events. We would like to identify the 15 most severe types of weather events.
propertyAndCropDmgData <- mutate(propertyAndCropDmgData, PropDmgInDollars = PROPDMG, CropDmgInDollars = CROPDMG)
propertyAndCropDmgData$PROPDMGEXP[!grepl("K|M|B", propertyAndCropDmgData$PROPDMGEXP, ignore.case = TRUE)] <- 0
propertyAndCropDmgData$PROPDMGEXP[grep("K", propertyAndCropDmgData$PROPDMGEXP, ignore.case = TRUE)] <- 3
propertyAndCropDmgData$PROPDMGEXP[grep("M", propertyAndCropDmgData$PROPDMGEXP, ignore.case = TRUE)] <- 6
propertyAndCropDmgData$PROPDMGEXP[grep("B", propertyAndCropDmgData$PROPDMGEXP, ignore.case = TRUE)] <- 9
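# Note: as written, this line matches nothing, because "H"/"h" codes were already
# set to 0 by the first replacement above; adding H to that pattern ("H|K|M|B")
# would apply the hundreds multiplier.
propertyAndCropDmgData$PROPDMGEXP[grep("H", propertyAndCropDmgData$PROPDMGEXP, ignore.case = TRUE)] <- 2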
propertyAndCropDmgData$PropDmgInDollars <- propertyAndCropDmgData$PROPDMG * 10^as.numeric(propertyAndCropDmgData$PROPDMGEXP)
propertyAndCropDmgData$CROPDMGEXP[!grepl("K|M|B", propertyAndCropDmgData$CROPDMGEXP, ignore.case = TRUE)] <- 0
propertyAndCropDmgData$CROPDMGEXP[grep("K", propertyAndCropDmgData$CROPDMGEXP, ignore.case = TRUE)] <- 3
propertyAndCropDmgData$CROPDMGEXP[grep("M", propertyAndCropDmgData$CROPDMGEXP, ignore.case = TRUE)] <- 6
propertyAndCropDmgData$CROPDMGEXP[grep("B", propertyAndCropDmgData$CROPDMGEXP, ignore.case = TRUE)] <- 9
propertyAndCropDmgData$CropDmgInDollars <- propertyAndCropDmgData$CROPDMG * 10^as.numeric(propertyAndCropDmgData$CROPDMGEXP)
dmgByEvent <- propertyAndCropDmgData %>% group_by(EVTYPE) %>% summarise(totalPropDmg = sum(PropDmgInDollars), totalCropDmg = sum(CropDmgInDollars))
## `summarise()` ungrouping output (override with `.groups` argument)
dmgByEvent <- mutate(dmgByEvent, totalDmgInDollars = totalPropDmg + totalCropDmg)
topPropDmg <- dmgByEvent[order(dmgByEvent$totalPropDmg, decreasing = TRUE),][1:10,]
topPropDmg
## # A tibble: 10 x 4
##    EVTYPE            totalPropDmg  totalCropDmg totalDmgInDollars
##    <chr>                    <dbl>         <dbl>             <dbl>
##  1 FLOOD             144657709807    5661968450      150319678257
##  2 HURRICANE/TYPHOON  69305840000    2607872800       71913712800
##  3 TORNADO            56937160779.    414953270       57352114049.
##  4 STORM SURGE        43323536000          5000       43323541000
##  5 FLASH FLOOD        16140812067.   1421317100       17562129167.
##  6 HAIL               15732267048.   3025954473       18758221521.
##  7 HURRICANE          11868319010    2741910000       14610229010
##  8 TROPICAL STORM      7703890550     678346000        8382236550
##  9 WINTER STORM        6688497251      26944000        6715441251
## 10 HIGH WIND           5270046295     638571300        5908617595
We convert the property damage and crop damage data into comparable numerical form according to the units described in the code book (Storm Events). The PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: Hundred (H), Thousand (K), Million (M), or Billion (B).
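The multiplier conversion described above can also be expressed as a single lookup. The following is a minimal sketch on invented values, not the report's own code. The helper expToPower and the toy data frame are hypothetical, and treating every code other than H, K, M, or B as exponent 0 is an assumption.

```r
# Hypothetical helper: map an exponent code to a power of ten.
# Codes other than H, K, M, B (blank, digits, symbols) are treated as
# exponent 0, i.e. a multiplier of 1; this is an assumption.
expToPower <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- lookup[toupper(as.character(code))]  # unmatched codes give NA
  power[is.na(power)] <- 0
  unname(power)
}

# Toy values invented for illustration only
toy <- data.frame(
  DMG = c(2.5, 10, 1, 3, 7),
  EXP = c("K", "m", "B", "h", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars = reported amount times ten to the mapped power
toy$DmgInDollars <- toy$DMG * 10^expToPower(toy$EXP)
toy$DmgInDollars
```

A named-vector lookup keeps the mapping in one place and does not depend on the order in which the codes are replaced.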
topCropDmg <- dmgByEvent[order(dmgByEvent$totalCropDmg, decreasing = TRUE),][1:10,]
topCropDmg
## # A tibble: 10 x 4
##    EVTYPE            totalPropDmg  totalCropDmg totalDmgInDollars
##    <chr>                    <dbl>         <dbl>             <dbl>
##  1 DROUGHT             1046106000   13972566000       15018672000
##  2 FLOOD             144657709807    5661968450      150319678257
##  3 RIVER FLOOD         5118945500    5029459000       10148404500
##  4 ICE STORM           3944927860    5022113500        8967041360
##  5 HAIL               15732267048.   3025954473       18758221521.
##  6 HURRICANE          11868319010    2741910000       14610229010
##  7 HURRICANE/TYPHOON  69305840000    2607872800       71913712800
##  8 FLASH FLOOD        16140812067.   1421317100       17562129167.
##  9 EXTREME COLD          67737400    1292973000        1360710400
## 10 FROST/FREEZE           9480000    1094086000        1103566000
As for the impact on public health, we obtain two sorted lists of severe weather events below, ranked by the number of people affected.
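One way to produce two such sorted lists, one by fatalities and one by injuries, is sketched below. The five rows are copied from the top-ten table in this report, so the resulting rankings are illustrative only and are not rankings over all event types. The names health, topN, byFatalities, and byInjuries are hypothetical.

```r
# First five rows of the report's top-ten table (by combined total);
# other event types are omitted, so these rankings are illustrative only
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  Fatalities = c(5633, 1903, 504, 470, 816),
  Injuries = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)

topN <- 3  # hypothetical cutoff for this sketch

# Sort once per measure, largest first, and keep the top rows
byFatalities <- head(health[order(health$Fatalities, decreasing = TRUE), ], topN)
byInjuries <- head(health[order(health$Injuries, decreasing = TRUE), ], topN)

byFatalities$EVTYPE
byInjuries$EVTYPE
```

Ranking each measure separately matters because the orderings differ: within these five rows, lightning ranks third by fatalities but does not make the top three by injuries.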
ggplot(topTen, aes(total, EVTYPE)) + geom_bar(stat = "identity", fill = "red") + ggtitle("Events Responsible for most Fatalities and Injuries") + ylab("Event")+ xlab("Total Fatalities and Injuries")
The following is a pair of graphs showing the total fatalities and total injuries caused by these severe weather events.