Objectives. To determine which weather events have the most severe impact on public health and on the economy, so that preventive measures can be planned in advance.
Methods. Data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events examined span from 1950 to November 2011. The severity of each weather event was established from the recorded estimates of fatalities, injuries, crop damage and property damage. These estimates are used to answer two questions: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?
Results. Based on the analysis, tornadoes are indicated to be the most severe event type, causing the most harm to population health as well as to property. River floods are indicated to cause the most severe crop damage.
The workspace is cleared and the required packages (R.utils, dplyr, ggplot2 and gridExtra) are loaded.
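The loading code itself is not shown in the rendered report. A minimal sketch of this step, assuming the packages are the ones whose functions appear later (R.utils for `bunzip2()`, dplyr for `glimpse()`, ggplot2 for the charts, gridExtra for `grid.arrange()`), might look like the following. The names `pkgs`, `installed` and `not_installed` are illustrative, and `rm(list = ls())` is only an assumed way of clearing the workspace.

```r
# Assumed way of clearing the workspace; uncomment if wanted.
# rm(list = ls())

# Assumption: these are the packages the analysis relies on.
pkgs <- c("R.utils", "dplyr", "ggplot2", "gridExtra")

# Check which packages are available without attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach the available packages while silencing their startup messages.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Wrapping `library()` in `suppressPackageStartupMessages()` keeps the "Attaching package" and masking notices out of the rendered report.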
temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", temp, mode = "wb") # binary mode keeps the .bz2 archive intact on Windows
bunzip2(temp, "StormData.csv", remove = FALSE, skip = TRUE)
## [1] "StormData.csv"
## attr(,"temporary")
## [1] FALSE
Data <- read.csv("StormData.csv")
unlink(temp)
glimpse(Data) # brief overview of the data
## Observations: 902,297
## Variables: 37
## $ STATE__ <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ BGN_DATE <fctr> 4/18/1950 0:00:00, 4/18/1950 0:00:00, 2/20/1951 0:...
## $ BGN_TIME <fctr> 0130, 0145, 1600, 0900, 1500, 2000, 0100, 0900, 20...
## $ TIME_ZONE <fctr> CST, CST, CST, CST, CST, CST, CST, CST, CST, CST, ...
## $ COUNTY <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, ...
## $ COUNTYNAME <fctr> MOBILE, BALDWIN, FAYETTE, MADISON, CULLMAN, LAUDER...
## $ STATE <fctr> AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL...
## $ EVTYPE <fctr> TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TORNA...
## $ BGN_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ BGN_AZI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ BGN_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_DATE <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_TIME <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ END_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ END_AZI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ LENGTH <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, ...
## $ WIDTH <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400...
## $ F <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, ...
## $ MAG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, ...
## $ INJURIES <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50...
## $ PROPDMG <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25....
## $ PROPDMGEXP <fctr> K, K, K, K, K, K, K, K, K, K, M, M, K, K, K, K, K,...
## $ CROPDMG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ CROPDMGEXP <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ WFO <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ STATEOFFIC <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ ZONENAMES <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ LATITUDE <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 333...
## $ LONGITUDE <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 874...
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, ...
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, ...
## $ REMARKS <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ REFNUM <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...
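As a side note on the download step, `read.csv()` can read a bzip2-compressed file directly, so the explicit `bunzip2()` call is optional. The sketch below demonstrates this on a small temporary file rather than the full storm data; the `toy` data frame and the `tmp` name are made up for illustration.

```r
# Write a tiny made-up CSV through a bzip2 connection.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 2))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression.
toy_back <- read.csv(tmp)
unlink(tmp)
print(toy_back)
```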
From the data above, we are interested in injuries and fatalities, which directly affect the population, and in property damage and crop damage, which capture the economic cost. The injury and fatality counts are summed by event type.
# group injuries and fatalities by event
Data2 <- aggregate(cbind(INJURIES, FATALITIES)~EVTYPE, data=Data, sum, na.rm=TRUE)
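To make the grouping step concrete, here is a small sketch of the same `aggregate()` pattern on a made-up three-row data frame (`toy` and `toy_totals` are illustrative, not part of the storm data). Each event type collapses to one row holding the column sums.

```r
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO"),
  INJURIES   = c(15, 2, 6),
  FATALITIES = c(1, 0, 4)
)

# cbind() on the left-hand side sums both columns in one call.
toy_totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_totals)
```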
To address the first question (which types of events are most harmful with respect to population health?), the grouped data are sorted in descending order of injuries and, separately, of fatalities. The top 5 event types are then selected from each ranking.
Data3a <- Data2[order(-Data2$INJURIES),] # sort by descending number of injuries
Data3b <- Data2[order(-Data2$FATALITIES),] # sort by descending number of fatalities
Data3c <- Data3a[1:5,] # select top 5 by injuries
Data3d <- Data3b[1:5,] # select top 5 by fatalities
head(Data3c)
## EVTYPE INJURIES FATALITIES
## 834 TORNADO 91346 5633
## 856 TSTM WIND 6957 504
## 170 FLOOD 6789 470
## 130 EXCESSIVE HEAT 6525 1903
## 464 LIGHTNING 5230 816
head(Data3d)
## EVTYPE INJURIES FATALITIES
## 834 TORNADO 91346 5633
## 130 EXCESSIVE HEAT 6525 1903
## 153 FLASH FLOOD 1777 978
## 275 HEAT 2100 937
## 464 LIGHTNING 5230 816
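The sort-then-slice pattern used above can be wrapped in a small helper so the two rankings share one code path. The function name `top_by` and the toy totals are hypothetical; this is only a sketch of the pattern, not the code used in the analysis.

```r
# Return the n rows of df with the largest values in the named column.
top_by <- function(df, column, n = 5) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

# Made-up totals for three event types.
toy_totals <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  INJURIES   = c(10, 30, 20),
  FATALITIES = c(3, 1, 2)
)

top_injuries   <- top_by(toy_totals, "INJURIES", n = 2)
top_fatalities <- top_by(toy_totals, "FATALITIES", n = 2)
```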
To address the second question (which types of events have the greatest economic consequences?), we first inspect the codes stored in the PROPDMGEXP and CROPDMGEXP columns.
# Overview of what we are working on
summary(Data$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
summary(Data$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
# Map the magnitude codes to numeric multipliers; any other code (including blank) becomes NA
exp_map <- c(h = 10^2, H = 10^2, k = 10^3, K = 10^3, m = 10^6, M = 10^6, B = 10^9)
Data$PROPDMGEXP <- unname(exp_map[as.character(Data$PROPDMGEXP)])
Data$CROPDMGEXP <- unname(exp_map[as.character(Data$CROPDMGEXP)])
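Because every code other than h/H, k/K, m/M and B is turned into NA, it can be worth checking how many records lose their multiplier. The sketch below does this on a made-up vector of codes, using a named lookup vector; `codes`, `lookup`, `multiplier`, `unmapped` and `share_unmapped` are illustrative names.

```r
# Made-up sample of magnitude codes, including blank and unexpected ones.
codes <- c("K", "M", "", "k", "B", "?", "5", "h")

lookup <- c(h = 1e2, H = 1e2, k = 1e3, K = 1e3, m = 1e6, M = 1e6, B = 1e9)

# Character indexing matches by name; codes without a match yield NA.
multiplier <- unname(lookup[codes])

# Which codes failed to map, and what share of records they represent.
unmapped <- table(codes[is.na(multiplier)])
share_unmapped <- mean(is.na(multiplier))
print(unmapped)
print(share_unmapped)
```

Running the same check on the real columns before overwriting them would show how much of the data the later averages are based on.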
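To estimate the cost of the damage, we multiply the property damage figure by its multiplier and the crop damage figure by its multiplier. The resulting costs are averaged per event type and sorted in descending order, and the top 5 event types are selected for each damage category. Note that records whose code is not one of h/H, k/K, m/M or B (including blank codes) have an NA multiplier, and aggregate() drops any record with an NA in either damage column before averaging.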
Data$PropDamage <- Data$PROPDMG * Data$PROPDMGEXP
Data$CropDamage <- Data$CROPDMG * Data$CROPDMGEXP
Data4 <- aggregate(cbind(PropDamage, CropDamage)~EVTYPE, data=Data, mean, na.rm=TRUE)
head(Data4)
## EVTYPE PropDamage CropDamage
## 1 ASTRONOMICAL HIGH TIDE 5000.00 0.00
## 2 ASTRONOMICAL LOW TIDE 1839.08 0.00
## 3 AVALANCHE 15492.21 0.00
## 4 BLIZZARD 54524.11 64328.36
## 5 COASTAL FLOOD 321035.56 0.00
## 6 COASTAL FLOODING 12650000.00 28000.00
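One subtlety of the `aggregate()` call above: with the formula interface, a row is dropped if either `PropDamage` or `CropDamage` is NA, so the means are computed only over records where both multipliers were recognised. A toy illustration follows; the `toy` data frame, its columns `g`, `a`, `b` and the result names are made up.

```r
toy <- data.frame(
  g = c("x", "x", "y", "y"),
  a = c(1, NA, 3, 5),
  b = c(10, 20, NA, 40)
)

# Joint aggregation: rows 2 and 3 are removed because one column is NA.
joint <- aggregate(cbind(a, b) ~ g, data = toy, FUN = mean)

# Separate aggregation: only rows where 'a' itself is NA are removed.
separate <- aggregate(a ~ g, data = toy, FUN = mean)

print(joint)     # mean of a for group y uses only the value 5
print(separate)  # mean of a for group y uses 3 and 5
```

Aggregating each damage column on its own is one way to avoid discarding records that have a valid value in only one of the two columns.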
Data4a <- Data4[order(-Data4$PropDamage),] # sort by descending mean property damage
Data4b <- Data4[order(-Data4$CropDamage),] # sort by descending mean crop damage
Data4c <- Data4a[1:5,] #select top 5 property damage
Data4d <- Data4b[1:5,] #select top 5 crop damage
The top 5 types of events that are most harmful with respect to population health
colnames(Data3c) <- c('EVTYPE', 'INJURIES', 'FATALITIES')
Injuries <- ggplot(Data3c, aes(x="", y=INJURIES, fill= EVTYPE)) + geom_bar(width = 1, stat = "identity")+ ggtitle('Top 5 Storm Events by Injuries')
pieInjuries <- Injuries + coord_polar("y", start = 0)
colnames(Data3d) <- c('EVTYPE', 'INJURIES', 'FATALITIES')
Fatalities <- ggplot(Data3d, aes(x="", y=FATALITIES, fill= EVTYPE)) + geom_bar(width = 1, stat = "identity")+ ggtitle('Top 5 Storm Events by Fatalities')
pieFatalities <- Fatalities + coord_polar("y", start = 0)
grid.arrange(pieInjuries, pieFatalities, nrow=2, heights=c(2.5, 2.5),
top=" Storm events with most severe consequences to public health from 1950 to 2011")
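A pie chart encodes each event's share of the top-5 total. As a numeric complement, the shares can be computed directly from the injury totals printed earlier; the vector names `injuries` and `injury_share` are illustrative.

```r
# Injury totals for the top 5 event types, copied from the table above.
injuries <- c(TORNADO = 91346, `TSTM WIND` = 6957, FLOOD = 6789,
              `EXCESSIVE HEAT` = 6525, LIGHTNING = 5230)

# Percentage of the top-5 total contributed by each event type.
injury_share <- round(100 * prop.table(injuries), 1)
print(injury_share)
```

Tornadoes account for roughly 78% of the injuries among these five event types, which is why the tornado slice dominates the injuries chart.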
The top 5 types of events that have the greatest economic consequences
colnames(Data4c) <- c('EVTYPE', 'PropDamage', 'CropDamage')
Properties <- ggplot(Data4c, aes(x="", y=PropDamage, fill= EVTYPE)) + geom_bar(width = 1, stat = "identity")+ ggtitle('Top 5 Storm Events by cost of property damages')
pieProperties <- Properties + coord_polar("y", start = 0)
colnames(Data4d) <- c('EVTYPE', 'PropDamage', 'CropDamage')
Crop <- ggplot(Data4d, aes(x="", y=CropDamage, fill= EVTYPE)) + geom_bar(width = 1, stat = "identity")+
ggtitle('Top 5 Storm Events by cost of crop damages')
pieCrop <- Crop + coord_polar("y", start = 0)
grid.arrange(pieProperties, pieCrop, nrow=2, heights=c(2.5, 2.5),
top=" Storm events with greatest economic consequences from 1950 to 2011")
Based on the analysis, tornadoes are indicated to be the most severe event type, causing the most harmful consequences for population health as well as for property. River floods are indicated to cause the most severe crop damage.