Storms and other severe weather events harm population health and damage property, which in turn affects a country’s economic conditions. The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database collects data from across the country so that the causes of the greatest health and economic consequences can be studied and preventive action taken. The results of this analysis address the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis was performed on the Storm Events Database, provided by the National Climatic Data Center. The data comes from a comma-separated-value file available here. Some documentation of the data is also available here.
Downloading data file
#setting local working directory
setwd("C:/Data/devtools/Git/RepData_PeerAssessment2")
library(knitr)
library(ggplot2)
#suppressMessages hides the package startup messages
suppressMessages(library(dplyr))
#setting working directory for knit
opts_knit$set(base.dir = "C:/Data/devtools/Git/RepData_PeerAssessment2")
stdata <- NULL
#Checking for file in current directory
if (!file.exists("PA2_StormData.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "PA2_StormData.bz2", mode = "wb")
}
Reading and checking data from file
#reading data from csv file
stdata <- read.csv(bzfile("PA2_StormData.bz2"))
#getting column names
colnms <- names(stdata)
#rows & columns
rws <- nrow(stdata); cls <- ncol(stdata)
Data from file:
. Number of rows 902297
. Number of columns 37
Filtering the columns required for analysis from the data frame, then plotting a histogram to show how much data is available for each year in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (1950-2011).
knitr::opts_chunk$set(fig.width=40, fig.height=20, fig.path='figs/', warning=FALSE, message=FALSE)
#getting required data for analysis
prcdata <- stdata
names(prcdata) <- toupper(names(prcdata))
#getting required columns
prcdata <- prcdata[,c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#formatting date
prcdata <- mutate(prcdata, BGN_DATE = as.Date(as.character(prcdata$BGN_DATE), "%m/%d/%Y"))
#First and last event dates
minDate <- min(prcdata$BGN_DATE)
maxDate <- max(prcdata$BGN_DATE)
#adding year column
prcdata$YEAR <- as.integer(format(prcdata$BGN_DATE, "%Y"))
opar=par(ps=26)
hist(prcdata$YEAR, breaks = 45, main="Number of events recorded per year", xlab="Year", ylab="Number of events", cex=1.0, cex.main=2.5)
The histogram supports the statement that the events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
NOAA collected a considerable number of measurements for major storms and weather events from 1970 to 2011, so the analysis uses that period.
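The choice of 1970 as the cutoff can be checked by tabulating the number of events per year. The following is a minimal sketch only: the `years` vector is a made-up stand-in for `prcdata$YEAR`, and its values are illustrative rather than taken from the database.

```r
# Hypothetical stand-in for prcdata$YEAR; values are illustrative only
years <- c(1950, 1950, 1969, 1970, 1970, 1970, 2011, 2011)

# Count the events recorded in each year
events_per_year <- table(years)

# Share of all events that fall in 1970 or later
share_since_1970 <- mean(years >= 1970)
```

On the real data, replacing `years` with `prcdata$YEAR` would give the per-year counts that the histogram summarizes.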
Event type (EVTYPE) values need cleaning (consistent case, punctuation, and leading/trailing spaces) to get proper counts and labels from the data.
#filtering data from 1970 to 2011
strmdata <- filter(prcdata, YEAR >= 1970)
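#converting to upper case
evntlbls <- toupper(strmdata$EVTYPE)
#trimming leading and trailing whitespace
evntlbls <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", evntlbls)
#replacing each blank or punctuation character with a space
evntlbls <- gsub("[[:blank:][:punct:]+]", " ", evntlbls)
#note: labels are upper case at this point, so this lower-case pattern matches nothing
#and TSTM WIND and THUNDERSTORM WIND stay separate event types in the results
evntlbls <- gsub("^thunderstorm wind[:alnum:] | ^tstm wind[:alnum:]", "thunderstorm wind", evntlbls)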
#updating data with updated labels
strmdata$EVTYPE <- evntlbls
#unique(strmdata$EVTYPE)
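If the thunderstorm wind variants are meant to be merged into a single label, the pattern has to be written in upper case and anchored properly. The following is a sketch under that assumption, not the method used for the results in this report. The `labels` vector is made up for illustration, and folding compound labels such as `TSTM WIND/HAIL` into `THUNDERSTORM WIND` is a design choice that may not suit every analysis.

```r
# Illustrative labels only; the real values live in strmdata$EVTYPE
labels <- c(" TSTM WIND", "Thunderstorm Winds", "TSTM WIND/HAIL", "FLASH FLOOD ")

clean <- toupper(labels)
# Runs of punctuation or whitespace become a single space
clean <- gsub("[[:punct:][:space:]]+", " ", clean)
# Drop leading and trailing spaces
clean <- trimws(clean)
# Collapse every label that starts with a thunderstorm-wind variant into one label
clean <- sub("^(TSTM|THUNDERSTORM) WINDS?.*$", "THUNDERSTORM WIND", clean)
```

Applying this to the real data would merge the TSTM WIND and THUNDERSTORM WIND rows, so the totals would differ from those reported here.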
Subsetting the weather events that are most harmful to population health and those with the greatest economic consequences.
#Getting harmful events data from dataframe
hdata <- filter(strmdata,strmdata$FATALITIES > 0 | strmdata$INJURIES > 0)
#harmful data rows count
nrow(hdata)
## [1] 19585
#Fatalities events counts
fatcounts <- aggregate(FATALITIES ~ EVTYPE,data=hdata,FUN=sum)
#InjuryEvents by aggregation
injcounts <- aggregate(INJURIES ~ EVTYPE,data=hdata,FUN=sum)
#Top ten records for FATALITIES and INJURIES
fatTop10 <- head(fatcounts[order(fatcounts$FATALITIES, decreasing = T), ], 10)
injTop10 <- head(injcounts[order(injcounts$INJURIES, decreasing = T), ], 10)
# Updating column names
colnames(fatTop10) <- c("Event", "Fatalities")
colnames(injTop10) <- c("Event", "Injuries")
. Fatal Events
. Injury Events
fatTop10
## Event Fatalities
## 169 TORNADO 3272
## 27 EXCESSIVE HEAT 1903
## 36 FLASH FLOOD 978
## 61 HEAT 937
## 111 LIGHTNING 816
## 176 TSTM WIND 504
## 41 FLOOD 470
## 135 RIP CURRENT 368
## 82 HIGH WIND 248
## 2 AVALANCHE 224
injTop10
## Event Injuries
## 169 TORNADO 59611
## 176 TSTM WIND 6957
## 41 FLOOD 6789
## 27 EXCESSIVE HEAT 6525
## 111 LIGHTNING 5230
## 61 HEAT 2100
## 105 ICE STORM 1975
## 36 FLASH FLOOD 1777
## 158 THUNDERSTORM WIND 1488
## 59 HAIL 1361
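Fatalities and injuries can also be summed in a single call and combined into one ranking. The sketch below assumes that adding the two counts is an acceptable measure of harm, which this report does not claim. The `hdata_demo` data frame and the `CASUALTIES` column are hypothetical and exist only for illustration.

```r
# Hypothetical toy data with the same column names as hdata
hdata_demo <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 3),
  INJURIES   = c(10, 0, 4, 2, 1)
)

# Sum both measures per event type in one aggregate() call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = hdata_demo, FUN = sum)

# Assumed combined measure: fatalities plus injuries
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES

# Keep the two most harmful event types
top <- head(totals[order(totals$CASUALTIES, decreasing = TRUE), ], 2)
```

Keeping the two measures separate, as the tables above do, avoids weighting a fatality the same as an injury.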
par(mfrow = c(1, 2), mar = c(14, 6, 4, 3), mgp = c(2, 1, 0), cex = 1.0, cex.lab=2, cex.main=2.5)
ylim <- c(0, 1.1*max(fatTop10$Fatalities))
fatalPlot <- barplot(fatTop10$Fatalities, names.arg = fatTop10$Event, main = 'Top 10 events for fatalities', ylab = 'Number of fatalities', ylim = ylim, cex.axis = 2)
text(x = fatalPlot, y = fatTop10$Fatalities, label = round(fatTop10$Fatalities, 0), pos = 3)
ylim <- c(0, 1.1*max(injTop10$Injuries))
injuryPlot <- barplot(injTop10$Injuries, names.arg = injTop10$Event, main = 'Top 10 events for injuries', ylab = 'Number of injuries', ylim = ylim, cex.axis = 2)
text(x = injuryPlot, y = injTop10$Injuries, label = round(injTop10$Injuries, 0), pos = 3)
#Economic consequence events data
edata <- filter(strmdata, strmdata$PROPDMG > 0 | strmdata$CROPDMG > 0)
#economic data rows count
nrow(edata)
## [1] 235473
#Function to convert a damage exponent code to a power of ten:
# h -> hundred (2), k -> thousand (3), m -> million (6), b -> billion (9)
#Caveat: if e is a factor (the read.csv default before R 4.0), as.numeric(e)
#returns the internal level code, not the digit; use as.numeric(as.character(e))
#to read the digit itself
convertCurrUnit <- function(e)
{
  if (e %in% c('h', 'H')) {
    return(2)
  } else if (e %in% c('k', 'K')) {
    return(3)
  } else if (e %in% c('m', 'M')) {
    return(6)
  } else if (e %in% c('b', 'B')) {
    return(9)
  } else if (!is.na(as.numeric(e))) { # if a digit
    return(as.numeric(e))
  } else if (e %in% c('', '-', '?', '+')) {
    return(0)
  } else {
    stop("Not valid.")
  }
}
#Getting property damage
edata$PROPDMG <- edata$PROPDMG * (10 ** sapply(edata$PROPDMGEXP, FUN=convertCurrUnit))
#Getting crop damage
edata$CROPDMG <- edata$CROPDMG * (10 ** sapply(edata$CROPDMGEXP, FUN=convertCurrUnit))
# Total property and crop damage per event type
prcounts <- aggregate(PROPDMG ~ EVTYPE,data=edata,FUN=sum)
crcounts <- aggregate(CROPDMG ~ EVTYPE,data=edata,FUN=sum)
# Events with the greatest economic cost
prevntTop10 <- head(prcounts[order(prcounts$PROPDMG, decreasing = T), ], 10)
crevntTop10 <- head(crcounts[order(crcounts$CROPDMG, decreasing = T), ], 10)
# Updating column names
colnames(prevntTop10) <- c("Event", "propDMG")
colnames(crevntTop10) <- c("Event", "cropDMG")
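The exponent conversion can also be written as a vectorized lookup that always works on character values, so it gives the same answer whether the column was read as a factor or as character. The following is a sketch under stated assumptions and was not used for the figures in this report. `exp_to_power` is a hypothetical helper, the input values are made up, and unlike `convertCurrUnit` it maps unrecognized codes to 0 instead of stopping.

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to powers of ten, always working on character values
exp_to_power <- function(codes) {
  codes <- toupper(as.character(codes))           # safe for factor or character input
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[codes])                  # NA for anything not in the lookup
  is_digit <- grepl("^[0-9]$", codes)
  power[is_digit] <- as.numeric(codes[is_digit])  # a digit is used as the exponent itself
  power[is.na(power)] <- 0                        # '', '-', '?', '+' and unknown codes -> 10^0
  power
}

# Illustrative values only
dmg  <- c(25, 2.5, 1, 10, 3)
code <- factor(c("K", "m", "B", "", "5"))
dollars <- dmg * 10^exp_to_power(code)
```

Because the lookup is vectorized, it replaces the `sapply` call with a single expression such as `edata$PROPDMG * 10^exp_to_power(edata$PROPDMGEXP)`. Results computed this way may differ from the tables in this report wherever the exponent column was stored as a factor.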
. Property damage
. Crop damage
prevntTop10
## Event propDMG
## 49 FLASH FLOOD 6.820237e+13
## 290 THUNDERSTORM WINDS 2.086532e+13
## 314 TORNADO 1.073677e+12
## 94 HAIL 3.157558e+11
## 196 LIGHTNING 1.729433e+11
## 62 FLOOD 1.446577e+11
## 172 HURRICANE TYPHOON 6.930584e+10
## 69 FLOODING 5.920825e+10
## 263 STORM SURGE 4.332354e+10
## 126 HEAVY SNOW 1.793259e+10
crevntTop10
## Event cropDMG
## 31 DROUGHT 13972566000
## 62 FLOOD 5661968450
## 230 RIVER FLOOD 5029459000
## 181 ICE STORM 5022113500
## 94 HAIL 3025974480
## 164 HURRICANE 2741910000
## 172 HURRICANE TYPHOON 2607872800
## 49 FLASH FLOOD 1421317100
## 44 EXTREME COLD 1312973000
## 81 FROST FREEZE 1094186000
par(mfrow = c(1, 2), mar = c(12, 5, 3, 2), mgp = c(3, 1, 0), cex = 1.0, las = 3, cex.lab=2, cex.main=2.5)
prdmgplot <- barplot((prevntTop10$propDMG/1000000000), names.arg = prevntTop10$Event, main = 'Top 10 events for property damage', ylab = 'Property damage (billions of dollars)', log="y")
crdmgplot <- barplot((crevntTop10$cropDMG/1000000000), names.arg = crevntTop10$Event, main = 'Top 10 events for crop damage', ylab = 'Crop damage (billions of dollars)', log="y")
This report shows that Flash Flood, Thunderstorm Winds, Tornado, Hail, Lightning, and Flood weather events caused huge property damage (billions of dollars) across the United States.
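Drought, Flood, River Flood, Ice Storm, Hail, Hurricane, and Hurricane Typhoon events caused the greatest crop damage across the United States. For population health, Tornado was the most harmful event type, with the highest number of both fatalities (3272) and injuries (59611), followed by Excessive Heat for fatalities and TSTM Wind for injuries. Building the necessary infrastructure to predict weather events early, keeping the necessary equipment and medication available, and publishing safety precautions could help reduce population health problems.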
Execute the script below on the command line (or in the R console) to generate the plot images and place them in the ‘./figure’ folder:
knit2html("PA2_template.Rmd", "PA2_template.html")