Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
For this assignment you will need some specific tools:
RStudio: You will need RStudio to publish your completed analysis document to RPubs. You can also use RStudio to edit and write your analysis.
knitr: You will need the knitr package to compile your R Markdown document and convert it to HTML.
Before beginning the project, load the required R libraries and set any environment variables. Setting message to FALSE in the R Markdown chunk options suppresses the messages printed while libraries load, such as version numbers and dependency notices. Results can vary slightly between package versions, so updating these libraries to their latest versions should give results close to those shown here.
# load libraries
library(data.table)
library(knitr)
library(R.utils)
library(magrittr)
library(ggplot2)
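As a minimal sketch of the message suppression described above, the chunk option can be set once for the whole document with knitr's opts_chunk. Placing this call in a setup chunk at the top of the document is an assumption here, not something the original analysis shows.

```r
library(knitr)

# Assumed setup chunk: turn off package start-up messages for every
# chunk that follows, so library() calls knit without extra output.
opts_chunk$set(message = FALSE)

# Confirm the option is now the document-wide default.
opts_chunk$get("message")
```

The same option can instead be set on a single chunk header if only the library-loading chunk should be quiet.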
#### A. Loading data
# Load Data and read data
dat <- bzfile("repdata%2Fdata%2FStormData.csv.bz2")
dat2 <- read.csv(dat)
dim(dat2)
## [1] 902297 37
names(dat2)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
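Only three of the 37 columns are used in the analysis that follows. The sketch below wraps the same bzfile and read.csv calls in a helper that returns just those columns; the function name read_storm_columns and the small temporary demo file are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: read a bzip2-compressed CSV and keep only the
# columns needed for the population-impact analysis.
read_storm_columns <- function(path,
                               columns = c("EVTYPE", "FATALITIES", "INJURIES")) {
  storm <- read.csv(bzfile(path), stringsAsFactors = FALSE)
  missing_cols <- setdiff(columns, names(storm))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  storm[, columns, drop = FALSE]
}

# Demo on a tiny made-up file, so the sketch runs without the real data.
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, open = "wt")
write.csv(
  data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT"),
    FATALITIES = c(3, 1),
    INJURIES = c(12, 4),
    REMARKS = c("demo row", "demo row")
  ),
  con,
  row.names = FALSE
)
close(con)

demo <- read_storm_columns(demo_path)
str(demo)
```

With the real file, the call would be read_storm_columns("repdata%2Fdata%2FStormData.csv.bz2").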
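# Subset the columns needed for the human population impact analysis.
total <- dat2[, c("EVTYPE", "FATALITIES", "INJURIES")]
total$EVTYPE <- as.character(total$EVTYPE)
# Sum fatalities and injuries for each event type.
aggdata <- aggregate(total[, c("FATALITIES", "INJURIES")], by = list(total$EVTYPE), FUN = sum, na.rm = TRUE)
names(aggdata)[1] <- "EVTYPE"
# Rank event types by fatalities, breaking ties by injuries.
mp <- aggdata[order(-aggdata$FATALITIES, -aggdata$INJURIES), ]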
head(mp, n=10)
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 153 FLASH FLOOD 978 1777
## 275 HEAT 937 2100
## 464 LIGHTNING 816 5230
## 856 TSTM WIND 504 6957
## 170 FLOOD 470 6789
## 585 RIP CURRENT 368 232
## 359 HIGH WIND 248 1137
## 19 AVALANCHE 224 170
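To make the aggregate-then-order step easier to follow, here is the same pattern on a small made-up data frame. The toy values and the names toy, toy_totals, and toy_ranked are illustrative only and are not taken from the storm database.

```r
# Made-up records: several rows per event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 4, 1, 0),
  INJURIES = c(10, 0, 25, 2, 5),
  stringsAsFactors = FALSE
)

# Sum both outcome columns within each event type.
toy_totals <- aggregate(
  toy[, c("FATALITIES", "INJURIES")],
  by = list(EVTYPE = toy$EVTYPE),
  FUN = sum,
  na.rm = TRUE
)

# Sort by fatalities (descending), using injuries to break ties.
toy_ranked <- toy_totals[order(-toy_totals$FATALITIES, -toy_totals$INJURIES), ]
toy_ranked
```

Naming the grouping element inside list() sets the column name directly, which avoids the separate renaming step.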
# Keep the 20 event types with the most fatalities.
topf <- head(mp, n=20)
# First analysis: top 20 event types by fatalities (log10 scale)
ggplot(data = topf, aes(x = EVTYPE, y = log10(FATALITIES))) +
  geom_bar(stat = "identity", fill = "#76eec6", colour = "black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Top 20 Events causing fatalities in US") +
  labs(y = expression(log[10](FATALITIES)), x = "Event Type")
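Because EVTYPE is mapped to the x axis as plain text, ggplot2 arranges the bars alphabetically rather than by size. One possible variation, sketched here on three of the totals printed above, is to wrap the x variable in reorder() so the tallest bar comes first. The names toy and sorted_plot are illustrative.

```r
library(ggplot2)

# Three fatality totals taken from the table above.
toy <- data.frame(
  EVTYPE = c("HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(937, 5633, 470)
)

# reorder() sorts the factor levels by the second argument; negating
# the counts puts the largest total on the left.
sorted_plot <- ggplot(toy, aes(x = reorder(EVTYPE, -FATALITIES),
                               y = log10(FATALITIES))) +
  geom_col(fill = "#76eec6", colour = "black") +
  labs(y = expression(log[10](FATALITIES)), x = "Event Type")

sorted_plot
```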
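# Second analysis: top 20 event types by injuries (log10 scale)
# Re-rank by injuries so the plot matches its title; topf is ranked by fatalities.
topi <- head(aggdata[order(-aggdata$INJURIES, -aggdata$FATALITIES), ], n = 20)
ggplot(data = topi, aes(x = EVTYPE, y = log10(INJURIES))) +
  geom_bar(stat = "identity", fill = "#76eec6", colour = "black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Top 20 Events causing injuries in US") +
  labs(y = expression(log[10](INJURIES)), x = "Event Type")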
As the plots show, TORNADO is the number one cause of both fatalities and injuries.
#### Pareto Chart
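A Pareto chart pairs bars sorted in descending order with a line showing the cumulative percentage. The sketch below is one possible way to draw it with ggplot2, using the ten fatality totals printed in the table earlier. The names top10, cum_pct, and scale_factor are illustrative, and the cumulative share is computed over these ten event types only, not over the full database.

```r
library(ggplot2)

# Top ten fatality totals as printed in the head(mp, n=10) output.
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING",
             "TSTM WIND", "FLOOD", "RIP CURRENT", "HIGH WIND", "AVALANCHE"),
  FATALITIES = c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224)
)

# Sort descending and fix the factor order so bars keep that order.
top10 <- top10[order(-top10$FATALITIES), ]
top10$EVTYPE <- factor(top10$EVTYPE, levels = top10$EVTYPE)

# Cumulative percentage across the ten event types.
top10$cum_pct <- 100 * cumsum(top10$FATALITIES) / sum(top10$FATALITIES)

# Rescale the percentage line so 100% lines up with the tallest bar.
scale_factor <- max(top10$FATALITIES) / 100

pareto <- ggplot(top10, aes(x = EVTYPE)) +
  geom_col(aes(y = FATALITIES), fill = "#76eec6", colour = "black") +
  geom_line(aes(y = cum_pct * scale_factor, group = 1)) +
  geom_point(aes(y = cum_pct * scale_factor)) +
  scale_y_continuous(
    name = "Fatalities",
    sec.axis = sec_axis(~ . / scale_factor,
                        name = "Cumulative % of top 10 fatalities")
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Pareto chart of fatalities by event type (top 10)") +
  labs(x = "Event Type")

pareto
```

In this subset, TORNADO alone accounts for just under half of the fatalities, which is why the cumulative line starts so high.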