Overview

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

Tasks completed in this assignment:

  1. Data Processing
  2. Across the United States, which types of events are most harmful with respect to population health?
  3. Across the United States, which types of events have the greatest economic consequences?

Data

The data for this assignment comes from the course web site.

Requirements

For this assignment you will need some specific tools:

RStudio:
You will need RStudio to publish your completed analysis document to RPubs. You can also use RStudio to edit/write your analysis.

knitr:
You will need the knitr package in order to compile your R Markdown document and convert it to HTML.

Before beginning the project, be sure to load the required R libraries and set any environment variables. Note that setting the message chunk option to FALSE in R Markdown suppresses the messages printed when libraries load, such as version numbers and dependencies. Updating these libraries to their latest versions may help reproduce results close to those shown here.

# load libraries
library(data.table)
library(knitr)
library(R.utils)
library(magrittr)
library(ggplot2)
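
The message suppression described above can be done in more than one way. The following is a minimal sketch, assuming the report is compiled with knitr. The stats package only stands in for any library, and the guard around knitr is an added precaution rather than part of the original analysis.

```r
# Option 1: silence start-up messages for a single library call (base R).
# "stats" is only a stand-in here; it ships with every R installation.
suppressPackageStartupMessages(library(stats))

# Option 2: turn messages off for every chunk in the document.
# Guarded so that this step is skipped if knitr is not installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(message = FALSE)
}
```

Option 2 is usually placed in a setup chunk at the top of the R Markdown file so that it applies to all later chunks.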

1. Data Processing

A. Loading data

# Open the bzip2-compressed CSV and read it into a data frame
dat <- bzfile("repdata%2Fdata%2FStormData.csv.bz2")
dat2 <- read.csv(dat)
# Check the dimensions (rows, columns) and the column names
dim(dat2)
## [1] 902297     37
names(dat2)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
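
The loading step above reads a CSV through a bzip2 connection. The sketch below shows the same pattern on a tiny, made-up data frame written to a temporary file, so it can be run without downloading the storm data. The names toy, tmp, and toy_back are hypothetical and used only for illustration.

```r
# Made-up stand-in for the storm data (not real NOAA records)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(2, 1),
  stringsAsFactors = FALSE
)

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads StormData.csv.bz2
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
dim(toy_back)
names(toy_back)
```

Checking dim() and names() right after loading, as the report does, is a quick way to confirm that the file was decompressed and parsed as expected.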

2. Across the United States, which types of events are most harmful with respect to population health?

# Subset the columns needed for the population health analysis
total <- dat2[, c("EVTYPE", "FATALITIES", "INJURIES")]
total$EVTYPE <- as.character(total$EVTYPE)
# Sum fatalities and injuries for each event type
aggdata <- aggregate(total[, c("FATALITIES", "INJURIES")], by=list(total$EVTYPE), FUN=sum, na.rm=TRUE)
names(aggdata)[1] <- "EVTYPE"


mp <- aggdata[order(-aggdata[,2], -aggdata[,3]), ]
head(mp, n=10)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 130 EXCESSIVE HEAT       1903     6525
## 153    FLASH FLOOD        978     1777
## 275           HEAT        937     2100
## 464      LIGHTNING        816     5230
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 585    RIP CURRENT        368      232
## 359      HIGH WIND        248     1137
## 19       AVALANCHE        224      170
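
The aggregation and ranking steps above can be sketched on a small, made-up data frame. The records and the names toy, toy_agg, and toy_ranked are hypothetical; the calls mirror the ones used on the storm data.

```r
# Made-up records; the real data set has one row per storm event
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, NA, 0),
  INJURIES   = c(10, 0, 5, 4, 2),
  stringsAsFactors = FALSE
)

# Sum both columns per event type; na.rm = TRUE is passed on to sum()
toy_agg <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                     by = list(EVTYPE = toy$EVTYPE),
                     FUN = sum, na.rm = TRUE)

# Rank by fatalities first, using injuries to break ties
toy_ranked <- toy_agg[order(-toy_agg$FATALITIES, -toy_agg$INJURIES), ]
print(toy_ranked)
```

Because fatalities are the primary sort key, an event type with many injuries but few fatalities can still rank low in this ordering.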
# Keep the 20 event types with the most fatalities
topf <- head(mp, n=20)
# First Analysis: fatalities by event type (log10 scale)
ggplot(data=topf, aes(x=EVTYPE, y=log10(FATALITIES))) +
  geom_bar(stat="identity", fill="#76eec6", colour="black") +
  theme(axis.text.x=element_text(angle=90, hjust=1)) +
  ggtitle("Top 20 Events causing fatalities in US") +
  labs(y=expression(log[10](FATALITIES)), x="Event Type")

# Second Analysis: injuries by event type (log10 scale)
# Re-sort by injuries so the plot shows the top 20 injury-causing events
topi <- head(mp[order(-mp$INJURIES), ], n=20)
ggplot(data=topi, aes(x=EVTYPE, y=log10(INJURIES))) +
  geom_bar(stat="identity", fill="#76eec6", colour="black") +
  theme(axis.text.x=element_text(angle=90, hjust=1)) +
  ggtitle("Top 20 Events causing injuries in US") +
  labs(y=expression(log[10](INJURIES)), x="Event Type")
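
The bar charts above place event types on the x-axis in alphabetical order. One option for making the ranking easier to read is to order the bars by value. This is a minimal sketch on made-up numbers, not the method used in the report; toy and p are hypothetical names, and the plotting step is guarded in case ggplot2 is not installed.

```r
# Made-up totals for three event types
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(5, 20, 50),
  stringsAsFactors = FALSE
)

# Turn EVTYPE into a factor whose levels run from most to fewest fatalities
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
levels(toy$EVTYPE)

# Build the plot object; ggplot2 draws bars in factor-level order
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = EVTYPE, y = log10(FATALITIES))) +
    geom_bar(stat = "identity") +
    labs(x = "Event Type", y = expression(log[10](FATALITIES)))
  # print(p) would draw the chart
}
```

The log10 scale keeps the smaller event types visible next to a dominant category, at the cost of making absolute differences harder to judge from bar height alone.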

So, as we can see, the number one cause of both fatalities (5,633) and injuries (91,346) is TORNADO.


