SYNOPSIS
The goal of this project was to explore the NOAA Storm Database and answer some basic questions about severe weather events.
The analysis shows that the five event types with the greatest total fatalities are:
1. TORNADO
2. EXCESSIVE HEAT
3. FLASH FLOOD
4. HEAT
5. LIGHTNING
The five event types with the greatest total injuries are:
1. TORNADO
2. TSTM WIND
3. FLOOD
4. EXCESSIVE HEAT
5. LIGHTNING
The five event types with the greatest economic impact are:
1. FLOOD
2. HURRICANE/TYPHOON
3. TORNADO
4. STORM SURGE
5. HAIL
DATA PROCESSING
This step sets the working directory to the folder where the data file is stored and loads the data set. It then subsets the data to only the columns needed to answer our two questions. The important population health variables are FATALITIES and INJURIES. The important economic variables are PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. Finally, it converts the letter codes in PROPDMGEXP and CROPDMGEXP (B for billions, M for millions, K for thousands) into numeric multipliers and sets all other values, which are presumably errors, to zero.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.1
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd("~/Documents/Mandeep's Documents/WORK Related/Courses/Reproducible Research/Project 2")
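# stringsAsFactors=TRUE keeps text columns as factors (the default before R 4.0.0),
# which the levels() assignments below rely on
stormdata<-read.csv("repdata-data-StormData.csv", stringsAsFactors=TRUE)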
sd_subset<-select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
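# Replace the factor levels with numeric multipliers (B = 1e+09, K = 1000, M = 1e+06);
# every other code becomes 0. The positions assume the level order found in this file.
levels(sd_subset$PROPDMGEXP)<-c(0,0,0,0,0,0,0,0,0,0,0,0,0,1e+09, 0,0,1000,0,1e+06)
levels(sd_subset$CROPDMGEXP)<-c(0,0,0,0,1e+09,0,1000,0,1e+06)
# Convert the relabelled factors to plain numeric columns
sd_subset$PROPDMGEXP<-as.numeric(as.character(sd_subset$PROPDMGEXP))
sd_subset$CROPDMGEXP<-as.numeric(as.character(sd_subset$CROPDMGEXP))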
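The `levels<-` assignments above work by position, so they depend on the exact order of the factor levels in this file. As a minimal sketch of a name-based alternative, the helper below (the name `exp_to_multiplier` is hypothetical, and the input is a small made-up vector rather than the storm data) looks each code up in a named table. Following the description above, it converts only B, M and K and maps every other code to zero; treating lowercase codes as unrecognised is an assumption of this sketch.

```r
# Hypothetical helper: map exponent codes to numeric multipliers by name,
# so the result does not depend on the order of the factor levels.
exp_to_multiplier <- function(codes) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3)
  # Index the lookup table by name; codes not in the table come back as NA
  mult <- unname(lookup[as.character(codes)])
  # Treat anything unrecognised (blank, "?", digits, lowercase) as zero
  mult[is.na(mult)] <- 0
  mult
}

# Made-up example codes, not taken from the storm data
toy_codes <- factor(c("K", "M", "B", "", "?", "m"))
exp_to_multiplier(toy_codes)
```

Because the lookup is keyed by the code itself, adding or removing a level in the data would not silently shift the multipliers.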
DATA ANALYSIS
Harm to population health
In this step the total numbers of fatalities and injuries are summed for each event type. The totals are then sorted in decreasing order.
tot_fat_event<-sort(tapply(sd_subset$FATALITIES, sd_subset$EVTYPE, sum), decreasing=TRUE)
tot_inj_event<-sort(tapply(sd_subset$INJURIES, sd_subset$EVTYPE, sum), decreasing=TRUE)
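To illustrate what the `tapply` and `sort` combination returns, here is a small sketch on made-up data (the `toy` data frame and its values are invented for illustration and are not taken from the storm database).

```r
# Made-up toy data, not taken from the storm database
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 1)
)

# Sum fatalities within each event type; the result is named by event type
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# Sort so the event type with the largest total comes first
totals_sorted <- sort(totals, decreasing = TRUE)
totals_sorted
```

The result is named by event type, which is why `head()` on the sorted totals shows both the event names and their sums.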
Greatest economic consequence
Here we multiply the values in PROPDMG and CROPDMG by their multipliers (thousands, millions, or billions) and add the two products to create a new column, EXPENSE. We then sum EXPENSE for each event type and sort the totals.
sd_subset$EXPENSE<-sd_subset$PROPDMG*sd_subset$PROPDMGEXP+sd_subset$CROPDMG*sd_subset$CROPDMGEXP
tot_expense_event<-sort(tapply(sd_subset$EXPENSE, sd_subset$EVTYPE, sum), decreasing=TRUE)
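As a worked example of the EXPENSE formula, the sketch below applies it to a single made-up record (the values are invented for illustration): property damage of 25 with a K multiplier and crop damage of 1.5 with an M multiplier.

```r
# Made-up record: property damage 25 with multiplier 1000 (code "K"),
# crop damage 1.5 with multiplier 1e6 (code "M")
propdmg <- 25
propdmgexp <- 1e3
cropdmg <- 1.5
cropdmgexp <- 1e6

# Same formula as the EXPENSE column, applied to one record:
# 25 * 1000 + 1.5 * 1e6 = 25,000 + 1,500,000
expense <- propdmg * propdmgexp + cropdmg * cropdmgexp
expense
```

A record whose exponent code was set to zero contributes nothing to the total, whatever its PROPDMG or CROPDMG value.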
RESULTS
Here we list the 5 event types with the greatest total fatalities.
head(tot_fat_event, 5)
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING
##           5633           1903            978            937            816
Here we list the 5 event types with the greatest total injuries.
head(tot_inj_event, 5)
##        TORNADO      TSTM WIND          FLOOD EXCESSIVE HEAT      LIGHTNING
##          91346           6957           6789           6525           5230
Here we list the 5 event types with the greatest total expense and show a barplot.
head(tot_expense_event, 5)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE
##         1.503e+11         7.191e+10         5.734e+10         4.332e+10
##              HAIL
##         1.875e+10
exp5<-head(tot_expense_event, 5)
barplot(exp5, xlab="Event Type", ylab="Expense (dollars)", main="Barplot of 5 most expensive event types")