Exploration of NOAA Database

SYNOPSIS

The basic goal of this project was to explore the NOAA Storm Database and answer two basic questions about severe weather events:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The results of the analysis show that the five event types with the greatest total FATALITIES are, in decreasing order: TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT and LIGHTNING.

The five event types with the greatest total INJURIES are, in decreasing order: TORNADO, TSTM WIND, FLOOD, EXCESSIVE HEAT and LIGHTNING.

  2. Across the United States, which types of events have the greatest economic consequences?

The five event types with the greatest economic impact (property plus crop damage) are, in decreasing order: FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE and HAIL.

DATA PROCESSING

This step sets the working directory to where the data file is stored and loads the data set. It then subsets the data to only the columns needed to answer our two questions. The important population-health variables are FATALITIES and INJURIES; the important economic variables are PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. Finally, it converts the letter codes in PROPDMGEXP and CROPDMGEXP (B = billions, M = millions, K = thousands) into numeric multipliers and sets all other codes, which are presumably errors, to zero.

library(dplyr)
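# Set the working directory to the folder holding the CSV (adjust this path for your machine)
setwd("~/Documents/Mandeep's Documents/WORK Related/Courses/Reproducible Research/Project 2")
# stringsAsFactors = TRUE is required on R >= 4.0 so the EXP columns are read as factors
stormdata<-read.csv("repdata-data-StormData.csv", stringsAsFactors = TRUE)
# Keep only the event type, health and damage columns
sd_subset<-select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Relabel each factor level with its multiplier (B = 1e9, M = 1e6, K = 1e3, anything else 0).
# These vectors depend on the exact level order in this file, which can vary with the locale.
levels(sd_subset$PROPDMGEXP)<-c(0,0,0,0,0,0,0,0,0,0,0,0,0,1e+09, 0,0,1000,0,1e+06)
levels(sd_subset$CROPDMGEXP)<-c(0,0,0,0,1e+09,0,1000,0,1e+06)
# Convert the relabelled factors to numbers via their labels (not their integer codes)
sd_subset$PROPDMGEXP<-as.numeric(as.character(sd_subset$PROPDMGEXP))
sd_subset$CROPDMGEXP<-as.numeric(as.character(sd_subset$CROPDMGEXP))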
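The hard-coded levels() vectors above only work if the factor levels appear in exactly the expected order. A minimal sketch of an order-independent alternative looks up each code by name instead. Like the original code, it assumes that only upper-case B, M and K are meaningful and that every other code maps to 0. The helper exp_to_multiplier is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: map exponent codes to numeric multipliers by name
# rather than by factor-level position. Assumes only upper-case B, M and K
# are meaningful; every other code ("", "?", "+", digits, lower case, NA)
# maps to 0, matching the choice made in the processing step.
exp_to_multiplier <- function(code) {
  multipliers <- c(B = 1e9, M = 1e6, K = 1e3)
  # Works for factors or character vectors; unknown names give NA
  out <- unname(multipliers[as.character(code)])
  out[is.na(out)] <- 0
  out
}

codes <- c("K", "M", "B", "", "?", "m", "5")
exp_to_multiplier(codes)
```

Applied to the raw PROPDMGEXP and CROPDMGEXP columns, this helper would replace both the levels() assignments and the as.numeric() conversions. Because the lookup is by name, you could also treat lower-case codes such as "m" as millions by wrapping the input in toupper(), if that interpretation is wanted.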

DATA ANALYSIS

Harm to population health

In this step, the fatalities and injuries are summed for each event type, and the totals are sorted in decreasing order.

tot_fat_event<-sort(tapply(sd_subset$FATALITIES, sd_subset$EVTYPE, sum), decreasing=TRUE)

tot_inj_event<-sort(tapply(sd_subset$INJURIES, sd_subset$EVTYPE, sum), decreasing=TRUE)
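To illustrate what tapply() and sort() do in these two lines, the sketch below applies the same calls to a small hypothetical data frame. The values are toy numbers, not NOAA data.

```r
# Toy data (hypothetical, not from the NOAA file)
toy <- data.frame(
  EVTYPE = c("HAIL", "TORNADO", "HAIL", "FLOOD", "TORNADO"),
  FATALITIES = c(0, 3, 1, 2, 4)
)

# tapply() sums FATALITIES within each EVTYPE group, giving one named total per type
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# sort() ranks the event types from most to fewest fatalities
ranked <- sort(totals, decreasing = TRUE)
print(ranked)  # TORNADO 7, FLOOD 2, HAIL 1
```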

Greatest economic consequence

Here we multiply the values in PROPDMG and CROPDMG by their magnitudes (thousands, millions or billions) and add the two products to create a new column, EXPENSE. The expenses are then summed for each event type and sorted in decreasing order.

sd_subset$EXPENSE<-sd_subset$PROPDMG*sd_subset$PROPDMGEXP+sd_subset$CROPDMG*sd_subset$CROPDMGEXP

tot_expense_event<-sort(tapply(sd_subset$EXPENSE, sd_subset$EVTYPE, sum), decreasing=TRUE)
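The following worked example applies the EXPENSE formula to two hypothetical records. Their exponent columns have already been converted to multipliers, as in the processing step. The second record shows what happens when an exponent code was mapped to 0: its damage value contributes nothing to EXPENSE.

```r
# Two hypothetical records with exponent columns already converted to multipliers
records <- data.frame(
  PROPDMG    = c(2.5, 40),
  PROPDMGEXP = c(1e6, 0),    # second row: an unrecognized code that was mapped to 0
  CROPDMG    = c(10, 5),
  CROPDMGEXP = c(1e3, 1e3)
)

# EXPENSE = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP
records$EXPENSE <- with(records, PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
records$EXPENSE  # 2510000 5000
```

Keep this in mind when reading the totals: property or crop damage recorded with an unrecognized exponent code is excluded rather than estimated.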

RESULTS

Here we list the 5 event types with the greatest total fatalities.

head(tot_fat_event, 5)
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##           5633           1903            978            937            816

Here we list the 5 event types with the greatest total injuries.

head(tot_inj_event, 5)
##        TORNADO      TSTM WIND          FLOOD EXCESSIVE HEAT      LIGHTNING 
##          91346           6957           6789           6525           5230

Here we list the 5 event types with the greatest total expense and show a barplot.

head(tot_expense_event, 5)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##         1.503e+11         7.191e+10         5.734e+10         4.332e+10 
##              HAIL 
##         1.875e+10
exp5<-head(tot_expense_event, 5)
barplot(exp5, xlab="Event Type", ylab="Expense (dollars)", main="Barplot of 5 most expensive event types")

Figure: Barplot of the 5 most expensive event types (expense in dollars by event type)