Synopsis

The aim of this analysis is to use a dataset produced by the U.S. National Oceanic and Atmospheric Administration (NOAA) to identify the severe weather events with the greatest impact on human health and on the economy in the U.S.

The key data in the dataset are the event type, the starting date, the numbers of fatalities and injuries, and the value of damage to crops and to property.

The raw data were read from a downloaded, bzip2-compressed CSV file, with blank entries replaced by NA.

The columns of interest (the key data identified above) were then extracted into a new data frame, which was cleaned up to remove incomplete records, convert the character date field into an R Date field, and calculate the cost to the economy (crop and property damage combined). Fatalities and injuries were kept as separate measures of the impact on health.

The processed data were then aggregated into a data frame with one row per event type, holding the total economic cost, total fatalities and total injuries due to that type of event over the period covered by the dataset.

This aggregated data frame was then used to identify the most significant event types in terms of economic cost, number of fatalities and number of injuries.

The results of the comparison were discussed, together with potential problems with the comparison and alternative ways of calculating and comparing the impacts of the different types of weather event.

Data Processing

The raw data for the analysis are available in bzip2-compressed format from this location and were downloaded on 18/09/2014 at 20:35 UK time.

The data were then read into R using the following code:

rawWeatherData<-read.csv("C:\\Users\\Roy Standard\\Documents\\Reproducible\\repdata-data-StormData.csv.bz2",na.strings="")
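read.csv() can open the .bz2 file directly because R's file connections detect bzip2 compression when reading, and na.strings = "" turns empty fields into NA. The sketch below checks both points on a tiny temporary file; the file and its rows are invented for illustration and only mimic the structure of the storm data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file
# (hypothetical rows; not taken from the NOAA data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,PROPDMGEXP",
             "TORNADO,K",
             "HAIL,"), con)
close(con)

# read.csv() detects the compression itself; na.strings = ""
# turns the empty PROPDMGEXP field of the HAIL row into NA
demo <- read.csv(tmp, na.strings = "", stringsAsFactors = FALSE)
print(demo)
```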

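The columns of interest (start date, event type, fatalities, injuries, property damage and crop damage, together with the two damage exponents) were extracted into a new data frame, which was then filtered to keep only the rows with values for all variables. Because blank entries were read as NA, this filter also drops every record whose property or crop damage exponent is blank, even if that record reports fatalities or injuries: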

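# Columns 2, 8 and 23 to 28: BGN_DATE, EVTYPE, FATALITIES, INJURIES,
# PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
extWeatherData<-rawWeatherData[,c(2,8,23,24,25,26,27,28)]
# Keep only the rows with no NA in any of these eight columns
extWeatherData2<-extWeatherData[complete.cases(extWeatherData),]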
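complete.cases() returns TRUE only for rows in which every column is non-missing, so a single NA exponent is enough to remove a record. A toy illustration with invented values shows how this can remove casualties from the totals:

```r
# Toy frame (hypothetical values) mimicking some of the extracted columns
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 5),
  PROPDMG    = c(25, 3, 0),
  PROPDMGEXP = c("K", "M", NA),   # blank in the raw file, read as NA
  stringsAsFactors = FALSE
)

# TRUE only where every column is non-NA
keep <- complete.cases(toy)
print(keep)
filtered <- toy[keep, ]

# The HEAT row is dropped even though it records 5 fatalities
sum(toy$FATALITIES)       # 7
sum(filtered$FATALITIES)  # 2
```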

The character date field was then converted into an R Date field:

# BGN_DATE holds month/day/year followed by a time, e.g. "4/18/1950 0:00:00"
extWeatherData2$BGN_DATE<-as.Date(as.character(extWeatherData2$BGN_DATE),"%m/%d/%Y %H:%M:%S")
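A quick check of the format string, assuming the dates are stored as month/day/year followed by a time as in the comment above. The two input strings are examples written for illustration; format() then extracts the year, which is what an annual summary needs.

```r
# Example strings in the month/day/year time layout
x <- c("4/18/1950 0:00:00", "12/31/2011 0:00:00")

# %m = month, %d = day of month, %Y = four-digit year
d <- as.Date(x, format = "%m/%d/%Y %H:%M:%S")
print(d)

# Four-digit year as a character string, e.g. for grouping by year
format(d, "%Y")
```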

The exponent fields for property and crop damage (PROPDMGEXP and CROPDMGEXP) were converted to character (switch() would otherwise treat a factor as its integer code) and then used to generate new columns in the processed data frame (pexp and cexp). These hold the number by which the property and crop damage values must be multiplied to give the actual value of the damage. The documentation for the data only mentions alphabetic codes for the multiplier (K for thousands, M for millions and B for billions); where numeric codes occur, I have assumed that the multiplier is ten to the power of that code, and the code "?" is treated as a multiplier of 1.

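# Property damage: convert the exponent codes from factor to character, then map
# each code to its multiplier. switch() returns NULL for a code not listed here,
# so every code present in the filtered data must appear in the list.
extWeatherData2$PROPDMGEXP<-as.character(extWeatherData2$PROPDMGEXP)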
extWeatherData2$pexp<-sapply(extWeatherData2$PROPDMGEXP,switch,
"B" = 1000000000,
"M" = 1000000,
"m" = 1000000,
"K" = 1000,
"k" = 1000,
"5" = 100000,
"3" = 1000,
"0" = 1
)
# Crop damage: same approach, using the codes found in CROPDMGEXP
extWeatherData2$CROPDMGEXP<-as.character(extWeatherData2$CROPDMGEXP)
extWeatherData2$cexp<-sapply(extWeatherData2$CROPDMGEXP,switch,
"B" = 1000000000,
"M" = 1000000,
"m" = 1000000,
"K" = 1000,
"k" = 1000,
"?" = 1,
"0" = 1
)
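An alternative sketch replaces sapply(..., switch, ...) with a named lookup vector. Indexing by name is vectorised and returns NA for any code that is not listed, whereas an unlisted code makes switch() return NULL and sapply() then return a list instead of a numeric vector. The vector below merges the two mappings above into one (so "?" now applies to both fields); this is one possible restructuring, not the code used to produce the results below.

```r
# Multipliers for the exponent codes used above; numeric codes are
# assumed to mean 10^code, as stated in the text
multipliers <- c(B = 1e9, M = 1e6, m = 1e6, K = 1e3, k = 1e3,
                 "5" = 1e5, "3" = 1e3, "0" = 1, "?" = 1)

# Hypothetical codes; "H" is not in the table and the last entry is NA
codes <- c("K", "M", "0", "H", NA)

# Unlisted codes and NA both give NA, which is easy to count with is.na()
mult <- unname(multipliers[codes])
print(mult)   # 1000, 1e6, 1, NA, NA
```

Applied to the data, this would read `extWeatherData2$pexp <- unname(multipliers[extWeatherData2$PROPDMGEXP])`, and `sum(is.na(extWeatherData2$pexp))` would show how many codes were left unmapped.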

Now we calculate the total economic cost of property damage and crop damage combined and store it as a new column in the data frame:

extWeatherData2$economic_cost<-extWeatherData2$CROPDMG*extWeatherData2$cexp + extWeatherData2$PROPDMG*extWeatherData2$pexp
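A worked example of the formula on two made-up records: a property damage of 25 with code K (25,000) plus a crop damage of 1.5 with code M (1,500,000) gives an economic cost of 1,525,000; a property damage of 2.5 with code B and no crop damage gives 2.5 billion.

```r
# Hypothetical records with the multipliers already looked up
toy <- data.frame(PROPDMG = c(25, 2.5), pexp = c(1e3, 1e9),
                  CROPDMG = c(1.5, 0),  cexp = c(1e6, 1e3))

# Same formula as above: value times multiplier, crops plus property
toy$economic_cost <- toy$CROPDMG * toy$cexp + toy$PROPDMG * toy$pexp
print(toy$economic_cost)   # 1525000 and 2.5e9
```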

Now we aggregate the data into a new data frame with one row per event type, containing the total economic cost, total fatalities and total injuries due to that event type:

# Sum fatalities, injuries and economic cost for each event type
event_cost <- aggregate(extWeatherData2[, c("FATALITIES", "INJURIES", "economic_cost")], by = list(extWeatherData2$EVTYPE), FUN = sum)
names(event_cost)[1]<-"Event"
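The same totals can be written with the formula interface of aggregate(), which names the columns to sum rather than selecting them by position. A sketch on invented data:

```r
# Hypothetical processed records
toy <- data.frame(
  EVTYPE        = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES    = c(3, 1, 2, 0),
  INJURIES      = c(10, 0, 5, 1),
  economic_cost = c(1e6, 5e7, 2e5, 3e4),
  stringsAsFactors = FALSE
)

# One row per event type, each measure summed
totals <- aggregate(cbind(FATALITIES, INJURIES, economic_cost) ~ EVTYPE,
                    data = toy, FUN = sum)
names(totals)[1] <- "Event"
print(totals)
```

One difference to keep in mind: the formula interface drops rows with NA in any of the named variables by default (na.action = na.omit), which makes no difference here only because incomplete rows were already removed.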

Results

The event types with the greatest total fatalities, injuries and economic cost can be identified as follows:

x<- event_cost[which.max(event_cost$FATALITIES),]
x
##      Event FATALITIES INJURIES economic_cost
## 99 TORNADO       1064    11960     1.657e+10
y<- event_cost[which.max(event_cost$INJURIES),]
y
##      Event FATALITIES INJURIES economic_cost
## 99 TORNADO       1064    11960     1.657e+10
z<- event_cost[which.max(event_cost$economic_cost),]
z
##    Event FATALITIES INJURIES economic_cost
## 23 FLOOD        261     6495      1.38e+11
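which.max() reports only the single largest event type. To see how far ahead of the runners-up it is, the event types can be ranked by any of the three measures. The helper top_n_by() below is hypothetical and the numbers are invented; it is a sketch, not part of the original analysis.

```r
# Invented per-event totals in the same layout as event_cost
toy <- data.frame(
  Event         = c("TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES    = c(50, 10, 40, 0),
  INJURIES      = c(500, 20, 100, 30),
  economic_cost = c(2e9, 9e9, 1e6, 5e8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by one measure, largest first, keep the top n
top_n_by <- function(df, column, n = 3) {
  head(df[order(df[[column]], decreasing = TRUE), c("Event", column)], n)
}

top_n_by(toy, "FATALITIES")     # TORNADO, HEAT, FLOOD
top_n_by(toy, "economic_cost")  # FLOOD, TORNADO, HAIL
```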

So we may conclude that floods cause the greatest economic impact, whereas tornadoes cause the greatest loss of life and the most injuries. However, when we plot the cost of flooding over the period covered by the dataset, it appears that a single extraordinary event accounts for most of the impact attributed to flooding: floods can have a great impact, but floods of that size may be very rare:
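The visual impression can be backed up numerically by totalling the cost per year and computing what fraction of all flood cost comes from the single most costly record. A sketch on invented flood records, using base R's format() to extract the year:

```r
# Hypothetical flood records: start dates and economic costs
floods <- data.frame(
  BGN_DATE      = as.Date(c("2004-05-01", "2005-06-10", "2006-01-01",
                            "2006-03-15", "2007-07-20")),
  economic_cost = c(2e8, 1e8, 1.15e11, 3e8, 4e8)
)

# Annual totals, keyed by the four-digit year
annual <- tapply(floods$economic_cost, format(floods$BGN_DATE, "%Y"), sum)
print(annual)

# Share of all flood cost due to the single most costly record
share_largest <- max(floods$economic_cost) / sum(floods$economic_cost)
print(round(share_largest, 3))

# barplot(annual) would draw the annual series
```

A share close to 1, as in this invented example, would support the reading that one event dominates the flood total.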


Figure: economic cost of flooding over the period covered by the dataset.

Similar plots can be made for the injuries and fatalities caused by tornadoes:

Figures: tornado injuries and fatalities over the period covered by the dataset.

These indicate that the annual numbers of injuries and deaths caused by tornadoes have been fairly steady over the last 20 years, so tornadoes pose a constant threat, which might justify immediate investment to try to alleviate their impact.
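"Fairly steady" can be made more concrete by comparing the year-to-year spread of the annual totals with their mean, for example through the coefficient of variation (standard deviation divided by mean); a small value means little variation between years. A sketch with invented annual fatality counts:

```r
# Hypothetical annual tornado fatality counts, named by year
fatal <- c("1994" = 60, "1995" = 55, "1996" = 65, "1997" = 62, "1998" = 58)

# Coefficient of variation: spread between years relative to the mean
cv <- sd(fatal) / mean(fatal)
print(round(cv, 3))
```

For these invented counts the mean is 60 and the standard deviation is about 3.8, giving a coefficient of variation of about 0.06.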