II.A. Download Data
The data file was downloaded to a local computer from the following website:
A file-compression utility was used to extract the .bz2 archive into the working directory (“C:/Users/tljon/datasciencecoursera”), and the extracted file was given the shorter name “StormData.csv”.
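The manual extraction step may not be strictly necessary: read.csv() opens files through file(), which detects bzip2 compression when reading. A minimal sketch, assuming that behaviour and using a tiny hypothetical sample file (StormData_sample.csv.bz2, written to a temporary directory with a few values taken from the head(storm) output) so it runs without the real download:

```r
# Minimal sketch: read.csv() can read a bzip2-compressed CSV directly.
# StormData_sample.csv.bz2 is a HYPOTHETICAL stand-in for the real download.
sample_path <- file.path(tempdir(), "StormData_sample.csv.bz2")

# Write a tiny bzip2-compressed CSV to play the role of the downloaded file
con <- bzfile(sample_path, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "TORNADO,0,0"), con)
close(con)

# read.csv() detects the compression and decompresses on the fly
sample_storm <- read.csv(sample_path, header = TRUE)
print(sample_storm)
```

With the real data, the name of the downloaded .bz2 file would be passed in place of sample_path.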
##set working directory
setwd("C:/Users/tljon/datasciencecoursera")
##load libraries used for data preparation (plyr) and plotting (ggplot2)
library(ggplot2)
library(plyr)
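##load data into R; stringsAsFactors=TRUE keeps text columns as factors (the default before R 4.0.0)
storm <- read.csv("StormData.csv", header=TRUE, stringsAsFactors=TRUE)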
head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
The columns above were reviewed to determine which variables need to be extracted to support the analysis.
II.B. Extract data for health and economic impact analysis
et <- storm[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
##The extracted data frame is too large to display; it is used below to find the top 5 causes of fatalities and of injuries
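Since only seven of the 37 columns are kept, memory use could be reduced by skipping the other columns while reading, rather than subsetting afterwards. read.csv() skips any column whose colClasses entry is "NULL". A minimal sketch, assuming a small hypothetical sample file in place of StormData.csv:

```r
# Minimal sketch: keep only the needed columns at read time.
# The sample file below is HYPOTHETICAL; it stands in for StormData.csv.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("STATE__,EVTYPE,FATALITIES,INJURIES,REMARKS",
             "1,TORNADO,0,15,",
             "1,TORNADO,0,0,"), sample_path)

keep <- c("EVTYPE", "FATALITIES", "INJURIES")

# Read a single row just to learn the column names
all_cols <- names(read.csv(sample_path, nrows = 1))

# "NULL" skips a column; NA lets read.csv() guess the type as usual
col_classes <- ifelse(all_cols %in% keep, NA, "NULL")

et_sample <- read.csv(sample_path, colClasses = col_classes)
str(et_sample)
```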
II.B.1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
##Total fatalities by event type, sorted in decreasing order (top 5 shown)
fatal <- ddply(et, .(EVTYPE), summarize, FATALITIES=sum(FATALITIES))
fatal <- fatal[order(fatal$FATALITIES, decreasing=TRUE), ]
head(fatal,5)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
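The same per-event totals can also be computed without plyr. A minimal sketch using base R's aggregate() in place of ddply(), run on a small toy data frame (the values are illustrative only, not taken from the storm data):

```r
# Minimal sketch: base-R alternative to ddply(..., summarize, ...).
# Toy values for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "LIGHTNING", "HEAT"),
  FATALITIES = c(3, 1, 4, 1, 2)
)

# Sum FATALITIES within each EVTYPE
fatal_base <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order and keep the top entries
fatal_base <- fatal_base[order(fatal_base$FATALITIES, decreasing = TRUE), ]
head(fatal_base, 5)
```

Here TORNADO (7) ranks first, followed by HEAT (3) and LIGHTNING (1).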
##Total injuries by event type, sorted in decreasing order (top 5 shown)
injured <- ddply(et, .(EVTYPE), summarize, INJURED=sum(INJURIES))
injured <- injured[order(injured$INJURED, decreasing=TRUE), ]
head(injured,5)
## EVTYPE INJURED
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
The top 5 causes of fatalities and of injuries are shown below as two horizontal bar charts.
##Plot the Fatalities graph
plot1 <- ggplot(data=head(fatal,5), aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES)) +
  geom_bar(fill="red", stat="identity") + coord_flip() +
  ylab("Total Weather Fatalities") + xlab("Event Type") +
  ggtitle("US Weather Fatalities - Top 5") +
  theme(legend.position="none")
plot1

##Plot the Injuries graph
plot2 <- ggplot(data=head(injured,5), aes(x=reorder(EVTYPE, INJURED), y=INJURED)) +
  geom_bar(fill="blue", stat="identity") + coord_flip() +
  ylab("Total Weather Injuries") + xlab("Event Type") +
  ggtitle("US Weather Injuries - Top 5") +
  theme(legend.position="none")
plot2
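If a single two-panel figure is preferred over two separate plots, gridExtra's arrangeGrob() can place both charts side by side. A minimal sketch, assuming the gridExtra package is installed; the two small data frames are rebuilt from the top-5 totals printed earlier so the block runs on its own:

```r
# Minimal sketch: combine the two top-5 bar charts into one panel.
# Assumes ggplot2 and gridExtra are installed; totals copied from the
# head(fatal, 5) and head(injured, 5) output in this section.
library(ggplot2)
library(gridExtra)

fatal_top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 937, 816)
)
injured_top <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING"),
  INJURED = c(91346, 6957, 6789, 6525, 5230)
)

p1 <- ggplot(fatal_top, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_col(fill = "red") + coord_flip() +
  labs(x = "Event Type", y = "Total Weather Fatalities", title = "Fatalities - Top 5")

p2 <- ggplot(injured_top, aes(x = reorder(EVTYPE, INJURED), y = INJURED)) +
  geom_col(fill = "blue") + coord_flip() +
  labs(x = "Event Type", y = "Total Weather Injuries", title = "Injuries - Top 5")

# arrangeGrob() builds the two-column layout; grid.draw() renders it
panel <- arrangeGrob(p1, p2, ncol = 2)
grid::grid.draw(panel)
```

grid.arrange(p1, p2, ncol = 2) would build and draw the panel in one call; arrangeGrob() is used here so the combined object can also be saved or inspected.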

II.B.2. Across the United States, which types of events have the greatest economic consequences?
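We will now take a look at property and crop damage. The PROPDMGEXP and CROPDMGEXP columns are not expenses: they hold a magnitude code for the PROPDMG and CROPDMG amounts. According to the NWS Storm Data documentation, “K” stands for thousands, “M” for millions and “B” for billions of dollars.
##List the distinct magnitude codes used in PROPDMGEXP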
unique(et$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
##List the distinct magnitude codes used in CROPDMGEXP
unique(et$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
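Before the damage amounts can be summed, each magnitude code has to become a numeric multiplier. A minimal sketch with a hypothetical helper, exp_to_multiplier(): K, M and B (in either case) follow the NWS meaning of thousands, millions and billions, while treating “H” as hundreds, single digits as powers of ten, and every other symbol (+, -, ?, blank) as 1 are ASSUMPTIONS, not rules stated in the source. The data frame holds toy values only:

```r
# Minimal sketch: convert PROPDMGEXP / CROPDMGEXP codes into multipliers.
# K/M/B = thousands/millions/billions (NWS convention).
# ASSUMPTIONS: H = hundreds, digit d = 10^d, any other symbol = 1.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- rep(1, length(code))
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Toy rows for illustration only (not from the storm data)
toy_dmg <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25.0, 2.5, 10),
  PROPDMGEXP = c("K", "m", "?"),
  CROPDMG    = c(0, 1.5, 3),
  CROPDMGEXP = c("", "B", "k")
)

# Dollar amounts = base value x multiplier, then property + crop
toy_dmg$PROP_USD  <- toy_dmg$PROPDMG * exp_to_multiplier(toy_dmg$PROPDMGEXP)
toy_dmg$CROP_USD  <- toy_dmg$CROPDMG * exp_to_multiplier(toy_dmg$CROPDMGEXP)
toy_dmg$TOTAL_USD <- toy_dmg$PROP_USD + toy_dmg$CROP_USD

# Total economic damage per event type, largest first
econ <- aggregate(TOTAL_USD ~ EVTYPE, data = toy_dmg, FUN = sum)
econ[order(econ$TOTAL_USD, decreasing = TRUE), ]
```

Applied to et, the same helper would give dollar amounts that can be summed per EVTYPE with ddply(), as was done for fatalities and injuries.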