Summary

Weather event data in the United States from 1950 through 2011 are examined to determine their impact on human populations. The analysis addresses two questions: 1) Which types of events are most harmful to population health? 2) Which types of events have the greatest economic consequences?

Original data source: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Trivial steps not shown here are downloading the data file and placing it in the working directory so that it can be loaded.
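For completeness, the omitted download step could look like the sketch below. This is not the author's code: the helper name `fetch_storm_data` is hypothetical, and the destination file name is assumed to match the one read in Step 1.

```r
# Hypothetical helper: download the compressed StormData file only if it is
# not already present, and return the local file name invisibly.
fetch_storm_data <- function(
    url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
    destfile = "repdata-data-StormData.csv.bz2") {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}
```

Calling `fetch_storm_data()` once before Step 1 downloads the file only when it is missing, so re-knitting the document does not repeat the download.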

options(warn=-1)

library(knitr)
library(ggplot2)
library(reshape2)

Data Processing

Each question requires a different subset of data for analysis.

Step 1. Read in the file.

storms <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header=TRUE, sep=",")
head(storms,3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3

Step 2. Prepare the data to answer Q1. Examining the data before processing reveals several types of weather events that should be aggregated. For example, there are a number of wind related EVTYPEs, and many events related to cold weather and winter-like conditions. Consolidating these event types gives a broader overview for presentation purposes.
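In the Step 2 code, every logical vector is computed from the original EVTYPE values before any label is assigned, and the labels are then assigned in sequence, so an event name matching several patterns (for example, a name such as `HEAVY SNOW/HIGH WINDS`) keeps the last matching label. The sketch below reproduces that rule as a small function applied to a few illustrative event names, reusing the same patterns; the function name `consolidate_types` is hypothetical.

```r
# Hypothetical helper mirroring the Step 2 consolidation rule: each pattern is
# matched against the ORIGINAL names, and labels are assigned in order, so a
# name matching several patterns keeps the label assigned last.
consolidate_types <- function(types, patterns) {
  # Converting to character avoids NAs when assigning labels that are not
  # existing levels of a factor.
  types <- as.character(types)
  out <- types
  for (label in names(patterns)) {
    hit <- grepl(patterns[[label]], types, ignore.case = TRUE)
    out[hit] <- label
  }
  out
}

# Same patterns, in the same order, as the Step 2 code
patterns <- c(WIND      = "wind",
              HEAT      = "heat|hot|warm",
              COLD      = "blizzard|freez|ice|snow|winter|cold",
              FLOOD     = "flood|rain",
              HURRICANE = "hurricane|typhoon",
              WATERS    = "water|surf|current")

consolidate_types(c("TSTM WIND", "HEAVY SNOW/HIGH WINDS", "EXCESSIVE HEAT", "TORNADO"), patterns)
# returns c("WIND", "COLD", "HEAT", "TORNADO")
```

The second name matches both the wind and cold patterns and ends up as COLD only because COLD is assigned after WIND; reordering the assignments would change the totals for these categories.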

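# Aggregate EVTYPEs using logical vectors (grepl).
# as.character() guards against EVTYPE being read as a factor, in which case
# assigning new labels such as "WIND" would produce NAs (with the warning
# hidden by options(warn=-1)).

stormTypes <- as.character(storms$EVTYPE)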

wind<- grepl('wind', stormTypes, ignore.case=TRUE)
heat<- grepl('heat|hot|warm', stormTypes, ignore.case=TRUE)
cold<- grepl('blizzard|freez|ice|snow|winter|cold', stormTypes, ignore.case=TRUE)
flood<- grepl('flood|rain', stormTypes, ignore.case=TRUE)
hurricane<- grepl('hurricane|typhoon', stormTypes, ignore.case=TRUE)
waters<- grepl('water|surf|current', stormTypes, ignore.case=TRUE)

# Labels are assigned in sequence: an event matching several patterns keeps the last label
stormTypes[wind] <- "WIND"
stormTypes[heat] <- "HEAT"
stormTypes[cold] <- "COLD"
stormTypes[flood] <- "FLOOD"
stormTypes[hurricane] <- "HURRICANE"
stormTypes[waters] <- "WATERS"

storms$STYPES <- stormTypes

# Now aggregate the health columns by weather event type
# (cbind() sums both columns in one call, so no merge is needed)

stormHarm <- aggregate(cbind(INJURIES, FATALITIES) ~ STYPES, data=storms, FUN=sum)
colnames(stormHarm) <- c("TYPE", "INJURIES", "FATALITIES")

# And prepare the data for presentation in the Results section

c1<- order(stormHarm$INJURIES, decreasing=TRUE)
c2<- order(stormHarm$FATALITIES, decreasing=TRUE)
c3 <- order(stormHarm$INJURIES + stormHarm$FATALITIES, decreasing=TRUE)
stormI<- head(stormHarm[c1,], 10) 
stormF <- head(stormHarm[c2,], 10)
stormA <- head(stormHarm[c3,], 10)

zLabels<- c(as.character(stormI$TYPE), as.character(stormF$TYPE))
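The aggregate-then-rank steps above can be tried on a small data frame. The numbers below are invented for illustration and are not StormData values, and the helper `top_n_by` is hypothetical. Putting `cbind()` on the left-hand side of the `aggregate()` formula sums both health columns in a single call.

```r
# Invented toy data (not StormData values) to illustrate aggregating by
# event type and then keeping the top rows for each measure.
toy <- data.frame(
  STYPES     = c("WIND", "HEAT", "WIND", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 5, 20, 100, 0, 3),
  FATALITIES = c(1, 4, 0, 10, 5, 2),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type
harm <- aggregate(cbind(INJURIES, FATALITIES) ~ STYPES, data = toy, FUN = sum)
colnames(harm) <- c("TYPE", "INJURIES", "FATALITIES")

# Hypothetical helper: sort by one column (largest first) and keep n rows
top_n_by <- function(df, column, n = 10) {
  head(df[order(df[[column]], decreasing = TRUE), ], n)
}

as.character(top_n_by(harm, "INJURIES", 2)$TYPE)    # "TORNADO" "WIND"
as.character(top_n_by(harm, "FATALITIES", 2)$TYPE)  # "TORNADO" "HEAT"
```

The two rankings differ even on this tiny example, which is why the Results section plots injuries and fatalities as separate top-10 lists.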

Step 3. Now prepare the data to answer Q2.

# Event types are already consolidated; only the economic columns still need aggregating

stormEharm <- aggregate(cbind(PROPDMG, CROPDMG) ~ STYPES, data=storms, FUN=sum)
colnames(stormEharm) <- c("TYPE", "Property", "Crops")

# And prepare again for presentation in the Results section

cE1<- order(stormEharm$Property, decreasing=TRUE)
cE2<- order(stormEharm$Crops, decreasing=TRUE)
cE3 <- order(stormEharm$Property + stormEharm$Crops, decreasing=TRUE)
stormP<- head(stormEharm[cE1,], 10) 
stormC <- head(stormEharm[cE2,], 10)
stormE <- head(stormEharm[cE3,], 10)

zELabels<- c(as.character(stormP$TYPE), as.character(stormC$TYPE))
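One caveat for Q2: PROPDMG and CROPDMG are recorded together with a magnitude code in PROPDMGEXP and CROPDMGEXP (the `K` visible in the `head()` output of Step 1), and the sums in Step 3 add the raw PROPDMG and CROPDMG values without applying those codes. The sketch below converts amounts before aggregating. It assumes `K`, `M`, and `B` (in either case) denote thousands, millions, and billions, and treats every other code, including blanks, as a multiplier of 1; that fallback is a simplifying assumption, and the helper name `apply_exponent` is hypothetical.

```r
# Hypothetical helper: scale damage amounts by their magnitude codes.
# Assumes K/M/B = thousands/millions/billions; any other code (blank,
# digits, symbols) is treated as a multiplier of 1.
apply_exponent <- function(amount, exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[code]  # unknown codes give NA
  multiplier[is.na(multiplier)] <- 1
  amount * unname(multiplier)
}

apply_exponent(c(25, 2.5, 1.5, 7), c("K", "k", "M", ""))
# returns c(25000, 2500, 1500000, 7)
```

Applied to the full data, `apply_exponent(storms$PROPDMG, storms$PROPDMGEXP)` and its crop equivalent would give columns that could replace PROPDMG and CROPDMG in the `aggregate()` calls; the rankings and the ratio reported below were computed from the raw values and could change.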

Results

Q1: Which types of events are most harmful to population health?

par(mar=c(10,5,5,5))   # widen the bottom margin for the rotated event labels
colors <- c(rep("grey", 10), rep("red", 10))   # grey = injuries, red = fatalities

# 20 bars: top-10 injury totals followed by top-10 fatality totals
mids <- barplot(as.matrix(c(stormI$INJURIES, stormF$FATALITIES)), beside=TRUE, names.arg=NULL, axisnames=FALSE, las=1, log="y", main="WEATHER RELATED HEALTH\nWeather Events Most Harmful to Population Health in U.S.\n(log scale)", col=colors, space=0.4)
par(las=2)   # vertical axis labels
axis(1, at=mids, labels=zLabels, cex.axis=0.75)
par(las=1)
legend("topright", c("Left (grey): INJURIES", "Right (red): FATALITIES"), bg="yellow", cex=0.8)

vr <- sum(storms$INJURIES) / sum(storms$FATALITIES)

print(paste("Aggregate Number of Injuries is", round(vr,2), "times the aggregate number of Fatalities."))
## [1] "Aggregate Number of Injuries is 9.28 times the aggregate number of Fatalities."

A: The chart above uses a logarithmic scale so that event types whose totals differ by orders of magnitude remain visible and are easier to rank. Tornado, other wind, heat, flood, and cold related events cause the most injuries. Ranked by fatalities, the order is tornado, heat, flood, wind, and cold related events.

Q2: Which types of events have the greatest economic consequences?

par(mar=c(10,5,5,5))   # widen the bottom margin for the rotated event labels
colors <- c(rep("green", 10), rep("darkgreen", 10))   # green = property, dark green = crops

# 20 bars: top-10 property damage totals followed by top-10 crop damage totals
mids <- barplot(as.matrix(c(stormP$Property, stormC$Crops)), beside=TRUE, names.arg=NULL, axisnames=FALSE, las=1, log="y", main="WEATHER RELATED ECONOMICS\nWeather Events with Greatest Economic Impact in U.S.\n(log scale)", col=colors, space=0.4)
par(las=2)   # vertical axis labels
axis(1, at=mids, labels=zELabels, cex.axis=0.75)
par(las=1)
legend("topright", c("Left (green): PROPERTY", "Right (dark green): CROPS"), bg="pink", cex=0.8)

vr <- sum(storms$PROPDMG) / sum(storms$CROPDMG)

print(paste("Aggregate Property Damage is", round(vr,2), "times aggregate Crop damage."))
## [1] "Aggregate Property Damage is 7.9 times aggregate Crop damage."

A: Again, a logarithmic scale is used to depict the data above, which makes event types easier to compare and rank. Tornado, other wind, flood, hail, and lightning related events cause the most property damage. Ranked by crop damage, the order is hail, flood, wind, tornado, and drought.

Conclusions

Knowing which weather event types cause the most harm to population health and the greatest economic damage helps public officials prioritize preparations for such events. An analysis of damage by event type, such as this one, gives them a basis for allocating resources to programs that reduce weather related harm to population health and ameliorate economic consequences.