Synopsis

In this analysis I aim to look at the NOAA storm database to address the following questions:

- Across the United States, which types of events are most harmful with respect to population health?

- Across the United States, which types of events have the greatest economic consequences?

For the first part, I identify the event types most harmful to population health by finding which cause the most fatalities and which cause the most injuries. For the second part, I estimate the total damage as the sum of the total property damage and the total crop damage, and identify the event type that causes the most total damage.

Data Analysis

First, the data is loaded into the RStudio workspace using the read.csv() function.

setwd("~/Documents/Coursera Data Science/Reproducible Research/RepData_PeerAssessment2/")

stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
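As a minimal, self-contained sketch of this loading step, the snippet below writes a tiny table to a temporary bzip2 file and reads it back through bzfile(), the same way the real file is read above. The table contents and the temporary file are invented placeholders, not NOAA data.

```r
# Toy data standing in for the storm database (values are invented).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(2, 1))

# Write the table to a temporary bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection; read.csv() opens and
# closes the connection itself.
loaded <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(loaded)
```

Setting stringsAsFactors explicitly is a design choice here: its default changed in R 4.0.0, so stating it keeps the column types the same across R versions.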

Next, the total number of fatalities is computed for each weather event using the tapply() function. The totals are then sorted in decreasing order, so the events with the most fatalities come first.

fatEV <- data.frame(tapply(as.numeric(stormData$FATALITIES), stormData$EVTYPE, sum))
colnames(fatEV) <- c( "Total Fatalities")
fatEV <- fatEV[order(fatEV$`Total Fatalities`, decreasing = TRUE),]
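To illustrate what this aggregation does, here is a small sketch on invented records. It uses sort() on the tapply() result instead of the data-frame route above; that is an alternative for illustration, not the code used in this analysis.

```r
# Invented records: one row per event, with a fatality count.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  FATALITIES = c(3, 1, 2, 4, 0, 1)
)

# tapply() splits FATALITIES by EVTYPE and applies sum() to each group,
# returning a named array with one total per event type.
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# sort() keeps the names attached, so the result can go straight to barplot().
totals <- sort(totals, decreasing = TRUE)
print(totals)
```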

The same is done for the total number of injuries.

injEV <- data.frame(tapply(as.numeric(stormData$INJURIES), stormData$EVTYPE, sum))
colnames(injEV) <- c( "Total Injuries")
injEV <- injEV[order(injEV$`Total Injuries`, decreasing = TRUE),]

To look at the total damage, the damage is summed over all records for each weather event. First, the exponent codes have to be converted so that “K”, “M”, and “B” become 10^3, 10^6, and 10^9 respectively (the code below also maps “H” to 10^2 and treats every other code as 1). This is done by creating a new factor variable. Next, in the same way as for the total fatalities and injuries computed previously, the total property and crop damage are computed and sorted in decreasing order.

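## Total Property Damage
## The labels are assigned in the sorted order of the PROPDMGEXP levels; check levels() before reuse.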
newT <- factor(stormData$PROPDMGEXP
               , labels =c(1,1,1,1,1,1,1,1,1,1,1,1,1,1000000000,100,100,1000,1000000,1000000))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
propDmg <- data.frame(tapply(as.numeric(stormData$PROPDMG)*as.numeric(as.character(newT)),
                             stormData$EVTYPE, sum))
colnames(propDmg) <- "Total Property Damage"
propDmgR <- propDmg[order(propDmg$`Total Property Damage`, decreasing = TRUE),]

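## Total Crop Damage
## The labels are assigned in the sorted order of the CROPDMGEXP levels; check levels() before reuse.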
newCD <- factor(stormData$CROPDMGEXP, labels = c(1,1,1,1,1000000000,1000,1000,1000000,1000000))
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
cropDmg <- data.frame(tapply(as.numeric(stormData$CROPDMG)*as.numeric(as.character(newCD)),
                             stormData$EVTYPE, sum))
colnames(cropDmg) <- "Total Crop Damage"
cropDmgR <- cropDmg[order(cropDmg$`Total Crop Damage`, decreasing = TRUE),]
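The factor relabelling above depends on the order of the factor levels, which can change with the locale and the R version. A possible alternative, sketched here on invented values, is a named lookup vector keyed by the exponent code. The helper name exp_to_multiplier is hypothetical, and falling back to a multiplier of 1 for every unrecognised code mirrors the labels used above rather than any official NOAA rule.

```r
# Multipliers keyed by upper-case exponent code. Treating any other code
# (blank, digits, "?", "+", "-") as 1 mirrors the labels used above and is
# an assumption, not an official NOAA rule.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert a vector of exponent codes to numeric multipliers.
exp_to_multiplier <- function(codes) {
  m <- multipliers[toupper(as.character(codes))]
  m[is.na(m)] <- 1          # unknown or empty codes fall back to 1
  unname(m)
}

# Invented example values.
dmg  <- c(25, 2.5, 1, 10, 5)
code <- c("K", "M", "B", "", "h")
dollars <- dmg * exp_to_multiplier(code)
print(dollars)
```

Because the lookup is by name, the result does not depend on how the levels happen to be sorted.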

Finally, the total damage is computed as the sum of the property and crop damage, and then sorted in decreasing order.

totDmg <- cbind(propDmg, cropDmg, propDmg + cropDmg)
colnames(totDmg) <- c("Total Property Damage",  "Total Crop Damage", "Total Damage")
totDmg <- totDmg[order(totDmg$`Total Damage`, decreasing = TRUE),]
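The point of combining the unsorted totals first and sorting only once at the end is that the rows stay aligned by event type. A small sketch with invented numbers:

```r
# Invented per-event totals, in the same event order for both vectors.
prop <- c(FLOOD = 140, HAIL = 15, DROUGHT = 1)
crop <- c(FLOOD = 5, HAIL = 3, DROUGHT = 20)

# Combine first, while both are still in the same order, then sort once.
tot <- data.frame(prop = prop, crop = crop, total = prop + crop)
tot <- tot[order(tot$total, decreasing = TRUE), ]
print(tot)
```

Adding the already sorted vectors instead would pair up different event types in the same row.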

Results

The total fatalities and injuries are plotted as bar plots. Because the totals are in decreasing order, the events with the most fatalities and injuries appear on the left. Only the 15 highest-ranking weather events are shown.

par(mfcol  = c(1,2))
barplot(fatEV[1:15], las = 2, ylim = c(0,6000), ylab = "Count", main = "Total Fatalities")
barplot(injEV[1:15], las = 2, ylim = c(0,100000), ylab = "Count", main = "Total Injuries")

Both plots show that tornadoes are the most harmful with respect to population health: their fatality and injury totals are each more than twice those of the next highest weather event.
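A "more than twice the next highest" statement can also be checked numerically rather than read off the plot. The sketch below uses invented totals, not the values from this analysis; on the real data the same ratio could be taken from the first two elements of the sorted totals.

```r
# Invented totals, already sorted in decreasing order.
totals <- c(A = 900, B = 400, C = 350)

# Ratio of the largest total to the runner-up.
ratio <- unname(totals[1] / totals[2])
print(ratio > 2)
```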

The total property damage, total crop damage, and total damage are plotted as bar plots. Since they are sorted in decreasing order, the highest are on the left. Only the 7 highest-ranking weather events are shown for each.

par(mfcol  = c(1,3))
barplot(propDmgR[1:7], las = 2, ylim = c(0, 1.5e11), ylab = "$",
        main = "Total Property Damage")
barplot(cropDmgR[1:7], las = 2, ylim = c(0, 1.5e10), ylab = "$",
        main = "Total Crop Damage")
barplot(totDmg[1:7,3], las = 2, ylim = c(0,1.5e11), ylab = "$",
        main = "Total Damage")

The bar plots show that floods cause the greatest economic impact, with more than twice the total damage of the next highest event. They also show that property damage accounts for much more of the total than crop damage.

Conclusion

Across the United States, tornadoes are the most harmful with respect to population health, and floods have the greatest economic consequences, according to the NOAA storm data.