SYNOPSIS

Severe weather events have long harmed public health in communities across the United States and have imposed substantial economic costs. In this project, I explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, crop damage, and property damage. The events in the database start in the year 1950 and end in November 2011. I use the recorded observations to show the effects of severe weather on public health and on the economy.

DATA PROCESSING

Downloading Data

First, the data set is downloaded to a local directory. The file stays compressed on disk; it is decompressed only when it is read into R.

if(!file.exists("./Download")){dir.create("./Download")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./Download/repdata_data_StormData.csv.bz2")
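Because the download step runs every time the script is executed, it can help to skip it when a copy is already on disk. The following is a minimal sketch, not part of the original analysis: `fetch_once` is a hypothetical helper name, and `mode = "wb"` is added on the assumption that the script may also run on Windows, where binary mode keeps the archive intact.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
fetch_once <- function(url, dest) {
  # Create the target directory if needed (silently if it already exists).
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    # mode = "wb" writes the bz2 archive as binary data
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}
```

With the names used above, the call would be `fetch_once(fileUrl, "./Download/repdata_data_StormData.csv.bz2")`.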

Loading the Data into R

Next, we load the R libraries used in the analysis and read the data set into a data frame with read.csv, decompressing it on the fly with bzfile. The dim and str functions are then used to inspect the dimensions and structure of the data set.

library(dplyr)
library(ggplot2)
library(stats)
library(knitr)

Stormdata<-read.csv(bzfile("./Download/repdata_data_StormData.csv.bz2"),header=TRUE)
dim(Stormdata)
## [1] 902297     37
str(Stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Cleaning Data for Analysis

To keep the analysis efficient, only the columns needed are extracted and stored in a new data frame, which is then inspected with the str function.

variables<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
newStormD<-Stormdata[variables]
str(newStormD)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
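An alternative to subsetting after the fact is to skip the unneeded columns while reading. The sketch below is an illustration under assumptions rather than the method used here: `read_columns` is a hypothetical helper, and it expects a file path instead of a connection because it reads the file twice. It also sets `stringsAsFactors = FALSE`, which is the default from R 4.0 onward, so text columns arrive as character vectors rather than the factors shown in the output above.

```r
# Hypothetical helper: read only the columns named in `keep` from a CSV file.
read_columns <- function(path, keep) {
  # Read the header plus one row to learn the column names and their order.
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  # "NULL" tells read.csv to skip a column; NA lets it guess the type.
  classes <- ifelse(header %in% keep, NA_character_, "NULL")
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}

# Small demonstration on a temporary file with made-up values.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(REFNUM = 1:2, EVTYPE = c("HAIL", "FLOOD"),
                     FATALITIES = c(0, 3)),
          demo_path, row.names = FALSE)
demo <- read_columns(demo_path, c("EVTYPE", "FATALITIES"))
names(demo)
```

Skipping columns at read time mainly saves memory; the comparisons and pattern matches in the following steps work the same on character columns as on factors.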

Next, the many event types in the data are consolidated by grouping similar events. Each group is defined by one or more keywords matched against EVTYPE, ignoring case. Any event that matches none of the keywords is grouped as ‘OTHER’. Because the assignments run in sequence, an event type that matches more than one keyword ends up in the group assigned last. The resulting groups are stored in a new column, ‘EVENT’.

newStormD$EVENT<-"OTHER"
newStormD$EVENT[grep("HAIL",newStormD$EVTYPE, ignore.case = TRUE)]<-"HAIL"
newStormD$EVENT[grep("WIND|COLD",newStormD$EVTYPE, ignore.case = TRUE)]<-"WIND"
newStormD$EVENT[grep("TYPHOON|HURRICANE",newStormD$EVTYPE, ignore.case = TRUE)]<-"TYPHOON/HURRICANE"
newStormD$EVENT[grep("FLOOD|FLD|URBAN",newStormD$EVTYPE, ignore.case = TRUE)]<-"FLOOD"
newStormD$EVENT[grep("TORNADO",newStormD$EVTYPE, ignore.case = TRUE)]<-"TORNADO"
newStormD$EVENT[grep("STORM|TIDE",newStormD$EVTYPE, ignore.case = TRUE)]<-"STORM"
newStormD$EVENT[grep("FIRE|WILDFIRE",newStormD$EVTYPE, ignore.case = TRUE)]<-"FIRE"
newStormD$EVENT[grep("RAIN",newStormD$EVTYPE, ignore.case = TRUE)]<-"RAIN"
newStormD$EVENT[grep("DROUGHT",newStormD$EVTYPE, ignore.case = TRUE)]<-"DROUGHT"
newStormD$EVENT[grep("SNOW",newStormD$EVTYPE, ignore.case = TRUE)]<-"SNOW"
newStormD$EVENT[grep("LIGHTNING|LIGHTING",newStormD$EVTYPE, ignore.case = TRUE)]<-"LIGHTNING"
newStormD$EVENT[grep("BLIZZARD",newStormD$EVTYPE, ignore.case = TRUE)]<-"BLIZZARD"
newStormD$EVENT[grep("TSUNAMI",newStormD$EVTYPE, ignore.case = TRUE)]<-"TSUNAMI"
newStormD$EVENT[grep("WINTER",newStormD$EVTYPE, ignore.case = TRUE)]<-"WINTER"
newStormD$EVENT[grep("ICE",newStormD$EVTYPE, ignore.case = TRUE)]<-"ICE"
newStormD$EVENT[grep("AVALANCHE",newStormD$EVTYPE, ignore.case = TRUE)]<-"AVALANCHE"
newStormD$EVENT[grep("SMOKE",newStormD$EVTYPE, ignore.case = TRUE)]<-"SMOKE"
newStormD$EVENT[grep("HEAT|HIGH TEMPERATURE",newStormD$EVTYPE, ignore.case = TRUE)]<-"HEAT"
newStormD$EVENT[grep("CLOUD",newStormD$EVTYPE, ignore.case = TRUE)]<-"CLOUD"
newStormD$EVENT[grep("VOLCANIC ASH",newStormD$EVTYPE, ignore.case = TRUE)]<-"VOLCANIC ASH"
newStormD$EVENT[grep("DUST",newStormD$EVTYPE, ignore.case = TRUE)]<-"DUST"
newStormD$EVENT[grep("FOG",newStormD$EVTYPE, ignore.case = TRUE)]<-"FOG"
newStormD$EVENT[grep("SURF",newStormD$EVTYPE, ignore.case = TRUE)]<-"SURF"
newStormD$EVENT[grep("DEPRESSION",newStormD$EVTYPE, ignore.case = TRUE)]<-"DEPRESSION"
newStormD$EVENT[grep("DEBRIS",newStormD$EVTYPE, ignore.case = TRUE)]<-"DEBRIS"
newStormD$EVENT[grep("FROST|FREEZE|FREEZING",newStormD$EVTYPE, ignore.case = TRUE)]<-"FROST/FREEZE"
newStormD$EVENT[grep("WATERSPOUT|WATER SPOUT",newStormD$EVTYPE, ignore.case = TRUE)]<-"WATERSPOUT"
newStormD$EVENT[grep("CURRENT",newStormD$EVTYPE, ignore.case = TRUE)]<-"RIP CURRENT"
newStormD$EVENT[grep("ASTRONOMICAL LOW TIDE",newStormD$EVTYPE, ignore.case = TRUE)]<-"ASTRONOMICAL LOW TIDE"
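The sequence of assignments above is order dependent: each line overwrites whatever an earlier line assigned. The sketch below illustrates this with a table of rules applied in order. It is an illustration only; `rules` reproduces just a subset of the patterns, and `classify_events` is a hypothetical helper name.

```r
# Ordered rules: later entries win, mirroring the sequence of assignments.
# Only a subset of the patterns is reproduced here for illustration.
rules <- c(HAIL = "HAIL", WIND = "WIND|COLD", FLOOD = "FLOOD|FLD|URBAN",
           TORNADO = "TORNADO", STORM = "STORM|TIDE", WINTER = "WINTER",
           ICE = "ICE")

# Hypothetical helper: map raw event types to groups, defaulting to "OTHER".
classify_events <- function(evtype, rules) {
  event <- rep("OTHER", length(evtype))
  for (group in names(rules)) {
    hit <- grepl(rules[[group]], evtype, ignore.case = TRUE)
    event[hit] <- group   # overwrites any earlier match
  }
  event
}

sample_types <- c("THUNDERSTORM WIND", "WINTER STORM", "ICE STORM",
                  "Flash Flood", "HAIL", "HIGH SEAS")
classify_events(sample_types, rules)
# → "STORM" "WINTER" "ICE" "FLOOD" "HAIL" "OTHER"
```

Here "THUNDERSTORM WIND" first matches WIND and is then overwritten by STORM, which is why the position of each rule matters as much as its pattern.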

The ten most frequent event groups are then listed.

sort(table(newStormD$EVENT),decreasing = T)[1:10]
## 
##      HAIL      WIND     STORM     FLOOD   TORNADO    WINTER      SNOW LIGHTNING 
##    289269    256228    110814     86090     60685     19604     17580     15779 
##      RAIN     CLOUD 
##     11892      6945

Next, the property damage exponent column (PROPDMGEXP) is converted to a numeric multiplier. Letter codes and digits are mapped to the corresponding power of 10 (for example, ‘K’ and ‘3’ both become 1000); blank and unrecognized codes keep a multiplier of 1. The multipliers are stored in a new column, and their frequencies are tabulated.

newStormD$PROPDMGEXP2<-1
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "1")]<-10
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "H"|newStormD$PROPDMGEXP == "2"|newStormD$PROPDMGEXP == "h")]<-100
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "K"|newStormD$PROPDMGEXP == "3")]<-1000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "4")]<-10000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "5")]<-100000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "M"| newStormD$PROPDMGEXP == "m"|newStormD$PROPDMGEXP == "6")]<-1000000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "7")]<-10000000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "8")]<-100000000
newStormD$PROPDMGEXP2[which(newStormD$PROPDMGEXP == "B")]<-1000000000
sort(table(newStormD$PROPDMGEXP2),decreasing = T)[1:10]
## 
##      1   1000  1e+06  1e+09  1e+05     10    100  1e+07  10000  1e+08 
## 466164 424669  11341     40     28     25     20      5      4      1
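The same mapping can be written as a lookup table, which keeps every code and its multiplier in one place. This is a sketch of an equivalent approach, not the code used for the results in this report; `exp_lookup` and `exp_to_multiplier` are hypothetical names.

```r
# Multipliers for the exponent codes handled above; other codes map to 1.
exp_lookup <- c("1" = 1e1, "2" = 1e2, "H" = 1e2, "h" = 1e2,
                "3" = 1e3, "K" = 1e3, "4" = 1e4, "5" = 1e5,
                "6" = 1e6, "M" = 1e6, "m" = 1e6,
                "7" = 1e7, "8" = 1e8, "B" = 1e9)

# Hypothetical helper: translate a vector of codes into numeric multipliers.
exp_to_multiplier <- function(codes, lookup = exp_lookup) {
  # as.character() makes this work for factors as well as character vectors.
  mult <- unname(lookup[as.character(codes)])
  mult[is.na(mult)] <- 1   # "", "-", "?", "+", "0" and any unknown code
  mult
}

exp_to_multiplier(c("K", "M", "", "B", "h", "?"))
# → 1e+03 1e+06 1e+00 1e+09 1e+02 1e+00
```

As a worked case, the first record in the data has PROPDMG 25 with code ‘K’, so its property damage is 25 × 1000 = 25000.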

Next, the crop damage exponent column (CROPDMGEXP) is converted to a numeric multiplier in the same way. The multipliers are stored in another new column, and their frequencies are tabulated.

newStormD$CROPDMGEXP2<-1
# Every test must read CROPDMGEXP. Testing PROPDMGEXP here would mix in
# property codes; the counts printed below reflect such a run and shift
# slightly once corrected.
newStormD$CROPDMGEXP2[which(newStormD$CROPDMGEXP == "2")]<-100
newStormD$CROPDMGEXP2[which(newStormD$CROPDMGEXP == "K"|newStormD$CROPDMGEXP == "k")]<-1000
newStormD$CROPDMGEXP2[which(newStormD$CROPDMGEXP == "M"| newStormD$CROPDMGEXP == "m")]<-1000000
newStormD$CROPDMGEXP2[which(newStormD$CROPDMGEXP == "B")]<-1000000000
sort(table(newStormD$CROPDMGEXP2),decreasing = T)[1:5]
## 
##      1   1000  1e+06    100  1e+09 
## 618442 281832   2001     13      9

The actual damage for each observation is then calculated by multiplying the reported amount by its multiplier. The two types of damage, property damage and crop damage, are each stored in a separate column.

newStormD$propertyDamage<-newStormD$PROPDMG * newStormD$PROPDMGEXP2

newStormD$cropDamage<-newStormD$CROPDMG * newStormD$CROPDMGEXP2

RESULTS

TYPES OF EVENTS THAT ARE MOST HARMFUL TO POPULATION HEALTH

First, the effect of severe weather events on population health is visualized with ggplot2 by plotting the ten event groups with the highest total fatalities in the United States.

# Total fatalities per event group, top ten, as a horizontal bar chart
newStormD %>% select(FATALITIES,EVENT) %>% group_by(EVENT) %>%
        summarise(sum1=sum(FATALITIES)) %>% top_n(n=10,wt=sum1) %>%
        ggplot(aes(y=sum1,x=reorder(x=EVENT,X=sum1),fill=EVENT)) + 
        geom_bar(stat = "identity",show.legend = F) + 
        labs(title="Fatalities caused by severe weather events") +
        labs(x="", y="Total fatalities across the United States") + coord_flip()

The effect of severe weather events on public health is not limited to fatalities; injuries also contribute. The ten event groups with the most injuries are shown in a second plot.

# Total injuries per event group, top ten, as a horizontal bar chart
newStormD %>% select(INJURIES,EVENT) %>% group_by(EVENT) %>%
        summarise(sum2=sum(INJURIES)) %>% top_n(n=10,wt=sum2) %>%
        ggplot(aes(y=sum2,x=reorder(x=EVENT,X=sum2),fill=EVENT)) +
        geom_bar(stat="identity",show.legend = F) + 
        labs(title = "Injuries caused by severe weather events") +
        labs(x="",y="Total injuries across the United States") + coord_flip()
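The numbers behind both health plots are per-group sums ranked in decreasing order. A minimal base R sketch of that aggregation is shown here on toy data; the values in `toy` are made up for illustration and are not taken from the storm database.

```r
# Toy data standing in for newStormD (values are made up for illustration).
toy <- data.frame(
  EVENT      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(3, 5, 4, 2, 1, 0),
  INJURIES   = c(20, 0, 15, 3, 7, 1)
)

# Sum both health measures per event group, then rank by fatalities.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT, data = toy, FUN = sum)
health <- health[order(health$FATALITIES, decreasing = TRUE), ]
head(health, 3)
```

Printing such a table alongside a plot makes it easy to read off exact totals that a bar chart only shows approximately.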

TYPES OF EVENTS WITH THE GREATEST ECONOMIC CONSEQUENCE

For the economic effect, the total property damage attributed to each event group is plotted for the ten most damaging groups.

newStormD %>% select(propertyDamage,EVENT) %>% group_by(EVENT) %>%
        summarise(sum3=sum(propertyDamage)) %>% top_n(n=10,wt=sum3) %>%
        ggplot(aes(y=sum3,x=reorder(x=EVENT,X=sum3), fill=EVENT)) +
        geom_bar(stat = "identity",show.legend = F) + 
        labs(title="Plot to show the economic consequence caused by severe weather events through property damage") +
        labs(x="",y="Property Damage Expense incurred due to Severe Weather Events in the United States") + coord_flip()

Crop damage is summarized as a table of total damage per event group, sorted in decreasing order; the ten largest totals are shown.

cropEffect<-sort(tapply(newStormD$cropDamage,newStormD$EVENT,sum), decreasing = TRUE)
cropEffect[1:10]
##           DROUGHT             FLOOD TYPHOON/HURRICANE               ICE 
##       13972566000       12270384210        5516117800        5027114300 
##              HAIL              WIND      FROST/FREEZE             STORM 
##        3046420890        2802829655        1997061000        1348752392 
##              RAIN              HEAT 
##         917815800         904469280
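Totals of this size are easier to compare when expressed in billions. The short sketch below rescales the three largest values from the table above; `crop_top` is a hypothetical name holding values copied from that output.

```r
# Top three crop-damage totals copied from the table above.
crop_top <- c(DROUGHT = 13972566000, FLOOD = 12270384210,
              "TYPHOON/HURRICANE" = 5516117800)

# Express each total in billions, rounded to two decimals.
crop_billions <- round(crop_top / 1e9, 2)
crop_billions
# → DROUGHT 13.97, FLOOD 12.27, TYPHOON/HURRICANE 5.52
```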

CONCLUSION

From the analysis performed, we can deduce that:
1. Tornadoes have the most severe effect on public health in the United States, causing the highest numbers of both fatalities and injuries.
2. Floods cause the most property damage of all event groups, while drought causes the most crop damage across the United States; both contribute to the overall effect on the economy.