Synopsis

An analysis of the NOAA storm database is conducted to establish which weather events cause the most harm to human health and the economy. The events that cause the most harm to human health are identified as the five event types with the greatest number of fatalities and the five with the greatest number of injuries. The events that cause the greatest damage to the economy are identified by ranking the events that cause more than $10 billion in combined damages to crops and property. This document contains all code required to read in the data, process and analyze the data, and produce the tables and figures.

Data Processing

The Storm Data FAQ page was consulted to understand the variables and to guide both the processing and the analysis.

The data are downloaded. The downloaded file is saved under a shorter name to facilitate reading.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")

The file is read into stormdata.

stormdata <- read.csv(file.path("StormData.csv.bz2"))
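Since only a handful of the columns are used later, one option is to skip the unused ones at read time. The sketch below is not part of the original analysis: it uses a small inline CSV with hypothetical values (csv_text, wanted and small are illustrative names) to show how an entry of "NULL" in colClasses drops a column in read.csv.

```r
# Toy CSV standing in for the real file (hypothetical values).
csv_text <- "STATE,EVTYPE,FATALITIES,REMARKS
AL,TORNADO,0,none
AL,FLOOD,1,none"

# "NULL" drops a column while reading; other entries set the column class.
wanted <- c(STATE = "NULL", EVTYPE = "character",
            FATALITIES = "numeric", REMARKS = "NULL")
small <- read.csv(text = csv_text, colClasses = wanted)

names(small)  # only the columns that were not set to "NULL" remain
```

With the real file, the same colClasses vector would need one entry for each column to be dropped, so whether this is worth the extra typing depends on how often the file is re-read.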

The names and structure of the variables are below.

names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Many of the variables will not be used. EVTYPE, FATALITIES and INJURIES can be used as they are. Damages, however, are split across four variables: PROPDMG, CROPDMG, PROPDMGEXP and CROPDMGEXP. The last two, ending in “EXP”, are factors that act as a multiplier on the damage value: ones, hundreds, thousands, millions or billions, or some other code. It will be necessary to reduce the levels of the variables ending in “EXP” to a short list of interpretable values.

unique(stormdata$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
unique(stormdata$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
rlist <- list(ONE=c(""," "), H=c("H","h"), K=c("k","K"), M=c("m","M"), B=c("b","B"))  #Other values will be changed to NA
levels(stormdata$PROPDMGEXP) <- rlist
levels(stormdata$CROPDMGEXP) <- rlist
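To make the effect of assigning a list to levels() concrete, here is a minimal sketch on a toy factor (the values in x are hypothetical, not taken from the data). Old levels named in the list are merged under the list name, and any level not mentioned in the list becomes NA.

```r
# Toy factor with mixed-case and uninterpretable codes (hypothetical values).
x <- factor(c("K", "k", "M", "", "?", "5"))

# Merge the listed levels; "?" and "5" are not listed and so become NA.
levels(x) <- list(ONE = "", K = c("k", "K"), M = c("m", "M"))

levels(x)        # "ONE" "K" "M"
as.character(x)  # "K" "K" "M" "ONE" NA NA
```

Listing a value that does not occur in the factor, such as "m" here, is harmless, which is why the same list can be reused for both variables.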

After reducing the levels, we can create two new variables, TOTALCROP and TOTALPROP, for the total crop damage and total property damage. Each row of PROPDMG and CROPDMG is multiplied by the value in column 2 of conv for which column 1 matches that row's PROPDMGEXP or CROPDMGEXP. A total is left at zero if there is an NA in the corresponding variable ending in “EXP”.

conv <- data.frame(c("ONE","H","K","M","B"),
                   c(1,100,1000,1000000,1000000000))
stormdata$TOTALCROP <- 0
stormdata$TOTALPROP <- 0
for (n in 1:nrow(conv)) {
     locs<-with(stormdata,which(CROPDMGEXP==conv[n,1]))
     if (length(locs)!=0) {
          stormdata$TOTALCROP[locs] <- conv[n,2]*stormdata$CROPDMG[locs]
     }
     locs<-with(stormdata,which(PROPDMGEXP==conv[n,1]))
     if (length(locs)!=0) {
          stormdata$TOTALPROP[locs] <- conv[n,2]*stormdata$PROPDMG[locs]
     }
}
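The same conversion can also be written without a loop by indexing a named vector of multipliers. This is an alternative sketch on toy rows, not the code used for the results in this report; toy and mult are illustrative names and the values are hypothetical.

```r
# Toy rows standing in for stormdata (hypothetical values).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3),
  PROPDMGEXP = factor(c("K", "M", "B", NA),
                      levels = c("ONE", "H", "K", "M", "B"))
)

# Named lookup vector: exponent code -> multiplier.
mult <- c(ONE = 1, H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Index by name, not by factor code, hence as.character().
toy$TOTALPROP <- toy$PROPDMG * unname(mult[as.character(toy$PROPDMGEXP)])

# Rows with an uninterpretable exponent get a total of zero.
toy$TOTALPROP[is.na(toy$TOTALPROP)] <- 0

toy$TOTALPROP  # 25000 2500000 1e+09 0
```

Indexing with the factor itself would use its integer codes rather than its labels, so the as.character() call is what keeps the lookup tied to the names in mult.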

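Next, we check what fraction of rows has an NA in either variable ending in “EXP”. A comparison with == NA always evaluates to NA, so is.na() is used for the check. Since there are only a few such rows, their damage totals are set to zero.

mean(is.na(stormdata$CROPDMGEXP) | is.na(stormdata$PROPDMGEXP))
stormdata$TOTALCROP[which(is.na(stormdata$CROPDMGEXP))] <- 0
stormdata$TOTALPROP[which(is.na(stormdata$PROPDMGEXP))] <- 0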
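Missing values in R cannot be located with the == operator. A short sketch on a toy vector (hypothetical values) shows the difference between comparing to NA and calling is.na().

```r
x <- c(1, NA, 3)

x == NA         # NA NA NA: any comparison with NA is itself NA
mean(x == NA)   # NA, so this cannot measure the share of missing values

is.na(x)        # FALSE TRUE FALSE
mean(is.na(x))  # share of missing entries, here one in three
```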

Results

The results will address the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequence?

Events with the most harm to human health

Question 1 will be answered first by aggregating the total number of injuries and fatalities by event type.

fatal <- aggregate(FATALITIES~EVTYPE,stormdata,sum)
injury <- aggregate(INJURIES~EVTYPE,stormdata,sum)

Order fatal by the total fatalities and retain the five most fatal event types.

fatal <-fatal[order(-fatal[,2],fatal[,1]),]
fatal <- fatal[1:5,]
print(fatal)              
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816

Table 1: Number of fatalities for each event type (EVTYPE)

Order injury by the total injuries and retain the five most injurious event types.

injury <- injury[order(-injury[,2],injury[,1]),]
injury <- injury[1:5,]
print(injury)              
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230

Table 2: Number of injuries for each event type (EVTYPE)

Now we can see which events cause the most harm to human health. Tornadoes cause the most harm to health in terms of both injuries and fatalities. The next most harmful event type depends on which criterion is selected: excessive heat ranks second for fatalities, whereas TSTM WIND ranks second for injuries.
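If a single ranking is wanted, one possible criterion is the sum of fatalities and injuries per event type. That criterion is an assumption made for illustration only, not the method used in this report. The sketch below uses toy rows with hypothetical values; toy, harm and CASUALTIES are illustrative names.

```r
# Toy rows standing in for stormdata (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(10, 5, 2, 0)
)

# cbind() on the left-hand side aggregates both columns in one call.
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)

# Assumed combined criterion: fatalities plus injuries.
harm$CASUALTIES <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$CASUALTIES), ]

harm
```

Adding the two counts weights a fatality the same as an injury, which is a strong simplification; keeping the two tables separate, as above, avoids that choice.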

Which events have the greatest economic harm

To determine which events have the greatest economic harm we will consider only the total damages. We will aggregate total damages by event type then keep only the totals that exceed $10 Billion.

library(ggplot2)
dmgtot <- aggregate(TOTALPROP+TOTALCROP~EVTYPE,stormdata,
                    sum,na.action = na.omit)
dmgtot <- dmgtot[which(dmgtot[,2]>1E10),]
names(dmgtot) <- c("Event_Type","Total_Damage_USD")
ggplot(dmgtot,aes(x=reorder(Event_Type,-Total_Damage_USD),
                      y=Total_Damage_USD))+
     geom_bar(stat="identity", fill="steelblue")+
     theme(axis.text.x = element_text(angle = 90, hjust = 1))+
     xlab(NULL)

detach(package:ggplot2)

Figure 1: Bar plot showing total damages in US dollars (USD) for weather events with more than $10 billion in damages

Figure 1 shows that floods cause the greatest damage, which totals approximately $150 billion. Tornadoes, which cause the greatest harm to human health, are the third most expensive.
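The aggregation and threshold behind a plot like Figure 1 can also be shown as a small ranked table. The following sketch repeats those steps on toy rows with hypothetical values (toy, tot and Billions are illustrative names), so its numbers are not results from the storm data.

```r
# Toy rows standing in for stormdata (hypothetical values).
toy <- data.frame(
  EVTYPE    = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  TOTALPROP = c(8e9, 4e9, 1e9, 11e9),
  TOTALCROP = c(1e9, 0, 2e9, 0)
)

# Sum property plus crop damage by event type.
tot <- aggregate(TOTALPROP + TOTALCROP ~ EVTYPE, toy, sum)
names(tot) <- c("Event_Type", "Total_Damage_USD")

# Keep only event types above $10 billion and rank them.
tot <- tot[tot$Total_Damage_USD > 1e10, ]
tot <- tot[order(-tot$Total_Damage_USD), ]

# Express the totals in billions for easier reading.
tot$Billions <- tot$Total_Damage_USD / 1e9
tot
```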

Additional Information

Below is the session info.

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8.1 x64 (build 9600)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.12     digest_0.6.12    rprojroot_1.2    plyr_1.8.4      
##  [5] grid_3.4.1       gtable_0.2.0     backports_1.1.0  magrittr_1.5    
##  [9] evaluate_0.10.1  scales_0.5.0     ggplot2_2.2.1    rlang_0.1.2     
## [13] stringi_1.1.5    lazyeval_0.2.0   rmarkdown_1.6    labeling_0.3    
## [17] tools_3.4.1      stringr_1.2.0    munsell_0.4.3    yaml_2.1.14     
## [21] compiler_3.4.1   colorspace_1.3-2 htmltools_0.3.6  knitr_1.17      
## [25] tibble_1.3.4