Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

This preliminary analysis looks at the top five storm event types to understand how they affect life and property. Tornados cause by far the most human fatalities and injuries, whereas flooding causes the most property and crop damage. Although this is an introductory analysis, the results suggest it is worth looking deeper into how each event type individually affects fatalities v. injuries and property damage v. crop damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Download and Read Dataset

library(dplyr)
library(tidyr)
library(data.table)
library(R.utils)
library(R.cache)
library(stringr)
library(ggplot2)
setwd("C:/Users/Wayne Office Laptop/Documents/GitHub/U.S.-National-Oceanic-and-Atmospheric-Administration-s--NOAA--storm-database")
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# download the compressed file only if it is not already present
if(!file.exists("./FStormData.csv.bz2")){
        # mode = "wb" keeps the binary .bz2 file intact on Windows
        download.file(fileURL,destfile = "./FStormData.csv.bz2", mode = "wb")
}

pathdata <- "./"
list.files("./")
## [1] "FStormData.csv"                                                                    
## [2] "FStormData.csv.bz2"                                                                
## [3] "rsconnect"                                                                         
## [4] "storm.R"                                                                           
## [5] "U.S.-National-Oceanic-and-Atmospheric-Administration-s--NOAA--storm-database.Rproj"
## [6] "US Storm.Rmd"                                                                      
## [7] "US_Storm.html"                                                                     
## [8] "US_Storm.Rmd"
# decompress only if the CSV has not been extracted yet
if(!file.exists("./FStormData.csv")){
        bunzip2("FStormData.csv.bz2",remove=F)
}
# read data
#### fill blanks with NA
FStormData <- fread("FStormData.csv", strip.white = T,na.strings=c("NA","N/A","") )
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:09

View Dataset Statistics

###look at data
dim(FStormData)
## [1] 902297     37
str(FStormData)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : chr  "3" "2" "2" "2" ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

DATA PROCESSING

1. Make Column Names Consistent.

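Some column names contained underscores, while others didn’t. Removing the underscores left two columns named “STATE” and two named “LONGITUDE”, so the numeric state column was renamed “STATENUM” and the longitude columns were renamed “LONGITUDE1” and “LONGITUDE2”.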

#### remove underscores from col names and  resolve duplicate STATE column
newColNames <- colnames(FStormData)
newColNames <- gsub("\\_","",newColNames)
colnames(FStormData) <- newColNames
colnames(FStormData)[1] <- "STATENUM"
colnames(FStormData)[33] <- "LONGITUDE1"
colnames(FStormData)[35] <- "LONGITUDE2"
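
The duplicate names above are fixed by position, which depends on the column order. As a minimal sketch of an alternative (not the approach used in this report), base R's make.unique() can disambiguate duplicates automatically. The rawNames vector below is a hypothetical stand-in for a few of the original column names, and the resulting names (for example “STATE1”) differ from the ones chosen above.

```r
# Hypothetical subset of the original column names
rawNames <- c("STATE__", "STATE", "LONGITUDE", "LONGITUDE_", "EVTYPE")

# Strip underscores; this is the step that produces the duplicates
cleanNames <- gsub("_", "", rawNames, fixed = TRUE)

# Append a numeric suffix to the second and later occurrences of a name
uniqueNames <- make.unique(cleanNames, sep = "")
print(uniqueNames)
```

The trade-off is that the suffix lands on the later column, so which column keeps the plain name still depends on column order.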

2. Create Data Subset for Investigation.

Created a subset with only the variables needed for analysis.

### Create a data subset with relevant variables for analysis 
FStormDataTidy<- FStormData %>%
        select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

3. Replace NAs with 0 in columns needed for calculations

The dataset has columns for property damage (“PROPDMG”) and crop damage (“CROPDMG”), with companion columns “PROPDMGEXP” and “CROPDMGEXP” that hold the multipliers for each. NAs in all four columns are replaced with zero to prevent calculation errors later.

### replace NA with 0 in PROP and CROP damage and multiplier columns
FStormDataTidy$PROPDMG[is.na(FStormDataTidy$PROPDMG)] <- 0
FStormDataTidy$PROPDMGEXP[is.na(FStormDataTidy$PROPDMGEXP)] <- 0
FStormDataTidy$CROPDMG[is.na(FStormDataTidy$CROPDMG)] <- 0
FStormDataTidy$CROPDMGEXP[is.na(FStormDataTidy$CROPDMGEXP)] <- 0
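
One detail of the replacement above is easy to miss: the multiplier columns are character vectors, so the 0 assigned to them is stored as the string "0". The sketch below illustrates this on two small hypothetical vectors; replaceNaWithZero is an illustrative helper name, not part of the original script.

```r
# Replace missing values with 0; R coerces the 0 to the vector's type
replaceNaWithZero <- function(x) {
        x[is.na(x)] <- 0
        x
}

numericCol <- c(25, NA, 2.5)        # stand-in for a damage amount column
characterCol <- c("K", NA, "M")     # stand-in for a multiplier code column

print(replaceNaWithZero(numericCol))    # stays numeric
print(replaceNaWithZero(characterCol))  # stays character; 0 is stored as "0"
```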

4. Trim whitespace from text columns.

Trimming whitespace from the event type column and converting the damage multiplier columns to lowercase makes string processing a bit easier.

## trim whitespace from EVTYPE and convert
### PROP and CROP Damage EXP to lowercase

FStormDataTidy$EVTYPE <- str_trim(FStormDataTidy$EVTYPE, side = "both")
FStormDataTidy$PROPDMGEXP <- tolower(FStormDataTidy$PROPDMGEXP)
FStormDataTidy$CROPDMGEXP <- tolower(FStormDataTidy$CROPDMGEXP)

5. Make damage multiplier columns character vectors for processing, and create new columns.

Now that the multiplier columns “PROPDMGEXP” and “CROPDMGEXP” are ready for string processing, the relevant letter codes that indicate a multiplier, such as “K” for “1000”, need to be converted and stored in new columns “PROPX” and “CROPX”.

### Make multiplier columns as character vectors
FStormDataTidy$PROPDMGEXP <- as.character(FStormDataTidy$PROPDMGEXP)
FStormDataTidy$CROPDMGEXP <- as.character(FStormDataTidy$CROPDMGEXP)
### creating new character columns PROPX and CROPX, initially all NA
FStormDataTidy$PROPX <- NA_character_
FStormDataTidy$CROPX <- NA_character_

6. Convert multiplier codes to numeric strings in the new PROPX and CROPX columns.

Here we convert the letter codes to their numeric multipliers, stored as strings. Symbols and digit codes are given a multiplier of 0.

### Create PROPX multiplier from PROPDMGEXP
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "-", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "?", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "+", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "0", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "1", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "2", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "3", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "4", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "5", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "6", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "7", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "8", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "h", ][, "PROPX"] <- "100"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "k", ][, "PROPX"] <- "1000"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "m", ][, "PROPX"] <- "1000000"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "b", ][, "PROPX"] <- "1000000000"

### Create CROPX multiplier from CROPDMGEXP
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "?", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "0", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "2", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "k", ][, "CROPX"] <- "1000"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "m", ][, "CROPX"] <- "1000000"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "b", ][, "CROPX"] <- "1000000000"
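
The row-by-row assignments above can also be expressed as a single lookup. The following is a minimal sketch, assuming the same mapping (h, k, m, b get a multiplier and every other code gets 0); multiplierLookup and codeToMultiplier are hypothetical names, and the function returns numeric values directly rather than strings.

```r
# Multipliers for the letter codes; every other code maps to 0,
# matching the assignments above
multiplierLookup <- c(h = 100, k = 1e3, m = 1e6, b = 1e9)

codeToMultiplier <- function(codes) {
        # Indexing a named vector by name returns NA for unknown codes
        mult <- unname(multiplierLookup[codes])
        mult[is.na(mult)] <- 0
        mult
}

print(codeToMultiplier(c("k", "m", "b", "h", "0", "?", "5")))
```

Applied to a lowercased multiplier column, this would fill the multiplier in one step and make the separate numeric conversion unnecessary.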

7. Convert PROPX and CROPX columns to numeric.

Now that the multipliers are stored as numeric strings, we convert them to numeric variables for further calculations.

# Make new multiplier column a numeric
FStormDataTidy$PROPX <- as.numeric(FStormDataTidy$PROPX)
FStormDataTidy$CROPX <- as.numeric(FStormDataTidy$CROPX)

8. Make storm event names more consistent.

In the original dataset many of the storm event names were inconsistent. For example, tornado events were recorded under names such as “TORNADO”, “TORNADO F1”, and “TORNADOES, TSTM WIND, HAIL”. The result was too many different categories for each storm event. Here we condense the many storm categories into common storm classes.

# Tornado events
FStormDataTidy$EVTYPE <- gsub("TORNADOES, TSTM WIND, HAIL","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F0","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F1","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F2","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F3","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F4","TORNADO",FStormDataTidy$EVTYPE) 
FStormDataTidy$EVTYPE <- gsub("TORNADO F5","TORNADO",FStormDataTidy$EVTYPE) 
# Hurricane Events
FStormDataTidy$EVTYPE <- gsub("HURRICANE EMILY","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE ERIN","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE FELIX","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE GORDON","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE OPAL","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE OPAL/HIGH WINDS","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE/TYPHOON","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE/HIGH WINDS","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE-GENERATED SWELLS","HURRICANE",FStormDataTidy$EVTYPE)
#Flood Events
FStormDataTidy$EVTYPE <- gsub("FLOODS", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOODING", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/RIVER FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/RAIN/WINDS", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLASH FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLASH", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD & HEAVY RAIN", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOODING", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOOD/FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("RIVER FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
# Wild Fire Events
FStormDataTidy$EVTYPE <- gsub("WILDFIRES", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FOREST FIRES", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("WILD/FOREST FIRE", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("WILD FIRES","WILDFIRE", FStormDataTidy$EVTYPE)
# Hail Events
FStormDataTidy$EVTYPE <- gsub("HAILSTORM","HAIL", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub(".*HAIL .*", "HAIL", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("SMALL HAIL", "HAIL", FStormDataTidy$EVTYPE)
# Storm Surge Events
FStormDataTidy$EVTYPE <- gsub(".*STORM SURGE/.*", "STORM SURGE", FStormDataTidy$EVTYPE)
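
Because the replacements above are applied in sequence, their order affects the result. As a rough sketch of a keyword-based alternative (not the method used for the results in this report), the hypothetical helper collapseEventType below assigns a class whenever a keyword appears anywhere in the name. It is coarser than the rules above, it does not handle storm surge, and it would change the totals per class.

```r
# Hypothetical helper: collapse event names by keyword
collapseEventType <- function(evtype) {
        evtype <- toupper(trimws(evtype))
        # Earlier rules win: once a name is collapsed, later patterns
        # no longer match it
        evtype[grepl("TORNADO", evtype)] <- "TORNADO"
        evtype[grepl("HURRICANE", evtype)] <- "HURRICANE"
        evtype[grepl("FLOOD", evtype)] <- "FLOOD"
        evtype[grepl("WILD.*FIRE|FOREST FIRE", evtype)] <- "WILDFIRE"
        evtype[grepl("HAIL", evtype)] <- "HAIL"
        evtype
}

examples <- c("TORNADO F1", " Hurricane Opal", "FLASH FLOODING",
              "WILD/FOREST FIRE", "SMALL HAIL", "LIGHTNING")
print(collapseEventType(examples))
```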

RESULTS

1. Across the United States, which types of events are most harmful to population health?

A. Make calculations.

Here we calculate the total economic damage for each event record. Property damage and crop damage are each multiplied by their multiplier, and total damage = total property damage + total crop damage.

### Calculate total property and crop damage
FStormDataTidy$TOTALPROPDAMAGE <- FStormDataTidy$PROPDMG * FStormDataTidy$PROPX
FStormDataTidy$TOTALCROPDAMAGE <- FStormDataTidy$CROPDMG * FStormDataTidy$CROPX
FStormDataTidy$TOTALDAMAGE <- FStormDataTidy$TOTALPROPDAMAGE + FStormDataTidy$TOTALCROPDAMAGE
head(FStormDataTidy)
##     EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPX
## 1: TORNADO          0       15    25.0          k       0          0  1000
## 2: TORNADO          0        0     2.5          k       0          0  1000
## 3: TORNADO          0        2    25.0          k       0          0  1000
## 4: TORNADO          0        2     2.5          k       0          0  1000
## 5: TORNADO          0        2     2.5          k       0          0  1000
## 6: TORNADO          0        6     2.5          k       0          0  1000
##    CROPX TOTALPROPDAMAGE TOTALCROPDAMAGE TOTALDAMAGE
## 1:     0           25000               0       25000
## 2:     0            2500               0        2500
## 3:     0           25000               0       25000
## 4:     0            2500               0        2500
## 5:     0            2500               0        2500
## 6:     0            2500               0        2500
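
As a quick check of the arithmetic, the first row of the output above can be reproduced by hand. The variable names in this sketch are illustrative only; the values come from that row.

```r
# Values taken from the first row of the output above
propDmg <- 25
propX <- 1000
cropDmg <- 0
cropX <- 0

totalPropDamage <- propDmg * propX
totalCropDamage <- cropDmg * cropX
totalDamage <- totalPropDamage + totalCropDamage
print(totalDamage)
```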

Total human damage = total fatalities + total injuries. Here we calculate the top 5 events most damaging to human life.

### Calculate which types of events are most harmful to population health?
FStormDataTidy$TOTALHUMAN <- FStormDataTidy$FATALITIES + FStormDataTidy$INJURIES

mostHumanEvent <- FStormDataTidy %>%
        select(EVTYPE,FATALITIES,INJURIES,TOTALHUMAN) %>%
        group_by(EVTYPE)%>%
        summarise(TOTALHUMANEVENT = sum(TOTALHUMAN))%>%
        arrange(desc(TOTALHUMANEVENT))

mostHumanEvent <- mostHumanEvent[1:5,]
mostHumanEvent
## # A tibble: 5 x 2
##           EVTYPE TOTALHUMANEVENT
##            <chr>           <dbl>
## 1        TORNADO           97022
## 2          FLOOD           10110
## 3 EXCESSIVE HEAT            8428
## 4      TSTM WIND            7461
## 5      LIGHTNING            6046
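
The same group, sum, and rank pattern can be sketched in base R on a tiny data frame. The records below are hypothetical toy values, not taken from the storm database, and the sketch keeps the top two classes instead of the top five.

```r
# Hypothetical toy records (not taken from the storm database)
toy <- data.frame(
        EVTYPE = c("A", "B", "A", "C", "B", "A"),
        FATALITIES = c(1, 0, 2, 0, 1, 0),
        INJURIES = c(3, 1, 0, 5, 0, 4)
)
toy$TOTALHUMAN <- toy$FATALITIES + toy$INJURIES

# Sum casualties per event type, then keep the two largest totals
totals <- tapply(toy$TOTALHUMAN, toy$EVTYPE, sum)
topTwo <- head(sort(totals, decreasing = TRUE), 2)
print(topTwo)
```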

B. Plot graph

### Create Plot
g <- ggplot(mostHumanEvent, aes(x= reorder(EVTYPE, -TOTALHUMANEVENT),y=TOTALHUMANEVENT))

g + geom_bar(stat = "identity", fill = "blue") + theme_minimal() +
        labs(title = "Top 5 Most Life Damaging Natural Events") +
        labs(x = "Life Damaging Event", y = "Total Casualties (Death + Injury)")

Figure 1: Total Top 5 Human damage per event

2. Across the United States, which types of events have the greatest economic consequences?

A. Make calculations

Calculating the top 5 most costly events by total damage (property plus crop), in billions of dollars.

### which types of events have the greatest economic consequences?
mostDamgeEvent <- FStormDataTidy %>%
        select(EVTYPE,TOTALPROPDAMAGE,TOTALCROPDAMAGE,TOTALDAMAGE) %>%
        group_by(EVTYPE)%>%
        summarise(TOTALDAMEVENT = sum(TOTALDAMAGE/10^9))%>%
        arrange(desc(TOTALDAMEVENT))
mostDamgeEvent <- mostDamgeEvent[1:5,]
mostDamgeEvent
## # A tibble: 5 x 2
##        EVTYPE TOTALDAMEVENT
##         <chr>         <dbl>
## 1       FLOOD     179.16349
## 2   HURRICANE      90.27147
## 3     TORNADO      58.95939
## 4 STORM SURGE      47.96558
## 5        HAIL      19.02088

B. Create graph

### plot Most damaging economic events
b <- ggplot(mostDamgeEvent, aes(x= reorder(EVTYPE, -TOTALDAMEVENT),y=TOTALDAMEVENT))

b + geom_bar(stat = "identity", fill = "red") + theme_minimal() +
        labs(title = "Top 5 Most Damaging Natural Events") +
        labs(x = "Damaging Event", y = "Property Damage (Billions)") +
        ylim(c(0,200))

Figure 2: Total Top 5 property damage per event

Conclusion

Tornados are the most damaging event to human life by a very significant margin, followed by flooding and excessive heat. One possible reason is that there is minimal warning to prepare for tornado evacuations, and it is hard to predict where a tornado will touch down, leaving people more vulnerable to tornados than to other types of weather events.

Regarding economic damage (property plus crops), flooding causes the most damage, followed by hurricanes and tornados. Flooding can result from numerous weather events, such as hurricanes, storms, and snowmelt, and it damages crops as well as property.

More analysis and data processing can be performed to better understand the ratio of property to crop damage, and to separate injuries from fatalities, for each storm event.