Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Based on this preliminary analysis, we looked at the top five storm events to understand how they impact life and property. Tornadoes have by far the largest impact on human fatalities and injuries, whereas flooding has the largest effect on property and crop damage. Although this is an introductory analysis, the results are interesting enough to justify looking deeper into how each event type individually affects fatalities v. injuries and property damage v. crop damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
library(dplyr)
library(tidyr)
library(data.table)
library(R.utils)
library(R.cache)
library(stringr)
library(ggplot2)
setwd("C:/Users/Wayne Office Laptop/Documents/GitHub/U.S.-National-Oceanic-and-Atmospheric-Administration-s--NOAA--storm-database")
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("./FStormData.csv.bz2")){
# binary mode keeps the compressed archive intact on Windows
download.file(fileURL,destfile = "./FStormData.csv.bz2",mode = "wb")
}
pathdata <- "./"
list.files("./")
## [1] "FStormData.csv"
## [2] "FStormData.csv.bz2"
## [3] "rsconnect"
## [4] "storm.R"
## [5] "U.S.-National-Oceanic-and-Atmospheric-Administration-s--NOAA--storm-database.Rproj"
## [6] "US Storm.Rmd"
## [7] "US_Storm.html"
## [8] "US_Storm.Rmd"
#decompress files
# only decompress when the extracted CSV is not already present
if(!file.exists("./FStormData.csv")){
bunzip2("FStormData.csv.bz2",remove=FALSE)
}
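As a side note on the decompression step, base R connections detect bzip2 compression automatically when reading, so a compressed CSV can also be read without extracting it first. The sketch below demonstrates this on a tiny toy file written to a temporary path (the file contents and the `toy` name are assumptions for illustration, not the NOAA data), and it uses `read.csv` rather than the `fread` call used in this report.

```r
# Write a tiny toy CSV through a bzip2 connection (not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,"), con)
close(con)

# read.csv opens the path with file(), which auto-detects the compression;
# blank fields become NA, mirroring the na.strings used in this report
toy <- read.csv(tmp, na.strings = c("NA", "N/A", ""), stringsAsFactors = FALSE)
toy
```

Reading the full storm file this way is likely slower than `fread` on the extracted CSV, so this is mainly useful for quick checks.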
# read data
#### fill blanks with NA
FStormData <- fread("FStormData.csv", strip.white = T,na.strings=c("NA","N/A","") )
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:09
###look at data
dim(FStormData)
## [1] 902297 37
str(FStormData)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr NA NA NA NA ...
## $ BGN_LOCATI: chr NA NA NA NA ...
## $ END_DATE : chr NA NA NA NA ...
## $ END_TIME : chr NA NA NA NA ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr NA NA NA NA ...
## $ END_LOCATI: chr NA NA NA NA ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : chr "3" "2" "2" "2" ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr NA NA NA NA ...
## $ WFO : chr NA NA NA NA ...
## $ STATEOFFIC: chr NA NA NA NA ...
## $ ZONENAMES : chr NA NA NA NA ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr NA NA NA NA ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
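Some column names contained underscores while others did not, so the underscores were removed. After that, two columns were named “STATE” and two were named “LONGITUDE”. The first STATE column was renamed “STATENUM”, and the two LONGITUDE columns were renamed “LONGITUDE1” and “LONGITUDE2”.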
#### remove underscores from col names and resolve duplicate STATE column
newColNames <- colnames(FStormData)
newColNames <- gsub("\\_","",newColNames)
colnames(FStormData) <- newColNames
colnames(FStormData)[1] <- "STATENUM"
colnames(FStormData)[33] <- "LONGITUDE1"
colnames(FStormData)[35] <- "LONGITUDE2"
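The renaming above relies on fixed column positions. As an alternative sketch, base R's `make.unique` can resolve the duplicates without hard-coded indices. The `raw_names` vector below is a hand-picked subset of the column names for illustration, and the resulting names differ from the ones chosen in this report.

```r
# Illustrative subset of the original column names
raw_names <- c("STATE__", "BGN_DATE", "STATE", "LONGITUDE", "LONGITUDE_")

# Strip underscores literally (fixed = TRUE avoids regex escaping)
clean_names <- gsub("_", "", raw_names, fixed = TRUE)

# Append a sequence number to every repeated name
unique_names <- make.unique(clean_names, sep = "")
unique_names
```

Here the second occurrence of each duplicate gets the suffix, giving `STATE1` and `LONGITUDE1`, whereas the report renames the first STATE column to `STATENUM`.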
Created a subset with only the variables needed for analysis.
### Create a data subset with relevant variables for analysis
FStormDataTidy<- FStormData %>%
select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
The dataset has a property damage column, “PROPDMG”, and a crop damage column, “CROPDMG”, each paired with a column of multiplier codes, “PROPDMGEXP” and “CROPDMGEXP”. NAs in all four columns are replaced with zero to prevent errors in later calculations.
### replace NA with 0 in PROP and CROP damage and multiplier columns
FStormDataTidy$PROPDMG[is.na(FStormDataTidy$PROPDMG)] <- 0
FStormDataTidy$PROPDMGEXP[is.na(FStormDataTidy$PROPDMGEXP)] <- 0
FStormDataTidy$CROPDMG[is.na(FStormDataTidy$CROPDMG)] <- 0
FStormDataTidy$CROPDMGEXP[is.na(FStormDataTidy$CROPDMGEXP)] <- 0
Trimming whitespace from the event type column and converting the damage multiplier columns to lowercase makes string processing a bit easier.
## PROP and CROP Damage EXP to lowercase and trim all white space
### from EVTYPE
FStormDataTidy$EVTYPE <- str_trim(FStormDataTidy$EVTYPE, side = "both")
FStormDataTidy$PROPDMGEXP <- tolower(FStormDataTidy$PROPDMGEXP)
FStormDataTidy$CROPDMGEXP <- tolower(FStormDataTidy$CROPDMGEXP)
Now that the multiplier columns “PROPDMGEXP” and “CROPDMGEXP” are ready for string processing, we need to convert the relevant letter codes to their numeric multipliers (for example, “K” becomes “1000”) and store those in new columns “PROPX” and “CROPX”.
### Make multiplier columns as character vectors
FStormDataTidy$PROPDMGEXP <- as.character(FStormDataTidy$PROPDMGEXP)
FStormDataTidy$CROPDMGEXP <- as.character(FStormDataTidy$CROPDMGEXP)
### creating new columns PROPX and CROPX (initialised as NA strings, filled in below)
FStormDataTidy$PROPX <- NA_character_
FStormDataTidy$CROPX <- NA_character_
Here we make the conversions from letter characters to number characters.
### Create PROPX multiplier from PROPDMGEXP
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "-", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "?", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "+", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "0", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "1", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "2", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "3", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "4", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "5", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "6", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "7", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "8", ][, "PROPX"] <- "0"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "h", ][, "PROPX"] <- "100"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "k", ][, "PROPX"] <- "1000"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "m", ][, "PROPX"] <- "1000000"
FStormDataTidy[FStormDataTidy$PROPDMGEXP == "b", ][, "PROPX"] <- "1000000000"
### Create CROPX multiplier from CROPDMGEXP
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "?", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "0", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "2", ][, "CROPX"] <- "0"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "k", ][, "CROPX"] <- "1000"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "m", ][, "CROPX"] <- "1000000"
FStormDataTidy[FStormDataTidy$CROPDMGEXP == "b", ][, "CROPX"] <- "1000000000"
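The line-by-line assignments above can be condensed into a single lookup. The following is a minimal sketch, assuming the same mapping as this report (h, k, m, b map to powers of ten and every other code, including digits and symbols, maps to 0). The names `exp_multiplier` and `to_multiplier` are hypothetical helpers, not part of the original analysis.

```r
# Named vector acting as a lookup table for the letter codes
exp_multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: convert multiplier codes to numeric multipliers
to_multiplier <- function(exp_codes) {
  codes <- tolower(trimws(as.character(exp_codes)))
  mult <- unname(exp_multiplier[codes])
  # Codes without a letter mapping (digits, symbols, blanks, NA) become 0,
  # matching the choice made in this report
  mult[is.na(mult)] <- 0
  mult
}

to_multiplier(c("K", "m", "B", "h", "?", "5", NA, ""))
```

Because the result is already numeric, it could be multiplied with the damage columns directly, without the separate character-to-numeric conversion.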
Now that the multipliers are stored as numeric strings, we convert them to numeric variables for further calculations.
# Make new multiplier column a numeric
FStormDataTidy$PROPX <- as.numeric(FStormDataTidy$PROPX)
FStormDataTidy$CROPX <- as.numeric(FStormDataTidy$CROPX)
In the original dataset many of the storm event names were inconsistent. For example, tornadoes were recorded both as “TORNADO” and as “TORNADO F1”. The result was too many different categories for each kind of storm. Here we condense the many storm categories into common storm classes.
# Tornado events
FStormDataTidy$EVTYPE <- gsub("TORNADOES, TSTM WIND, HAIL","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F0","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F1","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F2","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F3","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F4","TORNADO",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("TORNADO F5","TORNADO",FStormDataTidy$EVTYPE)
# Hurricane Events
FStormDataTidy$EVTYPE <- gsub("HURRICANE EMILY","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE ERIN","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE FELIX","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE GORDON","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE OPAL","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE OPAL/HIGH WINDS","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE/TYPHOON","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE/HIGH WINDS","HURRICANE",FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("HURRICANE-GENERATED SWELLS","HURRICANE",FStormDataTidy$EVTYPE)
#Flood Events
FStormDataTidy$EVTYPE <- gsub("FLOODS", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOODING", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/RIVER FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/RAIN/WINDS", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLASH FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLASH", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD & HEAVY RAIN", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOODING", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLASH FLOOD/FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("RIVER FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FLOOD/FLOOD", "FLOOD", FStormDataTidy$EVTYPE)
# Wild Fire Events
FStormDataTidy$EVTYPE <- gsub("WILDFIRES", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("FOREST FIRES", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("WILD/FOREST FIRE", "WILDFIRE", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("WILD FIRES","WILDFIRE", FStormDataTidy$EVTYPE)
# Hail Events
FStormDataTidy$EVTYPE <- gsub("HAILSTORM","HAIL", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub(".*HAIL .*", "HAIL", FStormDataTidy$EVTYPE)
FStormDataTidy$EVTYPE <- gsub("SMALL HAIL", "HAIL", FStormDataTidy$EVTYPE)
# Storm Surge Events
FStormDataTidy$EVTYPE <- gsub(".*STORM SURGE/.*", "STORM SURGE", FStormDataTidy$EVTYPE)
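The literal replacements above could also be expressed as a small table of regular expressions. The sketch below is one way to do that. The patterns and the names `evtype_rules` and `normalise_evtype` are illustrative assumptions, and the patterns are broader than the replacements used in this report (for example, any label starting with “TORNADO” is folded into TORNADO), so the resulting totals would not necessarily match the figures reported here.

```r
# Hypothetical rule table: each regular expression maps to one storm class
evtype_rules <- c(
  "^TORNADO"               = "TORNADO",
  "^HURRICANE"             = "HURRICANE",
  "^(FLASH |RIVER )?FLOOD" = "FLOOD",
  "^(WILD|FOREST).*FIRE"   = "WILDFIRE",
  "^STORM SURGE"           = "STORM SURGE"
)

# Hypothetical helper: apply every rule in turn to a vector of event labels
normalise_evtype <- function(evtype, rules = evtype_rules) {
  out <- toupper(trimws(evtype))
  for (pattern in names(rules)) {
    out[grepl(pattern, out)] <- rules[[pattern]]
  }
  out
}

normalise_evtype(c("TORNADO F1", " Hurricane Opal", "FLASH FLOODING",
                   "WILD/FOREST FIRE", "STORM SURGE/TIDE", "LIGHTNING"))
```

Labels that match no rule, such as LIGHTNING, pass through unchanged apart from trimming and upper-casing.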
Here we calculate the total economic damage for each record. Each damage figure is multiplied by its multiplier, and total damage = property damage + crop damage.
### Calculate total property and crop damage
FStormDataTidy$TOTALPROPDAMAGE <- FStormDataTidy$PROPDMG * FStormDataTidy$PROPX
FStormDataTidy$TOTALCROPDAMAGE <- FStormDataTidy$CROPDMG * FStormDataTidy$CROPX
FStormDataTidy$TOTALDAMAGE <- FStormDataTidy$TOTALPROPDAMAGE + FStormDataTidy$TOTALCROPDAMAGE
head(FStormDataTidy)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPX
## 1: TORNADO 0 15 25.0 k 0 0 1000
## 2: TORNADO 0 0 2.5 k 0 0 1000
## 3: TORNADO 0 2 25.0 k 0 0 1000
## 4: TORNADO 0 2 2.5 k 0 0 1000
## 5: TORNADO 0 2 2.5 k 0 0 1000
## 6: TORNADO 0 6 2.5 k 0 0 1000
## CROPX TOTALPROPDAMAGE TOTALCROPDAMAGE TOTALDAMAGE
## 1: 0 25000 0 25000
## 2: 0 2500 0 2500
## 3: 0 25000 0 25000
## 4: 0 2500 0 2500
## 5: 0 2500 0 2500
## 6: 0 2500 0 2500
Total human damage = total fatalities + total injuries. Here we find the 5 event types with the highest total human damage.
### Calculate which types of events are most harmful to population health?
FStormDataTidy$TOTALHUMAN <- FStormDataTidy$FATALITIES + FStormDataTidy$INJURIES
mostHumanEvent <- FStormDataTidy %>%
select(EVTYPE,FATALITIES,INJURIES,TOTALHUMAN) %>%
group_by(EVTYPE)%>%
summarise(TOTALHUMANEVENT = sum(TOTALHUMAN))%>%
arrange(desc(TOTALHUMANEVENT))
mostHumanEvent <- mostHumanEvent[1:5,]
mostHumanEvent
## # A tibble: 5 x 2
## EVTYPE TOTALHUMANEVENT
## <chr> <dbl>
## 1 TORNADO 97022
## 2 FLOOD 10110
## 3 EXCESSIVE HEAT 8428
## 4 TSTM WIND 7461
## 5 LIGHTNING 6046
### Create Plot
g <- ggplot(mostHumanEvent, aes(x= reorder(EVTYPE, -TOTALHUMANEVENT),y=TOTALHUMANEVENT))
g + geom_bar(stat = "identity", fill = "blue") + theme_minimal() +
labs(title = "Top 5 Most Life Damaging Natural Events") +
labs(x = "Life Damaging Event", y = "Total Casualties (Death + Injury)")
Figure 1: Top 5 event types by total casualties (fatalities plus injuries)
Next we find the 5 costliest event types by combined property and crop damage, in billions.
### which types of events have the greatest economic consequences?
mostDamgeEvent <- FStormDataTidy %>%
select(EVTYPE,TOTALPROPDAMAGE,TOTALCROPDAMAGE,TOTALDAMAGE) %>%
group_by(EVTYPE)%>%
summarise(TOTALDAMEVENT = sum(TOTALDAMAGE/10^9))%>%
arrange(desc(TOTALDAMEVENT))
mostDamgeEvent <- mostDamgeEvent[1:5,]
mostDamgeEvent
## # A tibble: 5 x 2
## EVTYPE TOTALDAMEVENT
## <chr> <dbl>
## 1 FLOOD 179.16349
## 2 HURRICANE 90.27147
## 3 TORNADO 58.95939
## 4 STORM SURGE 47.96558
## 5 HAIL 19.02088
### plot Most damaging economic events
b <- ggplot(mostDamgeEvent, aes(x= reorder(EVTYPE, -TOTALDAMEVENT),y=TOTALDAMEVENT))
b + geom_bar(stat = "identity", fill = "red") + theme_minimal() +
labs(title = "Top 5 Most Damaging Natural Events") +
labs(x = "Damaging Event", y = "Property and Crop Damage (Billions)") +
ylim(c(0,200))
Figure 2: Top 5 event types by total property and crop damage
Tornadoes are the most damaging event to human life by a very wide margin (97,022 casualties, compared with 10,110 for flooding), followed by flooding and excessive heat. One possible reason is that tornadoes give minimal warning time for evacuation, and it is hard to predict where a tornado will touch down, leaving people more vulnerable to tornadoes than to other types of weather events.
Regarding economic damage, flooding causes the most (about 179 billion in combined property and crop damage), followed by hurricanes (about 90 billion) and tornadoes (about 59 billion). Flooding results from numerous weather events, such as hurricanes, storms, and snowmelt, and it damages crops as well as property.
More analysis and data processing can be performed to better understand the ratio of property damage v. crop damage, and to separate injuries v. fatalities, for each storm event.
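As a starting point for that follow-up, the split between fatalities and injuries can be computed per event type with base R's `aggregate`. The sketch below runs on a small made-up data frame (`toy`, with invented counts) rather than on the storm data, and the `FATALITYSHARE` column is a hypothetical addition for illustration.

```r
# Toy data with invented counts, standing in for FStormDataTidy
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 0, 0),
  INJURIES   = c(14, 2, 0, 3, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Total casualties and the share of them that were fatal
by_event$TOTALHUMAN    <- by_event$FATALITIES + by_event$INJURIES
by_event$FATALITYSHARE <- by_event$FATALITIES / by_event$TOTALHUMAN

# Most harmful event types first
by_event <- by_event[order(-by_event$TOTALHUMAN), ]
by_event
```

The same pattern would apply to property v. crop damage by swapping in the TOTALPROPDAMAGE and TOTALCROPDAMAGE columns.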