Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Results
This analysis shows that excessive heat and tornadoes are the most harmful events for public health, and that floods have the greatest economic impact in the United States.
Install (if necessary) and load the following libraries for data processing.
suppressMessages(library(tidyverse))
suppressMessages(library(R.utils))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(lubridate))
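suppressMessages(library(gridExtra))
# plyr is loaded last so that its arrange() masks dplyr's; the ranking helper below relies on plyr::arrange()
suppressMessages(library(plyr))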
The data were loaded into R with read.csv(). For the economic consequence portion of the analysis, property and crop damage amounts were converted to dollar values using their exponent codes. Property and crop damage costs were then totaled separately for each event type.
Download the bzip2-compressed data to the working directory, saving it as Storm_Data.csv:
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Storm_Data.csv", mode = "wb")
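Because the compressed file is large, downloading it again on every run is wasteful. The following is a minimal sketch of a guard that downloads only when the file is missing; the helper name `fetch_once` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): download a file only
# if it is not already present locally.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed bytes intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  # Return the local path invisibly so the call can be chained or ignored
  invisible(destfile)
}
```

Calling `fetch_once()` with the URL above and "Storm_Data.csv" would leave an existing local copy untouched.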
Read the CSV file into a data frame (read.csv() decompresses the bzip2 file transparently).
storms <- read.csv("Storm_Data.csv")
head(storms)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
dim(storms)
## [1] 902297 37
The data set has 902297 rows and 37 columns. The storm events span the years 1950 to 2011. The histogram below examines the distribution of records by year:
# Create new column to show year
storms$year <- as.numeric(format(as.Date(storms$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
#Histogram of storm data by year
hist(storms$year, breaks = 30, border="green", col="blue")
Based on this plot, only data from 1990 onward will be used in this analysis.
storms_subset <- storms[storms$year >= 1990,]
dim(storms_subset)
## [1] 751740 38
The dataset now has 751740 rows and 38 columns.
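The cutoff can also be checked numerically instead of being read off the histogram. The sketch below uses a small made-up vector of `BGN_DATE`-style strings (illustrative values, not the real data) to show the year extraction and the share of records kept by a 1990 cutoff.

```r
# Toy BGN_DATE values in the same "m/d/Y H:M:S" layout as the storm data
# (made up for illustration, not taken from the data set)
bgn_date <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
              "1/1/1990 0:00:00", "7/4/1996 0:00:00", "11/28/2011 0:00:00")

# as.Date() keeps only the date part; the time part is matched by the
# format string and then discarded
year <- as.numeric(format(as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Number of records per year, and the fraction retained by the cutoff
records_per_year <- table(year)
share_kept <- mean(year >= 1990)
print(records_per_year)
print(share_kept)
```

On the real data the same ratio is 751740 / 902297, or about 83% of all records.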
To determine the impact of severe weather on public health, the data will be checked for the number of fatalities and injuries. The top 10 most severe types of weather events will be used.
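# Note: this helper masks base::sort() in the global environment.
# It sums `fieldName` by event type and returns the `top` largest totals.
sort <- function(fieldName, top = 10, dataset = storms) {
  index <- which(colnames(dataset) == fieldName)
  # Total the chosen column for each event type
  field <- aggregate(dataset[, index], by = list(dataset$EVTYPE), FUN = "sum")
  names(field) <- c("EVTYPE", fieldName)
  # plyr::arrange() passes `decreasing` through to order()
  field <- arrange(field, field[, 2], decreasing = TRUE)
  field <- head(field, n = top)
  # Fix the factor levels in ranked order so plots keep this ordering
  field <- within(field, EVTYPE <- factor(x = EVTYPE, levels = field$EVTYPE))
  return(field)
}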
fatalities <- sort("FATALITIES", dataset = storms_subset) # Count of top 10 fatal storm events
injuries <- sort("INJURIES", dataset = storms_subset) # Count of top 10 injury storm events
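The aggregation performed by the helper above can be illustrated on a toy data frame. This sketch uses only base R (`aggregate()` and `order()`) instead of `plyr::arrange()`, so it is an equivalent illustration and not the function used in the analysis. The data values are made up.

```r
# Toy data: made-up event records, not from the storm database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order of the total and keep the top 2 event types
top2 <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], n = 2)
rownames(top2) <- NULL
print(top2)
```

Using base R here sidesteps the question of whether `arrange()` resolves to plyr or dplyr, which matters because the two handle the `decreasing` argument differently.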
To determine economic impact, property damage and crop damage data will be converted into comparable numerical forms per the units described in the code book for the storm data (Code Book). The columns PROPDMGEXP and CROPDMGEXP record a multiplier for each variable for Hundred (H), Thousand (K), Million (M) and Billion (B).
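# Convert an exponent-code column (e.g. PROPDMGEXP) to a power of ten and
# append the resulting dollar amount as a new column named `newFieldName`.
convert <- function(dataset = storms_subset, fieldName, newFieldName) {
  totalLen <- dim(dataset)[2]
  index <- which(colnames(dataset) == fieldName)
  dataset[, index] <- as.character(dataset[, index])
  logic <- !is.na(toupper(dataset[, index]))
  # Map letter codes to exponents: B = 10^9, M = 10^6, K = 10^3, H = 10^2
  dataset[logic & toupper(dataset[, index]) == "B", index] <- "9"
  dataset[logic & toupper(dataset[, index]) == "M", index] <- "6"
  dataset[logic & toupper(dataset[, index]) == "K", index] <- "3"
  dataset[logic & toupper(dataset[, index]) == "H", index] <- "2"
  dataset[logic & toupper(dataset[, index]) == "", index] <- "0"
  # Digit codes are kept as exponents; other symbols become NA (hence the
  # coercion warning) and are then treated as exponent 0
  dataset[, index] <- as.numeric(dataset[, index])
  dataset[is.na(dataset[, index]), index] <- 0
  # Assumes the amount column (e.g. PROPDMG) sits immediately before the code column
  dataset <- cbind(dataset, dataset[, index - 1] * 10^dataset[, index])
  names(dataset)[totalLen + 1] <- newFieldName
  return(dataset)
}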
storms_subset <- convert(storms_subset, "PROPDMGEXP", "propertyDamage")
## Warning in convert(storms_subset, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storms_subset <- convert(storms_subset, "CROPDMGEXP", "cropDamage")
## Warning in convert(storms_subset, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
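The exponent codes can also be resolved with a named lookup vector, which avoids the coercion warnings shown above. This is an alternative sketch on made-up values, not the conversion used in the analysis. The names `exp_lookup` and `to_dollars` are hypothetical. Like the original, it treats blank or unrecognized symbols as a multiplier of 1. Unlike the original, it does not interpret digit codes as exponents.

```r
# Hypothetical lookup: letter code -> power of ten
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: multiply each amount by 10^exponent for its code
to_dollars <- function(value, code) {
  code <- toupper(as.character(code))
  # Codes missing from the lookup (blank, "?", "+", digits) yield NA here
  exponent <- unname(exp_lookup[code])
  # Treat unknown codes as exponent 0, i.e. a multiplier of 1
  exponent[is.na(exponent)] <- 0
  value * 10^exponent
}

# Made-up amounts and codes for illustration
dollars <- to_dollars(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "", "?"))
print(dollars)
```

For example, an amount of 25 with code "K" becomes 25000 dollars, matching the first rows of the data shown earlier, where PROPDMG is 25.0 and PROPDMGEXP is K.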
names(storms_subset)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE"
## [5] "COUNTY" "COUNTYNAME" "STATE" "EVTYPE"
## [9] "BGN_RANGE" "BGN_AZI" "BGN_LOCATI" "END_DATE"
## [13] "END_TIME" "COUNTY_END" "COUNTYENDN" "END_RANGE"
## [17] "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES"
## [25] "PROPDMG" "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
## [29] "WFO" "STATEOFFIC" "ZONENAMES" "LATITUDE"
## [33] "LONGITUDE" "LATITUDE_E" "LONGITUDE_" "REMARKS"
## [37] "REFNUM" "year" "propertyDamage" "cropDamage"
options(scipen=999)
property <- sort("propertyDamage", dataset = storms_subset)
crop <- sort("cropDamage", dataset = storms_subset)
For the impact of storm events on public health, the two lists below show the top ten event types:
Fatalities
fatalities
## EVTYPE FATALITIES
## 1 EXCESSIVE HEAT 1903
## 2 TORNADO 1752
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 FLOOD 470
## 7 RIP CURRENT 368
## 8 TSTM WIND 327
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Injuries
injuries
## EVTYPE INJURIES
## 1 TORNADO 26674
## 2 FLOOD 6789
## 3 EXCESSIVE HEAT 6525
## 4 LIGHTNING 5230
## 5 TSTM WIND 5022
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 WINTER STORM 1321
fatalplot <- ggplot(data=fatalities, aes(x=EVTYPE, y=FATALITIES))+geom_bar(stat="identity", fill = "purple") + coord_flip()
injuryplot <- ggplot(data=injuries, aes(x=EVTYPE, y=INJURIES))+geom_bar(stat="identity", fill = "green") + coord_flip()
grid.arrange(fatalplot, injuryplot, ncol =2 )
The bar plots show that excessive heat is the leading cause of weather-related fatalities and that tornadoes are the leading cause of injuries.
For the economic impact of storm events, the two lists below show the top ten event types:
Property
property
## EVTYPE propertyDamage
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 STORM SURGE 43323536000
## 4 TORNADO 30468735507
## 5 FLASH FLOOD 16822673979
## 6 HAIL 15735267513
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
Crops
crop
## EVTYPE cropDamage
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
propertyplot <- ggplot(data=property, aes(x=EVTYPE, y=propertyDamage))+geom_bar(stat="identity", fill = "magenta") + coord_flip()
cropplot <- ggplot(data=crop, aes(x=EVTYPE, y=cropDamage))+geom_bar(stat="identity", fill = "yellow") + coord_flip()
grid.arrange(propertyplot, cropplot, ncol =2 )
Analysis of NOAA storm data from 1990 to 2011 shows that public health is most impacted by excessive heat and tornadoes. Excessive heat results in the most fatalities, and tornadoes result in the most injuries. The economy is most impacted by floods and drought. Floods cause the most property damage, while drought causes the most crop damage.