Storm Data Analysis: Health and Economic Impacts of Storms in the US

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Results: This analysis shows that tornadoes are the most harmful events for public health, causing by far the most injuries and the second-highest number of fatalities after excessive heat. Floods have the greatest economic impact in the United States.

Libraries

Install (if necessary) and load the following libraries for data processing.

suppressMessages(library(tidyverse))  # attaches dplyr, ggplot2 and the other core packages
suppressMessages(library(R.utils))    # provides bunzip2() for unpacking .bz2 files
suppressMessages(library(gridExtra))  # grid.arrange() for side-by-side plots
suppressMessages(library(plyr))       # loaded last, so plyr::arrange() masks dplyr::arrange()
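For the "install if necessary" step, a small helper can report which packages are missing before loading them. This is a minimal sketch: `missing_packages()` and `needed` are hypothetical names, not part of the original analysis, and `requireNamespace()` loads each package's namespace without attaching it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns FALSE (quietly) when a package cannot be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

needed <- c("tidyverse", "R.utils", "gridExtra", "plyr")
# Uncomment to install only what is missing:
# install.packages(missing_packages(needed))
```

Installing only the missing packages avoids reinstalling tidyverse on every run of the report.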

Data Processing

The data were loaded into R with read.csv(). For the economic part of the analysis, property and crop damage amounts were converted to US dollars using the multiplier codes in PROPDMGEXP and CROPDMGEXP, and the property and crop damage totals were then computed for each event type.

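Download the bzip2-compressed data file to the working directory (skipped if it is already there):

if (!file.exists("Storm_Data.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "Storm_Data.csv.bz2", mode = "wb")
}

Read the compressed CSV file into a data frame; read.csv() decompresses it on the fly.

storms <- read.csv("Storm_Data.csv.bz2")
head(storms)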
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
dim(storms)
## [1] 902297     37
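The downloaded file is bzip2-compressed, yet read.csv() can read it directly because R's file() connection detects gzip, bzip2 and xz compression when opening a file for reading. A minimal self-contained check on a temporary file with made-up rows (not NOAA data):

```r
# Write a tiny CSV through a bzip2 connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() recognises the compression and decompresses while reading
toy <- read.csv(tmp)
str(toy)
unlink(tmp)
```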

The data set has 902,297 rows and 37 columns, covering storm events from 1950 to 2011. The distribution of records by year is examined below:

# Create new column to show year
storms$year <- as.numeric(format(as.Date(storms$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Histogram of storm records by year
hist(storms$year, breaks = 30, border = "green", col = "blue", main = "Storm records by year", xlab = "Year")

The plot shows far fewer recorded events in the earlier years, so only data from 1990 onward are used in this analysis.
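The year extraction and the cutoff can be checked numerically on a few strings in the same "m/d/Y H:M:S" layout as BGN_DATE. The dates below are made up for illustration; on the full data the per-year counts would come from table(storms$year).

```r
# Toy BGN_DATE strings in the NOAA layout (illustrative values only)
bgn_date <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00",
              "6/8/1995 0:00:00", "1/3/2011 0:00:00")

# Same parsing as above: date string -> Date -> four-digit year -> number
year <- as.numeric(format(as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Number of records per year, the quantity the histogram shows
events_per_year <- table(year)
events_per_year

# Share of records kept by the 1990 cutoff
mean(year >= 1990)
```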

storms_subset <- storms[storms$year >= 1990,]
dim(storms_subset)
## [1] 751740     38

The subset has 751,740 rows (about 83% of all records) and 38 columns: the original 37 plus the new year column.

Public Health Impact

To assess the impact of severe weather on public health, fatalities and injuries are summed for each event type (EVTYPE), and the ten event types with the highest totals are kept.

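# Sum a column by event type and return the `top` event types, largest first.
# Note: defining `sort` here masks base::sort() for the rest of the script.
sort <- function(fieldName, top = 10, dataset = storms) {
    field <- aggregate(dataset[[fieldName]], by = list(dataset$EVTYPE), FUN = sum)
    names(field) <- c("EVTYPE", fieldName)
    # Rank with base order(), so the result does not depend on plyr/dplyr masking
    field <- field[order(field[[fieldName]], decreasing = TRUE), ]
    field <- head(field, n = top)
    rownames(field) <- NULL
    # Fix the factor levels in ranked order so ggplot2 keeps this ordering
    field$EVTYPE <- factor(field$EVTYPE, levels = field$EVTYPE)
    return(field)
}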

fatalities <- sort("FATALITIES", dataset = storms_subset) # Top 10 event types by total fatalities
injuries <- sort("INJURIES", dataset = storms_subset) # Top 10 event types by total injuries
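The two rankings above are computed separately. As a sketch of how both columns could be summed per event type in one call and ranked by a combined casualty count, here is a toy data frame with made-up numbers. Adding fatalities and injuries with equal weight is a simplifying assumption made only for this illustration; `totals` and `CASUALTIES` are hypothetical names.

```r
# Toy records standing in for storms_subset (made-up values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 1),
  INJURIES   = c(40, 10, 25, 8, 2)
)

# Sum both columns per event type in a single aggregate() call
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Unweighted combined count, then rank largest first
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(totals$CASUALTIES, decreasing = TRUE), ]
rownames(totals) <- NULL
totals
```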

Economic Impact

To determine economic impact, property and crop damage are converted into comparable dollar amounts per the units described in the code book for the storm data (Code Book). The columns PROPDMGEXP and CROPDMGEXP record a multiplier for PROPDMG and CROPDMG: hundred (H), thousand (K), million (M) and billion (B). Blank codes mean no multiplier; in the code below, single digits are used as powers of ten and any other symbol is treated as no multiplier.
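The mapping can be checked on a few hand-made rows before applying it to the full data. This sketch uses a hypothetical helper, `exp_to_power()`, that follows the same rules as the convert() function below (letters to powers of ten, digits as the exponent, everything else as 0); treating digits this way is the analysis's choice, not a rule taken from the code book.

```r
# Hypothetical helper: map an EXP code to a power of ten, as convert() does
exp_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  power <- c(H = 2, K = 3, M = 6, B = 9)[code]
  # Single digits are taken as the exponent itself
  digits <- grepl("^[0-9]$", code)
  power[digits] <- as.numeric(code[digits])
  # Blank, missing and symbolic codes ("+", "-", "?") get no multiplier
  power[is.na(power)] <- 0
  unname(power)
}

# Made-up rows: 25 K, 2.5 m (lower case), blank, 7 B and an unknown symbol
toy <- data.frame(PROPDMG    = c(25, 2.5, 10, 7, 3),
                  PROPDMGEXP = c("K", "m", "", "B", "+"))
toy$propertyDamage <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP)
toy$propertyDamage
```

A named-vector lookup keeps the code table in one place and, unlike coercing every code with as.numeric(), does not raise "NAs introduced by coercion" warnings.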

convert <- function(dataset = storms_subset, fieldName, newFieldName) {
    # The damage value is in the column named like the EXP column minus "EXP"
    # (PROPDMGEXP -> PROPDMG, CROPDMGEXP -> CROPDMG)
    valueName <- sub("EXP$", "", fieldName)
    code <- toupper(as.character(dataset[[fieldName]]))
    # Replace letter codes with their power of ten; a blank means no multiplier
    code[code %in% "B"] <- "9"
    code[code %in% "M"] <- "6"
    code[code %in% "K"] <- "3"
    code[code %in% "H"] <- "2"
    code[code %in% ""] <- "0"
    # Digits are used as exponents; symbols such as "+", "-" or "?" cannot be
    # coerced (hence the "NAs introduced by coercion" warning) and get 0
    power <- as.numeric(code)
    power[is.na(power)] <- 0
    dataset[[newFieldName]] <- dataset[[valueName]] * 10^power
    return(dataset)
}

storms_subset <- convert(storms_subset, "PROPDMGEXP", "propertyDamage")
## Warning in convert(storms_subset, "PROPDMGEXP", "propertyDamage"): NAs
## introduced by coercion
storms_subset <- convert(storms_subset, "CROPDMGEXP", "cropDamage")
## Warning in convert(storms_subset, "CROPDMGEXP", "cropDamage"): NAs
## introduced by coercion
names(storms_subset)
##  [1] "STATE__"        "BGN_DATE"       "BGN_TIME"       "TIME_ZONE"     
##  [5] "COUNTY"         "COUNTYNAME"     "STATE"          "EVTYPE"        
##  [9] "BGN_RANGE"      "BGN_AZI"        "BGN_LOCATI"     "END_DATE"      
## [13] "END_TIME"       "COUNTY_END"     "COUNTYENDN"     "END_RANGE"     
## [17] "END_AZI"        "END_LOCATI"     "LENGTH"         "WIDTH"         
## [21] "F"              "MAG"            "FATALITIES"     "INJURIES"      
## [25] "PROPDMG"        "PROPDMGEXP"     "CROPDMG"        "CROPDMGEXP"    
## [29] "WFO"            "STATEOFFIC"     "ZONENAMES"      "LATITUDE"      
## [33] "LONGITUDE"      "LATITUDE_E"     "LONGITUDE_"     "REMARKS"       
## [37] "REFNUM"         "year"           "propertyDamage" "cropDamage"
options(scipen=999)
property <- sort("propertyDamage", dataset = storms_subset)
crop <- sort("cropDamage", dataset = storms_subset) 
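Property and crop damage are ranked separately above. If a single economic cost per event type is wanted, the two columns can be added per record and then summed, for example in billions of dollars. A minimal sketch on made-up rows shaped like storms_subset after convert(); `totalDamage`, `billions` and `economic` are hypothetical names.

```r
# Toy rows with the propertyDamage and cropDamage columns created by convert()
toy <- data.frame(
  EVTYPE         = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  propertyDamage = c(5e9, 0, 2e9, 1e8),
  cropDamage     = c(1e8, 3e9, 0, 2e8)
)

# Combine property and crop damage into one cost per record
toy$totalDamage <- toy$propertyDamage + toy$cropDamage

# Sum per event type, express in billions of dollars, rank largest first
economic <- aggregate(totalDamage ~ EVTYPE, data = toy, FUN = sum)
economic$billions <- economic$totalDamage / 1e9
economic <- economic[order(economic$totalDamage, decreasing = TRUE), ]
rownames(economic) <- NULL
economic
```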

Results

For the impact of storm events on public health, the two lists below show the top ten event types:

Fatalities

fatalities
##            EVTYPE FATALITIES
## 1  EXCESSIVE HEAT       1903
## 2         TORNADO       1752
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6           FLOOD        470
## 7     RIP CURRENT        368
## 8       TSTM WIND        327
## 9       HIGH WIND        248
## 10      AVALANCHE        224

Injuries

injuries
##               EVTYPE INJURIES
## 1            TORNADO    26674
## 2              FLOOD     6789
## 3     EXCESSIVE HEAT     6525
## 4          LIGHTNING     5230
## 5          TSTM WIND     5022
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10      WINTER STORM     1321

Bar plots of fatalities and injuries by storm event type

# geom_col() is shorthand for geom_bar(stat = "identity")
fatalplot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col(fill = "purple") + coord_flip() + labs(x = "Event type")
injuryplot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
    geom_col(fill = "green") + coord_flip() + labs(x = "Event type")
grid.arrange(fatalplot, injuryplot, ncol = 2)

The bar plots show that excessive heat is the leading cause of weather-related fatalities, with tornadoes a close second, and that tornadoes are by far the leading cause of injuries.

For the economic impact of storm events, the two lists below show the top ten event types by total damage in US dollars:

Property

property
##               EVTYPE propertyDamage
## 1              FLOOD   144657709807
## 2  HURRICANE/TYPHOON    69305840000
## 3        STORM SURGE    43323536000
## 4            TORNADO    30468735507
## 5        FLASH FLOOD    16822673979
## 6               HAIL    15735267513
## 7          HURRICANE    11868319010
## 8     TROPICAL STORM     7703890550
## 9       WINTER STORM     6688497251
## 10         HIGH WIND     5270046295

Crops

crop
##               EVTYPE  cropDamage
## 1            DROUGHT 13972566000
## 2              FLOOD  5661968450
## 3        RIVER FLOOD  5029459000
## 4          ICE STORM  5022113500
## 5               HAIL  3025954473
## 6          HURRICANE  2741910000
## 7  HURRICANE/TYPHOON  2607872800
## 8        FLASH FLOOD  1421317100
## 9       EXTREME COLD  1292973000
## 10      FROST/FREEZE  1094086000

Bar plots of property and crop damage by storm event type

# geom_col() is shorthand for geom_bar(stat = "identity")
propertyplot <- ggplot(property, aes(x = EVTYPE, y = propertyDamage)) +
    geom_col(fill = "magenta") + coord_flip() + labs(x = "Event type")
cropplot <- ggplot(crop, aes(x = EVTYPE, y = cropDamage)) +
    geom_col(fill = "yellow") + coord_flip() + labs(x = "Event type")
grid.arrange(propertyplot, cropplot, ncol = 2)

Conclusions

Analysis of NOAA storm data from 1990 to 2011 shows that public health is most affected by excessive heat and tornadoes: excessive heat causes the most fatalities, and tornadoes cause the most injuries. The economy is most affected by floods and drought: floods cause the most property damage, while drought causes the most crop damage.