Exploring the NOAA Storm Database for Health and Economic Impacts

SYNOPSIS

Storms and other severe weather events have been a serious problem in terms of public health and economic impact for communities and municipalities.

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database records major storms and weather events in the United States, including the time and place of their occurrence, as well as estimates of any fatalities, injuries, and property damage.

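The following questions are addressed in this analysis: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?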

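After the data are aggregated and analyzed by storm event type, the following conclusions are reached: tornadoes are the leading cause of both fatalities and injuries, floods are the leading cause of property damage, and droughts are the leading cause of crop damage.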

DATA PROCESSING

The storm data can be downloaded from the following website: Storm Data.

The documentation of the storm database can be downloaded from the following website: Storm Data Documentation.

Finally, the National Climatic Data Center Storm Events FAQ can be found on the following website: National Climatic Data Center Storm Events FAQ.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
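
One way to check how record counts vary over time is to tabulate events by year. The following is a minimal sketch on a toy vector written in the same format as the BGN_DATE column; the names bgn.date, event.year and events.per.year are illustrative and not part of the original analysis.

```r
# Toy stand-in for storm.data$BGN_DATE (assumed format: month/day/year plus a time).
bgn.date <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
              "6/8/1951 0:00:00", "11/15/1951 0:00:00")

# Parse the date part (the trailing time is ignored) and extract the year.
event.year <- as.integer(format(as.Date(bgn.date, format = "%m/%d/%Y"), "%Y"))

# Count the events recorded in each year.
events.per.year <- table(event.year)
print(events.per.year)
```

Once the dataset is loaded, the same steps could be applied to storm.data$BGN_DATE to see how the number of recorded events grows over the years.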

Download and Load Dataset

Set working directory.

setwd("C:/Users/Owner/Documents/Coursera/NOAA") 

Download the storm dataset (if not present) and load it into R.

data.file.name <- "C:/Users/Owner/Documents/Coursera/NOAA/repdata-data-StormData.csv.bz2"
if (!file.exists(data.file.name)) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(url = url, destfile = data.file.name, mode = "wb")
}
# read.csv() decompresses the .bz2 file on the fly
storm.data <- read.csv(data.file.name)
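
The loading step relies on read.csv() being able to read a bzip2-compressed file without unpacking it first. The sketch below illustrates this on a small temporary file; the file contents and the names tmp.file and toy.data are made up for the illustration.

```r
# Write a tiny CSV through a bzip2 connection to a temporary file (hypothetical data).
tmp.file <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp.file, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "FLOOD,2"), con)
close(con)

# read.csv() detects the compression and decompresses while reading.
toy.data <- read.csv(tmp.file)
print(toy.data)

# Remove the temporary file.
unlink(tmp.file)
```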

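Event type labels are converted to upper case so that differently capitalized spellings of the same event are aggregated together. Let us then examine the size of the storm dataset and list the first six rows.

storm.data$EVTYPE <- toupper(storm.data$EVTYPE)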
dim(storm.data)
## [1] 902297     37
head(storm.data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Extracting the Required Data

This storm dataset contains substantially more information than is required for the current analysis. Only the columns related to health and economic impact will be extracted.

storm.event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
req.storm.data <- storm.data[storm.event]

Let us examine the size of the storm dataset, now reduced to just the seven required columns, and list the first six rows again.

dim(req.storm.data)
## [1] 902297      7
head(req.storm.data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Analysis of Property Damage

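Property damage exponents (PROPDMGEXP) are listed and each is assigned a numerical multiplier. Letter codes stand for hundreds (H), thousands (K), millions (M) and billions (B), digits are read as powers of ten, and a blank exponent is given a multiplier of one. Invalid codes (+, -, ?) are given a multiplier of zero, which excludes those records from the totals. The property damage value is calculated by multiplying the property damage (PROPDMG) by the multiplier derived from PROPDMGEXP.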

unique(req.storm.data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "K"] <- 1e+03
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "M"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "" ] <- 1e+00
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "B"] <- 1e+09
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "m"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "0"] <- 1e+00
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "5"] <- 1e+05
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "6"] <- 1e+06
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "4"] <- 1e+04
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "2"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "3"] <- 1e+03
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "h"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "7"] <- 1e+07
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "H"] <- 1e+02
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "1"] <- 1e+01
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "8"] <- 1e+08

# Assign the value of zero to invalid exponent data  

req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "+"] <- 0.0
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "-"] <- 0.0
req.storm.data$PROPEXP[req.storm.data$PROPDMGEXP == "?"] <- 0.0

# Calculate the value of property damage  

req.storm.data$PROPDMGVAL <- req.storm.data$PROPDMG * req.storm.data$PROPEXP
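
The same mapping can be written more compactly as a named lookup vector. This is a minimal sketch under the same assumptions as the assignments above (digits read as powers of ten, blank treated as one, "+", "-" and "?" treated as zero); exp.lookup and exp.to.multiplier are hypothetical names, not part of the original analysis.

```r
# Named lookup vector: exponent code -> multiplier.
exp.lookup <- c(
  H = 1e+02, K = 1e+03, M = 1e+06, B = 1e+09,
  setNames(10^(0:8), as.character(0:8)),   # "0" -> 1, "1" -> 10, ..., "8" -> 1e+08
  "+" = 0, "-" = 0, "?" = 0                # invalid codes contribute nothing
)

# Convert a vector of exponent codes into numeric multipliers.
exp.to.multiplier <- function(codes) {
  codes <- toupper(as.character(codes))    # treat "k", "m", "h" like "K", "M", "H"
  multiplier <- unname(exp.lookup[codes])  # NA for any code missing from the lookup
  multiplier[codes == ""] <- 1             # blank exponent: value taken as-is
  multiplier
}

# Toy example with made-up damage figures.
toy.dmg <- c(25, 2.5, 1, 3)
toy.exp <- c("K", "m", "", "?")
toy.val <- toy.dmg * exp.to.multiplier(toy.exp)
print(toy.val)
```

A code that is not in the lookup yields NA rather than being silently dropped, which makes unexpected exponent values easier to spot.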

Analysis of Crop Damage

Crop damage exponents (CROPDMGEXP) are listed and each is assigned a numerical multiplier in the same way as for property damage. A blank exponent is given a multiplier of one, and the invalid code (?) is given a multiplier of zero, which excludes those records from the totals. The crop damage value is calculated by multiplying the crop damage (CROPDMG) by the multiplier derived from CROPDMGEXP.

unique(req.storm.data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "M"] <- 1e+06
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "K"] <- 1e+03
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "m"] <- 1e+06
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "B"] <- 1e+09
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "0"] <- 1e+00
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "k"] <- 1e+03
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "2"] <- 1e+02
req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "" ] <- 1e+00

# Assigning the value of zero to invalid exponent data  

req.storm.data$CROPEXP[req.storm.data$CROPDMGEXP == "?"] <- 0.0

# Calculate the value of crop damage  

req.storm.data$CROPDMGVAL <- req.storm.data$CROPDMG * req.storm.data$CROPEXP

Calculating the Totals of Each Incident by Event Type

In this analysis, only the FATALITIES and INJURIES variables are used to measure which events are “most harmful to population health”. Similarly, only the property and crop damage values derived from PROPDMG and CROPDMG are used to measure which events have the “greatest economic consequences”.

The total values are calculated for each of these four incident types.

total.fatalities <- aggregate(FATALITIES ~ EVTYPE, req.storm.data, FUN = sum)
total.injuries <- aggregate(INJURIES ~ EVTYPE, req.storm.data, FUN = sum)
total.propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, req.storm.data, FUN = sum)
total.cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, req.storm.data, FUN = sum)
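
To illustrate what aggregate() does with a formula, here is a small sketch on a made-up data frame. It also shows that two measures can be summed in a single call with cbind(); the names toy and toy.totals are illustrative only.

```r
# Toy data frame with made-up counts.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(10, 1, 5, 0, 0)
)

# Sum both measures within each event type; the result has one row per EVTYPE.
toy.totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy.totals)
```

Note that the formula interface of aggregate() omits rows with missing values by default, so with cbind() a row that is NA in either measure is dropped from both sums. Separate calls, as used above, avoid that coupling.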

RESULTS

Plotting the Fatalities and Injuries Summary Data

The top ten causes of fatalities and injuries are calculated and plotted. Tornadoes are the leading cause of both fatalities and injuries in the United States.

top.fatalities <- total.fatalities[order(-total.fatalities$FATALITIES), ][1:10, ]
top.injuries <- total.injuries[order(-total.injuries$INJURIES), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(top.fatalities$FATALITIES, las = 3, names.arg = top.fatalities$EVTYPE, main = "Main Causes of Fatalities", 
        ylab = "Number of Fatalities", col = "red")
barplot(top.injuries$INJURIES, las = 3, names.arg = top.injuries$EVTYPE, main = "Main Causes of Injuries", 
        ylab = "Number of Injuries", col = "red")
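
The ranking step is repeated for each measure, so it could be wrapped in a small helper. The sketch below is one possible form; top.n is a hypothetical function and the toy totals are made up.

```r
# Return the n rows of `totals` with the largest values in column `value.col`.
top.n <- function(totals, value.col, n = 10) {
  ranked <- totals[order(totals[[value.col]], decreasing = TRUE), ]
  head(ranked, n)   # unlike [1:n, ], head() adds no NA rows when fewer than n exist
}

# Toy totals with made-up counts.
toy.totals <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 5)
)
top.two <- top.n(toy.totals, "FATALITIES", n = 2)
print(top.two)
```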

Plotting the Property and Crop Damage Summary Data

The top ten causes of property and crop damage are calculated and plotted. Floods are the leading cause of property damage and droughts are the leading cause of crop damage in the United States.

top.propdmg <- total.propdmg[order(-total.propdmg$PROPDMGVAL), ][1:10, ]
top.cropdmg <- total.cropdmg[order(-total.cropdmg$CROPDMGVAL), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(top.propdmg$PROPDMGVAL/10^9, las = 3, names.arg = top.propdmg$EVTYPE, main = "Main Causes of Property Damage", 
        ylab = "Cost of Damage ($ Billions)", col = "blue")
barplot(top.cropdmg$CROPDMGVAL/10^9, las = 3, names.arg = top.cropdmg$EVTYPE, main = "Main Causes of Crop Damage", 
        ylab = "Cost of Damage ($ Billions)", col = "blue")

CONCLUSIONS