Health and Economic Impact of Severe Weather Conditions in the US Based on Analysis of NOAA Storm Database

Synopsis

This report analyzes the NOAA Storm database, covering 1950 to November 2011, to determine which types of severe weather event have the greatest impact on population health and the greatest economic consequences across all US states. The effect on health is measured by the number of fatalities and injuries caused by weather events. Tornadoes, thunderstorm winds and excessive heat were the top events adversely affecting human health. The effect on the economy is measured by the damage caused to property and crops. Floods, droughts, hurricanes and tornadoes were the top events adversely affecting economic conditions.

Data Processing

Loading the data

Download the compressed storm data file if it is not already present locally, then read the .bz2 file directly with the read.csv function.

filename <- "storm.csv.bz2"
if (!file.exists(filename)){
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, filename)
}
storm <- read.csv(filename)
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
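The loading step relies on read.csv being able to read a bzip2-compressed file without a separate decompression step. The following is a minimal sketch of that behavior on a toy file; the names `toy`, `tmp` and `toy_back` are made up for illustration. Note that the factor columns in the output above correspond to `stringsAsFactors = TRUE`, which was the default before R 4.0; on newer versions the argument has to be passed explicitly to get the same structure.

```r
# Toy demonstration (hypothetical data): write a small CSV through a
# bzip2 connection, then read it back with read.csv().
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given the path only; the compression is detected
# when the file is opened for reading.
toy_back <- read.csv(tmp, stringsAsFactors = TRUE)
str(toy_back)

unlink(tmp)
```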

Subsetting the data

Since we are interested in fatalities, injuries and economic damage across states, the relevant variables for our analysis are
* STATE as a measure of location
* EVTYPE as a measure of event type (e.g. tornado, flood, etc.)
* FATALITIES as a measure of harm to human health
* INJURIES as a measure of harm to human health
* PROPDMG as a measure of property damage and hence economic damage in USD
* PROPDMGEXP as a measure of magnitude of property damage (e.g. thousands, millions USD, etc.)
* CROPDMG as a measure of crop damage and hence economic damage in USD
* CROPDMGEXP as a measure of magnitude of crop damage (e.g. thousands, millions USD, etc.)

We create a subset of the original storm data with these 8 variables.

vars <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm2 <- storm[vars]
# Check subset storm2 data for missing values
sum(is.na(storm2))
## [1] 0

There are no missing values in the storm2 data subset.
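A single total hides which column any missing values come from, and it does not count empty strings, which is how a blank exponent code appears in PROPDMGEXP and CROPDMGEXP. The sketch below, on a made-up data frame `toy`, shows a per-column count with colSums alongside a separate count of empty strings; it illustrates the check and is not part of the original analysis.

```r
# Hypothetical toy data with one NA and one empty exponent code
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, NA, 1),
  PROPDMGEXP = c("K", "", "M")
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Empty strings are not NA, so they need their own count
empty_exp <- sum(toy$PROPDMGEXP == "")
print(empty_exp)
```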

str(storm2)
## 'data.frame':    902297 obs. of  8 variables:
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

Cleaning the data

From the structure of the new dataset storm2, we see that the exponent variables for property damage (PROPDMGEXP) and crop damage (CROPDMGEXP) are factors rather than numeric values. According to the codebook, the letter “K” stands for thousands, “M” for millions and “B” for billions. Hence, we need to convert each exponent code into a numeric multiplier and apply it to the corresponding damage figure to obtain the actual dollar value.

To compute the property damage value:

storm2$PROPEXP[storm2$PROPDMGEXP == "K"] <- 1000
storm2$PROPEXP[storm2$PROPDMGEXP == "M"] <- 1000000
storm2$PROPEXP[storm2$PROPDMGEXP == ""] <- 1
storm2$PROPEXP[storm2$PROPDMGEXP == "B"] <- 1000000000
storm2$PROPEXP[storm2$PROPDMGEXP == "m"] <- 1000000
storm2$PROPEXP[storm2$PROPDMGEXP == "0"] <- 1
storm2$PROPEXP[storm2$PROPDMGEXP == "5"] <- 100000
storm2$PROPEXP[storm2$PROPDMGEXP == "6"] <- 1000000
storm2$PROPEXP[storm2$PROPDMGEXP == "4"] <- 10000
storm2$PROPEXP[storm2$PROPDMGEXP == "2"] <- 100
storm2$PROPEXP[storm2$PROPDMGEXP == "3"] <- 1000
storm2$PROPEXP[storm2$PROPDMGEXP == "h"] <- 100
storm2$PROPEXP[storm2$PROPDMGEXP == "7"] <- 10000000
storm2$PROPEXP[storm2$PROPDMGEXP == "H"] <- 100
storm2$PROPEXP[storm2$PROPDMGEXP == "1"] <- 10
storm2$PROPEXP[storm2$PROPDMGEXP == "8"] <- 100000000
# Assign 0 when the exponent code is invalid so that those records add nothing to the totals
storm2$PROPEXP[storm2$PROPDMGEXP == "+"] <- 0
storm2$PROPEXP[storm2$PROPDMGEXP == "-"] <- 0
storm2$PROPEXP[storm2$PROPDMGEXP == "?"] <- 0
# compute the property damage value
storm2$PROPDMGVAL <- storm2$PROPDMG * storm2$PROPEXP

Similarly, to compute the crop damage value:

storm2$CROPEXP[storm2$CROPDMGEXP == "M"] <- 1000000
storm2$CROPEXP[storm2$CROPDMGEXP == "K"] <- 1000
storm2$CROPEXP[storm2$CROPDMGEXP == "m"] <- 1000000
storm2$CROPEXP[storm2$CROPDMGEXP == "B"] <- 1000000000
storm2$CROPEXP[storm2$CROPDMGEXP == "0"] <- 1
storm2$CROPEXP[storm2$CROPDMGEXP == "k"] <- 1000
storm2$CROPEXP[storm2$CROPDMGEXP == "2"] <- 100
storm2$CROPEXP[storm2$CROPDMGEXP == ""] <- 1
# Assign 0 when the exponent code is invalid so that those records add nothing to the totals
storm2$CROPEXP[storm2$CROPDMGEXP == "?"] <- 0
# Compute the crop damage value
storm2$CROPDMGVAL <- storm2$CROPDMG * storm2$CROPEXP
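The line-by-line assignments above could also be written as one helper that maps any exponent code to its multiplier. The sketch below is an alternative formulation, not the code used in this report; the function name `exp_multiplier` is hypothetical. It is meant to reproduce the same mapping: letters are matched case-insensitively, digits 0 to 8 are treated as powers of ten, a blank counts as 1, and anything else ("+", "-", "?") counts as 0.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers
exp_multiplier <- function(exp_code) {
  code <- toupper(as.character(exp_code))

  # Letter codes; codes without a match come back as NA
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])

  # Digit codes 0..8 are treated as powers of ten
  is_digit <- code %in% as.character(0:8)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  # A blank code means the damage figure is used as-is
  mult[code == ""] <- 1

  # Remaining codes are invalid and contribute nothing
  mult[is.na(mult)] <- 0
  mult
}

# Worked example: a PROPDMG of 25 with exponent "K" is 25,000 USD
25 * exp_multiplier("K")
```

With such a helper, each damage column would need a single call, for example `storm2$PROPDMG * exp_multiplier(storm2$PROPDMGEXP)`.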

Results

Events that are harmful to population health

fatal <- aggregate(FATALITIES ~ EVTYPE, data=storm2, FUN=sum)
injury <- aggregate(INJURIES ~ EVTYPE, data=storm2, FUN=sum)
# Sort in decreasing order and keep the top 10 events
fatal10 <- fatal[order(-fatal$FATALITIES),][1:10,]
injury10 <- injury[order(-injury$INJURIES),][1:10,]
# Plot the top 10 events with highest fatalities and highest injuries
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8)
barplot(fatal10$FATALITIES/(1*10^3), names.arg=fatal10$EVTYPE, col=heat.colors(10), ylim= c(0,6),
        ylab="Number of Fatalities (thousands)", main=" Top 10 Events with Highest Fatalities")
barplot(injury10$INJURIES/(1*10^3), names.arg=injury10$EVTYPE, col=terrain.colors(10), ylim= c(0,90),
        ylab="Number of Injuries (thousands)", main=" Top 10 Events with Highest Injuries")

The panel plot shows that tornadoes caused the highest number of both fatalities and injuries.
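The ranking above treats every distinct EVTYPE label as its own event type. With 985 levels, some of which carry stray whitespace (the first level shown earlier is "   HIGH SURF ADVISORY"), counts for one kind of event may be split across near-duplicate labels. The following sketch, on made-up data, shows one possible normalization before aggregating; it was not applied in this analysis, and the column name `EVTYPE_CLEAN` is hypothetical.

```r
# Hypothetical toy data with inconsistent labels
toy <- data.frame(
  EVTYPE     = c("TORNADO", " tornado", "Flood", "FLOOD ", "HAIL"),
  FATALITIES = c(5, 2, 1, 3, 0)
)

# Trim surrounding whitespace and convert to upper case
toy$EVTYPE_CLEAN <- toupper(trimws(toy$EVTYPE))

# Aggregate on the cleaned label, then sort in decreasing order
fatal_clean <- aggregate(FATALITIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
fatal_clean <- fatal_clean[order(-fatal_clean$FATALITIES), ]
print(fatal_clean)
```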

Events that have the greatest economic consequences

propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, data=storm2, FUN=sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, data=storm2, FUN=sum)
# Sort in decreasing order and keep the top 10 events
propdmg10 <-propdmg[order(-propdmg$PROPDMGVAL),][1:10,]
cropdmg10 <-cropdmg[order(-cropdmg$CROPDMGVAL),][1:10,]

# Plot the top 10 events with highest economic consequences
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8, cex.main = 0.9)
barplot((propdmg10$PROPDMGVAL)/(1*10^9), names.arg=propdmg10$EVTYPE, col=heat.colors(10, alpha = 1), ylim= c(0,140),
        ylab=" Cost of Property Damage($ billions)", main="Top 10 Events Causing Highest Property Damage")
barplot((cropdmg10$CROPDMGVAL)/(1*10^9), names.arg=cropdmg10$EVTYPE, col=terrain.colors(10, alpha = 1), ylim= c(0,14), 
        ylab=" Cost of Crop Damage($ billions)", main="Top 10 Events Causing Highest Crop Damage")

The panel plot shows that floods caused the most property damage, whereas droughts were the most harmful to crops.
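Property and crop damage are ranked separately above. To rank events by overall economic damage, the two values can be summed per record before aggregating. The sketch below uses made-up figures and a hypothetical column `TOTALDMG`; it only illustrates the calculation and does not reproduce results from the storm data.

```r
# Hypothetical toy data; the damage values are invented for illustration
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "TORNADO"),
  PROPDMGVAL = c(2e9, 1e9, 1e8, 5e8),
  CROPDMGVAL = c(1e8, 0, 2e9, 0)
)

# Total economic damage per record
toy$TOTALDMG <- toy$PROPDMGVAL + toy$CROPDMGVAL

# Sum by event type and sort in decreasing order
total <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)
total <- total[order(-total$TOTALDMG), ]
print(total)
```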