Effects of Climate Events on Population and Economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Here we explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We find that tornadoes are the event type most harmful to population health, and that floods have the greatest economic consequences across the U.S.

Data Processing

Loading and Processing the Raw Data

Both data and documentation are available on the Coursera website.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "data.csv.bz2", method = "curl")

original_data <- read.csv("data.csv.bz2", header = TRUE, stringsAsFactors = FALSE)

# check structure
str(original_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

There are 902297 observations of 37 variables. The events start in 1950 and continue to 2011. To answer the questions, we consider the following variables: state, event type, numbers of fatalities and injuries, and the property and crop damage estimates together with their exponent codes.
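The date range stated above can be checked by parsing the BGN_DATE strings. The following is a minimal sketch on a toy vector, not part of the original analysis: the first three strings are copied from the str() output, and the 2011 entry is a made-up placeholder. On the full data set the same two steps would be applied to original_data$BGN_DATE.

```r
# Toy sample: first three strings come from the str() output above;
# the last one is a hypothetical 2011 row added for illustration.
sample_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
                  "6/8/1951 0:00:00", "11/28/2011 0:00:00")

# Parse the month/day/year part; trailing time characters are ignored
parsed <- as.Date(sample_dates, format = "%m/%d/%Y")

# Earliest and latest year present in the sample
year_range <- range(as.integer(format(parsed, "%Y")))
print(year_range)
```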

Subsetting the Data

# subset data frame
data <- original_data[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

# look at structure
str(data)
## 'data.frame':    902297 obs. of  8 variables:
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
# check for missing values
sum(is.na(data))
## [1] 0

This data frame has 902297 observations in 8 variables. It has no missing values.
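Note that is.na() does not count empty strings, and the str() output shows that CROPDMGEXP contains blank values. A small sketch of how blanks could be counted separately is given below; the toy data frame and its values are illustrative assumptions, not rows from the storm database.

```r
# Toy data frame mirroring the two exponent columns (illustrative values)
toy <- data.frame(
  PROPDMGEXP = c("K", "K", "", "M"),
  CROPDMGEXP = c("", "", "", "K"),
  stringsAsFactors = FALSE
)

# is.na() only detects NA; blank strings are not counted
na_count <- sum(is.na(toy))

# Count blank strings per column with a separate comparison
blank_counts <- colSums(toy == "")

print(na_count)
print(blank_counts)
```

On the real data, the same colSums() call on the two exponent columns of data would show how many records carry no exponent code.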

Results

Impact on Population Health: Fatalities

First, we investigate the event type that caused the largest total number of fatalities from 1950 to November 2011.

fatal <- aggregate(FATALITIES ~ EVTYPE, data, sum)
# order
fatal <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]

# plot
barplot(height = fatal$FATALITIES[1:20], names.arg = fatal$EVTYPE[1:20], las = 2, cex.names = .3, col = rainbow(20, start = 0, end = .5))
title(main = "Top 20 Severe Weather Events that Cause Fatalities from 1950-2011")
title(ylab = "Total Number of Fatalities")

The above plot shows the total number of fatalities for the top 20 severe weather events. Tornadoes have caused the most fatalities over 1950-2011.

Impact on Population Health: Injuries

Next, we investigate the event type that caused the largest total number of injuries from 1950 to November 2011.

injury <- aggregate(INJURIES ~ EVTYPE, data, sum)
# order
injury <- injury[order(injury$INJURIES, decreasing = TRUE), ]

# plot
barplot(height = injury$INJURIES[1:20]/1000, names.arg = injury$EVTYPE[1:20], las = 2, cex.names = .3, col = rainbow(20, start = 0, end = .5))
title(main = "Top 20 Severe Weather Events that Cause Injuries from 1950-2011")
title(ylab = "Total Number of Injuries in Thousands")

The above plot shows the total number of injuries, in thousands, for the top 20 severe weather events. Tornadoes are responsible for the most injuries over 1950-2011. Across the U.S., tornadoes are the severe weather event most harmful with respect to population health, as they cause both the most fatalities and the most injuries.
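Fatalities and injuries can also be summed in a single aggregate() call, which makes it easy to compare both outcomes per event type side by side. The sketch below uses a toy data frame: the event names are real EVTYPE values, but the counts are made up for illustration and do not come from the database.

```r
# Toy data: counts are hypothetical, chosen only to show the mechanics
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(20, 5, 10, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both outcomes per event type in one call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities first, with injuries as the tie-breaker
harm <- harm[order(-harm$FATALITIES, -harm$INJURIES), ]
print(harm)
```

Ranking by fatalities first is one possible choice; a different ordering or weighting of the two outcomes could be equally defensible.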

Economic Damage

We now investigate the total economic damage produced by each type of severe weather event. We consider both property and crop damage in the U.S. over 1950-2011.

# check exponent data-- there is a mix of "K", "M", etc.
unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# force to upper case for consistency
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
# look at values and counts
table(data$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      H      K      M 
##      4      5      1     40      7 424665  11337
# convert the exponents to numerical values
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
# set any invalid exponent to zero
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
# compute property damage
data$PROPDMGVAL <- data$PROPDMG * data$PROPEXP
# aggregate
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, data = data, FUN = sum)



### Prepare the Crop Damage Data
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
# force to upper case for consistency
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
# look at values and counts
table(data$CROPDMGEXP)
## 
##             ?      0      2      B      K      M 
## 618413      7     19      1      9 281853   1995
# convert the exponents to numerical values
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100
data$CROPEXP[data$CROPDMGEXP == ""] <- 1
# set any invalid exponent to zero
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
# compute the crop damage 
data$CROPDMGVAL <- data$CROPDMG * data$CROPEXP
# aggregate
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, data = data, FUN = sum)

# get top 10 events with highest property damage
propdmg10 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:10, ]
# get top 10 events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10, ]
# make a panel plot
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg10$EVTYPE, 
    main = "Top 10 Events with Greatest Property Damages", ylab = "Cost of damages ($ billions)", 
    col = "blue")
barplot(cropdmg10$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg10$EVTYPE, 
    main = "Top 10 Events With Greatest Crop Damages", ylab = "Cost of damages ($ billions)", 
    col = "blue")

The severe weather event that caused the greatest amount of property damage over 1950-2011 across the U.S. is flooding. The severe weather event that caused the greatest amount of crop damage over 1950-2011 across the U.S. is drought.
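The analysis above ranks property and crop damage separately. One way to compare overall economic consequences would be to add the two per event type. The sketch below does this on a toy data frame, using a named lookup vector as a compact alternative to the line-by-line exponent assignments. The helper to_multiplier() and all toy values are hypothetical and not part of the original analysis; the code-to-multiplier mapping follows the assignments used above.

```r
# Named lookup vector: exponent code -> multiplier (same mapping as above)
exp_lookup <- c(
  "K" = 1e3, "M" = 1e6, "B" = 1e9, "H" = 1e2,
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0,   "-" = 0,   "?" = 0
)

# Hypothetical helper: convert a vector of exponent codes to multipliers
to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- unname(exp_lookup[code])  # NA for blank or unknown codes
  mult[code == ""] <- 1             # blank exponent: value taken as-is
  mult
}

# Toy data: all damage figures are made up for illustration
toy <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD", "TORNADO"),
  PROPDMG    = c(2, 0, 500, 25),
  PROPDMGEXP = c("B", "", "k", "M"),
  CROPDMG    = c(100, 1, 0, 0),
  CROPDMGEXP = c("M", "B", "", ""),
  stringsAsFactors = FALSE
)

# Total damage per record = property damage + crop damage, in dollars
toy$TOTALDMG <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)

# Aggregate by event type and sort from largest to smallest
total <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)
total <- total[order(-total$TOTALDMG), ]
print(total)
```

Unknown codes map to NA in this sketch rather than to zero, so any unexpected value would surface in the result instead of being silently dropped.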