By Robert J. Chen

08/22/2019

Synopsis

Severe weather events harm both public health and the economy. For public health, tornadoes cause the most casualties (fatalities and injuries). For the economy, floods cause the most property damage and droughts cause the most crop damage. Hail and thunderstorm wind, however, are the most frequent weather events.

Background

Severe weather events such as storms have public health and economic impacts. They result in fatalities, injuries, and property damage. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This Coursera project answers two questions. Which types of events are most harmful with respect to population health? Which types of events have the greatest economic consequences?

Data

The data is a comma-separated-value (CSV) file compressed with bzip2. It can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and covers events from 1950 through November 2011. The earlier years generally contain fewer recorded events, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

Read the compressed csv file and then store it as a dataframe.

# Read the csv file and save it in raw_data. (This may take some time.)
setwd("~/Dropbox/@Next/AI/JH_repro/HW2")
raw_data <- read.csv("repdata_data_StormData.csv.bz2")
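
The read step above depends on a hard-coded working directory. As a minimal sketch, assuming network access is available, the archive could instead be downloaded once if it is missing. The helper name `fetch_storm_data` is hypothetical and not part of the original analysis. The toy round trip at the end only illustrates that `read.csv` reads a bzip2-compressed file directly.

```r
# URL taken from the Data section above.
storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Hypothetical helper (an assumption, not the author's code): download the
# archive only if it is not already present, then return its path.
fetch_storm_data <- function(dest = "repdata_data_StormData.csv.bz2") {
  if (!file.exists(dest)) {
    download.file(storm_url, destfile = dest, mode = "wb")
  }
  dest
}

# raw_data <- read.csv(fetch_storm_data())  # uncomment to load the real data

# Toy round trip: write a small bzip2-compressed csv, then read it back.
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)
toy <- read.csv(toy_path)
```

Passing `mode = "wb"` keeps the compressed file intact on platforms that distinguish text and binary downloads.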

Summary of Data

# Sample view of the data
head(raw_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
# Sample size:
nrow(raw_data)
## [1] 902297
# Variables measured:
names(raw_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
# Number of different weather event types:
length(table(raw_data$EVTYPE))
## [1] 985

Extracting relevant variables

In this project, we are interested in the consequences of the different types of events (EVTYPE) with respect to population health (FATALITIES, INJURIES) and the economy (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP).

# Impact on population health
# Aggregate the data by event type
fatality <- aggregate(FATALITIES ~ EVTYPE, raw_data, sum)
injury <- aggregate(INJURIES ~ EVTYPE, raw_data, sum)

# Impact on the economy
economy <- raw_data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

# Check the property damage exponent
unique(raw_data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Convert the property damage exponent and calculate the property damage value
# (multipliers are listed in the same order as the codes printed above)
economy$PROPEXP <- c(1000,1e+06,1,1e+09,1e+06,1,1,1e+05,1e+06,1,10000,100,1000,100,1e+07,100,1,10,1e+08)[match(economy$PROPDMGEXP,as.character(unique(raw_data$PROPDMGEXP)))]
economy$PROPDMGVAL <- economy$PROPDMG*economy$PROPEXP

# Check the crop damage exponent
unique(raw_data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Convert the crop damage exponent and calculate the crop damage value
economy$CROPEXP <- c(1,1e+06,1000,1e+06,1e+09,1,1,1000,100)[match(economy$CROPDMGEXP,as.character(unique(raw_data$CROPDMGEXP)))]
economy$CROPDMGVAL <- economy$CROPDMG*economy$CROPEXP

# Aggregate the data by event type
property <- aggregate(PROPDMGVAL ~ EVTYPE, economy, sum)
crop <- aggregate(CROPDMGVAL ~ EVTYPE, economy, sum)
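
The positional lookup above depends on the order in which `unique()` returns the exponent codes. As an alternative sketch (not the author's method), a named vector can map each code to its multiplier directly. The helper name `exp_to_multiplier` and the toy data are hypothetical. The interpretation of the codes is also an assumption: letters as hundred, thousand, million, and billion, digits as powers of ten, and anything else as 1.

```r
# Hypothetical helper: convert damage exponent codes to numeric multipliers.
# Assumed interpretation: H/K/M/B (any case) are hundred/thousand/million/
# billion, single digits are powers of ten, and every other code counts as 1.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])            # NA where the code is not a letter
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult[is.na(mult)] <- 1                  # "", "?", "+", "-" and similar
  mult
}

# Toy data, not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 1, 3, 10),
  PROPDMGEXP = c("M", "B", "k", ""),
  stringsAsFactors = FALSE
)
toy$PROPDMGVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Same aggregation pattern as in the analysis: sum the values per event type.
toy_property <- aggregate(PROPDMGVAL ~ EVTYPE, toy, sum)
```

Because the lookup is keyed by the code itself, it gives the same result regardless of the order in which codes first appear in the data.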

Results

Impact on population health

# Rank the top 10 events for fatalities
fatality10 <- fatality[order(-fatality$FATALITIES), ][1:10, ]
# Rank the top 10 events for injuries
injury10 <- injury[order(-injury$INJURIES), ][1:10, ]
# Plot the data
barplot(fatality10$FATALITIES, las = 3, names.arg = fatality10$EVTYPE, main = "Top 10 Events for Fatalities", ylab = "Number of Fatalities", col = "lightblue")

barplot(injury10$INJURIES, las = 3, names.arg = injury10$EVTYPE, main = "Top 10 Events for Injuries", ylab = "Number of Injuries", col = "lightblue")

The plots show that tornado and excessive heat cause the most fatalities; tornado, with 5,633 fatalities, is far ahead of every other event type. Tornado also leads injuries by a wide margin, with 91,346. Among the 985 recorded event types, tornado therefore has the greatest impact on population health in the United States.

Impact on economy

# Rank the top 10 events for property damage
property10 <- property[order(-property$PROPDMGVAL), ][1:10, ]
# Rank the top 10 events for crop damage
crop10 <- crop[order(-crop$CROPDMGVAL), ][1:10, ]
barplot(property10$PROPDMGVAL/(10^9), las = 3, names.arg = property10$EVTYPE, main = "Top 10 Events for Property Damages", ylab = "Property Damage ($ billions)", col = "moccasin")

barplot(crop10$CROPDMGVAL/(10^9), las = 3, names.arg = crop10$EVTYPE, main = "Top 10 Events for Crop Damages", ylab = "Crop Damage ($ billions)", col = "moccasin")

The plots show that flood, hurricane/typhoon, and tornado cause the most property damage; flood, at $144.66 billion, is far ahead of the others. Drought, flood, river flood, and ice storm are the top four events for crop damage; drought, at $13.97 billion, is far ahead of the others. Among the 985 recorded event types, flood therefore has the greatest economic consequences in the United States.
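
The conclusion about overall economic consequences rests on combining property and crop damage per event type. A minimal sketch of that combination follows, using toy numbers rather than the storm data. The data frames `toy_property` and `toy_crop` and the column `TOTALVAL` are hypothetical names introduced for illustration.

```r
# Toy per-event totals in dollars (illustrative values only).
toy_property <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "DROUGHT"),
                           PROPDMGVAL = c(100e9, 50e9, 1e9))
toy_crop <- data.frame(EVTYPE = c("DROUGHT", "FLOOD"),
                       CROPDMGVAL = c(14e9, 6e9))

# Full outer join so that events missing from one table are kept.
total <- merge(toy_property, toy_crop, by = "EVTYPE", all = TRUE)
total$PROPDMGVAL[is.na(total$PROPDMGVAL)] <- 0
total$CROPDMGVAL[is.na(total$CROPDMGVAL)] <- 0
total$TOTALVAL <- total$PROPDMGVAL + total$CROPDMGVAL

# Rank events by combined damage, largest first.
total <- total[order(-total$TOTALVAL), ]
```

Applied to the real `property` and `crop` data frames, the same pattern would give one combined ranking instead of two separate ones.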

Rank of frequency

# Plotting the frequency of events
freq10 <- sort(table(raw_data$EVTYPE),decreasing = TRUE)[1:10]
nm <- c(names(freq10)) 
barplot(freq10, las = 3, names.arg = gsub(" ", "\n", nm), main = "Top 10 Frequent Weather Events", ylab = "Frequency Count", col = "gray", las = 3)

While hail and thunderstorm wind are the most frequent weather events, they are not the leading causes of casualties or economic damage. Tornadoes, floods, and droughts occur less often, yet their damage is disproportionately high.