Data Analysis Report on the Health and Economic Impact of Severe Weather Events

Based on the NOAA Storm Database

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

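This report presents a brief analysis of the health and economic impact of severe weather events, using data from the NOAA storm database. Tornadoes caused the most fatalities and injuries; floods, hurricanes/typhoons and tornadoes caused the most property damage; and drought and floods caused the most crop damage.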

Data Processing

Download the compressed dataset, unless it is already present.

if (!file.exists("../repdata-data-StormData.csv.bz2")) {
        # mode = "wb" keeps the binary archive intact on Windows
        download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      "../repdata-data-StormData.csv.bz2", mode = "wb")
}

Decompress the bzip2 archive.

if (!file.exists("../repdata-data-StormData.csv")) {
        library(R.utils, warn.conflicts = FALSE)
        bunzip2("../repdata-data-StormData.csv.bz2", "../repdata-data-StormData.csv", remove = FALSE)
}
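The intermediate .csv file is not strictly required: read.csv() opens files through file(), which detects bzip2 compression when reading. The sketch below demonstrates this on a small, made-up CSV written to a temporary .csv.bz2 file (a stand-in for the NOAA archive, not the real data); it assumes only base R.

```r
# Write a tiny made-up CSV into a bzip2-compressed temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(0, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently, so no bunzip2 step is needed
storm <- read.csv(path)
print(storm)
```

In this report the equivalent would be data <- read.csv("../repdata-data-StormData.csv.bz2"). Reading the compressed file may be slower than reading the decompressed copy, so keeping the bunzip2 step is a reasonable trade-off if the data is loaded repeatedly.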

Read in the dataset.

data <- read.csv("../repdata-data-StormData.csv")
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Keep only the columns that describe the event type and its health and economic impact.

library(dplyr, warn.conflicts = FALSE)

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
newData <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(newData)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
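Because only seven of the 37 columns are used, the unneeded ones could also be skipped while reading, which may reduce memory use on a file this size. A minimal sketch, assuming base R only: in read.csv(), a colClasses entry of "NULL" skips that column and NA lets R guess its type. The small CSV and its columns are made up so the example runs on its own.

```r
# Made-up stand-in for the storm CSV, written to a temporary file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE = c("AL", "AL"), EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(0, 1), REMARKS = c("", "x")),
          path, row.names = FALSE)

keep <- c("EVTYPE", "FATALITIES")
# Read one row just to learn the column names
header <- names(read.csv(path, nrows = 1))
# "NULL" skips a column entirely; NA lets read.csv() guess the type
classes <- ifelse(header %in% keep, NA, "NULL")
subset_data <- read.csv(path, colClasses = classes)
print(subset_data)

# For the report, keep would be:
# c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
```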

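Convert the exponent codes (PROPDMGEXP, CROPDMGEXP) into numeric multipliers, then multiply them by PROPDMG and CROPDMG to get the dollar value of property and crop damage for each record. In this analysis, the letters h/H, K/k, m/M and B stand for hundreds, thousands, millions and billions; the digits 0 to 8 are read as powers of ten; a blank code means a multiplier of 1; and the symbols "-", "?" and "+" get a multiplier of 0, so their damage is discarded.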

unique(newData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Unusable symbols get a multiplier of 0, so their damage is dropped
newData$PROPEXP[newData$PROPDMGEXP == "-"|newData$PROPDMGEXP == "?"|newData$PROPDMGEXP == "+"] <- 0
# Blank or "0" means no scaling
newData$PROPEXP[newData$PROPDMGEXP == "0" | newData$PROPDMGEXP == ""] <- 10^0
# Digits are read as powers of ten; h/H = hundreds, K = thousands,
# m/M = millions, B = billions
newData$PROPEXP[newData$PROPDMGEXP == "1"] <- 10^1
newData$PROPEXP[newData$PROPDMGEXP == "2"|newData$PROPDMGEXP == "h"|newData$PROPDMGEXP == "H"] <- 10^2
newData$PROPEXP[newData$PROPDMGEXP == "3" | newData$PROPDMGEXP == "K"] <- 10^3
newData$PROPEXP[newData$PROPDMGEXP == "4"] <- 10^4
newData$PROPEXP[newData$PROPDMGEXP == "5"] <- 10^5
newData$PROPEXP[newData$PROPDMGEXP == "6"|newData$PROPDMGEXP == "m"|newData$PROPDMGEXP == "M"] <- 10^6
newData$PROPEXP[newData$PROPDMGEXP == "7"] <- 10^7
newData$PROPEXP[newData$PROPDMGEXP == "8"] <- 10^8
newData$PROPEXP[newData$PROPDMGEXP == "B"] <- 10^9

unique(newData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Same mapping for crop codes; lowercase k also means thousands
newData$CROPEXP[newData$CROPDMGEXP == "?"] <- 0
newData$CROPEXP[newData$CROPDMGEXP == "0" | newData$CROPDMGEXP == ""] <- 1
newData$CROPEXP[newData$CROPDMGEXP == "2"] <- 10^2
newData$CROPEXP[newData$CROPDMGEXP == "k" | newData$CROPDMGEXP == "K"] <- 10^3
newData$CROPEXP[newData$CROPDMGEXP == "m" | newData$CROPDMGEXP == "M"] <- 10^6
newData$CROPEXP[newData$CROPDMGEXP == "B"] <- 10^9
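The property and crop assignments above can be condensed into a single lookup table. The sketch below uses a hypothetical helper, exp_to_multiplier(), that is not part of the original analysis; it follows the same mapping (blank = 1, symbols = 0, digits = powers of ten, letters case-insensitive) and leaves any unrecognised code as NA so it is easy to spot. Two R details shape it: indexing by a factor uses the factor's integer codes rather than its labels, so the input is converted with as.character(); and "" never matches a name in R, so the blank code is handled separately.

```r
# Hypothetical helper (not from the original analysis): map NOAA damage
# exponent codes to numeric multipliers using the mapping described above
exp_to_multiplier <- function(code) {
  # Factor input would otherwise be indexed by its integer codes
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
              setNames(10^(0:8), as.character(0:8)),
              "-" = 0, "?" = 0, "+" = 0)
  mult <- unname(lookup[code])
  # "" never matches a name, so blank codes are filled in here
  mult[code %in% ""] <- 1
  mult
}

# Possible usage on the report's data frame:
# newData$PROPEXP <- exp_to_multiplier(newData$PROPDMGEXP)
# newData$CROPEXP <- exp_to_multiplier(newData$CROPDMGEXP)
```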

newData <- mutate(newData, PROPDMGVALUE = PROPDMG*PROPEXP, CROPDMGVALUE = CROPDMG*CROPEXP)

head(newData)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0           
##   PROPEXP CROPEXP PROPDMGVALUE CROPDMGVALUE
## 1    1000       1        25000            0
## 2    1000       1         2500            0
## 3    1000       1        25000            0
## 4    1000       1         2500            0
## 5    1000       1         2500            0
## 6    1000       1         2500            0

Group the records by event type and total the fatalities, injuries, property damage and crop damage for each type.

fatalData <- newData %>% group_by(EVTYPE) %>% summarize(SUM = sum(FATALITIES))
injuryData <- newData %>% group_by(EVTYPE) %>% summarize(SUM = sum(INJURIES))
property <- newData %>% group_by(EVTYPE) %>% summarize(SUM = sum(PROPDMGVALUE))
crop <- newData %>% group_by(EVTYPE) %>% summarize(SUM = sum(CROPDMGVALUE))
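All four totals can also be computed in one pass, and the top-10 selection used in the Results can be wrapped in a small function. The sketch below uses base R's aggregate() on a few made-up toy rows standing in for newData; the helper top_n_events() is hypothetical and not part of the original analysis.

```r
# Made-up toy rows standing in for newData (illustration only)
toy <- data.frame(
  EVTYPE       = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES   = c(2, 1, 3, 0),
  INJURIES     = c(15, 0, 4, 1),
  PROPDMGVALUE = c(25000, 1e6, 2500, 0),
  CROPDMGVALUE = c(0, 5e5, 0, 1000)
)

# Sum all four impact measures per event type in a single call
totals <- aggregate(cbind(FATALITIES, INJURIES, PROPDMGVALUE, CROPDMGVALUE) ~ EVTYPE,
                    data = toy, FUN = sum)

# Hypothetical helper: the n event types with the largest total for one measure
top_n_events <- function(totals, measure, n = 10) {
  ranked <- totals[order(totals[[measure]], decreasing = TRUE), c("EVTYPE", measure)]
  head(ranked, n)
}

top_n_events(totals, "FATALITIES", n = 2)
```

One caveat on this design: the formula interface of aggregate() drops rows containing NA in any measure by default (na.action = na.omit), whereas sum() would return NA for that group. With the exponent mapping above every code receives a multiplier, so this should not arise here.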

Results

Question 1: Across the United States, which types of events are most harmful with respect to population health?

par(mfrow = c(1, 2), mar = c(11, 4, 4, 2))
fatal10 <- fatalData[order(fatalData$SUM, decreasing = TRUE), ][1:10, ]
injury10 <- injuryData[order(injuryData$SUM, decreasing = TRUE), ][1:10, ]
barplot(fatal10$SUM, names.arg = fatal10$EVTYPE, las = 3, ylab = "Total Number of Fatalities",
        col = "red", main = "Top 10 Highest Fatality Events")
barplot(injury10$SUM, names.arg = injury10$EVTYPE, las = 3, ylab = "Total Number of Injuries",
        col = "red", main = "Top 10 Highest Injury Events")

The two bar plots above show the ten event types with the highest total fatalities and the highest total injuries. Tornadoes are the weather event most harmful to population health: across the United States, they caused both the most fatalities and the most injuries.

Question 2: Across the United States, which types of events have the greatest economic consequences?

par(mfrow = c(1, 2), mar = c(11, 4, 4, 2))
property10 <- property[order(property$SUM, decreasing = TRUE), ][1:10, ]
crop10 <- crop[order(crop$SUM, decreasing = TRUE), ][1:10, ]
barplot(property10$SUM/(10^9), names.arg = property10$EVTYPE, las = 3,
        ylab = "Total Damage ($ in Billions)", col = "red",
        main = "Top 10 Highest Property Damage")
barplot(crop10$SUM/(10^9), names.arg = crop10$EVTYPE, las = 3,
        ylab = "Total Damage ($ in Billions)", col = "red",
        main = "Top 10 Highest Crop Damage")

The two bar plots above show the ten event types with the highest total property damage and the highest total crop damage. Across the United States, floods, hurricanes/typhoons and tornadoes caused the greatest property damage, while drought and floods were the two leading causes of crop damage.