Exploring the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database: Brief Analyses of Health and Economic Impacts

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explored the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database that tracks characteristics of major storms and weather events in the United States. Data included when and where the events occurred and estimates of fatalities, injuries, and property damage.

The results showed that tornados caused the most fatalities and the most injuries in the United States, and are therefore the most harmful weather events for population health. The top 10 weather events by fatalities and by injuries are reported in the Results.

The analyses of the storm data found that floods have the greatest economic consequences across the United States.

Data

The data for this assignment can be downloaded from the course web site:

There is also some documentation of the database available:

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

Load and preprocess data

library(readr)
stormdata <- read.csv("repdata_data_StormData.csv")
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

There are 37 variables (columns) and 902,297 observations (rows).

Data Processing

Of the variables in stormdata, the following are relevant to this analysis:

EVTYPE: type of weather event (e.g. tornado, wind, snow, flood)

FATALITIES: approx. number of deaths

INJURIES: approx. number of injuries

PROPDMG: approx. property damage

PROPDMGEXP: the magnitude code (multiplier) for the property damage value

CROPDMG: approx. crop damage

CROPDMGEXP: the magnitude code (multiplier) for the crop damage value

1. Create a new dataframe to perform the analyses

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ dplyr   1.0.0
## ✓ tibble  3.0.3     ✓ stringr 1.4.0
## ✓ tidyr   1.1.0     ✓ forcats 0.5.0
## ✓ purrr   0.3.4
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
df <- subset(stormdata, 
             select=c( "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", 
                       "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

2. Analyse the health impacts

fatalities <- df %>% select(EVTYPE, FATALITIES) %>% group_by(EVTYPE) %>% 
        summarise(total.fatalities = sum(FATALITIES)) %>% 
        arrange(-total.fatalities)
## `summarise()` ungrouping output (override with `.groups` argument)
head(fatalities, 10)
## # A tibble: 10 x 2
##    EVTYPE         total.fatalities
##    <fct>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224
injuries <- df %>% select(EVTYPE, INJURIES) %>% group_by(EVTYPE) %>% 
        summarise(total.injuries = sum(INJURIES)) %>% 
        arrange(-total.injuries)
## `summarise()` ungrouping output (override with `.groups` argument)
head(injuries, 10)
## # A tibble: 10 x 2
##    EVTYPE            total.injuries
##    <fct>                      <dbl>
##  1 TORNADO                    91346
##  2 TSTM WIND                   6957
##  3 FLOOD                       6789
##  4 EXCESSIVE HEAT              6525
##  5 LIGHTNING                   5230
##  6 HEAT                        2100
##  7 ICE STORM                   1975
##  8 FLASH FLOOD                 1777
##  9 THUNDERSTORM WIND           1488
## 10 HAIL                        1361
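The two summaries above can also be produced in a single grouped pass. The following is a minimal sketch on toy data, not the analysis that produced the tables above; the data frame toy, its values, and the column total.casualties are hypothetical and used only for illustration.

```r
library(dplyr)

# Toy data standing in for the EVTYPE, FATALITIES and INJURIES columns of df
# (the values are made up for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 6, 0),
  INJURIES   = c(10, 4, 1, 0, 2)
)

# One grouped pass: total fatalities and total injuries per event type,
# plus their sum (total.casualties is a hypothetical combined measure)
health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    total.fatalities = sum(FATALITIES),
    total.injuries   = sum(INJURIES),
    .groups = "drop"
  ) %>%
  mutate(total.casualties = total.fatalities + total.injuries) %>%
  arrange(desc(total.casualties))

print(health)
```

Setting .groups = "drop" also silences the summarise() ungrouping message. Whether fatalities and injuries should be added together is a judgment call, since the two outcomes differ in severity.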

3. Analyse the economic impacts

The actual damage amounts are obtained by multiplying PROPDMG and CROPDMG by the multipliers encoded in PROPDMGEXP and CROPDMGEXP.

The codes in PROPDMGEXP and CROPDMGEXP are interpreted as follows:

H, h -> hundreds = x100

K, k -> thousands = x1,000

M, m -> millions = x1,000,000

B, b -> billions = x1,000,000,000

0 to 8 -> x10

(+) -> x1

(-) -> x0

(?) -> x0

blank -> x0

The total damage caused by each event type was calculated using the following code:

damage <- df %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
table(damage$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5      6 
## 465934      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
table(damage$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
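# Lookup table: Index[i] is the multiplier for Symbols[i], so the order of
# Index must match the sorted order of the symbols shown in the table above
Symbols <- sort(unique(as.character(damage$PROPDMGEXP)))
Index <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
convert <- data.frame(Symbols, Index)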

damage$Prop.Convert <- convert$Index[match(damage$PROPDMGEXP, convert$Symbols)]
damage$Crop.Convert <- convert$Index[match(damage$CROPDMGEXP, convert$Symbols)]

damage <- damage %>% mutate(PROPDMG = PROPDMG*Prop.Convert) %>% 
        mutate(CROPDMG = CROPDMG*Crop.Convert) %>% 
        mutate(TOTAL.DMG = PROPDMG+CROPDMG)

total.damage <- damage %>% group_by(EVTYPE) %>% 
        summarise(TOTAL.DMG.EVT=sum(TOTAL.DMG)) %>% arrange(-TOTAL.DMG.EVT) 
## `summarise()` ungrouping output (override with `.groups` argument)
head(total.damage)
## # A tibble: 6 x 2
##   EVTYPE            TOTAL.DMG.EVT
##   <fct>                     <dbl>
## 1 FLOOD              150319678250
## 2 HURRICANE/TYPHOON   71913712800
## 3 TORNADO             57352117607
## 4 STORM SURGE         43323541000
## 5 FLASH FLOOD         17562132111
## 6 DROUGHT             15018672000
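One caveat about the conversion above: the lookup table is built from the symbols found in PROPDMGEXP only, so match() returns NA for the lowercase k that appears in 21 rows of CROPDMGEXP. Because sum() is called without na.rm, any event type containing such a row gets an NA total and is sorted to the bottom by arrange(). A minimal sketch of a case-insensitive lookup that avoids this is given below; exp_to_multiplier() and multipliers are hypothetical names, the multiplier values are the ones listed above, and the sketch has not been run against the full data set.

```r
# Named vector of multipliers keyed by the upper-cased exponent code.
# Digits 0 to 8 map to 10, as in the Index vector used above.
multipliers <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "+" = 1, "-" = 0, "?" = 0,
  setNames(rep(10, 9), as.character(0:8))
)

# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))  # "k" and "K" are treated alike
  out <- unname(multipliers[code])             # unknown or blank codes give NA
  out[is.na(out)] <- 0                         # blank or unrecognised -> x0
  out
}

# Worked example on toy values: 25 with code K is 25 * 1,000 = 25,000
codes   <- c("K", "k", "M", "", "5", "?")
amounts <- c(25, 25, 2.5, 10, 3, 7)
converted <- amounts * exp_to_multiplier(codes)
print(converted)
```

Keying the lookup by name rather than by sorted position also removes the dependence on the locale-specific ordering returned by sort().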

Results

1. The health impacts of the weather events

Figure 1 shows the 10 weather events that caused the most fatalities.

Tornados caused the most fatalities in the U.S.

Fig1 <- ggplot(fatalities[1:10,], aes(x=EVTYPE, y=total.fatalities)) + 
        geom_bar(stat="identity") + 
        theme(axis.text.x = element_text(angle=60, vjust=1, hjust=1)) + 
        ggtitle("Total Fatalities Caused by Weather Events") + 
        labs(x="Event Type", y="Total Fatalities")
Fig1

Figure 2 shows the 10 weather events that caused the most injuries.

Tornados caused the most injuries in the U.S.

Fig2 <- ggplot(injuries[1:10,], aes(x=EVTYPE, y=total.injuries)) + 
        geom_bar(stat="identity") + 
        theme(axis.text.x = element_text(angle=60, vjust=1, hjust=1)) + 
        ggtitle("Injuries Caused by Weather Events") + 
        labs(x="Event Type", y="Total Injuries")
Fig2

2. The economic impacts of the weather events

Figure 3 shows that floods caused the most economic damage in the U.S.

Fig3 <- ggplot(total.damage[1:10,], aes(x=EVTYPE, y=TOTAL.DMG.EVT)) + 
        geom_bar(stat="identity") + 
        theme(axis.text.x = element_text(angle=60, vjust=1, hjust=1)) + 
        ggtitle("Economic Damage Caused by Weather Events") + 
        labs(x="Event Type", y="Total Economic Damage")
Fig3
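In the three figures, mapping EVTYPE directly to the x axis places the bars in alphabetical order rather than in order of impact. A minimal sketch of one way to sort the bars from largest to smallest is shown below, using reorder() inside aes(); the data frame toy and its values are made up for illustration, and FigSketch is a hypothetical name.

```r
library(ggplot2)

# Toy data shaped like total.damage (values are illustrative only)
toy <- data.frame(
  EVTYPE        = c("TORNADO", "DROUGHT", "FLOOD"),
  TOTAL.DMG.EVT = c(20, 10, 30)
)

# reorder() turns EVTYPE into a factor whose levels are sorted by the
# negated damage, so the largest bar is drawn first
FigSketch <- ggplot(toy, aes(x = reorder(EVTYPE, -TOTAL.DMG.EVT),
                             y = TOTAL.DMG.EVT)) +
        geom_col() +   # equivalent to geom_bar(stat = "identity")
        theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1)) +
        labs(x = "Event Type", y = "Total Economic Damage")
FigSketch
```

The same change applies to Fig1 and Fig2 by replacing x=EVTYPE with reorder(EVTYPE, -total.fatalities) or reorder(EVTYPE, -total.injuries).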