Analysis of NOAA Storm Database

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The goal of this analysis is to answer the following two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data for this assignment come as a bzip2-compressed CSV file, which is downloaded directly from the URL in the code below. The variables are described in the National Weather Service Storm Data Documentation.

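Loading the required libraries:

library(dplyr)      # data manipulation
library(ggplot2)    # bar plots
library(gridExtra)  # grid.arrange() for side-by-side plots
library(grid)       # textGrob() and gpar() for the plot titles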

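Downloading the data file and loading it into R:

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Keep the .bz2 extension and download in binary mode (needed on Windows)
download.file(fileURL, "stormData.csv.bz2", mode = "wb")
# read.csv() reads bzip2-compressed files directly
data <- read.csv("stormData.csv.bz2")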

We use the head() and summary() functions to take a first look at the data:

head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
summary(data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

Data Analysis

Here, we measure harm to population health by the FATALITIES and INJURIES columns, and economic consequences by property damage (PROPDMG) and crop damage (CROPDMG). Each damage column has a companion exponent column (PROPDMGEXP and CROPDMGEXP) that encodes the unit of the recorded value, such as K for thousands or M for millions. The damage in dollars is therefore the recorded value multiplied by the multiplier that its exponent code stands for.
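As a worked example of this rule, the first record shown by head() has PROPDMG = 25.0 and PROPDMGEXP = "K". Writing m(·) for the mapping from exponent code to multiplier (here K = 10^3, as in the mapping used later in this analysis), its property damage in dollars is:

```latex
\text{TOTPROPDMG} = \text{PROPDMG} \times m(\text{PROPDMGEXP}) = 25.0 \times 10^{3} = \$25{,}000
```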

# Keep only the columns needed for the analysis (named cols to avoid masking stats::var)
cols <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- data[cols]
dim(storm)
## [1] 902297      7

Analysis for Question 1

Fatalities and injuries are summed by event type, and the ten event types with the highest totals are kept:

fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm, FUN = sum)
fatal10 <- fatal[order(-fatal$FATALITIES), ][1:10, ] 
fatal10
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
injury <- aggregate(INJURIES ~ EVTYPE, data = storm, FUN = sum)
injury10 <- injury[order(-injury$INJURIES), ][1:10, ] 
injury10
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
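The fatality and injury rankings repeat the same aggregate-then-sort pattern. A small helper can factor it out; `topN()` is a hypothetical name introduced here for illustration, and the toy data frame below stands in for `storm` (its values are made up).

```r
# Hypothetical helper: sum `col` by EVTYPE and return the n largest totals.
topN <- function(df, col, n = 10) {
  # Build the formula "<col> ~ EVTYPE" from the column name
  totals <- aggregate(as.formula(paste(col, "~ EVTYPE")), data = df, FUN = sum)
  # Sort in decreasing order of the summed column
  totals <- totals[order(-totals[[col]]), ]
  # head() avoids NA rows when there are fewer than n event types
  head(totals, n)
}

# Toy data standing in for the real storm data frame (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)
topN(toy, "FATALITIES", n = 2)
```

On the real data, `topN(storm, "FATALITIES")` and `topN(storm, "INJURIES")` would play the roles of `fatal10` and `injury10`.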

Plots for the fatalities and injuries for the top 10 events:

fatalplot <- ggplot(fatal10, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) + 
      geom_bar(stat = "identity", fill = "pink") + 
      theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
      xlab("Event") + ylab("Fatalities")

injuryplot <- ggplot(injury10, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) + 
      geom_bar(stat = "identity", fill = "pink") + 
      theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
      xlab("Event") + ylab("Injuries") 
grid.arrange(fatalplot, injuryplot, ncol=2, nrow=1,
     top = textGrob("Most Harmful Events w.r.t. Public Health",gp = gpar(fontsize = 14, font = 3)))

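Fig. 1: Most harmful events with respect to public health (top 10 event types by fatalities and by injuries)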

The plots above show that tornadoes cause more fatalities (5,633) and more injuries (91,346) than any other event type.
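Both health measures can also be summed in a single aggregate() call by putting cbind() on the left-hand side of the formula, which yields one table with fatalities and injuries per event type. A sketch on a toy data frame standing in for `storm` (the values are made up, not taken from the storm database):

```r
# Toy stand-in for the storm data frame (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 2, 5, 1)
)

# Sum both columns by event type in one pass
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, breaking ties by injuries
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
health
```

With the formula interface, aggregate() drops rows that have an NA in any listed column; FATALITIES and INJURIES contain no NAs according to summary(), so nothing is lost here.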

Analysis for Question 2

unique(storm$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

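The exponent codes are inconsistent: letters appear in both upper and lower case (h/H for hundreds, k/K for thousands, m/M for millions, B for billions), digits 0-8 are used as powers of ten, an empty string means no multiplier, and the symbols +, -, and ? have no clear meaning. We map each recognized code to its numeric multiplier and set the unrecognized symbols to 0, which excludes those records from the damage totals.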

storm$corrPROPEXP[storm$PROPDMGEXP == "K"] <- 1000
storm$corrPROPEXP[storm$PROPDMGEXP == "M"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == ""] <- 1
storm$corrPROPEXP[storm$PROPDMGEXP == "B"] <- 1e+09
storm$corrPROPEXP[storm$PROPDMGEXP == "m"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == "0"] <- 1
storm$corrPROPEXP[storm$PROPDMGEXP == "5"] <- 1e+05
storm$corrPROPEXP[storm$PROPDMGEXP == "6"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == "4"] <- 10000
storm$corrPROPEXP[storm$PROPDMGEXP == "2"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "3"] <- 1000
storm$corrPROPEXP[storm$PROPDMGEXP == "h"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "7"] <- 1e+07
storm$corrPROPEXP[storm$PROPDMGEXP == "H"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "1"] <- 10
storm$corrPROPEXP[storm$PROPDMGEXP == "8"] <- 1e+08
storm$corrPROPEXP[storm$PROPDMGEXP == "+"] <- 0
storm$corrPROPEXP[storm$PROPDMGEXP == "-"] <- 0
storm$corrPROPEXP[storm$PROPDMGEXP == "?"] <- 0
storm$corrCROPEXP[storm$CROPDMGEXP == "M"] <- 1e+06
storm$corrCROPEXP[storm$CROPDMGEXP == "K"] <- 1000
storm$corrCROPEXP[storm$CROPDMGEXP == "m"] <- 1e+06
storm$corrCROPEXP[storm$CROPDMGEXP == "B"] <- 1e+09
storm$corrCROPEXP[storm$CROPDMGEXP == "0"] <- 1
storm$corrCROPEXP[storm$CROPDMGEXP == "k"] <- 1000
storm$corrCROPEXP[storm$CROPDMGEXP == "2"] <- 100
storm$corrCROPEXP[storm$CROPDMGEXP == ""] <- 1
storm$corrCROPEXP[storm$CROPDMGEXP == "?"] <- 0
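The assignments above amount to a lookup table from exponent code to multiplier. The same mapping can be written once and applied to both columns with match(), which (unlike name-based indexing such as `x[""]`) also matches the empty-string code. This is a sketch; `expCodes`, `expMult`, and `toMultiplier()` are hypothetical names, and lowercase k is included for both columns (PROPDMGEXP contains no k, so the result for the property column is unchanged).

```r
# Exponent codes and the multipliers they stand for (mirrors the mapping above):
# h/H = hundreds, k/K = thousands, m/M = millions, B = billions,
# digits 0-8 = powers of ten, "" = no multiplier,
# unrecognized symbols (+, -, ?) = 0 so those records add no damage.
expCodes <- c("h", "H", "k", "K", "m", "M", "B",
              "0", "1", "2", "3", "4", "5", "6", "7", "8",
              "", "+", "-", "?")
expMult  <- c(100, 100, 1e3, 1e3, 1e6, 1e6, 1e9,
              1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
              1, 0, 0, 0)

# match() gives each code's position in expCodes (NA for codes not listed),
# which then selects the corresponding multiplier.
toMultiplier <- function(code) expMult[match(code, expCodes)]

toMultiplier(c("K", "", "5", "?"))
```

Applied to the data, this would be `storm$corrPROPEXP <- toMultiplier(storm$PROPDMGEXP)` and `storm$corrCROPEXP <- toMultiplier(storm$CROPDMGEXP)`; any code missing from the table comes back as NA, just as it would with the individual assignments.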

The damage in dollars is the recorded value multiplied by its corrected multiplier:

storm$TOTPROPDMG <- storm$PROPDMG * storm$corrPROPEXP
storm$TOTCROPDMG <- storm$CROPDMG * storm$corrCROPEXP

Aggregating and sorting the property and crop damage by event type (top 10):

prop <- aggregate(TOTPROPDMG ~ EVTYPE, data = storm, FUN = sum, na.rm = TRUE)
prop <- prop[with(prop, order(-TOTPROPDMG)),]
prop <- head(prop, 10)
print(prop)
##                EVTYPE   TOTPROPDMG
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260
crop <- aggregate(TOTCROPDMG ~ EVTYPE, data = storm, FUN = sum, na.rm = TRUE)
crop <- crop[with(crop, order(-TOTCROPDMG)),]
crop <- head(crop, 10)
print(crop)
##                EVTYPE  TOTCROPDMG
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

Plots for the property and crop damage for the top 10 events:

propplot <- ggplot(prop, aes(x = reorder(EVTYPE, -TOTPROPDMG), y = TOTPROPDMG)) + 
                  geom_bar(stat = "identity", fill = "pink") + 
                  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) + 
                  xlab("Event") + ylab("Property Damage (in $)") 
cropplot <- ggplot(crop, aes(x = reorder(EVTYPE, -TOTCROPDMG), y = TOTCROPDMG)) + 
                  geom_bar(stat = "identity", fill = "pink") + 
                  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) + 
                  xlab("Event") + ylab("Crop Damage (in $)")
grid.arrange(propplot, cropplot, ncol = 2, nrow = 1, top = textGrob("Events with Greatest Economic Consequences",gp = gpar(fontsize = 14, font = 3)))

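Fig. 2: Events with the greatest economic consequences (top 10 event types by property damage and by crop damage)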

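The plots above show that floods cause the most property damage (about $144.7 billion) and droughts the most crop damage (about $14.0 billion). Floods also rank second for crop damage (about $5.7 billion), so their combined damage of roughly $150.3 billion is the largest of any event type, about double that of HURRICANE/TYPHOON (about $71.9 billion), the next highest.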
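Combining the two damage types into one total per event type makes this comparison explicit. A sketch, assuming a data frame with the TOTPROPDMG and TOTCROPDMG columns computed above; the toy values here are illustrative, not real figures, and `econ` is a hypothetical name.

```r
# Toy stand-in for storm after TOTPROPDMG and TOTCROPDMG have been computed
toy <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  TOTPROPDMG = c(100, 5, 50, 20),
  TOTCROPDMG = c(10, 80, 0, 5)
)

# Total economic damage per record; na.rm = TRUE keeps a record whose
# other damage column is NA instead of turning its total into NA
toy$TOTDMG <- rowSums(toy[c("TOTPROPDMG", "TOTCROPDMG")], na.rm = TRUE)

# Sum by event type and sort from largest to smallest
econ <- aggregate(TOTDMG ~ EVTYPE, data = toy, FUN = sum)
econ <- econ[order(-econ$TOTDMG), ]
econ
```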

Results

Our analysis supports the following conclusions:

  1. Tornadoes cause more fatalities (5,633) and injuries (91,346) than any other event type, making them the most harmful with respect to population health (see Fig. 1).
  2. Floods cause the greatest property damage and droughts the greatest crop damage. Because floods also rank second for crop damage, they have the greatest overall economic consequences, with about $150.3 billion in combined damage (see Fig. 2).