Synopsis

This report explores the NOAA Storm Database to answer two basic questions about severe weather events across the United States: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.

Data Processing

1. Data Source

  • The data for this study come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data can be obtained from the course web site: Storm Data.
  • The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
  • The definitions and construction of the variables are documented in the National Weather Service Storm Data Documentation.
  • Further background is available in the National Climatic Data Center Storm Events FAQ.
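
The remark about sparser early records can be checked by counting events per year. The following is a minimal sketch on a small made-up data frame (toy_events is hypothetical, not part of the analysis); it assumes dates in the same "m/d/Y H:M:S" layout that BGN_DATE uses, and the same two steps could be applied to storm_data$BGN_DATE once the data are loaded.

```r
# Toy stand-in for the storm data; dates use the same layout as BGN_DATE.
toy_events <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
               "6/8/1951 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year.
event_year <- as.integer(format(as.Date(toy_events$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events per year.
events_per_year <- table(event_year)
events_per_year
```

A bar plot of events_per_year on the full data would make the growth in recorded events over time visible.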

2. Loading Data

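# Adjust the working directory to your own project folder.
setwd("/Users/benchan/Google Drive/R/Reproducible Research/")
# mode = "wb" writes the compressed file as binary, which matters on Windows.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", mode = "wb")
# read.csv() can read the bz2-compressed file directly.
storm_data <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)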
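
Because the compressed file is large, re-running the report downloads it again each time. One possible way to avoid that is to download only when the file is missing. The helper below, fetch_if_missing, is a hypothetical name introduced for this sketch, and the URL in the demonstration is a placeholder that is never contacted because the target file already exists.

```r
# Hypothetical helper: download only when the file is not already on disk.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file as binary, which matters for compressed files on Windows.
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Demonstration with a file that already exists, so nothing is downloaded.
existing <- tempfile(fileext = ".csv")
writeLines("a,b\n1,2", existing)
result <- fetch_if_missing("https://example.invalid/not-used.csv", existing)
```

In the report itself, the call would take the storm data URL and "StormData.csv.bz2" as its two arguments.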

3. Subsetting Data

  • Check out the structure and summary of the data set.
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(storm_data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI       
##  Min.   :   0.000   Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character  
##  Mean   :   1.484                                        
##  3rd Qu.:   1.000                                        
##  Max.   :3749.000                                        
##                                                          
##    END_DATE           END_TIME           COUNTY_END COUNTYENDN    
##  Length:902297      Length:902297      Min.   :0    Mode:logical  
##  Class :character   Class :character   1st Qu.:0    NA's:902297   
##  Mode  :character   Mode  :character   Median :0                  
##                                        Mean   :0                  
##                                        3rd Qu.:0                  
##                                        Max.   :0                  
##                                                                   
##    END_RANGE          END_AZI           END_LOCATI       
##  Min.   :  0.0000   Length:902297      Length:902297     
##  1st Qu.:  0.0000   Class :character   Class :character  
##  Median :  0.0000   Mode  :character   Mode  :character  
##  Mean   :  0.9862                                        
##  3rd Qu.:  0.0000                                        
##  Max.   :925.0000                                        
##                                                          
##      LENGTH              WIDTH                F               MAG         
##  Min.   :   0.0000   Min.   :   0.000   Min.   :0.0      Min.   :    0.0  
##  1st Qu.:   0.0000   1st Qu.:   0.000   1st Qu.:0.0      1st Qu.:    0.0  
##  Median :   0.0000   Median :   0.000   Median :1.0      Median :   50.0  
##  Mean   :   0.2301   Mean   :   7.503   Mean   :0.9      Mean   :   46.9  
##  3rd Qu.:   0.0000   3rd Qu.:   0.000   3rd Qu.:1.0      3rd Qu.:   75.0  
##  Max.   :2315.0000   Max.   :4400.000   Max.   :5.0      Max.   :22000.0  
##                                         NA's   :843563                    
##    FATALITIES          INJURIES            PROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Median :  0.0000   Median :   0.0000   Median :   0.00  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##                                                          
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000                     
##                                                         
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
  • To reduce the computing time of processing this large dataset, I subset only the variables relevant to the study: the event type (variable “EVTYPE”), the figures related to population health impacts (variables “FATALITIES” and “INJURIES”) and the figures related to economic consequences (variables “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”).
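
As an illustration of this subsetting step, here is a minimal base R sketch on a made-up two-row data frame (toy_storm and keep are names introduced for the example). The analysis itself uses dplyr's select() for the same purpose; base R is used here only so the sketch has no package dependencies.

```r
# Toy stand-in for storm_data with one extra column that is not needed.
toy_storm <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0), INJURIES = c(14, 2),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 1), CROPDMGEXP = c("", "K"),
  REMARKS = c("", ""),
  stringsAsFactors = FALSE
)

# The seven variables named in the text above.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing, then keep only the relevant ones.
stopifnot(all(keep %in% names(toy_storm)))
toy_subset <- toy_storm[, keep]
```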

Health Impact

library(dplyr)
storm_fatalities <- storm_data %>% select(EVTYPE, FATALITIES) %>% group_by(EVTYPE) %>% summarise(total.fatalities = sum(FATALITIES)) %>% arrange(-total.fatalities)
head(storm_fatalities, 10)
## # A tibble: 10 x 2
##            EVTYPE total.fatalities
##             <chr>            <dbl>
##  1        TORNADO             5633
##  2 EXCESSIVE HEAT             1903
##  3    FLASH FLOOD              978
##  4           HEAT              937
##  5      LIGHTNING              816
##  6      TSTM WIND              504
##  7          FLOOD              470
##  8    RIP CURRENT              368
##  9      HIGH WIND              248
## 10      AVALANCHE              224
storm_injuries <- storm_data %>% select(EVTYPE, INJURIES) %>% group_by(EVTYPE) %>% summarise(total.injuries = sum(INJURIES)) %>% arrange(-total.injuries)
head(storm_injuries, 10)
## # A tibble: 10 x 2
##               EVTYPE total.injuries
##                <chr>          <dbl>
##  1           TORNADO          91346
##  2         TSTM WIND           6957
##  3             FLOOD           6789
##  4    EXCESSIVE HEAT           6525
##  5         LIGHTNING           5230
##  6              HEAT           2100
##  7         ICE STORM           1975
##  8       FLASH FLOOD           1777
##  9 THUNDERSTORM WIND           1488
## 10              HAIL           1361
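
The injuries table lists “TSTM WIND” and “THUNDERSTORM WIND” as separate rows, so totals are split across label variants of what appears to be the same kind of event. A possible refinement, not applied in this analysis, is to normalise the labels before aggregating. The sketch below uses made-up numbers, and the alias mapping is an assumption of the example rather than an official recoding.

```r
# Toy data with label variants for the same kind of event.
toy_health <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "TORNADO"),
  INJURIES = c(3, 2, 1, 10),
  stringsAsFactors = FALSE
)

# Normalise case and whitespace, then map one known alias onto a single label.
# The alias choice is an assumption of this sketch.
clean_type <- toupper(trimws(toy_health$EVTYPE))
clean_type[clean_type == "TSTM WIND"] <- "THUNDERSTORM WIND"

# Total injuries per cleaned event type, largest first.
injuries_by_type <- sort(tapply(toy_health$INJURIES, clean_type, sum), decreasing = TRUE)
injuries_by_type
```

Applying such a cleaning step to the full data would change the rankings reported here, so it should be read as an option for further work rather than as part of the results.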

Economic Impact

  • The data provide two types of economic impact: property damage (PROPDMG) and crop damage (CROPDMG). The actual damage in USD is obtained by multiplying each figure by the multiplier encoded in PROPDMGEXP or CROPDMGEXP, respectively. The symbols in PROPDMGEXP and CROPDMGEXP are interpreted as follows:
  • H, h -> hundreds = x100
  • K, k -> thousands = x1,000
  • M, m -> millions = x1,000,000
  • B, b -> billions = x1,000,000,000
  • 0 to 8 -> x10
  • (+) -> x1
  • (-) -> x0
  • (?) -> x0
  • blank -> x0
storm_damage <- storm_data %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

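# Spell out the symbols in the order the multipliers assume, rather than relying on sort(), whose order depends on the locale.
Symbol <- c("", "-", "?", "+", as.character(0:8), "B", "h", "H", "K", "m", "M")
Multiplier <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
convert.Multiplier <- data.frame(Symbol, Multiplier)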

storm_damage$Prop.Multiplier <- convert.Multiplier$Multiplier[match(storm_damage$PROPDMGEXP, convert.Multiplier$Symbol)]
storm_damage$Crop.Multiplier <- convert.Multiplier$Multiplier[match(storm_damage$CROPDMGEXP, convert.Multiplier$Symbol)]

storm_damage <- storm_damage %>% mutate(PROPDMG = PROPDMG*Prop.Multiplier) %>% mutate(CROPDMG = CROPDMG*Crop.Multiplier) %>% mutate(TOTAL.DMG = PROPDMG+CROPDMG)

storm_damage_total <- storm_damage %>% group_by(EVTYPE) %>% summarize(TOTAL.DMG.EVTYPE = sum(TOTAL.DMG))%>% arrange(-TOTAL.DMG.EVTYPE) 

head(storm_damage_total,10)
## # A tibble: 10 x 2
##               EVTYPE TOTAL.DMG.EVTYPE
##                <chr>            <dbl>
##  1             FLOOD     150319678250
##  2 HURRICANE/TYPHOON      71913712800
##  3           TORNADO      57352117607
##  4       STORM SURGE      43323541000
##  5       FLASH FLOOD      17562132111
##  6           DROUGHT      15018672000
##  7         HURRICANE      14610229010
##  8       RIVER FLOOD      10148404500
##  9         ICE STORM       8967041810
## 10    TROPICAL STORM       8382236550
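
The multiplier conversion can be illustrated on a few made-up rows. Because match() returns NA for any symbol missing from the lookup table, and sum() propagates NA, it can be worth checking for unmatched symbols before aggregating. The sketch below is a simplified variant using a named vector rather than the data frame lookup above; exp_multiplier, to_dollars and the symbol "x" are hypothetical names and values introduced for the example.

```r
# Named lookup vector: symbol -> multiplier, following the list of symbols above.
exp_multiplier <- c(H = 1e2, h = 1e2, K = 1e3, k = 1e3, M = 1e6, m = 1e6,
                    B = 1e9, b = 1e9, "+" = 1, "-" = 0, "?" = 0)
# Digits 0 to 8 -> x10, as in the Multiplier vector above.
exp_multiplier <- c(exp_multiplier, setNames(rep(10, 9), as.character(0:8)))

to_dollars <- function(value, symbol) {
  mult <- unname(exp_multiplier[symbol])  # NA for symbols not in the table
  mult[symbol == ""] <- 0                 # blank -> x0
  value * mult
}

# Made-up rows; "x" stands for a symbol that the table does not cover.
toy_damage <- data.frame(
  PROPDMG = c(25, 2.5, 1, 3), PROPDMGEXP = c("K", "M", "", "x"),
  stringsAsFactors = FALSE
)
toy_damage$PROP_USD <- to_dollars(toy_damage$PROPDMG, toy_damage$PROPDMGEXP)

# Rows whose symbol was not recognised end up as NA and can be inspected.
unmatched <- toy_damage[is.na(toy_damage$PROP_USD), ]
```

Here 25 with symbol K becomes 25,000 USD and 2.5 with symbol M becomes 2,500,000 USD, while the row with the unknown symbol is flagged instead of silently entering a total.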

Results

Health Impact

The top 10 event types by total fatalities and by total injuries are shown graphically. As shown in the figures, tornadoes are by far the most harmful event type with respect to population health, with 5,633 fatalities and 91,346 injuries recorded.

library(ggplot2)
plot_fatalities <- ggplot(storm_fatalities[1:10,], 
                          aes(x=reorder(EVTYPE, -total.fatalities), 
                              y=total.fatalities, fill = EVTYPE)) + 
    geom_bar(stat="identity") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          plot.title = element_text(hjust = 0.5)) + 
    ggtitle("Top 10 Events with Highest Total Fatalities") +
    labs(x = "EVENT TYPE", y = "Total Fatalities")
plot_fatalities

plot_injuries <- ggplot(storm_injuries[1:10,], 
                          aes(x=reorder(EVTYPE, -total.injuries), 
                              y=total.injuries, fill = EVTYPE)) + 
    geom_bar(stat="identity") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          plot.title = element_text(hjust = 0.5)) + 
    ggtitle("Top 10 Events with Highest Total Injuries") +
    labs(x = "EVENT TYPE", y = "Total Injuries")
plot_injuries

Economic Impact

The top 10 event types by total economic damage (property and crop combined) are shown graphically. As shown in the figure, floods have the greatest economic consequences, at roughly 150 billion USD in combined damage.

plot_damage <- ggplot(storm_damage_total[1:10,], 
                      aes(x=reorder(EVTYPE, -TOTAL.DMG.EVTYPE), 
                          y=TOTAL.DMG.EVTYPE, fill = EVTYPE)) + 
    geom_bar(stat="identity") + 
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1),
          plot.title = element_text(hjust = 0.5)) +
    ggtitle("Top 10 Events with Highest Economic Impact") +
    labs(x="EVENT TYPE", y="Total Economic Impact ($USD)")
plot_damage