Synopsis

This report analyzes the impact of severe weather events on public health and the economy in the United States from 1994 to 2011. The analysis is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the U.S. from 1950 to 2011, including when and where they occur, as well as estimates of any fatalities, injuries and property damage. To measure the impact on public health we use the estimates of fatalities and injuries; to measure the impact on the economy we use the estimates of property and crop damage. We restrict our attention to the period from 1994 to 2011, because the more recent years have the most data available. We find that excessive heat and tornadoes are the most harmful events with respect to population health: tornadoes caused the most injuries (more than 22,000), and excessive heat caused the most fatalities (1,903). With respect to the U.S. economy, floods, droughts and hurricanes/typhoons have the greatest consequences: floods caused the greatest property damage (more than 144 billion USD), while drought was the main cause of crop damage (more than 13 billion USD).

Basic settings

echo = TRUE  

library("R.utils")
library(dplyr)
library(ggplot2)
library(gridExtra)  # provides grid.arrange()

Session Info

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=Italian_Italy.1252      LC_CTYPE=Italian_Italy.1252       
## [3] LC_MONETARY=Italian_Italy.1252     LC_NUMERIC=C                      
## [5] LC_TIME=English_United States.1252
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] gridExtra_0.9.1   ggplot2_1.0.0     dplyr_0.3.0.2     R.utils_2.0.0    
## [5] R.oo_1.19.0       R.methodsS3_1.7.0
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   colorspace_1.2-4 DBI_0.3.1        digest_0.6.4    
##  [5] evaluate_0.5.5   formatR_1.0      gtable_0.1.2     htmltools_0.2.6 
##  [9] knitr_1.8        magrittr_1.0.1   MASS_7.3-33      munsell_0.4.2   
## [13] parallel_3.1.1   plyr_1.8.1       proto_0.3-10     Rcpp_0.11.3     
## [17] reshape2_1.4     rmarkdown_0.5.1  scales_0.2.4     stringr_0.6.2   
## [21] tools_3.1.1      yaml_2.1.13

Data Processing

Read data:

data <- read.csv("repdata-data-StormData.csv", header = TRUE)

Look at data:

str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

To reduce dataset size, we just keep columns of interest:

storm_data <- select(data,STATE,BGN_DATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

Check for missing values:

sum(is.na(storm_data))
## [1] 0

Extract variable “year” from date format:

storm_data <- mutate(storm_data, year = as.numeric(format(as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y")))
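
As a quick check of the date parsing, here is a minimal sketch that applies the same format string to two BGN_DATE values taken from the head(storm_data) output in this report. The vector names bgn and yr are only illustrative.

```r
# Two BGN_DATE strings as they appear in the data
bgn <- c("2/9/1994 0:00:00", "10/4/1995 0:00:00")

# Parse month/day/year (the time part is read and then dropped by as.Date),
# then format the date back to a four-digit year and convert to numeric
yr <- as.numeric(format(as.Date(bgn, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
yr
```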

Have a look at sample size by year:

hist(storm_data$year, breaks = 60)

Keep only the more recent years (1994 onwards), which should be more complete:

storm_data <- filter(storm_data, year >= 1994)

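PROPDMGEXP and CROPDMGEXP hold the magnitude of the damage figures as a letter code (H = Hundred, K = Thousand, M = Million and B = Billion, as indicated in the Storm Events CodeBook). We recode them into numeric powers of ten (2, 3, 6 and 9). Digit codes 0 to 8 are used directly as exponents, while the empty string and the symbols "+", "-" and "?" are mapped to exponent 0, i.e. a multiplier of 1.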

levels(storm_data$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(storm_data$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
unit <- c("", "+", "-", "?", 0:8, "h", "H", "k", "K", "m", "M", "B")
multiplier <- c(rep(0,4), 0:8, 2, 2, 3, 3, 6, 6, 9)
mult.df <- data.frame(unit, multiplier)

storm_data$PROPDMGEXP <- mult.df[match(storm_data$PROPDMGEXP, mult.df$unit),2]
storm_data$CROPDMGEXP <- mult.df[match(storm_data$CROPDMGEXP, mult.df$unit),2]

To get the economic damage in dollars, we multiply each property and crop damage figure by 10 raised to its recoded exponent:

storm_data <- mutate(storm_data, PROPERTY_DAMAGE = PROPDMG * 10 ^ PROPDMGEXP, CROP_DAMAGE = CROPDMG * 10 ^ CROPDMGEXP)
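
To make the recoding and the multiplication concrete, here is a small self-contained sketch of the same lookup on a few values. The lookup table is the one defined above; the input codes and amounts are hypothetical, chosen to reproduce exponents that appear in the str() output (9, 3, 6 and 0).

```r
# Same unit-to-exponent table as in the recoding step
unit <- c("", "+", "-", "?", 0:8, "h", "H", "k", "K", "m", "M", "B")
exponent <- c(rep(0, 4), 0:8, 2, 2, 3, 3, 6, 6, 9)
lookup <- data.frame(unit, exponent, stringsAsFactors = FALSE)

# Hypothetical sample records: magnitude code and raw amount
code <- c("B", "K", "M", "")
amount <- c(0.1, 50, 5, 0)

# match() finds the table row for each code; the exponent is read from that row
exp10 <- lookup$exponent[match(code, lookup$unit)]

# Damage in dollars = amount * 10^exponent
damage <- amount * 10^exp10
damage
```

For instance, an amount of 0.1 with code "B" becomes 0.1 * 10^9 = 1e+08 dollars, which matches the PROPERTY_DAMAGE value shown for the Hurricane Opal record.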

After data processing, let’s look at the dataset:

str(storm_data)
## 'data.frame':    702131 obs. of  12 variables:
##  $ STATE          : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ BGN_DATE       : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 763 399 4735 4648 3932 1805 6403 10570 10570 10570 ...
##  $ EVTYPE         : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 201 629 429 657 657 410 786 786 834 244 ...
##  $ FATALITIES     : num  0 0 0 0 0 2 0 0 0 0 ...
##  $ INJURIES       : num  0 0 2 0 0 0 0 0 0 0 ...
##  $ PROPDMG        : num  0 0 0 0 0 0.1 50 5 500 0 ...
##  $ PROPDMGEXP     : num  0 0 0 0 0 9 3 6 3 0 ...
##  $ CROPDMG        : num  0 0 0 0 0 10 0 500 0 0 ...
##  $ CROPDMGEXP     : num  0 0 0 0 0 6 0 3 0 0 ...
##  $ year           : num  1995 1995 1994 1995 1995 ...
##  $ PROPERTY_DAMAGE: num  0e+00 0e+00 0e+00 0e+00 0e+00 1e+08 5e+04 5e+06 5e+05 0e+00 ...
##  $ CROP_DAMAGE    : num  0e+00 0e+00 0e+00 0e+00 0e+00 1e+07 0e+00 5e+05 0e+00 0e+00 ...
head(storm_data)
##   STATE          BGN_DATE                    EVTYPE FATALITIES INJURIES
## 1    AL  1/6/1995 0:00:00             FREEZING RAIN          0        0
## 2    AL 1/22/1995 0:00:00                      SNOW          0        0
## 3    AL  2/9/1994 0:00:00     ICE STORM/FLASH FLOOD          0        2
## 4    AL  2/6/1995 0:00:00                  SNOW/ICE          0        0
## 5    AL 2/11/1995 0:00:00                  SNOW/ICE          0        0
## 6    AL 10/4/1995 0:00:00 HURRICANE OPAL/HIGH WINDS          2        0
##   PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year PROPERTY_DAMAGE CROP_DAMAGE
## 1     0.0          0       0          0 1995           0e+00       0e+00
## 2     0.0          0       0          0 1995           0e+00       0e+00
## 3     0.0          0       0          0 1994           0e+00       0e+00
## 4     0.0          0       0          0 1995           0e+00       0e+00
## 5     0.0          0       0          0 1995           0e+00       0e+00
## 6     0.1          9      10          6 1995           1e+08       1e+07

Impact on Public Health

The first part of this project asks which weather events are most harmful to population health. We therefore sum the fatalities by weather event type and rank the totals to obtain the 15 most severe event types.

# Total fatalities per event type, keeping the 15 highest-ranked types
fatalities_ranking <-
    storm_data %>%
    group_by(EVTYPE) %>%
    select(FATALITIES) %>%
    summarise(FATALITIES = sum(FATALITIES)) %>%
    arrange(desc(FATALITIES)) %>%
    mutate(rank = dense_rank(desc(FATALITIES))) %>%
    filter(rank <= 15) %>%
    # Fix the factor level order so that plots keep the ranking order
    mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))

Then, we do the same for the number of injuries:

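# Total injuries per event type, keeping the 15 highest-ranked types
injuries_ranking <-
    storm_data %>%
    group_by(EVTYPE) %>%
    select(INJURIES) %>%
    summarise(INJURIES = sum(INJURIES)) %>%
    arrange(desc(INJURIES)) %>%
    mutate(rank = dense_rank(desc(INJURIES))) %>%
    filter(rank <= 15) %>%
    # Fix the factor level order so that plots keep the ranking order
    mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))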
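
A note on the ranking step: dense_rank() gives tied totals the same rank without skipping the next one, so filter(rank <= 15) can return more than 15 rows if ties occur. The sketch below illustrates this on made-up totals in base R, where match() against the sorted unique values stands in for dplyr’s dense_rank(desc(x)).

```r
# Made-up totals with a tie between A and B
totals <- c(A = 10, B = 10, C = 5, D = 1)

# Dense rank in descending order: tied values share a rank, no ranks skipped
dense <- match(totals, sort(unique(totals), decreasing = TRUE))
names(dense) <- names(totals)
dense

# Keeping rank <= 2 returns three entries (A, B and C), not two
top2 <- totals[dense <= 2]
top2
```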

Impact on Economy

The second part of this project asks which weather events have the greatest economic consequences. As in the previous section, we aggregate property and crop damage by weather event type and rank the totals to obtain the 15 event types with the most severe consequences for the U.S. economy.

# Total property damage (USD) per event type, keeping the 15 highest-ranked types
property_damage_ranking <-
    storm_data %>%
    group_by(EVTYPE) %>%
    select(PROPERTY_DAMAGE) %>%
    summarise(PROPERTY_DAMAGE = sum(PROPERTY_DAMAGE)) %>%
    arrange(desc(PROPERTY_DAMAGE)) %>%
    mutate(rank = dense_rank(desc(PROPERTY_DAMAGE))) %>%
    filter(rank <= 15) %>%
    # Fix the factor level order so that plots keep the ranking order
    mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))

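# Total crop damage (USD) per event type, keeping the 15 highest-ranked types
crop_damage_ranking <-
    storm_data %>%
    group_by(EVTYPE) %>%
    select(CROP_DAMAGE) %>%
    summarise(CROP_DAMAGE = sum(CROP_DAMAGE)) %>%
    arrange(desc(CROP_DAMAGE)) %>%
    mutate(rank = dense_rank(desc(CROP_DAMAGE))) %>%
    filter(rank <= 15) %>%
    # Fix the factor level order so that plots keep the ranking order
    mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))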

Results

Let’s print the two lists of the 15 most harmful event types with respect to population health:

fatalities_ranking
## Source: local data frame [15 x 3]
## 
##               EVTYPE FATALITIES rank
## 1     EXCESSIVE HEAT       1903    1
## 2            TORNADO       1593    2
## 3        FLASH FLOOD        951    3
## 4               HEAT        930    4
## 5          LIGHTNING        794    5
## 6              FLOOD        450    6
## 7        RIP CURRENT        368    7
## 8          HIGH WIND        242    8
## 9          TSTM WIND        241    9
## 10         AVALANCHE        224   10
## 11      RIP CURRENTS        204   11
## 12      WINTER STORM        195   12
## 13         HEAT WAVE        172   13
## 14      EXTREME COLD        150   14
## 15 THUNDERSTORM WIND        133   15
injuries_ranking
## Source: local data frame [15 x 3]
## 
##               EVTYPE INJURIES rank
## 1            TORNADO    22571    1
## 2              FLOOD     6778    2
## 3     EXCESSIVE HEAT     6525    3
## 4          LIGHTNING     5116    4
## 5          TSTM WIND     3631    5
## 6               HEAT     2095    6
## 7          ICE STORM     1971    7
## 8        FLASH FLOOD     1754    8
## 9  THUNDERSTORM WIND     1476    9
## 10      WINTER STORM     1298   10
## 11 HURRICANE/TYPHOON     1275   11
## 12         HIGH WIND     1099   12
## 13        HEAVY SNOW      980   13
## 14              HAIL      943   14
## 15          WILDFIRE      911   15

Let’s also summarize this information in a plot:

# The totals are already computed, so draw them directly with stat = "identity"
fatalities_plot <-
    ggplot(fatalities_ranking, aes(x = EVTYPE, y = FATALITIES)) +
      geom_bar(stat = "identity", colour = "white", fill = "black") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      xlab("Severe Weather Events") +
      ylab("Number of Fatalities") +
      ggtitle("Number of Fatalities\n by Top 15 Severe Weather\n Events in the U.S.\n from 1994 - 2011")

injuries_plot <-
    ggplot(injuries_ranking, aes(x = EVTYPE, y = INJURIES)) +
      geom_bar(stat = "identity", colour = "darkgreen", fill = "white") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      xlab("Severe Weather Events") +
      ylab("Number of Injuries") +
      ggtitle("Number of Injuries\n by Top 15 Severe Weather\n Events in the U.S.\n from 1994 - 2011")

grid.arrange(fatalities_plot, injuries_plot, ncol = 2)

From the bar charts above, Tornado and Flood turn out to be the two most severe event types in terms of injuries, with 22,571 and 6,778 injuries respectively. Excessive heat and Tornado caused the greatest number of fatalities from 1994 to 2011, with 1,903 and 1,593 deaths respectively.
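
Several labels in the rankings above look like spelling variants of the same event (for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND), and the analysis counts them separately. The following is a minimal sketch of how such labels could be merged before ranking, using fatality totals copied from the fatalities_ranking output. The recoding rules are illustrative assumptions and are not part of the analysis in this report.

```r
# Fatality totals copied from the fatalities_ranking output
fatalities <- c("RIP CURRENT" = 368, "RIP CURRENTS" = 204,
                "TSTM WIND" = 241, "THUNDERSTORM WIND" = 133)

# Hypothetical recoding rules: map each variant to a single label
label <- names(fatalities)
label <- sub("^TSTM WIND$", "THUNDERSTORM WIND", label)
label <- sub("^RIP CURRENTS$", "RIP CURRENT", label)

# Sum the totals that now share a label
merged <- tapply(fatalities, label, sum)
merged
```

With these rules the merged totals are 572 for RIP CURRENT and 374 for THUNDERSTORM WIND.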

Finally, let’s look at the 15 most significant storm events in terms of economic damages:

property_damage_ranking
## Source: local data frame [15 x 3]
## 
##               EVTYPE PROPERTY_DAMAGE rank
## 1              FLOOD    144179608807    1
## 2  HURRICANE/TYPHOON     69305840000    2
## 3        STORM SURGE     43193536000    3
## 4            TORNADO     25630588401    4
## 5        FLASH FLOOD     16398255929    5
## 6               HAIL     15338044461    6
## 7          HURRICANE     11862819010    7
## 8     TROPICAL STORM      7703385550    8
## 9          HIGH WIND      5266939295    9
## 10          WILDFIRE      4765114000   10
## 11  STORM SURGE/TIDE      4641188000   11
## 12         TSTM WIND      4484273495   12
## 13         ICE STORM      3832377860   13
## 14 THUNDERSTORM WIND      3480404972   14
## 15    HURRICANE OPAL      3172846000   15
crop_damage_ranking
## Source: local data frame [15 x 3]
## 
##               EVTYPE CROP_DAMAGE rank
## 1            DROUGHT 13922066000    1
## 2              FLOOD  5506942450    2
## 3          ICE STORM  5022113500    3
## 4               HAIL  2982699123    4
## 5          HURRICANE  2741410000    5
## 6  HURRICANE/TYPHOON  2607872800    6
## 7        FLASH FLOOD  1402661500    7
## 8       EXTREME COLD  1292973000    8
## 9       FROST/FREEZE  1094086000    9
## 10        HEAVY RAIN   733399800   10
## 11    TROPICAL STORM   677841000   11
## 12         HIGH WIND   633566300   12
## 13         TSTM WIND   553997350   13
## 14    EXCESSIVE HEAT   492402000   14
## 15 THUNDERSTORM WIND   414833050   15

and again, let’s plot the results:

# The totals are already computed, so draw them directly with stat = "identity"
property_damage_plot <-
    ggplot(property_damage_ranking, aes(x = EVTYPE, y = PROPERTY_DAMAGE / 10^6)) +
      geom_bar(stat = "identity", colour = "white", fill = "darkgrey") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      xlab("Severe Weather Events") +
      ylab("Property Damage [Million $]") +
      ggtitle("Million $ Property Damage\n by Top 15 Severe Weather\n Events in the U.S.\n from 1994 - 2011")

crop_damage_plot <-
    ggplot(crop_damage_ranking, aes(x = EVTYPE, y = CROP_DAMAGE / 10^6)) +
      geom_bar(stat = "identity", colour = "white", fill = "brown") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      xlab("Severe Weather Events") +
      ylab("Crop Damage [Million $]") +
      ggtitle("Million $ Crop Damage\n by Top 15 Severe Weather\n Events in the U.S.\n from 1994 - 2011")

grid.arrange(property_damage_plot, crop_damage_plot, ncol = 2)

In terms of property damage, Flood and Hurricane/Typhoon were the most severe event types, with more than 144 and 69 billion USD respectively. Drought and Flood were the top two causes of crop damage, with more than 13 and 5 billion USD respectively.
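
The figures in billions of USD quoted here follow directly from the dollar totals in the two tables. A short sketch of the conversion, with the totals copied from the printed rankings (the vector names property and crop are only illustrative):

```r
# Dollar totals copied from property_damage_ranking and crop_damage_ranking
property <- c("FLOOD" = 144179608807, "HURRICANE/TYPHOON" = 69305840000)
crop <- c("DROUGHT" = 13922066000, "FLOOD" = 5506942450)

# Convert to billions of USD, rounded to one decimal place
property_bn <- round(property / 1e9, 1)
crop_bn <- round(crop / 1e9, 1)
property_bn
crop_bn
```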

Conclusion

Our finding is that, across the United States from 1994 to 2011, excessive heat and tornadoes had the greatest impact on population health, while floods, hurricanes/typhoons and droughts had the greatest economic consequences.