Health and Economic Impact of Weather Events in the US

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The analysis on the storm event database revealed that tornadoes are the most dangerous weather event to the population health. The second most dangerous event type is the excessive heat. The economic impact of weather events was also analyzed. Flash floods and thunderstorm winds caused billions of dollars in property damages between 1950 and 2011. The largest crop damage caused by drought, followed by flood and hails.

Data Processing

The analysis was performed on Storm Events Database, provided by National Climatic Data Center. The data is from a comma-separated-value file available here. There is also some documentation of the data available here.

library(data.table)
storm <-  fread("repdata%2Fdata%2FStormData.csv" )
## 
Read 17.6% of 967216 rows
Read 34.1% of 967216 rows
Read 49.6% of 967216 rows
Read 60.0% of 967216 rows
Read 75.5% of 967216 rows
Read 82.7% of 967216 rows
Read 92.0% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:09
storm <- as.data.frame(storm)

Then, explore the structure and the first few observations of the dataset.

head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : chr  "3" "2" "2" "2" ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

There are 37 variables, and here we select those related to our analysis.

main <- storm[ , c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
length(unique((main$EVTYPE)))
## [1] 985
# translate all letters to lowercase
event_types <- tolower(storm$EVTYPE)
# replace all punct. characters with a space
event_types <- gsub("[[:blank:][:punct:]+]", " ", event_types)
length(unique(event_types))
## [1] 874
main$EVTYPE <- event_types

No further data preprocessing was performed although the event type field can be processed further to merge event types such as tstm wind and thunderstorm wind. After the cleaning, as expected, the number of unique event types reduce significantly. For further analysis, the cleaned event types are used.

Dangerous Events with respect to Population Health

library (dplyr)
main <- group_by(main, EVTYPE)
str(main)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "tornado" "tornado" "tornado" "tornado" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  - attr(*, "vars")=List of 1
##   ..$ : symbol EVTYPE
##  - attr(*, "drop")= logi TRUE
##  - attr(*, "indices")=List of 874
##   ..$ : int 246123
##   ..$ : int 448451
##   ..$ : int 265834
##   ..$ : int 408332
##   ..$ : int 311102
##   ..$ : int  286076 354104 354971 384795
##   ..$ : int 314005
##   ..$ : int 418262
##   ..$ : int 377548
##   ..$ : int  317133 318247 322497 327547
##   ..$ : int  416340 489974
##   ..$ : int 453796
##   ..$ : int  427305 427326 434839 434856
##   ..$ : int  206502 245649 246813 291161 307730 308334
##   ..$ : int 189190
##   ..$ : int  451314 468698 488145 488630 488995 488996 489016 489018 489090 489098 ...
##   ..$ : int  647989 648793 648794 648795 649611 649791 651162 651163 651165 651174 ...
##   ..$ : int 214946
##   ..$ : int  188985 197153 198143 198144 198145 198177 198178 245869 251400 263661 ...
##   ..$ : int 231123
##   ..$ : int  253530 312546 359346 435863
##   ..$ : int 232609
##   ..$ : int  239661 239662
##   ..$ : int  217288 230923
##   ..$ : int 324074
##   ..$ : int  324077 330916 330920
##   ..$ : int  267119 278569 284176 296231 296803 297295 301079 324112 339058 345010 ...
##   ..$ : int  188773 188776 188791 188793 188794 188795 188796 188798 188808 188833 ...
##   ..$ : int  210551 233563
##   ..$ : int 210521
##   ..$ : int 209768
##   ..$ : int  192948 216768
##   ..$ : int 188918
##   ..$ : int 298057
##   ..$ : int 188775
##   ..$ : int 214965
##   ..$ : int 296256
##   ..$ : int 284188
##   ..$ : int  192273 192275 438114 443079
##   ..$ : int  192992 192994 198082 198085 198091 198186 198193 210560 221429 230915 ...
##   ..$ : int  210564 216243
##   ..$ : int 201214
##   ..$ : int 233584
##   ..$ : int 188779
##   ..$ : int  248508 248510 435511
##   ..$ : int 246723
##   ..$ : int 330956
##   ..$ : int 324060
##   ..$ : int  188817 188854 189046 189075 192337 192651 192656 194071 194081 194083 ...
##   ..$ : int  194118 194357 194389 207190 207717 208320 208733 208737 216666 217311 ...
##   ..$ : int  313896 313901 330959 351804 365202
##   ..$ : int  252624 252627 260029 265425 265817 265825 266919 266922 284072 296261
##   ..$ : int  208153 220873
##   ..$ : int  217275 217323
##   ..$ : int 252630
##   ..$ : int 296867
##   ..$ : int  188093 188503 190674 194046 194048 194086 194087 194088 194415 195024 ...
##   ..$ : int  200789 201130 201820 202202
##   ..$ : int  201030 209196
##   ..$ : int 202180
##   ..$ : int  290347 298652 307783 308344 308346 308367 308605
##   ..$ : int 282922
##   ..$ : int 199737
##   ..$ : int  279967 279974
##   ..$ : int  323510 323970 330778 330780
##   ..$ : int  198726 209858 213991
##   ..$ : int  232879 232880 232990 451319
##   ..$ : int  570799 570800 587890 593048 593049 601129 606118 606332 606440 606459 ...
##   ..$ : int  323971 330781 499069 499071 506531 506532
##   ..$ : int 232612
##   ..$ : int 198197
##   ..$ : int 350344
##   ..$ : int  417607 434950
##   ..$ : int  283092 283093 297372 392224
##   ..$ : int 221548
##   ..$ : int  197132 198113 202399 204462 216220 252787 252813 252817
##   ..$ : int 213455
##   ..$ : int  187616 189608 192263 192272 192276 192277 192278 192279 192280 192282 ...
##   ..$ : int  626159 649577 677177 685298 725455 733335 780034 784173 839524 848497
##   ..$ : int  192629 269335
##   ..$ : int  192660 216979
##   ..$ : int 359910
##   ..$ : int 265183
##   ..$ : int  194360 194361 197783 197785 197790 199732 208575 208577 208610 215081 ...
##   ..$ : int  206503 208539 208546 208553 208560 208586 208589 208598 208601 208606 ...
##   ..$ : int 474064
##   ..$ : int  194402 377949 382044 382131 415946 416284 416434 450915 452927
##   ..$ : int  359360 359394 364828 364838 427330 434862
##   ..$ : int 232811
##   ..$ : int  192410 192425 192714 193165 193166 193316 193575 193666 214720 215623 ...
##   ..$ : int 192853
##   ..$ : int 192713
##   ..$ : int  193261 193672
##   ..$ : int 192458
##   ..$ : int 193583
##   ..$ : int  192696 192708 193660 193705 193999
##   ..$ : int 192697
##   ..$ : int 246836
##   ..$ : int  309727 413600 413603 435506
##   .. [list output truncated]
##  - attr(*, "group_sizes")= int  1 1 1 1 1 4 1 1 1 4 ...
##  - attr(*, "biggest_group_size")= int 288661
##  - attr(*, "labels")='data.frame':   874 obs. of  1 variable:
##   ..$ EVTYPE: chr  " " "   high surf advisory" " coastal flood" " flash flood" ...
##   ..- attr(*, "vars")=List of 1
##   .. ..$ : symbol EVTYPE
##   ..- attr(*, "drop")= logi TRUE
pophealth <- summarise(main, count = n(), totFat = sum(FATALITIES), totInj = sum(INJURIES), meanFat = mean(FATALITIES), meanInj = mean(INJURIES))

Top 10 eventtypes that leading to thelargest number of total deaths are:

totdeath  <- head(arrange(pophealth[ , c("EVTYPE", "totFat")], desc(totFat)), 10)
totdeath 
## # A tibble: 10 × 2
##            EVTYPE totFat
##             <chr>  <dbl>
## 1         tornado   5633
## 2  excessive heat   1903
## 3     flash flood    978
## 4            heat    937
## 5       lightning    816
## 6       tstm wind    504
## 7           flood    470
## 8     rip current    368
## 9       high wind    248
## 10      avalanche    224

Top 10 eventtypes that leading to the largest number of total injuries are:

totinj <- head(arrange(pophealth[ , c("EVTYPE", "totInj")], desc(totInj)), 10)
totinj
## # A tibble: 10 × 2
##               EVTYPE totInj
##                <chr>  <dbl>
## 1            tornado  91346
## 2          tstm wind   6957
## 3              flood   6789
## 4     excessive heat   6525
## 5          lightning   5230
## 6               heat   2100
## 7          ice storm   1975
## 8        flash flood   1777
## 9  thunderstorm wind   1488
## 10              hail   1361

Top 10 eventtypes that leading to the largest mean number of total deaths are:

meandeath <- head(arrange(pophealth[ , c("EVTYPE", "meanFat", "count")], desc(meanFat)), 10)
meandeath
## # A tibble: 10 × 3
##                        EVTYPE   meanFat count
##                         <chr>     <dbl> <int>
## 1  tornadoes  tstm wind  hail 25.000000     1
## 2               cold and snow 14.000000     1
## 3       tropical storm gordon  8.000000     1
## 4       record excessive heat  5.666667     3
## 5                extreme heat  4.363636    22
## 6           heat wave drought  4.000000     1
## 7              high wind seas  4.000000     1
## 8               marine mishap  3.500000     2
## 9               winter storms  3.333333     3
## 10        heavy surf and wind  3.000000     1

Top 10 eventtypes that leading to the largest mean number of total injuris are:

meaninj <- head(arrange(pophealth[ , c("EVTYPE", "meanInj", "count")], desc(meanInj)), 10)
meaninj
## # A tibble: 10 × 3
##                     EVTYPE   meanInj count
##                      <chr>     <dbl> <int>
## 1    tropical storm gordon 43.000000     1
## 2               wild fires 37.500000     4
## 3            thunderstormw 27.000000     1
## 4       high wind and seas 20.000000     1
## 5          snow high winds 18.000000     2
## 6          glaze ice storm 15.000000     1
## 7        heat wave drought 15.000000     1
## 8        hurricane typhoon 14.488636    88
## 9  winter storm high winds  7.500000     2
## 10            extreme heat  7.045455    22

Economic Effects of Weather Events

To analyze the impact of weather events on the economy, available property damage and crop damage reportings/estimates were used.

In the raw data, the property damage is represented with two fields, a number PROPDMG in dollars and the exponent PROPDMGEXP. Similarly, the crop damage is represented using two fields, CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage for each event.

unique(main$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"

We can see that there are many dirty data. We need to deal with it, transforming all the damage value to the same unit.

exp_transform <- function(e) {
    # h -> hundred, k -> thousand, m -> million, b -> billion
    if (e %in% c('h', 'H'))
        return(2)
    else if (e %in% c('k', 'K'))
        return(3)
    else if (e %in% c('m', 'M'))
        return(6)
    else if (e %in% c('b', 'B'))
        return(9)
    else if (!is.na(as.numeric(e))) # if a digit
        return(as.numeric(e))
    else if (e %in% c('', '-', '?', '+'))
        return(0)
    else {
        stop("Invalid exponent value.")
    }
}
prop_dmg_exp <- sapply(main$PROPDMGEXP, FUN=exp_transform)
main$prop_dmg <- main$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(main$CROPDMGEXP, FUN=exp_transform)
main$crop_dmg <- main$CROPDMG * (10 ** crop_dmg_exp)

To calculate the economic loss:

main <- group_by(main, EVTYPE)
eco_loss <- summarise(main, totProp = sum(prop_dmg), totCrop = sum(crop_dmg), count = n(), meanProp = mean(prop_dmg, na.rm = TRUE), meanCrop = mean(crop_dmg, na.rm = TRUE))

Top 10 eventtypes that leading to thelargest number of total property damage are:

totprop  <- head(arrange(eco_loss[ , c("EVTYPE", "totProp")], desc(totProp)), 10)
totprop 
## # A tibble: 10 × 2
##               EVTYPE      totProp
##                <chr>        <dbl>
## 1              flood 144657709807
## 2  hurricane typhoon  69305840000
## 3            tornado  56947380677
## 4        storm surge  43323536000
## 5        flash flood  16822673979
## 6               hail  15735267513
## 7          hurricane  11868319010
## 8     tropical storm   7703890550
## 9       winter storm   6688497251
## 10         high wind   5270046295

Top 10 eventtypes that leading to the largest number of total crop damage are:

totcrop <- head(arrange(eco_loss[ , c("EVTYPE", "totCrop")], desc(totCrop)), 10)
totcrop
## # A tibble: 10 × 2
##               EVTYPE     totCrop
##                <chr>       <dbl>
## 1            drought 13972566000
## 2              flood  5661968450
## 3        river flood  5029459000
## 4          ice storm  5022113500
## 5               hail  3025954473
## 6          hurricane  2741910000
## 7  hurricane typhoon  2607872800
## 8        flash flood  1421317100
## 9       extreme cold  1312973000
## 10      frost freeze  1094186000

Top 10 eventtypes that leading to the largest mean number of proper damage are:

meanpop <- head(arrange(eco_loss[ , c("EVTYPE", "meanProp", "count")], desc(meanProp)), 10)
meanpop 
## # A tibble: 10 × 3
##                        EVTYPE   meanProp count
##                         <chr>      <dbl> <int>
## 1  tornadoes  tstm wind  hail 1600000000     1
## 2   heavy rain severe weather 1250000000     2
## 3           hurricane typhoon  787566364    88
## 4              hurricane opal  352538444     9
## 5                 storm surge  165990559   261
## 6                  wild fires  156025000     4
## 7   hurricane opal high winds  100000000     1
## 8         severe thunderstorm   92720000    13
## 9                   hailstorm   80333333     3
## 10                  hurricane   68208730   174

Top 10 eventtypes that leading to the largest mean number of crop damages are:

meancrop<- head(arrange(eco_loss[ , c("EVTYPE", "meanCrop", "count")], desc(meanCrop)), 10)
meancrop
## # A tibble: 10 × 3
##                       EVTYPE  meanCrop count
##                        <chr>     <dbl> <int>
## 1          excessive wetness 142000000     1
## 2    cold and wet conditions  66000000     1
## 3            damaging freeze  37028750     8
## 4          hurricane typhoon  29634918    88
## 5                river flood  29072017   173
## 6                early frost  21000000     2
## 7             hurricane erin  19430000     7
## 8           flood rain winds  18800000     6
## 9                  hurricane  15758103   174
## 10 hurricane opal high winds  10000000     1

Results

Health impact of weather events

The following plots show the top dangerous weather events for population health in terms of total number of fatalities and injuries respectively:

library(ggplot2)
library(gridExtra)
p1 <- ggplot(totdeath, aes(x = reorder(EVTYPE,totFat), y = totFat, fill = totFat)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Total number of fatalities") +
  xlab("Event type") 
  
  
  
p2 <- ggplot(totinj, aes(x = reorder(EVTYPE,totInj), y = totInj, fill = totInj)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Event type") 
 

grid.arrange(grobs = list(p1,p2), main = "top dangerous weather events for population health ")

Tornadoes cause most number of deaths and injuries among all event types. There are more than 5,000 deaths and more than 90,000 injuries in the last 60 years in US, due to tornadoes. The other event types that are most dangerous with respect to population health are excessive heat, heat and flash floods.

Economic impact of weather events

The following plots show the top dangerous weather events for economic loss in terms of total number of property damage and crop damage respectively:

p3 <- ggplot(totprop, aes(x = reorder(EVTYPE,totProp), y = log10(totProp), fill = totProp)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Property damage in dollars (log-scale)") +
  xlab("Event type") 
  
  
p4 <- ggplot(totcrop, aes(x = reorder(EVTYPE,totCrop), y = totCrop, fill = totCrop)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Crop damage in dollars") +
  xlab("Event type") 

grid.arrange(grobs = list(p3,p4), main="Weather costs to the US economy (1950-2011)")

Property damages are given in logarithmic scale due to large range of values. The data shows that flash floods and thunderstorm winds cost the largest property damages among weather-related natural diseasters. Note that, due to untidy nature of the available data, type flood and flash flood are separate values and should be merged for more accurate data-driven conclusions.

The most severe weather event in terms of crop damage is the drought. In the last half century, the drought has caused more than 10 billion dollars damage. Other severe crop-damage-causing event types are floods and hails.