Synopsis

This paper analyzes the effects of storms and various weather events on health and the economy in communities.

To study this we have the NOAA (U.S. National Oceanic and Atmospheric Administration’s) storm database that contains data of when and where the phenomena occur , as well as the estimate of deaths, people injured and urban destruction.

We will study the types of events (variable EVTYPE) that are most harmful to the health of the population and which types of events have the most serious economic consequences

Data Processing

I load the stormData.csv.bz2 file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

library(knitr)
library(ggplot2)
library(tidyverse)
## -- Attaching packages ------------------------------------------------ tidyverse 1.3.0 --
## v tibble  3.0.0     v dplyr   1.0.1
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## v purrr   0.3.4
## -- Conflicts --------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
if(!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
  destfile = "stormData.csv.bz2", method = "curl")
}

# Loading data
data <- read.csv(bzfile("stormData.csv.bz2"), sep=",", header=T)

#Summary of data
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The fields necessary to perform the study are obtained from the data which are:

BGNDATE: Begin date

EVTYPE: Type of event ej. tornado, storm, flood,etc

FATALITIES: number of fatalities

INJURIES: number of injuries

PROPDMG: Property damage in dollars

PROPDMGEXP: K=thousands, M=millions, B=billions

CROPDMG: Crop damage in dollars

CROPDMGEXP: H=hundreds, K=thousands, M=millions, B=billions

I proceed to calculate total values according to variables PROPDMG and PROPDMGEXP in new variable PROPDMGTOTAL

data <- data %>% mutate (
  
  PROPDMGTOTAL =  case_when(
                      PROPDMGEXP=='H' ~  PROPDMG * 10^2 ,
                      PROPDMGEXP=='K' ~  PROPDMG * 10^3 ,
                      PROPDMGEXP=='M' ~  PROPDMG * 10^6 ,
                      PROPDMGEXP=='B' ~  PROPDMG * 10^9 ,
                      TRUE ~ 0
                  )  
  
  )

I proceed to calculate total values according to variables CROPDMG and CROPDMGEXP in new variable CROPDMGTOTAL and I look at the new columns created

data <- data %>% mutate (
  
  CROPDMGTOTAL =  case_when(
                      CROPDMGEXP=='H' ~  CROPDMG * 10^2 ,
                      CROPDMGEXP=='K' ~  CROPDMG * 10^3 ,
                      CROPDMGEXP=='M' ~  CROPDMG * 10^6 ,
                      CROPDMGEXP=='B' ~  CROPDMG * 10^9 ,
                      TRUE ~ 0
                  )  
  
  )
str(data)
## 'data.frame':    902297 obs. of  39 variables:
##  $ STATE__     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE    : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME    : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE   : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY      : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME  : chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE       : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE      : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI     : chr  "" "" "" "" ...
##  $ BGN_LOCATI  : chr  "" "" "" "" ...
##  $ END_DATE    : chr  "" "" "" "" ...
##  $ END_TIME    : chr  "" "" "" "" ...
##  $ COUNTY_END  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN  : logi  NA NA NA NA NA NA ...
##  $ END_RANGE   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI     : chr  "" "" "" "" ...
##  $ END_LOCATI  : chr  "" "" "" "" ...
##  $ LENGTH      : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH       : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F           : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES  : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES    : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG     : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP  : chr  "K" "K" "K" "K" ...
##  $ CROPDMG     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP  : chr  "" "" "" "" ...
##  $ WFO         : chr  "" "" "" "" ...
##  $ STATEOFFIC  : chr  "" "" "" "" ...
##  $ ZONENAMES   : chr  "" "" "" "" ...
##  $ LATITUDE    : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE   : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E  : num  3051 0 0 0 0 ...
##  $ LONGITUDE_  : num  8806 0 0 0 0 ...
##  $ REMARKS     : chr  "" "" "" "" ...
##  $ REFNUM      : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ PROPDMGTOTAL: num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ CROPDMGTOTAL: num  0 0 0 0 0 0 0 0 0 0 ...
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM PROPDMGTOTAL CROPDMGTOTAL
## 1       3051       8806              1        25000            0
## 2          0          0              2         2500            0
## 3          0          0              3        25000            0
## 4          0          0              4         2500            0
## 5          0          0              5         2500            0
## 6          0          0              6         2500            0

Results

Total Fatalities by EvType

I count total Fatalities by event type and graph on a histogram

totalFatalities <- aggregate(data$FATALITIES, by = list(data$EVTYPE), "sum")
names(totalFatalities) <- c("Event", "Fatalities")
totalFatalitiesSorted <- totalFatalities[order(-totalFatalities$Fatalities), ][1:20, ]
totalFatalitiesSorted
##                       Event Fatalities
## 834                 TORNADO       5633
## 130          EXCESSIVE HEAT       1903
## 153             FLASH FLOOD        978
## 275                    HEAT        937
## 464               LIGHTNING        816
## 856               TSTM WIND        504
## 170                   FLOOD        470
## 585             RIP CURRENT        368
## 359               HIGH WIND        248
## 19                AVALANCHE        224
## 972            WINTER STORM        206
## 586            RIP CURRENTS        204
## 278               HEAT WAVE        172
## 140            EXTREME COLD        160
## 760       THUNDERSTORM WIND        133
## 310              HEAVY SNOW        127
## 141 EXTREME COLD/WIND CHILL        125
## 676             STRONG WIND        103
## 30                 BLIZZARD        101
## 350               HIGH SURF        101

Total Injuries by EvType

I count total Injuries by event type and graph on a histogram

totalInjuries <- aggregate(data$INJURIES, by = list(data$EVTYPE), "sum")
names(totalInjuries) <- c("Event", "Injuries")
totalInjuriesSorted <- totalInjuries[order(-totalInjuries$Injuries), ][1:20, ]
totalInjuriesSorted
##                  Event Injuries
## 834            TORNADO    91346
## 856          TSTM WIND     6957
## 170              FLOOD     6789
## 130     EXCESSIVE HEAT     6525
## 464          LIGHTNING     5230
## 275               HEAT     2100
## 427          ICE STORM     1975
## 153        FLASH FLOOD     1777
## 760  THUNDERSTORM WIND     1488
## 244               HAIL     1361
## 972       WINTER STORM     1321
## 411  HURRICANE/TYPHOON     1275
## 359          HIGH WIND     1137
## 310         HEAVY SNOW     1021
## 957           WILDFIRE      911
## 786 THUNDERSTORM WINDS      908
## 30            BLIZZARD      805
## 188                FOG      734
## 955   WILD/FOREST FIRE      545
## 117         DUST STORM      440
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(totalFatalitiesSorted$Fatalities, names.arg = totalFatalitiesSorted$Event, col = 'blue',
        main = 'Top 20 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(totalInjuriesSorted$Injuries, names.arg = totalInjuriesSorted$Event, col = 'orange',
        main = 'Top 20 Weather Events for Injuries', ylab = 'Number of Injuries')

It is observed tornadoes, excess heat, floods , avalanches, cause more deaths and injuries, there are more than 5.000 deaths and more han 8.000 injuries in the last 60 years in US, due to tornadoes.

Economic impact of weather events

I calculate the cost of property and crop damage separately using the variables created PROPDMGTOTAL and CROPDMGTOTAL

totalProperty <- aggregate(data$PROPDMGTOTAL/1000000, by = list(data$EVTYPE), "sum")
names(totalProperty) <- c("Event", "Property")
totalPropertySorted <- totalProperty[order(-totalProperty$Property), ][1:20, ]
totalPropertySorted 
##                         Event   Property
## 170                     FLOOD 144657.710
## 411         HURRICANE/TYPHOON  69305.840
## 834                   TORNADO  56925.660
## 670               STORM SURGE  43323.536
## 153               FLASH FLOOD  16140.812
## 244                      HAIL  15727.367
## 402                 HURRICANE  11868.319
## 848            TROPICAL STORM   7703.891
## 972              WINTER STORM   6688.497
## 359                 HIGH WIND   5270.046
## 590               RIVER FLOOD   5118.945
## 957                  WILDFIRE   4765.114
## 671          STORM SURGE/TIDE   4641.188
## 856                 TSTM WIND   4484.928
## 427                 ICE STORM   3944.928
## 760         THUNDERSTORM WIND   3483.121
## 409            HURRICANE OPAL   3152.846
## 955          WILD/FOREST FIRE   3001.829
## 298 HEAVY RAIN/SEVERE WEATHER   2500.000
## 786        THUNDERSTORM WINDS   1733.455
totalCrop <- aggregate(data$CROPDMGTOTAL/1000000, by = list(data$EVTYPE), "sum")
names(totalCrop) <- c("Event", "Crop")
totalCropSorted <- totalCrop[order(-totalCrop$Crop), ][1:20, ]
totalCropSorted 
##                 Event       Crop
## 95            DROUGHT 13972.5660
## 170             FLOOD  5661.9685
## 590       RIVER FLOOD  5029.4590
## 427         ICE STORM  5022.1135
## 244              HAIL  3025.5374
## 402         HURRICANE  2741.9100
## 411 HURRICANE/TYPHOON  2607.8728
## 153       FLASH FLOOD  1421.3171
## 140      EXTREME COLD  1292.9730
## 212      FROST/FREEZE  1094.0860
## 290        HEAVY RAIN   733.3998
## 848    TROPICAL STORM   678.3460
## 359         HIGH WIND   638.5713
## 856         TSTM WIND   554.0073
## 130    EXCESSIVE HEAT   492.4020
## 192            FREEZE   446.2250
## 834           TORNADO   414.9531
## 760 THUNDERSTORM WIND   414.8431
## 275              HEAT   401.4615
## 957          WILDFIRE   295.4728

We visualize property damage costs and crop damage costs in the following graphs

par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(totalPropertySorted$Property, names.arg = totalPropertySorted$Event , col = 'Blue',
        main = 'Property Damage  in Million US dollars from 1995 - 2011 ', ylab = 'Cost of Property Damage', ylim = c(0, 150000))
barplot(totalCropSorted$Crop, names.arg = totalCropSorted$Event, col = 'Orange',
        main = 'Crop Damage in million US dollars', ylab = 'Cost of  Crop Damage', ylim = c(0, 20000))

Based on histograms we observed that floods, typhoons and hurricanes and tornadoes cause higher costs for property damage above DOL 56 billion, and we observed that floods, overflowing rivers, and seguias extremely affect crops over valores 5 billion