The Impact that Severe Weather Conditions have on Public Health and Economics in the U.S.

Synopsis

This report analyses the effect that severe weather conditions can have on public health and the economy in the U.S. The analysis is based on historical data collected from 1950 to November 2011, taken from the Storm Data database. The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries and property damage. From this data, we found that tornadoes cause the most harm to public health, whereas floods cause the most property damage and droughts cause the most crop damage.

Data Processing

The data used for this analysis comes from the Storm Events Database and is provided as a comma-separated-value (CSV) file compressed with bzip2. The documentation for the database is available from the National Weather Service.

Loading the Data and Required Libraries

Load the required libraries

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gridExtra)

Load the data (i.e. read.csv())

storm_data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, stringsAsFactors = FALSE)

Initial Analysis of the Raw Data

The dim command returns that the data contains 902297 observations and 37 variables.

dim(storm_data)
## [1] 902297     37

The str command shows the structure of the dataset.

str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The head command shows the first 6 rows of the dataset.

head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

List all the variables within the data using the names command to determine which columns will be needed to answer the questions.

names(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Processing the Data

The dataset contains 37 variables, but this analysis requires only 8 of them.

We will start by creating a dataset with the 8 required variables. They are:

  • STATE
  • EVTYPE
  • FATALITIES
  • INJURIES
  • PROPDMG
  • PROPDMGEXP
  • CROPDMG
  • CROPDMGEXP
data <- storm_data[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Extracting fatalities and injuries

deaths <- aggregate(FATALITIES ~ EVTYPE, data=data, FUN=sum)

injuries <- aggregate(INJURIES ~ EVTYPE, data=data, FUN=sum)
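
The aggregation step above sums one column per event type. A minimal sketch on made-up toy data (the data frame `toy` and its values are illustrative assumptions, not taken from the storm database) shows the shape of the result:

```r
# Toy data with invented values, only to illustrate the aggregation step.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)

# One row per event type, with FATALITIES summed within each type.
toy_deaths <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
toy_deaths
```

The result has one row per distinct EVTYPE, which is the form the ranking steps later in the report expect.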

Preparing property and crop data

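PROPDMG and CROPDMG hold only the leading figure of each damage estimate; the magnitude is encoded separately in PROPDMGEXP and CROPDMGEXP (for example "K" for thousands, "M" for millions, "B" for billions). We convert both to dollar values on a common scale by mapping each exponent code to a multiplier.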

unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
data$PROPDMGMUL[data$PROPDMGEXP == "b" | data$PROPDMGEXP == "B"] <- 1e+09
data$PROPDMGMUL[data$PROPDMGEXP == "m" | data$PROPDMGEXP == "M"] <- 1e+06
data$PROPDMGMUL[data$PROPDMGEXP == "k" | data$PROPDMGEXP == "K"] <- 1e+03
data$PROPDMGMUL[data$PROPDMGEXP == "h" | data$PROPDMGEXP == "H"] <- 100

data$PROPDMGMUL[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPDMGMUL[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPDMGMUL[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPDMGMUL[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPDMGMUL[data$PROPDMGEXP == "4"] <- 1e+04
data$PROPDMGMUL[data$PROPDMGEXP == "3"] <- 1e+03
data$PROPDMGMUL[data$PROPDMGEXP == "2"] <- 100
data$PROPDMGMUL[data$PROPDMGEXP == "1"] <- 10
data$PROPDMGMUL[data$PROPDMGEXP == "0"] <- 1
data$PROPDMGMUL[data$PROPDMGEXP == "-"] <- 1

data$PROPDMGMUL[data$PROPDMGEXP == "+"] <- 0
data$PROPDMGMUL[data$PROPDMGEXP == "?"] <- 0

data$PROPDMGVAL <- data$PROPDMG * data$PROPDMGMUL
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
data$CROPDMGMUL[data$CROPDMGEXP == "2"] <- 100
data$CROPDMGMUL[data$CROPDMGEXP == "0"] <- 1
data$CROPDMGMUL[data$CROPDMGEXP == "?"] <- 0
data$CROPDMGMUL[data$CROPDMGEXP == "b" | data$CROPDMGEXP == "B"] <- 1e+09
data$CROPDMGMUL[data$CROPDMGEXP == "m" | data$CROPDMGEXP == "M"] <- 1e+06
data$CROPDMGMUL[data$CROPDMGEXP == "k" | data$CROPDMGEXP == "K"] <- 1e+03

data$CROPDMGVAL <- data$CROPDMG * data$CROPDMGMUL
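
The exponent-to-multiplier mapping used above could also be written as a single lookup table, which keeps the property and crop conversions consistent. The following is only a sketch: the helper name `exp_to_multiplier` is hypothetical and not part of the original analysis, and the treatment of "+", "-" and "?" simply mirrors the choices made above.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to a numeric multiplier through a named lookup vector.
exp_to_multiplier <- function(exp_code) {
  lookup <- c(
    "B" = 1e9, "M" = 1e6, "K" = 1e3, "H" = 1e2,
    "8" = 1e8, "7" = 1e7, "6" = 1e6, "5" = 1e5, "4" = 1e4,
    "3" = 1e3, "2" = 1e2, "1" = 10,  "0" = 1,   "-" = 1,
    "+" = 0,   "?" = 0
  )
  # toupper() folds "k", "m", "b", "h" onto their upper-case keys;
  # codes missing from the table (including "") come back as NA.
  unname(lookup[toupper(exp_code)])
}

# Small illustrative input covering letters, digits, blank and "?".
codes <- c("K", "m", "B", "5", "", "?")
exp_to_multiplier(codes)
```

As in the code above, rows with a blank exponent keep an NA multiplier. The formula interface of aggregate() drops rows with NA values by default (na.action = na.omit), so those rows do not contribute to the totals.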

Extracting Property and Crop Damage Values from the dataset

property_damage <- aggregate(PROPDMGVAL ~ EVTYPE, data=data, FUN=sum)

crop_damage <- aggregate(CROPDMGVAL ~ EVTYPE, data=data, FUN=sum)

Results

Fatalities Caused by Weather Events

The following shows the top 20 severe weather events that have caused the most fatalities.

deaths_20 <- arrange(deaths, desc(FATALITIES))[1:20,]

deaths_20
##                     EVTYPE FATALITIES
## 1                  TORNADO       5633
## 2           EXCESSIVE HEAT       1903
## 3              FLASH FLOOD        978
## 4                     HEAT        937
## 5                LIGHTNING        816
## 6                TSTM WIND        504
## 7                    FLOOD        470
## 8              RIP CURRENT        368
## 9                HIGH WIND        248
## 10               AVALANCHE        224
## 11            WINTER STORM        206
## 12            RIP CURRENTS        204
## 13               HEAT WAVE        172
## 14            EXTREME COLD        160
## 15       THUNDERSTORM WIND        133
## 16              HEAVY SNOW        127
## 17 EXTREME COLD/WIND CHILL        125
## 18             STRONG WIND        103
## 19                BLIZZARD        101
## 20               HIGH SURF        101
g1 = ggplot(data=deaths_20, aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES, fill=FATALITIES)) +
  geom_bar(stat="identity", fill = heat.colors(20), color = "black") +
  labs(x="", y="Number of Fatalities") +
  labs(title="Top 20 fatalities caused by Weather Events in the U.S. (1950 - 2011)") +
  scale_y_continuous(breaks = seq(0,6000, by=1000)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(g1)

The plot shows the total number of fatalities for the top 20 weather events, in descending order of fatalities. Based on the plot above, we find that tornadoes caused the most fatalities.
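
The top-20 selection used for this ranking can be sketched as a small base R helper. The name `top_events` and the toy data are assumptions made for illustration; the report itself uses dplyr's arrange() followed by row indexing.

```r
# Hypothetical helper: return the n rows with the largest values in
# `value_col`, sorted in descending order.
top_events <- function(totals, value_col, n = 20) {
  ranked <- totals[order(totals[[value_col]], decreasing = TRUE), , drop = FALSE]
  # head() returns at most n rows, so short inputs are not padded.
  head(ranked, n)
}

# Toy totals with invented values.
toy_totals <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  FATALITIES = c(5, 20, 1),
  stringsAsFactors = FALSE
)
top_events(toy_totals, "FATALITIES", n = 2)
```

One design difference: indexing with [1:20, ] pads the result with NA rows when fewer than 20 rows exist, whereas head() simply returns the rows that are available.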

Injuries Caused by Weather Events

The following shows the top 20 severe weather events that have caused the most injuries.

injuries_20 <- arrange(injuries, desc(INJURIES))[1:20,]

injuries_20
##                EVTYPE INJURIES
## 1             TORNADO    91346
## 2           TSTM WIND     6957
## 3               FLOOD     6789
## 4      EXCESSIVE HEAT     6525
## 5           LIGHTNING     5230
## 6                HEAT     2100
## 7           ICE STORM     1975
## 8         FLASH FLOOD     1777
## 9   THUNDERSTORM WIND     1488
## 10               HAIL     1361
## 11       WINTER STORM     1321
## 12  HURRICANE/TYPHOON     1275
## 13          HIGH WIND     1137
## 14         HEAVY SNOW     1021
## 15           WILDFIRE      911
## 16 THUNDERSTORM WINDS      908
## 17           BLIZZARD      805
## 18                FOG      734
## 19   WILD/FOREST FIRE      545
## 20         DUST STORM      440
g2 = ggplot(data=injuries_20, aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES, fill=INJURIES)) +
  geom_bar(stat="identity", fill = topo.colors(20), color = "black") +
  labs(x="", y="Number of Injuries") +
  labs(title="Top 20 injuries caused by Weather Events in the U.S. (1950 - 2011)") +
  scale_y_continuous(breaks = seq(0,90000, by=10000)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  

print(g2)

The plot shows the total number of injuries for the top 20 weather events, in descending order of injuries. Based on the plot above, we find that tornadoes caused the most injuries.

Economic Damage Caused by Weather Events

The following shows the top 10 severe weather events that have caused the most damage to property in USD.

property_10 <- arrange(property_damage, desc(PROPDMGVAL))[1:10,]

property_10
##               EVTYPE   PROPDMGVAL
## 1              FLOOD 144657709800
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56935880614
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16822673772
## 6               HAIL  15730367456
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497251
## 10         HIGH WIND   5270046275

The following shows the top 10 severe weather events that have caused the most damage to crops in USD.

crop_10 <- arrange(crop_damage, desc(CROPDMGVAL))[1:10,]

crop_10
##               EVTYPE  CROPDMGVAL
## 1            DROUGHT 13972566000
## 2              FLOOD  5661968450
## 3        RIVER FLOOD  5029459000
## 4          ICE STORM  5022113500
## 5               HAIL  3025954470
## 6          HURRICANE  2741910000
## 7  HURRICANE/TYPHOON  2607872800
## 8        FLASH FLOOD  1421317100
## 9       EXTREME COLD  1292973000
## 10      FROST/FREEZE  1094086000

Plotting the top 10 events that caused the most damage to property and crops in the U.S. from 1950 to 2011.

p1 <- ggplot(data=property_10, aes(x=reorder(EVTYPE, -PROPDMGVAL), y=PROPDMGVAL / 1e+09, fill=PROPDMGVAL)) +
  geom_bar(stat="identity", fill = topo.colors(10), color = "black") +
  labs(x="", y="Cost of Property Damage ($ Billions)") +
  labs(title="Top 10 Weather Events Causing Highest Property Damage") +
  scale_y_continuous(breaks = seq(0,150, by=10)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size=8))


p2 <- ggplot(data=crop_10, aes(x=reorder(EVTYPE, -CROPDMGVAL), y=CROPDMGVAL / 1e+09, fill=CROPDMGVAL)) +
  geom_bar(stat="identity", fill = terrain.colors(10), color = "black") +
  labs(x="", y="Cost of Crop Damage ($ Billions)") +
  labs(title="Top 10 Weather Events Causing Highest Crop Damage") +
  scale_y_continuous(breaks = seq(0,15, by=2)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(size=8))  

grid.arrange(p1, p2, ncol=2)

The plots show the total damage, in billions of dollars, caused to property and crops. Each plot shows the top 10 weather events in descending order of damage. Based on the plots above, we find that floods caused the most property damage and droughts caused the most crop damage.

Conclusions

Between 1950 and 2011, tornadoes caused more deaths and injuries than any other type of weather event: more than 5,600 deaths and more than 91,000 injuries.

In terms of property damage, floods were the most severe weather event, causing nearly $145 billion worth of damage, followed by hurricanes/typhoons and tornadoes.

With regard to crop damage, drought caused the most damage, nearly $14 billion, followed by floods, river floods and ice storms.