Analysis of the Impact of Weather Events in the U.S. on Health and the Economy

Synopsis

This report examines severe weather events recorded in the U.S. and analyzes how they have affected the health of people living in the U.S., as well as the amount of damage done by each type of event. The data was taken from the National Oceanic and Atmospheric Administration’s (NOAA) database and spans 1950 to 2011. The analysis is split into two separate parts: the first looks at the number of fatalities and injuries, and the second focuses on the damage, measured in dollars, to crops and property. Floods, droughts, hurricanes, and tornadoes seem to produce the biggest impact, but more thorough results are given below.

Data Processing

To begin, the data was downloaded from the NOAA website and read into R. To speed up the process, the code first checks whether the dataset has already been downloaded.

The documentation provided on the NOAA website was used to determine which columns of the dataset are needed (those pertaining to event type, fatalities, injuries, and property and crop damage). To keep things manageable, a subset containing only these columns was taken and stored in a dataframe called stormtrim.

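# Download the compressed dataset only if it is not already present
if (!file.exists("storm.csv")) {
        fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
        download.file(fileURL, destfile='storm.csv')
}

# The file is bzip2-compressed despite its .csv name, so read it through bzfile()
storm <- read.csv(bzfile('storm.csv'),header=TRUE, stringsAsFactors = FALSE)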

# Keep EVTYPE plus the fatality, injury, and damage columns
stormtrim <- storm[, c(8, 23, 24, 25, 26, 27, 28)]
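
Selecting columns by position depends on the column order of the raw file. As a minimal sketch, the same subset can be taken by column name instead. The data frame `toy` and its values are made up for illustration and are not taken from the NOAA data; the column names are those shown for stormtrim.

```r
# Toy stand-in for the storm data; values are made up for illustration.
toy <- data.frame(
  STATE      = c("AL", "TX"),
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 1),
  CROPDMGEXP = c("", "K"),
  stringsAsFactors = FALSE
)

# Columns needed for the health and damage analyses.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Subset by name rather than by column position.
toytrim <- toy[, keep]
names(toytrim)
```

Selecting by name keeps working if columns are reordered, and it makes clear which variables the analysis relies on.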

Now that we have our data, we take a look at its structure, its first few rows, and the names of each column.

str(stormtrim)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
head(stormtrim)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
names(stormtrim)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

When investigating property and crop damage, the provided dataset is a little hard to work with. The data gives the dollar amount using two columns: the first is simply a number and the second holds a code for a multiplier.

The multiplier is a letter code as follows: H = Hundred, K = Thousand, M = Million, B = Billion.

I converted these letter codes into the corresponding powers of ten, multiplied by the number in the first column, and so obtained the actual dollar amount for each event. These values are stored in the stormtrim dataframe as totalpropdmg and totalcropdmg.

One issue discovered early in the project was that this column contains additional codes that I was not able to immediately decipher. As expected, though, the rows with these codes fell away once the analysis was restricted to events that actually recorded damage (processing shown below): a code that cannot be coerced to a number becomes NA, the NA is later set to 0, and rows with 0 total damage are filtered out. Note that a code that is already a digit survives as.numeric() and is used directly as a power of ten.

stormtrim$propmulti <- as.character(stormtrim[,5])
stormtrim$propmulti[toupper(stormtrim[,5]) == 'H'] <- "2"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'K'] <- "3"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'M'] <- "6"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'B'] <- "9"
stormtrim$propmulti <- as.numeric(stormtrim$propmulti)
## Warning: NAs introduced by coercion
stormtrim$totalpropdmg <- stormtrim$PROPDMG*(10^stormtrim$propmulti)

stormtrim$cropmulti <- as.character(stormtrim[,7])
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'H'] <- "2"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'K'] <- "3"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'M'] <- "6"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'B'] <- "9"
stormtrim$cropmulti <- as.numeric(stormtrim$cropmulti)
## Warning: NAs introduced by coercion
stormtrim$totalcropdmg <- stormtrim$CROPDMG*(10^stormtrim$cropmulti)
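
The two blocks above repeat the same steps for property and crop damage. A minimal sketch of one way to share that logic is a named lookup vector wrapped in a small function. The names `exp_lookup` and `damage_dollars` are hypothetical. Unlike the code above, this sketch assumes that any code other than H, K, M, or B (including an empty string or a digit) means no usable multiplier and yields 0 dollars.

```r
# Hypothetical lookup: exponent code -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
# Assumption of this sketch: unknown codes (e.g. "", "?", digits) give 0.
damage_dollars <- function(amount, code) {
  power <- exp_lookup[toupper(code)]   # NA where the code is not H/K/M/B
  dollars <- amount * 10^power
  dollars[is.na(dollars)] <- 0
  unname(dollars)
}

# Made-up example values, not taken from the dataset.
damage_dollars(c(25, 2.5, 1, 3), c("K", "m", "B", "?"))
```

With a helper like this, both damage columns could be computed by calling the function once per amount and code column pair.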

Any NA values in the dataset were converted to 0.

stormtrim[is.na(stormtrim)] = 0

Since this was still a large dataset even after trimming, I separated the injury data and fatality data into two separate dataframes, keeping only the rows with more than 0 instances, i.e. only the rows that actually had a fatality or injury.

stormfat <- subset(stormtrim, FATALITIES > 0)
storminj <- subset(stormtrim, INJURIES > 0)

Weather Events and Public Health

The first analysis was done to see which events had the greatest impact on public health. To do this, I plotted which events have caused the most fatalities and which have caused the most injuries.

I used an aggregate function to find the total number of injuries and fatalities for each Event Type.

totalfatalities <- aggregate(stormfat$FATALITIES, by = list(stormfat$EVTYPE), FUN = "sum")

totalfatalities <- totalfatalities[order(-totalfatalities[,2]),]
names(totalfatalities) <- c("EVENT TYPE", "NUM OF FATALITIES")

totalinj <- aggregate(storminj$INJURIES, by = list(storminj$EVTYPE), FUN = "sum")

totalinj <- totalinj[order(-totalinj[,2]),]
names(totalinj) <- c("EVENT TYPE", "NUM OF INJURIES")
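
As an illustration of the aggregation step, the same kind of per-event total can be written with the formula interface of aggregate(), which keeps the original column names in the result. This is a sketch on a made-up data frame `toy`, not the author's code or data.

```r
# Made-up fatality counts for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  stringsAsFactors = FALSE
)

# Formula interface: sum FATALITIES within each EVTYPE.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from the most to the fewest fatalities.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
totals
```

Because the result columns are already named EVTYPE and FATALITIES, they can be referenced by name when sorting or plotting.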

Weather Events and Damage Cost

A similar approach was taken to investigate how weather events impacted property and crop damage. Since the analysis involved nearly identical steps to those above, I will present the code without further explanation.

stormcrop <- subset(stormtrim, totalcropdmg > 0)
stormprop <- subset(stormtrim, totalpropdmg > 0)

totalprop <- aggregate(stormprop$totalpropdmg, by = list(stormprop$EVTYPE), FUN = "sum")

totalprop <- totalprop[order(-totalprop[,2]),]
names(totalprop) <- c("EVENT TYPE", "Property Damage ($)")

totalcrop <- aggregate(stormcrop$totalcropdmg, by = list(stormcrop$EVTYPE), FUN = "sum")

totalcrop <- totalcrop[order(-totalcrop[,2]),]
names(totalcrop) <- c("EVENT TYPE", "Crop Damage ($)")
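
Since the filter, aggregate, sort, and rename steps are the same for every measure, they could be wrapped in one function. The sketch below assumes a hypothetical helper named `total_by_event` and uses a made-up data frame `toy`; it follows the same steps as the code above but is not the author's implementation.

```r
# Hypothetical helper: keep rows where `column` is positive, total it by
# event type, sort in decreasing order, and label the result columns.
total_by_event <- function(data, column, label) {
  positive <- data[data[[column]] > 0, ]
  totals <- aggregate(positive[[column]], by = list(positive$EVTYPE), FUN = sum)
  totals <- totals[order(-totals[, 2]), ]
  names(totals) <- c("EVENT TYPE", label)
  totals
}

# Made-up damage values for illustration only.
toy <- data.frame(
  EVTYPE       = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  totalpropdmg = c(1000, 0, 500, 200),
  totalcropdmg = c(0, 3000, 100, 0),
  stringsAsFactors = FALSE
)

toyprop <- total_by_event(toy, "totalpropdmg", "Property Damage ($)")
toycrop <- total_by_event(toy, "totalcropdmg", "Crop Damage ($)")
toyprop
toycrop
```

Passing the label as an argument means each table gets its own column heading instead of one copied from a previous block.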

Results

To take a look at the events that produced the most injuries and fatalities, the data was ordered and the top 15 event types are presented below:

head(totalfatalities, 15)
##            EVENT TYPE NUM OF FATALITIES
## 141           TORNADO              5633
## 26     EXCESSIVE HEAT              1903
## 35        FLASH FLOOD               978
## 57               HEAT               937
## 97          LIGHTNING               816
## 145         TSTM WIND               504
## 40              FLOOD               470
## 116       RIP CURRENT               368
## 75          HIGH WIND               248
## 2           AVALANCHE               224
## 163      WINTER STORM               206
## 117      RIP CURRENTS               204
## 58          HEAT WAVE               172
## 30       EXTREME COLD               160
## 136 THUNDERSTORM WIND               133
head(totalinj, 15)
##            EVENT TYPE NUM OF INJURIES
## 129           TORNADO           91346
## 135         TSTM WIND            6957
## 30              FLOOD            6789
## 20     EXCESSIVE HEAT            6525
## 85          LIGHTNING            5230
## 47               HEAT            2100
## 79          ICE STORM            1975
## 28        FLASH FLOOD            1777
## 121 THUNDERSTORM WIND            1488
## 45               HAIL            1361
## 152      WINTER STORM            1321
## 76  HURRICANE/TYPHOON            1275
## 63          HIGH WIND            1137
## 53         HEAVY SNOW            1021
## 149          WILDFIRE             911

Here you can see the numbers for each event type and how they compare to one another. The devastating impact of tornadoes is easy to see in these numbers, but the plot below gives an even better picture of the data.

par(mfrow = c(1, 2), mgp = c(5, 1, 0), mar = c(10, 3, 5, 1), cex = .7, las = 2)
barplot(height = totalfatalities[1:15,2], names.arg = totalfatalities[1:15,1], col = "gold",
        main = 'Top 15 Events with Fatalities', ylab = '# of Fatalities')
barplot(height = totalinj[1:15,2], names.arg = totalinj[1:15,1], col = "blue1",
        main = 'Top 15 Events with Injuries', ylab = '# of Injuries')

The results for property and crop damage are presented below in a similar manner.

Again, I took a look at the top 15 events by damage; this data is displayed below.

head(totalcrop, 15)
##            EVENT TYPE     Crop Damage ($)
## 10            DROUGHT         13972566000
## 27              FLOOD          5661968450
## 78        RIVER FLOOD          5029459000
## 72          ICE STORM          5022113500
## 42               HAIL          3025954470
## 64          HURRICANE          2741910000
## 69  HURRICANE/TYPHOON          2607872800
## 23        FLASH FLOOD          1421317100
## 19       EXTREME COLD          1292973000
## 37       FROST/FREEZE          1094086000
## 54         HEAVY RAIN           733399800
## 111    TROPICAL STORM           678346000
## 60          HIGH WIND           638571300
## 115         TSTM WIND           554007350
## 16     EXCESSIVE HEAT           492402000
head(totalprop, 15)
##            EVENT TYPE Property Damage ($)
## 62              FLOOD        144657709800
## 179 HURRICANE/TYPHOON         69305840000
## 331           TORNADO         56947380614
## 279       STORM SURGE         43323536000
## 50        FLASH FLOOD         16822673772
## 103              HAIL         15735267456
## 171         HURRICANE         11868319010
## 339    TROPICAL STORM          7703890550
## 396      WINTER STORM          6688497251
## 156         HIGH WIND          5270046260
## 243       RIVER FLOOD          5118945500
## 386          WILDFIRE          4765114000
## 280  STORM SURGE/TIDE          4641188000
## 345         TSTM WIND          4484928495
## 187         ICE STORM          3944927860

As with the health consequences above, the massive damage caused by flooding is easy to see in both cases. It is also important to note that drought has a large impact on crop damage yet does not appear among the top 15 events for property damage. This matches what would be expected for this event type.

These results were then charted in the same manner as the fatality and injury data.

par(mfrow = c(1, 2), mgp = c(5, 1, 0), mar = c(10, 8, 5, 1), cex = .7, las = 2)
barplot(height = totalprop[1:15,2], names.arg = totalprop[1:15,1], col = "thistle",
        main = 'Top 15 Events for Property Damage', ylab = '$ of Damage')
barplot(height = totalcrop[1:15,2], names.arg = totalcrop[1:15,1], col = "firebrick",
        main = 'Top 15 Events for Crop Damage', ylab = '$ of Damage')