This report examines severe weather events throughout U.S. history and analyzes how they have affected the health of people living in the U.S., as well as the amount of damage done by each type of event. The data was taken from the National Oceanic and Atmospheric Administration's (NOAA) database and spans 1950 to 2011. The work is split into two separate analyses: the first looks at the number of fatalities and injuries, and the second focuses on the damage, measured in dollars, to crops and property. Floods, droughts, hurricanes, and tornadoes seem to produce the biggest impact, but more thorough results are given below.
To begin, the data was downloaded from the NOAA website and read into R. To speed up the process, a check was performed to see whether the dataset had already been downloaded.
The documentation provided on the NOAA website was also used to determine which columns of the dataset are needed (those pertaining to event type, fatalities, injuries, and property and crop damage). To make things more manageable, a subset of the data containing only these columns was taken. This dataframe was called stormtrim.
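# Download the compressed file only if it is not already present
if (!file.exists("storm.csv")) {
  fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  # mode = 'wb' keeps the bz2 archive intact on Windows
  download.file(fileURL, destfile = 'storm.csv', mode = 'wb')
}
# The file is bzip2-compressed despite its .csv name, so read it through bzfile()
storm <- read.csv(bzfile('storm.csv'), header = TRUE, stringsAsFactors = FALSE)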
stormtrim = storm[, c(8, 23, 24, 25, 26, 27, 28)]
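Selecting columns by position depends on the column order of the downloaded file. As a minimal sketch, the same subset could be taken by name, assuming the column names shown in the str() output of this report. The data frame storm_demo and its values are made up purely for illustration.

```r
# Toy stand-in for the full `storm` data frame (hypothetical values),
# with one extra column to show that unlisted columns are dropped.
storm_demo <- data.frame(
  STATE      = c("AL", "AL"),
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Columns needed for the health and damage analyses
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing
stopifnot(all(keep %in% names(storm_demo)))

stormtrim_demo <- storm_demo[, keep]
names(stormtrim_demo)
```

Selecting by name gives the same seven columns as the index-based subset, but would stop with an error rather than silently pick the wrong columns if the file layout ever changed.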
Now that we have our data, we look at its structure, its first few rows, and the names of its columns.
str(stormtrim)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
head(stormtrim)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
names(stormtrim)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
When investigating property and crop damage, the provided dataset is a little hard to work with. The data gives the dollar amount using two columns: the first is simply a number and the second is a code for a multiplier.
The multiplier is a letter code, as follows: H = Hundred, K = Thousand, M = Million, B = Billion.
I converted these letter codes into the corresponding powers of ten and multiplied the result by the first column to get the actual dollar amount for each event. These values are stored in the stormtrim dataframe as totalpropdmg and totalcropdmg.
One issue discovered early in the project was that this column contains additional codes that I was not able to decipher immediately. As expected, the codes that could not be converted to a number dropped out once the analysis was restricted to events that actually recorded damage (processing shown below).
stormtrim$propmulti <- as.character(stormtrim[,5])
stormtrim$propmulti[toupper(stormtrim[,5]) == 'H'] <- "2"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'K'] <- "3"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'M'] <- "6"
stormtrim$propmulti[toupper(stormtrim[,5]) == 'B'] <- "9"
stormtrim$propmulti <- as.numeric(stormtrim$propmulti)
## Warning: NAs introduced by coercion
stormtrim$totalpropdmg <- stormtrim$PROPDMG*(10^stormtrim$propmulti)
stormtrim$cropmulti <- as.character(stormtrim[,7])
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'H'] <- "2"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'K'] <- "3"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'M'] <- "6"
stormtrim$cropmulti[toupper(stormtrim[,7]) == 'B'] <- "9"
stormtrim$cropmulti <- as.numeric(stormtrim$cropmulti)
## Warning: NAs introduced by coercion
stormtrim$totalcropdmg <- stormtrim$CROPDMG*(10^stormtrim$cropmulti)
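The letter-to-multiplier conversion can also be written as a single lookup. The following is only a sketch on made-up values, not the code used for the results in this report: exp_lookup and to_dollars are hypothetical names, and the sketch assumes that only H, K, M, and B carry meaning. Unlike the code above, any other code, including a digit, yields NA here.

```r
# Exponent (power of ten) for each documented letter code
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: amount * 10^exponent; unknown or empty codes give NA
to_dollars <- function(amount, code) {
  exponent <- unname(exp_lookup[toupper(code)])
  amount * 10^exponent
}

# Toy values: 25 K, 2.5 m (lower case), 1 B, and an amount with an empty code
demo_dollars <- to_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
demo_dollars
```

For example, a PROPDMG of 25 with the code K becomes 25 * 10^3 = 25000 dollars, while the entry with an empty code stays NA until the NA values are replaced.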
Any NA values in the dataset were converted to 0.
stormtrim[is.na(stormtrim)] = 0
Since this was such a large dataset even after trimming it down, I separated the injury data and the fatality data into two dataframes, keeping only the rows with more than 0 instances, i.e. only the rows that actually recorded a fatality or an injury.
stormfat <- subset(stormtrim, FATALITIES > 0)
storminj <- subset(stormtrim, INJURIES > 0)
The first analysis was done to see which events had the greatest impact on public health. To do this, I plotted which events have caused the most fatalities and which have caused the most injuries.
I used an aggregate function to find the total number of injuries and fatalities for each Event Type.
totalfatalities <- aggregate(stormfat$FATALITIES, by = list(stormfat$EVTYPE), FUN = "sum")
totalfatalities <- totalfatalities[order(-totalfatalities[,2]),]
names(totalfatalities) <- c("EVENT TYPE", "NUM OF FATALITIES")
totalinj <- aggregate(storminj$INJURIES, by = list(storminj$EVTYPE), FUN = "sum")
totalinj <- totalinj[order(-totalinj[,2]),]
names(totalinj) <- c("EVENT TYPE", "NUM OF INJURIES")
A similar approach was taken to investigate how weather events impacted property and crop damage. Since the analysis involved nearly identical steps to those above, I will present the code without further explanation.
stormcrop <- subset(stormtrim, totalcropdmg > 0)
stormprop <- subset(stormtrim, totalpropdmg > 0)
totalprop <- aggregate(stormprop$totalpropdmg, by = list(stormprop$EVTYPE), FUN = "sum")
totalprop <- totalprop[order(-totalprop[,2]),]
names(totalprop) <- c("EVENT TYPE", "Property Damage ($)")
totalcrop <- aggregate(stormcrop$totalcropdmg, by = list(stormcrop$EVTYPE), FUN = "sum")
totalcrop <- totalcrop[order(-totalcrop[,2]),]
names(totalcrop) <- c("EVENT TYPE", "Crop Damage ($)")
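Because the same subset, aggregate, and order steps are repeated for each of the four measures, they could be wrapped in one function. The sketch below is an illustration on made-up data, not the code used for the results in this report. The helper name total_by_event and the data frame demo_events are hypothetical.

```r
# Hypothetical helper: total `value_col` per event type, largest first.
total_by_event <- function(df, value_col, label) {
  # Keep only rows where the measure actually occurred
  hits <- df[which(df[[value_col]] > 0), ]
  # Sum the measure within each event type
  totals <- aggregate(hits[[value_col]], by = list(hits$EVTYPE), FUN = sum)
  # Sort in decreasing order of the total
  totals <- totals[order(-totals[, 2]), ]
  names(totals) <- c("EVENT TYPE", label)
  totals
}

# Toy data (made-up values) standing in for stormtrim
demo_events <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 5, 1, 0),
  stringsAsFactors = FALSE
)

demo_fatalities <- total_by_event(demo_events, "FATALITIES", "NUM OF FATALITIES")
demo_fatalities
```

With such a helper, each of the four summary tables would come from a single call that differs only in the column name and the label, which makes a mislabelled column easier to spot.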
To take a look at the events that produced the most injuries and fatalities, the data was ordered and the top 15 event types are presented below:
head(totalfatalities, 15)
## EVENT TYPE NUM OF FATALITIES
## 141 TORNADO 5633
## 26 EXCESSIVE HEAT 1903
## 35 FLASH FLOOD 978
## 57 HEAT 937
## 97 LIGHTNING 816
## 145 TSTM WIND 504
## 40 FLOOD 470
## 116 RIP CURRENT 368
## 75 HIGH WIND 248
## 2 AVALANCHE 224
## 163 WINTER STORM 206
## 117 RIP CURRENTS 204
## 58 HEAT WAVE 172
## 30 EXTREME COLD 160
## 136 THUNDERSTORM WIND 133
head(totalinj, 15)
## EVENT TYPE NUM OF INJURIES
## 129 TORNADO 91346
## 135 TSTM WIND 6957
## 30 FLOOD 6789
## 20 EXCESSIVE HEAT 6525
## 85 LIGHTNING 5230
## 47 HEAT 2100
## 79 ICE STORM 1975
## 28 FLASH FLOOD 1777
## 121 THUNDERSTORM WIND 1488
## 45 HAIL 1361
## 152 WINTER STORM 1321
## 76 HURRICANE/TYPHOON 1275
## 63 HIGH WIND 1137
## 53 HEAVY SNOW 1021
## 149 WILDFIRE 911
These tables show the totals for each event type and how they compare to one another. The devastating impact of tornadoes is easy to see in these numbers, but the plot below gives an even better picture of the data.
par(mfrow = c(1, 2), mgp = c(5, 1, 0), mar = c(10, 3, 5, 1), cex = .7, las = 2)
barplot(height = totalfatalities[1:15,2], names.arg = totalfatalities[1:15,1], col = "gold",
main = 'Top 15 Events with Fatalities', ylab = '# of Fatalities')
barplot(height = totalinj[1:15,2], names.arg = totalinj[1:15,1], col = "blue1",
main = 'Top 15 Events with Injuries', ylab = '# of Injuries')
The results for property and crop damage are presented below in a similar manner.
Again, I looked at the top 15 events by damage; this data is displayed below.
head(totalcrop, 15)
## EVENT TYPE Crop Damage ($)
## 10 DROUGHT 13972566000
## 27 FLOOD 5661968450
## 78 RIVER FLOOD 5029459000
## 72 ICE STORM 5022113500
## 42 HAIL 3025954470
## 64 HURRICANE 2741910000
## 69 HURRICANE/TYPHOON 2607872800
## 23 FLASH FLOOD 1421317100
## 19 EXTREME COLD 1292973000
## 37 FROST/FREEZE 1094086000
## 54 HEAVY RAIN 733399800
## 111 TROPICAL STORM 678346000
## 60 HIGH WIND 638571300
## 115 TSTM WIND 554007350
## 16 EXCESSIVE HEAT 492402000
head(totalprop, 15)
## EVENT TYPE Property Damage ($)
## 62 FLOOD 144657709800
## 179 HURRICANE/TYPHOON 69305840000
## 331 TORNADO 56947380614
## 279 STORM SURGE 43323536000
## 50 FLASH FLOOD 16822673772
## 103 HAIL 15735267456
## 171 HURRICANE 11868319010
## 339 TROPICAL STORM 7703890550
## 396 WINTER STORM 6688497251
## 156 HIGH WIND 5270046260
## 243 RIVER FLOOD 5118945500
## 386 WILDFIRE 4765114000
## 280 STORM SURGE/TIDE 4641188000
## 345 TSTM WIND 4484928495
## 187 ICE STORM 3944927860
As with the health consequences above, a few event types dominate: flooding causes massive damage in both cases. It is also important to note that drought has the largest impact on crop damage yet does not appear among the top 15 events for property damage. This is what would be expected for that event type.
These results were then charted in the same manner as the fatality and injury data.
par(mfrow = c(1, 2), mgp = c(5, 1, 0), mar = c(10, 8, 5, 1), cex = .7, las = 2)
barplot(height = totalprop[1:15,2], names.arg = totalprop[1:15,1], col = "thistle",
main = 'Top 15 Events for Property Damage', ylab = '$ of Damage')
barplot(height = totalcrop[1:15,2], names.arg = totalcrop[1:15,1], col = "firebrick",
main = 'Top 15 Events for Crop Damage', ylab = '$ of Damage')