This report examines weather data to determine which types of events are most likely to affect population health and the economy. Specifically, the data set used is the NOAA storm data for the United States from 1950 to 2011. The data are analyzed to determine the types of weather events that cause the most injuries and fatalities in the population, and to determine the weather events that cause the most damage to property and crops in the United States.
Load necessary libraries:
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(scales)
The raw data file “StormData.csv.bz2” is in the working directory and will be read with read.csv:
# The file "StormData.csv.bz2" should already be in the working directory.
stormdata <- read.csv('StormData.csv.bz2')
print('Finished reading data file.', quote = FALSE)
## [1] Finished reading data file.
Next, this code chunk reduces the raw data to the seven columns needed to answer the two assignment questions, and checks those columns for NA values:
storm <- stormdata[, c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG',
                       'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
names(storm) <- tolower(names(storm))
na_sum <- sum(is.na(storm))
The NA count for these seven columns is 0.
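A single total does not show which column any missing values would come from. As a minimal sketch on a made-up toy data frame (the `toy` object and its values are illustrative assumptions, not NOAA data), a per-column count can be obtained with `colSums(is.na(...))`. Note that empty strings are not counted as NA, so they need a separate check.

```r
# Toy data frame standing in for the storm subset (values are made up).
toy <- data.frame(
  evtype     = c('FLOOD', 'HAIL', 'TORNADO'),
  fatalities = c(1, NA, 0),
  propdmgexp = c('K', '', 'M'),
  stringsAsFactors = FALSE
)

# Count missing values separately for each column.
na_by_column <- colSums(is.na(toy))
print(na_by_column)

# Empty strings are not NA, so count them explicitly.
blank_exp <- sum(toy$propdmgexp == '')
print(blank_exp)
```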
To answer the question about harm to human health, the injuries and fatalities variables are combined: the sum of these two columns is stored in a new column named casualties. This should allow us to examine the full impact of the various weather events on population health.
storm <- mutate(storm, casualties = fatalities + injuries)
stormcas <- aggregate(casualties ~ evtype, storm, sum)
stormcas <- stormcas[order(stormcas$casualties, decreasing = TRUE),]
head(stormcas, 30)
## evtype casualties
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
## 153 FLASH FLOOD 2755
## 427 ICE STORM 2064
## 760 THUNDERSTORM WIND 1621
## 972 WINTER STORM 1527
## 359 HIGH WIND 1385
## 244 HAIL 1376
## 411 HURRICANE/TYPHOON 1339
## 310 HEAVY SNOW 1148
## 957 WILDFIRE 986
## 786 THUNDERSTORM WINDS 972
## 30 BLIZZARD 906
## 188 FOG 796
## 585 RIP CURRENT 600
## 955 WILD/FOREST FIRE 557
## 586 RIP CURRENTS 501
## 278 HEAT WAVE 481
## 117 DUST STORM 462
## 978 WINTER WEATHER 431
## 848 TROPICAL STORM 398
## 19 AVALANCHE 394
## 140 EXTREME COLD 391
## 676 STRONG WIND 383
## 89 DENSE FOG 360
## 290 HEAVY RAIN 349
The top 30 weather events that cause casualties in the United States cover a variety of weather types. Comparing the data with the NOAA data set codebook also shows that several of the event types are near duplicates. Several of these duplicate event types in the evtype variable will be combined to obtain a more accurate analysis of the data. For example, “TSTM WIND”, “THUNDERSTORM WIND”, and “THUNDERSTORM WINDS” appear to be duplicates and will be combined into one type.
# Recode near-duplicate event types to a single label.
storm$evtype[storm$evtype == 'THUNDERSTORM WINDS'] <- 'THUNDERSTORM WIND'
storm$evtype[storm$evtype == 'TSTM WIND'] <- 'THUNDERSTORM WIND'
storm$evtype[storm$evtype == 'RIP CURRENTS'] <- 'RIP CURRENT'
storm$evtype[storm$evtype == 'HEAT'] <- 'EXCESSIVE HEAT'
storm$evtype[storm$evtype == 'HEAT WAVE'] <- 'EXCESSIVE HEAT'
storm$evtype[storm$evtype == 'HURRICANE/TYPHOON'] <- 'HURRICANE'
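The same recoding can also be written as a lookup table, which keeps all variant-to-label pairs in one place. The following is only a sketch on a made-up character vector; the names `recode_map`, `is_variant`, and `evtype_clean` are hypothetical, and it assumes the event types are stored as character strings rather than as a factor.

```r
# Toy vector of event types (made up for illustration).
evtype <- c('TSTM WIND', 'THUNDERSTORM WINDS', 'HEAT', 'FLOOD', 'RIP CURRENTS')

# Named vector: names are the variants, values are the label to keep.
recode_map <- c(
  'THUNDERSTORM WINDS' = 'THUNDERSTORM WIND',
  'TSTM WIND'          = 'THUNDERSTORM WIND',
  'RIP CURRENTS'       = 'RIP CURRENT',
  'HEAT'               = 'EXCESSIVE HEAT',
  'HEAT WAVE'          = 'EXCESSIVE HEAT',
  'HURRICANE/TYPHOON'  = 'HURRICANE'
)

# Replace only the values that appear in the map; leave the rest untouched.
is_variant <- evtype %in% names(recode_map)
evtype_clean <- evtype
evtype_clean[is_variant] <- unname(recode_map[evtype[is_variant]])
print(evtype_clean)
```

Further pairs could be added to the table in the same way; for instance, the top-30 list above still shows “WILDFIRE” and “WILD/FOREST FIRE” as separate types.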
After combining the duplicate event types, the following script re-runs the aggregation so that the top ten weather events most harmful to population health can be listed (harm is defined as total casualties, that is, injuries plus fatalities caused by the weather event). The casualties column is again sorted in decreasing order so that the most harmful event types come first:
stormcas <- aggregate(casualties ~ evtype, storm, sum)
stormcas <- stormcas[order(stormcas$casualties, decreasing = TRUE),]
Next, to answer the second assignment question, we need to process the property and crop damage variables to obtain totals in the proper form to sum as a monetary total. Let’s examine the levels of the propdmgexp and cropdmgexp variables:
summary(storm$propdmgexp)
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
summary(storm$cropdmgexp)
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
To use the damage and damage exponent data properly, the following script maps “k” or “K” to thousands, “m” or “M” to millions, and “b” or “B” to billions. Any other exponent code is mapped to zero, so damage recorded under those codes is excluded from the totals; the remaining non-blank codes are used very rarely. The script then uses the mutate function to create two new columns: property damage multiplied by its exponent, and crop damage multiplied by its exponent.
storm$propdmgexp <- as.character(storm$propdmgexp)
storm$cropdmgexp <- as.character(storm$cropdmgexp)
# Map one exponent code to its numeric multiplier; unrecognized codes give 0.
exponent_function <- function(x) {
  if (x == 'k' || x == 'K') {
    1e+03
  } else if (x == 'm' || x == 'M') {
    1e+06
  } else if (x == 'b' || x == 'B') {
    1e+09
  } else {
    0
  }
}
storm$propdmgexp <- as.numeric(sapply(storm$propdmgexp, exponent_function))
storm$cropdmgexp <- as.numeric(sapply(storm$cropdmgexp, exponent_function))
storm <- mutate(storm, prop = propdmg * propdmgexp, crop = cropdmg * cropdmgexp)
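As an alternative to calling a function once per row with sapply, the exponent codes can be translated in one vectorized step with a named lookup vector. This is a sketch on made-up values, not the code used in this report; `multipliers`, `exp_codes`, `damage`, and `mult` are hypothetical names. It follows the same convention as above, where unrecognized codes count as zero.

```r
# Multipliers for the recognized exponent codes (upper case).
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy exponent codes and damage figures (made up for illustration).
exp_codes <- c('K', 'm', '', 'B', '5')
damage    <- c(25, 1.5, 10, 2, 3)

# Look up each code after upper-casing; unmatched codes come back as NA.
mult <- unname(multipliers[toupper(exp_codes)])

# Treat unrecognized codes (including the blank code) as a multiplier of 0.
mult[is.na(mult)] <- 0

total <- damage * mult
print(total)
```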
The final data processing step is to aggregate the property damage and crop damage variables by event type. Again, the damage variables are sorted in decreasing order so that we can determine the most harmful weather events:
stormpropdmg <- aggregate(prop ~ evtype, storm, sum)
stormpropdmg <- stormpropdmg[order(stormpropdmg$prop, decreasing = TRUE),]
stormpropsum <- sum(stormpropdmg$prop)
stormcropdmg <- aggregate(crop ~ evtype, storm, sum)
stormcropdmg <- stormcropdmg[order(stormcropdmg$crop, decreasing = TRUE),]
stormcropsum <- sum(stormcropdmg$crop)
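Once totals per event type and an overall sum are available, each event type's share of the overall damage follows directly. The sketch below illustrates this on a made-up toy data frame (the `toy` and `by_type` objects and the `share_pct` and `cum_pct` columns are hypothetical and not part of this report's results).

```r
# Toy damage records (made up for illustration).
toy <- data.frame(
  evtype = c('FLOOD', 'HAIL', 'FLOOD', 'TORNADO', 'HAIL'),
  prop   = c(500, 100, 300, 50, 50)
)

# Sum by event type and sort from largest to smallest, as in the report.
by_type <- aggregate(prop ~ evtype, toy, sum)
by_type <- by_type[order(by_type$prop, decreasing = TRUE), ]

# Express each total as a percentage of the overall sum, plus a running total.
by_type$share_pct <- 100 * by_type$prop / sum(by_type$prop)
by_type$cum_pct   <- cumsum(by_type$share_pct)
print(by_type)
```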
Now that we have finished processing the data, we can examine the results to determine if the two assignment questions can be answered.
Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
The first ten rows of the sorted stormcas data frame give the “Top 10” weather events that impact population health. They show that tornadoes, heat, thunderstorm winds, flooding, and lightning cause the most casualties.
head(stormcas, 10)
## evtype casualties
## 829 TORNADO 96979
## 130 EXCESSIVE HEAT 11946
## 756 THUNDERSTORM WIND 10054
## 170 FLOOD 7259
## 461 LIGHTNING 6046
## 153 FLASH FLOOD 2755
## 424 ICE STORM 2064
## 966 WINTER STORM 1527
## 400 HURRICANE 1446
## 357 HIGH WIND 1385
The following plot shows the top ten weather events that cause casualties in the United States according to the NOAA data set:
g1 <- ggplot(stormcas[1:10,], aes(x=reorder(evtype, casualties), y=casualties, fill = casualties)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  ylab('Total Casualties (Injuries and Fatalities)') +
  xlab('') +
  ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
  theme(legend.position='none')
g1
Question 2: Across the United States, which types of events have the greatest economic consequences?
Examining first the property damage results, the following is a list of the top ten weather events causing property damage in the United States from 1950-2011:
head(stormpropdmg, 10)
## evtype prop
## 170 FLOOD 144657709800
## 400 HURRICANE 81174159010
## 829 TORNADO 56937160480
## 666 STORM SURGE 43323536000
## 153 FLASH FLOOD 16140811510
## 244 HAIL 15732266720
## 756 THUNDERSTORM WIND 9704002430
## 843 TROPICAL STORM 7703890550
## 966 WINTER STORM 6688497250
## 357 HIGH WIND 5270046260
The top five events are floods, hurricanes, tornadoes, storm surges, and flash floods.
The following plot, in answer to Question 2, shows the top ten weather events causing property damage:
g2 <- ggplot(stormpropdmg[1:10,], aes(x=reorder(evtype, prop), y = prop, fill = prop)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  ylab('Total Property Damage') +
  xlab('') +
  ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
  theme(legend.position='none')
g2
The final part of the analysis is to examine the impact of weather events on crop damage. The top ten weather events causing crop damage in the United States from 1950-2011 are:
head(stormcropdmg, 10)
## evtype crop
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 400 HURRICANE 5349782800
## 586 RIVER FLOOD 5029459000
## 424 ICE STORM 5022113500
## 244 HAIL 3025954450
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 756 THUNDERSTORM WIND 1159505100
## 212 FROST/FREEZE 1094086000
Although there is some overlap with the list of property damage events, the top crop damage events include drought, flood, hurricane, river flood, and ice storm.
The following plot shows the top ten weather events causing crop damage:
g3 <- ggplot(stormcropdmg[1:10,], aes(x=reorder(evtype, crop), y = crop, fill = crop)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  ylab('Total Crop Damage') +
  xlab('') +
  ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
  theme(legend.position='none')
g3
The total amount of property damage recorded in this data set is $427,318,642,100 and the total amount for crop damage is $49,104,191,910.
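The source does not show how these totals were formatted for display. One possible way, sketched here with base R's `formatC` and a hypothetical helper named `fmt_dollars`, is to print the sums in fixed notation with a comma as the thousands separator. The two input values are the totals quoted above.

```r
# Totals as quoted in the text above.
prop_total <- 427318642100
crop_total <- 49104191910

# Hypothetical helper: fixed notation, no decimals, comma as thousands separator.
fmt_dollars <- function(x) {
  paste0('$', formatC(x, format = 'f', digits = 0, big.mark = ','))
}

print(fmt_dollars(prop_total))
print(fmt_dollars(crop_total))

# Combined property and crop damage.
print(fmt_dollars(prop_total + crop_total))
```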