Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This project will answer two basic questions regarding the NOAA storm database:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The data for this assignment are in a comma-separated-value file compressed via the bzip2 algorithm. The original data can be downloaded from the course web site:
NOAA Storm Data [47Mb]
There is also some documentation of the database available. The documentation provides information on how some of the variables are constructed/defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The first step is to set up the environment for the analysis. This involves setting the working directory and loading the needed R packages. In this case, I need the sqldf, dplyr, and rockchalk packages to process the data, and the ggplot2 and gridExtra packages to plot it.
## Set the working directory
setwd("~/R/Coursera/Data Science/Course 5/Assignment 2")
## Load needed packages
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rockchalk)
##
## Attaching package: 'rockchalk'
## The following object is masked from 'package:dplyr':
##
## summarize
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
If the original data from the course bzip2 archive aren’t already present, they are downloaded from the course site.
## Download the original zipped data if it doesn't already exist
destfile="NOAAStormData.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(destfile)) {
download.file(fileURL,destfile,method="auto")
}
The source data file is a comma-delimited file. The data are loaded into R using the read.csv function, which reads the bzip2-compressed file directly. After loading, I check the contents of the file to ensure that the download, extraction, and conversion to R data worked as desired.
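## Read the CSV file (strings are read as factors, which the later level-merging steps rely on)
stormdata <- read.csv("NOAAStormData.bz2", stringsAsFactors = TRUE)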
## Describe the dataset
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
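The str() output above shows the text columns as factors, and the later steps that use levels() and combineLevels() depend on that. Since R 4.0.0, read.csv leaves text columns as character vectors unless stringsAsFactors = TRUE is passed. The following is a minimal sketch of the difference, using a small toy file rather than the NOAA data; the names tmp, toy, toy_chr, and toy_fac are hypothetical.

```r
## Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "Hail", "TORNADO"),
                  FATALITIES = c(1, 0, 2))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## read.csv reads the compressed file directly
toy_chr <- read.csv(tmp)                           # EVTYPE is character on R >= 4.0.0
toy_fac <- read.csv(tmp, stringsAsFactors = TRUE)  # EVTYPE is a factor on any version

## Only the factor version has levels to inspect or merge
levels(toy_fac$EVTYPE)
```

Passing the argument explicitly keeps the result the same across R versions.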
Since the data contain numerous unneeded variables and the analysis is extremely basic, I restrict the data to only those variables that are needed to answer the assignment questions. The event type, fatalities, injuries, crop damage, and property damage variables are kept for the analysis.
## Limit data to needed variables
stormdata <- subset(stormdata,
select=c("EVTYPE","FATALITIES","INJURIES","CROPDMG","CROPDMGEXP","PROPDMG","PROPDMGEXP"))
## Determine unique values of event type
events <- sqldf("select distinct EVTYPE as 'Events' from stormdata order by EVTYPE")
head(events,n=25)
## Events
## 1 HIGH SURF ADVISORY
## 2 COASTAL FLOOD
## 3 FLASH FLOOD
## 4 LIGHTNING
## 5 TSTM WIND
## 6 TSTM WIND (G45)
## 7 WATERSPOUT
## 8 WIND
## 9 ?
## 10 ABNORMAL WARMTH
## 11 ABNORMALLY DRY
## 12 ABNORMALLY WET
## 13 ACCUMULATED SNOWFALL
## 14 AGRICULTURAL FREEZE
## 15 APACHE COUNTY
## 16 ASTRONOMICAL HIGH TIDE
## 17 ASTRONOMICAL LOW TIDE
## 18 AVALANCE
## 19 AVALANCHE
## 20 BEACH EROSIN
## 21 BEACH EROSION
## 22 BEACH EROSION/COASTAL FLOOD
## 23 BEACH FLOOD
## 24 BELOW NORMAL PRECIPITATION
## 25 BITTER WIND CHILL
After looking at the contents of event type (EVTYPE), it is apparent that the same event type has multiple entries that vary in spelling, capitalization, and punctuation. I converted all event types to uppercase and combined event types with similar spelling. Note that due to the volume of output produced by the combineLevels function, its output has been suppressed. The code has been included for completeness.
## Convert event type variable to all upper case to merge same event types with differing case
levels(stormdata$EVTYPE) <- toupper(levels(stormdata$EVTYPE))
## Collapse levels of EVTYPE with differing spelling & punctuation
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BEACH EROSION","BEACH EROSIN"), newLabel = c("BEACH EROSION"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BITTER WIND CHILL","BITTER WIND CHILL TEMPERATURES"), newLabel = c("BITTER WIND CHILL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BLOWING SNOW & EXTREME WIND CH","BLOWING SNOW- EXTREME WIND CHI","BLOWING SNOW/EXTREME WIND CHIL"), newLabel = c("BLOWING SNOW/EXTREME WIND CHIL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BLOW-OUT TIDE","BLOW-OUT TIDES"), newLabel = c("BLOW-OUT TIDES"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("COLD TEMPERATURE","COLD TEMPERATURES"), newLabel = c("COLD TEMPERATURES"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("DUST DEVEL","DUST DEVIL"), newLabel = c("DUST DEVIL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("FROST/FREEZE","FROST\\FREEZE"), newLabel = c("FROST/FREEZE"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("FUNNEL CLOUD","FUNNEL CLOUD."), newLabel = c("FUNNEL CLOUD"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 0.75","HAIL 075","HAIL 75","HAIL(0.75)"), newLabel = c("HAIL 0.75"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 0.88","HAIL 088","HAIL 88"), newLabel = c("HAIL 0.88"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 1.75","HAIL 1.75)"), newLabel = c("HAIL 1.75"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY PRECIPATATION","HEAVY PRECIPITATION"), newLabel = c("HEAVY PRECIPITATION"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY SNOW/HIGH","HEAVY SNOW/HIGH WIND","HEAVY SNOW/HIGH WINDS"), newLabel = c("HEAVY SNOW/HIGH WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY RAIN","HEAVY RAINFALL","HEAVY RAINS","HVY RAIN"), newLabel = c("HEAVY RAIN"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("LIGHTING","LIGHTNING","LIGHTNING."), newLabel = c("LIGHTNING"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("THUDERSTORM WINDS","THUNDEERSTORM WINDS","THUNDERESTORM WINDS","THUNDERSTORM WINDS","THUNDERSTORM W INDS","THUNDERSTORM WIND",
"THUNDERSTORM WIND.","THUNDERSTORM WINDS","THUNDERSTORM WINDS LE CEN","THUNDERSTORM WINDS.","THUNDERSTORM WINDSS",
"THUNDERSTORM WINS","THUNDERSTORMS WIND","THUNDERSTORMS WINDS","THUNDERSTORMW","THUNDERSTORMW WINDS","THUNDERSTORMWINDS",
"THUNDERSTROM WIND","THUNDERSTROM WINDS","THUNDERTORM WINDS","THUNDERTSORM WIND","THUNDESTORM WINDS","THUNERSTORM WINDS",
"TSTM","TSTM WIND","TSTM WIND","TSTM WIND DAMAGE","TSTM WINDS","TSTM WND","TSTMW","TUNDERSTORM WIND"), newLabel = c("THUNDERSTORM WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("THUNDERSTORM DAMAGE","THUNDERSTORM DAMAGE TO"), newLabel = c("THUNDERSTORM DAMAGE"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("TORNADO","TORNADOES","TORNADOS","TORNDAO"), newLabel = c("TORNADO"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WATER SPOUT","WATERSPOUT","WATERSPOUT-","WATERSPOUT/","WATERSPOUTS","WAYTERSPOUT"), newLabel = c("WATERSPOUTS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WIND","WINDS","WND"), newLabel = c("WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER STORM","WINTER STORMS"), newLabel = c("WINTER STORMS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER STORM HIGH WINDS","WINTER STORM/HIGH WIND","WINTER STORM/HIGH WINDS"), newLabel = c("WINTER STORM/HIGH WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER MIX","WINTER WEATHER MIX","WINTER WEATHER/MIX","WINTERY MIX","WINTRY MIX"), newLabel = c("WINTERY MIX"))
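The explicit lists above could in principle be shortened with pattern rules. The following is a rough sketch on a handful of toy labels, not a replacement for the merges above: the function normalise and the two patterns in rules are hypothetical, and the patterns are coarser than the explicit lists (for example, a pattern starting with THUNDER would also catch THUNDERSTORM DAMAGE), so any such rule would need to be checked against the full list of levels. The sketch also trims whitespace, since the level list in the str() output includes " HIGH SURF ADVISORY" with a leading space.

```r
## Toy labels standing in for a few raw EVTYPE values
raw <- c("   TSTM WIND", "Thunderstorm Winds.", "TORNDAO", "tornadoes", "HAIL 075")

## Trim surrounding whitespace and standardise case first
clean <- toupper(trimws(raw))

## Hypothetical pattern -> label rules, applied in the order given
rules <- c("^(TSTM|THUNDER)" = "THUNDERSTORM WINDS",
           "^TORN"           = "TORNADO")

normalise <- function(x, rules) {
  for (pattern in names(rules)) {
    ## Replace every label matching the pattern with its canonical label
    x[grepl(pattern, x)] <- rules[[pattern]]
  }
  x
}

## Labels that match no rule (here "HAIL 075") are left unchanged
normalise(clean, rules)
```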
The property and crop damage values are expressed with “exponents”. For example, 32K is used to represent 32,000. These “exponents” need to be converted into numerics so that calculations can be carried out on the damage values.
## Determine unique values of property damage "exponent"
unique(stormdata$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## Determine unique values of crop damage "exponent"
unique(stormdata$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
## Change the damage "exponents" to uppercase
stormdata$PROPDMGEXP <- toupper(stormdata$PROPDMGEXP)
stormdata$CROPDMGEXP <- toupper(stormdata$CROPDMGEXP)
## Convert the damage "exponents" into numeric values
stormdata$PROPEXP[stormdata$PROPDMGEXP == "H"] <- 100
stormdata$PROPEXP[stormdata$PROPDMGEXP == "K"] <- 1000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "M"] <- 1000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "B"] <- 1000000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == ""] <- 1
stormdata$PROPEXP[stormdata$PROPDMGEXP == "0"] <- 1
stormdata$PROPEXP[stormdata$PROPDMGEXP == "1"] <- 10
stormdata$PROPEXP[stormdata$PROPDMGEXP == "2"] <- 100
stormdata$PROPEXP[stormdata$PROPDMGEXP == "3"] <- 1000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "4"] <- 10000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "5"] <- 100000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "6"] <- 1000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "7"] <- 10000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "8"] <- 100000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "H"] <- 100
stormdata$CROPEXP[stormdata$CROPDMGEXP == "K"] <- 1000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "M"] <- 1000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "B"] <- 1000000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == ""] <- 1
stormdata$CROPEXP[stormdata$CROPDMGEXP == "0"] <- 1
stormdata$CROPEXP[stormdata$CROPDMGEXP == "1"] <- 10
stormdata$CROPEXP[stormdata$CROPDMGEXP == "2"] <- 100
stormdata$CROPEXP[stormdata$CROPDMGEXP == "3"] <- 1000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "4"] <- 10000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "5"] <- 100000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "6"] <- 1000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "7"] <- 10000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "8"] <- 100000000
## Convert invalid "exponents" to zero (0)
stormdata$PROPEXP[stormdata$PROPDMGEXP == "+"] <- 0
stormdata$PROPEXP[stormdata$PROPDMGEXP == "-"] <- 0
stormdata$PROPEXP[stormdata$PROPDMGEXP == "?"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "+"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "-"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "?"] <- 0
## Compute numeric damage values
stormdata$PROPDMGVAL <- stormdata$PROPDMG * stormdata$PROPEXP
stormdata$CROPDMGVAL <- stormdata$CROPDMG * stormdata$CROPEXP
stormdata$TOTLDMGVAL <- stormdata$PROPDMGVAL + stormdata$CROPDMGVAL
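The same exponent-to-multiplier mapping can be written as a single lookup table. The following is a minimal sketch on toy values, assuming the mapping used in the code above; the names multiplier and exp_to_multiplier are hypothetical. The first toy value reproduces the 32K example: 32 with exponent K becomes 32 * 1000 = 32,000.

```r
## Exponent code -> multiplier, copied from the assignments above
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "+" = 0, "-" = 0, "?" = 0)

exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  out <- unname(multiplier[code])
  ## An empty string cannot be matched by name, so handle it separately
  out[code == ""] <- 1
  ## Codes missing from the table stay NA, which makes gaps easy to spot
  out
}

## Toy damage amounts and exponent codes
dmg  <- c(32, 2.5, 5, 1)
code <- c("K", "m", "", "?")
dmg * exp_to_multiplier(code)
```

Keeping the mapping in one place means the property and crop columns can share it, and an unexpected code shows up as NA instead of being silently skipped.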
Both injuries and fatalities are “harmful to population health”. I am going to look at fatalities and injuries separately, but also combine them to see whether the ranking of harmful event types changes. Thus, I need to create a combined “population harm” variable that is the sum of injuries and fatalities.
## Create "population harm" variable that is the sum of injuries and fatalities
stormdata$POPHARM <- stormdata$FATALITIES + stormdata$INJURIES
## Aggregate population harm by event type
popharm <- aggregate(POPHARM ~ EVTYPE, data=stormdata, FUN=sum)
## Aggregate fatalities by event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=stormdata, FUN=sum)
## Aggregate injuries by event type
injuries <- aggregate(INJURIES ~ EVTYPE, data=stormdata, FUN=sum)
## Aggregate total damages by event type
totldamage <- aggregate(TOTLDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)
## Aggregate property damages by event type
propdamage <- aggregate(PROPDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)
## Aggregate crop damages by event type
cropdamage <- aggregate(CROPDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)
To start, basic descriptive statistics (minimum, 1st quartile, median, mean, 3rd quartile, maximum) for the numeric variables are produced. The basic statistics give a high-level view of the scope of the consequences of the events contained in the database.
## Calculate basic descriptive statistics
summary(stormdata$POPHARM)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1725 0.0000 1742.0000
summary(stormdata$FATALITIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.0168 0.0000 583.0000
summary(stormdata$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1557 0.0000 1700.0000
summary(stormdata$TOTLDMGVAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 5.29e+05 1.00e+03 1.15e+11
summary(stormdata$PROPDMGVAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 4.746e+05 5.000e+02 1.150e+11
summary(stormdata$CROPDMGVAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
Next, I explore how event type affects the population and economic consequences by examining the top 10 event types for each of the consequences calculated above.
## Determine top 10 event types for population harm
top10popharm <- popharm[order(popharm$POPHARM, decreasing=TRUE), ][1:10, ]
## Determine top 10 event types for fatalities
top10fatalities <- fatalities[order(fatalities$FATALITIES, decreasing=TRUE), ][1:10, ]
## Determine top 10 event types for injuries
top10injuries <- injuries[order(injuries$INJURIES, decreasing=TRUE), ][1:10, ]
## Determine top 10 event types for total damage
top10totldamage <- totldamage[order(totldamage$TOTLDMGVAL, decreasing=TRUE), ][1:10, ]
## Determine top 10 event types for property damage
top10propdamage <- propdamage[order(propdamage$PROPDMGVAL, decreasing=TRUE), ][1:10, ]
## Determine top 10 event types for crop damage
top10cropdamage <- cropdamage[order(cropdamage$CROPDMGVAL, decreasing=TRUE), ][1:10, ]
## Display top 10 for each
top10popharm
## EVTYPE POPHARM
## 823 TORNADO 96979
## 821 THUNDERSTORM WINDS 10097
## 103 EXCESSIVE HEAT 8428
## 141 FLOOD 7259
## 820 LIGHTNING 6047
## 217 HEAT 3037
## 125 FLASH FLOOD 2755
## 352 ICE STORM 2064
## 826 WINTER STORMS 1554
## 286 HIGH WIND 1385
top10fatalities
## EVTYPE FATALITIES
## 823 TORNADO 5633
## 103 EXCESSIVE HEAT 1903
## 125 FLASH FLOOD 978
## 217 HEAT 937
## 820 LIGHTNING 817
## 821 THUNDERSTORM WINDS 702
## 141 FLOOD 470
## 486 RIP CURRENT 368
## 286 HIGH WIND 248
## 19 AVALANCHE 224
top10injuries
## EVTYPE INJURIES
## 823 TORNADO 91346
## 821 THUNDERSTORM WINDS 9395
## 141 FLOOD 6789
## 103 EXCESSIVE HEAT 6525
## 820 LIGHTNING 5230
## 217 HEAT 2100
## 352 ICE STORM 1975
## 125 FLASH FLOOD 1777
## 195 HAIL 1361
## 826 WINTER STORMS 1338
top10totldamage
## EVTYPE TOTLDMGVAL
## 141 FLOOD 150319678257
## 338 HURRICANE/TYPHOON 71913712800
## 823 TORNADO 57362335387
## 561 STORM SURGE 43323541000
## 195 HAIL 18761221986
## 125 FLASH FLOOD 18243991079
## 73 DROUGHT 15018672000
## 329 HURRICANE 14610229010
## 821 THUNDERSTORM WINDS 11075628264
## 491 RIVER FLOOD 10148404500
top10propdamage
## EVTYPE PROPDMGVAL
## 141 FLOOD 144657709807
## 338 HURRICANE/TYPHOON 69305840000
## 823 TORNADO 56947381117
## 561 STORM SURGE 43323536000
## 125 FLASH FLOOD 16822673979
## 195 HAIL 15735267513
## 329 HURRICANE 11868319010
## 821 THUNDERSTORM WINDS 9916049526
## 705 TROPICAL STORM 7703890550
## 826 WINTER STORMS 6688997251
top10cropdamage
## EVTYPE CROPDMGVAL
## 73 DROUGHT 13972566000
## 141 FLOOD 5661968450
## 491 RIVER FLOOD 5029459000
## 352 ICE STORM 5022113500
## 195 HAIL 3025954473
## 329 HURRICANE 2741910000
## 338 HURRICANE/TYPHOON 2607872800
## 125 FLASH FLOOD 1421317100
## 112 EXTREME COLD 1312973000
## 821 THUNDERSTORM WINDS 1159578738
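The aggregate-then-sort pattern repeated for each top 10 table could be wrapped in one helper. The following is a minimal base R sketch on toy data; the function top_events and the data frame toy are hypothetical and are not part of the analysis above.

```r
## Sum one numeric column by EVTYPE and return the n largest totals
top_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  head(totals[order(totals[[value_col]], decreasing = TRUE), ], n)
}

## Toy data standing in for stormdata
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(3, 1, 2, 4))

## Top 2 event types by total fatalities: TORNADO (5), then HEAT (4)
top_events(toy, "FATALITIES", n = 2)
```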
Many of the same event types appear on the top 10 for the various population and economic consequences.
To answer the questions posed for the assignment, I will take a look at plots of population and economic harm by event type.
## Plot fatalities
plotfatalities <- ggplot() + geom_bar(data = top10fatalities[1:5,], aes(x = EVTYPE, y = FATALITIES,
fill = interaction(FATALITIES, EVTYPE)), stat = "identity", show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Fatalities") +
ylab("Number")
## Plot injuries
plotinjuries <- ggplot() + geom_bar(data = top10injuries[1:5,], aes(x = EVTYPE, y = INJURIES,
fill = interaction(INJURIES, EVTYPE)), stat = "identity", show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Injuries") +
ylab("Number")
## Display both plots in two panels
grid.arrange(plotfatalities, plotinjuries, ncol=2, top="Top 5 event types impacting population health")
Tornadoes are the most harmful to population health, whether the harm is injury or death. Excessive heat and flooding are also very harmful for both types of harm.
## Plot property damage
plotproperty <- ggplot() + geom_bar(data = top10propdamage[1:5,], aes(x = EVTYPE, y = PROPDMGVAL,
fill = interaction(PROPDMGVAL, EVTYPE)), stat = "identity", show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Property damage") +
ylab("Damage amount")
## Plot crop damage
plotcrop <- ggplot() + geom_bar(data = top10cropdamage[1:5,], aes(x = EVTYPE, y = CROPDMGVAL,
fill = interaction(CROPDMGVAL, EVTYPE)), stat = "identity", show.legend = FALSE) +
theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Crop damage") +
ylab("Damage amount")
## Display both plots in two panels
grid.arrange(plotproperty, plotcrop, ncol=2, top="Top 5 event types causing economic loss")
As one would expect, property damage is largely caused by water: floods, hurricanes/typhoons, storm surge, and flash floods account for four of the top five event types. Similarly, it is intuitive that drought would cause the most crop damage, with floods, river floods, and ice storms also being prominent causes.
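One note on the bar charts: ggplot2 places the categories of a discrete x axis in factor-level order, not in order of bar height, so the bars above need not appear in rank order. The following is a minimal base R sketch of one way to reorder the levels by value before plotting. The data frame toy is hypothetical; its three values are property damage totals from the table above, rounded to billions of dollars.

```r
## Three property damage totals (billions of dollars, rounded), deliberately out of rank order
toy <- data.frame(EVTYPE = c("HAIL", "FLOOD", "TORNADO"),
                  PROPDMGVAL = c(15.7, 144.7, 56.9))

## reorder() sorts levels by increasing value, so negate to get largest first
toy$EVTYPE <- reorder(factor(toy$EVTYPE), -toy$PROPDMGVAL)

## Level order is now FLOOD, TORNADO, HAIL
levels(toy$EVTYPE)
```

Using the reordered factor as the x aesthetic would then draw the largest bar first.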