Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The storm database includes weather events from 1950 through 2011 and contains estimates such as the number of fatalities and injuries, as well as the economic cost of damage to property and crops, for each weather event.
The fatality and injury estimates were used to determine the weather events most harmful to population health. Property damage and crop damage cost estimates were used to determine the weather events with the greatest economic consequences.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site:
Storm Data [https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2]
Some documentation of the database is also available, describing how some of the variables are constructed and defined:
National Weather Service Storm Data Documentation [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf]
National Climatic Data Center Storm Events FAQ [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf]
knitr::opts_chunk$set(fig.path='Plots/')
library(data.table)
library(ggplot2)
library(R.utils)
library(knitr)
We first need to get the data: download the file, decompress it, and read it into R.
Download and decompress the data
# mode = "wb" ensures a binary transfer of the compressed file on Windows
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "stormData.csv.bz2", mode = "wb")
bunzip2("stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
Read the data into R
stormData <- read.csv("stormData.csv")
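Decompressing with bunzip2() should not be strictly necessary: read.csv() opens its input through file(), which detects bzip2-compressed files when reading. A minimal sketch on a tiny temporary file (the file and its contents are made up for illustration, not taken from the storm data):

```r
# Toy demonstration: write a tiny bzip2-compressed CSV, then read it back
# with read.csv() without decompressing it first.
tmp_file <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_file, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv() opens the path with file(), which recognises bzip2 compression
toy_csv <- read.csv(tmp_file)
print(toy_csv)
```

With the storm data this would mean calling read.csv("stormData.csv.bz2") directly and dropping the R.utils dependency; the trade-off is that the archive is decompressed again on every read, which is slower for a file of this size.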
Next, we take a quick look at the data.
dim(stormData)
## [1] 902297 37
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
head(stormData, 3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
tail(stormData, 3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 902295 2 11/8/2011 0:00:00 02:58:00 PM AKS 213 AKZ213 AK
## 902296 2 11/9/2011 0:00:00 10:21:00 AM AKS 202 AKZ202 AK
## 902297 1 11/28/2011 0:00:00 08:00:00 PM CST 6 ALZ006 AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME
## 902295 HIGH WIND 0 11/9/2011 0:00:00 01:15:00 PM
## 902296 BLIZZARD 0 11/9/2011 0:00:00 05:00:00 PM
## 902297 HEAVY SNOW 0 11/29/2011 0:00:00 04:00:00 AM
## COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG
## 902295 0 NA 0 0 0 NA 81
## 902296 0 NA 0 0 0 NA 0
## 902297 0 NA 0 0 0 NA 0
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO
## 902295 0 0 0 K 0 K AFG
## 902296 0 0 0 K 0 K AFG
## 902297 0 0 0 K 0 K HUN
## STATEOFFIC
## 902295 ALASKA, Northern
## 902296 ALASKA, Northern
## 902297 ALABAMA, North
## ZONENAMES LATITUDE
## 902295 ST LAWRENCE IS. BERING STRAIT - ST LAWRENCE IS. BERING STRAIT 0
## 902296 NORTHERN ARCTIC COAST - NORTHERN ARCTIC COAST 0
## 902297 MADISON - MADISON 0
## LONGITUDE LATITUDE_E LONGITUDE_
## 902295 0 0 0
## 902296 0 0 0
## 902297 0 0 0
## REMARKS
## 902295 EPISODE NARRATIVE: A 960 mb low over the southern Aleutians at 0300AKST on the 8th intensified to 945 mb near the Gulf of Anadyr by 2100AKST on the 8th. The low crossed the Chukotsk Peninsula as a 956 mb low at 0900AKST on the 9th, and moved into the southern Chukchi Sea as a 958 mb low by 2100AKST on the 9th. The low then tracked to the northwest and weakened to 975 mb about 150 miles north of Wrangel Island by 1500AKST on the 10th. The storm was one of the strongest storms to impact the west coast of Alaska since November 1974. \n\nZone 201: Blizzard conditions were observed at Wainwright from approximately 1153AKST through 1611AKST on the 9th. The visibility was frequently reduced to one quarter mile in snow and blowing snow. There was a peak wind gust to 43kt (50 mph) at the Wainwright ASOS. During this event, there was also a peak wind gust to \n68 kt (78 mph) at the Cape Lisburne AWOS. \n\nZone 202: Blizzard conditions were observed at Barrow from approximately 1021AKST through 1700AKST on the 9th. The visibility was frequently reduced to one quarter mile or less in blowing snow. There was a peak wind gust to 46 kt (53 mph) at the Barrow ASOS. \n\nZone 207: Blizzard conditions were observed at Kivalina from approximately 0400AKST through 1230AKST on the 9th. The visibility was frequently reduced to one quarter of a mile in snow and blowing snow. There was a peak wind gust to 61 kt (70 mph) at the Kivalina ASOS. The doors to the village transportation shed were blown out to sea. Many homes lost portions of their tin roofing, and satellite dishes were ripped off of roofs. One home had its door blown off. At Point Hope, severe blizzard conditions were observed. There was a peak wind gust of 68 kt (78 mph) at the Point Hope AWOS before power was lost to the AWOS. It was estimated that the wind gusted as high as 85 mph in the village during the height of the storm during the morning and early afternoon hours on the 9th. 
## Five power poles were knocked down in the storm EVENT NARRATIVE:
## 902296 EPISODE NARRATIVE: A 960 mb low over the southern Aleutians at 0300AKST on the 8th intensified to 945 mb near the Gulf of Anadyr by 2100AKST on the 8th. The low crossed the Chukotsk Peninsula as a 956 mb low at 0900AKST on the 9th, and moved into the southern Chukchi Sea as a 958 mb low by 2100AKST on the 9th. The low then tracked to the northwest and weakened to 975 mb about 150 miles north of Wrangel Island by 1500AKST on the 10th. The storm was one of the strongest storms to impact the west coast of Alaska since November 1974. \n\nZone 201: Blizzard conditions were observed at Wainwright from approximately 1153AKST through 1611AKST on the 9th. The visibility was frequently reduced to one quarter mile in snow and blowing snow. There was a peak wind gust to 43kt (50 mph) at the Wainwright ASOS. During this event, there was also a peak wind gust to \n68 kt (78 mph) at the Cape Lisburne AWOS. \n\nZone 202: Blizzard conditions were observed at Barrow from approximately 1021AKST through 1700AKST on the 9th. The visibility was frequently reduced to one quarter mile or less in blowing snow. There was a peak wind gust to 46 kt (53 mph) at the Barrow ASOS. \n\nZone 207: Blizzard conditions were observed at Kivalina from approximately 0400AKST through 1230AKST on the 9th. The visibility was frequently reduced to one quarter of a mile in snow and blowing snow. There was a peak wind gust to 61 kt (70 mph) at the Kivalina ASOS. The doors to the village transportation shed were blown out to sea. Many homes lost portions of their tin roofing, and satellite dishes were ripped off of roofs. One home had its door blown off. At Point Hope, severe blizzard conditions were observed. There was a peak wind gust of 68 kt (78 mph) at the Point Hope AWOS before power was lost to the AWOS. It was estimated that the wind gusted as high as 85 mph in the village during the height of the storm during the morning and early afternoon hours on the 9th. 
## Five power poles were knocked down in the storm EVENT NARRATIVE:
## 902297 EPISODE NARRATIVE: An intense upper level low developed on the 28th at the base of a highly amplified upper trough across the Great Lakes and Mississippi Valley. The upper low closed off over the mid South and tracked northeast across the Tennessee Valley during the morning of the 29th. A warm conveyor belt of heavy rainfall developed in advance of the low which dumped from around 2 to over 5 inches of rain across the eastern two thirds of north Alabama and middle Tennessee. The highest rain amounts were recorded in Jackson and DeKalb Counties with 3 to 5 inches. The rain fell over 24 to 36 hour period, with rainfall remaining light to moderate during most its duration. The rainfall resulted in minor river flooding along the Little River, Big Wills Creek and Paint Rock. A landslide occurred on Highway 35 just north of Section in Jackson County. A driver was trapped in his vehicle, but was rescued unharmed. Trees, boulders and debris blocked 100 to 250 yards of Highway 35.\n\nThe rain mixed with and changed to snow across north Alabama during the afternoon and evening hours of the 28th, and lasted into the 29th. The heaviest bursts of snow occurred in northwest Alabama during the afternoon and evening hours, and in north central and northeast Alabama during the overnight and morning hours. Since ground temperatures were in the 50s, and air temperatures in valley areas only dropped into the mid 30s, most of the snowfall melted on impact with mostly trace amounts reported in valley locations. However, above 1500 foot elevation, snow accumulations of 1 to 2 inches were reported. The heaviest amount was 2.3 inches on Monte Sano Mountain, about 5 miles northeast of Huntsville.EVENT NARRATIVE: Snowfall accumulations of up to 2.3 inches were reported on the higher elevations of eastern Madison County. A snow accumulation of 1.5 inches was reported 2.7 miles south of Gurley, while 2.3 inches was reported 3 miles east of Huntsville atop Monte Sano Mountain.
## REFNUM
## 902295 902295
## 902296 902296
## 902297 902297
Next, we subset the data to keep only the columns needed for the analysis, which makes it easier to work with:
BGN_DATE - Begin date of the event
EVTYPE - Event type
FATALITIES - Number of fatalities
INJURIES - Number of injuries
PROPDMG - Property damage amount
PROPDMGEXP - Multiplier code for the property damage amount (e.g. K, M, B)
CROPDMG - Crop damage amount
CROPDMGEXP - Multiplier code for the crop damage amount
keepCols <- c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
StormData_Used <- stormData[keepCols]
Parse the begin date and create a new column for the year
StormData_Used$Year <- as.numeric(format(as.Date(StormData_Used$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
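As a quick check that the format string matches the BGN_DATE values, the sketch below parses two dates copied from the head() and tail() output above. as.Date() passes the format to strptime(), which ignores trailing characters once the format is satisfied, so the shorter format "%m/%d/%Y" should give the same years.

```r
# Two BGN_DATE values taken from the head() and tail() output above
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Full format, as used in the analysis
years_full <- as.numeric(format(as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Date-only format: the trailing time component is ignored by strptime()
years_short <- as.numeric(format(as.Date(sample_dates, format = "%m/%d/%Y"), "%Y"))

print(years_full)
print(years_short)
```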
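Because the damage amounts are recorded with different multipliers (the PROPDMGEXP and CROPDMGEXP codes, e.g. K for thousands, M for millions, B for billions), we convert both to plain US dollars. Letter codes are mapped to powers of ten, digits are treated as the exponent itself, and any other code ("", "+", "-", "?") is treated as a multiplier of 1.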
unique(StormData_Used$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
StormData_Used$PROPDMGEXP <- as.character(StormData_Used$PROPDMGEXP)
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'H'] <- "2"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'K'] <- "3"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'M'] <- "6"
StormData_Used$PROPDMGEXP[toupper(StormData_Used$PROPDMGEXP) == 'B'] <- "9"
StormData_Used$PROPDMGEXP <- as.numeric(StormData_Used$PROPDMGEXP)
## Warning: NAs introduced by coercion
StormData_Used$PROPDMGEXP[is.na(StormData_Used$PROPDMGEXP)] <- 0
StormData_Used$TOTALPROPDMG <- StormData_Used$PROPDMG * 10^StormData_Used$PROPDMGEXP
Now we do the same for crop damage.
unique(StormData_Used$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
StormData_Used$CROPDMGEXP <- as.character(StormData_Used$CROPDMGEXP)
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'H'] <- "2"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'K'] <- "3"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'M'] <- "6"
StormData_Used$CROPDMGEXP[toupper(StormData_Used$CROPDMGEXP) == 'B'] <- "9"
StormData_Used$CROPDMGEXP <- as.numeric(StormData_Used$CROPDMGEXP)
## Warning: NAs introduced by coercion
StormData_Used$CROPDMGEXP[is.na(StormData_Used$CROPDMGEXP)] <- 0
StormData_Used$TOTALCROPDMG <- StormData_Used$CROPDMG * 10^StormData_Used$CROPDMGEXP
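The property and crop blocks repeat the same recoding. A minimal sketch of a reusable alternative, using a hypothetical helper exp_to_multiplier() (not part of the original analysis) that follows the same rules: H/K/M/B as powers of ten, digits as exponents, and anything else as a multiplier of 1.

```r
# Hypothetical helper (not part of the original analysis): turn a damage
# exponent code into a multiplier using the same rules as the code above.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  # Letter codes: hundreds, thousands, millions, billions
  code_map <- c(H = 2, K = 3, M = 6, B = 9)
  exponent <- ifelse(code %in% names(code_map),
                     code_map[code],
                     # Digits are read as the exponent itself; "", "+", "-"
                     # and "?" become NA here (coercion warning suppressed)
                     suppressWarnings(as.numeric(code)))
  # Unrecognised codes get exponent 0, i.e. a multiplier of 1
  exponent[is.na(exponent)] <- 0
  10^exponent
}

multipliers <- exp_to_multiplier(c("K", "m", "", "+", "5", "h", "B"))
print(multipliers)
```

Applied to the data, StormData_Used$PROPDMG * exp_to_multiplier(StormData_Used$PROPDMGEXP) should reproduce TOTALPROPDMG. Because numbers are converted back to character first, this should hold whether PROPDMGEXP still holds the raw codes or the recoded numeric exponents.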
Now let’s look at the new dataset after these transformations.
head(StormData_Used, 3)
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 4/18/1950 0:00:00 TORNADO 0 15 25.0 3 0
## 2 4/18/1950 0:00:00 TORNADO 0 0 2.5 3 0
## 3 2/20/1951 0:00:00 TORNADO 0 2 25.0 3 0
## CROPDMGEXP Year TOTALPROPDMG TOTALCROPDMG
## 1 0 1950 25000 0
## 2 0 1950 2500 0
## 3 0 1951 25000 0
To assess the impact on public health, we plot both the fatalities and the injuries caused by severe weather events.
To do this, we find the ten event types with the highest total for each measure, combine the two tables into one dataset, and plot the result.
TotFatalities <- aggregate(StormData_Used$FATALITIES, by = list(StormData_Used$EVTYPE), "sum")
names(TotFatalities) <- c("Event", "Fatalities")
TotFatalities <- TotFatalities[order(-TotFatalities$Fatalities), ][1:10, ]
TotFatalities
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
TotInjuries <- aggregate(StormData_Used$INJURIES, by = list(StormData_Used$EVTYPE), "sum")
names(TotInjuries) <- c("Event", "Injuries")
TotInjuries <- TotInjuries[order(-TotInjuries$Injuries), ][1:10, ]
TotInjuries
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
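The fatality and injury tables (and the damage tables below) are built with the same three steps: aggregate, sort, and keep the top ten. A sketch of a hypothetical helper, top_events(), that wraps them, shown on toy data; head() is used instead of [1:10, ] so that a table with fewer than n event types is not padded with NA rows.

```r
# Hypothetical helper: total one column by event type and keep the n largest.
top_events <- function(df, value_col, value_name, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(Event = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_name
  # Sort in decreasing order of the total
  totals <- totals[order(-totals[[value_name]]), ]
  # head() returns fewer rows, rather than NA rows, when there are < n events
  head(totals, n)
}

# Toy data for illustration only
toy_events <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                         FATALITIES = c(3, 1, 2, 4))
top2 <- top_events(toy_events, "FATALITIES", "Fatalities", n = 2)
print(top2)
```

With the real data, top_events(StormData_Used, "FATALITIES", "Fatalities") should give the same table as TotFatalities above, and the same call pattern applies to INJURIES, TOTALPROPDMG and TOTALCROPDMG.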
Merge the two tables and reshape them to long format for plotting. Converting to a data.table first lets melt() use the data.table method directly instead of the deprecated redirection to reshape2.
TotHealthDamage <- merge(x = TotFatalities, y = TotInjuries, by = "Event", all = TRUE)
TotHealthDamage <- melt(as.data.table(TotHealthDamage), id.vars = "Event")
Plot the data
ggplot(TotHealthDamage, aes(Event, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Event Type") +
ylab("Number of people") + ggtitle("Fatalities and injuries by event type")
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_bar()`).
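The warning refers to the six missing values for event types that appear in only one of the two top-ten lists. In conclusion, tornadoes are by far the most harmful event type for population health in the US, causing the most fatalities (5,633) and the most injuries (91,346).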
To assess the impact on the economy, we plot both the property and the crop damage caused by severe weather events.
As before, we find the ten event types with the highest total for each measure, combine the two tables, and plot the result.
What are the top ten causes of property damage?
TotPropDmg <- aggregate(StormData_Used$TOTALPROPDMG, by = list(StormData_Used$EVTYPE), "sum")
names(TotPropDmg) <- c("Event", "Prop_Cost")
TotPropDmg <- TotPropDmg[order(-TotPropDmg$Prop_Cost), ][1:10, ]
TotPropDmg
## Event Prop_Cost
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
What are the top ten causes of crop damage?
TotCropDmg <- aggregate(StormData_Used$TOTALCROPDMG, by = list(StormData_Used$EVTYPE), "sum")
names(TotCropDmg) <- c("Event", "Crop_Cost")
TotCropDmg <- TotCropDmg[order(-TotCropDmg$Crop_Cost), ][1:10, ]
TotCropDmg
## Event Crop_Cost
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
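Merging the two top-ten tables leaves NA for an event type that appears in only one list, so the merged table cannot be summed directly to compare total economic damage. A more direct way is to add property and crop damage for every event type before ranking. A sketch with a hypothetical combined_damage() function, assuming a data frame with EVTYPE, TOTALPROPDMG and TOTALCROPDMG columns like StormData_Used, demonstrated on toy values:

```r
# Hypothetical helper: rank event types by property + crop damage (USD).
# Assumes columns EVTYPE, TOTALPROPDMG and TOTALCROPDMG, as in StormData_Used.
combined_damage <- function(df, n = 10) {
  # Sum both damage columns per event type in one pass
  totals <- aggregate(cbind(TOTALPROPDMG, TOTALCROPDMG) ~ EVTYPE,
                      data = df, FUN = sum)
  totals$TOTALDMG <- totals$TOTALPROPDMG + totals$TOTALCROPDMG
  head(totals[order(-totals$TOTALDMG), ], n)
}

# Toy values for illustration only, not taken from the storm data
toy_damage <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
                         TOTALPROPDMG = c(100, 0, 50, 20),
                         TOTALCROPDMG = c(5, 60, 0, 10))
ranked <- combined_damage(toy_damage)
print(ranked)
```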
Let’s merge the two tables and reshape them to long format, again converting to a data.table before calling melt().
TotEcoDamage <- merge(x = TotPropDmg, y = TotCropDmg, by = "Event", all = TRUE)
TotEcoDamage <- melt(as.data.table(TotEcoDamage), id.vars = "Event")
Now let’s plot the data.
ggplot(TotEcoDamage, aes(Event, value)) +
geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Event Type") +
ylab("Damage, USD") + ggtitle("Property and crop damage by event type")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_bar()`).
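The warning again refers to event types that appear in only one of the two top-ten lists (ten missing values). In conclusion, floods cause the greatest property damage (about $144.7 billion) and, together with their crop damage (about $5.7 billion), the greatest total economic damage of any event type. Drought, however, is the leading cause of crop damage (about $14.0 billion).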