The aim of this project is to explore the International Best Track Archive for Climate Stewardship (IBTrACS) data set. Throughout the project, I address several claims made about the hurricane data and use a variety of data visualization methods to support or refute each one.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
#install.packages(c("maps", "rnaturalearth"))
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(rnaturalearth)
#install.packages("gganimate")
library(gganimate)
Before importing the CSV file, I created a vector of the column names that the assignment required us to import, from SID to LANDFALL. I then created a second vector giving the data type of each column.
col_names <- c('SID', 'SEASON', 'NUMBER', 'BASIN', 'SUBBASIN', 'NAME', 'ISO_TIME', 'NATURE', 'LAT', 'LON', 'WMO_WIND', 'WMO_PRES', 'WMO_AGENCY', 'TRACK_TYPE', 'DIST2LAND', 'LANDFALL')
col_class <- c('character', 'integer', 'integer', 'character', 'character', 'character', 'character', 'character', 'numeric', 'numeric', 'integer', 'integer', 'character', 'character', 'integer', 'integer')
Then I imported the CSV file, using the sample code as a template. The IBTrACS version 4 website states that missing values are encoded as blank cells, so I set na.strings to a single space. The colClasses argument keeps the first 16 columns and drops the remaining 147 by marking them as "NULL". Finally, I renamed the columns using the col_names vector.
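ibtracs <- read.table(
  file = 'ibtracs.NA.list.v04r00.csv',
  sep = ",",
  colClasses = c(col_class, rep("NULL", 147)),  # read 16 columns, drop the other 147
  skip = 77876,                                 # skip the header and the earlier records
  na.strings = ' '                              # blank cells mark missing values
)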
colnames(ibtracs) <- col_names
head(ibtracs, 5)
## SID SEASON NUMBER BASIN SUBBASIN NAME ISO_TIME NATURE
## 1 1969326N10279 1969 113 NA NA MARTHA 1969-11-25 12:00:00 TS
## 2 1970138N12281 1970 43 NA CS ALMA 1970-05-17 18:00:00 TS
## 3 1970138N12281 1970 43 NA CS ALMA 1970-05-17 21:00:00 TS
## 4 1970138N12281 1970 43 NA CS ALMA 1970-05-18 00:00:00 TS
## 5 1970138N12281 1970 43 NA CS ALMA 1970-05-18 03:00:00 TS
## LAT LON WMO_WIND WMO_PRES WMO_AGENCY TRACK_TYPE DIST2LAND LANDFALL
## 1 8.500 -82.0000 25 NA hurdat_atl main 0 NA
## 2 11.500 -79.0000 25 NA hurdat_atl main 224 224
## 3 11.575 -79.0676 NA NA <NA> main 235 235
## 4 11.700 -79.2000 25 NA hurdat_atl main 246 246
## 5 11.900 -79.4349 NA NA <NA> main 268 268
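Setting na.strings explicitly matters for a second reason: IBTrACS uses the two-letter code NA for the North Atlantic basin, and read.table() treats the string "NA" as missing by default. The following is a minimal sketch on two made-up rows (the inline text and the three column names are illustrative, not taken from the file) that compares the default with the setting used above.

```r
# Two illustrative rows: SEASON, BASIN, SUBBASIN (values made up for the demo)
csv_text <- "1970,NA,CS\n2020,NA,GM"
classes  <- c("integer", "character", "character")
columns  <- c("SEASON", "BASIN", "SUBBASIN")

# Default na.strings = "NA": the basin code "NA" is read as a missing value
default_read <- read.table(text = csv_text, sep = ",",
                           colClasses = classes, col.names = columns)

# na.strings = " " (as in the import above): "NA" survives as an ordinary string
kept_read <- read.table(text = csv_text, sep = ",",
                        colClasses = classes, col.names = columns,
                        na.strings = " ")

print(is.na(default_read$BASIN))
print(kept_read$BASIN)
```

This is why str() can report BASIN as the character value "NA" rather than a missing value.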
I followed the given code to add the MONTH column and to display the structure of my imported data.
ibtracs$ISO_TIME = as.POSIXct(ibtracs$ISO_TIME, tz = "UTC")  # ISO_TIME is recorded in UTC
ibtracs$MONTH <- lubridate::month(ibtracs$ISO_TIME)
str(ibtracs, vec.len = 1)
## 'data.frame': 46264 obs. of 17 variables:
## $ SID : chr "1969326N10279" ...
## $ SEASON : int 1969 1970 ...
## $ NUMBER : int 113 43 ...
## $ BASIN : chr "NA" ...
## $ SUBBASIN : chr "NA" ...
## $ NAME : chr "MARTHA" ...
## $ ISO_TIME : POSIXct, format: "1969-11-25 12:00:00" ...
## $ NATURE : chr "TS" ...
## $ LAT : num 8.5 11.5 ...
## $ LON : num -82 -79 ...
## $ WMO_WIND : int 25 25 ...
## $ WMO_PRES : int NA NA ...
## $ WMO_AGENCY: chr "hurdat_atl" ...
## $ TRACK_TYPE: chr "main" ...
## $ DIST2LAND : int 0 224 ...
## $ LANDFALL : int NA 224 ...
## $ MONTH : num 11 5 ...
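As a small illustration of the month extraction, here is a base R sketch that parses timestamps in the ISO_TIME layout as UTC and pulls out the month. The two timestamps are chosen for the demo; format() with "%m" is used here as a base R counterpart to lubridate::month().

```r
# Illustrative timestamps in the ISO_TIME layout (values chosen for the demo)
iso_time <- c("1969-11-25 12:00:00", "2020-05-31 23:00:00")

# Parse as UTC explicitly so the result does not depend on the machine's time zone
parsed <- as.POSIXct(iso_time, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")

# Extract the month as an integer
month_num <- as.integer(format(parsed, "%m"))
print(month_num)
```

Fixing the time zone keeps late-evening records near a month boundary from shifting into the neighbouring month on machines set to another zone.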
Throughout my exploration, I used dplyr and ggplot2 functions to address the claims.
I first filtered the ibtracs table to the 2020 season, then added conditions intended to keep only Atlantic tropical cyclones: the BASIN should be the North Atlantic (NA) or the South Atlantic (SA), the SUBBASIN may be the Caribbean Sea (CS) or the Gulf of Mexico (GM), and the NATURE should be Tropical (TS) or Subtropical (SS) so that the data reflect tropical cyclones.
Since the Atlantic hurricane season runs from June to November, I summarized the filtered table to show each storm's starting ISO_TIME (the record timestamp in Coordinated Universal Time, UTC) by grouping by NAME and taking the earliest ISO_TIME for each storm.
From the full table, I confirmed that 31 storms formed in 2020, only one of which was unnamed; it appears in the table as “NOT_NAMED”. With the storms2020start table arranged in ascending order, you can see that two storms, “ARTHUR” and “BERTHA”, were first reported in May and thus formed before the season began.
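filter2020 = filter(ibtracs, SEASON == 2020)
# Caution: `&` binds more tightly than `|`, so R reads the condition below as
# BASIN == 'NA' | (BASIN == "SA" & NATURE == 'TS') | NATURE == 'SS' | SUBBASIN == 'CS' | SUBBASIN == 'GM'
tropical_2020 = filter(filter2020, BASIN == 'NA' | BASIN == "SA" & NATURE == 'TS' | NATURE == 'SS' | SUBBASIN == 'CS' | SUBBASIN == 'GM')
# Earliest record per storm, sorted by start time
storms2020start = arrange(summarise(group_by(tropical_2020, NAME), start_time = min(ISO_TIME)), start_time)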
head(storms2020start, 10)
## # A tibble: 10 × 2
## NAME start_time
## <chr> <dttm>
## 1 ARTHUR 2020-05-16 18:00:00
## 2 BERTHA 2020-05-27 06:00:00
## 3 CRISTOBAL 2020-06-01 18:00:00
## 4 DOLLY 2020-06-22 06:00:00
## 5 EDOUARD 2020-07-04 06:00:00
## 6 FAY 2020-07-05 12:00:00
## 7 GONZALO 2020-07-20 12:00:00
## 8 HANNA 2020-07-23 00:00:00
## 9 ISAIAS 2020-07-28 12:00:00
## 10 NOT_NAMED 2020-07-29 18:00:00
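Because `&` is evaluated before `|` in R, a chain of conditions like the one above does not group the way the prose describes. The sketch below uses made-up rows to contrast the unparenthesised condition with an explicitly grouped one, and then finds the earliest record per storm. The grouped version (Atlantic basin and tropical or subtropical nature) is my assumption about the intended logic, and the toy data frame is hypothetical.

```r
library(dplyr)

# Toy rows (made up): storm A is tropical, B is extratropical, C is outside the Atlantic
toy <- data.frame(
  NAME     = c("A", "A", "B", "C"),
  BASIN    = c("NA", "NA", "NA", "EP"),
  NATURE   = c("TS", "TS", "ET", "SS"),
  ISO_TIME = as.POSIXct(c("2020-05-16 18:00:00", "2020-05-17 00:00:00",
                          "2020-06-01 00:00:00", "2020-07-01 00:00:00"), tz = "UTC")
)

# Unparenthesised: parsed as BASIN == "NA" | (BASIN == "SA" & NATURE == "TS") | NATURE == "SS"
loose <- filter(toy, BASIN == "NA" | BASIN == "SA" & NATURE == "TS" | NATURE == "SS")

# Explicit grouping with %in%: Atlantic basin AND (sub)tropical nature
strict <- filter(toy, BASIN %in% c("NA", "SA"), NATURE %in% c("TS", "SS"))

# Earliest record per storm, sorted by start time
starts <- strict %>%
  group_by(NAME) %>%
  summarise(start_time = min(ISO_TIME)) %>%
  arrange(start_time)

print(nrow(loose))
print(starts)
```

In the toy data the loose condition keeps all four rows, while the grouped condition keeps only storm A.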
I started from filter2020 in the previous cell and built atlantic_2020, intended to hold the storms in the North Atlantic and South Atlantic BASINs, including the Caribbean Sea and Gulf of Mexico SUBBASINs.
An important point is that, according to the IBTrACS data dictionary, WMO_WIND is given in knots, not mph. The 111 mph threshold for a Category 3 hurricane (the point at which a hurricane counts as a “major hurricane”) therefore becomes 96 knots. The same goes for Category 5: the hurricane must reach at least 157 mph, which corresponds to 137 knots.
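# As written, `&` is evaluated before `|`: BASIN == 'NA' | (BASIN == "SA" & SUBBASIN == 'CS') | SUBBASIN == 'GM'
atlantic_2020 = filter(filter2020, BASIN == 'NA' | BASIN == "SA" & SUBBASIN == 'CS' | SUBBASIN == 'GM')
# Category 3 or above: at least 96 knots, with no upper bound so that Category 5 records also count
cat3_or_above = filter(atlantic_2020, WMO_WIND >= 96)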
distinct(cat3_or_above, NAME)
## NAME
## 1 LAURA
## 2 TEDDY
## 3 DELTA
## 4 EPSILON
## 5 ZETA
## 6 ETA
## 7 IOTA
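The unit conversion and the category thresholds can be sketched in a few lines of base R. The helper names mph_to_knots and wind_category are hypothetical, and the knot bounds (64, 83, 96, 113, 137) are the values commonly tabulated for the Saffir-Simpson scale, used here as an assumption rather than something taken from the IBTrACS file.

```r
# Convert statute miles per hour to knots (1 knot = 1.150779 mph)
mph_to_knots <- function(mph) mph / 1.150779

# Assumed Saffir-Simpson lower bounds in knots for Categories 1 to 5
cat_breaks <- c(64, 83, 96, 113, 137, Inf)

# Classify sustained winds (knots); values below 64 kt return NA (not a hurricane)
wind_category <- function(knots) {
  as.integer(cut(knots, breaks = cat_breaks, labels = 1:5, right = FALSE))
}

print(round(mph_to_knots(c(111, 157)), 1))
print(wind_category(c(60, 64, 95, 96, 134, 137)))
```

Converting the mph figures directly gives values slightly different from the tabulated knot bounds, so it seems safer to filter on the knot thresholds themselves.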
I created a faceted scatterplot (scatter_2020_mjr) of the 7 major storms in 2020 to show how each storm's wind speed progressed over the 2020 storm season.
mjr_WMO_tracked = filter(atlantic_2020, NAME %in% c('DELTA', 'EPSILON', 'ETA', 'IOTA', 'LAURA', 'TEDDY', 'ZETA'))
scatter_2020_mjr = ggplot(data = mjr_WMO_tracked, aes(x = ISO_TIME, y = WMO_WIND)) +
  geom_point(size = .5) +
  facet_wrap(~NAME) +
  ggtitle("Wind Speed for the 7 Major Storms in 2020") +
  theme_bw()
scatter_2020_mjr
## Warning: Removed 250 rows containing missing values (geom_point).
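The warning comes from records without a reported WMO_WIND, such as the 21:00 and 03:00 rows in the head() output earlier. A minimal base R sketch, using made-up wind values, of dropping those rows explicitly and taking the peak reported wind per storm:

```r
# Made-up track records; NA marks rows without a reported WMO wind
tracks <- data.frame(
  NAME     = c("LAURA", "LAURA", "LAURA", "TEDDY", "TEDDY"),
  WMO_WIND = c(100L, NA, 130L, NA, 120L)
)

# Keep only rows with a reported wind before plotting or summarising
reported <- tracks[!is.na(tracks$WMO_WIND), ]

# Peak reported wind per storm
peak <- tapply(reported$WMO_WIND, reported$NAME, max)

print(nrow(tracks) - nrow(reported))
print(peak)
```

Filtering before plotting makes the number of dropped rows explicit instead of leaving it to the ggplot2 warning.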
Lastly, to see whether any of the storms reached Category 5 status, I filtered mjr_WMO_tracked for records at or above the Category 5 threshold of 137 knots and found that no hurricane reached it.
filter(mjr_WMO_tracked, WMO_WIND >= 137)
## [1] SID SEASON NUMBER BASIN SUBBASIN NAME
## [7] ISO_TIME NATURE LAT LON WMO_WIND WMO_PRES
## [13] WMO_AGENCY TRACK_TYPE DIST2LAND LANDFALL MONTH
## <0 rows> (or 0-length row.names)
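# Caution: R parses this as (SEASON == 2010 & BASIN == 'NA') | (BASIN == 'SA' & SUBBASIN == 'CS') | SUBBASIN == 'GM', so Gulf of Mexico rows from every season are kept (note the 1970 rows below)
hurricanes_2010 = filter(ibtracs, SEASON == 2010 & BASIN == 'NA' | BASIN == 'SA' & SUBBASIN == 'CS' | SUBBASIN == 'GM')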
head(hurricanes_2010, 5)
## SID SEASON NUMBER BASIN SUBBASIN NAME ISO_TIME NATURE
## 1 1970138N12281 1970 43 NA GM ALMA 1970-05-24 00:00:00 TS
## 2 1970138N12281 1970 43 NA GM ALMA 1970-05-24 03:00:00 TS
## 3 1970138N12281 1970 43 NA GM ALMA 1970-05-24 06:00:00 TS
## 4 1970138N12281 1970 43 NA GM ALMA 1970-05-24 09:00:00 TS
## 5 1970138N12281 1970 43 NA GM ALMA 1970-05-24 12:00:00 TS
## LAT LON WMO_WIND WMO_PRES WMO_AGENCY TRACK_TYPE DIST2LAND LANDFALL
## 1 23.0000 -84.0000 25 NA hurdat_atl main 33 33
## 2 23.4925 -84.0000 NA NA <NA> main 84 84
## 3 24.0000 -84.0000 25 NA hurdat_atl main 137 137
## 4 24.5550 -84.0074 NA NA <NA> main 199 199
## 5 25.2000 -84.0000 25 1008 hurdat_atl main 242 196
## MONTH
## 1 5
## 2 5
## 3 5
## 4 5
## 5 5
Then I filtered hurricanes_2010 by the approximate LAT and LON bounds of the United States to see whether any of the storms passed through that region. I looked at a map of the United States with latitude and longitude lines to choose the bounds. For latitude, the top of the United States lies near 50°N and the bottom close to 30°N; to account for Florida and the southern tip of Texas, I lowered the southern bound to 20°N. For longitude, I applied the same logic so that the bounds reach from Maine to the West Coast.
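# Caution: %in% matches only exact integer values, and western longitudes are negative in this data, so this filter finds no rows even for storms inside the box
filter(hurricanes_2010, LAT %in% 20:50 & LON %in% 60:130) # bounds for LAT and LON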
## [1] SID SEASON NUMBER BASIN SUBBASIN NAME
## [7] ISO_TIME NATURE LAT LON WMO_WIND WMO_PRES
## [13] WMO_AGENCY TRACK_TYPE DIST2LAND LANDFALL MONTH
## <0 rows> (or 0-length row.names)
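A range test is better suited to fractional coordinates than `%in%`. The sketch below uses three made-up positions and dplyr's between() with negative (western) longitudes. The box limits are the approximate bounds discussed above with the sign of the longitudes flipped, which is my assumption about how the box should be expressed in this data.

```r
library(dplyr)

# Made-up positions: western longitudes are negative
pts <- data.frame(
  NAME = c("IN_BOX", "SOUTH_OF_BOX", "EAST_OF_BOX"),
  LAT  = c(29.3, 15.0, 35.2),
  LON  = c(-94.7, -80.0, -40.5)
)

# %in% 20:50 only matches the exact integers 20, 21, ..., 50
exact_match <- filter(pts, LAT %in% 20:50 & LON %in% 60:130)

# between() tests a closed numeric range; the box uses negative longitudes
in_box <- filter(pts, between(LAT, 20, 50), between(LON, -130, -60))

print(nrow(exact_match))
print(in_box$NAME)
```

Even with a working range test, a rectangular box is only a rough proxy for the United States, since it also covers open ocean and parts of neighbouring countries.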
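I then made an animated map from hurricanes_2010 to follow the course of the storms. On the map, the storms came quite close to the Gulf coast between Texas and Louisiana but did not appear to hit the United States directly. Since hurricanes_2010 also contains Gulf of Mexico records from other seasons (see the 1970 rows in the head() output above) and the bounding-box filter cannot match fractional or negative coordinates, this conclusion is tentative.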
world_df = ne_countries(scale = "medium", returnclass = "sf")
class(world_df)
## [1] "sf" "data.frame"
worldcanvas = ggplot(data = world_df) +
  geom_sf() +
  coord_sf(xlim = c(-150, 0), ylim = c(0, 90), expand = TRUE) +
  theme_bw()
# A constant colour belongs outside aes(); inside aes() the string 'red' is treated as a data value
map_storms = worldcanvas +
  geom_path(data = hurricanes_2010,
            aes(x = LON, y = LAT),
            color = 'red', lineend = "round", size = .2, alpha = 0.8)
animated_2010 = worldcanvas +
  geom_point(data = hurricanes_2010, aes(x = LON, y = LAT), size = .3) +
  transition_states(NAME,
                    transition_length = 2,
                    state_length = 1) +
  ggtitle("Storms in 2010")
animated_2010
anim_save("Storms in 2010.gif", animation = last_animation())
named2005 = filter(ibtracs, SEASON == 2005)
count2005 = arrange(summarise(group_by(named2005, NAME)))
count2005
## # A tibble: 28 × 1
## NAME
## <chr>
## 1 ALPHA
## 2 ARLENE
## 3 BETA
## 4 BRET
## 5 CINDY
## 6 DELTA
## 7 DENNIS
## 8 EMILY
## 9 EPSILON
## 10 FRANKLIN
## # … with 18 more rows
To count the major storms, I filtered for records of at least 96 knots, the minimum for Category 3, and restricted them to the 2005 season. From this, I got the names of the 7 storms that became major hurricanes that year.
hurricanes2005 = filter(ibtracs, WMO_WIND >= 96 & SEASON == 2005)
distinct(hurricanes2005, NAME)
## NAME
## 1 DENNIS
## 2 EMILY
## 3 KATRINA
## 4 MARIA
## 5 RITA
## 6 WILMA
## 7 BETA
Then I filtered all of ibtracs with the same major-hurricane threshold of at least 96 knots. I found that while 2005 did have a total of 7 major hurricanes, so did 2020, so the two seasons tie for the most major hurricanes.
filtered_storms = filter(ibtracs, WMO_WIND >= 96)
major_storms = count(
  distinct(
    group_by(
      select(filtered_storms, SEASON, NAME), SEASON
    )
  )
)
head(arrange(major_storms, desc(n)), 5)
## # A tibble: 5 × 2
## # Groups: SEASON [5]
## SEASON n
## <int> <int>
## 1 2005 7
## 2 2020 7
## 3 1996 6
## 4 2004 6
## 5 2017 6
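The same per-season count can be written as a pipeline with n_distinct(). This is a sketch on made-up records rather than the code used above, and it counts by SID instead of NAME as an illustration of an alternative key.

```r
library(dplyr)

# Made-up records: SID identifies a storm, and each storm has several rows
toy <- data.frame(
  SID      = c("s1", "s1", "s2", "s3", "s3", "s4"),
  SEASON   = c(2005L, 2005L, 2005L, 2020L, 2020L, 2020L),
  WMO_WIND = c(90L, 100L, 120L, 96L, 110L, 80L)
)

# Count each storm once per season if any of its records reaches 96 kt
major_per_season <- toy %>%
  filter(WMO_WIND >= 96) %>%
  group_by(SEASON) %>%
  summarise(n = n_distinct(SID)) %>%
  arrange(desc(n), SEASON)

print(major_per_season)
```

Here storm s4 never reaches 96 kt, so the toy 2020 season counts only s3.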
“Active” refers to the season with the most tropical cyclones. I filtered ibtracs for the Atlantic BASINs and for tropical or subtropical NATURE, counted the distinct storm names per year (SEASON in this case), and arranged the counts in descending order so that the seasons run from most to fewest tropical cyclones. From this, I found that 2020 was the most active season on record with 31 storms, with 2005 a close second at 28. Note that the table also includes 2021, so the data extend past the 1970-2020 window. One takeaway is that this result is consistent with warmer water temperatures under climate change producing more tropical cyclones, although these counts alone cannot establish that link.
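# As written, `&` is evaluated before `|`: BASIN == 'NA' | (BASIN == 'SA' & NATURE == 'TS') | NATURE == "SS"
atlantic_hurricanes = filter(ibtracs, BASIN == 'NA' | BASIN == 'SA' & NATURE == 'TS' | NATURE == "SS")
tropical_storm = count(
  distinct(
    group_by(
      select(atlantic_hurricanes, SEASON, NAME), SEASON
    )
  )
)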
head(arrange(tropical_storm, desc(n)), 5)
## # A tibble: 5 × 2
## # Groups: SEASON [5]
## SEASON n
## <int> <int>
## 1 2020 31
## 2 2005 28
## 3 2021 22
## 4 1995 20
## 5 2010 20
To tackle this claim, I started from the Atlantic data frame above and added a wind speed threshold: a Category 1 hurricane is a storm with winds of at least 64 knots. I then counted the distinct storms by SID rather than NAME, because storm names are reused every 6 years and the data cover 1970 to 2020, so names do not uniquely identify storms. After arranging the counts in descending order, I found that 2005 actually holds the record with 15 storms intensifying into hurricanes, and 2020 was a close second with 14!
atlantic_storms = filter(atlantic_hurricanes, WMO_WIND >= 64)
oneorabove_atlantic = count(
  distinct(
    group_by(
      select(atlantic_storms, SID, SEASON), SEASON
    )
  )
)
head(arrange(oneorabove_atlantic, desc(n)), 5)
## # A tibble: 5 × 2
## # Groups: SEASON [5]
## SEASON n
## <int> <int>
## 1 2005 15
## 2 2020 14
## 3 2010 12
## 4 1995 11
## 5 1998 10
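Counting by SID also helps within a single season, because every unnamed storm shares the placeholder “NOT_NAMED”. A minimal base R sketch, with hypothetical SIDs and names made up for the demo:

```r
# Made-up season with two distinct unnamed storms sharing the placeholder name
season <- data.frame(
  SID  = c("2020001N10300", "2020050N12290", "2020050N12290", "2020100N15280"),
  NAME = c("NOT_NAMED", "ARTHUR", "ARTHUR", "NOT_NAMED")
)

# Counting by NAME merges the two unnamed storms into one
n_by_name <- length(unique(season$NAME))

# Counting by SID keeps every storm separate
n_by_sid <- length(unique(season$SID))

print(c(by_name = n_by_name, by_sid = n_by_sid))
```

If a season contains more than one unnamed storm, a count by NAME would understate the total, which is one reason to prefer SID as the key.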
To create an animated map of the storms in 2020, I first filtered ibtracs to contain only storms from the 2020 SEASON. I then followed Professor Sanchez’s guidelines for drawing maps.
I chose the rnaturalearth package for its aesthetic advantage that “we can zoom-in without having disorted polygons”. Following the guidelines, I created a world data frame called world_df and then a world map called world_canvas. I chose theme_dark() rather than theme_bw() because I thought it looked better.
storms2020 = filter(ibtracs, SEASON == 2020)
world_df = ne_countries(scale = "medium", returnclass = "sf")
class(world_df)
## [1] "sf" "data.frame"
world_canvas = ggplot(data = world_df) +
  geom_sf() +
  coord_sf(xlim = c(-150, 0), ylim = c(0, 90), expand = TRUE) +
  theme_dark()
I then overlaid the storms2020 data on world_canvas, coloring the points by the NAME of each storm.
storms_imposed = world_canvas +
  geom_point(data = storms2020, aes(x = LON, y = LAT, color = NAME))
Then, to animate it, I followed the gganimate guidelines by Pedersen and Robinson. By adjusting the transition settings within gganimate, I created an animated world map of the storms in 2020!
# storms_imposed already contains the point layer, so only the transition is added here
animated_2020 = storms_imposed +
  transition_states(NAME,
                    transition_length = 2,
                    state_length = 1)
animated_2020 + ggtitle("Storms in 2020")
animated2020_names = animated_2020 +
  enter_fade() + enter_drift(x_mod = -1) +
  exit_shrink() + exit_drift(x_mod = 5)
map2020 = animate(
  animated2020_names + ease_aes(x = 'bounce-out') + enter_fly(x_loc = -1) + exit_fade(),
  width = 400, height = 600, res = 35
)
anim_save("Storms in 2020.gif", animation = last_animation())
map2020
Seeing 2020’s storms animated on a map highlighted how many storms formed over the year, and how many of them strengthened into hurricanes. From the map, you can also see that some of the storms hit the United States.
This contrasts with 2010, which was also among the most active seasons in the table above (20 storms, tied with 1995), yet by my earlier reading none of its storms hit the United States. A higher number of storms may be related to higher water temperatures, which can fuel more storms to become hurricanes and may ultimately result in more hurricanes reaching land.
animated_2010