NASA image from Jeff Schmaltz of Tropical Cyclone Amanda

Source: https://education.nationalgeographic.org/resource/hurricanes-cyclones-and-typhoons-explained/

Introduction

Fun Fact! 1950 was the first year meteorologists started naming hurricanes alphabetically. The first named hurricane to strike land in 1950 was Hurricane Baker.

Historical data indicates that the occurrence and intensity of tropical cyclones (hurricanes, typhoons and cyclones) increase as global temperatures rise.

We’re going to investigate that as a class by sourcing data on average global temperatures and the number and intensity of hurricanes.

If we can visualize those two sets of data side-by-side, we would expect to see more and bigger storms as the earth gets hotter and hotter.




Data Sourcing

Fun Fact! Tropical cyclones have different names depending on where in the world they form. Hurricanes form near the Americas. Typhoons form in the oceans near Japan, China, the Philippines and north of Indonesia. Cyclones form near Africa, Saudi Arabia, India and northern Australia, south of Indonesia.

Where tropical cyclones have different names

Source: https://www.metoffice.gov.uk/weather/learn-about/weather/types-of-weather/hurricanes/location

Loading Libraries

We’re using the tidyverse to have a cohesive and consistent ecosystem of libraries to complete Data Science-motivated projects.

The `readr` package lets us read text files stored locally on our computer or from the internet.

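# Libraries used
library(tidyverse)   # attaches readr, dplyr, ggplot2 and related packages
library(readr)       # already part of the tidyverse; loaded explicitly for clarity
library(lubridate)   # year() and ymd(); only attached automatically by tidyverse 2.0.0 or later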


Loading Data

NASA global mean temperature data

NASA’s record of global average temperatures starts in 1880, when enough places started collecting temperature data to calculate a global average.

Temperature data from before 1880 is inferred from tree rings, pollen and plant fossil analysis, and ice and rock sampling. It can’t be compared apples-to-apples with the direct measurements, taken broadly enough around the world, that start in 1880.

The number of measurement locations and the collection methods (possibly including satellites) have presumably changed over time. So while we can still compare temperature temporally (meaning “across time”) from 1880 to today, we should be mindful that older and newer data may need to be interpreted slightly differently. We’ll see that more with the hurricane data.

# Loading temperature data
temps <- read_table("https://data.giss.nasa.gov/gistemp/graphs/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt", 
    col_names = FALSE, skip = 5)
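
To see what skipping the header lines and leaving the columns unnamed does, here is a minimal sketch that parses a small inline sample laid out like the NASA file. The sample text and its values are made up for illustration, the number of header lines in the real file may differ, and the sketch uses base R's `read.table` rather than `read_table` so that it runs without any packages.

```r
# Made-up sample shaped like the NASA text file: header lines, then
# whitespace-separated columns (year, anomaly, smoothed anomaly).
sample_text <- "Land-Ocean Temperature Index (C)
--------------------------------
Year No_Smoothing  Lowess(5)
----------------------------
1880     -0.17     -0.10
1881     -0.09     -0.13
1882     -0.11     -0.17"

# skip = 4 jumps over the four header lines in this sample; header = FALSE
# means the columns get default names (V1, V2, V3) until we rename them.
temps_demo <- read.table(text = sample_text, skip = 4, header = FALSE)
colnames(temps_demo) <- c("year", "diff", "smoothdiff")
print(temps_demo)
```

If the skip count is too small, the leftover header text lands in the first rows and the columns are read as text instead of numbers, so it is worth checking the first few rows after loading.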


National Hurricane Center data

The National Hurricane Center, part of the National Oceanic and Atmospheric Administration (NOAA), has historical data on Atlantic hurricanes going back to 1851!

However, we need to be careful about interpreting the data temporally (meaning “across time”) because the data was not collected the same way every year since 1851.

Before roughly the year 1900, maximum sustained wind speed for storms was recorded in increments of ten, and afterwards in increments of five. The HURDAT2 file records these wind speeds in knots rather than miles per hour (1 knot is about 1.15 mph).

Also, early detection and tracking, presumably with satellites, means that smaller storms have been recorded since the 1950s that previously went unrecorded or unnamed.

Here, when we download the data, we keep only the date and wind speed columns and drop rows with missing values. That removes the header line that introduces each storm, so what remains is one row per track record, for named and unnamed storms alike.

# Loading storm data
hurricanes <- read_csv("https://www.nhc.noaa.gov/data/hurdat/hurdat2-atl-1851-2023-042624.txt", 
    col_names = FALSE, skip = 1) %>% 
  select(year=X1, windspeed=X7) %>% 
  drop_na()
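
In the HURDAT2 layout, each storm begins with a header line (id, name, number of records) followed by one line per track record, with maximum sustained wind in knots. The sketch below shows one way to carry the storm id and name down onto the records so that each storm can be counted once. The six lines are hypothetical and shortened to seven fields (the real file has more columns), and every variable name here is made up for illustration.

```r
# Hypothetical lines in the HURDAT2 layout, shortened to 7 fields per record.
hurdat_lines <- c(
  "AL011900,            UNNAMED,      2,",
  "19000801, 0000,  , TS, 20.0N,  60.0W,  50",
  "19000801, 0600,  , HU, 20.5N,  61.0W,  70",
  "AL021900,               DEMO,      2,",
  "19000905, 1200,  , HU, 25.0N,  70.0W, 100",
  "19000905, 1800,  , HU, 25.5N,  71.0W, 120"
)

# Split each line on commas and trim the padding spaces.
fields <- lapply(strsplit(hurdat_lines, ","), trimws)

# Header lines start with a two-letter basin code; track records start with a date.
is_header <- grepl("^[A-Z]{2}", hurdat_lines)

# Carry each header's storm id and name down onto the records that follow it.
header_index <- cumsum(is_header)
storm_id   <- sapply(fields[is_header], `[`, 1)[header_index]
storm_name <- sapply(fields[is_header], `[`, 2)[header_index]

records <- data.frame(
  storm_id   = storm_id[!is_header],
  name       = storm_name[!is_header],
  year       = as.integer(substr(sapply(fields[!is_header], `[`, 1), 1, 4)),
  wind_knots = as.numeric(sapply(fields[!is_header], `[`, 7))
)

# Peak wind per storm, so each storm is counted once rather than once per record.
peak <- aggregate(wind_knots ~ storm_id + name + year, data = records, FUN = max)
print(peak)
```

Counting rows of `peak` gives the number of storms, while counting rows of `records` gives the number of track records, which grows with how long a storm lasted.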



Data Preparation

Fun Fact! Hurricanes are rated on the Saffir-Simpson Hurricane Wind Scale. While there are gusts of wind that are even faster, these are the storm’s sustained wind speeds, and the category can help predict property damage.

Category One: 74-95 mph
Category Two: 96-110 mph
Category Three: 111-129 mph
Category Four: 130-156 mph
Category Five: 157 mph or higher
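
As a small sketch of how the scale above turns a wind speed into a category, the helper below looks up which band a speed in miles per hour falls into. The function name `saffir_simpson` is hypothetical, and returning 0 for winds below hurricane strength is a choice made for this example.

```r
# Lower bounds (mph) of categories one to five, taken from the list above.
lower_bounds <- c(74, 96, 111, 130, 157)

# Hypothetical helper: returns 0 for winds below hurricane strength, else 1 to 5.
# findInterval() counts how many lower bounds are less than or equal to the speed.
saffir_simpson <- function(mph) {
  findInterval(mph, lower_bounds)
}

saffir_simpson(c(60, 74, 95, 96, 129, 130, 156, 157, 180))
# 0 1 1 2 3 4 4 5 5
```

Each band includes its lower bound, so 96 mph is already category two and 157 mph is already category five.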

In order to start visualizing the data we need to prepare the data that we’ve read from the internet. For the temperature data we add column names.

# Prepare temperature data
colnames(temps) <- c("year", "diff", "smoothdiff")

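For the hurricane data we have to
(1) convert the date column into years,
(2) convert the wind speeds from knots, the unit used in HURDAT2, to miles per hour,
(3) remove all of the years before 1880 to be aligned with the temperature data,
(4) remove all of the storm records with wind speeds below the category 1 level of 74 mph, and
(5) create a column with the category for each of the records based on their wind speed.

# Prepare hurricane data
hurricanes$year <- year(ymd(hurricanes$year))            # yyyymmdd date -> year
hurricanes$windspeed <- hurricanes$windspeed * 1.15078   # HURDAT2 winds are in knots; convert to mph
hurricanes <- filter(hurricanes, year > 1879)
hurricanes <- filter(hurricanes, windspeed >= 74)
# right = FALSE makes each bin include its lower bound, so 96 mph is cat2 and 74 mph is cat1
hurricanes$category <- cut(hurricanes$windspeed, breaks = c(74,96,111,130,157,500), right = FALSE, labels = c("cat1", "cat2", "cat3", "cat4", "cat5"))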



Data Visualization

Fun Fact! The word “hurricane” came from Taino, an indigenous Caribbean language. The Taino word was “hurakan” which means evil spirits of the wind. The word “Typhoon” may have come from the Chinese words Tai Feng or Great Wind.

Here we’re going to create data visualizations to look at changes in temperature and the number of hurricanes of different categories.

We’ll also look at the variability of hurricane strength over time by looking at wind speeds in a given year.

Change in global temperature

Here we plot the average global temperature in each year. The gray line connects the actual points of data and the black line is a smoothed curve to remove the random noise in the data. Note that the data is in Celsius and 0.0, the baseline, is the average global temperature between 1951 and 1980.
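
To make the idea of a baseline concrete, here is a minimal sketch of how a temperature difference (often called an anomaly) can be computed. The yearly temperatures are made-up numbers, and this illustrates the idea only; it is not NASA's actual procedure.

```r
# Hypothetical yearly average temperatures in Celsius (made-up numbers).
demo <- data.frame(
  year   = c(1950, 1960, 1970, 1980, 2000, 2020),
  temp_c = c(13.9, 14.0, 14.0, 14.1, 14.4, 15.0)
)

# The baseline is the mean over the reference period 1951-1980.
in_baseline <- demo$year >= 1951 & demo$year <= 1980
baseline <- mean(demo$temp_c[in_baseline])

# The difference is each year's temperature minus that baseline.
demo$diff <- demo$temp_c - baseline
print(demo)
```

By construction the differences inside the baseline period average to zero, which is why the plotted curve crosses 0.0 in the middle of the twentieth century.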

Note that average temperature was decreasing from 1880 to 1910 and has been increasing since. The increase is commonly linked to man-made climate change due to the industrialization of nations and increased population. The fancy word for man-made is anthropogenic: “anthropo” for human and “genic” for produced by.

Incidence of hurricanes by category

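Here we’ve graphed the number of hurricanes in each of the five categories for each year. The vertical axis shows the base-10 logarithm of the count plus one, so a count of zero plots at zero.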

If we compare the curves below to the temperature curve above, we can see a correlation between average global temperature and the number and strength of hurricanes. Average temperature and hurricane frequency and strength have all been rising over the last 50 years. This corresponds to an unprecedented wave of population growth and industrialization across the globe.

We can also see three peaks in the temperature data, in 1880, 1945 and 2023, and those correspond to peaks in the smoothed curves of the different categories of storms. Two things are correlated when they rise and fall together: when one (temperature or hurricanes) goes up, the other goes up too, and vice versa. Correlation does not prove causation, but we can reason that increased temperature is causing stronger storms rather than storms causing increased temperature.
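
Correlation can also be put into a single number. The sketch below lines up two yearly tables by year and computes the correlation coefficient, which runs from -1 (move in opposite directions) through 0 (no linear relationship) to 1 (move together). The values and table names are made up for illustration and are not results from the NASA or hurricane data.

```r
# Made-up yearly values, for illustration only.
temp_by_year   <- data.frame(year = 2001:2006, diff  = c(0.1, 0.2, 0.2, 0.4, 0.5, 0.6))
storms_by_year <- data.frame(year = 2001:2006, total = c(3, 4, 3, 6, 7, 8))

# Line the two tables up by year, then measure how closely they move together.
both <- merge(temp_by_year, storms_by_year, by = "year")
r <- cor(both$diff, both$total)
print(r)
```

A value close to 1 says the two series rise together, but it still says nothing about which one causes the other.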

Note the bottom line of dots, which marks years in which there were zero hurricanes of a particular category. We don’t want to remove them because that would distort the smoothed curves.
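
One thing to watch for is that a year with no qualifying storms at all has no row in a grouped summary, so it would not show up as a zero on its own. Here is a minimal sketch, with made-up counts and hypothetical names, of filling in such years with zeros and applying the log of the count plus one that the graph uses.

```r
# Made-up counts: note that 2002 has no row at all.
counts <- data.frame(year = c(2000, 2001, 2003), cat1 = c(2, 0, 1))

# Build the full run of years, join, and turn the missing counts into zeros.
all_years <- data.frame(year = seq(min(counts$year), max(counts$year)))
filled <- merge(all_years, counts, by = "year", all.x = TRUE)
filled$cat1[is.na(filled$cat1)] <- 0

# log10(0) is -Inf, so one is added first; a count of zero then plots at zero.
filled$log_count <- log10(filled$cat1 + 1)
print(filled)
```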

Category five hurricanes are the rarest in the data; however, the trend line is ticking up, so they may become more common in a future with higher average global temperatures.

The colors in the graph below are:
Category 1 = Yellow
Category 2 = Green
Category 3 = Blue
Category 4 = Purple
Category 5 = Red


Increasing variance in windspeed

The variability of the strength of the hurricanes is important.

Engineers use models of the likelihood of a megastorm to build seawalls and levees that will keep a population safe. Those models depend on a measure of the variability of hurricane strength. If that variability is increasing, then a once-in-500-year megastorm estimated using the variability of storms from the 1970s might be a once-in-10-year storm in a future with higher temperatures and higher storm variability. The levee originally designed to protect us against storms for 500 years might be overwhelmed in the next 20 years.
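
The effect of variability on a "once in N years" estimate can be sketched with a little arithmetic. Every number below is made up, and the normal distribution is only a simple stand-in for the extreme-value models engineers actually use, so treat this as an illustration of the reasoning rather than a real design calculation.

```r
# All numbers are made up; the normal distribution is a simplifying assumption.
mean_wind <- 100   # average yearly peak wind, mph
sd_old    <- 15    # spread of yearly peaks in the old climate
sd_new    <- 20    # larger spread in a more variable climate

# Wind speed that the old model expects to be exceeded once in 500 years.
design_wind <- qnorm(1 - 1/500, mean = mean_wind, sd = sd_old)

# Chance of exceeding that same wind in any one year under the new spread,
# and the matching return period in years (one divided by the yearly chance).
p_new <- pnorm(design_wind, mean = mean_wind, sd = sd_new, lower.tail = FALSE)
return_period_new <- 1 / p_new
print(c(design_wind = design_wind, return_period_new = return_period_new))
```

Even with the average unchanged, widening the spread makes the old design wind speed much more likely in any given year, so its return period drops well below 500 years.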

Here the dots show the variance in wind speed among category 1 or higher hurricanes each year, and the smoothed curve shows that the variance is increasing over time.
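
As a small illustration of what each dot represents, this sketch computes the variance of wind speed within each year for a handful of made-up records. It uses base R's `tapply` rather than the tidyverse summary in the appendix, purely so that it runs without packages.

```r
# Made-up wind speeds (mph) for three years.
winds <- data.frame(
  year      = c(2000, 2000, 2000, 2001, 2001, 2002),
  windspeed = c(80, 100, 120, 90, 90, 150)
)

# Sample variance of wind speed within each year.
variance_by_year <- tapply(winds$windspeed, winds$year, var)
print(variance_by_year)
```

A year with identical wind speeds has a variance of zero, and a year with only one record has no defined variance (R returns `NA`), so such years drop out of the plot.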




Conclusion

As fun as it is learning about hurricanes, data science and global warming, we also need to recognize that increasing global warming makes it more likely that we could have another Hurricane Katrina, or that we have climate refugees who are displaced because it’s too hot or dangerous to stay where they come from.

Devastation after Hurricane Katrina flooded New Orleans in 2005

Source: https://www.history.com/news/hurricane-katrina-facts-legacy

We learned that global temperatures are rising and that, with them, there are more hurricanes of greater intensity.

Another takeaway is that the variance between storms is rising, which means our models may underestimate the size of a once-in-500-year megastorm in a new world of higher temperatures and greater storm strength and variability.

We also learned a little about R programming, how to find and read data into R, and how to make data visualizations.




Self Critique

I would have liked to have one visual that had both temperature and hurricane data, but I couldn’t figure out how to do it and thought the simpler graphs were very effective.

There was a period in the hurricane data where a number of storms had a windspeed of -99. Those got cut off when I removed records with windspeeds less than 74; however, I would have liked to read the National Hurricane Center’s data documentation to find out what those meant and assess whether dropping them affected my use of the data. I also should have added a blurb about how the unnamed storms in the data are handled.

I think it’s good work: well written, with clear code, and easy to follow as a document. I could work on aligning the effort to the rubric better so I could do less and get the grade (or an even better grade). I’m ok not being efficient though because I am learning so much.

I have a tendency to say things implicitly (it doesn’t seem implicit to me) but I’m trying to be better about being explicit in my graphs and text.


References

We located data on global temperatures here at https://climate.nasa.gov/vital-signs/global-temperature/?intent=121

We located data on hurricanes, the HURDAT2 dataset, here at https://www.nhc.noaa.gov/data/#hurdat

RStudio has a feature in the ‘Environment’ tab called ‘Import Dataset’, which provides a graphical user interface where you enter the location of a file, preview the data as it will be read, and see a box with the code that reads it. You can then toggle or adjust various parameters, which updates both the preview and the block of code, and copy that code into your RMarkdown file.

To help with the data visualization I used the R Graph Gallery at https://r-graph-gallery.com/

Here is a great reference for html colors: https://www.w3schools.com/tags/ref_colornames.asp




Code Appendix


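# Libraries used
library(tidyverse)   # attaches readr, dplyr, ggplot2 and related packages
library(readr)       # already part of the tidyverse; loaded explicitly for clarity
library(lubridate)   # year() and ymd(); only attached automatically by tidyverse 2.0.0 or later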

# Loading temperature data
temps <- read_table("https://data.giss.nasa.gov/gistemp/graphs/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt", 
    col_names = FALSE, skip = 5)

# Loading storm data
hurricanes <- read_csv("https://www.nhc.noaa.gov/data/hurdat/hurdat2-atl-1851-2023-042624.txt", 
    col_names = FALSE, skip = 1) %>% 
  select(year=X1, windspeed=X7) %>% 
  drop_na()

# Prepare temperature data
colnames(temps) <- c("year", "diff", "smoothdiff")

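# Prepare hurricane data
hurricanes$year <- year(ymd(hurricanes$year))            # yyyymmdd date -> year
hurricanes$windspeed <- hurricanes$windspeed * 1.15078   # HURDAT2 winds are in knots; convert to mph
hurricanes <- filter(hurricanes, year > 1879)
hurricanes <- filter(hurricanes, windspeed >= 74)
# right = FALSE makes each bin include its lower bound, so 96 mph is cat2 and 74 mph is cat1
hurricanes$category <- cut(hurricanes$windspeed, breaks = c(74,96,111,130,157,500), right = FALSE, labels = c("cat1", "cat2", "cat3", "cat4", "cat5"))

# Temperature Graph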
ggplot(data=temps, aes(x=year, y=diff)) +
  geom_line(aes(y=diff), alpha=0.5) +
  geom_line(aes(y=smoothdiff)) +
  ggtitle("Increasing Trend of Average Global temperature") +
  ylab("Celsius variation to baseline") +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  theme(plot.title = element_text(size=16))
  
# Collate number of each category storm by year and calculate variance in wind speed for each year
storms <- hurricanes %>% 
  group_by(year) %>% 
  summarize(
    total = n(),
    variance = var(windspeed),
    cat1 = sum(category == "cat1"),
    cat2 = sum(category == "cat2"),
    cat3 = sum(category == "cat3"),
    cat4 = sum(category == "cat4"),
    cat5 = sum(category == "cat5")
  )

# Graph each category of storm by year.
ggplot(storms, aes(x=year)) +
  geom_point(aes(y=log(cat1+1,10)), col='yellow', alpha=0.5) +
  geom_point(aes(y=log(cat2+1,10)), col='green', alpha=0.5) +
  geom_point(aes(y=log(cat3+1,10)), col='blue', alpha=0.5) +
  geom_point(aes(y=log(cat4+1,10)), col='purple', alpha=0.5) +
  geom_point(aes(y=log(cat5+1,10)), col='red', alpha=0.5) +
  geom_smooth(aes(y=log(cat1+1,10)), col='yellow') +
  geom_smooth(aes(y=log(cat2+1,10)), col='green') +
  geom_smooth(aes(y=log(cat3+1,10)), col='blue') +
  geom_smooth(aes(y=log(cat4+1,10)), col='purple') +
  geom_smooth(aes(y=log(cat5+1,10)), col='red') +
  ggtitle("Number of Hurricanes of each Category per Year") + 
  ylab("Log of the number of hurricanes") +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  theme(plot.title = element_text(size=16))

# Graphing variance of windspeed
ggplot(storms, aes(year, variance)) +
  geom_point() +
  geom_smooth() +
  ggtitle("Variance in Hurricane Strength per Year") + 
  ylab("Variance in Wind Speed") +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  theme(plot.title = element_text(size=16))