The Data

There is broad global scientific consensus that average global temperatures across the Earth are increasing, and at an increasing rate. In 2020, this has been topical with an especially severe bushfire season, for which some of the key causes are discussed here.

The Australian Bureau of Meteorology generously provides access to The Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT) dataset, which summarises over 100 years of daily minimum and maximum temperatures from 112 weather stations across Australia. These temperatures are adjusted & homogenised to control for biasing factors that can arise over time. The BOM website has further information on these methodologies

Data set-up

The data comprising the ACORN-SAT can be downloaded from this link, which provides 2 x 112 .csv files (minimum & maximum daily temperatures from 112 weather stations

Once the files have been downloaded, we can extract the data from two folders (acorn_sat_v2_daily_tmax & acorn_sat_v2_daily_tmin)

We will start with the ‘maximum’ temperature folder, and bind the 112 .csv’s into a single data-frame. This can be achieved via a nifty function created by Stack Overflow user leerssej (details of which can be found here)

Lets load in all required dependencies …

library(tidyverse)
library(purrr)
library(data.table)
library(kableExtra)
library(ggplot2)
library(plotly)
library(sf)
library(mapview)

Write leerssej’s “.csv reading” function …

# function to read in multiple csv's, plus the file-name 
read_plus <- function(flnm) {
  read_csv(flnm) %>% 
    mutate(filename = flnm)
}

Then build our dataframe from the 112 max-temperature files

tbl <-
  list.files(path = "directory of max temp folder goes here",
             pattern = "*.csv", 
             full.names = T) %>% 
  map_df(~read_plus(.)) 

One of the columns summarises the file-location & file-name. We can use regex & tidyverse tools to extract out each weather station location from the file-path name

# Extract part of the string after "//"
tbl$location <- sub('.*//', '', tbl$filename)
# Extract only the location information 
tbl$location2 <- sapply(tbl$location, function(x) unlist(strsplit(x, "\\."))[2])
# remove .csv reference
tbl$location <- NULL
tbl$filename <- NULL

# extract out location names 
location_names <- subset(tbl, is.na(date)) %>%
                  filter(location2 != "csv") 
  
location_names <- select(location_names, `site name`, location2)

# remove NA rows 
tbl <- tbl %>% drop_na(date)
# remove unnecessary columns
tbl$`site number` <- NULL
tbl$`site name` <- NULL
tbl$lat <- NULL
tbl$long <- NULL
tbl$elev <- NULL
# join back 
tbl <- tbl %>% 
  left_join(location_names, by = "location2")
# Re-order Columns
tbl <- tbl[,c(1, 4, 2, 3)]

Producing the below dataframe

location2 site name date maximum temperature (degC)
001019 KALUMBURU 1941-09-01 29.8
001019 KALUMBURU 1941-09-02 29.8
001019 KALUMBURU 1941-09-03 29.3
001019 KALUMBURU 1941-09-04 37.1
001019 KALUMBURU 1941-09-05 30.9
001019 KALUMBURU 1941-09-06 NA

Additional date fields

For visualisation purposes, let’s add some further date dimensions

# Year from date
tbl$Year <- year(tbl$date)

# Month from date 
library(anytime)
tbl$Month <- format(anydate(tbl$date), "%m")

# Week from date 
library(lubridate)
tbl$Week <- isoweek(ymd(tbl$date))

# Day of the week
# tbl$DayofWeek <- weekdays(as.Date(tbl$Date))

# Season
tbl$Season[
  tbl$Month == "12" |
    tbl$Month == "01" |
    tbl$Month == "02"] <-
  "Summer"

tbl$Season[
  tbl$Month == "03" |
    tbl$Month == "04" |
    tbl$Month == "05"] <-
  "Autumn"

tbl$Season[
  tbl$Month == "06" |
    tbl$Month == "07" |
    tbl$Month == "08"] <-
  "Winter"

tbl$Season[
  tbl$Month == "09" |
    tbl$Month == "10" |
    tbl$Month == "11"] <-
  "Spring"

# Year-Month
setDT(tbl)[, Yr_Month := format(as.Date(date), "%Y-%m") ]

# Year-Week (created)
tbl$Yr_Week <- paste(tbl$Year, tbl$Week, sep="-")

tbl$Era <- ifelse(tbl$Year>=1910 & tbl$Year<=1919,"1910s",
           ifelse(tbl$Year>=1920 & tbl$Year<=1929,"1920s",
           ifelse(tbl$Year>=1930 & tbl$Year<=1939,"1930s",
           ifelse(tbl$Year>=1940 & tbl$Year<=1949,"1940s",
           ifelse(tbl$Year>=1950 & tbl$Year<=1959,"1950s",
           ifelse(tbl$Year>=1960 & tbl$Year<=1969,"1960s",
           ifelse(tbl$Year>=1970 & tbl$Year<=1979,"1970s",
           ifelse(tbl$Year>=1980 & tbl$Year<=1989,"1980s",
           ifelse(tbl$Year>=1990 & tbl$Year<=1999,"1990s",
           ifelse(tbl$Year>=2000 & tbl$Year<=2009,"2000s",
           ifelse(tbl$Year>=2010 & tbl$Year<=2019,"2010s", 0)))))))))))

Let’s preview this revised dataframe

location2 site name date maximum temperature (degC) Year Month Week Season Yr_Month Yr_Week Era
001019 KALUMBURU 1941-09-01 29.8 1941 09 36 Spring 1941-09 1941-36 1940s
001019 KALUMBURU 1941-09-02 29.8 1941 09 36 Spring 1941-09 1941-36 1940s
001019 KALUMBURU 1941-09-03 29.3 1941 09 36 Spring 1941-09 1941-36 1940s
001019 KALUMBURU 1941-09-04 37.1 1941 09 36 Spring 1941-09 1941-36 1940s
001019 KALUMBURU 1941-09-05 30.9 1941 09 36 Spring 1941-09 1941-36 1940s
001019 KALUMBURU 1941-09-06 NA 1941 09 36 Spring 1941-09 1941-36 1940s

Weather Station coordinates

The zip file includes a .txt/.csv file with coordinates for the weather stations. Let’s read in this file & join to our dataframe

coordinates$location2 <- NULL
tbl <- tbl %>% 
  left_join(coordinates, by = c("site name" = "site.name"))

Using the “mapview” package, we can visualise the location of the 112 weather stations in Australia. A benefit of this package is unlike Google’s map package, it can be used without an API key

# create a map of all weather stations
weather_stations <- st_as_sf(coordinates, coords = c("long", "lat"), crs = 4326)

Daily minimum temperatures

Using the same approach implemented for maximum temperatures, we can read-in the minimum temperature files, & join the data to our existing dataframe

tbl_min <-
  list.files(path = "directory of max temp folder goes here",
             pattern = "*.csv", 
             full.names = T) %>% 
  map_df(~read_plus(.)) 
# this takes every part of the string after "//"
tbl_min$location <- sub('.*//', '', tbl_min$filename)
# now take only the location information 
tbl_min$location2 <- sapply(tbl_min$location, function(x) unlist(strsplit(x, "\\."))[2])
# remove .csv reference
tbl_min$location <- NULL
tbl_min$filename <- NULL

# remove these NA columns 
tbl_min <- tbl_min %>% drop_na(date)
# remove not needed columns
tbl_min$`site number` <- NULL
tbl_min$`site name` <- NULL
# join min temps to core-table 
tbl <- left_join(tbl, tbl_min, 
                 by = c("location2" = "location2", "date" = "date"))
# remove table
rm(tbl_min)

# re-order columns
tbl <- tbl[,c(1:3,15, 4:14)]

This results in a dataframe with ~ 3.5 million rows of minimum and maximum temperature data from 112 Australian weather stations since the beginning of the 20th century. With this, let’s begin some trend-analyses

location2 site name date minimum temperature (degC) maximum temperature (degC) Year Month Week Season Yr_Month Yr_Week Era lat long elev
001019 KALUMBURU 1941-09-01 20.6 29.8 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23
001019 KALUMBURU 1941-09-02 20.6 29.8 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23
001019 KALUMBURU 1941-09-03 19.6 29.3 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23
001019 KALUMBURU 1941-09-04 21.2 37.1 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23
001019 KALUMBURU 1941-09-05 19.8 30.9 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23
001019 KALUMBURU 1941-09-06 19.8 NA 1941 09 36 Spring 1941-09 1941-36 1940s -14.3 126.65 23

Overall change in minimum & maximum daily temperatures over time

We can efficiently summarise very large data-sets by combining tidyverse functions such as “group_by” with the visualisation capabilities of ggplot2. Below are two line charts depicting change to yearly-average minimum & maximum temperatures in Australia

# Minimalistic theme for visualisation 
theme_plot_text <-
  theme(
    plot.title = element_text(size = 10, hjust = 0.5),
    axis.title = element_text(size = 10),
    legend.title = element_text(size = 10),
    axis.text = element_text(size = 7))

# Min temperature visualisation 
temp.min <- tbl %>% 
  group_by(Year) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmin = mean(`minimum temperature (degC)`, na.rm = TRUE)) %>% 
  ggplot(
    aes(x = Year, 
        y = avgmin,
        colour = avgmin,
        group = 1)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "loess", 
              colour = "#44546A", 
              linetype = "dashed", 
              fill = "#FFEDB4",
              size = 0.5) +
  scale_colour_gradient2(low = "#2BAAED", mid = "#8CD04F" , high = "#FFC52B",
                         midpoint = 13) +
  labs(title = "Figure 1. Average of daily minimum temperature across Australia between 1911 & 2018") + 
  theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(colour = "black", size = 0.2),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(size = 6, hjust = 0.5),
        legend.text = element_text(size = 6)) +
  theme_plot_text +
  labs(color = '°C', size = 6) 

# Max temperature visualisation 
temp.max <- tbl %>% 
  group_by(Year) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmax = mean(`maximum temperature (degC)`, na.rm = TRUE)) %>% 
  ggplot(
    aes(x = Year, 
        y = avgmax,
        colour = avgmax,
        group = 1)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "loess", 
              colour = "#80190B", 
              linetype = "dashed", 
              fill = "#FFEDB4",
              size = 0.5) +
  scale_colour_gradient2(low = "#E1B823", mid = "#DD8033" , high = "#C95343",
                         midpoint = 25) +
  labs(title = "Figure 2. Average of daily maximum temperature across Australia between 1911 & 2018") + 
  theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(colour = "black", size = 0.2),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(size = 6, hjust = 0.5),
        legend.text = element_text(size = 6)) +
  theme_plot_text +
  labs(color = '°C', size = 6) 

These Figures show that across Australia, the lowest yearly-average minimum & maximum temperatures occurred between the 1940s & 1960s, with steady increases in temperatures occurring up until the present. Of particular note is that increasing temperatures begin to steepen from the 1990s onwards

Seasonal Temperatures

Clearly, temperatures across Australia are rising. What may be interesting to explore is whether these effects vary by season. Lets create summary tables of decade-average minimum & maximum temperatures for each of the four seasons:

# MINIMUM
tbl.min.season <- tbl %>% 
  group_by(Era, Season) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmin = mean(`minimum temperature (degC)`, na.rm = TRUE)) %>% 
  dcast(Season ~ Era) %>% 
  mutate(Change = `2010s` - `1910s`) %>%
  mutate(Percent_change = (Change / `2010s`)*100)
# round to 1 decimal place
tbl.min.season[2:14]=round(tbl.min.season[2:14],1) 

# MAXIMUM
tbl.max.season <- tbl %>% 
  group_by(Era, Season) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmax = mean(`maximum temperature (degC)`, na.rm = TRUE)) %>% 
  dcast(Season ~ Era) %>% 
  mutate(Change = `2010s` - `1910s`) %>%
  mutate(Percent_change = (Change / `2010s`)*100)
# round to 1 decimal place
tbl.max.season[2:14]=round(tbl.max.season[2:14],1) 

Somewhat arbitrarily I have calculated the % change to min/max temperatures from decade to 1910 to 2010, at the exclusion of all decades in between. This approach shows all four seasons demonstrated increased temperatures, with slightly greater % increases observed in Autumn & Spring

Season 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s Change Percent_change
Autumn 13.1 12.9 13.1 12.7 13.3 13.3 13.5 14.1 13.9 13.9 14.3 1.3 8.9
Spring 12.2 11.9 11.9 11.9 11.9 12.2 12.4 12.8 12.9 13.3 13.4 1.2 9.0
Summer 17.4 17.1 17.1 17.2 17.2 17.6 18.0 18.1 18.3 18.5 18.9 1.5 8.0
Winter 7.5 7.1 7.3 7.1 7.4 7.3 7.4 7.7 8.1 7.9 8.1 0.6 7.8

Looking at change in decade-average maximum temperatures, there is remarkable consistency in the increase from the 1910s to the 2010s for summer, autumn & winter (4.6% increase). Interestingly, spring shows a greater % increase (5.5%) in maximum temperatures, perhaps lending credibility to the idea of “longer summers”

Season 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s Change Percent_change
Autumn 24.8 24.8 24.9 24.7 25.0 24.9 25.1 25.5 25.5 25.9 26.0 1.2 4.6
Spring 25.1 25.0 24.8 24.9 24.8 25.1 24.9 25.6 25.5 26.3 26.5 1.5 5.5
Summer 29.7 29.5 29.7 29.6 29.7 29.9 30.1 30.4 30.3 30.7 31.1 1.4 4.6
Winter 19.0 18.7 18.5 19.0 18.7 18.8 19.2 19.1 19.5 19.8 19.9 0.9 4.6

Let’s visualise seasonal change over time:

# min temperature Season visualisation  
season.min <- tbl %>% 
  group_by(Year, Season) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmin = mean(`minimum temperature (degC)`, na.rm = TRUE)) %>% 
  ggplot(
    aes(x = Year, 
        y = avgmin,
        colour = Season,
        group = Season)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "loess", 
              colour = "#80190B", 
              linetype = "dashed", 
              fill = "#FFEDB4",
              size = 0.5) +
  scale_colour_manual(values=c("#DD8033", "#43854C", "#C95343", "#77C4DF")) +
  labs(title = "Figure 3. Average of daily minimum temperature across seasons between 1911 & 2018") + 
  theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(colour = "black", size = 0.2),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(size = 6, hjust = 0.5),
        legend.text = element_text(size = 6)) +
  theme_plot_text +
  labs(color = 'Season', size = 6) 

# Max temperature Season visualisation  
season.max <- tbl %>% 
  group_by(Year, Season) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmax = mean(`maximum temperature (degC)`, na.rm = TRUE)) %>% 
  ggplot(
    aes(x = Year, 
        y = avgmax,
        colour = Season,
        group = Season)) +
  geom_point() +
  geom_line() +
  geom_smooth(method = "loess", 
              colour = "#80190B", 
              linetype = "dashed", 
              fill = "#FFEDB4",
              size = 0.5) +
  scale_colour_manual(values=c("#DD8033", "#43854C", "#C95343", "#77C4DF")) +
  labs(title = "Figure 4. Average of daily maximum temperature across seasons between 1911 & 2018") + 
  theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(colour = "black", size = 0.2),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(size = 6, hjust = 0.5),
        legend.text = element_text(size = 6)) +
  theme_plot_text +
  labs(color = 'Season', size = 6) 

Temperature changes at different weather stations

We can also look at location-based differences in rising temperatures. Let’s create tables for decade-average minimum and maximum temepratures, by location.

max.loc <- tbl %>% 
  group_by(Era, `site name`) %>% 
  filter(Year != 2019) %>% 
  summarise(avgmax = mean(`maximum temperature (degC)`, na.rm = TRUE),
            avgmin = mean(`minimum temperature (degC)`, na.rm = TRUE)) 

max.era <- max.loc %>% 
  subset(select = -c(4)) %>% 
  dcast(`site name` ~ Era) %>% 
  mutate(Change = `2010s` - `1910s`) %>%
  mutate(Percent_change = (Change / `2010s`)*100) %>% 
  arrange(desc(Percent_change)) %>%
  top_n(20)
# round to 1 decimal place
max.era[2:14]=round(max.era[2:14],1) 

min.era <- max.loc %>% 
  subset(select = -c(3)) %>% 
  dcast(`site name` ~ Era) %>% 
  mutate(Change = `2010s` - `1910s`) %>%
  mutate(Percent_change = (Change / `2010s`)*100) %>% 
  arrange(desc(Percent_change)) %>%
  top_n(20)
# round to 1 decimal place
min.era[2:14]=round(min.era[2:14],1) 

Examining the table below, each of the top-five weather stations in terms of increased minimum temperatures occupy south-east Australia. Four of these five locations are below national averages for minimum temperatures

site name 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s Change Percent_change
CANBERRA AIRPORT 4.5 4.2 5.1 4.8 5.3 5.3 5.6 6.0 6.0 6.5 6.6 2.0 30.9
INVERELL (RAGLAN ST) 5.9 5.5 5.7 5.5 6.5 6.3 6.8 7.3 7.3 7.3 8.1 2.2 27.7
WAGGA WAGGA AMO 7.5 7.0 7.6 7.3 7.5 8.0 8.4 8.5 8.7 9.2 9.6 2.1 22.0
RUTHERGLEN RESEARCH 5.8 5.4 5.4 5.3 5.4 5.7 6.5 6.5 6.5 6.8 7.2 1.4 19.3
CHARLEVILLE AERO 11.6 11.8 11.8 11.9 12.3 12.4 12.3 13.0 13.1 13.9 14.2 2.6 18.2
LAUNCESTON AIRPORT 5.2 5.2 5.2 4.9 5.0 5.3 5.8 6.0 5.9 6.2 6.3 1.1 17.7

Regarding average maximum temperatures, curiously, five of the top six weather stations are located at airports. I have no idea what might be responsible for this, particularly considering the normalisation that goes into this dataset

site name 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s 2010s Change Percent_change
CANBERRA AIRPORT 19.1 19.0 19.5 19.4 19.1 19.2 19.4 19.7 19.7 20.9 21.1 2.0 9.5
EUCLA 21.7 22.0 22.1 21.9 21.8 22.2 22.1 22.4 22.8 23.3 23.7 2.0 8.5
PERTH AIRPORT 23.4 23.7 24.0 24.1 23.9 24.1 24.7 24.7 24.8 25.0 25.5 2.1 8.3
ALBANY AIRPORT 19.3 19.3 19.6 19.7 19.7 19.8 19.8 19.9 20.0 20.1 20.9 1.5 7.4
GERALDTON AIRPORT 24.9 25.1 25.0 25.2 24.9 25.1 25.3 25.5 25.7 26.1 26.9 2.0 7.4
ALICE SPRINGS AIRPORT 27.4 27.7 27.7 28.4 28.7 28.8 28.8 29.2 29.7 29.5 29.4 2.0 6.9

One final thing we can do is visualise the location of weather stations demonstrating the largest increase in daily max/min temperature

# join our biggest temp increases to coordinates table
top_max <- inner_join(coordinates, max.era, by = c("site.name" = "site name"))
weather_stations_max <- st_as_sf(top_max, coords = c("long", "lat"), crs = 4326)

Examining maximum temperatures, it appears that southern austraila has been disproportionately impacted with 17 of the 20 highest temperature increasing stations being located below the mid-line of the country

# join our biggest temp increases to coordinates table
top_min <- inner_join(coordinates, min.era, by = c("site.name" = "site name"))
weather_stations_min <- st_as_sf(top_min, coords = c("long", "lat"), crs = 4326)

Examining minimum temperatures, a different trend is at play. Eastern austraila has been disproportionately impacted with 16/20 weather stations to record the largest % change in minimum temperatures existing on the eastern side of the country. Concerningly, but perhaps unsuprisingly, some of the biggest temperature changes by magnitude in Australia have been observed in Northern Victoria/Southern NSW, overlapping with regions impacted by the recent bushfires

Conclusions

As Australia deals with a steadily heating climate, historic temperature data can help inform about the uniformity & magnitude of change. Daily maximum temperatures have increased proportionally more in southern australia, and are most strongly observed around Spring time. Daily minimum temperatures have disproportionately increased on the eastern coast, where increases are slightly more pronounced in Autumn & Spring