Sydney and NSW are having a lot of rain this year, thought to be due to a La Nina weather event. But how does this compare to previous years? And could this be influenced by global warming? Is the variability in rain increasing? What does the data say?

To answer this question, I initially looked for river level data in NSW, but river levels over time can be affected by development of the river basin. A cleaner set of data is rainfall data, and on a BOM website I found a set of daily rainfall data for the Sydney Botanic Gardens (next to Circular Quay) that has continuous data back to 1910. The link to the data source is shown below, and the ‘station number’ is 066006.

Data source - http://www.bom.gov.au/climate/data/index.shtml

This data was downloaded from the BOM website on 15 August 2022, and had data up to 1st August 2022. I imported this data into the RStudio data science package. Altogether there were 50,250 observations of daily rainfall in these gardens. This included 201 missing values - I replaced these missing values with the rainfall data for the previous day.

setwd("G:/My Drive/Training and development/R sandpit/Sydney rainfall analysis")
# read file
rainfall <- read.csv(file = 'Daily_rainfall_data_15Oct2022.csv')  # import rainfall data
rainfall %<>% rename(month_num = Month,
                     rainfall_mm = 6)  # rename 2 columns
rainfall %<>% mutate(Month = case_when(
            month_num == 1 ~ 'January',
            month_num == 2 ~ 'February',
            month_num == 3 ~ 'March',
            month_num == 4 ~ 'April',
            month_num == 5 ~ 'May',
            month_num == 6 ~ 'June',
            month_num == 7 ~ 'July',
            month_num == 8 ~ 'August',
            month_num == 9 ~ 'September',
            month_num == 10 ~ 'October',
            month_num == 11 ~ 'November',
            month_num == 12 ~ 'December',
            TRUE ~ 'error'))

rainfall %<>% filter(Year > 1909) # limit to data after 2009 - which is the continuous data
rainfall$Month <- factor(rainfall$Month, levels = c('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November','December'), ordered = TRUE) # create ordered factor variable for month
rainfall$rainfall_mm <- nafill(rainfall$rainfall_mm, type = 'locf')    # replace missing values with the rainfall for the earlier day

#head(rainfall)

Month by month variation

Looking at the data, the first thing I noted is that rainfall is, not surprisingly, affected by the month of the year, as shown below. In this diagram, the error bars show the 95% confidence intervals.

alpha <- 0.05
monthly_mean <- rainfall %>% 
    group_by(Month) %>% 
    summarize(mean = mean(rainfall_mm),
              lower = mean(rainfall_mm) - qt(1- alpha/2, (n() - 1))*sd(rainfall_mm)/sqrt(n()),
              upper = mean(rainfall_mm) + qt(1- alpha/2, (n() - 1))*sd(rainfall_mm)/sqrt(n()))
yearly_mean <- mean(monthly_mean$mean)
monthly_mean$correction = yearly_mean/monthly_mean$mean
month_mean2 <- monthly_mean %>% select(Month, correction)
monthly_mean %>%
  ggplot(aes(x = Month, y = mean)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width=.2,
                 position=position_dodge(.9)) + 
  labs(title = "Fig. 1 - Monthly daily rainfall in Sydney Botanic Gardens",
              caption = "Data source: BOM",
              x = "Month", y = "Average daily \nrainfall in mm") +
  theme(axis.title.y=element_text(angle=0, vjust = 0.5, size = 10)) +
  scale_x_discrete(breaks = function(x){x[c(TRUE, FALSE)]})

How does 2022 compare to the next wettest years?

Apart from 2022, which were the next wettest years? The table below shows that the next wettest year was 1950, then 1990. This further suggests that, apart from 2022, there does not appear to be a recent trend towards wetter years.

# https://gt.rstudio.com/articles/intro-creating-gt-tables.html
top_ten <- rainfall_year %>% arrange(desc(mean)) %>% slice(1:10) %>%
       mutate(mean = round(mean, digits = 1)) %>%
       select(-colour)

library(gt)
top_ten %>% gt() %>%
  tab_header(title = "Wettest known years for Sydney Botanical Gardens",
    subtitle = "Average rain per day in mm - Jan to July") %>%
  cols_label(
    mean = "Average daily rainfall, mm"
  )
Wettest known years for Sydney Botanical Gardens
Average rain per day in mm - Jan to July
Year Average daily rainfall, mm
2022 10.1
1950 8.5
1990 7.5
1956 7.0
1913 6.6
1963 6.4
1955 6.2
1989 6.0
1919 5.7
1949 5.7

 

We could also look at the wettest single days, which are shown in the table below:

top_ten_days <- rainfall %>% arrange(desc(rainfall_mm)) %>% slice(1:10) %>%
       mutate(Date = paste0(Day,' ',Month,' ',Year)) %>% 
       select(Date, rainfall_mm)        
       library(gt)
top_ten_days %>% gt() %>%
  tab_header(title = "Wettest known days for Sydney Botanical Gardens") %>%
  cols_label(
    rainfall_mm = "Total rainfall, mm"
  )
Wettest known days for Sydney Botanical Gardens
Date Total rainfall, mm
6 August 1986 340.0
28 March 1942 277.4
9 February 1992 264.0
9 November 1984 248.2
3 February 1990 238.0
1 May 1988 230.0
2 May 1953 226.1
11 March 1975 217.2
1 May 1955 193.0
11 February 1956 191.3

 

None of these ten wettest days have occurred in the last 10 years.

And lastly we could look to see if there has a statistically significant trendline, again with data averaged on a month-by-month basis. The graph below shows the outcome of this analysis. A separate regression analysis showed that the best fit line had a ‘p-value’ of 0.24, which means that we cannot consider this trend to be statistically significant. Note that this analysis was based on corrected mean monthly data, as discussed for the anova analysis.

monthly_sum %<>% mutate(Date = as.Date(paste0(1,'-',Month,'-',Year),format = "%d-%B-%Y"))
reg_model_a <- lm(corrected_mean ~ Date, data = monthly_sum)  # Run regression model for trial data
#summary(reg_model_a)
monthly_sum %>%
  ggplot(aes(x = Date, y = corrected_mean)) +
  geom_point(color = "grey68") +
   theme_bw() +
  geom_smooth(data = monthly_sum, aes(x = Date, y = corrected_mean, color = "deepskyblue3"), formula = y ~ x, method = "lm", fill = "light blue", size = 1.5) +
 labs(title = "Fig 4. - Average daily rainfall in Sydney Botanic Gardens, 1910 to 2022",
              subtitle = "Best fit line is shown, but the slight gradient was not statistically significant. \n
              Data has been averaged on a month by basis basis",
              caption = "Data source: BOM",
              x = "", y = "Average daily \nrainfall in mm") +
  theme(axis.title.y=element_text(angle=0, vjust = 0.5, size = 10)) + 
  theme(legend.position="none") 

 

Variability in rainfall

On a similar basis, we can also investigate whether the variability in rainfall in increasing. In the graph below, I have determined the variability, as defined by standard deviation of the average (corrected) monthly rainfall, for each five year block since 2010:

monthly_sum$fives <- fives[findInterval(monthly_sum$Year, fives)]
alpha <- 0.05
five_sumb <- monthly_sum %>% 
    group_by(fives) %>% 
    summarize(standard_deviation = sd(corrected_mean)) %>%
             mutate(colour = if_else(fives == 2020,'firebrick1','lightskyblue'))

five_sumb %>%
  ggplot(aes(x = fives, y = standard_deviation)) +
  geom_bar(stat = "identity", fill = five_sumb$colour) +
  labs(title = "Fig 6. - Variability in monthly average rainfall in Sydney Botanic Gardens",
              subtitle = "Shows the standard deviation in average rainfall \non a month-by-month basis for each five year period.",
              caption = "Data source: BOM",
              x = "", y = "Standard deviation \n in monthly \nrainfall,mm") +
  theme(axis.title.y=element_text(angle=0, vjust = 0.5, size = 10), 
        plot.title = element_text(size = 12)) + 
   annotate("text", x = 1995, y = 5, label = "Red bar is 2020 onwards")

This graph shows that while the most recent period (2020 to July 2022) has greater variability, there has no been no strong trend prior to this. The high standard deviation for the recent period will be due to the very high rainfall for 2022.

 

Conclusions

In the introduction, I asked:

This data analysis shows that the 2022 is much wetter than any previous year in Sydney - but at this stage, this appears to be a one-off event and not part of a long term trend. There does not appear to be an increase in variability. Regardless, it will be worth watching to see if 2022 is the start of a longer term trend.