Introduction

The data are downloaded from here:

https://wateroffice.ec.gc.ca/report/real_time_e.html?mode=Table&type=&startDate=2018-04-28&endDate=2018-05-05&stn=01AP003&dataType=Real-Time

though it does not appear to produce a downloadable file, so the table has to be manually scraped, and then edited (in Emacs), to add a column between the date and level fields.

Reading the data

We’ll use functionality from the oce and signal packages:

library(oce)
pl <- oce.plot.ts
library(signal)

We read the csv file with:

d <- read.csv('oakpoint.txt', header=FALSE, stringsAsFactors=FALSE)
names(d) <- c('date', 'level')
d$time <- as.POSIXct(d$date)
o <- order(d$time)
d <- d[o, ]

We can plot the data, with a range 4 days beyond the current date:

focus <- range(d$time) + c(0, 4*86400)
with(d, pl(time, level, xlim=focus, ylim=c(min(level), 6.5)))
abline(h=6)

Smoothing and fitting

Polynomial fit

There are a number of approaches for smoothing and fitting the data. Here we’ll try a polynomial fit that can be used to extrapolate and make a prediction:

First, we create a “numeric” time that can be used for the fit:

d$tn <- as.numeric(d$time) - as.numeric(d$time)[1]

Then, we define a new time vector that will be used for the prediction:

ntime <- seq(d$tn[1], as.numeric(focus[2])-as.numeric(focus[1]), 300)
mp <- lm(level ~ tn + I(tn^2) + I(tn^3) + I(tn^4), d)

with(d, pl(time, level, xlim=focus, ylim=range(level)*c(0.9, 1.1)))
lines(ntime+d$time[1], predict(mp, data.frame(tn=ntime)), col=2)
grid()
abline(h=6)

When will it hit 6m? According to the polynomial model (which is clearly not a good model for extrapolating), it will be at 2018-05-06 20:43:09.

Linear fit

A better approach would be to fit the last few days, where the behaviour looks mostly linear, and then extrapolate from there.

start <- as.POSIXct('2018-05-03 12:00:00')
dn <- subset(d, time > start)
m <- lm(level ~ tn, dn)

with(d, pl(time, level, xlim=focus, ylim=range(level)*c(0.9, 1.1)))
lines(ntime+d$time[1], predict(m, newdata=list(tn=ntime)), col=2)
grid()
abline(h=6)

From the linear fit, we get a prediction of hitting the 6m mark at 2018-05-07 07:07:23.

Tides

We can isolate the tidal fluctuations in a number of ways: loess smoothing, spectral filtering, or model fitting. Here I’ll take the residual from the polynomial fit to see the tidal variations:

d$res <- d$level - predict(mp)
with(d, pl(time, res))