Smoothing a time series means replacing each observation with an average, possibly weighted, of the observations around it; the simplest case is a basic moving average. This helps us see trends better because it damps the large short-term increases and decreases in the data. There are five main methods that we talked about in class:
Moving Average
Weighted-Average
Normal Kernel Smoother
Lowess Smoother
Splines
library(astsa)
plot(globtemp, type="o", ylab="Global Temperature Deviations")
We can see that there are big increases and decreases from one year to the next, which make it hard to pick out a clear trend.
So, let’s try our smoothing methods.
We’ll start with a moving average.
To get a moving average, we filter the data with filter(). The first argument is the dataset. The sides argument is 1 or 2: 1 uses only the current and past values, and 2 centers the filter so that it uses both past and future values. The filter argument holds the weights. Here, rep(1/3,3) takes the point before the data point, the data point itself and the point after the data point and averages the three together.
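As a quick check of what the filter computes, here is a minimal sketch on a made-up six-point series (the vector x is hypothetical and is not part of globtemp). It calls stats::filter explicitly, on the assumption that another package such as dplyr might be loaded and mask filter.

```r
# Toy series, made up for illustration only
x <- c(3, 6, 9, 6, 3, 12)

# Centered three-point average: previous, current and next value
centered <- stats::filter(x, filter = rep(1/3, 3), sides = 2)

# One-sided three-point average: current value and the two before it
trailing <- stats::filter(x, filter = rep(1/3, 3), sides = 1)

print(as.numeric(centered))  # NA at both ends, where a neighbor is missing
print(as.numeric(trailing))  # NA for the first two points
```

The centered version loses one point at each end of the series, while the one-sided version loses two points at the start.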
The par command splits the plotting window into two rows and one column so that we can see the two plots together.
Then we plot both and look at the smoothing difference.
out <- filter(globtemp, sides=2, filter=rep(1/3,3))
par(mfrow=c(2,1))
plot.ts(globtemp, main="unsmoothed temp")
plot.ts(out, main="smoothed temp")
We can see that the data has been smoothed out. It could be better though so let’s try a weighted-average.
Now we are going to give the data point itself more weight than the two around it. The weights will be 1/4, 1/2 and 1/4, so that they sum to 1 and the smoothed series stays on the same scale as the data.
Besides the filter argument, everything stays the same as the moving average. The filter argument is now going to be the weights.
weights <- c(1,2,1)/4
out <- filter(globtemp, sides=2, filter=weights)
par(mfrow=c(2,1))
plot.ts(globtemp, main="unsmoothed temp")
plot.ts(out, main="smoothed temp")
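A weighted average keeps the series on its original scale only when the weights sum to 1. The following is a minimal sketch of that point on a hypothetical constant series; the names flat, w_sum_one and w_too_big are made up for illustration.

```r
# Constant toy series: a proper weighted average should leave it unchanged
flat <- rep(10, 5)

w_sum_one <- c(1, 2, 1) / 4   # weights sum to 1
w_too_big <- c(1, 2, 1) / 3   # weights sum to 4/3

keeps_level  <- as.numeric(stats::filter(flat, filter = w_sum_one, sides = 2))
shifts_level <- as.numeric(stats::filter(flat, filter = w_too_big, sides = 2))

print(keeps_level)   # interior values stay at 10
print(shifts_level)  # interior values are scaled up by 4/3
```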
We can see that there is little difference between the moving average and the weighted average. Let’s try the other methods to see if they do any better.
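The ksmooth method takes the arguments time(data), data, kernel = "normal" and a bandwidth. The kernel is scaled so that its quartiles sit at ±0.25 times the bandwidth, so a bandwidth of 2 puts most of the weight on the data point itself and the one data point on either side of it.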
We can plot the data and put the ksmooth line over it to see the difference.
plot(globtemp)
lines(ksmooth(time(globtemp), globtemp, "normal", bandwidth=2), lwd=2, col=2)
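To get a feel for what the bandwidth does, here is a small sketch on simulated data rather than globtemp. The series, the second bandwidth of 10 and the roughness helper are all assumptions made for illustration.

```r
set.seed(1)
x <- 1:100                               # made-up time index
y <- sin(x / 10) + rnorm(100, sd = 0.3)  # smooth signal plus noise

# Same kernel, two bandwidths, evaluated at the observed time points
narrow <- ksmooth(x, y, kernel = "normal", bandwidth = 2,  x.points = x)
wide   <- ksmooth(x, y, kernel = "normal", bandwidth = 10, x.points = x)

# Sum of squared first differences as a rough measure of wiggliness
roughness <- function(v) sum(diff(v)^2)

rough_raw    <- roughness(y)
rough_narrow <- roughness(narrow$y)
rough_wide   <- roughness(wide$y)
print(c(raw = rough_raw, narrow = rough_narrow, wide = rough_wide))
```

A wider bandwidth averages over more neighbors, so the fitted line wiggles less but also reacts more slowly to real changes in the series.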
This isn’t too bad. It follows the original data pretty well but it could probably be a little smoother.
The lowess method takes two arguments here. The first is the data and the second, f, is the smoother span: the proportion of points used in each local fit. Larger values give a smoother line. I picked 0.05, but you can test it and change it however you want to.
plot(globtemp)
lines(lowess(globtemp, f=.05), lwd=2, col=2)
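The effect of the span can be seen with a short sketch on simulated data. The series and the comparison value f = 0.5 are assumptions chosen only for illustration.

```r
set.seed(2)
x <- 1:100                               # made-up time index
y <- sin(x / 10) + rnorm(100, sd = 0.3)  # smooth signal plus noise

small_span <- lowess(x, y, f = 0.05)  # each local fit uses about 5% of the points
large_span <- lowess(x, y, f = 0.5)   # each local fit uses about half of the points

# Sum of squared first differences as a rough measure of wiggliness
roughness <- function(v) sum(diff(v)^2)

rough_small <- roughness(small_span$y)
rough_large <- roughness(large_span$y)
print(c(f_small = rough_small, f_large = rough_large))
```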
This one looks a little smoother and we could change it if we wanted to. Now let’s look at our last method.
The spline smoothing method has three arguments here: time(data), data and a smoothing parameter, spar. The smoothing parameter is typically between 0 and 1, and larger values give a smoother curve. I chose 0.25 for my example but you can change it however you want.
plot(globtemp)
lines(smooth.spline(time(globtemp), globtemp, spar = .25), lwd=2, col = 2)
With a spar of 0.25, this looks pretty smooth.
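One way to compare spar values is to look at the equivalent degrees of freedom that smooth.spline reports. This is a minimal sketch on simulated data; the series and the comparison value spar = 1 are assumptions made for illustration.

```r
set.seed(3)
x <- 1:100                               # made-up time index
y <- sin(x / 10) + rnorm(100, sd = 0.3)  # smooth signal plus noise

wiggly <- smooth.spline(x, y, spar = 0.25)
stiff  <- smooth.spline(x, y, spar = 1)

# Equivalent degrees of freedom: lower means a smoother, stiffer curve
print(c(low_spar = wiggly$df, high_spar = stiff$df))
```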
These methods help smooth out the data. We’ll learn more about when to use each one when we start working on time series more.