In this learning log, we will review methods for smoothing time series data sets. We will focus on four smoothing methods: a moving average smoother, a weighted average smoother, a normal kernel smoother, and a lowess smoother.
We will be using the “globtemp” data set from the “astsa” library. This data set measures the mean land-ocean temperature deviations, in degrees Celsius, from 1880 to 2015.
library(astsa)
## Warning: package 'astsa' was built under R version 3.3.3
plot(globtemp, main="Temperature Deviations Over Time")
The plot above shows how the deviation in temperature can vary from year to year, resulting in a jagged plot over time. To help us see the underlying trend and produce a more consistent regression, we start by smoothing the data.
The moving average smoother smooths the data set by averaging each data point with the values on either side of it (3 data points total). To do this, we will use the filter command. The sides=2 argument tells R to center the filter, so that each point is averaged with the measure directly before it and the measure directly after it. The filter argument then gives each of these three points a weight of 1/3. We can see our smoothed time series using the plot.ts function. When we compare this time series to our raw data, we can see that the smoothed series is much less jagged.
par(mfrow=c(2,1))
MA <- filter(globtemp, sides=2, filter=rep(1/3,3))
plot.ts(MA, main="Moving Average Smoother", col = "maroon", lwd=2)
plot.ts(globtemp, main="Unsmoothed Data")
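To make the averaging explicit, here is a minimal sketch that checks the filter output by hand. The vector x_demo is a made-up toy series used only for illustration (it is not part of globtemp), and filter is called as stats::filter in case another loaded package masks it.

```r
# Hypothetical toy series, used only to illustrate the arithmetic
x_demo <- c(2, 4, 9, 1, 5, 7)

# Centered 3-point moving average: each point gets weight 1/3
ma_demo <- stats::filter(x_demo, filter = rep(1/3, 3), sides = 2)

# The same value computed by hand for the second observation
by_hand <- mean(x_demo[1:3])

print(as.numeric(ma_demo))
print(by_hand)
```

The first and last values come back as NA because those points are missing a neighbor on one side, which is also why the smoothed globtemp line is slightly shorter than the raw series.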
That was pretty simple, so let’s explore some more elaborate smoothing techniques.
Instead of only using the nearest two data points to smooth our data, we can use any number of neighboring data points with different weights to produce a meaningful smoother.
Using our globtemp data set, we could use the 4 data points on either side of our data point, with weights 0.05, 0.05, 0.1, 0.15 as we move closer to our data point and a weight of 0.3 on the data point itself, so that the nine weights sum to 1. We will define this as wgts and then use this as our weighted average filter.
(wgts <- c(0.05, 0.05, 0.1, 0.15, 0.3, 0.15, 0.1, 0.05, 0.05))
## [1] 0.05 0.05 0.10 0.15 0.30 0.15 0.10 0.05 0.05
WAtemp <- filter(globtemp, sides=2, filter=wgts)
Now we can plot our weighted average smoother in orange against the original data.
plot(globtemp, main="Weighted Average Smoother")
lines(WAtemp, lwd=2, col="orange")
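As a quick sanity check on the weighted filter, the sketch below confirms that the nine weights sum to 1 and reproduces one smoothed value by hand. The names w9, y_demo, and t0 are illustrative assumptions, and y_demo is a simulated series standing in for globtemp so the snippet runs without astsa.

```r
# Same nine weights as wgts above, renamed so the original object is untouched
w9 <- c(0.05, 0.05, 0.1, 0.15, 0.3, 0.15, 0.1, 0.05, 0.05)

# Weights that sum to 1 keep the smoother on the same scale as the data
weight_total <- sum(w9)

# Hypothetical simulated series (not real temperature data)
set.seed(1)
y_demo <- cumsum(rnorm(30))

wa_demo <- stats::filter(y_demo, filter = w9, sides = 2)

# Hand computation at position 10: weights times the 9 surrounding values
t0 <- 10
by_hand <- sum(w9 * y_demo[(t0 - 4):(t0 + 4)])

print(weight_total)
print(c(filter = wa_demo[t0], by_hand = by_hand))

# A 9-point centered filter leaves 4 NA values at each end
print(sum(is.na(wa_demo)))
```

Because the weights are symmetric, the order in which they are applied does not matter here; with asymmetric weights the hand computation would need more care.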
We can get even fancier by adding three more lines of code that draw the distribution of the weights as an inset in the top right-hand corner.
plot(globtemp, main="Weighted Average Smoother")
lines(WAtemp, lwd=2, col="orange")
par(fig = c(.6, 1, .6, 1), new = TRUE)
nwgts = c(rep(0,10), wgts, rep(0,10))
plot(nwgts, type="l", ylim = c(-.02,.35), xaxt='n', yaxt='n', ann=FALSE)
We can see the orange line reduces some of the “jaggedness” in our time series, and we can play with the weights to find the optimal smoothing structure for our data.
Another type of weighted average smoother is a normal kernel smoother, in which the weights of nearby data points come from a kernel function, here the normal density. This type of smoother is convenient because we do not need to arbitrarily pick the weights for nearby data points. The ksmooth function is very easy to use, with intuitive arguments for time, response, kernel type, and bandwidth. Since we are showing a normal kernel smoother, our kernel type will be “normal”, and we will increase the bandwidth from its default of 0.5 to 2, which widens the normal kernel for additional smoothing.
plot(globtemp, main="Normal Kernel Smoothed Temp")
lines(ksmooth(time(globtemp), globtemp, kernel= "normal", bandwidth = 2), lwd=2, col="navy")
par(fig = c(.60, 1, .60, 1), new = TRUE)
gauss <- function(x) { 1/sqrt(2*pi) * exp(-(x^2)/2) }
x <- seq(from = -3, to = 3, by = 0.001)
plot(x, gauss(x), type ="l", ylim=c(-.02,.45), xaxt='n', yaxt='n', ann=FALSE)
We notice that the inset in the top right corner now shows the bell shape of the normal distribution instead of our hand-picked weights.
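To show what the kernel weights are doing, here is a rough sketch that computes a normal kernel smoother by hand (a Nadaraya-Watson weighted mean) and compares it with ksmooth. The series temp_demo is simulated and purely hypothetical, as are the names yr, bw, kernel_sd, and nw_by_hand. The conversion from bandwidth to standard deviation relies on the ksmooth documentation, which says the kernels are scaled so that their quartiles sit at ±0.25*bandwidth.

```r
# Hypothetical simulated annual series standing in for globtemp
set.seed(42)
yr <- as.numeric(1880:2015)
temp_demo <- 0.008 * (yr - 1880) - 0.4 + rnorm(length(yr), sd = 0.1)

bw <- 2

# Quartiles at +/- 0.25 * bandwidth correspond to this standard deviation
kernel_sd <- 0.25 * bw / qnorm(0.75)

# Weighted mean at each year, with normal-density weights centered on that year
nw_by_hand <- sapply(yr, function(t0) {
  w <- dnorm(yr - t0, sd = kernel_sd)
  sum(w * temp_demo) / sum(w)
})

# Evaluate ksmooth at the observed years so the two results are comparable
ks <- ksmooth(yr, temp_demo, kernel = "normal", bandwidth = bw, x.points = yr)

# Largest disagreement between the two versions
max_gap <- max(abs(ks$y - nw_by_hand))
print(max_gap)
```

The two versions should agree closely but not necessarily to machine precision, since my understanding is that ksmooth drops points far out in the tails of the kernel while the hand version keeps every point.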
Our final smoothing technique will be a lowess smoother. The lowess smoothing technique uses the concept of nearest neighbors to smooth the data. This means that R uses a subset of the data around each point to fit a small weighted regression, and then takes the fitted value from that local regression as the smoothed value for the point. We will use the lowess function with the f= argument (the smoother span) to set the proportion of the data used as nearest neighbors in each local regression. A higher f= value generally leads to greater smoothing. The default for f= is 2/3.
# reset the plotting layout before drawing a new plot
par(mfrow=c(1,1))
plot(globtemp, main="Lowess Smoothed Temp")
lines(lowess(globtemp, f=.75), lwd=2, col="green")
We can see that the high f= value has created a very smooth line. Let’s try reducing f= from 0.75 to 0.15.
par(mfrow=c(1,1))
plot(globtemp, main="Lowess Smoothed Temp")
lines(lowess(globtemp, f=.15), lwd=2, col="green")
Our new smoother follows the data more closely, but it is probably still too smooth for our data set. Oh well.
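One way to put a number on how smooth each fit is would be to add up the squared second differences of the fitted values, where smaller totals mean a smoother curve. The sketch below does this for the two spans used above. The roughness helper, the simulated temp_demo series, and this particular measure are all assumptions made for illustration, not something built into lowess.

```r
# Hypothetical simulated annual series standing in for globtemp
set.seed(7)
yr <- 1880:2015
temp_demo <- 0.00006 * (yr - 1880)^2 - 0.4 + rnorm(length(yr), sd = 0.1)

# Illustrative roughness measure: sum of squared second differences of the fit
roughness <- function(fit) sum(diff(fit$y, differences = 2)^2)

fit_wide <- lowess(yr, temp_demo, f = 0.75)    # large span, heavy smoothing
fit_narrow <- lowess(yr, temp_demo, f = 0.15)  # small span, lighter smoothing

print(c(f_0.75 = roughness(fit_wide), f_0.15 = roughness(fit_narrow)))
```

Swapping in time(globtemp) and globtemp for yr and temp_demo should give the same kind of comparison on the real data.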
Below is the progression of our smoothers.
par(mfrow=c(1,1))
plot(globtemp, main="All Smoothers")
lines(MA, lwd=2, col="maroon")
lines(WAtemp, lwd=2, col="orange")
lines(ksmooth(time(globtemp), globtemp, kernel= "normal", bandwidth = 2), lwd=2, col="navy")
lines(lowess(globtemp, f=.15), lwd=2, col="green")