In class on Thursday, we discussed several time series smoothing techniques. Smoothing plays an important part in time series analysis because it lets us see the overall trend of the data without getting caught up in the noise, and time series data usually has a lot of noise.
The general idea behind all of these smoothing techniques is to replace each point with an average of the data values nearest to it. How many neighboring values are used, and how they are weighted in the average, varies from method to method.
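One way to write this idea down, as a sketch in my own notation rather than anything taken from the R documentation: the smoothed value at time \(t\) is a weighted average of the observations \(x_i\),

\[
\hat{x}_t = \sum_{i} w_i(t)\, x_i, \qquad \sum_{i} w_i(t) = 1 .
\]

For a simple three-point moving average, for example, \(w_i(t) = 1/3\) for \(i \in \{t-1,\, t,\, t+1\}\) and 0 otherwise. The methods differ mainly in how they choose the weights \(w_i(t)\).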
To illustrate these smoothing methods and how they differ, I will be using the globtemp data set from the R package astsa. This data set gives the global mean land-ocean temperature deviations (relative to the 1951-1980 average), measured in degrees Celsius, for 1880 to 2015. I will apply each smoothing method to this same series so that we can see how they compare.
Before we start smoothing our time series, we should see what the data looks like with the noise. We can plot the data set simply by using \(\texttt{plot(dataset)}\).
library(astsa)
data("globtemp")
plot(globtemp)
We can see that there’s a noticeable trend in the data set, but it’s somewhat obscured by the noise. So, let’s move on to the smoothing techniques. The first and simplest one is the moving average.
To smooth the data set using moving averages, all we have to do is use the \(\texttt{filter(dataset, sides, filter)}\) command. The sides argument is either 1 or 2, depending on whether you want the average to use only the current and past values in the time series or both past and future values. The filter argument is a vector of weights: its length sets how many data values are averaged, and its entries set the weight given to each one. In the following example, we use both past and future values and average each data point with the point on either side of it, giving all three equal weight. For the result to be an average rather than a sum, the weights must add up to 1, so each weight is 1/3. Then, we’ll plot the two time series one above the other to see the difference that our smoothing technique made.
filtered <- filter(globtemp, sides = 2, filter = rep(1/3, 3))
par(mfrow = c(2, 1))
plot(globtemp, main = "Unfiltered")
plot(filtered, main = "Filtered")
As you can see, using the moving average method gives us a much smoother time series and, as long as we don’t smooth the time series too much, we will still be able to see the defining characteristics of the time series, only without the noise.
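To make it concrete what the filter is doing, here is a small sketch that compares a centered three-point moving average from \(\texttt{filter}\) with the same average computed by hand. It uses R’s built-in Nile series purely as a stand-in data set (my choice, so that the example runs without astsa), and it calls \(\texttt{stats::filter}\) explicitly in case another package has masked \(\texttt{filter}\).

```r
# Built-in annual Nile flow series, used here only as a stand-in data set
x <- Nile
n <- length(x)

# Centered 3-point moving average with equal weights of 1/3
ma <- stats::filter(x, filter = rep(1/3, 3), sides = 2)

# The same average computed by hand for the interior points
by_hand <- (x[1:(n - 2)] + x[2:(n - 1)] + x[3:n]) / 3

# The first and last smoothed values are NA: one neighbor is missing there
is.na(ma[c(1, n)])

# Compare the interior values of the two versions
all.equal(as.numeric(ma[2:(n - 1)]), as.numeric(by_hand))
```

The NA values at the two ends are the price of a two-sided window: a longer filter gives a smoother line but loses more points at each end.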
The next example is similar to the last one, but, as I hinted, we can specify different weights for the average. Let’s say we want the weight of our main data point to be twice that of the two points that surround it. The raw weights 1, 2, 1 add up to 4, so we divide by 4 to make them sum to 1, and then filter the values like this:
weights <- c(1, 2, 1)/4
filteredweights <- filter(globtemp, sides = 2, filter = weights)
par(mfrow = c(2, 1))
plot(globtemp, main = "Unfiltered")
plot(filteredweights, main = "Filtered with Weights")
The resulting plot looks much like our first smoothed series, which makes sense because both filters average over the same three-point window and differ only in how much weight the middle point gets.
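Two details of weighted filters are easy to check in code: weights of this kind are normally scaled to sum to 1 so that the smoothed series stays on the scale of the data, and the sides argument only changes where the window sits. The sketch below again uses the built-in Nile series as a stand-in, and the variable names are my own.

```r
x <- Nile  # stand-in series from base R
n <- length(x)

# Middle point gets twice the weight of its neighbors; weights sum to 1
w <- c(1, 2, 1) / 4
sum(w)

# sides = 2 centers the window on each point;
# sides = 1 uses the current value and the two before it
two_sided <- stats::filter(x, filter = w, sides = 2)
one_sided <- stats::filter(x, filter = w, sides = 1)

# The one-sided result is the two-sided result delayed by one time step
all.equal(as.numeric(one_sided[3:n]), as.numeric(two_sided[2:(n - 1)]))
```

In other words, a one-sided filter does not smooth differently; it reports the same averages one step later, which matters when you only have data up to the present.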
For kernel smoothing, we can use the \(\texttt{ksmooth(x, y, kernel, bandwidth)}\) function. The x and y arguments are our x and y values, which we can pull from our data set: the times and the observations. The kernel can be specified as either “box” or “normal”; we want a normal kernel smoother, so our kernel will be “normal”. That means that points closer to our main point will have higher weights. The bandwidth controls how wide the kernel is, measured in the units of x (years here) rather than as a count of points: according to the R documentation, the kernel is scaled so that its quartiles fall at ±0.25 times the bandwidth. We will use a bandwidth of 2, which puts most of the weight on each data point and its immediate neighbors. Once we create this smoothed time series, we can plot it on top of our original time series.
smooth <- ksmooth(time(globtemp), globtemp, kernel = "normal", bandwidth = 2)
plot(globtemp)
lines(smooth, lwd=2, col=3)
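To show what “closer points get higher weights” means, here is a hand-rolled Gaussian kernel smoother. This is only a sketch of the idea, not how \(\texttt{ksmooth}\) is implemented: the helper \(\texttt{nw\_smooth}\) and its \(\texttt{sd\_years}\) argument are hypothetical names of my own, and \(\texttt{sd\_years}\) is a plain standard deviation, so it is not on the same scale as the bandwidth argument of \(\texttt{ksmooth}\). The built-in Nile series is used as a stand-in data set.

```r
# Hypothetical helper: Gaussian-kernel weighted average at each time point.
# `sd_years` is an illustrative parameter, not an argument of ksmooth().
nw_smooth <- function(t, y, sd_years) {
  sapply(t, function(t0) {
    w <- dnorm(t - t0, sd = sd_years)  # larger weight for closer points
    sum(w * y) / sum(w)                # normalize so the weights sum to 1
  })
}

t <- as.numeric(time(Nile))
y <- as.numeric(Nile)

narrow <- nw_smooth(t, y, sd_years = 1)  # follows the data closely
wide   <- nw_smooth(t, y, sd_years = 5)  # averages over more years

plot(t, y, type = "l", col = "grey50", xlab = "Year", ylab = "Flow")
lines(t, narrow, lwd = 2, col = 3)
lines(t, wide, lwd = 2, col = 4)
```

Widening the kernel plays the same role as lengthening the window of a moving average: more neighbors contribute, so the line gets smoother.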
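Lowess smoothing differs from the methods we have looked at so far in that it picks its neighborhood by nearest neighbors: each smoothed value comes from a weighted regression fit to the closest points. In other words, it is a locally weighted regression. We can do lowess smoothing simply by using the \(\texttt{lowess(dataset, f)}\) command, where f is our span, the proportion of the data points used in each local fit. With f = .04 and 136 annual observations, each fit uses only about five points, so the line follows the data closely. Then, we can plot this smoothed line over our original time series like we did in the kernel smoothing section.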
low <- lowess(globtemp, f=.04)
plot(globtemp)
lines(low, lwd=2, col=5)
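Since the span f does most of the work in lowess, it helps to see two spans side by side. The sketch below uses the built-in Nile series as a stand-in, and the span values are illustrative choices of mine; 2/3 is the default value of f in \(\texttt{lowess}\).

```r
x <- Nile  # stand-in series from base R

# Small span: each local fit sees about 10% of the points
tight <- lowess(x, f = 0.1)

# Default span: each local fit sees two thirds of the points
loose <- lowess(x, f = 2/3)

plot(x, col = "grey50")
lines(tight, lwd = 2, col = 5)
lines(loose, lwd = 2, col = 2)
```

A small span tracks short-term movement, while a large span gives something close to a single overall trend line, so the right choice depends on which of the two you want to see.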
Finally, we arrive at our last smoothing technique, spline smoothing. In spline smoothing, the time axis is broken up into small intervals, a cubic polynomial is fit on each interval, and the pieces are joined smoothly where the intervals meet. A penalty on the roughness of the curve controls how closely it follows the data. We can do spline smoothing using the \(\texttt{smooth.spline(x, y, spar)}\) command. x and y are our x and y values from our data set and spar is our smoothing parameter, where larger values give a smoother curve. We’ll use a smoothing parameter of .2 for this example. We can then plot this smoothed line on top of our time series plot, like above.
spline <- smooth.spline(time(globtemp), globtemp, spar = .2)
plot(globtemp)
lines(spline, lwd=2, col = 7)
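To get a feel for spar, we can fit the same series with a small and a large value and compare the equivalent degrees of freedom that \(\texttt{smooth.spline}\) reports in the df component of its result. This is a sketch on the built-in Nile series, used as a stand-in, and the two spar values are illustrative choices of mine.

```r
t <- as.numeric(time(Nile))
y <- as.numeric(Nile)

wiggly <- smooth.spline(t, y, spar = 0.2)  # light penalty, follows the data
stiff  <- smooth.spline(t, y, spar = 1)    # heavy penalty, much smoother

# Equivalent degrees of freedom: higher means a more flexible curve
c(wiggly = wiggly$df, stiff = stiff$df)

plot(t, y, type = "l", col = "grey50", xlab = "Year", ylab = "Flow")
lines(wiggly, lwd = 2, col = 7)
lines(stiff, lwd = 2, col = 4)
```

If thinking in terms of flexibility is more natural than picking a value of spar, \(\texttt{smooth.spline}\) also accepts a df argument to request a target number of degrees of freedom directly.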
These smoothing techniques are a very relevant part of our course because they are integral to time series analysis. Smoothing time series data makes it easier to see the overall trend and to judge whether there is any seasonal or cyclical pattern present. As we move forward with time series analysis, I am sure that these smoothing techniques will prove to be a useful visualization tool.