Smoothing a Time Series

On Thursday we began talking about time series smoothing. Smoothing helps when a series has a lot of point-to-point variability and you want to see the underlying trend more clearly. In general, smoothing a time series replaces each data point with a weighted average of that point and its neighbors. We learned about four main types of smoothers: the weighted (moving) average, the normal kernel smoother, the lowess smoother, and splines.

Our Example

We will use the global temperatures dataset as our example. First, we create a line graph to look at our data:

library(astsa)
plot(globtemp, type="l", ylab="Global Temperature Deviations")

From this plot, we can see that the data vary a lot from year to year, which makes the series look choppy. Although the overall trend is visible, we will use smoothing to damp these short-term fluctuations so the trend stands out.

Weighted Average

We will begin by smoothing our data with the weighted average method. We do this with the filter command, which replaces each data point with a weighted average of that point and its neighbors. We then plot the smoothed data below the unsmoothed data for comparison.

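# Three-point moving average: each value is averaged with its two neighbors
out <- filter(globtemp, sides=2, filter=rep(1/3,3))
par(mfrow=c(2,1))  # stack the two plots vertically
plot.ts(globtemp, main="Unsmoothed Temp")
plot.ts(out, main="Smoothed Temp")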

As we can see, the plotted series is much smoother after filtering. However, with filter=rep(1/3,3) each point is averaged only with the points immediately to its left and right, all three weighted equally, and the first and last values are NA because they lack a neighbor on one side. To draw on points further away in either direction, with weights that shrink as the distance grows, we will use the normal kernel smoother method.
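To make the averaging concrete, here is a minimal sketch that checks one filtered value by hand. It uses a short simulated series (a hypothetical stand-in for globtemp, so it runs without astsa), and the names x, ma3, by_hand, w, and wma3 are made up for illustration.

```r
# Simulated yearly series: a hypothetical stand-in for globtemp,
# so the example runs without the astsa package.
set.seed(1)
x <- ts(cumsum(rnorm(20, mean = 0.02, sd = 0.1)), start = 1880)

# Three-point moving average, written the same way as in the text.
# The stats:: prefix guards against another package masking filter().
ma3 <- stats::filter(x, filter = rep(1/3, 3), sides = 2)

# The same average computed by hand for an interior point (t = 5).
by_hand <- mean(x[4:6])
print(all.equal(as.numeric(ma3[5]), by_hand))

# The endpoints have no neighbor on one side, so they come back as NA.
print(is.na(ma3[c(1, 20)]))

# Unequal weights work the same way; here they sum to 1 and
# give the center point twice the weight of each neighbor.
w <- c(0.25, 0.5, 0.25)
wma3 <- stats::filter(x, filter = w, sides = 2)
```

A longer weight vector, such as rep(1/5, 5), would average over more neighbors at the cost of more NA values at each end.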

Normal Kernel Smoother

plot(globtemp)
lines(ksmooth(time(globtemp), globtemp, "normal", bandwidth=2), lwd=2, col=4)

The blue line is the smoothed series and the black line is the original data. The ksmooth function replaces each value with a weighted average in which nearby years get more weight than distant ones, following a normal curve. The bandwidth argument sets the width of that curve in the units of the time axis (years here); it is not a count of points. A larger bandwidth spreads the weight over more years and gives a smoother line, as seen below:

plot(globtemp)
lines(ksmooth(time(globtemp), globtemp, "normal", bandwidth=3), lwd=2, col=4)

Now the line is even smoother, because a bandwidth of 3 gives a wider kernel than a bandwidth of 2, so years further from each point contribute more to its average.
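The effect of bandwidth can be sketched with a hand-rolled normal kernel smoother. This is an illustration of the idea, not ksmooth's exact implementation. The series is simulated (a hypothetical stand-in for globtemp), and normal_smooth and roughness are made-up helper names. The standard deviation is derived from the ksmooth documentation, which says the kernel's quartiles sit at plus or minus 0.25 times the bandwidth.

```r
# Hypothetical stand-in for globtemp so the sketch runs without astsa.
set.seed(2)
yr <- 1880:1999
temp <- 0.01 * (yr - 1880) + rnorm(length(yr), sd = 0.15)

# Hand-rolled normal kernel smoother (illustration only).
normal_smooth <- function(x, y, bandwidth) {
  # Quartiles at +/- 0.25 * bandwidth correspond to this standard deviation.
  s <- 0.25 * bandwidth / qnorm(0.75)
  sapply(x, function(x0) {
    w <- dnorm(x - x0, sd = s)  # weights shrink with distance from x0
    sum(w * y) / sum(w)         # weighted average with normalized weights
  })
}

# Roughness measure: sum of squared year-to-year changes.
roughness <- function(v) sum(diff(v)^2)

narrow <- normal_smooth(yr, temp, bandwidth = 2)
wide   <- normal_smooth(yr, temp, bandwidth = 10)
print(c(raw = roughness(temp), narrow = roughness(narrow), wide = roughness(wide)))
```

The roughness should drop as the bandwidth grows, which is the numerical counterpart of the line looking smoother in the plots.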

Lowess Smoother

Lastly, we will use the lowess smoothing method, which fits a weighted regression to the nearest neighbors of each point and uses the fitted values as the smoothed trend line.

plot(globtemp)
lines(lowess(globtemp), lty=2, lwd=2, col=2)  # default span, f=2/3

In this case, we oversmoothed a bit. This is because we used the default span, f=2/3, which means each local fit uses about two thirds of the data. We can use fewer neighbors by setting the f argument to a smaller value:

plot(globtemp)
lines(lowess(globtemp, f=.05), lwd=2, col=4)  # each local fit uses about 5% of the points

With the smaller span, the line follows the data more closely and we are no longer oversmoothing.
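As a rough check on how the span changes the fit, the sketch below compares the default f with f = 0.05 on a simulated series (a hypothetical stand-in for globtemp). The neighbor counts are approximations, and roughness is a made-up helper.

```r
# Hypothetical series with a bend, standing in for globtemp.
set.seed(3)
yr <- 1880:1999
temp <- 0.5 * sin((yr - 1880) / 20) + rnorm(length(yr), sd = 0.15)

fit_default <- lowess(yr, temp)            # f defaults to 2/3
fit_local   <- lowess(yr, temp, f = 0.05)  # much smaller neighborhood

# Approximate number of neighbors used in each local fit.
n_default <- round(2/3 * length(yr))
n_local   <- round(0.05 * length(yr))
print(c(default = n_default, local = n_local))

# Roughness measure: sum of squared year-to-year changes in the fitted line.
roughness <- function(v) sum(diff(v)^2)
print(c(default = roughness(fit_default$y), local = roughness(fit_local$y)))
```

The default span should give the smoother (less rough) line. Whether that counts as oversmoothing depends on how much short-term movement you want to keep.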

We did not yet see an in-class example of the splines smoother.
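For completeness, here is one way a spline smoother might look, using base R's smooth.spline. This was not shown in class, so treat it as a sketch under assumptions: the series is simulated, the two spar values are arbitrary choices, and roughness is a made-up helper.

```r
# Hypothetical stand-in for globtemp so the sketch runs without astsa.
set.seed(4)
yr <- 1880:1999
temp <- 0.01 * (yr - 1880) + rnorm(length(yr), sd = 0.15)

# spar is the smoothing parameter; larger values give a stiffer curve.
fit_flexible <- smooth.spline(yr, temp, spar = 0.5)
fit_stiff    <- smooth.spline(yr, temp, spar = 1)

# Roughness measure: sum of squared year-to-year changes in the fitted curve.
roughness <- function(v) sum(diff(v)^2)
print(c(flexible = roughness(fit_flexible$y), stiff = roughness(fit_stiff$y)))
```

Either fit can be drawn over a plot of the series in the same way as the other smoothers, for example with lines(fit_flexible, lwd=2, col=4).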

Summary

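To summarize, we use smoothing to see overall trends in our datasets. Each smoother has a setting that controls how much it smooths: the filter weights for the weighted average, bandwidth for the normal kernel smoother, and f for lowess. Smoothing helps with visualizing our data and is useful for time series analysis.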