Smoothing

Knudson

Text: Time Series Analysis

http://www.stat.pitt.edu/stoffer/tsa4/tsa4.pdf

BASIC MOVING AVERAGES

Professor Knudson had us load in some data from Canvas. She had already done some work with the data, and we followed along as she talked us through what was going on. I am still a little confused by some of what we did, but I will do my best to explain.

# Let’s “filter” a time series by looking at the moving averages

y <- rnorm(500,0,1) # 500 N(0,1) variates
v <- filter(y, sides=2, filter=rep(1/3,3)) # moving average
par(mfrow=c(2,1))
plot.ts(y, main="white noise")
plot.ts(v, ylim=c(-3,3), main="moving average")

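This generates 500 independent N(0,1) values (white noise) and plots them, with the moving average underneath. The white noise plot shows all of the data points and is very noisy. The moving average replaces each point with the average of itself and its two neighbors (one on each side), so the second plot is much smoother. The first and last values have no neighbor on one side, so they come out as NA and are not plotted.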
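Here is a quick check of what `filter(y, sides=2, filter=rep(1/3,3))` computes, written as a sketch on made-up white noise. The seed is our own addition so the run is reproducible. We call `stats::filter` explicitly because `dplyr::filter` masks it when dplyr is loaded.

```r
set.seed(42)                     # made-up seed, only for reproducibility
y <- rnorm(500, 0, 1)            # 500 N(0,1) variates, as above
v <- stats::filter(y, sides = 2, filter = rep(1/3, 3))

# Interior values: v[t] = (y[t-1] + y[t] + y[t+1]) / 3 for t = 2, ..., 499
by_hand <- (y[1:498] + y[2:499] + y[3:500]) / 3
all.equal(as.numeric(v[2:499]), by_hand)   # TRUE

# t = 1 and t = 500 lack a neighbor on one side, so both are NA
c(v[1], v[500])
sum(!is.na(v))                             # 498

# Averaging 3 independent N(0,1) values gives variance 1/3,
# which is why the moving-average plot looks so much calmer
var(as.numeric(v), na.rm = TRUE)           # close to 1/3
```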

Global temp

#install.packages('astsa')
library(astsa)
plot(globtemp, type="o", ylab="Global Temperature Deviations")

out <- filter(globtemp, sides=2, filter=rep(1/3,3)) # moving average
par(mfrow=c(2,1))
plot.ts(globtemp, main="unsmoothed temp")
plot.ts(out, main="smoothed temp")

We do the same thing for the global temperature series: a 3-point centered moving average, plotted below the unsmoothed data.

Now, what do we need to do to smooth the Johnson and Johnson quarterly earnings?

plot(jj, type="o", ylab="Quarterly Earnings per Share")
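The notes leave the Johnson and Johnson question open. One possible answer is the quarterly version of the 12-month SOI weights used further down: average over one year (4 quarters) using a centered window of 5 quarters whose end weights are halved. This is our own sketch, not what was done in class. It uses base R's `datasets::JohnsonJohnson`, which we assume is the same series as astsa's `jj`, so the code runs without astsa.

```r
jj_base <- datasets::JohnsonJohnson    # assumption: same quarterly series as astsa's jj
w4 <- c(.5, 1, 1, 1, .5) / 4           # 5 weights, half weight at the ends, sum to 1
jj_smooth <- stats::filter(jj_base, sides = 2, filter = w4)

# Check one interior value by hand: quarter 3 uses quarters 1 through 5
all.equal(as.numeric(jj_smooth[3]), sum(w4 * jj_base[1:5]))   # TRUE

plot(jj_base, type = "o", ylab = "Quarterly Earnings per Share")
lines(jj_smooth, lwd = 2, col = 4)     # smoothed trend in blue
```

Because each full year of quarters gets equal total weight, a pattern that repeats every 4 quarters averages out, leaving the upward trend. The first two and last two values are NA.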

It’s time to try a smoother that is slightly more advanced

Page 67: monthly measurements of the Southern Oscillation Index (SOI).

SOI measures changes in air pressure related to sea surface temperatures in the central Pacific Ocean, which warms every 3 to 7 years due to El Niño.

Let’s filter out the annual temperature cycle

We’ll use these weights.

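13 weights, covering a centered 13-month window (months t-6 through t+6). The two end weights are halved, so the weights sum to 1 and in effect we average over 12 months.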

(wgts <- c(.5, rep(1,11), .5)/12)
##  [1] 0.04166667 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
##  [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [13] 0.04166667
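Two properties explain why these particular weights filter out the annual cycle. They sum to 1, so a constant series passes through unchanged. Every calendar month also gets the same total weight, so a pattern that repeats every 12 months averages to zero. The sketch below checks both on made-up series: a constant and a pure cosine with a 12-month period.

```r
wgts <- c(.5, rep(1, 11), .5) / 12
length(wgts)   # 13 weights: months t-6, ..., t+6
sum(wgts)      # 1

# Made-up constant series: interior values stay at 5
flat <- stats::filter(rep(5, 30), sides = 2, filter = wgts)

# Made-up pure 12-month cycle over 10 years: it is wiped out
month <- 1:120
cycle12 <- cos(2 * pi * month / 12)
smoothed <- stats::filter(cycle12, sides = 2, filter = wgts)
max(abs(smoothed), na.rm = TRUE)   # essentially 0 (rounding error only)
```

The cancellation works because the two half-weighted end months, t-6 and t+6, fall on the same phase of the cycle. Together they count as one full month, so each phase of the cycle gets equal weight.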

We can choose the weights so that the more important points are weighted more heavily.

To understand what’s going on, calculate the 10th filtered entry.

We need to use soi[4:16]: 13 numbers, centered on position 10 (10 - 6 = 4 and 10 + 6 = 16).

soi[4:16]%*%wgts
##           [,1]
## [1,] 0.1277083

Now, let’s let R calculate the rest

soif <- filter(soi, sides=2, filter=wgts)

Compare what we calculated to R’s

all.equal(as.numeric(soif[10]) , as.numeric(soi[4:16]%*%wgts))
## [1] TRUE

The first is a plain number and the second is a 1 x 1 matrix, so we wrap both in as.numeric.

BTW, why wouldn’t this work?

all.equal((soif[10]) , (soi[4:16]%*%wgts))
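One way to see the answer is to reproduce the comparison on a small made-up series standing in for `soi`. `all.equal()` compares attributes and class as well as values. The matrix product carries a `dim` attribute that the plain number lacks, so `all.equal()` returns a description of the mismatch instead of `TRUE`. Removing the dimensions with `drop()` has the same effect as the `as.numeric()` used above.

```r
set.seed(2)
z  <- rnorm(20)                        # made-up stand-in for soi
w  <- c(.5, rep(1, 11), .5) / 12
zf <- stats::filter(z, sides = 2, filter = w)

by_hand <- z[4:16] %*% w               # a 1 x 1 matrix, not a plain number
dim(by_hand)                           # 1 1

res <- all.equal(zf[10], by_hand)      # also compares attributes and class
isTRUE(res)                            # FALSE
res                                    # character vector describing the mismatch

# Dropping the dim attribute is enough
all.equal(zf[10], drop(by_hand))       # TRUE
```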

Plot what we’ve created

plot(soi)
lines(soif, lwd=2, col=4)

par(fig = c(.65, 1, .65, 1), new = TRUE) # the inset
nwgts = c(rep(0,20), wgts, rep(0,20))

Black is the unsmoothed data; blue is the smoothed series.

The little inset in the top-right corner shows the weights, i.e., what type of smoother we used (pretty basic: a flat window with half weight at the two ends).

plot(nwgts, type="l", ylim = c(-.02,.1), xaxt='n', yaxt='n', ann=FALSE)

We can be fancier in how we weight the points (closer points have higher weight)

Let’s use a NORMAL KERNEL SMOOTHER with ksmooth

plot(soi)
lines(ksmooth(time(soi), soi, "normal", bandwidth = 3), lwd=2, col=4)

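ksmooth also has an n.points argument, but n.points is only the number of points at which the fit is evaluated (by default max(100, length(x))). It does not control how many neighbors are used, and ksmooth does not fit separate regressions and add them together. At each evaluation point, ksmooth takes a weighted average of the data, with normal-kernel weights that shrink as observations get farther away in time. How far the weights reach is set by bandwidth, measured in the units of the time axis (years for soi): the kernel is scaled so that its quartiles are at +/- 0.25 * bandwidth.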
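The sketch below shows both points on a made-up monthly series with time in years, like `soi`. First, `n.points = 10` gives 10 fitted values on an evenly spaced grid, not a fit built from 10 neighbors. Second, a fitted value can be reproduced by hand as a normal-kernel weighted average. Our hand version uses every observation. As far as we know, ksmooth ignores points far out in the kernel's tails, so we expect close rather than exact agreement.

```r
set.seed(7)
tt <- seq(1950, by = 1/12, length.out = 240)       # made-up monthly time axis, in years
yy <- sin(2 * pi * tt / 4) + rnorm(240, sd = 0.5)  # made-up 4-year cycle plus noise

bw <- 3
fit10 <- ksmooth(tt, yy, "normal", bandwidth = bw, n.points = 10)
length(fit10$x)   # 10: the number of x values where the fit is evaluated

# Weighted-average estimate by hand at one evaluation point
x0 <- fit10$x[5]
s  <- 0.25 * bw / qnorm(0.75)    # normal sd that puts the quartiles at +/- 0.25 * bw
w  <- dnorm(tt, mean = x0, sd = s)
manual <- sum(w * yy) / sum(w)
c(manual = manual, ksmooth = fit10$y[5])   # close agreement
```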

The inset shows the shape of the normal (Gaussian) kernel:

par(fig = c(.65, 1, .65, 1), new = TRUE) # the inset
gauss <- function(x) { 1/sqrt(2*pi) * exp(-(x^2)/2) }
x <- seq(from = -3, to = 3, by = 0.001)
plot(x, gauss(x), type ="l", ylim=c(-.02,.45), xaxt='n', yaxt='n', ann=FALSE)

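LOWESS SMOOTHING

Lowess uses nearest neighbors. Each fitted value comes from a weighted regression on the points closest to it (a fraction f of the data), so it works like a localized regression.

The trend line (dashed red, drawn with the default span f = 2/3) is way too smooth.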

plot(soi)
lines(lowess(soi), lty=2, lwd=2, col=2) 

We do not want the blue (ksmooth) line to be too squiggly or too smooth, so pick a bandwidth that follows the data well without losing too much information. The bandwidth is not a number of points. It is a width in the units of the time axis (years for soi), and a larger bandwidth spreads the kernel weights over more neighboring months. If you raise it to 5, 10, etc., the blue line gets too smooth. This is bad because real features of the data, like the El Niño cycle, get averaged away.

If the bandwidth is too small (like 1), the blue line is super bumpy and you cannot really use the result. You need to pick a bandwidth in between too bumpy and too smooth.
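To put a number on "too bumpy" versus "too smooth", the sketch below fits ksmooth with bandwidths 1, 3 and 10 to a made-up monthly series. It scores each fit with a hypothetical roughness measure of our own: the sum of squared second differences of the fitted curve. The score should drop as the bandwidth grows.

```r
set.seed(7)
tt <- seq(1950, by = 1/12, length.out = 240)       # made-up monthly time axis, in years
yy <- sin(2 * pi * tt / 4) + rnorm(240, sd = 0.5)  # made-up 4-year cycle plus noise

# Hypothetical helper: bigger value means a bumpier fitted curve
roughness <- function(bw) {
  fit <- ksmooth(tt, yy, "normal", bandwidth = bw, x.points = tt)
  sum(diff(fit$y, differences = 2)^2)
}
bws <- c(1, 3, 10)
r <- sapply(bws, roughness)
names(r) <- paste0("bandwidth=", bws)
r   # decreases as the bandwidth grows

plot(tt, yy, type = "l", col = "grey", xlab = "Time", ylab = "y")
for (i in seq_along(bws)) {
  lines(ksmooth(tt, yy, "normal", bandwidth = bws[i]), lwd = 2, col = i + 1)
}
```

Roughness alone always favors a bigger bandwidth. The plot shows the other side of the trade-off: at bandwidth 10 the made-up 4-year cycle is mostly averaged away.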

plot(soi)
lines(ksmooth(time(soi), soi, "normal", bandwidth = 3), lwd=2, col=4)

Let’s try a much smaller lowess span, f = .05, so each local fit uses only about 5% of the points:

lines(lowess(soi, f=.05), lwd=2, col=4)
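The span comparison can be made concrete on a made-up series with a 4-year cycle. With the default f = 2/3, each local fit reaches across roughly 13 of the 20 years, so the cycle is flattened into a slow trend. With f = .05, each fit sees about a year of data, so the cycle survives, along with some noise. The roughness score is the same hypothetical helper idea as before.

```r
set.seed(7)
tt <- seq(1950, by = 1/12, length.out = 240)       # made-up monthly time axis, in years
yy <- sin(2 * pi * tt / 4) + rnorm(240, sd = 0.5)  # made-up 4-year cycle plus noise

fit_default <- lowess(tt, yy)            # default span f = 2/3
fit_small   <- lowess(tt, yy, f = .05)   # each local fit uses about 5% of the points

rough <- function(fit) sum(diff(fit$y, differences = 2)^2)   # hypothetical helper
c(default = rough(fit_default), small = rough(fit_small))

# Spread of the fitted values: how much of the 4-year cycle each fit keeps
c(default = sd(fit_default$y), small = sd(fit_small$y))

plot(tt, yy, type = "l", col = "grey", xlab = "Time", ylab = "y")
lines(fit_default, lty = 2, lwd = 2, col = 2)   # trend, default span
lines(fit_small, lwd = 2, col = 4)              # small span
```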

SPLINE SMOOTHING

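Spline smoothing breaks up the x-axis into little intervals and fits a cubic polynomial on each one, joined smoothly where the intervals meet. The spar argument sets how strongly wiggliness is penalized. A larger spar gives a smoother curve, so spar = 1 (dashed red) is smoother than spar = .5 (solid blue).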

plot(soi)
lines(smooth.spline(time(soi), soi, spar=.5), lwd=2, col=4)
lines(smooth.spline(time(soi), soi, spar=1), lty=2, lwd=2, col=2)
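A simple way to see what spar does is to look at the equivalent degrees of freedom that smooth.spline reports in its `df` component. Fewer degrees of freedom mean a stiffer, smoother curve. This sketch uses the same kind of made-up monthly series as before rather than `soi`.

```r
set.seed(7)
tt <- seq(1950, by = 1/12, length.out = 240)       # made-up monthly time axis, in years
yy <- sin(2 * pi * tt / 4) + rnorm(240, sd = 0.5)  # made-up 4-year cycle plus noise

fit_05 <- smooth.spline(tt, yy, spar = .5)
fit_1  <- smooth.spline(tt, yy, spar = 1)

# Larger spar means a heavier roughness penalty, so fewer degrees of freedom
c(spar_0.5 = fit_05$df, spar_1 = fit_1$df)

# Fitted values at the observed times
fitted_05 <- predict(fit_05, tt)$y

plot(tt, yy, type = "l", col = "grey", xlab = "Time", ylab = "y")
lines(fit_05, lwd = 2, col = 4)
lines(fit_1, lty = 2, lwd = 2, col = 2)
```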

Now, practice using these smoothers on the jj and/or globtemp data