Smoothing
Knudson
Text: Time Series Analysis
http://www.stat.pitt.edu/stoffer/tsa4/tsa4.pdf
BASIC MOVING AVERAGES
Professor Knudson had us load in some data from Canvas. She had already done some work with the data, and we followed along as she talked us through what was going on. I am a little confused about some of what we did, but I will do my best to explain.
# Let’s “filter” a time series by looking at the moving averages
y <- rnorm(500,0,1) # 500 N(0,1) variates
v <- filter(y, sides=2, filter=rep(1/3,3)) # moving average
par(mfrow=c(2,1))
plot.ts(y, main="white noise")
plot.ts(v, ylim=c(-3,3), main="moving average")
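This simulates 500 standard normal values (white noise) and plots them. The white noise plot shows all of the data points and is pretty noisy. The moving average replaces each point with the average of itself and its two neighbors (three points, each weighted 1/3), so the line is smoother. The smoothed series has the same length as the original, but its first and last values are NA because those points do not have a neighbor on both sides.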
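The three-point average can be checked by hand. This is a minimal sketch on simulated data; the names by_hand, same_value, same_length, and ends_missing are only for illustration, and stats::filter is spelled out in case another package masks filter.

```r
# Simulated white noise, same setup as in the notes
set.seed(1)
y <- rnorm(500, 0, 1)
v <- stats::filter(y, sides = 2, filter = rep(1/3, 3))

# The smoothed value at position 10 is the mean of positions 9, 10, 11
by_hand <- mean(y[9:11])
same_value <- isTRUE(all.equal(as.numeric(v[10]), by_hand))

# The output keeps the length of the input; only the two ends are NA
same_length <- length(v) == length(y)
ends_missing <- is.na(v[1]) && is.na(v[500])

print(c(same_value, same_length, ends_missing))
```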
#install.packages('astsa')
library(astsa)
plot(globtemp, type="o", ylab="Global Temperature Deviations")
out <- filter(globtemp, sides=2, filter=rep(1/3,3)) # moving average
par(mfrow=c(2,1))
plot.ts(globtemp, main="unsmoothed temp")
plot.ts(out, main="smoothed temp")
We apply the same three-point moving average to the global temperature series and plot the unsmoothed and smoothed versions one above the other.
plot(jj, type="o", ylab="Quarterly Earnings per Share")
(wgts <- c(.5, rep(1,11), .5)/12)
## [1] 0.04166667 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [13] 0.04166667
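We can also give the points in the window different weights. Here the 11 middle points each get weight 1/12 and the two end points each get 1/24, so the 13 weights sum to 1. Since soi is monthly, this is a centered moving average that covers 12 months.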
soi[4:16]%*%wgts
## [,1]
## [1,] 0.1277083
soif <- filter(soi, sides=2, filter=wgts)
all.equal(as.numeric(soif[10]) , as.numeric(soi[4:16]%*%wgts))
## [1] TRUE
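The first value (soif[10]) is a plain number and the second (soi[4:16]%*%wgts) is a 1x1 matrix, so we wrap both in as.numeric before comparing them. They match because the 13-point window centered at position 10 covers positions 4 through 16.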
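The same check works on any series, which may make the indexing easier to see. This is a sketch on simulated data (x stands in for soi, and the other names are made up for illustration).

```r
# Simulated series standing in for soi
set.seed(42)
x <- rnorm(50)

# Same 13 weights as in the notes: half weight on the two ends
wgts <- c(.5, rep(1, 11), .5) / 12
weights_sum_to_one <- isTRUE(all.equal(sum(wgts), 1))

xf <- stats::filter(x, sides = 2, filter = wgts)

# Weighted sum over positions 4..16, the window centered at position 10
by_hand <- sum(x[4:16] * wgts)
matches <- isTRUE(all.equal(as.numeric(xf[10]), by_hand))

print(c(weights_sum_to_one, matches))
```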
plot(soi)
lines(soif, lwd=2, col=4)
par(fig = c(.65, 1, .65, 1), new = TRUE) # the insert
nwgts = c(rep(0,20), wgts, rep(0,20))
plot(nwgts, type="l", ylim = c(-.02,.1), xaxt='n', yaxt='n', ann=FALSE)
Black is the unsmoothed data; blue is the smoothed series.
The little plot at the top right shows the weights.
plot(soi)
lines(ksmooth(time(soi), soi, "normal", bandwidth = 3), lwd=2, col=4)
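It might make more sense to use n.points. n.points is the number of points at which to evaluate the fit. So if we put 10, ksmooth returns the smoothed curve at only 10 evenly spaced time points. It does not change how many neighbors go into each average; that is controlled by bandwidth. The idea of taking the nearest points, fitting a regression to them, and joining the local fits into one curve describes lowess, which we use further down.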
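To see what n.points does, here is a small sketch on simulated data (tt, x, fit_all, and fit_10 are made-up names, and the bandwidth of 10 is arbitrary). It only compares how many points come back from ksmooth.

```r
# Simulated series: a slow wave plus noise
set.seed(1)
tt <- 1:200
x <- sin(tt / 20) + rnorm(200, sd = 0.3)

# Default: the fit is evaluated on a grid at least as long as the data
fit_all <- ksmooth(tt, x, "normal", bandwidth = 10)

# n.points = 10: same smoother, evaluated at only 10 points
fit_10 <- ksmooth(tt, x, "normal", bandwidth = 10, n.points = 10)

n_all <- length(fit_all$x)
n_10 <- length(fit_10$x)
print(c(n_all, n_10))
```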
#par(fig = c(.65, 1, .65, 1), new = TRUE)
gauss <- function(x) { 1/sqrt(2*pi) * exp(-(x^2)/2) }
x <- seq(from = -3, to = 3, by = 0.001)
plot(x, gauss(x), type ="l", ylim=c(-.02,.45), xaxt='n', yaxt='n', ann=FALSE)
plot(soi)
lines(lowess(soi), lty=2, lwd=2, col=2)
We do not want the blue line to be too squiggly or too smooth. Pick a bandwidth that follows the data pretty well without losing too much information. The kernel smoother replaces each point with a weighted average of the points around it, with weights taken from the normal curve plotted above, so nearby points count more than distant ones. The bandwidth sets how wide that curve is, and it is measured in the units of the time axis (years for soi), not as a count of points. You can raise it to 5, 10, and so on to average over a wider stretch of the series, but then the blue line gets too smooth. This is not good because then you lose information from the dataset.
If the bandwidth is too small (like 1), then the blue line is super bumpy and you cannot really use the result. You need to pick a bandwidth in between too bumpy and too smooth.
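One rough way to compare bandwidths is to measure how much the fitted line jumps from point to point. This is only a sketch on simulated data; the roughness function and the bandwidth values 2, 15, and 150 are assumptions for illustration, not something from class.

```r
# Simulated series: a slow wave plus noise
set.seed(1)
tt <- 1:200
x <- sin(tt / 20) + rnorm(200, sd = 0.3)

# Hypothetical bumpiness measure: sum of squared point-to-point changes
roughness <- function(fit) sum(diff(fit$y)^2)

narrow <- ksmooth(tt, x, "normal", bandwidth = 2)    # follows the noise
medium <- ksmooth(tt, x, "normal", bandwidth = 15)   # in between
wide   <- ksmooth(tt, x, "normal", bandwidth = 150)  # flattens the wave

r <- c(narrow = roughness(narrow),
       medium = roughness(medium),
       wide   = roughness(wide))
print(r)
```

The bumpiness should drop as the bandwidth grows, so the number alone cannot pick the bandwidth; it still has to be judged against the plot.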
plot(soi)
lines(ksmooth(time(soi), soi, "normal", bandwidth = 3), lwd=2, col=4)
lines(lowess(soi, f=.05), lwd=2, col=4)
plot(soi)
lines(smooth.spline(time(soi), soi, spar=.5), lwd=2, col=4)
lines(smooth.spline(time(soi), soi, spar= 1), lty=2, lwd=2, col=2)