Consider the function
\[f(x) = 0.5\frac{sin(x-a)}{x-a}+0.8\frac{sin(x-b)}{x-b}+0.3\frac{sin(x-c)}{x-c}\]
where \(a = 10, b = 50, c = 80\). Generate a dataset of \(N = 1052\) examples where the \(x\) are equally-spaced in \([0.1, 100]\) and the targets for regression are obtained as \(t = f(x) + N(0, 0.05^2)\). Fit standard polynomial regression with some degrees of your choice; then fit kernel ridge regression with the RBF kernel and some \(\sigma\) sigma and lambda of your choice, until you are satisfied with the fit.
Polynomials of degrees 2, 7, 12, 17, 22 were used for Standard polynomial regression.
# defining all the constants
a <- 10
b <- 50
c <- 80
N <- 1052
x <- seq(from=0.1, to=100, by=(100-0.1)/(N-1))
t <- 0.5*(sin(x-a)/(x-a))+0.8*(sin(x-b)/(x-b))+0.3*(sin(x-c)/(x-c)) + rnorm(x,0,0.05)
###########################
## standard (non-ridge) regression
d <- data.frame(x,t)
plot(x,t,type="l", main = "Standard polynomial regression
with different degrees")
# degrees used
poly <- seq(2,22, by=5)
# colors used
color <- c("yellow", "pink", "green", "blue", "red")
# plot several lines of different degree polynomials
for (i in 1:length(poly)) {
points(x, predict(lm (t ~ poly(x,poly[i]), d)), col=color[i], type="l", lwd=2)
}
legend('topright' , paste0("Polynomial ",poly), col=color,lty=c(1,1), lwd=5 ,bty='n', cex=0.75)
As can be seen from the graph above, generally, with the increase in polynomial degree the line tends to fit the data better, but not good enough.
For the Kernel ridge regression method, as the \(\sigma\) and \(\lambda\) parameters have to be tuned, 4 values of \(\sigma\) (0.01, 1, 10, 100) and 3 values of \(\lambda\) (0.01, 0.1, 1) were used.
###########################
## kernel ridge regression
# defining the constants
color2 <- c("green", "blue", "red")
#varying sigma and lambda
sig <- c(0.01, 1, 10, 100)
lam <- c(0.01, 0.1, 1)
kk <- tcrossprod(x)
dd <- diag(kk)
# drawing 4 plots for 4 sigmas
for (j in 1:length(sig)){
plot(x,t,type="l", lwd=3, main = paste("Kernel ridge regression
with the RBF kernel
with sigma = ", sig[j]))
# with 3 lines for different values of lambda in each 4 plots
for (i in 1:length(lam)){
myRBF.kernel <- exp(sig[j]*(-matrix(dd,N,N)-t(matrix(dd,N,N))+2*kk))
ident.N <- diag(rep(1,N))
alphas <- solve(myRBF.kernel + lam[i]*ident.N)
alphas <- alphas %*% t
lines(x,myRBF.kernel %*% alphas,col=color2[i])
}
legend('topright' , paste0("Lambda ",lam), col=color2,lty=c(1,1), lwd=5 ,bty='n', cex=0.75)
}
The Kernel ridge regression model fits the data are a lot better then Standard regression. In the context of this problem, with this given data, I emphirically chose the best parameters to be \(\sigma = 30\) and \(\lambda = 0.1\).
plot(x,t,type="l", lwd=3, main = paste("Kernel ridge regression
with the RBF kernel
with sigma = ", 30, "and lambda = ", 0.1))
myRBF.kernel <- exp(30*(-matrix(dd,N,N)-t(matrix(dd,N,N))+2*kk))
ident.N <- diag(rep(1,N))
alphas <- solve(myRBF.kernel + 0.1*ident.N)
alphas <- alphas %*% t
lines(x,myRBF.kernel %*% alphas,col="red")
Generally, from the observed behaviour, we can say that for this problem and set of data
\(\sigma < 1\) gives very bad results;
\(\sigma = 30\) gives the best results;
with \(\sigma > 50\) as it grows, starts overfittng.
Also, lowering \(\lambda\) improves the results (as a trend, \(\lambda = 1\) works worse, then \(\lambda = 0.1\) or \(\lambda = 0.01\))