Consider the function: \[f(x) = 0.5\frac{sin(x − a)}{(x-a)} + 0.8\frac{sin(x − b)}{(x-b)} + 0.3\frac{sin(x − c)}{(x-c)}\]
where a = 10, b = 50, c = 80. Generate a dataset of N = 1052 examples where the x are equally-spaced in [0.1, 100] and the targets for regression are obtained as t = f (x) + N(0,0.052). Fit standard polynomial regression with some degrees of your choice; then fit kernel ridge regression with the RBF kernel and some σ and λ of your choice, until you are satisfied with the fit. Write a small report (max. 4 pages) with your results.
For the standard polynomails, the a sequence was generated just to be able to demonstrate how the polynomial fitting changes when the polynomial increases. The polinomials used were: (1, 5, 9, 13, 17, 21)
# Constants
a = 50
b = 50
c = 80
N = 1052
#Limits
x_upper <- 100
x_lower <-.01
spacing = (x_upper-x_lower)/(N-1)
x <- seq(x_lower,x_upper,by= spacing)
# Targets #Normal missing
t = 0.5*(sin(x-a)/(x-a)) + 0.8*(sin(x-b)/(x-b)) + .3*(sin(x-c)/(x-c)) + rnorm(x,0,0.05)
# Function plot
data <- data.frame(x,t)
plot(x,t,type="l")
#Get colors
colors = colorRampPalette(c("red", "green"))(6)
polynomials = seq(1,21,4)
#Plot with polynomials ## Maybe put some legends
for(i in 1:length(polynomials)){
points(x, predict( lm(t ~ poly(x, degree = polynomials[i]), data) ), col=colors[i], lwd=.5)
}
legend('topright' , paste0("Polynomial ",polynomials, " "), col=colors,lty=c(1,1), lwd=5 ,bty='n', cex=1)
As shown in the plot, the Polinomial 1 is basically a linear regression, which is a straight line in the x-axis and following the mean = 0.
Kernel ridge regression is one of the different existing so called Kernel Methods. The name comes from the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without having to compute the coordinates of the data in that space, but computing the inner products between the data in the feature space. This approach is called the “kernel trick”, and its popularity comes because the computations are cheaper than explicitly computing the coodinates.
For the purpose of this problem, the RBF kernel is manually computed, the objective is to see the behaviour of the parameters \(\sigma\) and \(\lambda\).
For sigma the following values were chosen: [.01, 1, 10, 100], and for lambda: [1, .1, 0.1, .001]
#Parameters for Kernel
sigma <- c(.01, 1,10,100)
lambda <- c(1, .1, .01 ,.001)
#Colors for plotting
colors_blues = colorRampPalette(c("lightblue", "darkblue"))(4)
# Prepare the Gaussian RBF kernell & the ridge regression
N <- length(x)
kk <- tcrossprod(x)
dd <- diag(kk)
ident.N <- diag(rep(1,N))
par(mfrow=c(2,2))
for (i in 1:length(sigma)) {
plot(x,t,
type="l",
main = paste0("Gaussian Kernel with sigma = ",sigma[i]),
xlab = "x",
ylab = "Target (t)")
## Gaussian RBF kernel manually
myRBF.kernel <- exp(sigma[i]*(-matrix(dd,N,N)-t(matrix(dd,N,N))+2*kk))
for(j in 1:length(lambda)){
alphas <- solve(myRBF.kernel + lambda[j]*ident.N)
alphas <- alphas %*% t
lines(x,myRBF.kernel %*% alphas,col=colors_blues[j], lwd = 2)
}
legend('topright' , paste0("lambda = ",lambda, " "), col=colors_blues,lty=c(1,1), lwd=5 ,bty='n', cex=.5)
}
If we consider sigma and lambda independently, we can conclude that in general, a low lambda helps to better fit the model. In this example, lambda = 0.001 was prooving to have a closer approach to the bell, which can also be considered as overfit.
In the case of sigma, this proved to be a better parammeter to fit the data. It may be considered overfit but utilizing a sigma = 1, the Kernel proved to fit the data for all different lambdas.
par(mfrow = c(1,1))
#Parameters
sigma = 1
lambda = 0.001
plot(x,t,
type="l",
main = paste0("Gaussian Kernel with sigma = ",sigma),
xlab = "x",
ylab = "Target (t)")
## Gaussian RBF kernel manually
myRBF.kernel <- exp(sigma*(-matrix(dd,N,N)-t(matrix(dd,N,N))+2*kk))
alphas <- solve(myRBF.kernel + lambda*ident.N)
alphas <- alphas %*% t
lines(x,myRBF.kernel %*% alphas,col="darkblue", lwd = 2)
legend('topright' , paste0("lambda = ",lambda, " "),
col="darkblue",lty=c(1,1), lwd=5 ,bty='n', cex=.75)
Reference: Kernel method