Ítalo Gomes Gonçalves
January 22, 2017
The Gaussian Process (GP) is a flexible machine learning technique with some similarities to the Support Vector Machine (both are kernel methods). An advantage of the GP is that each prediction comes with an estimate of its variance, so the uncertainty can be reported alongside the predicted value. By combining different covariance functions, it is possible to model complex data structures.
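To make the last point concrete, here is a minimal sketch of combining two covariance functions by a weighted sum. The names `cov_gauss`, `cov_periodic` and `cov_sum`, as well as the parameter values, are illustrative assumptions and are not part of the implementation that follows. The sum of two valid covariance functions is again a valid covariance function, so the result can describe a smooth trend plus a periodic component.

```r
# Squared-exponential covariance with range r
cov_gauss <- function(x1, x2, r) {
  d2 <- outer(x1, x2, function(a, b) (a - b)^2)
  exp(-3 * d2 / r^2)
}

# Periodic covariance with period `per` and length scale l
cov_periodic <- function(x1, x2, per, l) {
  d <- outer(x1, x2, function(a, b) abs(a - b))
  exp(-2 * sin(pi * d / per)^2 / l^2)
}

# Weighted sum of the two; w in [0, 1] is the share of the smooth component
cov_sum <- function(x1, x2, w, r, per, l) {
  w * cov_gauss(x1, x2, r) + (1 - w) * cov_periodic(x1, x2, per, l)
}

# Example: covariance matrix on a small grid (values chosen arbitrarily)
x <- seq(0, 5, by = 0.5)
K <- cov_sum(x, x, w = 0.7, r = 2, per = 1, l = 1)
```

Because both components equal 1 at zero distance and the weights add up to 1, the diagonal of `K` stays at 1, so it can be scaled by a variance in the same way as a single covariance function.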
For small datasets the implementation is very simple, since it only requires solving linear systems with the covariance matrix of the data:
# Covariance function (correlation falls to about 5% at distance r)
cfun <- function(x1, x2, r) {
  # Squared distances between every pair of points
  d <- outer(x1, x2, function(a, b) (a - b)^2)
  exp(-3 * d / r^2)
}
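# Prediction
GP_pred <- function(x, y, xpred, p, r) {
  # p -> noise/total variance ratio
  # r -> covariance range
  m <- mean(y)
  v <- var(y)
  # Covariance among the data: signal part plus noise on the diagonal
  K <- (1 - p) * v * cfun(x, x, r) + diag(p * v, length(x), length(x))
  # Covariance between the prediction points and the data
  Kpred <- (1 - p) * v * cfun(xpred, x, r)
  # drop() turns the one-column matrix into a plain vector
  pred_mu <- drop(Kpred %*% solve(K, y - m)) + m
  # Predictive variance: v minus the diagonal of Kpred K^-1 Kpred'
  pred_var <- v - colSums(t(Kpred) * solve(K, t(Kpred)))
  # Guard against tiny negative values from rounding (possible when p is close to 0)
  pred_var <- pmax(pred_var, 0)
  # Band of plus or minus 2 standard deviations around the mean
  pred_up <- pred_mu + 2 * sqrt(pred_var)
  pred_down <- pred_mu - 2 * sqrt(pred_var)
  return(list(mu = pred_mu, up = pred_up, down = pred_down))
}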
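For reference, the quantities computed by `GP_pred` can be written as follows. This is only a transcription of the code above into standard notation. The symbols $C$, $K_*$, $\mu_*$ and $\sigma_*^2$ are introduced here for convenience, while $m$ and $v$ are the sample mean and variance of $y$, as in the code.

```latex
\begin{aligned}
C(x, x') &= \exp\!\left(-\frac{3\,(x - x')^2}{r^2}\right) \\
K &= (1 - p)\,v\,C(x, x) + p\,v\,I \\
K_* &= (1 - p)\,v\,C(x_{\mathrm{pred}}, x) \\
\mu_* &= m + K_* K^{-1} (y - m) \\
\sigma_*^2 &= v - \operatorname{diag}\!\left(K_* K^{-1} K_*^{\top}\right)
\end{aligned}
```

The returned band is $\mu_* \pm 2\sigma_*$. Because $\sigma_*^2$ starts from the total variance $v$, it includes the noise share $p\,v$, so the band refers to new observations rather than to the underlying mean function alone.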
As an example, the Gaussian Process model is applied to the cars2010 dataset from the AppliedPredictiveModeling package, predicting fuel efficiency from engine displacement:
library(ggplot2)
library(AppliedPredictiveModeling)
# FuelEconomy provides the cars2010, cars2011 and cars2012 data frames
data(FuelEconomy)
x <- cars2010$EngDispl
y <- cars2010$FE
xpred <- seq(1, 10, 0.01)
# p = 0.25 (noise ratio) and r = 2 (range) are set by hand
pred <- GP_pred(x, y, xpred, 0.25, 2)
ggplot() +
  geom_ribbon(aes(x = xpred, ymax = pred$up, ymin = pred$down),
              fill = "gray60") +
  geom_point(aes(x = x, y = y)) +
  geom_line(aes(x = xpred, y = pred$mu), color = "blue") +
  labs(x = "Engine Displacement", y = "Fuel Efficiency") +
  scale_x_continuous(breaks = seq(0, 10, 2))
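The values `0.25` and `2` passed to `GP_pred` above are fixed by hand. As a rough illustration of what the noise ratio `p` does, the sketch below applies a compact copy of the variance calculation to synthetic data and compares the average width of the band for two values of `p`. The noisy sine curve is an assumption made here so that the example runs without the AppliedPredictiveModeling package, and the helper name `gp_band_width` is hypothetical.

```r
set.seed(1)

# Synthetic data: noisy sine curve (illustrative, not taken from cars2010)
x <- sort(runif(60, 0, 10))
y <- sin(x) + rnorm(60, sd = 0.3)
xpred <- seq(0, 10, by = 0.1)

# Same covariance function as in the post
cfun <- function(x1, x2, r) {
  d <- outer(x1, x2, function(a, b) (a - b)^2)
  exp(-3 * d / r^2)
}

# Mean width of the +/- 2 standard deviation band for given p and r
gp_band_width <- function(x, y, xpred, p, r) {
  v <- var(y)
  K <- (1 - p) * v * cfun(x, x, r) + diag(p * v, length(x))
  Kpred <- (1 - p) * v * cfun(xpred, x, r)
  pred_var <- v - colSums(t(Kpred) * solve(K, t(Kpred)))
  mean(4 * sqrt(pmax(pred_var, 0)))
}

w_low  <- gp_band_width(x, y, xpred, p = 0.05, r = 2)
w_high <- gp_band_width(x, y, xpred, p = 0.50, r = 2)
```

With the range held fixed, a larger `p` attributes more of the total variance to noise, so `w_high` comes out larger than `w_low`. The range `r` could be explored in the same way.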