Introduction to Gaussian Processes

Ítalo Gomes Gonçalves
January 22, 2017

Gaussian Process

The Gaussian Process (GP) is a flexible machine learning technique with some similarities to the Support Vector Machine: both are kernel methods. An advantage of the GP is that each prediction comes with its own variance, so the uncertainty is quantified along with the predicted value. Different covariance functions can be combined to model complex data structures.
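
As a rough illustration of combining covariance functions, the sketch below adds and multiplies two kernels in R. The squared-exponential form matches the `cfun` used in this deck's implementation; the periodic kernel, the names `k_sqexp` and `k_periodic`, and all parameter values are assumptions chosen only for illustration and are not part of the original slides.

```r
# Squared-exponential kernel, same form as the deck's cfun.
k_sqexp <- function(x1, x2, r) {
  d2 <- outer(x1, x2, function(a, b) (a - b)^2)
  exp(-3 * d2 / r^2)
}

# Hypothetical periodic kernel, included only for illustration.
k_periodic <- function(x1, x2, period, r) {
  d <- outer(x1, x2, function(a, b) abs(a - b))
  exp(-2 * sin(pi * d / period)^2 / r^2)
}

x <- seq(0, 10, by = 0.5)

# Sum: a smooth long-range trend plus a repeating pattern.
K_sum <- 0.7 * k_sqexp(x, x, r = 8) +
  0.3 * k_periodic(x, x, period = 2, r = 1)

# Product: a repeating pattern whose shape drifts slowly.
K_prod <- k_sqexp(x, x, r = 8) * k_periodic(x, x, period = 2, r = 1)

# Both results should still be valid covariance matrices:
# symmetric, with no eigenvalue below zero beyond round-off.
min_eig_sum <- min(eigen(K_sum, symmetric = TRUE)$values)
min_eig_prod <- min(eigen(K_prod, symmetric = TRUE)$values)
```

Sums and element-wise products of valid covariance functions are themselves valid covariance functions, which is what makes this kind of composition possible.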

For small datasets the implementation is very simple, because the covariance matrix can be built and solved directly:

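# Covariance function: squared exponential (Gaussian) kernel.
# The correlation falls to exp(-3), about 0.05, at distance r.
cfun <- function(x1, x2, r) {
        d <- outer(x1, x2, function(a, b) (a - b)^2)  # squared distances
        exp(-3*d/(r^2))
}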

# Prediction
GP_pred <- function(x, y, xpred, p, r){

        # p -> noise/total variance ratio
        # r -> covariance range

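        # Constant mean and total variance estimated from the data
        m <- mean(y)
        v <- var(y)

        # Covariance among training points (signal plus noise on the diagonal)
        K <- (1-p)*v*cfun(x, x, r) + diag(p*v, length(x), length(x))
        # Covariance between prediction points and training points
        Kpred <- (1-p)*v*cfun(xpred, x, r)

        # Predictive mean, returned as a plain vector
        pred_mu <- as.vector(Kpred %*% solve(K, y-m)) + m
        # Predictive variance (includes the noise variance)
        pred_var <- v - colSums(t(Kpred) * solve(K, t(Kpred)))
        # Clamp tiny negative values caused by round-off before the square root
        pred_sd <- sqrt(pmax(pred_var, 0))
        # Band of two standard deviations around the mean
        pred_up <- pred_mu + 2*pred_sd
        pred_down <- pred_mu - 2*pred_sd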

        return(list(mu = pred_mu, up = pred_up, down = pred_down))
}
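
For reference, the computation in `GP_pred` can be written out as equations. This is only a transcription of the code above: the symbols X (training inputs), X_* (prediction inputs) and I (identity matrix) are notation assumed here, while m, v, p and r are the variables from the code.

```latex
\begin{align*}
k(x, x') &= \exp\!\left(-\frac{3\,(x - x')^2}{r^2}\right) \\
K &= (1 - p)\,v\,k(X, X) + p\,v\,I \\
K_* &= (1 - p)\,v\,k(X_*, X) \\
\mu_* &= m + K_* K^{-1} (y - m) \\
\sigma_*^2 &= v - \operatorname{diag}\!\left(K_* K^{-1} K_*^{\top}\right)
\end{align*}
```

The returned limits are \mu_* \pm 2\sigma_*. Since k equals exp(-3), roughly 0.05, when the distance between two points equals r, the parameter r acts as the distance beyond which points are almost uncorrelated.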

Example

As an example, the Gaussian Process model is applied to the cars2010 dataset from the AppliedPredictiveModeling package, predicting fuel efficiency from engine displacement:

library(ggplot2)
library(AppliedPredictiveModeling)
data(FuelEconomy)

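x <- cars2010$EngDispl   # engine displacement (predictor)
y <- cars2010$FE         # fuel efficiency (response)
xpred <- seq(1, 10, 0.01)

# p = 0.25 (noise/total variance ratio), r = 2 (covariance range)
pred <- GP_pred(x, y, xpred, 0.25, 2)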

ggplot() + 
        geom_ribbon(aes(x=xpred, ymax=pred$up, ymin=pred$down),
                    fill = "gray60") +
        geom_point(aes(x=x, y=y)) + 
        geom_line(aes(x=xpred, y=pred$mu), color = "blue") + 
        labs(x="Engine Displacement", y="Fuel Efficiency") + 
        scale_x_continuous(breaks = seq(0,10,2))

[Figure: GP predictive mean (blue line) with a band of two standard deviations (gray) over the cars2010 observations (points). x-axis: Engine Displacement; y-axis: Fuel Efficiency.]
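
For readers without the AppliedPredictiveModeling package, the same workflow can be tried on synthetic data. The sketch below is a compact restatement of the functions above under the hypothetical names `cfun_demo` and `gp_demo`; the sine curve, the noise level, and the values of `p` and `r` are assumptions made purely for illustration.

```r
# Compact restatement of the deck's covariance and prediction functions,
# renamed (cfun_demo, gp_demo) so that this sketch runs on its own.
cfun_demo <- function(x1, x2, r) {
  exp(-3 * outer(x1, x2, function(a, b) (a - b)^2) / r^2)
}

gp_demo <- function(x, y, xpred, p, r) {
  m <- mean(y)
  v <- var(y)
  K <- (1 - p) * v * cfun_demo(x, x, r) + diag(p * v, length(x))
  Kpred <- (1 - p) * v * cfun_demo(xpred, x, r)
  mu <- as.vector(Kpred %*% solve(K, y - m)) + m
  s2 <- v - colSums(t(Kpred) * solve(K, t(Kpred)))
  half <- 2 * sqrt(pmax(s2, 0))  # guard against tiny negative round-off
  list(mu = mu, up = mu + half, down = mu - half)
}

# Synthetic data: a noisy sine curve (illustrative assumption).
set.seed(1)
x <- seq(0, 6, length.out = 40)
y <- sin(x) + rnorm(length(x), sd = 0.1)
xpred <- seq(0, 6, by = 0.05)

pred <- gp_demo(x, y, xpred, p = 0.05, r = 2)

# Fraction of the noise-free curve that falls inside the band.
inside <- mean(sin(xpred) >= pred$down & sin(xpred) <= pred$up)
```

Passing `pred$mu`, `pred$up` and `pred$down` to the ggplot code from the example gives the corresponding plot for the synthetic data.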