Bayesian Interpretation of Linear Splines

References

BST249 Materials

Aim

Here we try to explain the Bayesian interpretation of the linear splines.

Load packages

library(tidyverse)

Setting

\(f \in \Omega\) where \(\Omega\) is the space of all possible trajectories from \((0,0)\) to \((1, \cdot)\).

Prior: Any Brownian motion trajectories

\(f \sim \pi(f)\) where \(\pi\) is the Brownian motion prior.

A single realization of Brownian motion looks like the following.

data1 <- data.frame(x = seq(from = 0, to = 1, length.out = 101),
                    y = cumsum(c(0, rnorm(n = 100, sd = sqrt(1/100)))))

ggplot(data = data1, mapping = aes(x = x, y = y)) +
    geom_line() +
    theme_bw() + theme(legend.key = element_blank())

The empirical density using n_iter iterations look like this.

## Helper function to make the initial increment to be 0
first_to_zero <- function(v) {
    v[1] <- 0
    v
}

n_sep <- 101
n_iter <- 1000

data2 <- data.frame(id  = rep(seq_len(n_iter), each = n_sep),
                    x   = rep(seq(from = 0, to = 1, length.out = n_sep), n_iter),
                    ## Increments are independent normals
                    z = rnorm(n = n_sep * n_iter, sd = sqrt(1 / (n_sep - 1)))) %>%
    group_by(id) %>%
    mutate(z = first_to_zero(z),
           y   = cumsum(z))

ggplot(data = data2, mapping = aes(x = x, y = y, group = id)) +
    geom_line(alpha = 1 / (n_iter / 100)) +
    theme_bw() + theme(legend.key = element_blank())

Likelihood: Indicator for observed points

The likelihood for a specific trajectory function \(f\) is the following.

\(\prod_{i=1}^{n} I(f(x_{i}) = y_{i})\) where \((x_{i}, y_{i}), i = 1, ..., n\) are the observed coordinates.

This means trajectories in \(\Omega\) that do not respect these observed coordinates are out dropped by multiplying the prior with this indicator likelihood.

Posterior: Brownian motion trajectories respecting data

As a result posterior is a distribution of trajectories that do respect the observed data points, but are otherwise random. The maximum a posteriori (MAP) estimate of the trajectory corresponds to the linear spline. As each random walk is normally distributed, we can calculate the mean to get the mode (maximum).

Here were cannot completely respect the data points because we only have finite number of trajectories in the simulated prior. So data points are respected up to a tolerance value. The trajectory is estimated by taking the mean value at each unit time points among the trajectories in the posterior sample.

## Function to examine if y coordinate is close enough
close <- function(tol, y1, y2) {
    as.numeric(abs(y1 - y2) < tol)
}

## Example observations
obs_data <- data.frame(x = c(0.25, 0.50, 0.75, 1.00),
                       y_obs = c(0.05, 0.15, 0.50, 0.30))

## Create a larger dataset
n_iter <- 10^6
data3 <- data.frame(id  = rep(seq_len(n_iter), each = n_sep),
                    x   = rep(seq(from = 0, to = 1, length.out = n_sep), n_iter),
                    ## Increments are independent normals
                    z = rnorm(n = n_sep * n_iter, sd = sqrt(1 / (n_sep - 1)))) %>%
    group_by(id) %>%
    mutate(z = first_to_zero(z),
           y   = cumsum(z))

## Augment dataset with observations
data3 <- left_join(data3, obs_data)

## Examine closeness at observed x's
data3$close <- close(tol = 0.10, data3$y, data3$y_obs)
## Other parts can be trajectory, so use 1
data3$close[is.na(data3$close)] <- 1

## Construct likelihood (1 if all observed points are close enough)
data3 <- data3 %>%
    group_by(id) %>%
    mutate(L = prod(close))
## How many trajectories were retained?
(n_traj <- sum(data3$L == 1) / n_sep)

## [1] 412

## Keep plotting data only
data_plot <- subset(data3, L == 1)

## Mean trajectory
data_mean <- data_plot %>%
    group_by(x) %>%
    summarize(mean_y = mean(y))

## Plot
ggplot(data = subset(data3, L == 1),
       mapping = aes(x = x, y = y, group = id)) +
    geom_line(alpha = min(1, 1 / (n_traj / 100))) +
    geom_line(data = data_mean, aes(group = NULL, y = mean_y),
              color = "pink", size = 2) +
    geom_point(mapping = aes(y = y_obs), size = 3, color = "white") +
    theme_bw() + theme(legend.key = element_blank())

## Warning: Removed 39964 rows containing missing values (geom_point).

Redo using a more strict closeness criterion.

## Examine closeness at observed x's
data3$close <- close(tol = 0.05, data3$y, data3$y_obs)
## Other parts can be trajectory, so use 1
data3$close[is.na(data3$close)] <- 1

## Construct likelihood (1 if all observed points are close enough)
data3 <- data3 %>%
    group_by(id) %>%
    mutate(L = prod(close))
## How many trajectories were retained?
(n_traj <- sum(data3$L == 1) / n_sep)

## [1] 22

## Keep plotting data only
data_plot <- subset(data3, L == 1)

## Mean trajectory
data_mean <- data_plot %>%
    group_by(x) %>%
    summarize(mean_y = mean(y))

## Plot
ggplot(data = subset(data3, L == 1),
       mapping = aes(x = x, y = y, group = id)) +
    geom_line(alpha = min(1, 1 / (n_traj / 100))) +
    geom_line(data = data_mean, aes(group = NULL, y = mean_y),
              color = "pink", size = 2) +
    geom_point(mapping = aes(y = y_obs), size = 3, color = "white") +
    theme_bw() + theme(legend.key = element_blank())

## Warning: Removed 2134 rows containing missing values (geom_point).

In either case, we can see the MAP trajectory is approximately a linear spline.

Top Page: http://rpubs.com/kaz_yos/
github: https://github.com/kaz-yos