Splines

We will need the following libraries.

# see https://github.com/rstudio/rstudio/issues/3389
library(rsconnect)
library(splines2)
library(data.table)

A set of vectors in a vector space is called a basis if you can reach any point in the vector space as a linear combination of the elements of the basis. A simple example is a combination of the standard basis in 2d euclidean space

\[ \begin{aligned} 4 \mathbf{\hat{\text{i}}} + 2 \mathbf{\hat{\text{j}}} \end{aligned} \]

Analogously, every continuous function in a function space can be represented as a linear combination of basis functions. Think of each basis function as building block that we stack up to form some arbitrary functional form. An example is the radial basis function.

\[ \begin{aligned} \phi_j(x) = \exp \left( -\frac{(x-\mu_j)^2}{l^2} \right) \end{aligned} \]

The \(\mu\) controls where the function is centred and \(l\) controls the width.

In the figure below, \(\mu\) is set to -1, 0 and 1 (left to right) and \(l\) is set to 0.5.

x <- seq(-3, 3, len = 200)
y1 <- exp(- ((x+1)^2)/(0.5^2)) # 
y2 <- exp(- ((x)^2)/(0.5^2)) # 
y3 <- exp(- ((x-1)^2)/(0.5^2)) # 
plot(x, y1, type = "l", col = 1)
lines(x, y2, col = 2)
lines(x, y3, col = 3)

If we form a linear combination from the basis functions then we can generate arbitrary functional forms, for example the combination below gives the curve in the next figure

\[ \begin{aligned} y(x) = -0.475 \exp(-2(x+1)^2) -0.189 \exp(-2(x)^2) - 1.818 \exp(-2(x-1)^2) \end{aligned} \]

y <- -0.475 * y1 - 0.189 * y2 - 1.818 * y3
plot(x, y, type = "l", col = 1)

Thus, basis functions are combined to create splines. M-splines are a set of basis functions that are positive over a certain range and zero elsewhere. Across the range where they are positive, an M-spline basis will integrate to 1. M-splines can be defined by the recursion

\[ \begin{aligned} M_i(x|k=1, t) &= \begin{cases} \frac{1}{t_{i+1} - t_i} & t_i \le x < t_{i+1} \\ 0 & \text{otherwise} \end{cases}\\ \\ M_i(x|k, t) &= \begin{cases} \frac{k[(x-t_i)M_i(x|k-1, t) + (t_{i+k}-x)M_{i + 1}(x|k-1, t)]}{(k-1)(t_{i+k}-t_i)} & k > 1 \end{cases} \end{aligned} \]

where \(k\) represents the order of polynomial basis functions, where \(k-1\) is the polynomial’s degree (highest order term) and \(t = \{ t_1, \dots, t_{n+k}\}\) is the knot sequence such that \(n+k\) knots have to be allocated. The splines2 package has a function to create M-splines as follows

x <- seq(0, 100, length = 50)
knots <- quantile(x, probs = c(1/3, 2/3))
msMat <- mSpline(x, 
                 knots = knots, # internal knots
                 Boundary.knots = range(x), # knots at boundaries
                 degree = 3, 
                 intercept = TRUE)
matplot(x, msMat, type = "l", ylab = "y")
abline(v = knots, lty = 2, col = "gray")

the library provides functions to differentiate, create periodic basis and integrate the M-splines.

isMat <- iSpline(x, 
                 knots = knots, 
                 degree = 3, # degree of the associated M-spline
                 Boundary.knots = range(x), 
                 intercept = TRUE)
matplot(x, isMat, type = "l", ylab = "y")
abline(v = knots, lty = 2, col = "gray")

Integrated M-splines are referred to as I-splines and when combined with non-negative weights will produce monotone splines. An M-spline of degree \(k-1\) has an associated I-spline of degree \(k\). The matrix returned from iSpline is shown below.

head(isMat)
##              1          2            3            4 5 6
## [1,] 0.0000000 0.00000000 0.0000000000 0.000000e+00 0 0
## [2,] 0.2233113 0.01056912 0.0001487042 7.805994e-07 0 0
## [3,] 0.4069525 0.03967006 0.0011552871 1.248959e-05 0 0
## [4,] 0.5559257 0.08361449 0.0037831748 6.322855e-05 0 0
## [5,] 0.6748958 0.13900913 0.0086927545 1.998334e-04 0 0
## [6,] 0.7681904 0.20275581 0.0164413741 4.878746e-04 0 0
tail(isMat)
##       1 2         3         4         5         6
## [45,] 1 1 0.9995121 0.9835586 0.7972442 0.2318096
## [46,] 1 1 0.9998002 0.9913072 0.8609909 0.3251042
## [47,] 1 1 0.9999368 0.9962168 0.9163855 0.4440743
## [48,] 1 1 0.9999875 0.9988447 0.9603299 0.5930475
## [49,] 1 1 0.9999992 0.9998513 0.9894309 0.7766887
## [50,] 1 1 1.0000000 1.0000000 1.0000000 1.0000000

The predict function can be used to evaluate the basis functions at a new set of x values while retaining the original settings for knots, degree and intercept. The example below intentionally predicts from beyond the original boundary to show the warning that is reported when you do this.

predict(msMat, c(10, 20, 30, 110))
## Warning in mSpline(x = c(10, 20, 30, 110), degree = 3L, knots =
## c(33.3333333333333, : Some 'x' values beyond boundary knots may cause ill-
## conditioned bases.
##            1        2        3       4         5       6
## [1,] 0.04116 0.032535  0.00441 0.00018  0.000000 0.00000
## [2,] 0.00768 0.033480  0.01368 0.00144  0.000000 0.00000
## [3,] 0.00012 0.019845  0.02187 0.00486  0.000000 0.00000
## [4,] 0.00000 0.000000 -0.00018 0.00639 -0.081135 0.26364

Fit a spline to data

The following fits a monotonic spline to simulated data and then plots the predicted values at each x. The model used is a multivariable linear regression.

mu <- c(2, 2, 10, 12)
cp <- c(10, 30, 50, 70)
nsim <- 200
d <- data.table(xx = runif(nsim, 0, 100))
d[, idx := findInterval(xx, cp)+1]
d[, y := rnorm(nsim, ifelse(is.na(mu[idx]), 25, mu[idx]), 3)]
plot(d$xx, d$y, xlab = "x", ylab = "y")
# monotonic spline
X <- iSpline(d$xx, 
             knots = knots, # internal knots
             Boundary.knots = range(x), # knots at boundaries
             degree = 3, 
             intercept = TRUE)
# intercept included in the spline basis
lm1 <- lm(y ~ -1 + X, data = d)
yhat <- sort(unique(fitted(lm1)))
lines(sort(d$xx), yhat, col = 2, lwd = 2)

Conclusion

We have introduced basis functions and splines, giving examples of the radial basis and the M-spline family. Splines constitute a large topic and it would be unfruitful to attempt to cover too much in one post. For example, I have omitted continuity and boundary constraints from this introduction, which ensure that the components of a spline join smoothly. The constraints amount to ensuring that slope and curvature are equal at the konts. More detail can be found in the following introductory review article https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-019-0666-3. After introducing splines, we used a monotonic spline to model some simulated data. Finally, it is worth noting that one of the many applications of monotonic splines is to create a flexible form for the cumulative hazard in survival analysis. We will have a look at that topic in a later post.