We will need the following libraries.
# see https://github.com/rstudio/rstudio/issues/3389
library(rsconnect)
library(splines2)
library(data.table)
A set of vectors in a vector space is called a basis if you can reach any point in the vector space as a linear combination of the elements of the basis. A simple example is a combination of the standard basis in 2d euclidean space
\[ \begin{aligned} 4 \mathbf{\hat{\text{i}}} + 2 \mathbf{\hat{\text{j}}} \end{aligned} \]
Analogously, every continuous function in a function space can be represented as a linear combination of basis functions. Think of each basis function as building block that we stack up to form some arbitrary functional form. An example is the radial basis function.
\[ \begin{aligned} \phi_j(x) = \exp \left( -\frac{(x-\mu_j)^2}{l^2} \right) \end{aligned} \]
The \(\mu\) controls where the function is centred and \(l\) controls the width.
In the figure below, \(\mu\) is set to -1, 0 and 1 (left to right) and \(l\) is set to 0.5.
x <- seq(-3, 3, len = 200)
y1 <- exp(- ((x+1)^2)/(0.5^2)) #
y2 <- exp(- ((x)^2)/(0.5^2)) #
y3 <- exp(- ((x-1)^2)/(0.5^2)) #
plot(x, y1, type = "l", col = 1)
lines(x, y2, col = 2)
lines(x, y3, col = 3)
If we form a linear combination from the basis functions then we can generate arbitrary functional forms, for example the combination below gives the curve in the next figure
\[ \begin{aligned} y(x) = -0.475 \exp(-2(x+1)^2) -0.189 \exp(-2(x)^2) - 1.818 \exp(-2(x-1)^2) \end{aligned} \]
y <- -0.475 * y1 - 0.189 * y2 - 1.818 * y3
plot(x, y, type = "l", col = 1)
Thus, basis functions are combined to create splines. M-splines are a set of basis functions that are positive over a certain range and zero elsewhere. Across the range where they are positive, an M-spline basis will integrate to 1. M-splines can be defined by the recursion
\[ \begin{aligned} M_i(x|k=1, t) &= \begin{cases} \frac{1}{t_{i+1} - t_i} & t_i \le x < t_{i+1} \\ 0 & \text{otherwise} \end{cases}\\ \\ M_i(x|k, t) &= \begin{cases} \frac{k[(x-t_i)M_i(x|k-1, t) + (t_{i+k}-x)M_{i + 1}(x|k-1, t)]}{(k-1)(t_{i+k}-t_i)} & k > 1 \end{cases} \end{aligned} \]
where \(k\) represents the order of polynomial basis functions, where \(k-1\) is the polynomial’s degree (highest order term) and \(t = \{ t_1, \dots, t_{n+k}\}\) is the knot sequence such that \(n+k\) knots have to be allocated. The splines2
package has a function to create M-splines as follows
x <- seq(0, 100, length = 50)
knots <- quantile(x, probs = c(1/3, 2/3))
msMat <- mSpline(x,
knots = knots, # internal knots
Boundary.knots = range(x), # knots at boundaries
degree = 3,
intercept = TRUE)
matplot(x, msMat, type = "l", ylab = "y")
abline(v = knots, lty = 2, col = "gray")
the library provides functions to differentiate, create periodic basis and integrate the M-splines.
isMat <- iSpline(x,
knots = knots,
degree = 3, # degree of the associated M-spline
Boundary.knots = range(x),
intercept = TRUE)
matplot(x, isMat, type = "l", ylab = "y")
abline(v = knots, lty = 2, col = "gray")
Integrated M-splines are referred to as I-splines and when combined with non-negative weights will produce monotone splines. An M-spline of degree \(k-1\) has an associated I-spline of degree \(k\). The matrix returned from iSpline
is shown below.
head(isMat)
## 1 2 3 4 5 6
## [1,] 0.0000000 0.00000000 0.0000000000 0.000000e+00 0 0
## [2,] 0.2233113 0.01056912 0.0001487042 7.805994e-07 0 0
## [3,] 0.4069525 0.03967006 0.0011552871 1.248959e-05 0 0
## [4,] 0.5559257 0.08361449 0.0037831748 6.322855e-05 0 0
## [5,] 0.6748958 0.13900913 0.0086927545 1.998334e-04 0 0
## [6,] 0.7681904 0.20275581 0.0164413741 4.878746e-04 0 0
tail(isMat)
## 1 2 3 4 5 6
## [45,] 1 1 0.9995121 0.9835586 0.7972442 0.2318096
## [46,] 1 1 0.9998002 0.9913072 0.8609909 0.3251042
## [47,] 1 1 0.9999368 0.9962168 0.9163855 0.4440743
## [48,] 1 1 0.9999875 0.9988447 0.9603299 0.5930475
## [49,] 1 1 0.9999992 0.9998513 0.9894309 0.7766887
## [50,] 1 1 1.0000000 1.0000000 1.0000000 1.0000000
The predict
function can be used to evaluate the basis functions at a new set of x
values while retaining the original settings for knots
, degree
and intercept
. The example below intentionally predicts from beyond the original boundary to show the warning that is reported when you do this.
predict(msMat, c(10, 20, 30, 110))
## Warning in mSpline(x = c(10, 20, 30, 110), degree = 3L, knots =
## c(33.3333333333333, : Some 'x' values beyond boundary knots may cause ill-
## conditioned bases.
## 1 2 3 4 5 6
## [1,] 0.04116 0.032535 0.00441 0.00018 0.000000 0.00000
## [2,] 0.00768 0.033480 0.01368 0.00144 0.000000 0.00000
## [3,] 0.00012 0.019845 0.02187 0.00486 0.000000 0.00000
## [4,] 0.00000 0.000000 -0.00018 0.00639 -0.081135 0.26364
The following fits a monotonic spline to simulated data and then plots the predicted values at each x. The model used is a multivariable linear regression.
mu <- c(2, 2, 10, 12)
cp <- c(10, 30, 50, 70)
nsim <- 200
d <- data.table(xx = runif(nsim, 0, 100))
d[, idx := findInterval(xx, cp)+1]
d[, y := rnorm(nsim, ifelse(is.na(mu[idx]), 25, mu[idx]), 3)]
plot(d$xx, d$y, xlab = "x", ylab = "y")
# monotonic spline
X <- iSpline(d$xx,
knots = knots, # internal knots
Boundary.knots = range(x), # knots at boundaries
degree = 3,
intercept = TRUE)
# intercept included in the spline basis
lm1 <- lm(y ~ -1 + X, data = d)
yhat <- sort(unique(fitted(lm1)))
lines(sort(d$xx), yhat, col = 2, lwd = 2)
We have introduced basis functions and splines, giving examples of the radial basis and the M-spline family. Splines constitute a large topic and it would be unfruitful to attempt to cover too much in one post. For example, I have omitted continuity and boundary constraints from this introduction, which ensure that the components of a spline join smoothly. The constraints amount to ensuring that slope and curvature are equal at the konts. More detail can be found in the following introductory review article https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-019-0666-3. After introducing splines, we used a monotonic spline to model some simulated data. Finally, it is worth noting that one of the many applications of monotonic splines is to create a flexible form for the cumulative hazard in survival analysis. We will have a look at that topic in a later post.