We are familiar with linear modeling, but often the relationship between the predictor and the outcome is nonlinear. Fortunately, the lm()
function in R can still handle this situation: a linear model only needs to be linear in its coefficients, so we are free to transform the predictor first. Let us see an example.
n = 1000
set.seed(n)  # fix the seed so the simulation is reproducible
x = rnorm(n)
y = 8 + 0.6 * x + 0.4 * x^2 + 0.2 * x^3 + rnorm(n, 0, 0.5)
We generated a vector x of 1000 normal random numbers. The dependent variable y has x, x^2 and x^3 components, plus some noise.
If we plot x against y, we can clearly see that the relationship is anything but a straight line:
plot(x, y)
Just out of curiosity, let us fit an ordinary straight line between x and y:
lm1 = lm(y ~ x)
lm1
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 8.36 1.13
plot(x, y)
abline(lm1, col = "red")
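If we want a number to go with the picture, the R-squared of this fit summarizes how much of the variation the straight line captures. The exact value depends on the simulated data, so treat this as a quick check:
summary(lm1)$r.squared  # proportion of variance explained by the line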
Now we go for a nonlinear fit. The function bs()
in the splines package generates a cubic B-spline basis by default, but you can change the degree and the knot placement through its arguments. See ?bs
for further details.
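For instance, here is a minimal sketch of those arguments (b2 and bk are names introduced only for this illustration):
library(splines)
b2 = bs(x, degree = 2)           # quadratic basis instead of the default cubic
bk = bs(x, knots = c(-1, 0, 1))  # cubic basis with interior knots set by hand
dim(b2)  # 1000 x 2: columns = degree + number of interior knots
dim(bk)  # 1000 x 6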
library(splines)
b = bs(x)
str(b)
## bs [1:1000, 1:3] 0.387 0.443 0.322 0.226 0.421 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:3] "1" "2" "3"
## - attr(*, "degree")= int 3
## - attr(*, "knots")= num(0)
## - attr(*, "Boundary.knots")= num [1:2] -3.36 2.67
## - attr(*, "intercept")= logi FALSE
## - attr(*, "class")= chr [1:3] "bs" "basis" "matrix"
We now fit a linear model between y and the transformed predictor b:
lm2 = lm(y ~ b)
lm2
##
## Call:
## lm(formula = y ~ b)
##
## Coefficients:
## (Intercept) b1 b2 b3
## 2.429 10.303 -0.565 14.049
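As an aside: since we generated y from a cubic polynomial, lm() could also fit the polynomial terms directly, and the coefficients of such a fit should land close to the true values 8, 0.6, 0.4 and 0.2. A sketch (lm3 is a name introduced here for comparison only):
lm3 = lm(y ~ x + I(x^2) + I(x^3))  # I() protects the powers inside the formula
coef(lm3)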
The last step is to extract the model's predictions at the original x values:
pr = predict(lm2)
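A caveat worth knowing: because lm2 was fit on the precomputed matrix b, predict(lm2) simply returns the fitted values, and passing new x values through newdata would not work as expected. If you need predictions on fresh data, one option is to put bs() inside the formula, so that predict() can rebuild the basis with the stored knots. A minimal sketch (lm2b and xnew are names introduced here for illustration):
lm2b = lm(y ~ bs(x))
xnew = seq(min(x), max(x), length.out = 200)  # a grid of new x values
prnew = predict(lm2b, newdata = data.frame(x = xnew))
For the plot below, the fitted values already stored in pr are all we need.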
Now you can see how the fit has dramatically improved:
plot(x, y)
points(x, pr, col = "red", pch = 19, cex = 0.5)
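If you prefer a single smooth curve to overplotted points, sort x before drawing the predictions as a line:
plot(x, y)
ord = order(x)  # reorder so the line is drawn left to right
lines(x[ord], pr[ord], col = "red", lwd = 2)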