This is a short analysis on the mcycle data set found in the MASS package. This data set gives a series of measurements of head acceleration in a simulated motorcycle accidents, in order to test crash helmets. The data consists of two variables: time and acceleration. Time reflects the time in milliseconds after impact and acceleration monitors the current acceleration rate of the head at a given time.

library(ggplot2)
library(MASS)
library(MLmetrics)
library(splines2)
library(tidyverse)

Below we have a scatterplot with time on the x axis and acceleration on the y axis to get an initial idea of the relationship between these two variables.

Polynomial Regression

In order to fit a polynomial regression model we must first find the degree we would like to use. The mcycle data set was split into training and testing sets (80% and 20% respectively). A polynomial regression was fit with degrees ranging from 1 to 10. These models were then used to predict the acceleration from the testing set, and the RMSE was compared to find the optimal degree.

set.seed(1)
index <- sample(1:nrow(mcycle),nrow(mcycle)*.8)
mcycle.train <- mcycle[index,]
mcycle.test <- mcycle[-index,]

rmse1 <- vector(length = 10)

for (i in 1:10){
  polyfit <- lm(accel ~ poly(times,i),data=mcycle.train)
  pred <- predict(polyfit,mcycle.test)
  rmse1[i] <- RMSE(pred,mcycle.test$accel)
}
optimal.d <- which.min(rmse1)

The optimal degree was found to be 10. From this point we can fit our final polynomial regression curve and its standard error bands for plotting.

timelims <- range(mcycle$times)
times.grid <- seq(from =  timelims[1], to = timelims[2])

polyfit <- lm(accel ~ poly(times,optimal.d),data=mcycle)

final.preds <- predict(polyfit, newdata = list(times = times.grid),se = TRUE)
se.bands <- cbind(final.preds$fit + (2 * final.preds$se),final.preds$fit - (2 * final.preds$se))
dat.fitted <- data.frame(time= times.grid, fitted = final.preds$fit,
                         se.bands.upper = se.bands[,1],se.bands.lower = se.bands[,2])

The plot below shows the fitted curve from polynomial regression as the red line and its confidence bands as the blue.

Cubic Spline

In order to find the cubic spline we will start off by finding each of the 20% quantiles in order to be used as the interior knots.

interior.knots <- quantile(mcycle$time, c(.2,.4,.6,.8))
##   20%   40%   60%   80% 
## 14.68 18.44 26.52 36.20

A B-spline with a degree of three (cubic spline) will lead to a set of cubic curves based on a linear combination of basis functions influenced by the location and number of interior knots we selected.

basis <- bSpline(mcycle$time, knots=interior.knots, degree=3, intercept=TRUE)   
basis2 <- data.frame(mcycle$times,basis)
colnames(basis2) <- c("times", paste0("basis", 1:8))
basis2 <- basis2 %>% gather(key="basis",value="value", -times)

bfit <- lm(mcycle$accel ~ 0 + bSpline(mcycle$times,knots=interior.knots,degree=3,intercept=TRUE))

mcycle.cubic <- cbind(mcycle,t(bfit$coefficients %*% t(basis)))
colnames(mcycle.cubic) <- c("times","accel","fitted")

Below is the plot of the cubic spline plotted over the scatterplot from part a.

Smoothing Spline

This code shows how to find a smooth spline and it is overlayed on the scatterplot from part a below.

smooth.bfit <- smooth.spline(mcycle$times,mcycle$accel)
mcycle3 <- as.data.frame(cbind(smooth.bfit$x,smooth.bfit$y))
colnames(mcycle3) <- c("times","fitted")

The degrees of freedom used in the smoothed spline is shown below.

smooth.bfit$df
## [1] 12.20876