Interactive Regression Line

The following code plots Galton’s parent-child height data and creates an interactive slider that lets you change the slope of a proposed regression line, showing the updated mean squared error each time. Copy all of the code into RStudio and run it; the manipulate package only works inside RStudio. You will also need the UsingR package, which contains the data set. If you don’t already have it, install it with install.packages("UsingR").

Note that the data are first centered at (0, 0). The least-squares line for centered data passes through the origin, so the slope is the only thing left to vary.
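To see why centering helps, here is a minimal sketch on made-up numbers (the vectors `parent` and `child` below are illustrative values, not Galton’s data). After subtracting the means, the fitted intercept is zero up to rounding error, and the slope is the same as in the uncentered fit.

```r
# Toy heights in inches; illustrative values, NOT taken from the galton data set.
parent <- c(64.0, 66.5, 67.5, 68.5, 70.5, 72.0)
child  <- c(65.2, 66.7, 67.2, 68.2, 69.2, 70.7)

# Center both variables at their means.
x <- parent - mean(parent)
y <- child - mean(child)

fit_raw      <- lm(child ~ parent)  # intercept and slope on the raw scale
fit_centered <- lm(y ~ x)           # same model after centering

coef(fit_raw)       # nonzero intercept, some slope
coef(fit_centered)  # intercept is zero up to rounding, slope is unchanged
```

Because the intercept drops out, a proposed line can be drawn with abline(0, beta) and judged by its slope alone.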

The code comes nearly unchanged from a lecture in the Coursera/Johns Hopkins course “Regression Models”.

library(UsingR)
library(manipulate)
y <- galton$child - mean(galton$child)
x <- galton$parent - mean(galton$parent)
df <- as.data.frame(table(x, y))
# table(x, y) puts x (parent) in the first column and y (child) in the second.
names(df) <- c("parent", "child", "freq")
head(df)

myPlot <- function(beta) {
  plot(as.numeric(as.vector(df$parent)),
       as.numeric(as.vector(df$child)),
       cex=0.15*df$freq,
       pch=21,
       col="black",
       bg="lightblue",
       xlab="Parent",
       ylab="Child"
  )
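  abline(0, beta, lwd=3)        # proposed line through the origin with slope beta
  points(0, 0, cex=2, pch=19)   # mark the origin, which is the point of means
  mse <- mean((y - beta*x)^2)   # mean squared error of the proposed line
  title(paste0("beta=", beta, ", mse=", round(mse, 3)))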
}
manipulate(myPlot(beta), beta=slider(0.5, 0.9, step=0.02))

# The following fits the centered data and shows the best possible slope.
# The "-1" in the formula removes the intercept, so the line is forced through
# the origin. Because the data are centered, the slope matches an ordinary fit.
fit <- lm(I(child - mean(child)) ~ I(parent - mean(parent)) - 1, data=galton)
summary(fit)
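
As a cross-check on the slider, the slope that minimizes the mean squared error for a line through the origin has the closed form sum(x*y)/sum(x^2). The sketch below uses simulated data in place of galton so that it runs without UsingR; the seed, the sample size, and the underlying slope of 0.65 are arbitrary assumptions, not properties of Galton’s data. It evaluates the same MSE on the slider’s grid of slopes and compares the result with the closed form and with lm.

```r
set.seed(1)                           # arbitrary seed, for reproducibility only
x <- rnorm(200, sd = 1.8)             # simulated stand-in for parent heights
y <- 0.65 * x + rnorm(200, sd = 2.2)  # simulated stand-in for child heights
x <- x - mean(x)                      # center both variables, as in the post
y <- y - mean(y)

# Mean squared error of the line through the origin with slope beta.
mse <- function(beta) mean((y - beta * x)^2)

# Evaluate the MSE on the same grid of slopes that the slider offers.
betas <- seq(0.5, 0.9, by = 0.02)
grid_mse <- sapply(betas, mse)
best_on_grid <- betas[which.min(grid_mse)]

# Closed-form least-squares slope for a no-intercept fit, and the lm equivalent.
beta_hat <- sum(x * y) / sum(x^2)
beta_lm  <- unname(coef(lm(y ~ x - 1)))

c(best_on_grid = best_on_grid, beta_hat = beta_hat, beta_lm = beta_lm)
```

The closed-form value and the lm coefficient agree, and no slope on the grid achieves a lower MSE than beta_hat. Applied to the centered galton vectors x and y defined earlier, the same two lines for beta_hat and beta_lm should reproduce the slope reported by summary(fit).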