Created by Robin Cunningham, UNC Chapel Hill
output: html_document
GPA Data Set
We will use this exercise to compare the regression coefficients that we find by solving the normal equations (Slide 9 of Lecture 5) with the coefficients that lm reports.
The code chunks are set to eval=FALSE so that your file will knit, but you will need to change this to output your answers. Read in the GPA file and assign it to the handy name provided. Then go ahead and print it so you have it for reference.
#gpa <-
gpa
Create each of the matrices X, Y, and Beta_hat that you will need to solve the normal equations, \[ X^tX\hat{B} = X^tY. \] Begin by defining the variables
\[ Y = \text{First-year GPA} \]
\[ X_1 = \text{Math SAT} \]
\[ X_2 = \text{Verbal SAT} \]
\[ X_3 = \text{HS Math GPA} \]
\[ X_4 = \text{HS English GPA} \]
Then initialize a Beta_hat vector of all zeros with the appropriate length.
n <- nrow(gpa)
#Y <-
#X1 <-
#X2 <-
#X3 <-
#X4 <-
# Beta_hat <-
We have our variables now, but they are all stored as vectors and we must convert them to matrices. Use cbind to create the matrix X. You will need to create a vector of 1’s of the appropriate length first. Also, use the matrix command to turn Y and Beta_hat into matrices with the correct dimensions. After you have made them, go ahead and print, with labels, all three matrices for the normal equations.
#Change Beta_hat to a matrix
#Change Y to a matrix
#create X
# First create column of ones and convert Xi to columns
# make matrix X
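As a rough illustration of the matrix-building step, the sketch below uses a few made-up numbers in place of the real columns. The values and the reduced set of two predictors are assumptions for demonstration only; your own code should pull the columns from gpa.

```r
# Hypothetical toy vectors standing in for columns of the gpa data
Y  <- c(3.1, 2.8, 3.6, 2.5, 3.3, 3.0)
X1 <- c(640, 580, 700, 520, 660, 600)
X2 <- c(540, 600, 620, 500, 580, 560)
n  <- length(Y)

ones <- rep(1, n)                      # column of 1's for the intercept
X <- cbind(ones, X1, X2)               # n x 3 design matrix
Y <- matrix(Y, nrow = n, ncol = 1)     # n x 1 response matrix
Beta_hat <- matrix(0, nrow = ncol(X), ncol = 1)  # one zero per column of X

# Print each matrix with a label
cat("X:\n"); print(X)
cat("Y:\n"); print(Y)
cat("Beta_hat:\n"); print(Beta_hat)
```

Note that Beta_hat needs one entry per column of X, including the column of 1’s.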
c. Calculate the least squares values of Beta_hat. In the calculation of the least squares values, I recommend calculating \(X^tX\) first, then the inverse, then Beta_hat.
I went ahead and wrote the code to print out the Beta_i values after they are calculated.
#Calculate Beta_hat
#Print individual values of Beta_i
labels = matrix(c("Beta_0", "Beta_1", "Beta_2", "Beta_3", "Beta_4"), nrow = 5)
beta_summary <- cbind(labels, Beta_hat)
beta_summary
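The three-step calculation recommended above might look like the following on simulated data. Everything here (the seed, the sample size, the two predictors, and the coefficients used to generate Y) is invented for illustration and is not the course data.

```r
# Simulated stand-in data; NOT the course GPA file
set.seed(1)
n  <- 50
X1 <- rnorm(n, mean = 600, sd = 80)
X2 <- rnorm(n, mean = 3, sd = 0.5)
Y  <- matrix(0.5 + 0.002 * X1 + 0.4 * X2 + rnorm(n, sd = 0.3), ncol = 1)
X  <- cbind(1, X1, X2)                 # design matrix with intercept column

XtX      <- t(X) %*% X                 # step 1: X^t X
XtX_inv  <- solve(XtX)                 # step 2: inverse of X^t X
Beta_hat <- XtX_inv %*% t(X) %*% Y     # step 3: least-squares coefficients
Beta_hat
```

With the real data, X has five columns, so Beta_hat has five rows to match the labels in the printing code.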
Now that you know how lm calculates these coefficients, it is ok to use lm directly to calculate the least-squares statistics. Do that here and assign the model to the name gpa.mlr. Then get the summary output of the model.
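For orientation, here is a minimal sketch of fitting with lm and checking the result against the normal equations. The data frame toy, its columns x1, x2, y, and the model name toy.mlr are hypothetical; in the assignment the model is built from gpa and named gpa.mlr.

```r
# Simulated stand-in data; column names are hypothetical
set.seed(2)
n   <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + rnorm(n, sd = 0.4)

toy.mlr <- lm(y ~ x1 + x2, data = toy)   # fit the multiple regression
summary(toy.mlr)                          # coefficient table, s, R-squared

# Same coefficients from the normal equations, for comparison
X <- cbind(1, toy$x1, toy$x2)
b_normal <- solve(t(X) %*% X, t(X) %*% toy$y)
all.equal(as.vector(b_normal), unname(coef(toy.mlr)))
```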
How do your estimates of the parameters compare to R’s? (Comment box below)
answer here
Interpret each of the regression parameters in words. That is, explain what each value means in terms of the scenario.
answer here
Find a 95% confidence interval for \(\beta_4\). (Use either a code box, a comment box, or both to hold your answer.)
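One way to get a coefficient interval is estimate ± t critical value × standard error, which is also what confint computes for an lm fit. The sketch below does both on simulated data; the names toy, fit, and x2 are hypothetical, and with the real model you would ask for the term that corresponds to \(\beta_4\).

```r
# Simulated stand-in data; column names are hypothetical
set.seed(3)
n   <- 80
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 0.8 * toy$x1 + 0.3 * toy$x2 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2, data = toy)

# By hand: estimate +/- t * SE, with n - p residual degrees of freedom
est    <- coef(summary(fit))["x2", "Estimate"]
se     <- coef(summary(fit))["x2", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))
ci_hand <- c(est - t_crit * se, est + t_crit * se)

# Built in
ci_builtin <- confint(fit, parm = "x2", level = 0.95)
ci_hand
ci_builtin
```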
h. For a new individual with
MSAT = 640
VSAT = 540
HSM = 3.8
HSE = 3.2
What is your best estimate for their 1st-year gpa?
Please use R to give your answer.
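A point estimate for a new individual can be obtained either with predict or by multiplying the coefficient vector by the new row (1, MSAT, VSAT, HSM, HSE). The sketch below assumes a data frame with columns named MSAT, VSAT, HSM, HSE, and GPA; those column names and all of the simulated values are assumptions, so adjust them to match the real file.

```r
# Simulated stand-in data; column names and values are hypothetical
set.seed(42)
n <- 100
gpa_sim <- data.frame(
  MSAT = round(rnorm(n, 600, 70)),
  VSAT = round(rnorm(n, 560, 70)),
  HSM  = round(runif(n, 2, 4), 1),
  HSE  = round(runif(n, 2, 4), 1)
)
gpa_sim$GPA <- 0.3 + 0.002 * gpa_sim$MSAT + 0.001 * gpa_sim$VSAT +
  0.2 * gpa_sim$HSM + 0.1 * gpa_sim$HSE + rnorm(n, sd = 0.3)

fit <- lm(GPA ~ MSAT + VSAT + HSM + HSE, data = gpa_sim)

# New individual from the problem statement
new_student <- data.frame(MSAT = 640, VSAT = 540, HSM = 3.8, HSE = 3.2)
y_hat <- predict(fit, newdata = new_student)

# Same estimate by hand: x0 %*% Beta_hat
x0 <- c(1, 640, 540, 3.8, 3.2)
y_hat_hand <- sum(x0 * coef(fit))
c(predict = unname(y_hat), by_hand = y_hat_hand)
```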
For the person described in Part (h), find a 95% confidence interval for their 1st-year gpa. (Note that \(s\), the residual standard error, is given in the R output of the linear model, so \(s^2\) follows directly.)
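Reading this as an interval for the mean response at the given predictor values (an assumption about the intent), the standard error is \(s\sqrt{x_0^t (X^tX)^{-1} x_0}\). The sketch below computes it by hand on simulated data and prints the predict result for comparison; the column names and data are hypothetical.

```r
# Simulated stand-in data; column names and values are hypothetical
set.seed(42)
n <- 100
gpa_sim <- data.frame(
  MSAT = round(rnorm(n, 600, 70)),
  VSAT = round(rnorm(n, 560, 70)),
  HSM  = round(runif(n, 2, 4), 1),
  HSE  = round(runif(n, 2, 4), 1)
)
gpa_sim$GPA <- 0.3 + 0.002 * gpa_sim$MSAT + 0.001 * gpa_sim$VSAT +
  0.2 * gpa_sim$HSM + 0.1 * gpa_sim$HSE + rnorm(n, sd = 0.3)
fit <- lm(GPA ~ MSAT + VSAT + HSM + HSE, data = gpa_sim)

X  <- model.matrix(fit)            # design matrix, including the 1's column
s2 <- summary(fit)$sigma^2         # s^2 = (residual standard error)^2
x0 <- c(1, 640, 540, 3.8, 3.2)     # new individual, with leading 1
y0 <- sum(x0 * coef(fit))          # point estimate

se_mean <- sqrt(s2 * as.numeric(t(x0) %*% solve(t(X) %*% X) %*% x0))
t_crit  <- qt(0.975, df = df.residual(fit))
ci_mean <- c(y0 - t_crit * se_mean, y0 + t_crit * se_mean)

# Cross-check with predict
new_student <- data.frame(MSAT = 640, VSAT = 540, HSM = 3.8, HSE = 3.2)
ci_pred <- predict(fit, newdata = new_student, interval = "confidence", level = 0.95)
ci_mean
ci_pred
```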
We should have done this to start the regression analysis, but make plots of Y versus each of the predictors (4 plots) and discuss what you see. (Try to get everything into this RMarkdown document. No paper this time.)
Make histograms of each of the variables to see if there are any obvious outliers or other odd behavior. Print the histograms and any comments.
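The scatterplots and histograms can be produced with base graphics, looping over the variable names. This is only a sketch on simulated data; the column names MSAT, VSAT, HSM, HSE, and GPA are assumed and may differ from those in the real file.

```r
# Simulated stand-in data; column names and values are hypothetical
set.seed(42)
n <- 100
gpa_sim <- data.frame(
  MSAT = round(rnorm(n, 600, 70)),
  VSAT = round(rnorm(n, 560, 70)),
  HSM  = round(runif(n, 2, 4), 1),
  HSE  = round(runif(n, 2, 4), 1)
)
gpa_sim$GPA <- 0.3 + 0.002 * gpa_sim$MSAT + 0.001 * gpa_sim$VSAT +
  0.2 * gpa_sim$HSM + 0.1 * gpa_sim$HSE + rnorm(n, sd = 0.3)

predictors <- c("MSAT", "VSAT", "HSM", "HSE")

# Response versus each predictor (4 plots on one page)
par(mfrow = c(2, 2))
for (p in predictors) {
  plot(gpa_sim[[p]], gpa_sim$GPA, xlab = p, ylab = "GPA",
       main = paste("GPA vs", p))
}

# Histogram of every variable, response included
par(mfrow = c(2, 3))
for (v in names(gpa_sim)) {
  hist(gpa_sim[[v]], main = v, xlab = v)
}
par(mfrow = c(1, 1))   # restore the default layout
```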
Plot residuals versus fitted values (\(\hat{Y}_i\)) and the QQ plot of the residuals. It is ok to use lm for this. Include any comments on what these diagnostic plots indicate.
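A minimal version of the two diagnostic plots, using fitted and resid on an lm object, is sketched below. The data are simulated and the names toy and fit are hypothetical; substitute your own fitted model.

```r
# Simulated stand-in data; column names are hypothetical
set.seed(5)
n   <- 80
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 0.8 * toy$x1 + 0.3 * toy$x2 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2, data = toy)

par(mfrow = c(1, 2))
# Residuals vs fitted: look for curvature or a funnel shape
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# Normal QQ plot: points near the line suggest roughly normal residuals
qqnorm(resid(fit))
qqline(resid(fit))
par(mfrow = c(1, 1))
```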
Find a 95% prediction interval for the individual described in Part (h) above. It is ok to use R’s predict function for this. Interpret the result in terms of the scenario.
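With predict, the only change from a confidence interval for the mean is interval = "prediction". The sketch below shows both side by side on simulated data, again assuming columns named MSAT, VSAT, HSM, HSE, and GPA.

```r
# Simulated stand-in data; column names and values are hypothetical
set.seed(42)
n <- 100
gpa_sim <- data.frame(
  MSAT = round(rnorm(n, 600, 70)),
  VSAT = round(rnorm(n, 560, 70)),
  HSM  = round(runif(n, 2, 4), 1),
  HSE  = round(runif(n, 2, 4), 1)
)
gpa_sim$GPA <- 0.3 + 0.002 * gpa_sim$MSAT + 0.001 * gpa_sim$VSAT +
  0.2 * gpa_sim$HSM + 0.1 * gpa_sim$HSE + rnorm(n, sd = 0.3)
fit <- lm(GPA ~ MSAT + VSAT + HSM + HSE, data = gpa_sim)

new_student <- data.frame(MSAT = 640, VSAT = 540, HSM = 3.8, HSE = 3.2)

# Interval for one new individual's GPA
pi_95 <- predict(fit, newdata = new_student, interval = "prediction", level = 0.95)
# Interval for the mean GPA of all such individuals, for comparison
ci_95 <- predict(fit, newdata = new_student, interval = "confidence", level = 0.95)
rbind(prediction = pi_95[1, ], confidence = ci_95[1, ])
```

The prediction interval is wider because it accounts for the variability of a single individual around the mean as well as the uncertainty in the estimated mean.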