Recall the correlation \(R = \frac{SS_{xy}}{\sqrt{SS_{xx}SS_{yy}}}\).
\(R^2\) is called the coefficient of determination (or multiple R-squared).
Practical interpretation of \(R^2\): about \(100(R^2)\%\) of the total sum of squares of deviations of the sample y values about their mean can be explained by (or attributed to) using x to predict y in the linear model. More simply, \(100(R^2)\%\) of the variation in y is explained by the model.
x = c(0, 1, 2, 3)
y = c(8, 5, 7, 4)
fit = lm(y ~ x)
fit$coefficients
## (Intercept)           x 
##         7.5        -1.0
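For the example data, \(R^2\) can be computed directly from the sums of squares and checked against the value R reports (a quick sketch using the fit object above):

SSxy = sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products
SSxx = sum((x - mean(x))^2)                 # sum of squares for x
SSyy = sum((y - mean(y))^2)                 # total sum of squares for y
R = SSxy / sqrt(SSxx * SSyy)                # correlation
R^2
## [1] 0.5
summary(fit)$r.squared                      # same value from the fitted model
## [1] 0.5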
The adjusted coefficient of determination is given by:
\[R^2_a = 1-\frac{n-1}{n-(k+1)} \Big(1-R^2\Big) = 1-\frac{n-1}{n-(k+1)} \Big(\frac{SSE}{SS_{yy}}\Big)\]
where \(n\) is the number of observations and \(k\) is the number of predictor variables, so that \(k+1\) is the number of estimated model parameters (for simple linear regression with a single x, \(k = 1\) and \(k+1 = 2\)). \(R^2_a\) takes into account, or adjusts for, both the sample size and the number of model parameters, penalizing models with more parameters. Since adding more and more parameters to a model typically pushes \(R^2\) closer and closer to one, some analysts prefer to report \(R^2_a\).
Will \(R^2_a\) be bigger or smaller than \(R^2\)? Why?
Find the \(R^2_a\) for the example data.
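One way to check the answer in R (a sketch using the fit object above, with n = 4 observations and k = 1 predictor):

n = length(y)                                # number of observations
k = 1                                        # number of predictor variables (just x)
R2 = summary(fit)$r.squared
R2_a = 1 - (n - 1) / (n - (k + 1)) * (1 - R2)
R2_a
## [1] 0.25
summary(fit)$adj.r.squared                   # built-in adjusted R-squared agrees
## [1] 0.25

Here \(R^2_a = 0.25\) is smaller than \(R^2 = 0.5\), as expected, since the factor \(\frac{n-1}{n-(k+1)} \ge 1\) inflates \(1-R^2\).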