Karim Naguib (Boston University)
9/14/2013
\( \beta_{ClassSize} \) would be the slope of the straight line describing the linear relationship between \( TestScore \) and \( ClassSize \)
\[ TestScore = \beta_0 + \beta_{ClassSize} \times ClassSize \]
If we knew the parameters \( \beta_0 \) and \( \beta_{ClassSize} \), not only would we be able to predict the change in student performance, but we would also be able to predict the average test score for any class size
We write the equation of the model that includes all these other factors and predicts exact test scores as
\[ TestScore = \underbrace{\beta_0 + \beta_{ClassSize} \times ClassSize}_{\text{Average }TestScore} + \text{ other factors} \]
More generally, if we have \( n \) observations for \( X_i \) and \( Y_i \) pairs (e.g. \( Y_i \) is the average test score and \( X_i \) is the average class size, for district \( i \))
\[ Y_i = \beta_0 + \beta_1 X_i + u_i \]
Typically, for our model
\[ Y_i = \beta_0 + \beta_1 X_i + u_i \]
we don't know the parameters \( \beta_0 \) and \( \beta_1 \)
From the data we have available, we can then draw inferences about these parameters
To estimate the model parameters of the class size/student performance model we have data from 420 California school districts in 1999
The sample correlation is found to be -0.23, indicating a weak negative relationship. However, we need a better measure of this relationship: we want to be able to draw a straight line through these points, the linear regression line, and its slope gives us the estimated effect we are after.
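The sample correlation itself can be computed directly in R; a minimal sketch, assuming the test.score.data data frame (loaded later in these notes) is available:

cor(test.score.data$str, test.score.data$testscr)  # roughly -0.23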
qplot(x=str, y=testscr, data=test.score.data, geom="point", xlab="Student/Teacher Ratio", ylab="Test Scores")
The Ordinary Least Squares Estimator (2)
The OLS estimator of \( \beta_1 \) is \[ \hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{s_{XY}}{s_X^2} \]
The OLS estimator of \( \beta_0 \) is \[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
The predicted value of \( Y_i \) is \[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
The error in predicting \( Y_i \) is called the residual \[ \hat{u}_i = Y_i - \hat{Y}_i \]
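As an illustration, the OLS estimates can be computed "by hand" from these formulas; a minimal sketch in R, assuming test.score.data (with columns str and testscr) is available:

x <- test.score.data$str
y <- test.score.data$testscr

beta1.hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope, equals s_XY / s_X^2
beta0.hat <- mean(y) - beta1.hat * mean(x)                              # intercept

y.hat <- beta0.hat + beta1.hat * x   # predicted values
u.hat <- y - y.hat                   # residuals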
Using data from the 420 school districts, an OLS regression is run to estimate the relationship between test scores and the student-teacher ratio (STR).
\[ \widehat{TestScore} = 698.9 - 2.28 \times STR \]
where \( \widehat{TestScore} \) is the predicted value. (This is referred to as test scores regressed on STR)
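For example, a district with \( STR = 20 \) has a predicted average test score of \( 698.9 - 2.28 \times 20 = 653.3 \).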
qplot(x=str, y=testscr, data=test.score.data, geom="point", xlab="Student/Teacher Ratio", ylab="Test Scores") + geom_abline(intercept=698.9, slope=-2.28, color='blue')
The \( R^2 \) is the fraction of the sample variance of \( Y_i \) (dependent variable) explained by \( X_i \) (regressor)
Let us define the total sum of squares (\( TSS \)), the explained sum of squares (\( ESS \)), and the sum of squared residuals (\( SSR \))
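Explicitly, using the standard definitions, \[ TSS = \sum_i (Y_i - \bar{Y})^2, \qquad ESS = \sum_i (\hat{Y}_i - \bar{Y})^2, \qquad SSR = \sum_i \hat{u}_i^2 \] These three quantities always satisfy the decomposition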
\[ TSS = ESS + SSR \]
\( R^2 \) can be defined as \[ R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} \]
The standard error of the regression (\( SER \)) is an estimator of the standard deviation of the population regression error \( u_i \).
We use \( \hat{u}_1,\dots, \hat{u}_n \) to calculate our estimate
\[ SER = s_{\hat{u}} \] where \[ s_{\hat{u}}^2 = \frac{1}{n-2}\sum_i \hat{u}_i^2 = \frac{SSR}{n-2} \]
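Continuing the "by hand" sketch from above (using the y, y.hat, and u.hat objects defined there), these quantities can be computed directly:

n   <- length(u.hat)
TSS <- sum((y - mean(y))^2)   # total sum of squares
SSR <- sum(u.hat^2)           # sum of squared residuals
ESS <- TSS - SSR              # explained sum of squares

R2  <- 1 - SSR / TSS          # fraction of the variance of Y explained by X
SER <- sqrt(SSR / (n - 2))    # standard error of the regression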
The first thing we need to do is load the data
load("usr/data/ec414/Test.Score.RData")
ls()
regress.results <- lm(testscr ~ str, data = test.score.data)
summary(regress.results)
Call:
lm(formula = testscr ~ str, data = test.score.data)
Residuals:
Min 1Q Median 3Q Max
-47.73 -14.25 0.48 12.82 48.54
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 698.93 9.47 73.82 < 2e-16 ***
str -2.28 0.48 -4.75 2.8e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.6 on 418 degrees of freedom
Multiple R-squared: 0.0512, Adjusted R-squared: 0.049
F-statistic: 22.6 on 1 and 418 DF, p-value: 2.78e-06
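The individual pieces of this output can also be extracted from the fitted object with standard R accessors, for example:

coef(regress.results)                 # estimated intercept and slope
summary(regress.results)$r.squared    # R^2
summary(regress.results)$sigma        # residual standard error (the SER)
residuals(regress.results)            # the residuals u.hat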
The conditional distribution of \( u_i \) given \( X_i \) has a mean of zero \[ E[u_i|X_i] = 0 \]
For all \( i \), \( (X_i, Y_i) \) are i.i.d.
Large outliers are unlikely \[ 0 < E[Y_i^4] < \infty \]
When we have a large sample, we can approximate the distribution of the random variable \( \bar{Y} \) by a normal distribution with mean \( \mu_Y \) and variance \( \sigma_Y^2/n \).
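As a quick illustration of this approximation (a simulated example, not part of the test score data), the sample means of repeated draws from even a skewed population look roughly normal:

set.seed(1)
y.bars <- replicate(10000, mean(rexp(100, rate = 1)))       # 10000 sample means, n = 100
qplot(x = y.bars, geom = "histogram", binwidth = 0.01)       # roughly bell-shaped around mu_Y = 1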
Another implication of the variance of \( \hat{\beta}_1 \) \[ \sigma_{\hat{\beta}_1}^2 = \frac{1}{n}\frac{Var[(X_i - \mu_X)u_i]}{[Var(X_i)]^2} \]
is that the larger \( Var(X_i) \) is, the smaller \( \sigma_{\hat{\beta}_1}^2 \) is, and hence the tighter our estimate of \( \beta_1 \).
Yet another implication of the variance of \( \hat{\beta}_1 \)
\[ \sigma_{\hat{\beta}_1}^2 = \frac{1}{n}\frac{Var[(X_i - \mu_X)u_i]}{[Var(X_i)]^2} \]
is that the smaller the variance of \( u_i \) is, the smaller \( \sigma_{\hat{\beta}_1}^2 \) is, and hence the tighter our estimate of \( \beta_1 \).
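A small simulation (purely illustrative, with made-up parameter values) shows both implications at once: the spread of \( \hat{\beta}_1 \) across repeated samples shrinks when \( Var(X_i) \) is larger and grows when \( Var(u_i) \) is larger.

set.seed(1)
sim.slope.sd <- function(sd.x, sd.u, n = 100, reps = 2000) {
  slopes <- replicate(reps, {
    x <- rnorm(n, sd = sd.x)
    u <- rnorm(n, sd = sd.u)
    y <- 1 + 2 * x + u          # true intercept 1, true slope 2 (arbitrary choices)
    coef(lm(y ~ x))[2]          # OLS slope estimate for this sample
  })
  sd(slopes)                    # spread of beta1.hat across samples
}

sim.slope.sd(sd.x = 1, sd.u = 1)  # baseline
sim.slope.sd(sd.x = 3, sd.u = 1)  # larger Var(X): beta1.hat less spread out
sim.slope.sd(sd.x = 1, sd.u = 3)  # larger Var(u): beta1.hat more spread out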