1. Derive the least squares estimator of \({\beta_0}\) for the regression model \({Y_i=\beta_0+\epsilon_i}\), where \({\epsilon_i}\) is a random error term with mean \({E[\epsilon_i]=0}\) and variance \({\sigma^2(\epsilon_i)=\sigma^2}\); \({\epsilon_i}\) and \({\epsilon_j}\) are uncorrelated, so their covariance is zero. Prove that the least squares estimator is unbiased.

My Answer:
\({Q=\sum(Y_i-b_0)^2}\)
\({\frac{dQ}{db_0}=-2\sum(Y_i-b_0)=0}\)
\({\sum Y_i-nb_0=0}\), so \({b_0=\frac{1}{n}\sum Y_i=\bar{Y}}\)
\({E[b_0]=E[\frac{1}{n}\sum Y_i]=\frac{1}{n}\sum E[Y_i]}\)
#according to A.13c
\({E[Y_i]=\beta_0+E[\epsilon_i]=\beta_0+0=\beta_0}\)
\({E[b_0]=\frac{1}{n}\cdot n\beta_0=\beta_0}\), so \({b_0}\) is unbiased.
The given conditions of the regression model are exactly those assumed by the Gauss-Markov theorem, under which the least squares estimator \({b_0}\) is unbiased (and has minimum variance among linear unbiased estimators).

His Answer: \({Y_i=\beta_0+\epsilon_i}\)
\({E(\epsilon_i)=0}\), \({var(\epsilon_i)=\sigma^2}\)
\({E(Y_i)=E(\beta_0+\epsilon_i)=E(\beta_0)+E(\epsilon_i)=\beta_0}\)
\({Q=\sum{(Y_i-b_0)^2}=\sum{Y_i^2}+nb_0^2-2b_0n\bar{Y}}\)
Set \({\frac{dQ}{db_0}=0}\): \({2nb_0-2n\bar{Y}=0}\), therefore \({b_0=\bar{Y}}\).
Check the second derivative: \({\frac{d^2Q}{db_0^2}=2n>0}\), so \({b_0=\bar{Y}}\) is a minimum.
\({E(b_0)=E(\bar{Y})=E(\frac{\sum Y_i}{n})=\sum{\frac{E(Y_i)}{n}}=\frac{n\beta_0}{n}=\beta_0}\)
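A minimal simulation sketch of this result (NumPy assumed; the values \({\beta_0=5}\), \({\sigma=2}\), and \({n=25}\) are made up for illustration, not taken from the problem): averaging \({b_0=\bar{Y}}\) over many simulated samples should recover \({\beta_0}\).

```python
# Simulate Y_i = beta0 + eps_i many times and average b0 = Ybar.
import numpy as np

rng = np.random.default_rng(0)
beta0, sigma, n, reps = 5.0, 2.0, 25, 100_000  # made-up illustration values

# Each row is one sample of size n; b0 is the row mean.
eps = rng.normal(0.0, sigma, size=(reps, n))
b0 = (beta0 + eps).mean(axis=1)

print(b0.mean())  # ~5.0: the average of b0 lands on beta0, i.e., E[b0] = beta0
```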

  1. Find a 95% confidence interval for the slope in the following setting: \({n=25, \hat{y}=5.3+1.10x, s[b_1]=0.58}\). Test the null hypothesis that the slope is zero versus the two-sided alternative (\({\alpha=0.05}\)). Calculate the power of the test (\({\alpha=0.05}\)) when \({\beta_1=1.10}\).

Answer:
*** \({H_0: \beta_1=0}\) ***
*** \({H_A: \beta_1\ne0}\) ***
*** \({t^*=\frac{b_1}{s[b_1]}}\) ***
*** \({\alpha=0.05}\) ***
*** \({b_1=1.10}\) ***
*** \({s_{b_1}=0.58}\) ***
*** \({\hat{y}=5.3+1.10x}\) ***
*** \({1.10\pm t(0.975;25-2)\,s[b_1]}\) ***
*** \({CI=1.10\pm2.069(0.58)=1.10\pm1.20}\) ***
*** \({-0.10\le\beta_1\le2.30}\) ***
*** Decision rule: if \({|t^*|\le t(0.975;23)}\), conclude \({H_0}\); if \({|t^*|>t(0.975;23)}\), conclude \({H_A}\). ***
*** \({t^*=\frac{1.10}{0.58}=1.897}\) ***
*** \({t(0.975;23)=2.069}\) #according to Table B.2. Since \({|t^*|=1.897\le2.069}\), fail to reject the null hypothesis that \({\beta_1=0}\). This agrees with the confidence interval, which contains zero. ***
#according to Eq. 2.27
*** \({\delta=\frac{|\beta_1-\beta_{10}|}{\sigma[b_1]}=\frac{|1.10-0|}{0.58}\approx1.9}\) (using \({s[b_1]}\) to estimate \({\sigma[b_1]}\)) ***
#according to Table B.5, with df of 23.
*** \({\delta=2, power=0.48}\) ***
*** \({\delta=1, power=0.16}\) ***
*** \({\delta=1.9, power\approx 0.16+\frac{1.9-1}{2-1}*(0.48-0.16)\approx0.448}\) ***
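The interval, test, and power can be double-checked numerically. A sketch assuming SciPy is available; it uses the noncentral t distribution in place of Table B.5 and, like the table lookup above, treats \({s[b_1]}\) as if it were \({\sigma[b_1]}\):

```python
# Numerical check of the CI, test, and power from the given values:
# n=25, b1=1.10, s_b1=0.58, alpha=0.05.
from scipy import stats

n, b1, s_b1, alpha = 25, 1.10, 0.58, 0.05
df = n - 2

t_crit = stats.t.ppf(1 - alpha / 2, df)           # t(0.975; 23) ~ 2.069
margin = t_crit * s_b1
print(f"95% CI: {b1 - margin:.2f} to {b1 + margin:.2f}")   # -0.10 to 2.30

t_star = b1 / s_b1                                # ~ 1.897
print(f"t* = {t_star:.3f}, reject H0: {abs(t_star) > t_crit}")  # False

# Power against beta_1 = 1.10 via the noncentral t distribution,
# with noncentrality delta = 1.10 / 0.58.
delta = 1.10 / 0.58
power = (1 - stats.nct.cdf(t_crit, df, delta)
         + stats.nct.cdf(-t_crit, df, delta))
print(f"power ~ {power:.2f}")   # roughly 0.45-0.46, consistent with the interpolation
```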

  1. In fitting the simple linear regression model it was found that an observation \({Y_i}\) fell directly on the fitted regression line (i.e., \({Y_i=\hat{Y_i}}\)). If this case were deleted, would the least squares regression line fitted to the remaining \({n-1}\) cases be changed?

My Answer:
*** \({Y_n=\hat{Y_n}}\) ***
*** This case's contribution to Q is \({(Y_n-b_0-b_1X_n)^2=(Y_n-\hat{Y_n})^2=0}\). ***
*** Therefore, conclude that the deletion of this case would not change the least squares regression line. According to Eq. 1.8, the method of least squares seeks to minimize Q, and this case contributes a zero term to Q because its residual is \({Y_n-\hat{Y_n}=0}\). The same \({b_0}\) and \({b_1}\) still satisfy the normal equations for the remaining \({n-1}\) cases, so Q remains minimized at the same estimates and the fitted regression line would not change. ***

His Answer: \({Q_{n-1}=\sum_{i\ne n}{(Y_i-\hat{Y_i})^2}=Q_n}\), because the deleted term has \({Y_n-\hat{Y_n}=0}\).
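A small numerical illustration of this (NumPy assumed; the data are made up, not from the problem): fit a line, append a case lying exactly on it, and refit. The coefficients should not move, and deleting such a case is the same comparison run in reverse.

```python
# Show that a case lying exactly on the fitted line does not move the fit.
import numpy as np

x = np.array([1.0, 4.0, 10.0, 11.0, 14.0])   # made-up data
y = np.array([8.2, 16.9, 35.4, 38.1, 46.8])

b1, b0 = np.polyfit(x, y, 1)      # full least squares fit: slope, intercept
x_new = 8.0
y_new = b0 + b1 * x_new           # a case placed exactly on the fitted line

x_all = np.append(x, x_new)
y_all = np.append(y, y_new)
b1_all, b0_all = np.polyfit(x_all, y_all, 1)

print(np.allclose([b0, b1], [b0_all, b1_all]))  # True: the line is unchanged
```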

  1. In a small-scale regression study, five observations on Y were obtained corresponding to \({X=1, 4, 10, 11, 14}\). Assume \({\sigma=0.6}\), \({\beta_0=5}\), and \({\beta_1=3}\).
    1. What are the expected values of MSR and MSE?
    2. For determining whether or not a regression relation exists, would it have been better or worse to have made the five observations at \({X=6,7,8,9,10}\)? Why?
    3. Would the same answer apply if the principal purpose were to estimate the mean response for \({X=8}\)? Explain.

Answer:
*** a. The regression function is \({E\{Y\}=5+3X}\). The expected value of MSE is \({\sigma^2=(0.6)^2=0.36}\). With \({\bar{X}=8}\), \({\sum{(X_i-\bar{X})^2}=114}\), so the expected value of MSR is \({\sigma^2+\beta_1^2\sum{(X_i-\bar{X})^2}=0.36+(3)^2(114)=1026.36}\). (A numerical check of all three parts follows part c.) ***
*** b. It would have been worse to have made the observations at \({X=6,7,8,9,10}\) because they are closely bunched together: \({\sum{(X_i-\bar{X})^2}}\) drops from 114 to 10, and since \({\sigma^2[b_1]=\sigma^2/\sum{(X_i-\bar{X})^2}}\), the variance of the estimated slope would be more than ten times larger. Observations spread over a wider range of X give a more precise slope estimate and therefore a more powerful test of whether a regression relation exists. (See page 8.) ***
*** c. No, the same answer would not apply. The fitted regression line passes through \({(\bar{X},\bar{Y})}\), and both designs have \({\bar{X}=8}\). The variance of the estimated mean response at \({X_h=8}\) is \({\sigma^2[\frac{1}{n}+\frac{(X_h-\bar{X})^2}{\sum{(X_i-\bar{X})^2}}]=\sigma^2/n}\) under either design, so the two sets of observations are equally good for this purpose. ***
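A short numerical cross-check of parts a through c (NumPy assumed), comparing the two designs with the given \({\sigma=0.6}\) and \({\beta_1=3}\):

```python
# Compare the two designs: E[MSR], var(b1), and var(Yhat at X=8).
import numpy as np

sigma2, beta1 = 0.6**2, 3.0
X1 = np.array([1, 4, 10, 11, 14], dtype=float)
X2 = np.array([6, 7, 8, 9, 10], dtype=float)

for X in (X1, X2):
    n = len(X)
    ssx = np.sum((X - X.mean())**2)
    e_msr = sigma2 + beta1**2 * ssx          # E[MSR] = sigma^2 + beta1^2 * SSX
    var_b1 = sigma2 / ssx                    # sigma^2[b1] = sigma^2 / SSX
    var_yhat8 = sigma2 * (1/n + (8 - X.mean())**2 / ssx)  # var of Yhat at X=8
    print(f"SSX={ssx:.0f}, E[MSR]={e_msr:.2f}, var(b1)={var_b1:.4f}, "
          f"var(Yhat at 8)={var_yhat8:.3f}")

# SSX=114 vs 10: E[MSR] and var(b1) differ greatly (parts a, b),
# but var(Yhat at X=8) is 0.072 under both designs (part c).
```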

  1. For the following regression model, indicate whether it is a general linear regression model. If it is not, state whether it can be expressed as a general linear regression model by a suitable transformation: \({Y_i=\frac{1}{1+e^{-(t_i-\phi_2)}}+\epsilon_i}\)

Answer:
This is a nonlinear regression model, therefore it is not a general linear regression model. Moreover, it cannot be transformed into a general linear regression model. Why? \({\epsilon_i}\) sits outside the reciprocal and exponential parts of the function, so no transformation of \({Y_i}\) can linearize the parameters while keeping the error term additive.
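To see why, consider what would happen without the error term (an illustrative aside, not part of the problem): if \({Y_i=\frac{1}{1+e^{-(t_i-\phi_2)}}}\) exactly, then \({\frac{1}{Y_i}-1=e^{-(t_i-\phi_2)}}\), and taking logarithms gives \({\ln(\frac{1}{Y_i}-1)=-t_i+\phi_2}\), which is linear in the parameter. With the additive \({\epsilon_i}\) outside the fraction, \({\frac{1}{Y_i}-1}\) no longer has this exponential form, so the log transformation does not yield a general linear regression model with an additive error.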