1) Binary variables

A) are generally used to control for outliers in your sample.

(binary variables are not a general fix for outliers)

B) can take on more than two values.

(binary means exactly two values)

C) exclude certain individuals from your sample.

D) can take on only two values. (correct answer)

2) In the simple linear regression model, the regression slope

A) indicates by how many percent Y increases, given a one percent increase in X.

(a percent interpretation applies only when both variables are in logs)

B) when multiplied with the explanatory variable will give you the predicted Y.

(\(\hat{y}=\hat{\beta}_0+\hat{\beta}_1x\): the intercept is also needed to predict)

C) indicates by how many units Y increases, given a one unit increase in X. (correct answer)

D) represents the elasticity of Y on X.

(an elasticity would instead require the relationship below, which holds in a log-log model, not in the linear model)

\[\frac{dY}{Y}=\beta_1\frac{dX}{X}\]

3) Interpreting the intercept in a sample regression function is

A) not reasonable because you never observe values of the explanatory variables around the origin.

(not always true, e.g. apple consumption can be zero)

B) reasonable because under certain conditions the estimator is BLUE.

(BLUE is a reason to include the intercept in \(\hat{y}=\hat{\beta}_0+\hat{\beta}_1x\), not a reason to interpret it)

C) reasonable if your sample contains values of \(X_i\) around the origin. (correct answer)

D) not reasonable because economists are interested in the effect of a change in X on the change in Y.

(see C)

4) The regression \(R^2\) is a measure of

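A) whether or not X causes Y.

(statistical significance is assessed with a significance test, not with \(R^2\))

B) the goodness of fit of your regression line. (correct answer)

C) whether or not ESS > TSS.

(\(TSS=ESS+RSS\), so ESS can never exceed TSS)

D) the square of the determinant of R.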

5) The sample regression line estimated by OLS

A) will always have a slope smaller than the intercept.

B) is exactly the same as the population regression line.

C) cannot have a slope of zero.

D) will always run through the point (\(\bar{X}\),\(\bar{Y}\)). (correct answer)

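Write each observation as fitted value plus residual and average over the sample:

\[Y_i=\hat\beta_0+\hat\beta_1X_i+\hat u_i,\quad i=1,\dots,N\]

\[\frac{1}{N}\sum_{i=1}^{N}Y_i=\hat\beta_0+\hat\beta_1\frac{1}{N}\sum_{i=1}^{N}X_i+\frac{1}{N}\sum_{i=1}^{N}\hat u_i\]

Because OLS residuals sum to zero when the regression includes an intercept, this gives

\[\bar{Y}=\hat\beta_0+\hat\beta_1\bar{X}\]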
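The property can also be checked numerically. The following is a minimal sketch in plain Python; the data values are made up purely for illustration and do not come from the quiz.

```python
# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [2.0, 3.5, 4.0, 7.0, 9.5]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS slope and intercept from the deviation-from-mean formulas.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted value at the sample mean of X.
y_hat_at_mean = beta0_hat + beta1_hat * x_bar

# With an intercept in the regression, the residuals sum to (numerically) zero.
residual_sum = sum(y - (beta0_hat + beta1_hat * x) for x, y in zip(X, Y))

print(abs(y_hat_at_mean - y_bar) < 1e-9)  # fitted line passes through the means
print(abs(residual_sum) < 1e-9)           # residuals sum to zero
```

Both checks use a small tolerance rather than exact equality because the arithmetic is done in floating point.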

6) The OLS residuals/error term

A) can be calculated using the errors from the regression function.

(the errors \(e\) in \(y=\beta_0+\beta_1x+e\) are unobserved)

B) can be calculated by subtracting the fitted values from the actual values. (correct answer)

(\(y=\hat{y}+\text{resid}\))

C) are unknown since we do not know the population regression function.

(the errors \(e\) in \(y=\beta_0+\beta_1x+e\) are unknown, but the residuals come from the estimated line and can be computed)

D) should not be used in practice since they indicate that your regression does not run through all your observations.
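To make the distinction concrete, here is a small sketch that computes residuals as actual minus fitted values. The helper name `ols_fit` and the four height/weight pairs are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the OLS line for a single regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical sample (not the quiz data).
heights = [60.0, 64.0, 68.0, 72.0]
weights = [110.0, 135.0, 150.0, 185.0]

b0, b1 = ols_fit(heights, weights)

# Fitted values come from the estimated line; residuals are actual minus fitted.
fitted = [b0 + b1 * x for x in heights]
resid = [y - f for y, f in zip(weights, fitted)]

print(b0, b1)                         # -251.0 6.0
print([round(r, 6) for r in resid])   # [1.0, 2.0, -7.0, 4.0]
```

Nothing here requires knowing the population regression function: the residuals depend only on the data and the estimated coefficients.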

7) \(E(u_i|X_i) = 0\) says that

A) dividing the error by the explanatory variable results in a zero (on average).

(\(E(A|B)\) is the expectation of A conditional on B, not a division)

B) the sample regression function residuals are unrelated to the explanatory variable.

(\(\mathrm{cov}(u_i,X_i)=0\))

C) the sample mean of the Xs is much larger than the sample mean of the errors.

D) the conditional distribution of the error given the explanatory variable has a zero mean. (correct answer)

8) Heteroskedasticity means that

A) homogeneity cannot be assumed automatically for the model.

B) the variance of the error term is not constant. (correct answer)

C) the observed units have different preferences.

D) agents are not all rational.

9) The t-statistic is calculated by dividing

A) the OLS estimator by its standard error.

B) the slope by the standard deviation of the explanatory variable.

C) the estimator minus its hypothesized value by the standard error of the estimator. (correct answer)

D) the slope by 1.96.

\[t=\frac{\hat{\beta}-\beta_{H_0}}{SE(\hat{\beta})}\]

where \(SE(\hat{\beta})\) is the standard error of the estimator (for a sample mean, \(\hat{\sigma}/\sqrt{N}\)) and \(\beta_{H_0}\) is the value of \(\beta\) under the null hypothesis.
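As a quick illustration of the formula, the sketch below wraps it in a function. The function name `t_statistic` and the numbers passed to it are hypothetical; in particular the standard error is an assumed value, not one computed in this quiz.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - hypothesized) / std_error

# Hypothetical inputs: an estimated slope of 2.5, a null value of 1.0,
# and an assumed standard error of 0.5.
t = t_statistic(2.5, 1.0, 0.5)
print(t)  # 3.0
```

Note that dividing the estimate by its standard error (option A) is only the special case where the hypothesized value is zero.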

10)

At a recent county fair, you observed that at one stand people’s weight was forecasted, and were surprised by the accuracy (within a range). Thinking about how the person could have predicted your weight fairly accurately (despite the fact that she did not know about your “heavy bones”), you think about how this could have been accomplished. You remember that medical charts for children contain 5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct an experiment with 110 of your peers. You collect the data and calculate the following sums:

\[\sum_{i=1}^{n}Y_i=17375,\sum_{i=1}^{n}X_i=7665.5\]

\[\sum_{i=1}^{n}y_i^2=94228.8,\sum_{i=1}^{n}x_i^2=1248.9,\sum_{i=1}^{n}x_iy_i=7625.9\]

where the height is measured in inches and weight in pounds.

(Small letters refer to deviations from means as in \(z_i = Z_i-\bar{Z}\))

10.1) What is the slope of the sample regression line?

A) 4.54

B) 6.11 (correct answer)

C) 7.94

D) 10.73

10.2) What is the intercept of the sample regression line?

A) -267.86 (correct answer)

B) -142.52

C) 26.60

D) 118.91

\[Y=\beta_0+\beta_1X+U\]

\[\hat\beta_1=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\] \[\hat\beta_0=\bar{Y}-\hat\beta_1\bar X\]

In our case, since \(x_i=X_i-\bar{X}\), \(y_i=Y_i-\bar{Y}\), we have \(\hat\beta_1\):

\[\hat\beta_1=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\]

\[=\frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}x_i^2}=\frac{7625.9}{1248.9}=6.106093\]

To compute \(\hat\beta_0\), we need \(\bar X\) and \(\bar Y\):

\[\bar Y=\sum_{i=1}^{n}Y_i/n=17375/110=157.9545\]

\[\bar X=\sum_{i=1}^{n}X_i/n=7665.5/110=69.68636\]

\[\hat\beta_0=\bar{Y}-\hat\beta_1\bar X=157.95-6.11\times69.69= -267.8559\]
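The arithmetic above can be reproduced in a few lines. This is a sketch using only the sums given in the question; the variable names are my own. It also shows that the intercept of -267.86 relies on rounding the inputs to two decimals first, as the worked solution does; carrying full precision gives roughly -267.56, which still points to the same answer option.

```python
# Sums given in the question (n = 110 peers).
n = 110
sum_Y = 17375.0    # sum of weights (pounds)
sum_X = 7665.5     # sum of heights (inches)
sum_xy = 7625.9    # sum of products of deviations from means
sum_xx = 1248.9    # sum of squared deviations of X

# Slope: ratio of the deviation cross-product to the squared deviations of X.
beta1_hat = sum_xy / sum_xx

# Sample means, then the intercept.
x_bar = sum_X / n
y_bar = sum_Y / n
beta0_hat = y_bar - beta1_hat * x_bar

# Same intercept, but with inputs rounded to two decimals as in the worked solution.
beta0_rounded = round(y_bar, 2) - round(beta1_hat, 2) * round(x_bar, 2)

print(round(beta1_hat, 2))      # slope, about 6.11
print(round(beta0_hat, 2))      # full-precision intercept
print(round(beta0_rounded, 2))  # intercept from rounded inputs
```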

10.3) The regression \(R^2\) is 0.495. This means:

A) Approximately 49.5% of the regression is good and 50.5% of the regression is bad.

B) Approximately 49.5% of the observations are within one standard deviation of the population regression line.

C) Approximately 49.5% of the observations lie on the population regression line.

D) Approximately 49.5% of the variation in student weight is explained by height. (correct answer)

(\(R^2=\frac{\text{explained variation}}{\text{total variation}}=\frac{ESS}{TSS}\))
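The \(R^2\) can be recovered from the same sums, as sketched below. It uses the identity \(ESS=\hat\beta_1^2\sum x_i^2\) for a single-regressor OLS fit; the variable names are my own. The result comes out near 0.49, in line with the 0.495 quoted in the question (the small gap presumably reflects rounding in the reported sums).

```python
# Sums of squares and cross-products in deviations from means, from the question.
sum_yy = 94228.8   # total variation in weight (TSS)
sum_xx = 1248.9    # sum of squared deviations of height
sum_xy = 7625.9    # sum of products of deviations

beta1_hat = sum_xy / sum_xx

# Explained sum of squares for a one-regressor OLS fit.
ess = beta1_hat ** 2 * sum_xx
tss = sum_yy
rss = tss - ess    # TSS = ESS + RSS

r_squared = ess / tss
print(round(r_squared, 3))
```

Because RSS cannot be negative, ESS never exceeds TSS and \(R^2\) always lies between 0 and 1.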