This is the final version of the exam. I’ve added a few multiple choice questions to the end, but haven’t made any other edits.

Submit your answers through Blackboard (under Content).

Instructions

To facilitate an examination that is fair, effective, and convenient for all of us I am going to lay out some ground rules that can be boiled down to a single sentence: This is an individual, independent exam. That means:

Additionally:

Some of the regression results below are in Stata format. The most important information is usually the table at the bottom and the details in the top right. The dependent variable is in the top left cell of the bottom table (for question 1 it’s gas). The table has a column labeled Coef. which is what R’s output would label Estimate. In every other way the results are exactly the same.
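As a reading aid, here is a minimal sketch of how the columns of a Stata table relate to one another, using numbers copied from the question 1 output below. The function names are illustrative only and do not come from Stata or R. The sketch should reproduce the printed t statistic, R-squared, and F statistic up to rounding.

```python
# Numbers copied from the question 1 regression output in this exam.
coef_price = -0.1385602    # "Coef." column, price row
se_price = 0.0109847       # "Std. Err." column, price row
ci_low, ci_high = -0.1603019, -0.1168185  # "[95% Conf. Interval]", price row
ss_model = 1.78454603      # "SS" column, Model row
ss_total = 1.83449087      # "SS" column, Total row
ms_model = 0.594848676     # "MS" column, Model row
ms_resid = 0.000402781     # "MS" column, Residual row


def t_stat(coef, se):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def r_squared(ss_model, ss_total):
    """Share of the total sum of squares accounted for by the model."""
    return ss_model / ss_total


def f_stat(ms_model, ms_resid):
    """Overall F statistic: model mean square over residual mean square."""
    return ms_model / ms_resid


def implied_critical_value(ci_low, ci_high, se):
    """Critical t value implied by a reported confidence interval."""
    return (ci_high - ci_low) / 2 / se


print("t         :", round(t_stat(coef_price, se_price), 2))
print("R-squared :", round(r_squared(ss_model, ss_total), 4))
print("F         :", round(f_stat(ms_model, ms_resid), 2))
print("t critical:", round(implied_critical_value(ci_low, ci_high, se_price), 3))
```

The same arithmetic applies to every coefficient row in the tables that follow.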

Questions:

  1. The results below are based on data on gas price and consumption for a single country over 128 months. We are interested in how the price of gas, per capita disposable income, and miles per gallon affect per capita expenditure on gas. The regression equation is \[gas_t = \beta_0 + \beta_1 price_t + \beta_2 income_t + \beta_3 miles_t + u_t\] where \(gas_t\) is the per capita real expenditure on gasoline at time t; \(price_t\) is the real price of gasoline at time t; \(income_t\) is the per capita real disposable income (in thousands of dollars) at time t; and \(miles_t\) is average miles per gallon of cars at time t.
reg gas price income miles

      Source |       SS       df       MS              Number of obs =     128
-------------+------------------------------           F(  3,   124) = 1476.85
       Model |  1.78454603     3  .594848676           Prob > F      =  0.0000
    Residual |  .049944845   124  .000402781           R-squared     =  0.9728
-------------+------------------------------           Adj R-squared =  0.9721
       Total |  1.83449087   127   .01444481           Root MSE      =  .02007

------------------------------------------------------------------------------
         gas |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.1385602   .0109847   -12.61   0.000    -.1603019   -.1168185
      income |   .9985463   .0154034    64.83   0.000     .9680586    1.029034
       miles |  -.5181275   .0173898   -29.79   0.000    -.5525468   -.4837082
       _cons |  -1.514543   .1171849   -12.92   0.000    -1.746485   -1.282601
  1. Interpret the coefficient on price, including statistical significance.
  2. Do you think the errors will be correlated? If so, what is the consequence? If not, why not?
  3. Notice that there is no variable for church attendance. Discuss the likely consequence of omitting this variable for the coefficient on price.
  4. Suppose we doubled the sample size (by collecting data for twice as many months). What are the consequences for (i) coefficient estimates and (ii) precision of coefficient estimates?

  1. Suppose that we estimated the following equation: \[participation = \beta_0 + \beta_1 education + \beta_2 male + \beta_3 male * education\] where participation is an index of political participation. The plot below represents the fitted values.
  1. Is \(\beta_1\) greater than, less than, or equal to 0? [3 points]
  2. Is \(\beta_2\) greater than, less than, or equal to 0? [3 points]
  3. Is \(\beta_3\) greater than, less than, or equal to 0? [3 points]

  1. The following results are based on life expectancy data for 33 countries. Life expectancy is measured in years. Expenditures are overall health expenditures. We also have data on GDP per capita (measured in thousands of US 2010 dollars) and variables indicating region (EastEur = 1 for countries from Eastern Europe, NorthAm = 1 for countries from North America, etc.).
  1. The following results are from a dummy variable model where the dependent variable is life expectancy. (i) What is the life expectancy for people in Eastern Europe? (ii) What is the life expectancy for people in the rest of the world? (iii) Is the difference statistically significant (explain briefly)? (iv… this one is deceptively easy…) Explain why this model is testing a difference in means.
reg lifeexpectancy EastEur

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  1,    31) =   13.49
       Model |  61.4479564     1  61.4479564           Prob > F      =  0.0009
    Residual |  141.197516    31  4.55475857           R-squared     =  0.3032
-------------+------------------------------           Adj R-squared =  0.2808
       Total |  202.645472    32    6.332671           Root MSE      =  2.1342

------------------------------------------------------------------------------
lifeexpect~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     EastEur |  -3.337913   .9087699    -3.67   0.001    -5.191361   -1.484464
       _cons |   80.48077   .4185487   192.29   0.000     79.62713     81.3344
  1. We get the following results when we add a control variable for GDP per capita. Sketch life expectancy as a function of GDP per capita for Eastern European and non-Eastern European countries. Be sure to include every coefficient in your sketch. The sketch does not have to be to scale, but you need to have correct signs of slopes/intercepts and you need to label what each coefficient represents on the two lines.
------------------------------------------------------------------------------
lifeexpect~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     EastEur |  -2.283396   .8911585    -2.56   0.016    -4.103384   -.4634071
       GDPPC |   .0504931   .0172679     2.92   0.007     .0152275    .0857588
       _cons |   78.40963   .8015952    97.82   0.000     76.77256    80.04671
  1. We next added an interaction between Eastern Europe and GDP per capita (a variable named EastEurGDPPC). What is the estimated effect of GDP per capita on life expectancy in Eastern Europe?
------------------------------------------------------------------------------
lifeexpect~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     EastEur |  -4.772011   1.740877    -2.74   0.010    -8.332504   -1.211518
       GDPPC |   .0434839    .017324     2.51   0.018     .0080524    .0789154
EastEurGDPPC |   .1163334   .0705772     1.65   0.110    -.0280132    .2606799
       _cons |   78.69714   .7988712    98.51   0.000     77.06326    80.33101
  1. Is the effect of GDP per capita on life expectancy statistically significantly different in Eastern Europe compared to non-Eastern European countries? Use a significance level of 0.05. Be specific in explaining your answer.

  2. Suppose the errors in the above model are heteroskedastic. What is the consequence? Be specific about what elements of the output are affected and how.

  3. Suppose that someone notes that obesity rates are not included in the model. What two conditions need to be true for this to be a problem? Discuss whether they will likely be true in this model.

  4. We next report a model with life expectancy as a function of an Eastern Europe dummy, GDP per capita, and (health) expenditures. One might expect that GDP per capita and health expenditures are correlated. If so, which results, if any, are called into question? If not, why not?

reg lifeexpectancy EastEur GDPPC expenditures

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  3,    28) =    9.46
       Model |  86.4403967     3  28.8134656           Prob > F      =  0.0002
    Residual |  85.3184104    28  3.04708609           R-squared     =  0.5033
-------------+------------------------------           Adj R-squared =  0.4500
       Total |  171.758807    31  5.54060668           Root MSE      =  1.7456

------------------------------------------------------------------------------
lifeexpect~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     EastEur |  -2.666673   .8661047    -3.08   0.005    -4.440808   -.8925377
       GDPPC |   .0362486   .0166744     2.17   0.038     .0020927    .0704046
expenditures |   .0596671   .1583984     0.38   0.709    -.2647973    .3841314
       _cons |   78.60578   1.643044    47.84   0.000     75.24016     81.9714
  1. Here we report results from a model of GDP per capita as a function of the Eastern Europe dummy and (health) expenditures. What is the effect of multicollinearity on the variance of the coefficient on GDP per capita in the previous output? Provide a specific value and explain it.
reg GDPPC EastEur expenditures if lifeexpectancy !=.

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  2,    29) =    3.97
       Model |  3003.31835     2  1501.65918           Prob > F      =  0.0298
    Residual |  10959.3335    29  377.908051           R-squared     =  0.2000
-------------+------------------------------           Adj R-squared =  0.1900
       Total |  13962.6518    31  450.408123           Root MSE      =   19.44

------------------------------------------------------------------------------
       GDPPC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     EastEur |  -18.96621   8.979425    -2.11   0.043    -37.33119   -.6012208
expenditures |   1.616321    1.73829     0.93   0.360     -1.93888    5.171523
       _cons |   26.26179   17.63602     1.49   0.147    -9.807908     62.3315
------------------------------------------------------------------------------
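One standard way to quantify multicollinearity is the variance inflation factor (VIF), \(1/(1 - R_j^2)\), where \(R_j^2\) is the R-squared from regressing one independent variable on the others. Assuming that is the quantity the question has in mind, a minimal sketch of the formula follows. The function name and the example \(R_j^2\) value are hypothetical and are not taken from the output above.

```python
def variance_inflation_factor(r2_aux):
    """Return 1 / (1 - R_j^2) for an auxiliary-regression R-squared.

    r2_aux is the R-squared from regressing one independent variable
    on all of the other independent variables in the model.
    """
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r2_aux)


# Hypothetical value for illustration only (not from the exam output).
example_r2 = 0.5
print(variance_inflation_factor(example_r2))
```

A VIF of 1 means the variable is uncorrelated with the other regressors; larger values mean the variance of its coefficient estimate is inflated by that factor relative to the uncorrelated case.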
  1. Here is a model of life expectancy with controls for expenditures and GDP per capita. Dummy variables for all regions except North America are included. Interpret the coefficient on Western Europe.
reg lifeexpectancy expenditures  GDPPC  EastEur LatinAmerica Asia WestEur MidEast

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  7,    24) =   11.21
       Model |  131.521281     7  18.7887545           Prob > F      =  0.0000
    Residual |  40.2375257    24  1.67656357           R-squared     =  0.7657
-------------+------------------------------           Adj R-squared =  0.6974
       Total |  171.758807    31  5.54060668           Root MSE      =  1.2948

------------------------------------------------------------------------------
lifeexpect~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
expenditures |   .2892163   .1295467     2.23   0.035     .0218452    .5565875
       GDPPC |   .0229366   .0136133     1.68   0.105    -.0051598     .051033
     EastEur |   2.079226   1.159876     1.79   0.086    -.3146406    4.473093
LatinAmerica |   4.265266   1.695908     2.52   0.019     .7650835    7.765448
        Asia |   5.820845   1.195759     4.87   0.000      3.35292     8.28877
     WestEur |   4.749333   1.039352     4.57   0.000     2.604216     6.89445
     MidEast |   6.514199   1.679054     3.88   0.001     3.048802    9.979597
       _cons |   72.30463   1.799617    40.18   0.000      68.5904    76.01885
  1. Suppose we had instead excluded Middle East (and included North America). What would be the (approximate) value of the coefficient on the Eastern Europe dummy variable? Explain how you got the result.

  1. If we are estimating the model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u\), and run White’s test for heteroskedasticity, we are testing several hypotheses including “the squared residuals are not a linear function of \(x_1\).” What are the other hypotheses being tested?

  1. To decide whether or not the slope coefficient is large or small
  1. you should analyze the economic importance of a given increase in the X variable.
  2. the slope coefficient must be larger than one.
  3. the probability of finding the slope should be small relative to a null hypothesis of zero slope.
  4. you should change the scale of the X variable if the coefficient appears too small.

  1. Which of the following models would be interpreted as “a 1% increase in x is associated with a 1% increase in y”?
  1. \(log(y) = \beta_0 + \beta_1 log(x) + u_i\)
  2. \(log(y) = \beta_0 + \beta_1 x + u_i\)
  3. \(y = \beta_0 + \beta_1 log(x) + u_i\)
  4. \(y = \beta_0 + \beta_1 x + u_i\)
  5. none of the above

  1. Which of the following is the correct interpretation of the interaction term in this model: \(y_i = \beta_0 + \beta_1 log(x_i) + \beta_2 dummy_i + \beta_3 [log(x_i)*dummy_i] + u_i\)
  1. an increase of \(x_i\) by 1% is associated with a \(\beta_3\) change in \(y_i\) for observations where \(dummy_i=1\)
  2. an increase of \(x_i\) by 1 unit is associated with a \(\beta_3\) change in \(y_i\) for observations where \(dummy_i=1\)
  3. the effect of \(x_i\) is \(\beta_3\) greater when \(dummy_i=1\).
  4. When \(dummy_i=1\) we expect \(y_i\) to be greater by \(\beta_3\).

  1. Which of the following is not a problem with \(R^2\)?
  1. Adding irrelevant variables can increase \(R^2\).
  2. \(R^2\) doesn’t work for models with dummy variables.
  3. Choosing a model to maximize \(R^2\) can lead to over-fitting.
  4. You can’t compare the \(R^2\) of a model that includes an intercept term with one that does not include an intercept term.