1 Part I: Exercises

1.1 W 7.3

Using the data in GPA2, the following equation was estimated:

$$\widehat{sat} = 1{,}028.10 + 19.30\,hsize - 2.19\,hsize^2 - 45.09\,female - 169.81\,black + 62.31\,female \cdot black$$

$$n = 4{,}137, \quad R^2 = .0858.$$

The variable sat is the combined SAT score; hsize is the size of the student's high school graduating class, in hundreds; female is a gender dummy variable; and black is a race dummy variable equal to one for blacks, and zero otherwise.

(i) Is there strong evidence that hsize^2 should be included in the model? From this equation, what is the optimal high school size?

The equation as reproduced here omits the standard errors, so the t statistic on hsize^2 cannot be computed and the question cannot be settled from these numbers alone. The low R-squared (.0858) does not bear on it: R-squared measures overall fit, not whether an individual regressor belongs in the model.

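To find the optimal size, differentiate 19.30hsize − 2.19hsize^2 with respect to hsize and set the derivative to zero: 19.30 − 2(2.19)hsize = 0, so hsize = 19.30/4.38 ≈ 4.41. Because hsize is measured in hundreds, the optimal graduating class is about 441 students.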
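As a quick check of the turning point, the arithmetic can be reproduced in R. This is only a sketch built from the two coefficients reported in the equation; the names `b_hsize`, `b_hsize2`, and `hsize_star` are illustrative and not part of the original solution.

```r
# Coefficients on hsize and hsize^2 as reported in the estimated equation
b_hsize  <- 19.30
b_hsize2 <- -2.19

# First-order condition: b_hsize + 2 * b_hsize2 * hsize = 0
hsize_star <- -b_hsize / (2 * b_hsize2)

# hsize is measured in hundreds of students
round(hsize_star, 2)      # turning point, in hundreds
round(100 * hsize_star)   # turning point, in students
```

Because the coefficient on the squared term is negative, the turning point is a maximum rather than a minimum.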

(ii) Holding hsize fixed, what is the estimated difference in SAT score between non-black females and non-black males?

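For non-black students black = 0, so both the black term and the female · black interaction drop out. The difference between non-black females (female = 1) and non-black males (female = 0) is therefore the coefficient on female, −45.09: non-black females score about 45 points lower than non-black males. (The sum −45.09 + 62.31 = 17.22 is the female-male difference among black students, not among non-black students.)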

(iii) What is the estimated difference in SAT score between non-black males and black males?

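For males female = 0, so the difference is given by the coefficient on black: black males score an estimated 169.81 points lower than non-black males.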

(iv) What is the estimated difference in SAT score between black females and non-black females?

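For females the interaction term is switched on, so the difference is −169.81 + 62.31 = −107.50: black females score an estimated 107.5 points lower than non-black females.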
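The three group comparisons in parts (ii) to (iv) can be reproduced by evaluating the dummy-variable part of the equation for each group. The sketch below uses only the reported coefficients; the helper `dummy_effect` and the `diff_*` names are hypothetical, introduced here for illustration.

```r
# Dummy-variable coefficients from the estimated SAT equation
b_female <- -45.09
b_black  <- -169.81
b_inter  <-  62.31   # coefficient on female * black

# Contribution of the dummies to predicted SAT, holding hsize fixed
dummy_effect <- function(female, black) {
  b_female * female + b_black * black + b_inter * female * black
}

diff_ii  <- dummy_effect(1, 0) - dummy_effect(0, 0)  # non-black females minus non-black males
diff_iii <- dummy_effect(0, 0) - dummy_effect(0, 1)  # non-black males minus black males
diff_iv  <- dummy_effect(1, 1) - dummy_effect(1, 0)  # black females minus non-black females

c(ii = diff_ii, iii = diff_iii, iv = diff_iv)
```

Writing each difference as one group's prediction minus another's makes the sign convention explicit, which is where these comparisons most easily go wrong.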

1.2 W 7.5 In Example 7.2, let noPC be a dummy variable equal to one if the student does not own a PC, and zero otherwise.

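(page 232)

$$colGPA = \beta_0 + \delta_0 PC + \beta_1 hsGPA + \beta_2 ACT + u$$

$$\widehat{colGPA} = \underset{(.33)}{1.26} + \underset{(.057)}{.157}\,PC + \underset{(.094)}{.447}\,hsGPA + \underset{(.0105)}{.0087}\,ACT$$

$$n = 141, \quad R^2 = .219.$$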

(i) If noPC is used in place of PC in equation (7.6), what happens to the intercept in the estimated equation? What will be the coefficient on noPC?

(Hint: Write $PC = 1 - noPC$ and plug this into the equation $\widehat{colGPA} = \hat\beta_0 + \hat\delta_0 PC + \hat\beta_1 hsGPA + \hat\beta_2 ACT$.)

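Substituting PC = 1 − noPC gives

$$\widehat{colGPA} = (1.26 + .157) - .157\,noPC + .447\,hsGPA + .0087\,ACT.$$

The intercept becomes 1.26 + 0.157 = 1.417 and the coefficient on noPC is −0.157. The slopes on hsGPA and ACT are unchanged.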
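A small numerical check shows that the two parameterizations give identical predictions. This is a sketch using the estimates quoted from equation (7.6); the function names and the student values (`hsGPA = 3.4`, `ACT = 24`) are hypothetical and chosen only for illustration.

```r
# Estimates from equation (7.6) as quoted above
b0 <- 1.26; d0 <- 0.157; b1 <- 0.447; b2 <- 0.0087

# Prediction written with PC, and rewritten with noPC = 1 - PC
pred_pc   <- function(PC, hsGPA, ACT)   b0 + d0 * PC + b1 * hsGPA + b2 * ACT
pred_nopc <- function(noPC, hsGPA, ACT) (b0 + d0) - d0 * noPC + b1 * hsGPA + b2 * ACT

# Hypothetical student, used only to compare the two forms
hsGPA <- 3.4; ACT <- 24
owner_match    <- all.equal(pred_pc(1, hsGPA, ACT), pred_nopc(0, hsGPA, ACT))
nonowner_match <- all.equal(pred_pc(0, hsGPA, ACT), pred_nopc(1, hsGPA, ACT))
c(owner = owner_match, nonowner = nonowner_match)
```

Identical fitted values for every student are also the reason the residuals, and hence the R-squared, do not change when noPC replaces PC.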

(ii) What will happen to the R-squared if noPC is used in place of PC?

It remains the same (.219). Because noPC is an exact linear function of PC, the fitted values and residuals do not change.

(iii) Should PC and noPC both be included as independent variables in the model? Explain.

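No. Because PC + noPC = 1 for every student, the two dummies are perfectly collinear with the intercept (the dummy variable trap), so their coefficients cannot be estimated separately.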

1.3 W 7.9 Let d be a dummy (binary) variable and let z be a quantitative variable. Consider the model y = β0 + δ0d + β1z + δ1d · z + u; this is a general version of a model with an interaction between a dummy variable and a quantitative variable. [An example is in equation (7.17).]

(i) Since it changes nothing important, set the error to zero, u = 0. Then, when d = 0 we can write the relationship between y and z as the function f0(z) = β0 + β1z. Write the same relationship when d = 1, where you should use f1(z) on the left-hand side to denote the linear function of z.

Setting d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.

(ii) Assuming that δ1 ≠ 0 (which means the two lines are not parallel), show that the value of z* such that f0(z*) = f1(z*) is z* = −δ0/δ1. This is the point at which the two lines intersect [as in Figure 7.2 (b)]. Argue that z* is positive if and only if δ0 and δ1 have opposite signs.

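Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Cancelling β0 + β1z* from both sides leaves 0 = δ0 + δ1z*, and since δ1 ≠ 0 we get z* = −δ0/δ1. This is the point where the two lines intersect; no derivative is involved. z* is positive exactly when the ratio δ0/δ1 is negative, that is, when δ0 and δ1 have opposite signs.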

(iii) Using the data in TWOYEAR, the following equation can be estimated: log(wage) = 2.289 − .357 female + .50 totcoll + .030 female · totcoll, n = 6,763, R^2 = .202, where all coefficients have been rounded to three decimal places. Using this equation, find the value of totcoll such that the predicted values of log(wage) are the same for men and women.

The predicted values for men and women are equal when the terms involving female sum to zero, as in part (ii): −0.357 + 0.030 totcoll = 0.

totcoll = 0.357/0.030 = 11.9

We can check this by plugging the value back into the equation log(wage) = 2.289 − .357 female + .50 totcoll + .030 female · totcoll:

For women: 2.289 − 0.357 + (0.50 × 11.9) + (0.030 × 11.9) = 8.239

For men: 2.289 + (0.50 × 11.9) = 8.239
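The same check can be scripted, which makes it easy to see that the gap between the two predictions is zero at the crossing point. This is a sketch using the coefficients exactly as reported in the equation above; `pred_lwage`, `totcoll_star`, and `gap` are hypothetical names.

```r
# Coefficients as reported in the TWOYEAR equation above
b0 <- 2.289; d0 <- -0.357; b1 <- 0.50; d1 <- 0.030

# Predicted log(wage) for a given gender dummy and years of college
pred_lwage <- function(female, totcoll) {
  b0 + d0 * female + b1 * totcoll + d1 * female * totcoll
}

# Crossing point from part (ii): z* = -d0 / d1
totcoll_star <- -d0 / d1

# Difference between women's and men's predictions at the crossing point
gap <- pred_lwage(1, totcoll_star) - pred_lwage(0, totcoll_star)
c(totcoll_star = totcoll_star, gap = gap)
```

The crossing point depends only on the two coefficients that involve female, so the intercept and the coefficient on totcoll do not affect it.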

(iv) Based on the equation in part (iii), can women realistically get enough years of college so that their earnings catch up to those of men? Explain.

No. Women would need about 11.9 years of college for their predicted wage to catch up, which is far more than is practical. It is also likely that age influences wages, and a woman would be considerably older by the time she had accumulated that many years of college.

2 Part II: Empirical Problems

2.1 W 7.C8 Use the data in LOANAPP for this exercise. The binary variable to be explained is approve, which is equal to one if a mortgage loan to an individual was approved. The key explanatory variable is white, a dummy variable equal to one if the applicant was white. The other applicants in the data set are black and Hispanic. To test for discrimination in the mortgage loan market, a linear probability model can be used:

approve = β0 + β1white + otherfactors.

library(wooldridge)  # assumed source of the loanapp data set
loan <- data.frame(loanapp)

(i) If there is discrimination against minorities, and the appropriate factors have been controlled for, what is the sign of β1?

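The sign would be positive: β1 > 0 means that, other factors being equal, white applicants are more likely to be approved than minority applicants.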

(ii) Regress approve on white and report the results in the usual form. Interpret the coefficient on white.

fit.1 <- lm(approve ~ white, data = loan )
summary(fit.1)
## 
## Call:
## lm(formula = approve ~ white, data = loan)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90839  0.09161  0.09161  0.09161  0.29221 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.70779    0.01824   38.81   <2e-16 ***
## white        0.20060    0.01984   10.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3201 on 1987 degrees of freedom
## Multiple R-squared:  0.04893,    Adjusted R-squared:  0.04845 
## F-statistic: 102.2 on 1 and 1987 DF,  p-value: < 2.2e-16

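According to this result, the estimated probability of loan approval is about 0.20 (20 percentage points) higher for white applicants than for nonwhite applicants: 0.908 versus 0.708. The difference is highly statistically significant (t = 10.11).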
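In a linear probability model with a single dummy regressor, the intercept and slope translate directly into group approval rates. The sketch below only re-uses the two estimates printed in the summary of fit.1; the variable names are illustrative.

```r
# Estimates from the simple regression of approve on white (fit.1 above)
b0 <- 0.70779
b1 <- 0.20060

p_nonwhite <- b0        # predicted approval probability when white = 0
p_white    <- b0 + b1   # predicted approval probability when white = 1

round(c(nonwhite = p_nonwhite, white = p_white, gap = p_white - p_nonwhite), 3)
```

With only a dummy on the right-hand side, these predictions coincide with the sample approval rates of the two groups.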

(iii) As controls, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr. What happens to the coefficient on white? Is there still evidence of discrimination against nonwhites?

fit.2 <- lm(approve ~ white+ hrat+ obrat+ loanprc+ unem+ male+ married+ dep+ sch+ cosign+ chist+ pubrec+ mortlat1+ mortlat2+vr, data = loan )
summary(fit.2)
## 
## Call:
## lm(formula = approve ~ white + hrat + obrat + loanprc + unem + 
##     male + married + dep + sch + cosign + chist + pubrec + mortlat1 + 
##     mortlat2 + vr, data = loan)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06482  0.00781  0.06387  0.13673  0.71105 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.936731   0.052735  17.763  < 2e-16 ***
## white        0.128820   0.019732   6.529 8.44e-11 ***
## hrat         0.001833   0.001263   1.451   0.1469    
## obrat       -0.005432   0.001102  -4.930 8.92e-07 ***
## loanprc     -0.147300   0.037516  -3.926 8.92e-05 ***
## unem        -0.007299   0.003198  -2.282   0.0226 *  
## male        -0.004144   0.018864  -0.220   0.8261    
## married      0.045824   0.016308   2.810   0.0050 ** 
## dep         -0.006827   0.006701  -1.019   0.3084    
## sch          0.001753   0.016650   0.105   0.9162    
## cosign       0.009772   0.041139   0.238   0.8123    
## chist        0.133027   0.019263   6.906 6.72e-12 ***
## pubrec      -0.241927   0.028227  -8.571  < 2e-16 ***
## mortlat1    -0.057251   0.050012  -1.145   0.2525    
## mortlat2    -0.113723   0.066984  -1.698   0.0897 .  
## vr          -0.031441   0.014031  -2.241   0.0252 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3021 on 1955 degrees of freedom
##   (18 observations deleted due to missingness)
## Multiple R-squared:  0.1656, Adjusted R-squared:  0.1592 
## F-statistic: 25.86 on 15 and 1955 DF,  p-value: < 2.2e-16

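The coefficient on white falls from 0.201 to 0.129 once the controls are added, but it remains positive and highly significant (t = 6.53). There is therefore still evidence of discrimination against nonwhites: holding the listed factors fixed, white applicants are about 12.9 percentage points more likely to be approved.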

2.2 W 7.C10

Use the data in NBASAL for this exercise.

nba <- data.frame(nbasal)

(i) Estimate a linear regression model relating points per game to experience in the league and position (guard, forward, or center). Include experience in quadratic form and use centers as the base group. Report the results in the usual form.

fit.3 <- lm(points~ exper+expersq+ guard+ forward, data=nba)
summary(fit.3)
## 
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.220  -4.268  -1.003   3.444  22.265 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.76076    1.17862   4.039 7.03e-05 ***
## exper        1.28067    0.32853   3.898 0.000123 ***
## expersq     -0.07184    0.02407  -2.985 0.003106 ** 
## guard        2.31469    1.00036   2.314 0.021444 *  
## forward      1.54457    1.00226   1.541 0.124492    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared:  0.09098,    Adjusted R-squared:  0.07721 
## F-statistic: 6.606 on 4 and 264 DF,  p-value: 4.426e-05

(ii) Why do you not include all three position dummy variables in part (i)?

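Because every player is exactly one of guard, forward, or center, the three dummies sum to one for every observation. Including all three together with the intercept would create perfect collinearity (the dummy variable trap), so one group, here centers, must be left out as the base.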
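The dummy variable trap is easy to see on simulated data. The sketch below does not use the NBASAL sample: the 90 players, their positions, and the outcome `y` are all made up for illustration. It relies on the standard behavior of `lm()`, which reports `NA` for a regressor that is an exact linear combination of earlier ones.

```r
set.seed(1)  # simulated data only; not the NBASAL sample

# Assign each of 90 hypothetical players to exactly one position
position <- sample(c("guard", "forward", "center"), 90, replace = TRUE)
guard    <- as.numeric(position == "guard")
forward  <- as.numeric(position == "forward")
center   <- as.numeric(position == "center")
y        <- 5 + 2 * guard + 1 * forward + rnorm(90)

# The three dummies sum to one, duplicating the intercept column
stopifnot(all(guard + forward + center == 1))

# lm() cannot separate the effects and reports NA for the aliased dummy
trap_fit <- lm(y ~ guard + forward + center)
coef(trap_fit)
```

Dropping any one of the three dummies, or dropping the intercept, removes the collinearity; the choice only changes which group the remaining coefficients are measured against.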

(iii) Holding experience fixed, does a guard score more than a center? How much more?

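Yes. Holding experience fixed, a guard is estimated to score about 2.31 more points per game than a center, and the difference is statistically significant at the 5% level (t = 2.31, p = 0.021).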

(iv) Now, add marital status to the equation. Holding position and experience fixed, are married players more productive (based on points per game)?

fit.4 <- lm(points~ exper+expersq+ guard+ forward+marr, data=nba)
summary(fit.4)
## 
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr, 
##     data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.874  -4.227  -1.251   3.631  22.412 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.70294    1.18174   3.980 8.93e-05 ***
## exper        1.23326    0.33421   3.690 0.000273 ***
## expersq     -0.07037    0.02416  -2.913 0.003892 ** 
## guard        2.28632    1.00172   2.282 0.023265 *  
## forward      1.54091    1.00298   1.536 0.125660    
## marr         0.58427    0.74040   0.789 0.430751    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared:  0.09313,    Adjusted R-squared:  0.07588 
## F-statistic: 5.401 on 5 and 263 DF,  p-value: 9.526e-05

Not really. Married players are estimated to score about 0.58 more points per game, but the coefficient is not statistically significant (t = 0.79, p = 0.43).

(v) Add interactions of marital status with both experience variables. In this expanded model, is there strong evidence that marital status affects points per game?

fit.4 <- lm(points~ exper+expersq+ guard+ forward+marr+I(marr*exper)+ I(marr*expersq), data=nba)
summary(fit.4)
## 
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr + 
##     I(marr * exper) + I(marr * expersq), data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.239  -4.328  -1.067   3.742  22.197 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        5.81615    1.34878   4.312 2.29e-05 ***
## exper              0.70255    0.43405   1.619   0.1067    
## expersq           -0.02950    0.03267  -0.903   0.3674    
## guard              2.25079    1.00002   2.251   0.0252 *  
## forward            1.62915    1.00199   1.626   0.1052    
## marr              -2.53750    2.03822  -1.245   0.2143    
## I(marr * exper)    1.27965    0.68229   1.876   0.0618 .  
## I(marr * expersq) -0.09359    0.04887  -1.915   0.0566 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared:  0.1058, Adjusted R-squared:  0.08184 
## F-statistic: 4.413 on 7 and 261 DF,  p-value: 0.0001188

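None of the three marital-status terms is individually significant at the 5% level; the two interactions are significant only at the 10% level. The formal check is a joint F test of marr and its two interactions, but on the individual t statistics there is no strong evidence that marital status affects points per game.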
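Because three coefficients are being tested at once, a joint F test is more appropriate than three separate t tests. The sketch below computes the R-squared form of the F statistic from the values printed in the summaries above, taking the part (i) model as the restricted model; both regressions use the same 269 observations, which this formula requires. The variable names are illustrative.

```r
# R-squared values reported in the summaries above
r2_ur <- 0.1058    # unrestricted: model with marr and both interactions
r2_r  <- 0.09098   # restricted: part (i) model without any marital terms

q      <- 3        # restrictions: marr, marr * exper, marr * expersq
df_res <- 261      # residual degrees of freedom of the unrestricted model

# R-squared form of the F statistic and its p-value
F_stat <- ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_res)
p_val  <- pf(F_stat, q, df_res, lower.tail = FALSE)

round(c(F = F_stat, p = p_val), 3)
```

The statistic comes out at roughly 1.44, well below the 10% critical value of about 2.1, so the three terms are not jointly significant either.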

(vi) Estimate the model from part (iv) but use assists per game as the dependent variable. Are there any notable differences from part (iv)? Discuss.

fit.5 <- lm(assists~ exper+expersq+ guard+ forward+marr+I(marr*exper)+ I(marr*expersq), data=nba)
summary(fit.5)
## 
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr + 
##     I(marr * exper) + I(marr * expersq), data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2472 -1.1361 -0.2986  0.7388  8.3042 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.129347   0.406562  -0.318  0.75063    
## exper              0.368883   0.130834   2.819  0.00518 ** 
## expersq           -0.018658   0.009848  -1.895  0.05925 .  
## guard              2.499510   0.301436   8.292 5.95e-15 ***
## forward            0.448880   0.302028   1.486  0.13843    
## marr               0.081164   0.614377   0.132  0.89500    
## I(marr * exper)    0.163866   0.205662   0.797  0.42631    
## I(marr * expersq) -0.016172   0.014731  -1.098  0.27328    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.704 on 261 degrees of freedom
## Multiple R-squared:  0.3543, Adjusted R-squared:  0.3369 
## F-statistic: 20.45 on 7 and 261 DF,  p-value: < 2.2e-16

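Note that the regression above keeps the interaction terms from part (v) rather than using the part (iv) specification. In it, marr and its interactions remain individually insignificant, so there is still no evidence that marital status matters. The notable differences lie elsewhere: guards average about 2.50 more assists per game than centers (t = 8.29), and the R-squared rises to 0.354, compared with roughly 0.09 to 0.11 in the points regressions.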