C2

Use the data in \(WAGE2\) for this exercise.

(i)

Estimate the model \[ log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + \beta_4 married + \beta_5 black + \beta_6 south + \beta_7 urban + u \] and report the results in the usual form. Holding other factors fixed, what is the approximate difference in monthly salary between blacks and nonblacks? Is this difference statistically significant?

# black = 1 if the worker is black, 0 otherwise
wage.i <- lm(log(wage) ~ educ + exper + tenure + married + black+ south + urban, 
   data = wage2)
summary(wage.i)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16

\(\hat\beta_5\) = -0.188 (the coefficient on \(black\)): holding the other factors fixed, blacks earn approximately 18.8% less per month than nonblacks.
t = 0.18835/0.037667 ≈ 5.0, well above any conventional critical value, so we reject H0: the difference is statistically significant.
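
The t statistic and the exact percentage differential can be checked directly (a sketch assuming the wage.i fit above):

b <- coef(summary(wage.i))["black", ]
b["Estimate"] / b["Std. Error"]    # t statistic, about -5.0
100 * (exp(b["Estimate"]) - 1)     # exact percentage differential, about -17.2%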

(ii)

Add the variables \(exper^2\) and \(tenure^2\) to the equation and show that they are jointly insignificant at even the 20% level.

wage.ii <- lm(log(wage) ~ educ + exper + tenure + married + black+ south + 
                urban + I(exper^2) + I(tenure^2), data = wage2)
MyH0 <- c("I(exper^2)", "I(tenure^2)")
linearHypothesis(wage.ii, MyH0)
## Linear hypothesis test
## 
## Hypothesis:
## I(exper^2) = 0
## I(tenure^2) = 0
## 
## Model 1: restricted model
## Model 2: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban + I(exper^2) + I(tenure^2)
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    927 123.82                           
## 2    925 123.42  2   0.39756 1.4898  0.226
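
Since the p-value is 0.226 > 0.20, \(exper^2\) and \(tenure^2\) are jointly insignificant even at the 20% level. The same F test can be obtained from anova() on the nested fits (a sketch assuming wage.i and wage.ii above):

anova(wage.i, wage.ii)   # same F = 1.49, p-value = 0.226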

(iii)

Extend the original model to allow the return to education to depend on race, and test whether the return to education does depend on race.

wage.iii <- lm(log(wage) ~ educ + exper + tenure + married + black+ south + 
                 urban + black*educ, data = wage2)
summary(wage.iii)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + black * educ, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97782 -0.21832  0.00475  0.24136  1.23226 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.374817   0.114703  46.859  < 2e-16 ***
## educ         0.067115   0.006428  10.442  < 2e-16 ***
## exper        0.013826   0.003191   4.333 1.63e-05 ***
## tenure       0.011787   0.002453   4.805 1.80e-06 ***
## married      0.198908   0.039047   5.094 4.25e-07 ***
## black        0.094809   0.255399   0.371 0.710561    
## south       -0.089450   0.026277  -3.404 0.000692 ***
## urban        0.183852   0.026955   6.821 1.63e-11 ***
## educ:black  -0.022624   0.020183  -1.121 0.262603    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3654 on 926 degrees of freedom
## Multiple R-squared:  0.2536, Adjusted R-squared:  0.2471 
## F-statistic: 39.32 on 8 and 926 DF,  p-value: < 2.2e-16

The estimated return to another year of education is about 2.3 percentage points lower for blacks than for nonblacks.
|t| = 0.022624/0.020183 ≈ 1.12, which is not large enough to reject H0, so there is no statistically significant evidence that the return to education depends on race.
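
The implied returns to education by race can be read off the coefficients (a sketch assuming the wage.iii fit above):

coef(wage.iii)["educ"]                                 # return for nonblacks, about 6.7% per year
coef(wage.iii)["educ"] + coef(wage.iii)["educ:black"]  # return for blacks, about 4.4% per year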

(iv)

Again, start with the original model, but now allow wages to differ across four groups of people: married and black, married and nonblack, single and black, and single and nonblack. What is the estimated wage differential between married blacks and married nonblacks?

# base group is single nonblack
marblk = ifelse(wage2$married ==1 & wage2$black ==1, 1, 0)
singblk = ifelse(wage2$married ==0 & wage2$black ==1, 1, 0)
marnonblk = ifelse(wage2$married ==1 & wage2$black ==0, 1, 0)

wage2_new <- cbind(wage2, marblk, singblk, marnonblk)

wage.iv <- lm(log(wage) ~ educ + exper + tenure + south + 
                urban + marblk + singblk + marnonblk, data = wage2_new)
summary(wage.iv)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + south + urban + 
##     marblk + singblk + marnonblk, data = wage2_new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98013 -0.21780  0.01057  0.24219  1.22889 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.403793   0.114122  47.351  < 2e-16 ***
## educ         0.065475   0.006253  10.471  < 2e-16 ***
## exper        0.014146   0.003191   4.433 1.04e-05 ***
## tenure       0.011663   0.002458   4.745 2.41e-06 ***
## south       -0.091989   0.026321  -3.495 0.000497 ***
## urban        0.184350   0.026978   6.833 1.50e-11 ***
## marblk       0.009448   0.056013   0.169 0.866083    
## singblk     -0.240820   0.096023  -2.508 0.012314 *  
## marnonblk    0.188915   0.042878   4.406 1.18e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3656 on 926 degrees of freedom
## Multiple R-squared:  0.2528, Adjusted R-squared:  0.2464 
## F-statistic: 39.17 on 8 and 926 DF,  p-value: < 2.2e-16

Relative to the base group (single nonblacks), married nonblacks earn about 0.189 log points more and married blacks about 0.009 log points more. The estimated differential between married blacks and married nonblacks is therefore 0.009 - 0.189 = -0.18; married blacks earn roughly 18% less than married nonblacks, holding the other factors fixed.
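
The differential can also be tested directly by equating the two dummy coefficients with car's linearHypothesis (a sketch assuming the wage.iv fit above):

# H0: married blacks and married nonblacks earn the same
linearHypothesis(wage.iv, "marblk = marnonblk")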

C3

A model that allows major league baseball player salary to differ by position is \[ \begin{aligned} log(salary) = \beta_0 &+ \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + \beta_6 runsyr \\ &+ \beta_7 fldperc + \beta_8 allstar + \beta_9 frstbase + \beta_{10} scndbase + \beta_{11} thrdbase + \beta_{12} shrtstop + \beta_{13} catcher + u \end{aligned} \]

where outfield is the base group.

(i)

State the null hypothesis that, controlling for other factors, catchers and outfielders earn, on average, the same amount. Test this hypothesis using the data in MLB1 and comment on the size of the estimated salary differential.

mlb.i <- lm(log(salary) ~ years + gamesyr + bavg + hrunsyr + 
                  rbisyr + runsyr + fldperc + allstar + 
                  frstbase + scndbase + thrdbase + shrtstop +
                  catcher, data = mlb1)
summary(mlb.i)
## 
## Call:
## lm(formula = log(salary) ~ years + gamesyr + bavg + hrunsyr + 
##     rbisyr + runsyr + fldperc + allstar + frstbase + scndbase + 
##     thrdbase + shrtstop + catcher, data = mlb1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.42088 -0.42665 -0.03092  0.47925  2.74975 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.1295537  2.3044544   4.830 2.07e-06 ***
## years        0.0584178  0.0122732   4.760 2.87e-06 ***
## gamesyr      0.0097670  0.0033776   2.892  0.00408 ** 
## bavg         0.0004814  0.0011411   0.422  0.67340    
## hrunsyr      0.0191459  0.0159638   1.199  0.23124    
## rbisyr       0.0017875  0.0074755   0.239  0.81116    
## runsyr       0.0118707  0.0045264   2.623  0.00912 ** 
## fldperc      0.0002833  0.0023078   0.123  0.90239    
## allstar      0.0063351  0.0028828   2.198  0.02866 *  
## frstbase    -0.1328009  0.1309243  -1.014  0.31115    
## scndbase    -0.1611010  0.1414296  -1.139  0.25547    
## thrdbase     0.0145271  0.1430352   0.102  0.91916    
## shrtstop    -0.0605672  0.1302031  -0.465  0.64210    
## catcher      0.2535592  0.1313128   1.931  0.05432 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7092 on 339 degrees of freedom
## Multiple R-squared:  0.6535, Adjusted R-squared:  0.6403 
## F-statistic: 49.19 on 13 and 339 DF,  p-value: < 2.2e-16

H0: \(\beta_{13} = 0\), i.e., controlling for the other factors, catchers and outfielders earn the same on average. The estimated differential is large: catchers earn roughly 25.4% more than outfielders. t = 0.2535592/0.1313128 ≈ 1.93 > 1.645, so we reject H0 at the 10% level (two-tailed), but we fail to reject at the 5% level.
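
The exact percentage differential implied by the catcher coefficient (a sketch assuming the mlb.i fit above):

100 * (exp(coef(mlb.i)["catcher"]) - 1)   # about 28.9%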

(ii)

State and test the null hypothesis that there is no difference in average salary across positions, once other factors have been controlled for.

MyH0 <- c("frstbase", "scndbase", "thrdbase", "shrtstop", "catcher")
linearHypothesis(mlb.i, MyH0)
## Linear hypothesis test
## 
## Hypothesis:
## frstbase = 0
## scndbase = 0
## thrdbase = 0
## shrtstop = 0
## catcher = 0
## 
## Model 1: restricted model
## Model 2: log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr + runsyr + 
##     fldperc + allstar + frstbase + scndbase + thrdbase + shrtstop + 
##     catcher
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    344 174.99                           
## 2    339 170.52  5    4.4703 1.7774 0.1168

H0: \(\beta_9 = \beta_{10} = \beta_{11} = \beta_{12} = \beta_{13} = 0\). With q = 5 and n - k = 353 - 14 = 339, the F statistic is 1.78, below the 10% critical value of about 1.85 (p = 0.117). We fail to reject H0 even at the 10% level: once other factors are controlled for, there is no strong evidence of a difference in average salary across positions.
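
The 10% critical value cited above can be computed directly (a sketch):

qf(0.90, df1 = 5, df2 = 339)   # 10% critical value for F(5, 339)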

(iii)

Are the results from parts (i) and (ii) consistent? If not, explain what is happening.

They are not entirely consistent. The catcher dummy is individually significant at the 10% level, yet the joint test in part (ii) does not reject. The joint F statistic is diluted by the other four position dummies, which are all individually far from significant, so the evidence of a salary differential across positions is driven almost entirely by catchers and is at best marginal.

C10

Use the data in \(NBASAL\) for this exercise.

(i)

Estimate a linear regression model relating points per game to experience in the league and position (guard, forward, or center). Include experience in quadratic form and use centers as the base group. Report the results in the usual form.

nbasal.i <- lm(points ~ exper + I(exper^2) + guard + forward , data = nbasal) 
summary(nbasal.i)
## 
## Call:
## lm(formula = points ~ exper + I(exper^2) + guard + forward, data = nbasal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.220  -4.268  -1.003   3.444  22.265 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.76076    1.17862   4.039 7.03e-05 ***
## exper        1.28067    0.32853   3.898 0.000123 ***
## I(exper^2)  -0.07184    0.02407  -2.985 0.003106 ** 
## guard        2.31469    1.00036   2.314 0.021444 *  
## forward      1.54457    1.00226   1.541 0.124492    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared:  0.09098,    Adjusted R-squared:  0.07721 
## F-statistic: 6.606 on 4 and 264 DF,  p-value: 4.426e-05

\[ points = \beta_0 + \beta_1exper + \beta_2exper^2 + \beta_3guard + \beta_4forward + u \]

(ii)

Why do you not include all three position dummy variables in part (i)?

# nbasal.ii <- lm(points ~ exper + I(exper^2) + guard + forward + center, data = nbasal)
# summary(nbasal.ii) 

Including all three would be the dummy variable trap: each player plays exactly one of the three positions, so the dummies sum to one and are perfectly collinear with the intercept. One group must be omitted; here center is the base group, absorbed into the intercept.
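
A quick check of the perfect collinearity (a sketch, assuming NBASAL contains the center dummy used in the commented-out code above):

with(nbasal, all(guard + forward + center == 1))   # each player holds exactly one position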

(iii)

Holding experience fixed, does a guard score more than a center? How much more? Is the difference statistically significant?

Holding experience fixed, a guard scores on average about 2.31 points more per game than a center.
t = 2.31469/1.00036 ≈ 2.31 > 1.96, so the difference is statistically significant at the 5% level (two-tailed p ≈ 0.021).

(iv)

Now, add marital status to the equation. Holding position and experience fixed, are married players more productive (based on points per game)?

nbasal.iv <- lm(points ~ exper + I(exper^2) + guard + forward + marr, data = nbasal) 
summary(nbasal.iv)
## 
## Call:
## lm(formula = points ~ exper + I(exper^2) + guard + forward + 
##     marr, data = nbasal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.874  -4.227  -1.251   3.631  22.412 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.70294    1.18174   3.980 8.93e-05 ***
## exper        1.23326    0.33421   3.690 0.000273 ***
## I(exper^2)  -0.07037    0.02416  -2.913 0.003892 ** 
## guard        2.28632    1.00172   2.282 0.023265 *  
## forward      1.54091    1.00298   1.536 0.125660    
## marr         0.58427    0.74040   0.789 0.430751    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared:  0.09313,    Adjusted R-squared:  0.07588 
## F-statistic: 5.401 on 5 and 263 DF,  p-value: 9.526e-05

Holding position and experience fixed, a married player scores about 0.58 more points per game.
However, t = 0.58427/0.74040 ≈ 0.79 < 1.645, so we fail to reject H0: the difference between married and single players is not statistically different from zero.

(v)

Add interactions of marital status with both experience variables. In this expanded model, is there strong evidence that marital status affects points per game?

nbasal.v <- lm(points ~ exper + I(exper^2) + guard + forward + marr + 
                 marr*exper + marr*I(exper^2), data = nbasal) 
summary(nbasal.v)
## 
## Call:
## lm(formula = points ~ exper + I(exper^2) + guard + forward + 
##     marr + marr * exper + marr * I(exper^2), data = nbasal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.239  -4.328  -1.067   3.742  22.197 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.81615    1.34878   4.312 2.29e-05 ***
## exper            0.70255    0.43405   1.619   0.1067    
## I(exper^2)      -0.02950    0.03267  -0.903   0.3674    
## guard            2.25079    1.00002   2.251   0.0252 *  
## forward          1.62915    1.00199   1.626   0.1052    
## marr            -2.53750    2.03822  -1.245   0.2143    
## exper:marr       1.27965    0.68229   1.876   0.0618 .  
## I(exper^2):marr -0.09359    0.04887  -1.915   0.0566 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared:  0.1058, Adjusted R-squared:  0.08184 
## F-statistic: 4.413 on 7 and 261 DF,  p-value: 0.0001188
# restricted: marr, marr*exper, marr*I(exper^2)
# joint test
MyH0 <- c("marr", "exper:marr", "I(exper^2):marr")
linearHypothesis(nbasal.v, MyH0)
## Linear hypothesis test
## 
## Hypothesis:
## marr = 0
## exper:marr = 0
## I(exper^2):marr = 0
## 
## Model 1: restricted model
## Model 2: points ~ exper + I(exper^2) + guard + forward + marr + marr * 
##     exper + marr * I(exper^2)
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    264 8482.3                           
## 2    261 8343.8  3     138.5 1.4442 0.2303

q = 3, n - k = 269 - 8 = 261.
F statistic = 1.44, below the 10% critical value of about 2.08 for F(3, 261) (p = 0.23).
We fail to reject H0, so there is no strong evidence that marital status affects points per game, although the two interaction terms are individually significant at the 10% level, hinting that any marriage effect may vary with experience.
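
The 10% critical value used above (a sketch):

qf(0.90, df1 = 3, df2 = 261)   # 10% critical value for F(3, 261)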

(vi)

Estimate the model from part (iv) but use assists per game as the dependent variable. Are there any notable differences from part (iv)? Discuss.

nbasal.vi <- lm(assists ~ exper + I(exper^2) + guard + forward + marr, data = nbasal) 
summary(nbasal.vi)
## 
## Call:
## lm(formula = assists ~ exper + I(exper^2) + guard + forward + 
##     marr, data = nbasal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3127 -1.0780 -0.3157  0.6788  8.2488 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.225809   0.354904  -0.636  0.52516    
## exper        0.443603   0.100372   4.420 1.45e-05 ***
## I(exper^2)  -0.026726   0.007256  -3.683  0.00028 ***
## guard        2.491672   0.300842   8.282 6.19e-15 ***
## forward      0.447471   0.301220   1.486  0.13860    
## marr         0.321899   0.222359   1.448  0.14891    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.704 on 263 degrees of freedom
## Multiple R-squared:  0.3499, Adjusted R-squared:  0.3375 
## F-statistic: 28.31 on 5 and 263 DF,  p-value: < 2.2e-16
stargazer(nbasal.iv, nbasal.vi, type = "text")
## 
## ===========================================================
##                                    Dependent variable:     
##                                ----------------------------
##                                    points        assists   
##                                     (1)            (2)     
## -----------------------------------------------------------
## exper                             1.233***      0.444***   
##                                   (0.334)        (0.100)   
##                                                            
## I(exper2)                        -0.070***      -0.027***  
##                                   (0.024)        (0.007)   
##                                                            
## guard                             2.286**       2.492***   
##                                   (1.002)        (0.301)   
##                                                            
## forward                            1.541          0.447    
##                                   (1.003)        (0.301)   
##                                                            
## marr                               0.584          0.322    
##                                   (0.740)        (0.222)   
##                                                            
## Constant                          4.703***       -0.226    
##                                   (1.182)        (0.355)   
##                                                            
## -----------------------------------------------------------
## Observations                        269            269     
## R2                                 0.093          0.350    
## Adjusted R2                        0.076          0.338    
## Residual Std. Error (df = 263)     5.672          1.704    
## F Statistic (df = 5; 263)         5.401***      28.308***  
## ===========================================================
## Note:                           *p<0.1; **p<0.05; ***p<0.01

Holding position and experience fixed, a married player averages about 0.32 more assists per game.
t = 0.321899/0.222359 ≈ 1.45, so we still fail to reject H0 at conventional levels, but the evidence of a marriage effect is somewhat stronger for assists than it was for points.

C12

Use the data set in \(BEAUTY\), which contains a subset of the variables (but more usable observations) from the regressions reported by Hamermesh and Biddle (1994).

(i)

Find the separate fractions of men and women that are classified as having above average looks. Are more people rated as having above average or below average looks?

a <- beauty %>% 
  group_by(female) %>% 
  summarize(total = n()) %>% 
  ungroup()
## `summarise()` ungrouping output (override with `.groups` argument)
b <- beauty %>% 
  group_by(female) %>% 
  filter(abvavg == 1) %>% 
  count(abvavg) %>% 
  ungroup()

c <- cbind(b, a[,2])

c %>% mutate(per = n/total) %>% rename(gender = female)
##   gender abvavg   n total       per
## 1      0      1 239   824 0.2900485
## 2      1      1 144   436 0.3302752

About 33% of women and about 29% of men are classified as having above average looks.
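
The question also asks whether more people are rated above average or below average; a more direct tabulation gives both fractions at once (a sketch assuming the abvavg and belavg dummies in BEAUTY):

beauty %>%
  group_by(female) %>%
  summarize(frac_abvavg = mean(abvavg),
            frac_belavg = mean(belavg))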

(ii)

Test the null hypothesis that the population fractions of above-average-looking women and men are the same. Report the one-sided p-value that the fraction is higher for women. (Hint: Estimating a simple linear probability model is easiest.)

beauty.ii <- lm(abvavg ~ female, data = beauty)
summary(beauty.ii)
## 
## Call:
## lm(formula = abvavg ~ female, data = beauty)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3303 -0.2900 -0.2900  0.6697  0.7099 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.29005    0.01602  18.102   <2e-16 ***
## female       0.04023    0.02724   1.477     0.14    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4599 on 1258 degrees of freedom
## Multiple R-squared:  0.001731,   Adjusted R-squared:  0.0009373 
## F-statistic: 2.181 on 1 and 1258 DF,  p-value: 0.14

H0: the population fractions are equal for women and men. The t statistic is 1.48, so the one-sided p-value for the alternative that the fraction is higher for women is about 0.07 (half the two-sided p-value of 0.14): significant at the 10% level but not at the 5% level. The evidence that women are more likely to be rated above average is only marginal.
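
The one-sided p-value itself (a sketch using the fitted model above):

tstat <- coef(summary(beauty.ii))["female", "t value"]
pt(tstat, df = df.residual(beauty.ii), lower.tail = FALSE)   # about 0.07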

(iii)

Now estimate the model

\[ log(wage) = \beta_0 + \beta_1belavg + \beta_2abvavg + u \]

separately for men and women, and report the results in the usual form. In both cases, interpret the coefficient on \(belavg\). Explain in words what the hypothesis H0: \(\beta_1=0\) against H1: \(\beta_1<0\) means, and find the p-values for men and women.

beauty_female <- beauty[beauty$female==1,]
beauty_male <- beauty[beauty$female==0,]

# female
beauty.iii.f <- lm(log(wage) ~ belavg + abvavg, data = beauty_female)
# summary(beauty.iii.f)
# male
beauty.iii.m <- lm(log(wage) ~ belavg + abvavg, data = beauty_male)
# summary(beauty.iii.m)

stargazer(beauty.iii.f, beauty.iii.m, type = "text")
## 
## ==============================================================
##                                Dependent variable:            
##                     ------------------------------------------
##                                     log(wage)                 
##                             (1)                  (2)          
## --------------------------------------------------------------
## belavg                    -0.138*             -0.199***       
##                           (0.076)              (0.060)        
##                                                               
## abvavg                     0.034                -0.044        
##                           (0.055)              (0.042)        
##                                                               
## Constant                 1.309***              1.884***       
##                           (0.034)              (0.024)        
##                                                               
## --------------------------------------------------------------
## Observations                436                  824          
## R2                         0.010                0.013         
## Adjusted R2                0.006                0.011         
## Residual Std. Error  0.523 (df = 433)      0.537 (df = 821)   
## F Statistic         2.297 (df = 2; 433) 5.529*** (df = 2; 821)
## ==============================================================
## Note:                              *p<0.1; **p<0.05; ***p<0.01

Women: \[ \widehat{log(wage)} = 1.31 - 0.138\,belavg + 0.034\,abvavg \] Men: \[ \widehat{log(wage)} = 1.88 - 0.199\,belavg - 0.044\,abvavg \] Coefficient on belavg: a below-average-looking man earns about 20% less than a man with average looks, and a below-average-looking woman earns about 14% less than a woman with average looks.

H0: \(\beta_1=0\) means that people with below average looks earn, on average, the same as people with average looks; H1: \(\beta_1<0\) means that they earn less, on average, than people with average looks.

The one-sided p-values are about 0.0358 for women and 0.00048 for men. We reject H0 much more strongly for men: the estimated penalty is larger in magnitude and its standard error is smaller (0.060 for men vs. 0.076 for women).
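
The one-sided p-values can be computed from the two separate fits (a sketch assuming beauty.iii.f and beauty.iii.m above):

tf <- coef(summary(beauty.iii.f))["belavg", "t value"]
tm <- coef(summary(beauty.iii.m))["belavg", "t value"]
pt(tf, df = df.residual(beauty.iii.f))   # women, about 0.036
pt(tm, df = df.residual(beauty.iii.m))   # men, about 0.0005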

(iv)

Is there convincing evidence that women with above average looks earn more than women with average looks? Explain.

No. Women with above average looks earn about 3.4% more than women with average looks, but the effect is not statistically significant (p = 0.607), so there is no convincing evidence.

(v)

For both men and women, add the explanatory variables \(educ\), \(exper\), \(exper^2\), \(union\), \(goodhlth\), \(black\), \(married\), \(south\), \(bigcity\), \(smllcity\), and \(service\). Do the effects of the “looks” variables change in important ways?

# female
beauty.v.f <- lm(log(wage) ~ belavg + abvavg + educ + exper + I(exper^2) + 
                   union + goodhlth + black + married + south + bigcity + 
                   smllcity + service, data = beauty_female)
# male
beauty.v.m <- lm(log(wage) ~ belavg + abvavg + educ + exper + I(exper^2) + 
                   union + goodhlth + black + married + south + bigcity + 
                   smllcity + service, data = beauty_male)

stargazer(beauty.iii.f, beauty.v.f, type = "text", title = "Female Comparison")
## 
## Female Comparison
## ================================================================
##                                 Dependent variable:             
##                     --------------------------------------------
##                                      log(wage)                  
##                             (1)                   (2)           
## ----------------------------------------------------------------
## belavg                    -0.138*               -0.115*         
##                           (0.076)               (0.066)         
##                                                                 
## abvavg                     0.034                 0.058          
##                           (0.055)               (0.049)         
##                                                                 
## educ                                            0.077***        
##                                                 (0.010)         
##                                                                 
## exper                                           0.030***        
##                                                 (0.007)         
##                                                                 
## I(exper2)                                      -0.001***        
##                                                 (0.0002)        
##                                                                 
## union                                           0.284***        
##                                                 (0.053)         
##                                                                 
## goodhlth                                         0.128          
##                                                 (0.081)         
##                                                                 
## black                                            0.106          
##                                                 (0.070)         
##                                                                 
## married                                          -0.055         
##                                                 (0.044)         
##                                                                 
## south                                            -0.004         
##                                                 (0.060)         
##                                                                 
## bigcity                                         0.172***        
##                                                 (0.064)         
##                                                                 
## smllcity                                         0.013          
##                                                 (0.050)         
##                                                                 
## service                                         -0.091*         
##                                                 (0.047)         
##                                                                 
## Constant                 1.309***                -0.103         
##                           (0.034)               (0.147)         
##                                                                 
## ----------------------------------------------------------------
## Observations                436                   436           
## R2                         0.010                 0.300          
## Adjusted R2                0.006                 0.279          
## Residual Std. Error  0.523 (df = 433)       0.445 (df = 422)    
## F Statistic         2.297 (df = 2; 433) 13.932*** (df = 13; 422)
## ================================================================
## Note:                                *p<0.1; **p<0.05; ***p<0.01
stargazer(beauty.iii.m, beauty.v.m, type = "text", title = "Male Comparison")
## 
## Male Comparison
## ===================================================================
##                                   Dependent variable:              
##                     -----------------------------------------------
##                                        log(wage)                   
##                              (1)                     (2)           
## -------------------------------------------------------------------
## belavg                    -0.199***               -0.143***        
##                            (0.060)                 (0.051)         
##                                                                    
## abvavg                      -0.044                  -0.001         
##                            (0.042)                 (0.037)         
##                                                                    
## educ                                               0.060***        
##                                                    (0.007)         
##                                                                    
## exper                                              0.049***        
##                                                    (0.006)         
##                                                                    
## I(exper2)                                         -0.001***        
##                                                    (0.0001)        
##                                                                    
## union                                              0.109***        
##                                                    (0.035)         
##                                                                    
## goodhlth                                            0.001          
##                                                    (0.068)         
##                                                                    
## black                                             -0.277***        
##                                                    (0.073)         
##                                                                    
## married                                             0.082*         
##                                                    (0.043)         
##                                                                    
## south                                              0.104**         
##                                                    (0.042)         
##                                                                    
## bigcity                                            0.273***        
##                                                    (0.045)         
##                                                                    
## smllcity                                           0.135***        
##                                                    (0.037)         
##                                                                    
## service                                           -0.209***        
##                                                    (0.043)         
##                                                                    
## Constant                   1.884***                0.358***        
##                            (0.024)                 (0.119)         
##                                                                    
## -------------------------------------------------------------------
## Observations                 824                     824           
## R2                          0.013                   0.308          
## Adjusted R2                 0.011                   0.297          
## Residual Std. Error    0.537 (df = 821)        0.453 (df = 810)    
## F Statistic         5.529*** (df = 2; 821) 27.791*** (df = 13; 810)
## ===================================================================
## Note:                                   *p<0.1; **p<0.05; ***p<0.01

The looks effects do not change in important ways. The belavg coefficient remains negative and of similar significance for both groups (-0.115 for women, -0.143 for men), while abvavg stays small and insignificant. Adding the controls also pulls the female and male belavg coefficients closer together.
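
A compact way to see how the belavg coefficient moves once the controls are added (a sketch using the four fits above):

c(women_simple = unname(coef(beauty.iii.f)["belavg"]),
  women_full   = unname(coef(beauty.v.f)["belavg"]),
  men_simple   = unname(coef(beauty.iii.m)["belavg"]),
  men_full     = unname(coef(beauty.v.m)["belavg"]))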

(vi)

Use the SSR form of the Chow F statistic to test whether the slopes of the regression functions in part (v) differ across men and women. Be sure to allow for an intercept shift under the null.

# population: add female variable
beauty.vi <- lm(log(wage) ~ belavg + abvavg + educ + exper + I(exper^2) + 
                   union + goodhlth + black + married + south + bigcity + 
                   smllcity + service + female, data = beauty)

# calculate Sum of Squared Residuals
SSRp <- sum(residuals(beauty.vi)^2)
SSRf <- sum(residuals(beauty.v.f)^2)
SSRm <- sum(residuals(beauty.v.m)^2)

k <- 15 # parameters in the pooled model: intercept, female, and 13 slopes
n <- nrow(beauty) # number of obs.

# Chow test
( (SSRp - (SSRf + SSRm))/(SSRf + SSRm) ) * ( (n-2*k)/k )
## [1] 3.742416

H0: no structural change, i.e., all slope coefficients are the same for men and women (the intercept is allowed to differ through \(female\)).
The Chow statistic computed above is about 3.74 (about 4.0 if the per-group parameter count of 14 is used in the formula instead of 15); either way it is far above the relevant critical value, and the p-value is essentially zero.
We therefore reject the null hypothesis: the wage equation differs structurally between men and women.
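
The same statistic with the degrees of freedom written out explicitly, plus its p-value (a sketch; with the intercept shift allowed under the null there are q = 13 slope restrictions and 14 parameters per group, an assumption consistent with the model in part (v)):

q   <- 13                    # slope coefficients tested for equality
df2 <- n - 2 * 14            # n minus total parameters in the two separate fits
chow <- ((SSRp - (SSRf + SSRm)) / q) / ((SSRf + SSRm) / df2)
chow
pf(chow, q, df2, lower.tail = FALSE)   # p-value, essentially zero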