Econometrics HW#2

Chapter 13, c15

The data set HAPPINESS.RAW contains independently pooled cross sections for the even years from 1994 through 2006, obtained from the General Social Survey. The dependent variable for this problem is a measure of “happiness,” vhappy, which is a binary variable equal to one if the person reports being “very happy” (as opposed to just “pretty happy” or “not too happy”).

Loading the Data set

library(wooldridge)
library(lmtest)
library(sandwich)
library(car)
library(dplyr)
data('happiness')

(i)

Which year has the largest number of observations? Which has the smallest? What is the percentage of people in the sample reporting they are “very happy”?

table(happiness$year)

1994 1996 1998 2000 2002 2004 2006 
2977 2885 2806 2777 1369 1337 2986 
sum(happiness$happy== "very happy") / nrow(happiness)
[1] 0.3069382

Answer (i)

2006 has the largest number of observations (2,986) and 2004 has the smallest (1,337).

About 30.7% of people in the sample report that they are “very happy.”
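As a quick check, the largest and smallest years can be read off the counts programmatically. This is only a sketch using the counts printed above; the names `counts`, `largest`, `smallest` and `total` are mine and not part of the assignment.

```r
# Year counts copied from the table(happiness$year) output above
counts <- c("1994" = 2977, "1996" = 2885, "1998" = 2806, "2000" = 2777,
            "2002" = 1369, "2004" = 1337, "2006" = 2986)

largest  <- names(counts)[which.max(counts)]  # year with the most observations
smallest <- names(counts)[which.min(counts)]  # year with the fewest observations
total    <- sum(counts)                       # total sample size across all years

print(c(largest = largest, smallest = smallest))
print(total)
```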

(ii)

Regress vhappy on all of the year dummies, leaving out y94 so that 1994 is the base year. Compute a heteroskedasticity-robust statistic of the null hypothesis that the proportion of very happy people has not changed over time. What is the p-value of the test?

fit_13_c15_2 <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06, data=happiness)

coeftest(fit_13_c15_2, vcov=vcovHC(fit_13_c15_2, type="HC0"))

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.2878737  0.0082983 34.6906  < 2e-16 ***
y96         0.0161124  0.0119247  1.3512  0.17666    
y98         0.0296602  0.0120868  2.4539  0.01414 *  
y00         0.0293751  0.0121186  2.4240  0.01536 *  
y02         0.0152673  0.0149389  1.0220  0.30680    
y04         0.0255145  0.0151592  1.6831  0.09237 .  
y06         0.0202308  0.0118429  1.7083  0.08761 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

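Answer (ii)

The intercept (0.288) is the estimated proportion of very happy people in 1994, and each year coefficient is the change in that proportion relative to 1994.

Individual robust t tests:

1998 and 2000 are significant at the 5% level (p = 0.01414 and p = 0.01536)

2004 and 2006 are significant only at the 10% level (p = 0.09237 and p = 0.08761)

1996 and 2002 are not significant (p = 0.17666 and p = 0.30680)

These are tests on single coefficients. The null hypothesis that the proportion of very happy people has not changed over time is a joint hypothesis that all six year coefficients are zero. The p-value the question asks for therefore comes from a heteroskedasticity-robust joint (Wald or F) test, which the output above does not report.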
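One way to get a single robust p-value for all the year dummies together is `linearHypothesis()` from the car package with the same HC0 covariance matrix. This is a sketch of how I would run it; the object names `fit`, `year_terms`, `joint_test` and `p_joint` are my own, and I have not reported the result here.

```r
library(wooldridge)  # happiness data
library(sandwich)    # vcovHC()
library(car)         # linearHypothesis()

data("happiness")
fit <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06, data = happiness)

# H0: all six year coefficients are zero (no change in the proportion since 1994)
year_terms <- c("y96", "y98", "y00", "y02", "y04", "y06")
joint_test <- linearHypothesis(fit,
                               paste(year_terms, "= 0"),
                               vcov. = vcovHC(fit, type = "HC0"),
                               test = "F")
print(joint_test)

# Robust p-value of the joint test (second row of the returned table)
p_joint <- joint_test[2, "Pr(>F)"]
```

If `p_joint` is above 0.05, the joint null of no change over time is not rejected at the 5% level, even though two individual year dummies are.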

(iii)

To the regression in part (ii), add the dummy variables occattend and regattend. Interpret their coefficients. (Remember, the coefficients are interpreted relative to a base group.) How would you summarize the effects of church attendance on happiness?

table(happiness$attend)

           never   lt once a year      once a year sevrl times a yr 
            3213             1382             2209             2118 
    once a month     2-3x a month  nrly every week       every week 
            1193             1495              939             3041 
more thn once wk            dk,na 
            1274                0 
tmp <- happiness %>% filter(occattend==1)
table(tmp$attend)

           never   lt once a year      once a year sevrl times a yr 
               0                0                0             2118 
    once a month     2-3x a month  nrly every week       every week 
            1193             1495                0                0 
more thn once wk            dk,na 
               0                0 
tmp <- happiness %>% filter(regattend==1)
table(tmp$attend) 

           never   lt once a year      once a year sevrl times a yr 
               0                0                0                0 
    once a month     2-3x a month  nrly every week       every week 
               0                0              939                0 
more thn once wk            dk,na 
            1274                0 
fit_13_c15_3 <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06 + occattend + regattend, data=happiness)

coeftest(fit_13_c15_3, vcov=vcovHC(fit_13_c15_3, type="HC0"))

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.2713457  0.0088883 30.5285  < 2e-16 ***
y96         0.0167487  0.0120288  1.3924  0.16382    
y98         0.0278593  0.0121444  2.2940  0.02180 *  
y00         0.0312657  0.0122225  2.5580  0.01053 *  
y02         0.0157476  0.0149817  1.0511  0.29322    
y04         0.0251635  0.0151597  1.6599  0.09695 .  
y06         0.0221839  0.0118809  1.8672  0.06189 .  
occattend   0.0042648  0.0080219  0.5316  0.59498    
regattend   0.1121737  0.0113827  9.8548  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

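Answer (iii)

From the tabulations above, occattend = 1 for people who attend several times a year up to 2-3 times a month, and regattend = 1 for the “nrly every week” and “more thn once wk” categories. Everyone else is in the base group (in these tabulations that includes the “every week” category, which neither dummy picks up).

Holding the year fixed, regular attenders are about 11.2 percentage points more likely to report being very happy than the base group, and the difference is highly significant (robust t = 9.85, p < 2e-16). Occasional attenders are only about 0.4 percentage points more likely, which is not statistically different from zero (p = 0.59). In short, only regular attendance is associated with greater happiness.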
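Because this is a linear probability model, the coefficients translate directly into estimated probabilities. A small sketch for the base year 1994, using the estimates printed above (the variable names are mine):

```r
# Coefficients copied from the robust output above
b0    <- 0.2713457  # intercept: base attendance group in 1994
b_occ <- 0.0042648  # occattend
b_reg <- 0.1121737  # regattend

# Estimated probability of being "very happy" in 1994, by attendance group
p_hat <- c(base = b0, occasional = b0 + b_occ, regular = b0 + b_reg)
print(round(p_hat, 3))
```

The base and occasional groups are nearly identical, while the regular group is about 11 percentage points higher.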

(iv)

Define a variable, say highinc, equal to one if family income is above $25,000. (Unfortunately, the same threshold is used in each year, and so inflation is not accounted for. Also, $25,000 is hardly what one would consider “high income.”) Include highinc, unem10, educ, and teens in the regression in part (iii). Is the coefficient on regattend affected much? What about its statistical significance?

happiness$highinc <- ifelse(happiness$income == "$25000 or more", 1, 0)
fit_13_c15_4 <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06 + occattend + regattend + highinc + unem10 + educ + teens, data = happiness)

coeftest(fit_13_c15_4, vcov=vcovHC(fit_13_c15_4, type="HC0"))

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  0.1962739  0.0238695  8.2228 2.242e-16 ***
y96          0.0123443  0.0155310  0.7948   0.42674    
y98          0.0188090  0.0155202  1.2119   0.22558    
y00          0.0306081  0.0160125  1.9115   0.05597 .  
y02         -0.0169435  0.0188968 -0.8966   0.36994    
y04          0.0068189  0.0196643  0.3468   0.72877    
y06         -0.0056013  0.0153028 -0.3660   0.71435    
occattend   -0.0065072  0.0103057 -0.6314   0.52778    
regattend    0.0959975  0.0147733  6.4981 8.534e-11 ***
highinc      0.1011430  0.0100185 10.0956 < 2.2e-16 ***
unem10      -0.0880616  0.0095108 -9.2591 < 2.2e-16 ***
educ         0.0039147  0.0016746  2.3377   0.01942 *  
teens       -0.0167817  0.0092641 -1.8115   0.07010 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer (iv)

Adding the controls lowers the coefficient on regattend from 0.112 to 0.096, a drop of about 14%. The robust standard error rises from 0.011 to 0.015, so the t statistic falls from 9.85 to 6.50, but regattend remains statistically significant at any conventional level.

                 Model in Question 3   Model in Question 4
Estimate         0.1121737             0.0959975
Standard Error   0.0113827             0.0147733
t value          9.8548                6.4981
P-value          < 2e-16               8.534e-11

(v)

Discuss the signs, magnitudes, and statistical significance of the four new variables in part (iv). Do the estimates make sense?

Answer (v)

  • educ: highest year of school completed

  • teens: household members 13 thru 17 yrs old

  • unem10: =1 if unemployed in last 10 years

           highinc         unem10          educ        teens
Sign       Positive        Negative        Positive    Negative
Estimate   0.1011430       -0.0880616      0.0039147   -0.0167817
P-value    < 2.2e-16 ***   < 2.2e-16 ***   0.01942 *   0.07010 .

unem10 and teens

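Both unem10 and teens have negative coefficient estimates, which is what one would expect. Having been unemployed in the last 10 years lowers the probability of being very happy by about 8.8 percentage points and is highly significant, while each additional teenager in the household lowers it by about 1.7 percentage points and is significant only at the 10% level. Job instability therefore has the larger and more precisely estimated effect.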

Unem10 captures job instability: a spell of unemployment in the last 10 years brings financial uncertainty, which reduces overall stability and well-being. Similarly, teens, which counts the teenage members of the household, reflects the challenges associated with adolescence, a period often marked by emotional and developmental changes that can strain family dynamics.

These findings suggest that both economic uncertainty and family stress during adolescence negatively affect the dependent variable. However, the results also indicate that job stability is a more significant predictor of happiness than having a teenager in the household. This is reasonable, as financial insecurity often has immediate and far-reaching consequences, whereas family-related stress, while impactful, may be more manageable over time.

highinc and educ

Both highinc and educ have positive coefficient estimates, which aligns with expectations. Additionally, highinc has a lower p-value and a higher coefficient estimate than educ, suggesting that financial security has a more substantial and statistically significant impact on the dependent variable.

Highinc, which indicates a family income above $25,000, represents a level of financial security that, while not high, helps alleviate economic stress and improve overall well-being. Even a modest financial cushion can reduce anxiety and provide greater stability in daily life.

Similarly, educ, representing the highest year of school completed, is also positively associated with the dependent variable. Higher education often leads to more fulfilling careers, offering greater autonomy, creativity, and a sense of purpose, all of which contribute to life satisfaction. Furthermore, education enhances social skills, expands professional and personal networks, and exposes individuals to diverse perspectives, fostering better social interactions and overall well-being.

The fact that highinc has a larger estimate and is more significant than educ is expected. Financial security addresses more immediate and fundamental needs, such as housing, healthcare, and stability, whereas the benefits of education—such as self-actualization, career fulfillment, and social connections—are more long-term. This finding underscores the idea that basic financial stability is a prerequisite for higher-level well-being.

(vi)

Controlling for the factors in part (iv), do there appear to be differences in happiness by gender or race? Justify your answer.

fit_13_c15_6 <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06+ occattend + regattend + highinc + unem10 + educ + teens + female + black + blackfemale, data = happiness)

coeftest(fit_13_c15_6, vcov=vcovHC(fit_13_c15_6, type="HC0"))

t test of coefficients:

              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.2048132  0.0248012  8.2582  < 2e-16 ***
y96          0.0133543  0.0155055  0.8613  0.38912    
y98          0.0205980  0.0154927  1.3295  0.18370    
y00          0.0318974  0.0159869  1.9952  0.04605 *  
y02         -0.0152788  0.0188766 -0.8094  0.41830    
y04          0.0075105  0.0196535  0.3821  0.70236    
y06         -0.0040198  0.0152771 -0.2631  0.79246    
occattend   -0.0034363  0.0103538 -0.3319  0.73998    
regattend    0.1000294  0.0148462  6.7377  1.7e-11 ***
highinc      0.0964078  0.0102026  9.4494  < 2e-16 ***
unem10      -0.0867837  0.0095160 -9.1197  < 2e-16 ***
educ         0.0035342  0.0016778  2.1064  0.03520 *  
teens       -0.0148308  0.0092952 -1.5955  0.11062    
female       0.0052206  0.0100200  0.5210  0.60237    
black       -0.0346317  0.0214911 -1.6114  0.10712    
blackfemale -0.0252663  0.0266715 -0.9473  0.34350    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer (vi)

Variable      Estimate     Std. Error   t value   Pr(>|t|)
female         0.0052206   0.0100200     0.5210   0.60237
black         -0.0346317   0.0214911    -1.6114   0.10712
blackfemale   -0.0252663   0.0266715    -0.9473   0.34350

Female:

The coefficient for female is 0.0052206, which is small and not statistically significant (p = 0.60237). This suggests that gender, specifically identifying as female, does not have a statistically significant effect on the dependent variable (happiness).

Black:

The coefficient for black is -0.0346317, which is also small and not statistically significant (p = 0.10712). This indicates that identifying as Black does not have a statistically significant effect on happiness based on this model.

BlackFemale:

The coefficient for blackfemale (interaction between race and gender) is -0.0252663, and it is not statistically significant (p = 0.34350). This suggests that the interaction of being both Black and female does not significantly affect happiness compared to the baseline groups.

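From this analysis, there do not appear to be significant differences in happiness by gender or race once the factors in part (iv) are controlled for: none of the three coefficients is individually significant at the 5% level. A joint test of the three coefficients would give a firmer answer than three separate t tests.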
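The three t tests can be combined into one robust joint test in the same way as for the year dummies. This is a sketch of how I would do it with `linearHypothesis()` from car; the object names are mine, and I have not reported the result here.

```r
library(wooldridge)  # happiness data
library(sandwich)    # vcovHC()
library(car)         # linearHypothesis()

data("happiness")
happiness$highinc <- ifelse(happiness$income == "$25000 or more", 1, 0)

fit <- lm(vhappy ~ y96 + y98 + y00 + y02 + y04 + y06 + occattend + regattend +
            highinc + unem10 + educ + teens + female + black + blackfemale,
          data = happiness)
V <- vcovHC(fit, type = "HC0")  # same robust covariance matrix as in the answer

# H0: no differences by gender or race, given the other controls
joint_test <- linearHypothesis(fit,
                               c("female = 0", "black = 0", "blackfemale = 0"),
                               vcov. = V, test = "F")
print(joint_test)
p_joint <- joint_test[2, "Pr(>F)"]

# Gap between Black and non-Black women is black + blackfemale
women_gap_test <- linearHypothesis(fit, "black + blackfemale = 0",
                                   vcov. = V, test = "F")
print(women_gap_test)
```

The second test is worth a look because the estimated gap for Black women, black + blackfemale, is about -0.060, larger than either coefficient on its own.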

Chapter 14, c10

Use the data in AIRFARE.RAW for this exercise. We are interested in estimating the model

$$\log(fare_{it}) = \eta_t + \beta_1 concen_{it} + \beta_2 \log(dist_i) + \beta_3 [\log(dist_i)]^2 + a_i + u_{it}, \quad t = 1, \dots, 4,$$

where $\eta_t$ means that we allow for different year intercepts.

library(plm)

library(wooldridge)
library(lmtest)
library(sandwich)
library(car)
library(dplyr)
library(stargazer)

data("airfare")
airfare.p <- pdata.frame(airfare, index = c("id", "year"))

(i)

Estimate the above equation by pooled OLS, being sure to include year dummies. If 𝚫concen = .10, what is the estimated percentage increase in fare?

reg3_pols <- plm(data = airfare.p, lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq, model = "pooling")

stargazer(reg3_pols, type = "text")

========================================
                 Dependent variable:    
             ---------------------------
                        lfare           
----------------------------------------
y98                     0.021           
                       (0.014)          
                                        
y99                   0.038***          
                       (0.014)          
                                        
y00                   0.100***          
                       (0.014)          
                                        
concen                0.360***          
                       (0.030)          
                                        
ldist                 -0.902***         
                       (0.128)          
                                        
ldistsq               0.103***          
                       (0.010)          
                                        
Constant              6.209***          
                       (0.421)          
                                        
----------------------------------------
Observations            4,596           
R2                      0.406           
Adjusted R2             0.405           
F Statistic   523.175*** (df = 6; 4589) 
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01

Answer (i)

concen: = bmktshr

bmktshr: fraction market, biggest carrier

A change in concen is a change in market concentration, measured as the market share of the biggest carrier on the route. The coefficient on concen (0.360) implies that a one-unit increase in concen raises log(fare) by 0.360, holding distance and year fixed.

Percentage change in fare ≈ 100 × Coefficient × Δconcen

Percentage change in fare ≈ 100 × 0.360 × 0.10 = 3.6%
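The 3.6% figure uses the usual log approximation. As a check, the exact percentage change implied by a log model is 100 × (exp(B1 × Δconcen) − 1). A short sketch with the pooled OLS estimate (the variable names are mine):

```r
b_concen <- 0.3601203  # pooled OLS coefficient on concen, from the output above
d_concen <- 0.10       # change in concentration asked about in the question

approx_pct <- 100 * b_concen * d_concen             # log approximation
exact_pct  <- 100 * (exp(b_concen * d_concen) - 1)  # exact change implied by the log model

print(round(c(approx = approx_pct, exact = exact_pct), 2))
```

For a change this small the two numbers are close (roughly 3.6% versus 3.7%), so the approximation is fine here.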

(ii)

What is the usual OLS 95% confidence interval for β1? Why is it probably not reliable? If you have access to a statistical package that computes fully robust standard errors, find the fully robust 95% CI for β1. Compare it to the usual CI and comment.

confint(reg3_pols, level = 0.95)
                   2.5 %      97.5 %
(Intercept)  5.384848317  7.03366681
y98         -0.006397311  0.04864606
y99          0.010329215  0.06536995
y00          0.072345730  0.12739422
concen       0.301186043  0.41905462
ldist       -1.153010902 -0.65018987
ldistsq      0.083957941  0.12208129
bptest(reg3_pols)

    studentized Breusch-Pagan test

data:  reg3_pols
BP = 873.04, df = 6, p-value < 2.2e-16
coeftest(reg3_pols, vcov = vcovHC)

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  6.2092576  0.9107631  6.8176 1.045e-11 ***
y98          0.0211244  0.0041429  5.0990 3.553e-07 ***
y99          0.0378496  0.0051739  7.3155 3.011e-13 ***
y00          0.0998700  0.0056407 17.7052 < 2.2e-16 ***
concen       0.3601203  0.0584923  6.1567 8.061e-10 ***
ldist       -0.9016004  0.2716505 -3.3190 0.0009105 ***
ldistsq      0.1030196  0.0201382  5.1156 3.255e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
0.3601203-1.96*0.0584923
[1] 0.2454754
0.3601203+1.96*0.0584923
[1] 0.4747652

Answer (ii)

Usual confidence interval

B1 = 0.3601203

Standard error ≈ 0.030

confidence interval of 95% = B1 ± 1.96 * SE(B1)

The Confidence interval at 95% = [0.301186043, 0.41905462]

This confidence interval is most likely not reliable. Pooled OLS leaves the route effect a_i in the error term, so the errors for the same route are serially correlated across the four years, and the usual standard errors ignore this. There is also evidence of heteroskedasticity: the Breusch-Pagan test gives a p-value < 2.2e-16, far below 0.05, so we reject the null hypothesis of homoskedasticity.

With fully robust standard errors the coefficient estimates stay the same; only the standard errors, and therefore the confidence interval, change.

B1 = 0.3601203

standard error = 0.0584923

confidence interval of 95% = B1 ± 1.96 * SE(B1)

The Confidence interval at 95% = [0.2454754, 0.4747652]

The robust standard error is nearly twice the usual one, so the robust interval is about twice as wide. It still excludes zero.

            Usual                        Robust
CI          [0.301186043, 0.41905462]    [0.2454754, 0.4747652]
est         0.3601203                    0.3601203
std error   0.030                        0.0584923
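The robust interval and its width relative to the usual one can be reproduced from the numbers printed above. This is a sketch with my own variable names; it uses `qnorm(0.975)` rather than the rounded 1.96, so the bounds differ slightly in the last digits.

```r
b1        <- 0.3601203                   # pooled OLS estimate of B1 (same under either variance estimator)
se_robust <- 0.0584923                   # fully robust standard error from coeftest(..., vcov = vcovHC)
ci_usual  <- c(0.301186043, 0.41905462)  # from confint(reg3_pols)

z <- qnorm(0.975)                        # normal critical value, about 1.96
ci_robust <- b1 + c(-1, 1) * z * se_robust

# Ratio of interval widths: how much wider the robust interval is
width_ratio <- diff(ci_robust) / diff(ci_usual)

print(round(ci_robust, 4))
print(round(width_ratio, 2))
```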

(iii)

Describe what is happening with the quadratic in log(dist). In particular, for what value of dist does the relationship between log(fare) and dist become positive? [Hint: First figure out the turning point value for log(dist), and then exponentiate.] Is the turning point outside the range of the data?

Answer (iii)

B2 = -0.9016 on log(dist)

B3 = 0.1030 on [log(dist)]^2

This suggests a U-shaped relationship between log(fare) and log(dist): at short distances an increase in distance lowers fares, but at longer distances an increase in distance raises fares.

The turning point is where the slope with respect to log(dist) is zero:

B2 + 2*B3 log(dist) = 0, so log(dist) = -B2 / (2*B3)

beta_ldist <- coef(reg3_pols)["ldist"]

beta_ldistsq <- coef(reg3_pols)["ldistsq"]

turning_point_log <- -beta_ldist / (2 * beta_ldistsq)

turning_point_dist <- exp(turning_point_log)

print(turning_point_dist)
   ldist 
79.50879 

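The turning point is at dist ≈ 79.5 miles (log(dist) ≈ 4.38).

The largest value of dist in the data is 2724, but what matters is the smallest value: the relationship is negative only for routes shorter than about 79.5 miles. The shortest route in the data is 95 miles, so the turning point is outside the range of the data, and the estimated relationship between log(fare) and dist is positive for every route in the sample.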
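Whether the turning point is inside the data range depends on both the shortest and the longest route, so it is worth checking the full range of dist. A minimal sketch, assuming the airfare data from the wooldridge package loaded as above (the names `dist_range` and `inside_range` are mine):

```r
library(wooldridge)
data("airfare")

turning_point <- 79.50879          # miles, from the calculation above
dist_range <- range(airfare$dist)  # shortest and longest route in the sample
print(dist_range)

# TRUE only if the turning point falls between the shortest and longest route
inside_range <- turning_point >= dist_range[1] && turning_point <= dist_range[2]
print(inside_range)
```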

(iv)

Now estimate the equation using random effects. How does the estimate of β1 change?

reg3_re <- plm(data = airfare.p, lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq, model = "random")

coeftest(reg3_re)

t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  6.2220050  0.8099666  7.6818 1.905e-14 ***
y98          0.0224743  0.0044544  5.0454 4.700e-07 ***
y99          0.0366898  0.0044528  8.2398 2.228e-16 ***
y00          0.0982120  0.0044576 22.0324 < 2.2e-16 ***
concen       0.2089935  0.0265297  7.8777 4.132e-15 ***
ldist       -0.8520921  0.2464836 -3.4570 0.0005512 ***
ldistsq      0.0974604  0.0186358  5.2297 1.773e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer (iv)

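Pooled OLS   Random effects
0.3601203    0.2089935

Estimating the equation by random effects gives B1 = 0.2089935, well below the pooled OLS estimate of 0.3601203 (about 42% smaller). A 0.10 increase in concen is now associated with roughly a 2.1% increase in fare rather than 3.6%.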

(v)

Now estimate the equation using fixed effects. What is the FE estimate of β1? Why is it fairly similar to the RE estimate? (Hint: What is θ̂ for RE estimation?)

reg3_fe <- plm(data = airfare.p, lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq, model = "within")

coeftest(reg3_fe)

t test of coefficients:

        Estimate Std. Error t value  Pr(>|t|)    
y98    0.0228328  0.0044515  5.1292 3.071e-07 ***
y99    0.0363819  0.0044495  8.1766 4.061e-16 ***
y00    0.0977717  0.0044555 21.9441 < 2.2e-16 ***
concen 0.1688590  0.0294101  5.7415 1.020e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
reg3_re <- plm(data = airfare.p, lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq, model = "random")

summary(reg3_re)
Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq, 
    data = airfare.p, model = "random")

Balanced Panel: n = 1149, T = 4, N = 4596

Effects:
                  var std.dev share
idiosyncratic 0.01134 0.10651   0.1
individual    0.10198 0.31934   0.9
theta: 0.8355

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.8316339 -0.0639906 -0.0037195  0.0626552  0.8655402 

Coefficients:
              Estimate Std. Error z-value  Pr(>|z|)    
(Intercept)  6.2220050  0.8099666  7.6818 1.569e-14 ***
y98          0.0224743  0.0044544  5.0454 4.526e-07 ***
y99          0.0366898  0.0044528  8.2398 < 2.2e-16 ***
y00          0.0982120  0.0044576 22.0324 < 2.2e-16 ***
concen       0.2089935  0.0265297  7.8777 3.334e-15 ***
ldist       -0.8520921  0.2464836 -3.4570 0.0005462 ***
ldistsq      0.0974604  0.0186358  5.2297 1.698e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    67.626
Residual Sum of Squares: 52.162
R-Squared:      0.22866
Adj. R-Squared: 0.22766
Chisq: 1360.42 on 6 DF, p-value: < 2.22e-16

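Answer (v)

Fixed effects   Random effects
0.1688590       0.2089935

The FE estimate of B1 is 0.1689. The theta hat for random effects is 0.84, which means the variance of the unobserved route effect is large compared to the idiosyncratic error variance (about 90% of the total). RE subtracts the fraction theta hat of each route's time average from the data, while FE subtracts the whole time average (theta = 1). With theta hat this close to one, the two transformations are nearly the same, so the estimates are similar.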
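The reported theta can be recovered from the two variance components in the summary output using the standard random effects formula, theta = 1 − sqrt(var_u / (var_u + T × var_a)). A small sketch with the rounded values printed above (the variable names are mine):

```r
sigma2_u  <- 0.01134  # idiosyncratic variance from summary(reg3_re)
sigma2_a  <- 0.10198  # individual (route) effect variance from summary(reg3_re)
T_periods <- 4        # years per route in the balanced panel

# Share of each route's time average that RE subtracts from the data
theta <- 1 - sqrt(sigma2_u / (sigma2_u + T_periods * sigma2_a))
print(round(theta, 4))
```

This agrees with the theta of 0.8355 in the output up to rounding. The formula also shows why theta is close to one here: the route variance is about nine times the idiosyncratic variance, and it is multiplied by T = 4.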

(vi)

Name two characteristics of a route (other than distance between stops) that are captured by a_i. Might these be correlated with concen_it?

Answer (vi)

One characteristic could be the population of the cities the route connects. Routes with high demand tend to have higher fares due to greater pricing power by airlines.

Another characteristic could be the types of planes being used. Some routes are served by bigger or faster planes, or offer first class.

Both could be correlated with concen: for example, routes between large cities probably attract more carriers and so tend to be less concentrated.

(vii)

Are you convinced that higher concentration on a route increases airfares? What is your best estimate?

stargazer(reg3_pols, reg3_re, reg3_fe, type = "text")

=============================================================================
                                   Dependent variable:                       
             ----------------------------------------------------------------
                                          lfare                              
                        (1)                (2)                 (3)           
-----------------------------------------------------------------------------
y98                    0.021             0.022***           0.023***         
                      (0.014)            (0.004)             (0.004)         
                                                                             
y99                  0.038***            0.037***           0.036***         
                      (0.014)            (0.004)             (0.004)         
                                                                             
y00                  0.100***            0.098***           0.098***         
                      (0.014)            (0.004)             (0.004)         
                                                                             
concen               0.360***            0.209***           0.169***         
                      (0.030)            (0.027)             (0.029)         
                                                                             
ldist                -0.902***          -0.852***                            
                      (0.128)            (0.246)                             
                                                                             
ldistsq              0.103***            0.097***                            
                      (0.010)            (0.019)                             
                                                                             
Constant             6.209***            6.222***                            
                      (0.421)            (0.810)                             
                                                                             
-----------------------------------------------------------------------------
Observations           4,596              4,596               4,596          
R2                     0.406              0.229               0.135          
Adjusted R2            0.405              0.228              -0.154          
F Statistic  523.175*** (df = 6; 4589) 1,360.422*** 134.611*** (df = 4; 3443)
=============================================================================
Note:                                             *p<0.1; **p<0.05; ***p<0.01

Answer (vii)

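The coefficient on concen is positive and statistically significant at the 1% level in all three models. So yes, I am convinced that higher concentration on a route increases airfares. My best estimate is the fixed effects one, 0.169, because FE allows the route effect a_i to be correlated with concen. It implies that a 0.10 increase in concen raises fares by about 1.7%.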