The data set HAPPINESS.RAW contains independently pooled cross sections for the even years from 1994 through 2006, obtained from the General Social Survey. The dependent variable for this problem is a measure of “happiness,” vhappy, which is a binary variable equal to one if the person reports being “very happy” (as opposed to just “pretty happy” or “not too happy”).
Loading the Data set
library(wooldridge)
library(lmtest)
library(sandwich)
library(car)
library(dplyr)
data('happiness')
(i)
Which year has the largest number of observations? Which has the smallest? What is the percentage of people in the sample reporting they are “very happy”?
2006 has the largest number of observations, with 2,986.
About 30.7% of people in the sample report that they are “very happy.”
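The counts per year and the share of “very happy” respondents can be read off a tabulation. The sketch below is illustrative only: it runs on a small made-up data frame, and it assumes the survey year is stored in a column named year next to vhappy. The helper name summarize_sample is hypothetical.

```r
# Hypothetical helper: observations per year and percent "very happy".
# Assumes columns named `year` and `vhappy` (0/1); the column names are an assumption.
summarize_sample <- function(df) {
  obs_by_year <- table(df$year)
  list(
    obs_by_year  = obs_by_year,
    largest      = names(obs_by_year)[which.max(obs_by_year)],
    smallest     = names(obs_by_year)[which.min(obs_by_year)],
    share_vhappy = 100 * mean(df$vhappy, na.rm = TRUE)
  )
}

# Toy stand-in for the pooled cross section; the values are made up.
toy <- data.frame(
  year   = c(1994, 1994, 1994, 1996, 1996, 1998, 1998, 1998, 1998, 1998),
  vhappy = c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
)
res <- summarize_sample(toy)
print(res$obs_by_year)
```

Calling summarize_sample(happiness) on the loaded data would give both the largest and the smallest year in one step, provided the column names match.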
(ii)
Regress vhappy on all of the year dummies, leaving out y94 so that 1994 is the base year. Compute a heteroskedasticity-robust statistic of the null hypothesis that the proportion of very happy people has not changed over time. What is the p-value of the test?
(iii)
To the regression in part (ii), add the dummy variables occattend and regattend. Interpret their coefficients. (Remember, the coefficients are interpreted relative to a base group.) How would you summarize the effects of church attendance on happiness?
table(happiness$attend)
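             never    lt once a year       once a year  sevrl times a yr
              3213              1382              2209              2118
      once a month      2-3x a month   nrly every week        every week
              1193              1495               939              3041
  more thn once wk             dk,na
              1274                 0

The second tabulation is nonzero only for three categories (several times a year, once a month, 2-3 times a month), which appear to be the ones coded as occasional attendance (occattend = 1).

             never    lt once a year       once a year  sevrl times a yr
                 0                 0                 0              2118
      once a month      2-3x a month   nrly every week        every week
              1193              1495                 0                 0
  more thn once wk             dk,na
                 0                 0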
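The coefficient on regattend is 0.1121737 with a p-value below 2e-16: relative to the base group (respondents who are neither occasional nor regular attenders), regular attenders are about 11.2 percentage points more likely to report being very happy. The coefficient on occattend is not statistically significant (p = 0.59), so occasional attenders cannot be distinguished from the base group. Church attendance is therefore associated with greater happiness only when it is regular.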
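The heteroskedasticity-robust tests asked for in parts (ii) and (iii) can be sketched in base R as a Wald test built on the HC0 sandwich covariance. This is a minimal sketch on simulated data, not the code behind the reported results: the helper name robust_wald and the toy data frame are hypothetical, and HC0 is only one of several robust variants.

```r
# Hypothetical helper: HC0 sandwich covariance and Wald test that the
# coefficients named in `terms` are jointly zero.
robust_wald <- function(fit, terms) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))          # (X'X)^{-1}
  meat  <- crossprod(X * e)             # sum of e_i^2 * x_i x_i'
  V  <- bread %*% meat %*% bread        # HC0 covariance of all coefficients
  b  <- coef(fit)[terms]
  Vt <- V[terms, terms, drop = FALSE]
  stat <- as.numeric(t(b) %*% solve(Vt) %*% b)
  list(statistic = stat, df = length(terms), vcov = Vt,
       p.value = pchisq(stat, df = length(terms), lower.tail = FALSE))
}

# Simulated stand-in: three "years" of 200 people, same happiness rate in each.
set.seed(1)
n <- 600
toy <- data.frame(
  y96 = rep(c(0, 1, 0), each = n / 3),
  y98 = rep(c(0, 0, 1), each = n / 3)
)
toy$vhappy <- rbinom(n, size = 1, prob = 0.30)

fit <- lm(vhappy ~ y96 + y98, data = toy)
res <- robust_wald(fit, c("y96", "y98"))
print(res$p.value)
```

With the real data the same idea applies with all of the year dummies in terms. The loaded sandwich package provides the covariance directly through vcovHC(fit, type = "HC0"), which avoids hand-rolling the matrix algebra.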
(iv)
Define a variable, say highinc, equal to one if family income is above $25,000. (Unfortunately, the same threshold is used in each year, and so inflation is not accounted for. Also, $25,000 is hardly what one would consider “high income.”) Include highinc, unem10, educ, and teens in the regression in part (iii). Is the coefficient on regattend affected much? What about its statistical significance?
happiness$highinc <- ifelse(happiness$income == "$25000 or more", 1, 0)
Comparing the regattend coefficient in the models from parts (iii) and (iv): the estimated effect of regular attendance falls slightly once the new controls are added (from about 0.112 to 0.096), but it remains strongly statistically significant in both models.

                  Model in part (iii)   Model in part (iv)
Estimate          0.1121737             0.0959975
Standard Error    0.0113827             0.0147733
t value           9.8548                6.4981
P-value           < 2e-16               8.534e-11
(v)
Discuss the signs, magnitudes, and statistical significance of the four new variables in part (iv). Do the estimates make sense?
Answer (v)
Variable definitions:
highinc: =1 if family income is above $25,000
unem10: =1 if unemployed in last 10 years
educ: highest year of school completed
teens: household members 13 thru 17 yrs old

Variable   Sign       Estimate     P-value
highinc    Positive    0.1011430   < 2.2e-16 ***
unem10     Negative   -0.0880616   < 2.2e-16 ***
educ       Positive    0.0039147   0.01942 *
teens      Negative   -0.0167817   0.07010 .
Unem10 and teens
Both unem10 and teens have negative coefficient estimates, which is what one would expect. The unem10 estimate is larger in magnitude (-0.088 versus -0.017) and has a much smaller p-value: unem10 is significant at any conventional level, while teens is significant only at the 10% level.
Because vhappy is binary, the coefficients are changes in the probability of being very happy. Having been unemployed at some point in the last 10 years is associated with a probability about 8.8 percentage points lower, plausibly because job instability brings financial uncertainty that reduces stability and well-being. Each additional teenager in the household is associated with a probability about 1.7 percentage points lower, consistent with adolescence being a period of emotional and developmental change that can strain family dynamics.
These findings suggest that both economic uncertainty and family stress during adolescence are negatively related to happiness, and that past unemployment is the stronger and more precisely estimated predictor. This is reasonable, as financial insecurity often has immediate and far-reaching consequences, whereas family-related stress, while real, may be more manageable over time.
highinc and educ
Both highinc and educ have positive coefficient estimates, which is in line with expectations. highinc has the larger estimate (0.101 versus 0.004) and the smaller p-value. The two are not measured in the same units, however: highinc is a dummy, while the educ coefficient is the effect of one more year of schooling, so four additional years correspond to about 0.016.
highinc, which indicates a family income above $25,000, represents a level of financial security that, while not high, helps alleviate economic stress and improve overall well-being. It is associated with a probability of being very happy about 10.1 percentage points higher. Even a modest financial cushion can reduce anxiety and provide greater stability in daily life.
Similarly, educ, the highest year of school completed, is positively associated with happiness. Higher education often leads to more fulfilling careers, offering greater autonomy, creativity, and a sense of purpose, all of which contribute to life satisfaction. Education may also strengthen social skills, expand professional and personal networks, and expose individuals to diverse perspectives.
That highinc has a larger and more significant estimate than educ is not surprising. Financial security addresses immediate and fundamental needs, such as housing, healthcare, and stability, whereas the benefits of education (career fulfillment, social connections, a sense of purpose) are longer-term and may already work partly through income, which is held fixed here.
(vi)
Controlling for the factors in part (iv), do there appear to be differences in happiness by gender or race? Justify your answer.
Female:
The coefficient for female is 0.0052206, which is small (about half a percentage point) and not statistically significant (p = 0.60237). This suggests that, with the other factors held fixed, gender has no statistically detectable effect on the probability of being very happy.
Black:
The coefficient for black is -0.0346317, which is also small and not statistically significant (p = 0.10712). This indicates that identifying as Black does not have a statistically significant effect on happiness based on this model.
BlackFemale:
The coefficient for blackfemale (interaction between race and gender) is -0.0252663, and it is not statistically significant (p = 0.34350). This suggests that the interaction of being both Black and female does not significantly affect happiness compared to the baseline groups.
From this analysis, there do not appear to be significant differences in happiness by gender or race in this model, as none of the coefficients related to these variables are statistically significant.
Chapter 14, c10
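Use the data in AIRFARE.RAW for this exercise. We are interested in estimating the model
log(fare_it) = η_t + β1 concen_it + β2 log(dist_i) + β3 [log(dist_i)]² + a_i + u_it, t = 1, ..., 4,
where η_t allows for a different intercept in each year.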
Here Δconcen denotes the change in market concentration. The pooled OLS coefficient on concen (0.360) implies that a one-unit increase in concen raises log(fare) by 0.360. For Δconcen = 0.10:
Percentage change in fare ≈ Coefficient × Δconcen × 100
Percentage change in fare ≈ 0.360 × 0.10 × 100 = 3.6%
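The 3.6% figure uses the usual log approximation. As a quick check, the sketch below recomputes it and compares it with the exact percentage change implied by a log-level model, 100 × (exp(β1 × Δconcen) − 1). The variable names are illustrative.

```r
# Approximate vs. exact percentage change in fare for a 0.10 rise in concen.
b_concen     <- 0.360   # pooled OLS coefficient on concen, as reported above
delta_concen <- 0.10

pct_approx <- 100 * b_concen * delta_concen              # log approximation
pct_exact  <- 100 * (exp(b_concen * delta_concen) - 1)   # exact change implied by the log model

print(c(approx = pct_approx, exact = pct_exact))
```

For a change this small the two numbers are close, so the approximation is adequate here.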
(ii)
What is the usual OLS 95% confidence interval for β1? Why is it probably not reliable? If you have access to a statistical package that computes fully robust standard errors, find the fully robust 95% CI for β1. Compare it to the usual CI and comment.
The Confidence interval at 95% = [0.301186043, 0.41905462]
This confidence interval is most likely not reliable. The Breusch-Pagan test gives a p-value < 2.2e-16, far below 0.05, so we reject the null hypothesis of homoskedasticity. In addition, the same routes are observed in each of the four years, so the route effect a_i makes the errors serially correlated within a route, which the usual OLS standard errors also ignore.
With fully robust standard errors the point estimate is unchanged; only the standard error, and therefore the confidence interval, changes.
β1 = 0.3601203
standard error = 0.0584923
95% confidence interval = β1 ± 1.96 × SE(β1)
The 95% confidence interval = [0.2454754, 0.4747652]
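             Usual                           Robust
CI           [0.301186043, 0.41905462]       [0.2454754, 0.4747652]
est          0.3601203                       0.3601203
std error    ≈ 0.0301 (implied by the CI)    0.0584923

The robust interval is roughly twice as wide as the usual one, so the usual interval overstates the precision of the estimate.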
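The two intervals can be checked from the reported numbers alone. The sketch below rebuilds the robust interval from the estimate and robust standard error, and backs out the standard error implied by the usual interval. It assumes a critical value of 1.96 for both, which is only approximate for the usual interval.

```r
# Rebuild the robust 95% CI and back out the usual standard error.
beta1     <- 0.3601203
se_robust <- 0.0584923
z         <- 1.96                      # assumed normal critical value

ci_robust <- beta1 + c(-1, 1) * z * se_robust

ci_usual         <- c(0.301186043, 0.41905462)   # usual OLS interval reported above
se_usual_implied <- diff(ci_usual) / (2 * z)     # half-width divided by the critical value
width_ratio      <- diff(ci_robust) / diff(ci_usual)

print(ci_robust)
print(se_usual_implied)
print(width_ratio)
```

The midpoint of the usual interval equals the same point estimate, which confirms that only the standard error differs between the two columns.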
(iii)
Describe what is happening with the quadratic in log(dist). In particular, for what value of dist does the relationship between log(fare) and dist become positive? [Hint: First figure out the turning point value for log(dist), and then exponentiate.] Is the turning point outside the range of the data?
Answer (iii)
Coefficient on log(dist): -0.9016
Coefficient on [log(dist)]²: 0.1030
This implies a U-shaped relationship between log(fare) and log(dist): at short distances an increase in distance lowers fares, while beyond the turning point an increase in distance raises fares.
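The turning point follows from setting the derivative of the quadratic to zero: log(dist)* = −b1 / (2 × b2), and then dist* = exp(log(dist)*). The sketch below does this arithmetic with the two coefficients reported above; the variable names are illustrative.

```r
# Turning point of the quadratic in log(dist).
b_ldist   <- -0.9016   # coefficient on log(dist)
b_ldistsq <-  0.1030   # coefficient on [log(dist)]^2

ldist_star <- -b_ldist / (2 * b_ldistsq)   # where d log(fare) / d log(dist) = 0
dist_star  <- exp(ldist_star)              # turning point in miles

print(c(ldist_star = ldist_star, dist_star = dist_star))
# To see whether this lies inside the data, compare dist_star with the smallest
# route distance, e.g. min(airfare$dist); the column name `dist` is an assumption.
```

The turning point comes out at roughly 80 miles. If the shortest route in the data is longer than that, the relationship is increasing over the whole observed range and the downward part of the U is never used.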
For part (iv), estimating the equation by random effects gives β1 = 0.2089935.
(v)
Now estimate the equation using fixed effects. What is the FE estimate of β1? Why is it fairly similar to the RE estimate? (Hint: What is θ̂ for RE estimation?)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = lfare ~ y98 + y99 + y00 + concen + ldist + ldistsq,
data = airfare.p, model = "random")
Balanced Panel: n = 1149, T = 4, N = 4596
Effects:
var std.dev share
idiosyncratic 0.01134 0.10651 0.1
individual 0.10198 0.31934 0.9
theta: 0.8355
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.8316339 -0.0639906 -0.0037195 0.0626552 0.8655402
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 6.2220050 0.8099666 7.6818 1.569e-14 ***
y98 0.0224743 0.0044544 5.0454 4.526e-07 ***
y99 0.0366898 0.0044528 8.2398 < 2.2e-16 ***
y00 0.0982120 0.0044576 22.0324 < 2.2e-16 ***
concen 0.2089935 0.0265297 7.8777 3.334e-15 ***
ldist -0.8520921 0.2464836 -3.4570 0.0005462 ***
ldistsq 0.0974604 0.0186358 5.2297 1.698e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 67.626
Residual Sum of Squares: 52.162
R-Squared: 0.22866
Adj. R-Squared: 0.22766
Chisq: 1360.42 on 6 DF, p-value: < 2.22e-16
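Answer (v)
Estimates of β1:
Fixed effects     0.1688590
Random effects    0.2089935
The θ̂ for random effects is 0.84, which means the variance of the unobserved route effect is large relative to the variance of the idiosyncratic error (shares of 0.9 and 0.1). Because θ̂ is close to one, the RE quasi-demeaning is close to the full time-demeaning used by FE, which is why the two estimates are fairly similar.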
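The reported theta can be reproduced from the variance components in the RE output using θ = 1 − sqrt(σ²_u / (σ²_u + T × σ²_a)). The sketch below plugs in the rounded values printed above, so it matches the reported 0.8355 only up to rounding.

```r
# Random effects quasi-demeaning parameter from the variance components.
sigma2_u  <- 0.01134   # idiosyncratic variance (from the plm output)
sigma2_a  <- 0.10198   # individual (route) effect variance
T_periods <- 4         # years per route in the balanced panel

theta <- 1 - sqrt(sigma2_u / (sigma2_u + T_periods * sigma2_a))
print(theta)
```

Setting sigma2_a to zero gives theta = 0 (pooled OLS), while a very large sigma2_a pushes theta toward 1 (fixed effects), which is the sense in which RE sits between the two.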
(vi)
Name two characteristics of a route (other than distance between stops) that are captured by a_i. Might these be correlated with concen_it?
Answer (vi)
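One characteristic could be the population of the cities the route connects. Routes with high demand tend to have higher fares because airlines have greater pricing power, and city size could plausibly be correlated with concen_it, since it affects how many carriers serve the route.
Another characteristic could be the type of aircraft used. Some routes are flown with bigger or faster planes, or with more first-class seating. This too could plausibly be related to how many carriers compete on the route.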
(vii)
Are you convinced that higher concentration on a route increases airfares? What is your best estimate?
stargazer(reg3_pols, reg3_re, reg3_fe, type ="text")
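The coefficient on concen is positive and statistically significant at the 1% level. So yes, I am convinced that higher concentration on a route increases airfares. The fixed effects estimate, 0.1688590, is the best estimate because it allows a_i to be correlated with concen; it implies that a 0.10 increase in concen raises fares by about 1.7%.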