\(~\)
\(~\)
library(haven)
library(broom)
library(tidyr)
library(gapminder)
library(quantreg)
library(plotly)
library(tidyverse)
library(kableExtra)
library(rmarkdown)
library(purrr)
PS_qnt_incomes <- read_dta("/Users/bastienpatras/Desktop/Sciences Po - Master in Economics/Econometrics/PBset 10/PS_qnt_incomes.dta") %>%
na.omit()
head(PS_qnt_incomes) %>%
kbl(booktabs = T) %>%
kable_styling(latex_options = "striped", full_width = T)
year | educ | age | exper | hincome | nonwhite | expersq |
---|---|---|---|---|---|---|
1999 | 13 | 18 | -2 | 7.247761 | 0 | 4 |
2002 | 8 | 22 | 7 | 8.982443 | 0 | 49 |
2003 | 13 | 22 | 2 | 8.525437 | 0 | 4 |
1996 | 7 | 44 | 30 | 5.875404 | 1 | 900 |
1997 | 10 | 45 | 28 | 5.946600 | 1 | 784 |
2002 | 7 | 49 | 35 | 6.531967 | 1 | 1225 |
\(~\)
\(~\)
sum.stat <- PS_qnt_incomes %>%
group_by(year) %>%
summarize(across(
.cols = everything(),
.fns = list(n = ~n(),
Mean = ~mean(.x, na.rm = T),
SD = ~var(.x, na.rm = T)^0.5,
q10 = ~quantile(.x, na.rm = T, probs = 0.1),
q25 = ~quantile(.x, na.rm = T, probs = 0.25),
q50 = ~quantile(.x, na.rm = T, probs = 0.50),
q75 = ~quantile(.x, na.rm = T, probs = 0.75),
q90 = ~quantile(.x, na.rm = T, probs = 0.90)),
.names = "{.col}_{.fn}")
)
paged_table(sum.stat)
\(~\)
\(~\)
# OLS regression by year
reg.income <- PS_qnt_incomes %>%
na.omit() %>%
group_by(year) %>%
do(reg.income = tidy(lm(hincome ~ educ + exper + as.factor(nonwhite) + expersq, data = .))) %>%
unnest(reg.income)
# Output
paged_table(reg.income)
# Quantile regression by year
qreg.income <- PS_qnt_incomes %>%
na.omit() %>%
group_by(year) %>%
do(qreg.income = tidy(rq(hincome ~ exper + educ +
as.factor(nonwhite) + expersq, data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)), se.type="nid", cov =TRUE)) %>%
unnest(qreg.income)
# Output
paged_table(qreg.income)
# Plotting OLS regression by year
plot.1 <- reg.income %>%
rename(Estimates=term) %>%
ggplot(aes(x=year, y=estimate, color=Estimates)) +
geom_line() +
geom_point(size=0.4) +
theme(axis.text = element_text(angle = 90)) +
scale_color_manual(values = c("#D12380", "#C02176",
"#F0A8CE", "#570F35",
"#8C1855", "#7A154B",
"#691240", "#ED96C5"))
# Output
ggplotly(plot.1)
# Plotting Quantile regression by year
plot.2 <- qreg.income %>%
rename(Estimates=term) %>%
ggplot(aes(x=year, y=estimate, color=Estimates)) +
geom_line() +
geom_point(size=0.4) +
theme(axis.text = element_text(angle = 90)) +
facet_wrap(~`tau`) +
scale_color_manual(values = c("#D12380", "#C02176",
"#F0A8CE", "#570F35",
"#8C1855", "#7A154B",
"#691240", "#ED96C5"))
# Output
ggplotly(plot.2)
\(~\)
The OLS estimation is able to reproduce the general trend of our quantile regression for the selected quantiles. However one can observe significant differences in terms of volatility and amplitude, indeed the OLS estimation offers only a partial view of the relationship, and we might be interested in describing the relationship at different points in the conditional distribution of outcome variables. Over the 9 years coverage, the returns on education and experience have not ceased to increase, however the fact of being non-white on income has largely evolved going from positive to negative with a slight difference in volatility for the 10th percentile and 1rst quartile. Indeed, at low level of income the quantile regression indicates that returns on experience have been more important than education over the 9 years coverage. However at higher quantile, the relationship inverts itself and returns on education are higher relative to experience. Additionally the trend over time seems to increase (be more positive) at higher quantile. The quantile regression allows us to drop the assumption that variables operate the same at the upper tails of the distribution as at the mean and to identify the factors that are important determinants of variables.
\(~\)
\(~\)
\[ Q_r(\text{hincome}) = \beta_0(\tau) + \beta_1(\tau) \times \text{exper} + \beta_2(\tau) \times \text{exper}^2 + ... + \beta_5(\tau) x_5\]
Differentiating with respect to \(\text{exper}\)
\[ \frac{\partial Q_r(\text{hincome})}{\partial x_1} = \beta_1 + 2\beta_2 \times \text{exper} \]
With \(\text{exper}\in \left\{5, 15, 25\right\} and \tau \in \left\{0.10, 0.25, 0.50, 0.75, 0.90\right\}\) gives us the marginal effects by year at the five quantiles:
# Quantile regression by year
qreg.income.exper <- PS_qnt_incomes %>%
group_by(year) %>%
do(qreg.income.exper = tidy(rq(hincome ~ exper + educ +
as.factor(nonwhite) + expersq, data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)
))) %>%
unnest(qreg.income.exper) %>%
select(year, term, estimate, tau) %>%
pivot_wider(names_from = term, values_from = estimate) %>%
group_by(year, tau) %>%
summarise(`Marginal effect for new entrants` = exper+2*expersq*5,
`Marginal effect for mid career` = exper+2*expersq*15,
`Marginal effect for prime earners` = exper+2*expersq*25) %>%
pivot_longer(cols=c(`Marginal effect for new entrants`,
`Marginal effect for mid career`,
`Marginal effect for prime earners`), names_to="Effects") %>%
rename(Estimates=value)
# Output
paged_table(qreg.income.exper)
# Plotting Quantile regression by year
plot.3 <- qreg.income.exper %>%
ggplot(aes(x=year, y=Estimates, color=Effects)) +
geom_line() +
geom_point(size=0.4) +
theme(axis.text = element_text(angle = 90)) +
facet_wrap(~`tau`) +
scale_color_manual(values = c("#D12380", "#C02176",
"#F0A8CE", "#570F35",
"#8C1855", "#7A154B",
"#691240", "#ED96C5"))
# Output
ggplotly(plot.3)
\(~\)
The first thing to notice is that belonging to the new entrants group (having 5 years of experience) yields the maximum returns to experience on income relative to belonging to mid career or prime earners. New entrants returns to experience has not ceased to increased over the 9 years period studied. Looking at the graph one can deduce that the maximum returns to experience on income happens within the five first years and decreases with years of experience at any quantiled studied. Looking at the evolution of the marginal effect at each quantile specifically, suggests that returns to experience on income are less important at low level of income and grow at a higher rate at higher quantile. (the time trend by quantile being higher at higher quantile)
\(~\)
\(~\)
# Quantile regression by year
qreg.income.educ <- PS_qnt_incomes %>%
mutate(Educ.group = case_when(exper == 5 ~ 'New entrants',
exper == 15 ~ 'Mid career',
exper == 25 ~ 'Prime earners')) %>%
na.omit() %>%
group_by(year, Educ.group) %>%
do(qreg.income.educ = tidy(rq(hincome ~ educ +
as.factor(nonwhite), data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)
))) %>%
unnest(qreg.income.educ) %>%
select(year, Educ.group, term, estimate, tau) %>%
filter(term == "educ")
paged_table(qreg.income.educ)
# Plotting Quantile regression by year
plot.4 <- qreg.income.educ %>%
ggplot(aes(x=year, y=estimate, color=Educ.group)) +
geom_line() +
geom_point(size=0.4) +
theme(axis.text = element_text(angle = 90)) +
facet_wrap(~`tau`) +
scale_color_manual(values = c("#D12380", "#C02176",
"#F0A8CE", "#570F35",
"#8C1855", "#7A154B",
"#691240", "#ED96C5"))
# Output
ggplotly(plot.4)
\(~\)
Overall, the returns to education between each group seem to be relatively close over the time range studied within a given quantile. Additionally, the returns are higher at the higher quantiles across groups. Finally, the time trend by quantile indicates that returns to education between each group seem to increase with the quantile studied. (the higher the quantile the higher the slope of the time trend)
\(~\)
\(~\)
beta.educ.diff <- qreg.income %>%
select(year, term, statistic, tau) %>%
filter(term=="educ") %>%
group_by(year) %>%
pivot_wider(names_from = tau, names_prefix = "sd_", values_from = statistic)
paged_table(beta.educ.diff)
\(~\)
\(~\)
\(~\)
\(~\)
As seen in question 4, the analysis by quantile seems to suggest different patterns (at the quantile studied). Therefore we can assume that variables operate the same at the upper tails of the distribution as at the mean.
\(~\)
\(~\)
TRUE: As we have seen in question 4 and on the graph
\(~\)
\(~\)
FALSE (cf question 4)