\(~\)

0 - Load the data set

\(~\)

library(haven)
library(broom)
library(tidyr)
library(gapminder)
library(quantreg)
library(plotly)
library(tidyverse)
library(kableExtra)
library(rmarkdown)
library(purrr)

PS_qnt_incomes <- read_dta("/Users/bastienpatras/Desktop/Sciences Po - Master in Economics/Econometrics/PBset 10/PS_qnt_incomes.dta") %>%
  na.omit()

head(PS_qnt_incomes) %>% 
  kbl(booktabs = T) %>%
  kable_styling(latex_options = "striped", full_width = T) 
year educ age exper hincome nonwhite expersq
1999 13 18 -2 7.247761 0 4
2002 8 22 7 8.982443 0 49
2003 13 22 2 8.525437 0 4
1996 7 44 30 5.875404 1 900
1997 10 45 28 5.946600 1 784
2002 7 49 35 6.531967 1 1225

\(~\)

1 - Show descriptive statistics for each year (mean, standard deviation, and quantiles 0.10, 0.25, 0.50, 0.75 and 0.90). Clean the data if needed.

\(~\)

sum.stat <- PS_qnt_incomes %>%
  group_by(year) %>%
  summarize(across(
    .cols = everything(),
    .fns = list(n = ~n(), 
                Mean = ~mean(.x, na.rm = T), 
                SD = ~var(.x, na.rm = T)^0.5,
                q10 = ~quantile(.x, na.rm = T, probs = 0.1),
                q25 = ~quantile(.x, na.rm = T, probs = 0.25),
                q50 = ~quantile(.x, na.rm = T, probs = 0.50),
                q75 = ~quantile(.x, na.rm = T, probs = 0.75),
                q90 = ~quantile(.x, na.rm = T, probs = 0.90)),
    .names = "{.col}_{.fn}")
  ) 

paged_table(sum.stat) 

\(~\)

2 - Estimate OLS and quantile regressions (for quantiles 0.10, 0.25, 0.50, 0.75 and 0.90) for each of the sample years, 1996 to 2005. Represent graphically the evolution of years of education coefficients and their confidence intervals. Comment.

\(~\)

# OLS regression by year

reg.income <- PS_qnt_incomes %>%
   na.omit() %>%
   group_by(year) %>%
   do(reg.income = tidy(lm(hincome ~ educ + exper + as.factor(nonwhite) + expersq, data = .))) %>%
   unnest(reg.income) 

# Output

paged_table(reg.income)
# Quantile regression by year

qreg.income <- PS_qnt_incomes %>%
   na.omit() %>%
   group_by(year) %>%
   do(qreg.income = tidy(rq(hincome ~ exper + educ +
                            as.factor(nonwhite) + expersq, data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)), se.type="nid", cov =TRUE)) %>%
   unnest(qreg.income) 

# Output

paged_table(qreg.income)
# Plotting OLS regression by year

plot.1 <- reg.income %>%
  rename(Estimates=term) %>%
  ggplot(aes(x=year, y=estimate, color=Estimates)) +
  geom_line()  +
  geom_point(size=0.4) +
  theme(axis.text = element_text(angle = 90)) +
  scale_color_manual(values = c("#D12380", "#C02176",
                                "#F0A8CE", "#570F35",
                                "#8C1855", "#7A154B",
                                "#691240", "#ED96C5"))

# Output

ggplotly(plot.1)
# Plotting Quantile regression by year

plot.2 <- qreg.income %>%
  rename(Estimates=term) %>%
  ggplot(aes(x=year, y=estimate, color=Estimates)) +
  geom_line()  +
  geom_point(size=0.4) +
  theme(axis.text = element_text(angle = 90)) +
  facet_wrap(~`tau`) +
  scale_color_manual(values = c("#D12380", "#C02176",
                                "#F0A8CE", "#570F35",
                                "#8C1855", "#7A154B",
                                "#691240", "#ED96C5"))

# Output

ggplotly(plot.2)

\(~\)

The OLS estimation is able to reproduce the general trend of our quantile regression for the selected quantiles. However one can observe significant differences in terms of volatility and amplitude, indeed the OLS estimation offers only a partial view of the relationship, and we might be interested in describing the relationship at different points in the conditional distribution of outcome variables. Over the 9 years coverage, the returns on education and experience have not ceased to increase, however the fact of being non-white on income has largely evolved going from positive to negative with a slight difference in volatility for the 10th percentile and 1rst quartile. Indeed, at low level of income the quantile regression indicates that returns on experience have been more important than education over the 9 years coverage. However at higher quantile, the relationship inverts itself and returns on education are higher relative to experience. Additionally the trend over time seems to increase (be more positive) at higher quantile. The quantile regression allows us to drop the assumption that variables operate the same at the upper tails of the distribution as at the mean and to identify the factors that are important determinants of variables.

\(~\)

3 - Provide estimates for the the marginal effects for years of experience (careful: your regression will include experience and experience squared) each year for: (a) new entrants (people with 5 years of experience); (b) mid career (people with 15 years of experience); and prime earners (people with 25 years of experience) at the five quantiles. Plot the marginal effects and discuss briefly your findings.

\(~\)

\[ Q_r(\text{hincome}) = \beta_0(\tau) + \beta_1(\tau) \times \text{exper} + \beta_2(\tau) \times \text{exper}^2 + ... + \beta_5(\tau) x_5\]

Differentiating with respect to \(\text{exper}\)

\[ \frac{\partial Q_r(\text{hincome})}{\partial x_1} = \beta_1 + 2\beta_2 \times \text{exper} \]

With \(\text{exper}\in \left\{5, 15, 25\right\} and \tau \in \left\{0.10, 0.25, 0.50, 0.75, 0.90\right\}\) gives us the marginal effects by year at the five quantiles:

# Quantile regression by year

qreg.income.exper <- PS_qnt_incomes %>%
   group_by(year) %>%
   do(qreg.income.exper = tidy(rq(hincome ~ exper + educ +
                            as.factor(nonwhite) + expersq, data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)
                            ))) %>%
   unnest(qreg.income.exper) %>%
   select(year, term, estimate, tau) %>%
   pivot_wider(names_from = term, values_from = estimate) %>%
   group_by(year, tau) %>%
   summarise(`Marginal effect for new entrants` = exper+2*expersq*5,
             `Marginal effect for mid career` = exper+2*expersq*15,
             `Marginal effect for prime earners` = exper+2*expersq*25) %>%
   pivot_longer(cols=c(`Marginal effect for new entrants`,
                       `Marginal effect for mid career`,
                       `Marginal effect for prime earners`), names_to="Effects") %>%
  rename(Estimates=value)

# Output

paged_table(qreg.income.exper)
# Plotting Quantile regression by year

plot.3 <- qreg.income.exper %>%
  ggplot(aes(x=year, y=Estimates, color=Effects)) +
  geom_line()  +
  geom_point(size=0.4) +
  theme(axis.text = element_text(angle = 90)) +
  facet_wrap(~`tau`) +
  scale_color_manual(values = c("#D12380", "#C02176",
                                "#F0A8CE", "#570F35",
                                "#8C1855", "#7A154B",
                                "#691240", "#ED96C5"))

# Output

ggplotly(plot.3)

\(~\)

The first thing to notice is that belonging to the new entrants group (having 5 years of experience) yields the maximum returns to experience on income relative to belonging to mid career or prime earners. New entrants returns to experience has not ceased to increased over the 9 years period studied. Looking at the graph one can deduce that the maximum returns to experience on income happens within the five first years and decreases with years of experience at any quantiled studied. Looking at the evolution of the marginal effect at each quantile specifically, suggests that returns to experience on income are less important at low level of income and grow at a higher rate at higher quantile. (the time trend by quantile being higher at higher quantile)

\(~\)

4 - Provide estimates for the the marginal effects for years of education for the same three groups. Discuss briefly your findings.

\(~\)

# Quantile regression by year

qreg.income.educ <- PS_qnt_incomes %>%
   mutate(Educ.group = case_when(exper == 5 ~ 'New entrants',
                                 exper == 15 ~ 'Mid career',
                                 exper == 25 ~ 'Prime earners')) %>%
   na.omit() %>%
   group_by(year, Educ.group) %>%
   do(qreg.income.educ = tidy(rq(hincome ~ educ + 
                            as.factor(nonwhite), data = ., tau=c(0.10, 0.25, 0.50, 0.75, 0.90)
                            ))) %>%
   unnest(qreg.income.educ) %>%
   select(year, Educ.group, term, estimate, tau) %>%
   filter(term == "educ")

paged_table(qreg.income.educ)
# Plotting Quantile regression by year

plot.4 <- qreg.income.educ %>%
  ggplot(aes(x=year, y=estimate, color=Educ.group)) +
  geom_line()  +
  geom_point(size=0.4) +
  theme(axis.text = element_text(angle = 90)) +
  facet_wrap(~`tau`) +
  scale_color_manual(values = c("#D12380", "#C02176",
                                "#F0A8CE", "#570F35",
                                "#8C1855", "#7A154B",
                                "#691240", "#ED96C5"))

# Output

ggplotly(plot.4)

\(~\)

Overall, the returns to education between each group seem to be relatively close over the time range studied within a given quantile. Additionally, the returns are higher at the higher quantiles across groups. Finally, the time trend by quantile indicates that returns to education between each group seem to increase with the quantile studied. (the higher the quantile the higher the slope of the time trend)

\(~\)

5 - Provide a test for \(H_0 : \text{ "All the returns to education are the same across quantiles in years 1996 and 2005"}\) . You can use a predefined command but you should explain all the steps to construct the statistic.

\(~\)

beta.educ.diff <- qreg.income %>%
    select(year, term, statistic, tau) %>%
    filter(term=="educ") %>%
    group_by(year) %>%
    pivot_wider(names_from = tau, names_prefix = "sd_", values_from = statistic)

paged_table(beta.educ.diff)

\(~\)

\(~\)

6 - Based on your results, would you say the following affirmations are true?

\(~\)

  1. The mean return to education and the returns at the studied quantiles changed in a similar way

\(~\)

As seen in question 4, the analysis by quantile seems to suggest different patterns (at the quantile studied). Therefore we can assume that variables operate the same at the upper tails of the distribution as at the mean.

\(~\)

  1. The returns to education at these quantiles differ significantly. In general, the returns are higher at the higher quantiles.

\(~\)

TRUE: As we have seen in question 4 and on the graph

\(~\)

  1. During the 2000’s, there were sharp increases in the returns to education, with greater in- creases at the higher quantiles.

\(~\)

FALSE (cf question 4)