My research explores the relationship between parents’ educational attainment and their children’s educational success, delving into the dynamics of social mobility and educational equity. This study aims to provide valuable insights that could inform policy formulation and contribute to ongoing debates about inequality and social stratification within the educational sphere.

The central research question guiding this study is: “What is the relationship between parents’ educational attainment and their children’s educational attainment?”

Understanding this relationship is crucial for several reasons. First, it allows for a deeper exploration of social mobility. By examining how parents’ educational backgrounds influence their children’s educational outcomes, we can better understand the mechanisms that either facilitate or hinder social mobility. If the study reveals that children of highly educated parents are more likely to succeed educationally, it may indicate limited social mobility, which in turn reinforces existing social inequalities. Second, the research aims to illuminate disparities in the educational resources and opportunities available to children from different familial educational backgrounds. These inequities are not limited to material resources but also extend to cultural and social capital, which play a significant role in shaping educational outcomes. Third, the findings from this study can provide an empirical foundation for educational policy interventions (Little, 2016). Governments and educational institutions can use this research to design targeted policies that support disadvantaged families, aiming to reduce the educational gap and promote equity in education.

Theoretically, this research builds on established theories of social capital and cultural capital, which posit that parents’ educational backgrounds significantly influence their children’s educational outcomes. By empirically examining this relationship, the study contributes to a deeper understanding of these theoretical frameworks. In terms of policy impact, the insights gained from this study can inform the development of policies aimed at addressing systemic inequalities in education. Targeted interventions could help mitigate the barriers faced by children from less educated backgrounds, fostering a more equitable educational landscape. Ultimately, the research seeks to provide a clearer picture of how educational attainment is transmitted across generations (Little, 2016). This understanding can help in formulating strategies to promote social mobility and reduce educational disparities. By investigating the link between parents’ and children’s educational attainment, this study aims to contribute to the literature on social mobility and educational equity, offering data-driven insights that can inform policy changes.

The dataset used for this analysis comes from the General Social Survey (GSS), a biennial survey that collects data on the demographic characteristics and attitudes of residents of the United States. For this study, I used the cumulative GSS file, which pools responses across survey years and contains 72,390 observations. The GSS data is suitable for this research as it provides detailed variables on the educational attainment of both respondents and their parents. The dataset, gss2022.Rdata, was downloaded from the SOC252 course website (https://q.utoronto.ca/courses/345685/modules/items/5815903) and includes GSS data from 1972 to 2021.

This sample is appropriate given the research question, “What is the relationship between parents’ educational attainment and their children’s educational attainment?” The extensive temporal range and large sample size allow for a robust analysis of intergenerational educational attainment across different periods and demographic groups in the United States.

The main variables of interest for this analysis include:

Respondent’s Education (educ): Interval variable indicating the highest year of education attained by the respondent.
Mother’s Education (maeduc): Interval variable indicating the highest year of education attained by the respondent’s mother.
Father’s Education (paeduc): Interval variable indicating the highest year of education attained by the respondent’s father.
Sex (sex): Categorical variable indicating the respondent’s sex (male or female).
Race (race): Categorical variable indicating the respondent’s race.

To prepare the data for analysis, I treated all “don’t know” and other non-substantive responses in the variables race, sex, educ, maeduc, and paeduc as NA (missing values). I then removed every observation with a missing value on any of these five variables (listwise deletion), so that all models are estimated on the same set of complete cases. This resulted in a final dataset of 45,522 valid observations.

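# Packages used throughout the analysis
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(modelsummary)
library(flextable)

load("gss2022.Rdata")   # loads the data frame `df`
gss <- df

# Keep complete cases only. A comparison such as `educ >= 0` evaluates to NA
# for missing values, and filter() drops those rows as well.
gss <- gss %>%
  filter(educ >= 0, paeduc >= 0, maeduc >= 0) %>%
  filter(!is.na(sex), !is.na(race))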
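
The effect of this cleaning step can be checked on a small example before running it on the full file. The sketch below uses base R and a made-up five-row data frame (`toy` and all of its values are illustrative, not GSS data). It assumes that negative education codes mark non-substantive answers, counts the missing values per variable, and then applies listwise deletion.

```r
# Toy data standing in for the GSS extract (values invented for illustration).
toy <- data.frame(
  educ   = c(12, 16, NA, 14, -1),
  paeduc = c(10, NA, 12, 16, 8),
  maeduc = c(12, 12, 11, NA, 9),
  sex    = c("male", "female", "female", NA, "male"),
  race   = c("white", "black", "other", "white", "white")
)

vars     <- c("educ", "paeduc", "maeduc", "sex", "race")
edu_vars <- c("educ", "paeduc", "maeduc")

# Assumption: negative education codes are non-substantive, so recode them to NA.
toy[edu_vars] <- lapply(toy[edu_vars], function(x) replace(x, !is.na(x) & x < 0, NA))

# Number of missing values per variable, to see which variable costs the most cases.
missing_per_var <- colSums(is.na(toy[vars]))
print(missing_per_var)

# Listwise deletion: keep rows with no missing value on any analysis variable.
toy_complete <- toy[complete.cases(toy[vars]), ]
cat("Rows before:", nrow(toy), " rows after:", nrow(toy_complete), "\n")
```

Running the same count on the real data would show how many of the dropped observations are due to missing parental education and how many to other variables.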

Table 1 presents the statistics for educ, paeduc, and maeduc, including measures of central tendency (mean, median) and dispersion (standard deviation, range).

# reframe() is used instead of summarise() because each expression returns
# three rows (one per variable) rather than a single summary row.
summary_stats <- gss %>%
  reframe(
    Variable = c("Respondent's Education (years)", "Father's Education (years)", "Mother's Education (years)"),
    Mean = round(c(mean(educ), mean(paeduc), mean(maeduc)), 2),
    Median = c(median(educ), median(paeduc), median(maeduc)),
    SD = round(c(sd(educ), sd(paeduc), sd(maeduc)), 2),
    Min = c(min(educ), min(paeduc), min(maeduc)),
    Max = c(max(educ), max(paeduc), max(maeduc)),
    Range = c(max(educ) - min(educ), max(paeduc) - min(paeduc), max(maeduc) - min(maeduc))
  )
kable(summary_stats, col.names = c("Variable", "Mean", "Median", "SD", "Min", "Max", "Range")) %>%
  kable_styling(full_width = FALSE, position = "center") %>%
  add_header_above(c(" " = 1, "Table 1: Statistics of Respondent's Education, Father's Education and Mother's Education" = 6 ))
Table 1: Statistics of Respondent’s Education, Father’s Education and Mother’s Education

Variable                         Mean   Median  SD    Min  Max  Range
Respondent’s Education (years)   13.52  13      3.04  0    20   20
Father’s Education (years)       10.92  12      4.31  0    20   20
Mother’s Education (years)       11.06  12      3.72  0    20   20

Table 2 presents the statistics for respondent’s race and sex, including frequency and proportion.

# Keep only the recognised race and sex categories; anything else becomes NA.
gss_cleaned <- gss %>%
  mutate(
    race = case_when(
      race %in% c("white", "black", "other") ~ race,
      TRUE ~ NA_character_
    ),
    sex = case_when(
      sex %in% c("male", "female") ~ sex,
      TRUE ~ NA_character_
    )
  ) %>%
  # Select the two categorical variables shown in Table 2 and give them
  # readable names.
  dplyr::select("Respondent Race" = race, "Respondent Sex" = sex)

# Frequency and percentage of each category, returned as a flextable.
categorical_summary_flextable <- datasummary_skim(
  gss_cleaned,
  type = "categorical",
  output = "flextable"
)
## Warning: Inline histograms in `datasummary_skim()` are only supported for tables
##   produced by the `tinytable` backend.
categorical_summary_flextable <- categorical_summary_flextable %>%
  set_header_labels(Variable = "Variable", Value = "Value", Freq = "Frequency") %>%
  theme_box() %>%
  bold(part = "header") %>%
  bg(part = "header", bg = "#4CAF50") %>%
  color(part = "header", color = "white") %>%
  border_remove() %>%
  border_inner_v(border = fp_border(color = "black", width = 1)) %>%
  set_caption("Table 2: Statistics for respondent's race and sex")

categorical_summary_flextable
Table 2: Statistics for respondent's race and sex

Variable          Category  N       %
Respondent Race   black      4484    9.9
                  other      2460    5.4
                  white     38578   84.7
Respondent Sex    female    24922   54.7
                  male      20600   45.3

This scatter plot (Figure 1) with a fitted line shows the relationship between the father’s years of education (x-axis) and the child’s years of education (y-axis). Each point represents one respondent, positioned by the father’s education and the respondent’s own education. Because both variables take whole-year values, the points are jittered and drawn semi-transparently to reduce overplotting. The red line is a least-squares linear fit that captures the overall trend in the data. Its upward slope indicates a positive association: respondents whose fathers have more education tend to have more education themselves.

ggplot(gss, aes(x = paeduc, y = educ)) +
  geom_jitter(alpha = 0.1, color = "black", size = 1, width = 0.1, height = 0.2) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Figure 1: Relationship Between Father's Education and Child's Education",
       x = "Father's Education (years)",
       y = "Child's Education (years)")
## `geom_smooth()` using formula = 'y ~ x'

This bar chart (Figure 2) shows the sex composition of each education group. The x-axis lists the education groups (Elementary School, Middle School, High School, Undergraduate, and Graduate or Higher), and the y-axis shows the proportion of respondents of each sex within each group. Females make up a slightly larger share than males at every education level. Because females are also 54.7% of the analytic sample (Table 2), this pattern largely reflects the composition of the sample and does not by itself show that females attain more education than males; the informative comparison is how far the female share in each group departs from 54.7%. Since the sex composition may still differ across education groups, sex is included as a control when examining the intergenerational transmission of educational attainment.

# Group years of education into five ordered levels.
gss_filtered <- gss %>%
  mutate(educ_group = case_when(
    educ >= 0 & educ <= 6 ~ "Elementary school",
    educ >= 7 & educ <= 9 ~ "Middle school",
    educ >= 10 & educ <= 12 ~ "High school",
    educ >= 13 & educ <= 16 ~ "Undergraduate",
    educ >= 17 ~ "Graduate or Higher"
  ))
gss_filtered$educ_group <- factor(gss_filtered$educ_group, levels = c("Elementary school", "Middle school", "High school", "Undergraduate", "Graduate or Higher"))

# position = "fill" rescales each bar to 1, so the bars show within-group shares.
ggplot(gss_filtered, aes(x = educ_group, fill = sex)) +
  geom_bar(position = "fill") +
  labs(
    title = "Figure 2: Distribution of Education Level Across Sex",
    x = "Education Group",
    y = "Proportion"
  )
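
A filled bar chart shows the share of each sex within an education group, so it helps to compare those shares with the share of each sex in the whole sample. The sketch below does this in base R on a made-up eight-row data frame (`toy` and its values are illustrative, not GSS data). A positive difference means females are over-represented in that group relative to the sample as a whole.

```r
# Toy data standing in for the grouped GSS data (values invented for illustration).
toy <- data.frame(
  educ_group = c("High school", "High school", "High school",
                 "Undergraduate", "Undergraduate",
                 "Graduate or Higher", "Graduate or Higher", "Graduate or Higher"),
  sex = c("female", "male", "female",
          "female", "male",
          "male", "female", "male")
)

# Share of each sex in the whole sample.
overall_share <- prop.table(table(toy$sex))

# Share of each sex within each education group (rows sum to 1).
within_share <- prop.table(table(toy$educ_group, toy$sex), margin = 1)

# Female share in each group minus the overall female share.
female_gap <- within_share[, "female"] - overall_share[["female"]]
print(round(female_gap, 3))
```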

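# Analysis data: outcome, both parents' education, and the two controls.
gss_cleaned <- gss %>% dplyr::select(educ, paeduc, maeduc, sex, race)

# Rescaled copy for Model 4: r2sd() is applied to the three education variables.
gss_scaled <- gss_cleaned %>%
  mutate(
    educ = r2sd(educ),
    paeduc = r2sd(paeduc),
    maeduc = r2sd(maeduc)
  )

# Models 1-3 are in years of education; Model 4 uses the rescaled variables.
model1 <- lm(educ ~ paeduc, data = gss_cleaned)
model2 <- lm(educ ~ maeduc, data = gss_cleaned)
model3 <- lm(educ ~ maeduc + paeduc, data = gss_cleaned)
model4 <- lm(educ ~ maeduc + paeduc + sex + race, data = gss_scaled)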

model_list <- list("Model 1: Father's Education" = model1,
               "Model 2: Mother's Education" = model2,
               "Model 3: Father and Mother's Education" = model3,
               "Model 4: Full model (standardized)" = model4)
modelsummary(model_list)

              Model 1:     Model 2:     Model 3:     Model 4:
              Father's     Mother's     Father and   Full model
              Education    Education    Mother's     (standardized)
                                        Education
(Intercept)   9.770        9.225        8.778        0.012
              (0.034)      (0.039)      (0.039)      (0.003)
paeduc        0.344                     0.215        0.300
              (0.003)                   (0.004)      (0.006)
maeduc                     0.388        0.216        0.268
                           (0.003)      (0.005)      (0.006)
sexfemale                                            -0.020
                                                     (0.004)
raceblack                                            -0.048
                                                     (0.007)
raceother                                            0.067
                                                     (0.009)
Num.Obs.      45522        45522        45522        45522
R2            0.237        0.225        0.274        0.276
R2 Adj.       0.237        0.225        0.274        0.276
AIC           218217.3     218946.8     215977.1     51377.3
BIC           218243.5     218973.0     216012.0     51438.4
Log.Lik.      -109105.663  -109470.396  -107984.527  -25681.653
RMSE          2.66         2.68         2.59         0.43

In the first model, I regress the child’s educational attainment on the father’s education. The intercept is 9.770, which is the expected years of education for a child whose father has zero years of education. This suggests that even when the father has no formal education, children are expected to complete nearly 10 years of education. The coefficient for the father’s education (paeduc) is 0.344, meaning that each additional year of the father’s education is associated with 0.344 more years of education for the child. This relationship is highly significant (p-value < 0.001), indicating a clear positive relationship between the father’s education and the child’s educational attainment. The R-squared value for this model is 0.237, so 23.7% of the variance in the child’s educational attainment is explained by the father’s education alone.

In the second model, I focus on the mother’s education as the sole predictor of the child’s educational attainment. The intercept is 9.225, suggesting that the expected years of education for a child is approximately 9.225 years when the mother’s education is zero. The coefficient for the mother’s education (maeduc) is 0.388, indicating that each additional year of the mother’s education is associated with a 0.388 year increase in the child’s educational attainment. This relationship is also highly significant (p-value < 0.001), emphasizing the importance of the mother’s education in shaping the child’s educational outcomes. The R-squared value for this model is 0.225, meaning that 22.5% of the variance in the child’s educational attainment is explained by the mother’s education alone.

The third model incorporates both the father’s and mother’s education as predictors of the child’s educational attainment. The intercept is 8.778, indicating that when both parents have zero years of education, the expected years of the child’s education is 8.778 years. The coefficient for the father’s education (paeduc) decreases to 0.215, and the coefficient for the mother’s education (maeduc) decreases to 0.216 when both are included in the model. Both coefficients shrink because the two parents’ education levels are correlated, so each single-predictor model partly picks up the contribution of the other parent. These results suggest that both parents’ education levels have significant and nearly equal associations with the child’s educational attainment: holding the other parent’s education constant, each additional year of a parent’s education is associated with approximately 0.215 to 0.216 more years of education for the child. The R-squared value for this model is 0.274, indicating that 27.4% of the variance in the child’s educational attainment is explained by the combined education levels of both parents.
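
As a worked illustration of how the Model 3 estimates translate into predictions, the block below writes out the fitted equation from the regression table and evaluates it for two hypothetical households. The parental education values of 12 and 16 years are chosen here only as examples.

```latex
\begin{align*}
\widehat{\text{educ}} &= 8.778 + 0.215\,\text{paeduc} + 0.216\,\text{maeduc} \\[4pt]
\text{Both parents with 12 years:}\quad
\widehat{\text{educ}} &= 8.778 + 0.215(12) + 0.216(12) = 8.778 + 2.580 + 2.592 = 13.950 \\
\text{Both parents with 16 years:}\quad
\widehat{\text{educ}} &= 8.778 + 0.215(16) + 0.216(16) = 8.778 + 3.440 + 3.456 = 15.674
\end{align*}
```

Four additional years of education for both parents therefore correspond to a predicted difference of 15.674 - 13.950 = 1.724 years for the child, which is 4 × (0.215 + 0.216).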

The final model expands the analysis by including gender and race, in addition to the parents’ education levels, as predictors of the child’s educational attainment. In this model, educ, paeduc, and maeduc are rescaled with r2sd(), while gender and race enter as dummy variables with male and White as the reference categories. The intercept is 0.012, which is the expected standardized education of a White male respondent whose parents’ education is at the sample mean. The standardized coefficient for the father’s education (paeduc) is 0.300, and for the mother’s education (maeduc) is 0.268, indicating that both remain strong predictors of the child’s educational attainment even when controlling for gender and race. The coefficient for being female (sexfemale) is -0.020, suggesting that females have slightly lower educational attainment than males with the same parental education and race. Compared to being White, being Black (raceblack) is associated with 0.048 standardized units less education, while being of another race (raceother) is associated with 0.067 standardized units more. These units are the ones produced by r2sd(): if it divides by two standard deviations, as the RMSE of 0.43 suggests, one unit equals two standard deviations of educ (about 6 years) and not one. The R-squared value for this model is 0.276, showing that 27.6% of the variance in the child’s educational attainment is explained by the combination of parents’ education, gender, and race.
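
To relate the standardized coefficients of Model 4 to the year-based coefficients of Model 3, the block below converts them back to years using the standard deviations in Table 1. It is a rough check and assumes that r2sd() divides educ, paeduc, and maeduc by the same multiple of their standard deviations, so that the multiple cancels. The symbols $s_{\text{educ}}$, $s_{\text{paeduc}}$, and $s_{\text{maeduc}}$ are introduced here for the sample standard deviations.

```latex
\begin{align*}
\beta^{\text{years}}_{x} &\approx \beta^{\text{std}}_{x}\,\frac{s_{\text{educ}}}{s_{x}} \\[4pt]
\beta^{\text{years}}_{\text{paeduc}} &\approx 0.300 \times \frac{3.04}{4.31} \approx 0.212 \\
\beta^{\text{years}}_{\text{maeduc}} &\approx 0.268 \times \frac{3.04}{3.72} \approx 0.219
\end{align*}
```

Both values are close to the Model 3 estimates of 0.215 and 0.216. They need not match exactly because Model 4 also controls for gender and race. The calculation also shows why the father's standardized coefficient is larger even though the year-based coefficients are nearly equal: the father's education has the larger standard deviation.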

Comparing the adjusted R-squared values across the four models provides insights into the explanatory power of each model. Model 1, focusing on the father’s education, explains 23.7% of the variance in the child’s educational attainment. Model 2, focusing on the mother’s education, explains 22.5% of the variance. Model 3, which includes both parents’ education, increases the explanatory power to 27.4%. Finally, Model 4, which adds gender and race to the model, explains 27.6% of the variance in the child’s educational attainment. This comparison suggests that both parents’ education levels are strong predictors, while adding gender and race improves the model’s explanatory power only slightly (by 0.2 percentage points). Because Model 4 is fitted to a rescaled outcome, its AIC, BIC, log-likelihood, and RMSE cannot be compared with those of Models 1 to 3; R-squared is unaffected by the rescaling and remains comparable.
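
A comparison of adjusted R-squared values can be complemented by a formal test for nested models. The sketch below uses base R on simulated data (the data frame `sim`, the coefficients, and the sample size are invented for illustration and are not the GSS estimates). It compares a father-only model with a model that adds the mother's education, using the F-test from anova().

```r
set.seed(3)
n <- 400

# Simulated years of education (illustrative values only).
paeduc <- sample(0:20, n, replace = TRUE)
maeduc <- sample(0:20, n, replace = TRUE)
educ   <- 9 + 0.2 * paeduc + 0.2 * maeduc + rnorm(n, sd = 2.5)
sim    <- data.frame(educ, paeduc, maeduc)

# Nested models: the second adds maeduc to the first.
m_father <- lm(educ ~ paeduc, data = sim)
m_both   <- lm(educ ~ paeduc + maeduc, data = sim)

# F-test of whether adding maeduc significantly reduces the residual sum of squares.
cmp <- anova(m_father, m_both)
print(cmp)

# Adjusted R-squared for both models, for comparison with the test result.
adj_r2 <- c(father_only  = summary(m_father)$adj.r.squared,
            both_parents = summary(m_both)$adj.r.squared)
print(round(adj_r2, 3))
```

The test only applies when both models are fitted to the same outcome on the same scale, so it could be used for Model 1 against Model 3 but not directly for Model 3 against the rescaled Model 4.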

These regression results also relate to my hypotheses. My first hypothesis is that “Higher levels of parental education will positively influence a child’s educational attainment.” The results strongly support this hypothesis. Both father’s and mother’s education have significant positive coefficients in every model in which they appear. Even after controlling for other factors, such as gender and race in Model 4, the coefficients remain positive and significant.

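My second hypothesis is that “There is a positive influence of parents’ educational level on children’s educational level, with the mother’s effect being stronger.” In Model 3, the mother’s coefficient (0.216) is only 0.001 higher than the father’s coefficient (0.215). This gap is much smaller than either standard error (0.005 and 0.004), so the two coefficients are statistically indistinguishable. In the standardized Model 4 the ordering reverses: the father’s coefficient (0.300) exceeds the mother’s (0.268), largely because the father’s education varies more in the sample (SD 4.31 versus 3.72). The results therefore support the first part of the hypothesis, a positive influence of both parents’ education, but they do not support the claim that the mother’s effect is stronger.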
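
Whether two coefficients in the same model differ can be tested directly. The sketch below uses base R on simulated data (the data frame `sim` and all parameter values are invented for illustration and are not the GSS estimates). It computes the difference between the two coefficients and its standard error from the estimated covariance matrix, using Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2).

```r
set.seed(1)
n <- 500

# Simulated, correlated parental education in years (illustrative values only).
paeduc <- round(pmin(pmax(rnorm(n, 11, 4), 0), 20))
maeduc <- round(pmin(pmax(0.6 * paeduc + rnorm(n, 4.5, 3), 0), 20))
educ   <- 8.8 + 0.2 * paeduc + 0.2 * maeduc + rnorm(n, sd = 2.6)
sim    <- data.frame(educ, paeduc, maeduc)

fit <- lm(educ ~ maeduc + paeduc, data = sim)
b   <- coef(fit)
V   <- vcov(fit)

# Difference between the mother's and father's coefficients and its standard error.
diff_est <- unname(b["maeduc"] - b["paeduc"])
diff_se  <- sqrt(V["maeduc", "maeduc"] + V["paeduc", "paeduc"] -
                 2 * V["maeduc", "paeduc"])

# Two-sided t-test of H0: the two coefficients are equal.
t_stat  <- diff_est / diff_se
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

print(round(c(difference = diff_est, se = diff_se, t = t_stat, p = p_value), 3))
```

Applied to Model 3, the same calculation would replace the informal comparison of 0.216 with 0.215 by a p-value for the hypothesis that the two coefficients are equal.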

From the exploratory data analysis and initial regression models, a clear analytical narrative is beginning to take shape. The data show a strong and consistent positive association between parental education levels and their children’s educational attainment. Both father’s and mother’s education are significant predictors of the child’s educational outcomes: in Model 3, each additional year of either parent’s education is associated with roughly 0.2 more years of education for the child.

The inclusion of gender and race in the full model further reveals subtle yet important disparities. Gender appears to play a small role, with females showing slightly lower educational attainment compared to males. Racial disparities are more pronounced, particularly for Black children, who tend to have lower educational outcomes compared to their White peers. Conversely, children from other racial backgrounds (neither Black nor White) seem to have slight educational advantages.

The models so far have focused primarily on parental education, gender, and race. However, other potentially influential factors, such as socioeconomic status (Little, 2016), parental occupation, and access to educational resources, have not yet been incorporated. This omission could lead to biased estimates. In addition, the educational attainment variables are measured in years, which may not fully capture the quality or type of education received. For instance, the same number of years in different educational systems or settings could lead to different outcomes. Finally, it is possible that the impact of parental education on a child’s educational attainment diminishes as the level of parental education increases. For example, the difference between having a parent with no education and one with a high school education might be more impactful than the difference between a parent with a bachelor’s degree and one with a master’s degree.

To address these limitations and further explore the research question, the following steps are planned. First, additional variables capturing socioeconomic status (Little, 2016), such as family income, parental occupation, and wealth, will be incorporated to control for the broader socioeconomic context that influences educational decisions. Second, I will fit more flexible models, such as generalized additive models (GAMs), to explore nonlinear relationships between parental and child education. Third, I will add interaction terms between parental education and gender and race to test whether the transmission of education differs across groups.
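
The planned check for nonlinearity could look roughly like the sketch below. It uses the mgcv package and simulated data in which the return to parental education diminishes by construction (the data frame `sim`, the logarithmic shape, and the basis dimension `k = 8` are assumptions made for illustration, not results from the GSS). A linear model and a GAM with a smooth term are compared by AIC, and the effective degrees of freedom of the smooth indicate how far the fitted curve departs from a straight line.

```r
library(mgcv)

set.seed(2)
n <- 1000

# Simulated data with diminishing returns to father's education (illustrative only).
paeduc <- sample(0:20, n, replace = TRUE)
educ   <- 8 + 3 * log1p(paeduc) + rnorm(n, sd = 2)
sim    <- data.frame(educ, paeduc)

# Linear model versus a GAM with a penalized smooth of paeduc.
fit_lm  <- lm(educ ~ paeduc, data = sim)
fit_gam <- gam(educ ~ s(paeduc, k = 8), data = sim, method = "REML")

# Lower AIC indicates a better trade-off between fit and complexity.
print(AIC(fit_lm, fit_gam))

# Effective degrees of freedom of the smooth: a value near 1 means nearly linear.
smooth_edf <- summary(fit_gam)$edf
print(round(smooth_edf, 2))
```

In the real analysis, the smooth terms would be added for both paeduc and maeduc alongside the gender and race controls, and plot(fit_gam) would show the estimated shape of each relationship.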

These next steps are designed to address the current limitations of the analysis and to provide a more nuanced understanding of the factors influencing educational attainment. By incorporating additional variables and testing for non-linear and interaction effects, the analysis will become more comprehensive and the findings more reliable. This expanded analysis will not only refine the emerging story of how parental education shapes children’s educational outcomes but also deepen the exploration of gender and racial disparities within this context.

References

Little, William. 2016. Introduction to Sociology – 2nd Canadian Edition. Victoria, B.C.: BCcampus. https://opentextbc.ca/introductiontosociology/