Let’s work a bit more with the data set on resumes from (Oreopoulos, 2011) that we discussed in the last recitation.
First, read the Oreopoulos data into R.
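# dplyr provides %>%, group_by(), summarize(), and arrange()
library(dplyr)
# Total callbacks (first plus second callbacks) for each occupation type and name ethnicity
summary_table <- data %>%
  group_by(occupation_type, name_ethnicity) %>%
  summarize(total_callbacks = sum(callback, na.rm = TRUE) +
              sum(second_callback, na.rm = TRUE), .groups = 'drop') %>%
  arrange(desc(total_callbacks))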
#install.packages("kableExtra")
library(kableExtra)
summary_table %>%
kable("html",
caption = "Total Callbacks for Each Occupation Type and Ethnicity Combination",
col.names = c("Occupation Type", "Ethnicity", "Total Callbacks")) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Occupation Type | Ethnicity | Total Callbacks |
|---|---|---|
| Finance | Canada | 112 |
| Marketing and Sales | Canada | 85 |
| Retail | Canada | 74 |
| Finance | Indian | 66 |
| Retail | Indian | 60 |
| Administrative | Canada | 58 |
| Finance | Chinese | 57 |
| Marketing and Sales | Indian | 48 |
| Retail | Chinese | 48 |
| Marketing and Sales | Chinese | 47 |
| Programmer | Canada | 45 |
| Programmer | Indian | 42 |
| Programmer | Chinese | 36 |
| Insurance | Indian | 34 |
| Administrative | Chinese | 32 |
| Insurance | Canada | 32 |
| Marketing and Sales | British | 32 |
| Administrative | Indian | 27 |
| Retail | British | 27 |
| Retail | Chn-Cdn | 27 |
| Marketing and Sales | Chn-Cdn | 25 |
| Marketing and Sales | Pakistani | 24 |
| Executive Assisstant | Canada | 23 |
| Civil Engineer | Canada | 22 |
| Clerical | Indian | 17 |
| Electrical Engineer | Canada | 17 |
| Finance | Chn-Cdn | 17 |
| Clerical | Chinese | 16 |
| Finance | British | 16 |
| Insurance | Chinese | 16 |
| Marketing and Sales | Greek | 16 |
| Administrative | British | 15 |
| Retail | Pakistani | 15 |
| Programmer | Chn-Cdn | 14 |
| Administrative | Chn-Cdn | 13 |
| Civil Engineer | Indian | 12 |
| Clerical | Canada | 12 |
| Insurance | Chn-Cdn | 12 |
| Civil Engineer | Chinese | 11 |
| Executive Assisstant | Indian | 11 |
| Technology | Canada | 10 |
| Accounting | Chinese | 9 |
| Administrative | Greek | 9 |
| Executive Assisstant | Chinese | 8 |
| Finance | Pakistani | 8 |
| Human Resources Payroll | Canada | 8 |
| Accounting | Canada | 7 |
| Administrative | Pakistani | 7 |
| Executive Assistant | Canada | 7 |
| Insurance | Pakistani | 7 |
| Programmer | British | 7 |
| Accounting | Indian | 6 |
| Education | Canada | 6 |
| Electrical Engineer | Indian | 6 |
| Executive Assistant | Chinese | 6 |
| Insurance | British | 6 |
| Ecommerce | Chinese | 5 |
| Electrical Engineer | Chn-Cdn | 5 |
| Executive Assisstant | British | 5 |
| Executive Assistant | Indian | 5 |
| Human Resources Payroll | Indian | 5 |
| Maintenance Technician | Canada | 5 |
| Production | Indian | 5 |
| Programmer | Pakistani | 5 |
| Retail | Greek | 5 |
| Social Worker | Canada | 5 |
| Technology | British | 5 |
| Technology | Indian | 5 |
| Biotech and Pharmacy | Canada | 4 |
| Biotech and Pharmacy | Indian | 4 |
| Civil Engineer | Chn-Cdn | 4 |
| Clerical | Chn-Cdn | 4 |
| Electrical Engineer | British | 4 |
| Electrical Engineer | Chinese | 4 |
| Electrical Engineer | Pakistani | 4 |
| Maintenance Technician | British | 4 |
| Technology | Chn-Cdn | 4 |
| Accounting | Greek | 3 |
| Clerical | Greek | 3 |
| Education | British | 3 |
| Education | Indian | 3 |
| Executive Assisstant | Pakistani | 3 |
| Human Resources Payroll | Chn-Cdn | 3 |
| Maintenance Technician | Indian | 3 |
| Programmer | Greek | 3 |
| Technology | Chinese | 3 |
| Ecommerce | Canada | 2 |
| Executive Assisstant | Chn-Cdn | 2 |
| Executive Assistant | Chn-Cdn | 2 |
| Food Services Managers | Canada | 2 |
| Food Services Managers | Chn-Cdn | 2 |
| Human Resources Payroll | British | 2 |
| Maintenance Technician | Chn-Cdn | 2 |
| Maintenance Technician | Pakistani | 2 |
| Media and Arts | Canada | 2 |
| Media and Arts | Chinese | 2 |
| Media and Arts | Indian | 2 |
| Production | Canada | 2 |
| Production | Chinese | 2 |
| Technology | Greek | 2 |
| Technology | Pakistani | 2 |
| Accounting | Chn-Cdn | 1 |
| Biotech and Pharmacy | Chinese | 1 |
| Civil Engineer | British | 1 |
| Civil Engineer | Pakistani | 1 |
| Ecommerce | British | 1 |
| Ecommerce | Indian | 1 |
| Electrical Engineer | Greek | 1 |
| Executive Assistant | Greek | 1 |
| Finance | Greek | 1 |
| Food Services Managers | Chinese | 1 |
| Food Services Managers | Indian | 1 |
| Maintenance Technician | Chinese | 1 |
| Social Worker | Chinese | 1 |
| Biotech and Pharmacy | British | 0 |
| Biotech and Pharmacy | Chn-Cdn | 0 |
| Biotech and Pharmacy | Pakistani | 0 |
| Civil Engineer | Greek | 0 |
| Ecommerce | Chn-Cdn | 0 |
| Ecommerce | Greek | 0 |
| Ecommerce | Pakistani | 0 |
| Education | Chinese | 0 |
| Education | Chn-Cdn | 0 |
| Education | Pakistani | 0 |
| Food Services Managers | Greek | 0 |
| Human Resources Payroll | Chinese | 0 |
| Human Resources Payroll | Greek | 0 |
| Human Resources Payroll | Pakistani | 0 |
| Insurance | Greek | 0 |
| Production | British | 0 |
| Production | Chn-Cdn | 0 |
| Production | Greek | 0 |
| Production | Pakistani | 0 |
| Social Worker | British | 0 |
| Social Worker | Chn-Cdn | 0 |
| Social Worker | Indian | 0 |
| Social Worker | Pakistani | 0 |
# tidyr provides pivot_wider()
library(tidyr)
# One row per occupation type, one column per ethnicity; missing combinations become 0
wide_table <- summary_table %>%
  pivot_wider(names_from = name_ethnicity, values_from = total_callbacks,
              values_fill = list(total_callbacks = 0))
# Columns are renamed by position, so this relies on the column order produced by pivot_wider()
colnames(wide_table) <- c("Occupation Type", "Canadian", "Indian", "Chinese", "British", "Chinese - Canadian", "Pakistani", "Greek")
wide_table %>%
  kable("html", caption = "Total Callbacks for Each Occupation Type and Ethnicity Combination") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Occupation Type | Canadian | Indian | Chinese | British | Chinese - Canadian | Pakistani | Greek |
|---|---|---|---|---|---|---|---|
| Finance | 112 | 66 | 57 | 16 | 17 | 8 | 1 |
| Marketing and Sales | 85 | 48 | 47 | 32 | 25 | 24 | 16 |
| Retail | 74 | 60 | 48 | 27 | 27 | 15 | 5 |
| Administrative | 58 | 27 | 32 | 15 | 13 | 7 | 9 |
| Programmer | 45 | 42 | 36 | 7 | 14 | 5 | 3 |
| Insurance | 32 | 34 | 16 | 6 | 12 | 7 | 0 |
| Executive Assisstant | 23 | 11 | 8 | 5 | 2 | 3 | 0 |
| Civil Engineer | 22 | 12 | 11 | 1 | 4 | 1 | 0 |
| Clerical | 12 | 17 | 16 | 0 | 4 | 0 | 3 |
| Electrical Engineer | 17 | 6 | 4 | 4 | 5 | 4 | 1 |
| Technology | 10 | 5 | 3 | 5 | 4 | 2 | 2 |
| Accounting | 7 | 6 | 9 | 0 | 1 | 0 | 3 |
| Human Resources Payroll | 8 | 5 | 0 | 2 | 3 | 0 | 0 |
| Executive Assistant | 7 | 5 | 6 | 0 | 2 | 0 | 1 |
| Education | 6 | 3 | 0 | 3 | 0 | 0 | 0 |
| Ecommerce | 2 | 1 | 5 | 1 | 0 | 0 | 0 |
| Maintenance Technician | 5 | 3 | 1 | 4 | 2 | 2 | 0 |
| Production | 2 | 5 | 2 | 0 | 0 | 0 | 0 |
| Social Worker | 5 | 0 | 1 | 0 | 0 | 0 | 0 |
| Biotech and Pharmacy | 4 | 4 | 1 | 0 | 0 | 0 | 0 |
| Food Services Managers | 2 | 1 | 1 | 0 | 2 | 0 | 0 |
| Media and Arts | 2 | 2 | 2 | 0 | 0 | 0 | 0 |
Because the dependent variable is binary, a linear regression (a linear probability model) can produce predicted probabilities below 0 (no callback) or above 1 (received a callback). I therefore opted for a logistic regression, which models the log odds of the event and keeps every predicted probability within the 0 to 1 range.
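To illustrate the point, here is a minimal sketch on simulated data. The variables `x` and `y` are hypothetical and are not taken from the Oreopoulos data; the sketch only shows that a linear fit to a binary outcome can predict outside [0, 1] while a logistic fit cannot.

```r
# Simulated illustration only; x and y are hypothetical, not the resume data.
set.seed(1)
x <- seq(-4, 4, by = 0.01)
y <- rbinom(length(x), size = 1, prob = plogis(2 * x))  # binary outcome

lpm   <- lm(y ~ x)                      # linear probability model
logit <- glm(y ~ x, family = binomial)  # logistic regression

lpm_pred   <- fitted(lpm)    # fitted values from the linear model
logit_pred <- fitted(logit)  # fitted probabilities from the logit

range(lpm_pred)    # extends below 0 and above 1
range(logit_pred)  # stays strictly between 0 and 1
```

The same comparison could be made on the resume data by fitting `lm()` and `glm()` with the same formula and inspecting `range(fitted(...))` for each.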
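# stargazer is used below to format the regression table
library(stargazer)
# Logistic regression of callback on name ethnicity and the three skill scores
model1 <- glm(callback ~ name_ethnicity + skillspeaking + skillwriting + skillsocialper,
              data = data, family = binomial)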
stargazer(model1,
type = 'html',
header = FALSE,
dep.var.labels = "Got a Callback",
covariate.labels = c("Canadian", "Chinese", "Chinese-Canadian", "Greek",
"Indian", "Pakistani", "Speaking Skills", "Writing Skills", "Social Skills", "Intercept (British)"),
style = 'qje',
title = 'Regression Results: Callbacks across Ethnicities and Skills' ,
notes = "",
column.sep.width = "1pt",
out = "summary_table.html",
coef = list(exp(coef(model1))))

| | Got a Callback |
|---|---|
| Canadian | 1.337*** |
| | (0.118) |
| Chinese | 0.800*** |
| | (0.125) |
| Chinese-Canadian | 0.640*** |
| | (0.146) |
| Greek | 0.882*** |
| | (0.204) |
| Indian | 0.755*** |
| | (0.123) |
| Pakistani | 0.582*** |
| | (0.167) |
| Speaking Skills | 1.027*** |
| | (0.005) |
| Writing Skills | 0.988*** |
| | (0.003) |
| Social Skills | 0.996*** |
| | (0.004) |
| Intercept (British) | 0.052 |
| | (0.286) |
| N | 12,897 |
| Log Likelihood | -4,110.432 |
| Akaike Inf. Crit. | 8,240.865 |
| Notes: | ***Significant at the 1 percent level. |
| | **Significant at the 5 percent level. |
| | *Significant at the 10 percent level. |
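Because we ran a logistic regression, the raw coefficients are on the log-odds scale. Exponentiating them gives odds ratios, which show more clearly how much more (or less) likely an event is in response to a predictor. An odds ratio compares the odds of an event in one group with the odds in another group: values greater than 1 indicate higher odds and values less than 1 indicate lower odds. One caution when reading the stars: only the coefficients were exponentiated in the stargazer call, while the standard errors in parentheses appear to remain on the log-odds scale, so the significance stars may not match the p-values reported by summary(model1). The same caution applies to the later tables built the same way.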
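As a short worked note on reading such a table (the symbols $\beta$ and $\mathrm{OR}$ are notation introduced here, not part of the original output), a coefficient on the log-odds scale converts to a percentage change in odds as follows:

$$
\mathrm{OR} = e^{\beta}, \qquad \%\,\Delta\,\text{odds} = (\mathrm{OR} - 1) \times 100
$$

For instance, an odds ratio of 1.337 corresponds to $(1.337 - 1) \times 100 = 33.7\%$ higher odds, and an odds ratio of 0.640 corresponds to $(0.640 - 1) \times 100 = -36.0\%$, that is, 36% lower odds.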
Intercept (British): British candidates are the reference group against which all other candidates are compared. The exponentiated intercept (0.052) is not an odds ratio; it is the baseline odds of a callback for a British-named candidate whose three skill scores are all zero. The British group's own odds ratio is 1 by construction, since every other ethnicity is compared against it.
Canadian candidate (1.337): Candidates with a Canadian ethnicity have 33.7% higher odds of getting a callback than British candidates. The table marks this as significant at the 1% level.
Chinese candidate (0.800): Candidates with a Chinese ethnicity have 20% lower odds of getting a callback than British candidates. The table marks this as significant at the 1% level.
Chinese-Canadian candidate (0.640): Candidates with a Chinese-Canadian ethnicity have 36% lower odds of receiving a callback than British candidates. The table marks this as significant at the 1% level.
Greek candidate (0.882): Greek candidates have 11.8% lower odds of receiving a callback than British candidates. The table marks this as significant at the 1% level.
Indian candidate (0.755): Indian candidates have 24.5% lower odds of getting a callback than British candidates. The table marks this as significant at the 1% level.
Pakistani candidate (0.582): Pakistani candidates have 41.8% lower odds of getting a callback than British candidates. The table marks this as significant at the 1% level.
Speaking Skills (1.027): For each one-unit increase in speaking skills, the odds of receiving a callback increase by 2.7%. The table marks this as significant at the 1% level, and the positive sign means speaking skills raise the odds of getting a callback.
Writing Skills (0.988): For each one-unit increase in writing skills, the odds of receiving a callback decrease by 1.2%. The table marks this as significant at the 1% level, meaning writing skills have a negative, but small, effect on the odds of getting a callback.
Social-Personal Skills (0.996): For each one-unit increase in social skills, the odds of receiving a callback decrease by 0.4%. The table marks this as significant at the 1% level, but the effect is very small and likely negligible in practical terms.
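As a rough cross-check on the significance statements, the Wald z-statistics can be recomputed on the log-odds scale from the numbers printed in the table. This is only a sketch: it assumes the values in parentheses are standard errors on the log-odds scale, and it uses rounded table values, so `summary(model1)` remains the authoritative source for p-values.

```r
# Odds ratios and standard errors copied from the regression table above.
# Assumption: the parenthesized values are standard errors on the log-odds scale.
or <- c(Canadian = 1.337, Chinese = 0.800, `Chinese-Canadian` = 0.640,
        Greek = 0.882, Indian = 0.755, Pakistani = 0.582,
        Speaking = 1.027, Writing = 0.988, Social = 0.996)
se <- c(0.118, 0.125, 0.146, 0.204, 0.123, 0.167, 0.005, 0.003, 0.004)

z <- log(or) / se        # Wald z-statistic on the log-odds scale
p <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal

round(data.frame(odds_ratio = or, z = z, p_value = p), 3)
```

Under these assumptions, the Greek and Social Skills rows, for example, give |z| well below 1.96, so their three-star markings should be checked against the model summary before being relied on.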
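# Assumption: model2 (the "All Jobs" column) was fit earlier with the same formula on the
# full data, e.g. glm(callback ~ name_ethnicity + language_skills, data = data, family = binomial)

# Programmer-only subsample
programmerdata <- subset(data, occupation_type == "Programmer")
model_Prog <- glm(callback ~ name_ethnicity + language_skills,
                  data = programmerdata, family = binomial)

# Retail-only subsample
retaildata <- subset(data, occupation_type == "Retail")
model_Retail <- glm(callback ~ name_ethnicity + language_skills,
                    data = retaildata, family = binomial)

stargazer(model2, model_Prog, model_Retail,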
type = 'html',
header = FALSE,
dep.var.labels = "Got a Callback",
covariate.labels = c( "Canadian", "Chinese", "Chinese-Canadian", "Greek",
"Indian", "Pakistani", "Language Skills" , "Intercept(British)"),
style = 'qje',
title = 'Regression Results: Callbacks and Language Skills by Ethnicity and Occupation',
notes.align = "l",
column.labels = c("All Jobs", "Programmer Jobs", "Retail Jobs"),
coef = list(exp(coef(model2)), exp(coef(model_Prog)), exp(coef(model_Retail))),
out = "model_summary_table.html")

| | Got a Callback | | |
|---|---|---|---|
| | All Jobs | Programmer Jobs | Retail Jobs |
| | (1) | (2) | (3) |
| Canadian | 1.314*** | 1.851*** | 0.830*** |
| | (0.118) | (0.497) | (0.275) |
| Chinese | 0.794*** | 1.724*** | 0.515* |
| | (0.124) | (0.500) | (0.290) |
| Chinese-Canadian | 0.639*** | 1.313** | 0.686** |
| | (0.146) | (0.562) | (0.329) |
| Greek | 0.883*** | 1.455* | 0.447 |
| | (0.203) | (0.767) | (0.582) |
| Indian | 0.745*** | 1.611*** | 0.480* |
| | (0.122) | (0.498) | (0.286) |
| Pakistani | 0.585*** | 0.850 | 0.447 |
| | (0.166) | (0.655) | (0.398) |
| Language Skills | 1.202*** | 1.163*** | 1.311*** |
| | (0.066) | (0.210) | (0.175) |
| Intercept(British) | 0.120 | 0.075 | 0.270 |
| | (0.106) | (0.468) | (0.241) |
| N | 12,910 | 1,172 | 1,391 |
| Log Likelihood | -4,137.972 | -401.585 | -582.409 |
| Akaike Inf. Crit. | 8,291.944 | 819.171 | 1,180.819 |
| Notes: | ***Significant at the 1 percent level. | | |
| | **Significant at the 5 percent level. | | |
| | *Significant at the 10 percent level. | | |
Canadian candidate: Positive effect starred as significant in Programmer Jobs (+85.1%) and All Jobs (+31.4%); negative effect starred as significant in Retail Jobs (-17%).
Chinese candidate: Positive effect starred as significant in Programmer Jobs (+72.4%); negative effect starred as significant in All Jobs (-20.6%) and Retail Jobs (-48.5%).
Chinese-Canadian candidate: Positive effect starred as significant in Programmer Jobs (+31.3%); negative effect starred as significant in All Jobs (-36.1%) and Retail Jobs (-31.4%).
Greek candidate: No significant effect in Retail Jobs; positive effect in Programmer Jobs (+45.5%); negative effect starred as significant in All Jobs (-11.7%).
Indian candidate: Negative effect starred as significant in All Jobs (-25.5%) and Retail Jobs (-52%); positive effect starred as significant in Programmer Jobs (+61.1%).
Pakistani candidate: Negative effect starred as significant in All Jobs (-41.5%); no significant effect in Programmer Jobs (-15.0%) or Retail Jobs (-55.3%).
Language Skills:
All Jobs: 20.2% increase in odds per unit increase in language skills.
Programmer Jobs: 16.3% increase in odds per unit increase in language skills.
Retail Jobs: 31.1% increase in odds per unit increase in language skills.
Language skills therefore raise the odds of a callback in all three samples: all jobs, programmer jobs, and retail jobs.
Programmer Jobs: Candidates with Canadian, Chinese, and Indian names show significantly higher odds of receiving a callback in technical roles. Pakistani candidates face neither a significant disadvantage nor a significant advantage in this category, as their result is not statistically significant (p-value = 0.679).
Retail Jobs: Pakistani candidates show no significant effect in retail jobs (p-value = 0.398), while Chinese, Chinese-Canadian, and Indian candidates have significantly lower odds of receiving callbacks for retail positions, indicating an ethnic disparity in customer-facing roles. Every retail odds ratio is below 1, so each group fares worse than the British reference group. One possible reading is that retail hiring was weak during the data collection period, although odds ratios measured relative to the British group cannot by themselves show this.
All Jobs: Only Canadian candidates see a significant positive effect (+31.4%) on callbacks across jobs; every other ethnicity has lower odds of receiving a callback than the British reference group.
Employers’ hiring decisions are influenced by both ethnic background and the specific job type, with ethnic disparities more pronounced in general and retail roles. In technical roles, such as programming, ethnic background seems to play a lesser role, suggesting that employers may prioritize technical skills over cultural fit in these contexts. Stereotypes (e.g., Indians being good at programming roles) may also play a role in the hiring process. However, unconscious biases and preferences likely affect hiring for customer-facing positions, where ethnicity may shape employers’ perceptions of a candidate’s suitability for the role. Language skills matter for all roles, suggesting that employers treat the ability to communicate as a necessity regardless of the specific role.
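# Assumption: model3 (the "All Jobs" column) was fit earlier with the same formula on the
# full data, e.g. glm(callback ~ name_ethnicity * language_skills, data = data, family = binomial)

# Ethnicity x language skills interaction, programmer-only subsample
programmerinteraction <- glm(callback ~ name_ethnicity * language_skills,
                             data = programmerdata, family = binomial)

# Ethnicity x language skills interaction, retail-only subsample
retailinteraction <- glm(callback ~ name_ethnicity * language_skills,
                         data = retaildata, family = binomial)

stargazer(model3, programmerinteraction, retailinteraction,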
type = 'html',
header = FALSE,
dep.var.labels = "Got a Callback",
covariate.labels = c("Canadian", "Chinese", "Chinese-Canadian", "Greek",
"Indian", "Pakistani", "Language Skills", "Canadian*Language Skills", "Chinese*Language Skills" ,
"Chinese Canadian*Language Skills", "Greek*Language Skills",
"Indian*Language Skills", "Pakistani*Language Skills", "British (Intercept)"),
style = 'qje',
title = 'Regression Results: Interaction Between Ethnicity and Language Skills for Callbacks',
column.labels = c("All Jobs", "Programmer Jobs", "Retail Jobs"),
coef = list(exp(coef(model3)), exp(coef(programmerinteraction)), exp(coef(retailinteraction))),
out = "interaction_model_summary_table.html")

| | Got a Callback | | |
|---|---|---|---|
| | All Jobs | Programmer Jobs | Retail Jobs |
| | (1) | (2) | (3) |
| Canadian | 1.269*** | 1.579*** | 0.764** |
| | (0.131) | (0.562) | (0.307) |
| Chinese | 0.789*** | 1.705*** | 0.430 |
| | (0.139) | (0.561) | (0.327) |
| Chinese-Canadian | 0.594*** | 1.215* | 0.514 |
| | (0.165) | (0.639) | (0.376) |
| Greek | 0.881*** | 1.412 | 0.257 |
| | (0.241) | (0.911) | (0.780) |
| Indian | 0.687*** | 1.419** | 0.453 |
| | (0.138) | (0.563) | (0.320) |
| Pakistani | 0.484** | 0.571 | 0.253 |
| | (0.194) | (0.787) | (0.499) |
| Language Skills | 0.943*** | 0.750 | 0.676 |
| | (0.275) | (1.155) | (0.610) |
| Canadian*Language Skills | 1.222*** | 1.889 | 1.501** |
| | (0.299) | (1.219) | (0.697) |
| Chinese*Language Skills | 1.099*** | 1.060 | 2.250*** |
| | (0.313) | (1.239) | (0.720) |
| Chinese Canadian*Language Skills | 1.407*** | 1.411 | 3.596*** |
| | (0.357) | (1.356) | (0.802) |
| Greek*Language Skills | 1.117** | 1.259 | 6.165*** |
| | (0.462) | (1.733) | (1.256) |
| Indian*Language Skills | 1.425*** | 1.676 | 1.407** |
| | (0.305) | (1.218) | (0.713) |
| Pakistani*Language Skills | 2.181*** | 4.308*** | 8.518*** |
| | (0.392) | (1.503) | (0.918) |
| British (Intercept) | 0.126 | 0.083 | 0.311 |
| | (0.116) | (0.520) | (0.263) |
| N | 12,910 | 1,172 | 1,391 |
| Log Likelihood | -4,134.734 | -400.444 | -577.596 |
| Akaike Inf. Crit. | 8,297.469 | 828.887 | 1,183.193 |
| Notes: | ***Significant at the 1 percent level. | | |
| | **Significant at the 5 percent level. | | |
| | *Significant at the 10 percent level. | | |
General Jobs: The interaction terms are multiplicative adjustments to the language skills odds ratio of the British reference group (0.943). For Canadian candidates, the interaction with language skills is positive and starred as significant (odds ratio = 1.222). Candidates from Chinese, Chinese-Canadian, Greek, and Indian backgrounds also show a positive interaction with language skills, though the magnitude varies. Pakistani candidates show the largest positive interaction (odds ratio = 2.181), indicating that language proficiency boosts their callback chances in general jobs more than for any other group. In short, relative to British candidates, language skills raise the likelihood of a callback for every other ethnic group, and most of all for Pakistani candidates.
Programmer Jobs: Pakistani candidates show a very strong positive interaction with language skills (odds ratio = 4.308, p < 0.01), meaning they benefit substantially from strong language skills in programming roles. Canadian, Chinese, and Indian candidates also show positive effects, but none is significant, and Greek candidates likewise show no significant effect from language skills in programming roles. Language skills, in addition to technical competency, are likely expected of Pakistani candidates more than of others. This points to a likely bias that requires Pakistani candidates to show additional skills in order to receive a callback.
Retail Jobs: Language skills significantly improve callback chances for every non-British group relative to the British baseline. Pakistani candidates see the largest increase (odds ratio = 8.518), followed by Greek (6.165), Chinese-Canadian (3.596), and Chinese candidates (odds ratio = 2.250, p < 0.01). Canadian and Indian candidates also benefit significantly. Language skills thus strongly raise the likelihood of a callback across these ethnicities in retail jobs.
In question (e), ethnicity and language skills are treated as separate, independent predictors. Each ethnic group has its own effect on the likelihood of a callback, and language skills enter as a separate main effect. However, that model does not consider how the impact of language skills might differ across ethnicities.
In this model, the interaction term recognizes that the effect of language skills might not be the same for all ethnic groups and allows the relationship to vary by ethnicity. It shows that language skills have a stronger or weaker effect for certain ethnicities than for others, across job categories.
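To make the interaction concrete, the implied odds ratio for language skills within each group is the product of the main-effect odds ratio and that group's interaction odds ratio. The sketch below uses the exponentiated values from the "All Jobs" column of the interaction table; the object names are illustrative only.

```r
# Exponentiated coefficients copied from the "All Jobs" column of the interaction table.
lang_main <- 0.943  # language skills odds ratio for the British reference group
interaction_or <- c(Canadian = 1.222, Chinese = 1.099, `Chinese-Canadian` = 1.407,
                    Greek = 1.117, Indian = 1.425, Pakistani = 2.181)

# Implied odds ratio for language skills within each group:
# exp(b_lang + b_group:lang) = exp(b_lang) * exp(b_group:lang)
lang_or_by_group <- c(British = lang_main, lang_main * interaction_or)
round(lang_or_by_group, 3)
```

For Pakistani candidates, for example, this gives 0.943 × 2.181 ≈ 2.057, compared with 0.943 for the British reference group. These are point estimates from rounded table values and carry no significance information on their own.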
# plotly provides plot_ly() and layout()
library(plotly)
heatmapplot <- plot_ly(data = summary_table,
                       x = ~name_ethnicity,
                       y = ~occupation_type,
                       z = ~total_callbacks,
                       type = "heatmap",
                       colors = "Oranges") %>%
  layout(title = "Number of Callbacks by Job Type and Ethnicity",
         xaxis = list(title = "Ethnicity of Name"),  # x is name_ethnicity
         yaxis = list(title = "Job Type"),           # y is occupation_type
         coloraxis = list(colorbar = list(title = "Number of Callbacks")))
heatmapplot

Remember, DAGs go from left to right in temporal order!
Based on her findings, the researcher is not justified in claiming a causal relationship between education and income. Her findings suggest a correlation between income and education, but she must first rule out confounding variables that could have produced this relationship. Confounders such as socioeconomic status, networking, and work experience can influence both the level of education achieved and the income the students went on to earn. For example, an individual with a high socioeconomic status would have access to better institutions and, therefore, better networks, enabling them to gain more relevant work experience, which in turn raises their income. Reverse causality is also possible: people who earn more can spend more on higher education, or may need further education to improve their professional prospects. Neither direction of causation can be established without controlling for these confounders.
#install.packages("dagitty")
library(dagitty)
dag <- dagitty("dag {
Socioeconomic_Status -> Education
Socioeconomic_Status -> Networking
Socioeconomic_Status -> Work_Experience
Socioeconomic_Status -> Income
Education -> Work_Experience
Education -> Income
Networking -> Work_Experience
Networking -> Income
Work_Experience -> Income
}")
plot(dag)

It would not be correct to draw a causal relationship from this researcher’s findings, for several reasons. As in the previous question, confounding variables have to be considered to understand the relationship between education and income. Apart from the scholarship itself, student motivation affects both the chance of receiving the scholarship and later income, and networks both help students learn about the scholarship and boost their chances of finding well-paying jobs. There may also be systematic factors that make scholarship recipients different from non-recipients, that is, selection bias in the sample. To support a causal claim, the treatment (the independent variable, here the scholarship) has to be randomly assigned. Here, the researcher only compares scholarship recipients with those who did not receive it. There is no random assignment of people to treatment and control groups, so the groups could differ from each other in ways that are not controlled for. This makes a direct causal claim difficult to defend.
dag <- dagitty('
dag {
"Student Motivation" -> "Education" -> "Income"
"Student Motivation" -> "Income"
"Student Networks" -> "Education"
"Student Networks" -> "Income"
"Student Motivation" -> "Student Networks"
}
')
plot(dag)

The causal relationship between education and income drawn by the researcher is not warranted, for several reasons.

First, there may be confounding variables that affect both education and income in the cities that received the randomized intervention. Factors such as the pre-existing economic status of a city and its degree of urbanization affect the education and income of the individuals living there. Randomizing which cities get parks or kindergartens does not mean that every potential confounder of the outcome is accounted for at the individual level. The random assignment occurs at the city level, but the dependent variable is measured at the individual level, so factors such as the city’s income distribution and individuals’ proximity to the service are not controlled.

Second, a kindergarten is an educational intervention, whereas a park is a leisure/public-good intervention. The effects of the two may not be directly comparable, because people engage with these services differently: kindergartens serve families with children of kindergarten age, while parks are open to everyone. Prescribing a park versus a kindergarten also implies that education and leisure/community well-being are opposed to each other.

Third, the introduction of kindergartens and parks may have caused spillover effects on the city that this research has not captured, which could account for the differences.
In the DAG, Kindergarten represents the educational intervention, which appears to lead to higher income, and Park represents the non-educational intervention, which appears to lead to lower income.
dag <- dagitty('
dag {
"Pre-existing Economic Status" -> "Kindergarten"
"Pre-existing Economic Status" -> "Park"
"Pre-existing Economic Status" -> "Income"
"Urbanization" -> "Kindergarten"
"Urbanization" -> "Park"
"Urbanization" -> "Income"
"Kindergarten" -> "Income"
"Park" -> "Income"
}
')
plot(dag)

By providing the control group with a park, the researcher can compare two types of intervention: one related to education (kindergarten) and one related to community infrastructure (park). This helps isolate the specific effect of education on income while controlling for other potential confounding factors (e.g., general community improvements). The kindergarten intervention focuses on early childhood education, which improves cognitive development and enhances future income potential. In contrast, the park intervention aims to improve leisure, physical health, and community well-being but does not directly affect education or cognitive development.

If the control group received nothing, it would be difficult to tell whether observed differences in income were due to the kindergarten intervention itself or simply to the experimental group receiving something (any intervention) while the control group did not. Also, because this is a public intervention experiment, withholding public goods for the sake of an experiment would raise ethical concerns about accessibility and equality among citizens.