Hey! As I have mentioned before, my course paper last year (and, actually, this year too) is devoted to an exploratory analysis of how parents choose schools: starting from the very first differences in the behavior of families with different socio-economic status and leading to an understanding of the various strategies of choice. The main hypothesis is that, even with similar enrollment rights and similar opportunities to choose among the alternatives, educational capital is still reproduced and social differentiation persists.
In my research I did a quantitative analysis of survey data, based on statistical tests (chi-square, t-test, ANOVA) and binary regression analysis.
Today I am going to look at this research from the Bayesian point of view.
I started last year’s research by saying that I wanted to examine the school choice tendencies of families of different statuses by exploring how the choice of an educational institution for a child relates to socio-demographic characteristics such as the parents’ education and their socio-economic and cultural status.
In terms of Bayesian thinking, my crucial question would be: are there any differences in school choice behavior between parents of different socio-economic statuses? As a starting point, I consider whether or not the mother has higher education and then look at the probability that the parents chose a school among alternative options.
And then, the most pleasant thing for me: I have all the materials to try to replicate a part of my study from a Bayesian perspective. So, enjoy~
So, to think from a Bayesian perspective we need to analyse a set of probabilities of our events. Thanks to the data, we can count them quite easily. We have the answers to the question of whether parents considered more than one option while choosing the school and to the question about parental educational capital. In my analysis I am using a binary variable for the mother’s education: whether she has higher education or not (unfinished included).
The prior probability in this case is simply the probability that parents chose a school among several options. The answer to this question is in the database, and the results are shown below (p(choose) = 0.38).
## Frequencies
## bayes_ds$q24
## Type: Factor
##
## Freq % % Cum.
## ------------------------------------- ------ -------- --------
## Considered more than one option 796 38.16 38.16
## Considered one option only 1290 61.84 100.00
## Total 2086 100.00 100.00
Then comes the marginal likelihood: the probability that the mother has higher education. This information is also available in the database (p(HE) = 0.71).
freq(bayes_ds$momedu2,
report.nas = FALSE)
## Frequencies
## bayes_ds$momedu2
## Type: Factor
##
## Freq % % Cum.
## ------------------------- ------ -------- --------
## Higher education 1481 71.00 71.00
## NO higher education 605 29.00 100.00
## Total 2086 100.00 100.00
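As a quick check, both probabilities can be recomputed by hand from the counts in the two frequency tables. This is only a sketch: the variable names (`n_total`, `n_choose`, `n_he`) are mine, and the counts are typed in rather than read from `bayes_ds`.

```r
# Counts copied from the two frequency tables above (typed in by hand)
n_total  <- 2086  # all respondents
n_choose <- 796   # "Considered more than one option"
n_he     <- 1481  # "Higher education"

p_choose <- n_choose / n_total  # prior, p(choose)
p_he     <- n_he / n_total      # marginal likelihood, p(HE)

# Rounded to two decimals these are 0.38 and 0.71
print(round(c(p_choose = p_choose, p_he = p_he), 2))
```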
Having these two probabilities, we can go further: to find the probability that parents choose a school given that the mother has higher education, we multiply the prior probability by the likelihood, that is, the probability that the mother has higher education given that the parents chose among several schools, and divide the product by the marginal likelihood. In short:
\[p(choose|HE) = \frac{p(HE|choose)*p(choose)}{p(HE)}\]
ctable(x = bayes_ds$momedu2,
y = bayes_ds$q24,
prop = "t")
## Cross-Tabulation, Total Proportions
## momedu2 * q24
## Data Frame: bayes_ds
##
## --------------------- ----- --------------------------------- ---------------------------- ---------------
## q24 Considered more than one option Considered one option only Total
## momedu2
## Higher education 607 (29.1%) 874 (41.9%) 1481 ( 71.0%)
## NO higher education 189 ( 9.1%) 416 (19.9%) 605 ( 29.0%)
## Total 796 (38.2%) 1290 (61.8%) 2086 (100.0%)
## --------------------- ----- --------------------------------- ---------------------------- ---------------
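A cross-tabulation can be read in three ways: as total (joint) proportions, as row proportions, or as column proportions. The sketch below rebuilds the table from the counts above and uses base R's `prop.table()` for all three. The matrix `tab` and its short labels are my own shorthand, not objects from the original analysis.

```r
# 2x2 table of counts, rows = mother's education, columns = q24
tab <- matrix(c(607, 874,
                189, 416),
              nrow = 2, byrow = TRUE,
              dimnames = list(momedu2 = c("HE", "noHE"),
                              q24     = c("more_than_one", "one_only")))

joint  <- prop.table(tab)              # cell / grand total: p(momedu2 and q24)
by_row <- prop.table(tab, margin = 1)  # cell / row total:   p(q24 | momedu2)
by_col <- prop.table(tab, margin = 2)  # cell / column total: p(momedu2 | q24)

# For the "HE" x "more_than_one" cell:
joint["HE", "more_than_one"]   # 607 / 2086, the 29.1% shown by prop = "t"
by_col["HE", "more_than_one"]  # 607 / 796,  p(HE | choose), about 0.76
by_row["HE", "more_than_one"]  # 607 / 1481, p(choose | HE), about 0.41
```

Only the column proportion is the likelihood p(HE|choose) that goes into the numerator of Bayes' formula; the total proportion is the joint probability of both events.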
Note that the cross-tabulation above reports total proportions, so its 29.1% cell is the joint probability of both events, not the likelihood. The likelihood is the share of mothers with higher education among the parents who considered more than one option: p(HE|choose) = 607/796 = 0.76.
\[\frac{0.76*0.38}{0.71} = 0.41\]
So, basically, the probability that parents will choose a school for their children among the alternatives, given that the mother has higher education, is a function of the likelihood, the prior and the marginal likelihood.
Our belief, having considered the data, is 0.41: there is a 41% chance that parents with higher educational capital would choose a school for their children.
\[p(choose|!HE) = \frac{p(!HE|choose)*p(choose)}{p(!HE)}\]
\[\frac{0.24*0.38}{0.29} = 0.31\]
In addition, I decided to check the probability that parents without higher education would use the opportunity of school choice. Here p(!HE|choose) = 189/796 = 0.24, and the result is that there is a 31% chance that parents without higher educational capital would choose a school for their children.
Now we can say that yes, the probability of considering different options when choosing a school is lower for parents with lower educational capital, by about ten percentage points, but from these two numbers alone we cannot say how strong this difference is.
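The two applications of Bayes' formula can be sketched in a few lines of R. The helper `bayes_rule()` and the variable names are hypothetical, written only for this illustration, and the counts are typed in from the cross-tabulation. The likelihood here is the share of mothers with (or without) higher education among the 796 parents who considered more than one option.

```r
# Hypothetical helper: posterior = likelihood * prior / marginal likelihood
bayes_rule <- function(likelihood, prior, evidence) {
  likelihood * prior / evidence
}

n        <- 2086
p_choose <- 796 / n    # prior, p(choose)
p_he     <- 1481 / n   # marginal likelihood, p(HE)
p_no_he  <- 605 / n    # marginal likelihood, p(!HE)

lik_he    <- 607 / 796  # p(HE | choose)
lik_no_he <- 189 / 796  # p(!HE | choose)

post_he    <- bayes_rule(lik_he, p_choose, p_he)        # p(choose | HE)
post_no_he <- bayes_rule(lik_no_he, p_choose, p_no_he)  # p(choose | !HE)

# Rounded to two decimals: 0.41 and 0.31
print(round(c(HE = post_he, no_HE = post_no_he), 2))

# Sanity check: Bayes' rule must agree with the direct row proportions
stopifnot(isTRUE(all.equal(post_he, 607 / 1481)),
          isTRUE(all.equal(post_no_he, 189 / 605)))
```

With a complete 2x2 table the posterior equals the plain row proportion, so Bayes' formula here is mostly a way of making the reasoning explicit.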
For example, this is how I analysed such cases in my work: I used a chi-square test and then examined the residuals to understand how strong the differences in the behavior of the two groups are. The bar chart shows the differences quite nicely and prints the test results, too.
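library(sjPlot)
library(ggplot2)
library(magrittr)  # provides the %>% pipe used below
set_theme(base = theme_classic())
# Cross-tabulate mother's education against school-choice behavior
t_momedu_24 <- with(bayes_ds, table(momedu2, q24))
# Chi-square test of independence (Yates' correction is the default for 2x2 tables)
chi_momedu_24 <- chisq.test(t_momedu_24)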
chi_momedu_24
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: t_momedu_24
## X-squared = 16.879, df = 1, p-value = 3.984e-05
chi_momedu_24$stdres %>% round(digits = 2)
## q24
## momedu2 Considered more than one option
## Higher education 4.16
## NO higher education -4.16
## q24
## momedu2 Considered one option only
## Higher education -4.16
## NO higher education 4.16
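To see where the standardized residuals come from, here is a sketch that recomputes them by hand from the counts: (observed - expected) divided by an estimate of the standard error of that difference. The matrix `tab` with its short labels is my own stand-in for the real table.

```r
# Observed counts, rows = mother's education, columns = q24
tab <- matrix(c(607, 874,
                189, 416),
              nrow = 2, byrow = TRUE,
              dimnames = list(momedu2 = c("HE", "noHE"),
                              q24     = c("more_than_one", "one_only")))

n         <- sum(tab)
row_share <- rowSums(tab) / n
col_share <- colSums(tab) / n

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Standardized residuals: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
stdres <- (tab - expected) / sqrt(expected * outer(1 - row_share, 1 - col_share))
print(round(stdres, 2))
```

Values beyond roughly ±2 mark cells where the observed count is clearly further from the expected count than chance alone would suggest.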
bayes_ds %>%
sjtab(fun = "xtab", var.labels=c("Mother's education", "Considering other options"),
show.row.prc=T, show.col.prc=T, show.summary=T, show.exp=T, show.legend=T, encoding = "UTF-8")
| Mother’s education | Considered more than one option | Considered one option only | Total |
|---|---|---|---|
| Higher education | 607 / 565 / 41 % / 76.3 % | 874 / 916 / 59 % / 67.8 % | 1481 / 1481 / 100 % / 71 % |
| NO higher education | 189 / 231 / 31.2 % / 23.7 % | 416 / 374 / 68.8 % / 32.2 % | 605 / 605 / 100 % / 29 % |
| Total | 796 / 796 / 38.2 % / 100 % | 1290 / 1290 / 61.8 % / 100 % | 2086 / 2086 / 100 % / 100 % |

χ2=16.879 · df=1 · φ=0.091 · p=0.000

Each cell lists, in order: observed values / expected values / % within Mother’s education / % within Considering other options.
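The φ coefficient in the summary line is an effect size, and it is what tells us how strong the association is. For a 2x2 table it can be sketched as the square root of chi-square divided by the sample size. I am assuming here that φ is based on the chi-square statistic without Yates' correction, since that reproduces the reported 0.091; `tab` is again a hand-typed copy of the counts.

```r
# Observed counts, rows = mother's education, columns = q24
tab <- matrix(c(607, 874,
                189, 416),
              nrow = 2, byrow = TRUE)

# Chi-square without the continuity correction (assumption, see above)
chi_uncorrected <- chisq.test(tab, correct = FALSE)

# Phi for a 2x2 table: sqrt(chi-square / n)
phi <- sqrt(unname(chi_uncorrected$statistic) / sum(tab))
print(round(phi, 3))  # 0.091
```

By the usual rule of thumb a φ below 0.1 is a very small effect, so the difference between the groups is statistically significant but weak.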
sjp.xtab(bayes_ds$q24, bayes_ds$momedu2,
margin = "row",
bar.pos = "stack",
axis.titles = "Considering other options",
legend.title = "Mother's education",
show.summary = TRUE,
coord.flip = TRUE)
So, what are the differences between the results of the two ways of analysis? According to Bayesian thinking, there is a 41% chance that parents with higher educational capital would use the opportunity to choose a school for their children (and a 31% chance that parents without higher educational capital would use such an opportunity). The chi-square analysis says that mothers with higher education use the opportunity to choose a school for their children more often than mothers without higher education. Basically, both conclusions are right, but they highlight different details: the Bayesian analysis gives the probability of an event, while the chi-square test shows whether the difference is statistically significant.