Quick intro to my last year’s research

Hey! As I have mentioned a few times, my course paper last year (and, actually, this year too) is an exploratory analysis of how parents choose schools: it starts from the first differences in the behavior of families with different socio-economic statuses and leads to an understanding of their various strategies of choice. The main hypothesis is that, even with equal enrollment rights and equal opportunities to choose among the alternatives, educational capital is still reproduced and social differentiation persists.

In that research I did a quantitative analysis of survey data, based on statistical tests (chi-square, t-test, ANOVA) and binary regression analysis.

Today I am going to look at this research from the Bayesian point of view.

Research question

I started last year's research by saying that I was interested in the school choice tendencies of families of different statuses, and that I would explore how the choice of an educational institution for a child relates to socio-demographic characteristics such as parents' education and socio-economic and cultural status.

In terms of Bayesian thinking, my crucial question would be: are there any differences in school choice behavior between parents of different socio-economic statuses? As a starting point I will consider whether the mother has higher education, and then look at the probability that parents chose a school from among alternative options.

And then, the most pleasant thing for me: I have all the materials to try to replicate a part of my study from a Bayesian perspective. So, enjoy~

Rethinking the analysis

So, to think from a Bayesian perspective we need to analyse a set of probabilities of our events. Thanks to the data, we can count them quite easily. We have the answers to the question of whether parents considered more than one option while choosing the school, and to the question about parental educational capital. In my analysis I am using a binary variable for mother's education: whether she has higher education or not (unfinished included).

The prior probability in this case is the plain probability that parents chose a school among alternatives. The answer to this question is in the database, and the results are shown below (p(choose) = 0.38).

## Frequencies  
## bayes_ds$q24  
## Type: Factor  
## 
##                                         Freq        %   % Cum.
## ------------------------------------- ------ -------- --------
##       Considered more than one option    796    38.16    38.16
##            Considered one option only   1290    61.84   100.00
##                                 Total   2086   100.00   100.00

Then the marginal likelihood: the probability that the mother has higher education. This information is also available in the database (p(HE) = 0.71).

freq(bayes_ds$momedu2,
     report.nas = FALSE) 
## Frequencies  
## bayes_ds$momedu2  
## Type: Factor  
## 
##                             Freq        %   % Cum.
## ------------------------- ------ -------- --------
##          Higher education   1481    71.00    71.00
##       NO higher education    605    29.00   100.00
##                     Total   2086   100.00   100.00

Having these two probability values, we can go further: to find the probability that parents choose a school given that the mother has higher education, we multiply the prior probability by the likelihood (the probability that the mother has higher education among parents who chose a school) and divide the result by our marginal likelihood. In short:

\[p(choose|HE) = \frac{p(HE|choose)*p(choose)}{p(HE)}\]

ctable(x = bayes_ds$momedu2,
       y = bayes_ds$q24,
       prop = "t")
## Cross-Tabulation, Total Proportions  
## momedu2 * q24  
## Data Frame: bayes_ds  
## 
## --------------------- ----- --------------------------------- ---------------------------- ---------------
##                         q24   Considered more than one option   Considered one option only           Total
##               momedu2                                                                                     
##      Higher education                             607 (29.1%)                  874 (41.9%)   1481 ( 71.0%)
##   NO higher education                             189 ( 9.1%)                  416 (19.9%)    605 ( 29.0%)
##                 Total                             796 (38.2%)                 1290 (61.8%)   2086 (100.0%)
## --------------------- ----- --------------------------------- ---------------------------- ---------------

The cross-tabulation shows total proportions, so its 0.29 is the joint probability of higher education and choosing, not the likelihood. The likelihood is the share of higher-educated mothers among parents who chose: p(HE|choose) = 607 / 796 = 0.76.

\[\frac{0.76*0.38}{0.71} = 0.41\]

So, basically, the probability that parents will choose a school for their children among the alternatives, given that the mother has higher education, is a function of

  • how likely the mother is to have higher education, given that parents chose a school (p = 0.76),
  • our prior belief about how often parents choose a school (p = 0.38)
  • and the probability that the mother has higher education (p = 0.71).

Our belief, having considered the data, is 0.41: there is a 41% chance that parents with higher educational capital would choose a school for their children.
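The terms of Bayes' theorem can also be computed directly from the counts in the cross-tabulation. The following is a minimal R sketch, not the original analysis code: the variable names are made up for illustration and the counts are typed in by hand from the table. Note that the likelihood p(HE|choose) is the share of higher-educated mothers among parents who chose (607 of 796), not the 29.1% cell, which is a share of the whole sample.

```r
# Counts copied by hand from the cross-tabulation of momedu2 by q24
n_total     <- 2086  # all respondents
n_choose    <- 796   # considered more than one option
n_he        <- 1481  # mother has higher education
n_he_choose <- 607   # higher education AND considered more than one option

p_choose          <- n_choose / n_total      # prior, p(choose)
p_he              <- n_he / n_total          # marginal likelihood, p(HE)
p_he_given_choose <- n_he_choose / n_choose  # likelihood, p(HE | choose)

# Bayes' theorem: p(choose | HE) = p(HE | choose) * p(choose) / p(HE)
posterior <- p_he_given_choose * p_choose / p_he
round(posterior, 2)  # 0.41

# Sanity check: the same number is the plain row proportion 607 / 1481
isTRUE(all.equal(posterior, n_he_choose / n_he))  # TRUE
```

The sanity check at the end is the useful part: with a complete 2x2 table, Bayes' theorem and the row percentage of the table must agree.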

\[p(choose|!HE) = \frac{p(!HE|choose)*p(choose)}{p(!HE)}\]

Here the likelihood is p(!HE|choose) = 189 / 796 = 0.24; the 0.09 in the cross-tabulation is the joint probability, not the conditional one.

\[\frac{0.24*0.38}{0.29} = 0.31\]

In addition, I decided to check the probability that parents with no higher education would use the opportunity of school choice: there is a 31% chance that parents without higher educational capital would choose a school for their children.

Now we can say that yes, the probability of considering different options while making a school choice is lower for parents with lower educational capital, by about ten percentage points, but these two numbers alone do not tell us how strong this difference is.
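To put the two groups side by side, both conditional probabilities can be read off the row proportions of the same 2x2 table. This is a sketch under stated assumptions: the counts are typed in by hand from the cross-tabulation, and the object names are hypothetical rather than taken from the original script.

```r
# 2x2 table of counts, rows = mother's education, columns = q24 answer
tab <- matrix(
  c(607, 874,
    189, 416),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    momedu2 = c("Higher education", "NO higher education"),
    q24     = c("Considered more than one option", "Considered one option only")
  )
)

# Row proportions give p(answer | mother's education)
row_props <- prop.table(tab, margin = 1)
p_choose_by_edu <- row_props[, "Considered more than one option"]
round(p_choose_by_edu, 2)  # about 0.41 and 0.31

# Gap between the groups, in percentage points and as a ratio
p_he    <- p_choose_by_edu[["Higher education"]]
p_no_he <- p_choose_by_edu[["NO higher education"]]
diff_points <- 100 * (p_he - p_no_he)
prob_ratio  <- p_he / p_no_he
round(diff_points, 1)  # 9.7
round(prob_ratio, 2)   # 1.31
```

The difference and the ratio describe the size of the gap, but they say nothing yet about sampling uncertainty; that is what the chi-square test is for.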

For example, this is how I analysed such cases in my work: I used a chi-square test and then looked at the residuals to understand how strong the differences in the behavior of the two groups are. The bar chart shows the differences quite nicely and prints the test results, too.

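library(sjPlot)
library(ggplot2)
library(magrittr)  # provides the %>% pipe used below
set_theme(base = theme_classic())

# 2x2 table: mother's education by whether other options were considered
t_momedu_24 <- with(bayes_ds, table(momedu2, q24))
chi_momedu_24 <- chisq.test(t_momedu_24)
chi_momedu_24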
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t_momedu_24
## X-squared = 16.879, df = 1, p-value = 3.984e-05
chi_momedu_24$stdres %>% round(digits = 2)
##                      q24
## momedu2               Considered more than one option
##   Higher education                               4.16
##   NO higher education                           -4.16
##                      q24
## momedu2               Considered one option only
##   Higher education                         -4.16
##   NO higher education                       4.16
bayes_ds %>%
  sjtab(fun = "xtab", var.labels=c("Mother's education", "Considering other options"),
        show.row.prc=T, show.col.prc=T, show.summary=T, show.exp=T, show.legend=T, encoding = "UTF-8")
Cell contents: observed values / expected values / % within Mother's education / % within Considering other options

                        Considering other options
Mother's education      Considered more than one option   Considered one option only      Total
Higher education        607 / 565 / 41 % / 76.3 %         874 / 916 / 59 % / 67.8 %       1481 / 1481 / 100 % / 71 %
NO higher education     189 / 231 / 31.2 % / 23.7 %       416 / 374 / 68.8 % / 32.2 %     605 / 605 / 100 % / 29 %
Total                   796 / 796 / 38.2 % / 100 %        1290 / 1290 / 61.8 % / 100 %    2086 / 2086 / 100 % / 100 %

χ2=16.879 · df=1 · φ=0.091 · p=0.000

sjp.xtab(bayes_ds$q24, bayes_ds$momedu2,  
         margin = "row", 
         bar.pos = "stack",  
         axis.titles = "Considering other options", 
         legend.title = "Mother's education", 
         show.summary = TRUE, 
         coord.flip = TRUE) 
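The standardized residuals and the φ value reported above can be reproduced by hand, which shows what they measure. This is only a sketch: the counts are typed in from the cross-tabulation, the object names are hypothetical, and I am assuming that φ in the summary line is based on the chi-square statistic without the Yates correction.

```r
# Same 2x2 counts as in the cross-tabulation
tab <- matrix(
  c(607, 874,
    189, 416),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    momedu2 = c("Higher education", "NO higher education"),
    q24     = c("Considered more than one option", "Considered one option only")
  )
)

n     <- sum(tab)
row_p <- rowSums(tab) / n  # marginal shares of the rows
col_p <- colSums(tab) / n  # marginal shares of the columns

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Standardized residual: (observed - expected) scaled by its standard error
stdres <- (tab - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))
round(stdres, 2)  # 4.16 and -4.16, as in the chisq.test() output

# Phi effect size from the uncorrected chi-square statistic (assumption)
chi_uncorrected <- chisq.test(tab, correct = FALSE)$statistic
phi <- sqrt(unname(chi_uncorrected) / n)
round(phi, 3)  # 0.091
```

A residual above roughly 2 in absolute value marks a cell that departs noticeably from independence, while φ close to 0.1 suggests the association itself is weak even though it is statistically significant in a sample of this size.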

Conclusion

So, what is the difference between the results of the two ways of analysis? According to Bayesian thinking, there is a 41% chance that parents with higher educational capital would use the opportunity to choose a school for their children (and a 31% chance that parents without higher educational capital would use such an opportunity). The chi-square analysis says that mothers with higher education use the opportunity to choose a school for their children more often than mothers without higher education. Basically, both conclusions are right, but they highlight different details: the Bayesian analysis gives the probability of an event, while the chi-square test shows whether the difference is significant.
