Rationale

Bearman & Brueckner (2002) claimed to rule out intrauterine hormonal transfer as an explanation for the observation that males from dizygotic (fraternal) opposite-sex (DZOS) twin pairs show elevated rates of homosexuality. They did this by testing predictions derived from the competing explanations. They concluded that hormones did not play a role because DZOS males who also had older brothers showed rates of homosexuality typical of the sample.

They did not report what share of the sample had older brothers only, older sisters only, both, or neither. We know that some of the sample had older sisters, but we do not know how often older sisters co-occurred with older brothers, so a lot of detail is missing. Their sample contained 185 males from opposite-sex dizygotic twin pairs: 18.7% of those without older brothers and 8.8% of those with older brothers were homosexual. If we are maximally charitable and assume the groups with and without older brothers were balanced, we can estimate confidence intervals. Let’s assume 92 in one group and 93 in the other; the choice makes little difference.
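For reference, the calculation in the Analysis section is the standard unpooled (Wald) interval for a difference between two independent proportions. The symbols are introduced here for illustration: p_1 and n_1 are the rate and assumed size of the group without older brothers, and p_2 and n_2 are those of the group with older brothers. The sizes 93 and 92 are the charitable assumption above, not figures reported by Bearman & Brueckner.

```latex
\[
\mathrm{SE}_{\text{diff}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},
\qquad
Z = \frac{p_1 - p_2}{\mathrm{SE}_{\text{diff}}},
\qquad
\text{95\% CI} = (p_1 - p_2) \pm 1.96\,\mathrm{SE}_{\text{diff}}
\]
```

Plugging in p_1 = .187, n_1 = 93, p_2 = .088 and n_2 = 92 gives a standard error of about 0.050, so the interval is roughly 0.099 ± 0.098.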

Analysis

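#Without older brothers (reported rate 18.7%), assumed n = 93

nNo = 93
pNo = .187
qNo = 1 - pNo

SENo = sqrt((pNo * qNo)/nNo)

#With older brothers (reported rate 8.8%), assumed n = 92

nOld = 92
pOld = .088
qOld = 1 - pOld

SEOld = sqrt((pOld * qOld)/nOld)

#Unpooled standard error of the difference, and the difference itself
SEDiff = sqrt(SENo^2 + SEOld^2)
Diff = pNo - pOld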

cat(paste0("The 95% confidence interval for the difference in rates of homosexuality between dizygotic OS twin males with and without older brothers ranges from ", round(Diff - 1.96 * SEDiff, 3), " to ", round(Diff + 1.96 * SEDiff, 3), ". \n"))
## The 95% confidence interval for the difference in rates of homosexuality between dizygotic OS twin males with and without older brothers ranges from 0.001 to 0.197.
SEDiff
## [1] 0.05007085
SEDiff * 1.96
## [1] 0.09813886
Diff
## [1] 0.099
Diff/SEDiff
## [1] 1.977198
pnorm(Diff/SEDiff, lower.tail = F) * 2
## [1] 0.04801921

This p-value of .048 is far from convincing, and it depends on the two groups being almost exactly equal in size, which is hard to believe. The result is no longer significant if the balance is even slightly off, for example because many men have no older siblings or only older sisters and so fall into the group without older brothers. The same happens if even a handful of people are removed from the sample because their family information is unknown. The function below checks a few such scenarios.

ProportionDifference <- function(N1, N2, Prop1, Prop2, Z = 1.96, rnd = 3, tail = 2){
  #Unpooled standard error of each proportion and of their difference
  SE1 = sqrt(Prop1 * (1 - Prop1)/N1); SE2 = sqrt(Prop2 * (1 - Prop2)/N2)
  SED = sqrt(SE1^2 + SE2^2)
  Diff = abs(Prop1 - Prop2) #Absolute difference, so argument order does not matter
  ZSpec = Diff/SED
  Alpha = pnorm(Z, lower.tail = FALSE) * tail #Significance level implied by Z and tail
  P = pnorm(ZSpec, lower.tail = FALSE) * tail #p-value, rounded only when printed
  cat(paste0("The ", round(1 - Alpha, rnd) * 100, "% confidence interval for the difference between the provided proportions, whose difference is ", round(Diff, rnd), ", ranges from ", round(Diff - Z * SED, rnd), " to ", round(Diff + Z * SED, rnd), " (Z = ", round(ZSpec, rnd), ", p = ", round(P, rnd), ") and is thus ", if(P <= Alpha){"significant"}else{"not significant"}, ".\n"))
}

ProportionDifference(93, 92, .187, .088) #Original, for verification
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0.001 to 0.197 (Z = 1.977, p = 0.048) and is thus significant.
ProportionDifference(92, 93, .088, .187) #Reversal, for verification
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0.001 to 0.197 (Z = 1.977, p = 0.048) and is thus significant.
ProportionDifference(91, 90, .187, .088)  #Losing 2 each
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0 to 0.198 (Z = 1.956, p = 0.05) and is thus not significant.
ProportionDifference(90, 92, .187, .088)  #Losing 3 in the high-prevalence no older brothers group
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0 to 0.198 (Z = 1.956, p = 0.05) and is thus not significant.
ProportionDifference(93, 87, .187, .088)  #Losing 5 in the low-prevalence older brothers group
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0 to 0.198 (Z = 1.958, p = 0.05) and is thus not significant.
ProportionDifference(125, 60, .187, .088) #Swapping 32 to the high-prevalence no older brothers group
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0 to 0.198 (Z = 1.959, p = 0.05) and is thus not significant.
ProportionDifference(88, 97, .187, .088)  #Swapping 5 to the low-prevalence older brothers group
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.099, ranges from 0 to 0.198 (Z = 1.959, p = 0.05) and is thus not significant.
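The scenarios above can be generalized into a sweep over every way of splitting all 185 males between the two groups. This is a minimal sketch under the same assumptions as ProportionDifference(): exact proportions of 18.7% and 8.8%, unpooled standard errors, and a two-tailed 5% threshold. ZForSplit() is a hypothetical helper written only for this illustration.

```r
#Hypothetical helper: unpooled Z for a difference in proportions, the same
#statistic ProportionDifference() reports (assumes the proportions are exact)
ZForSplit <- function(N1, N2, Prop1 = .187, Prop2 = .088) {
  SE <- sqrt(Prop1 * (1 - Prop1) / N1 + Prop2 * (1 - Prop2) / N2)
  abs(Prop1 - Prop2) / SE
}

N1 <- 1:184        #size of the no-older-brothers (18.7%) group
N2 <- 185 - N1     #size of the older-brothers (8.8%) group
Z  <- ZForSplit(N1, N2)

#Smallest and largest N1 for which the difference reaches Z >= 1.96
range(N1[Z >= 1.96])
```

Under these assumptions it should return 89 and 124, so only splits with roughly 89 to 124 men in the group without older brothers reach the 1.96 threshold.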

Confidence in the results of Bearman & Brueckner’s study should be minimal: they failed to provide critical data, the results were marginal at best, and the charitable assumptions needed to produce a significant result are unlikely to hold. In particular, they did not report the group sizes needed to test whether the rates of homosexuality differed significantly between dizygotic opposite-sex twin males with and without older brothers. If even 2.2% of the dizygotic opposite-sex twin males (4 of 185, two from each group) had missing older-sibling information, the result is not significant. With unbalanced losses, the threshold is 1.6% (3 men) if the losses come from the group without older brothers, or 2.7% (5 men) if they come from the group with older brothers. We are also given only the proportions 18.7% and 8.8%, and if these resulted from rounding, even a small shift in the unrounded values changes the answer. Consider the case where, with one more digit, the proportions are as close together as rounding allows: 18.65% (0.05 percentage points below 18.7%) and 8.84% (0.04 points above 8.8%):

ProportionDifference(93, 92, .1865, .0884)
## The 95% confidence interval for the difference between the provided proportions, whose difference is 0.098, ranges from 0 to 0.196 (Z = 1.959, p = 0.05) and is thus not significant.
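The 18.65%/8.84% check above is one corner of a larger set. A sketch that evaluates every four-decimal pairing that rounds to the reported figures, keeping the balanced 93/92 split, might look like this. It assumes the published percentages were rounded half-up to one decimal place, which the paper does not state.

```r
#Four-decimal values assumed to round to the reported percentages
P1 <- seq(0.1865, 0.1874, by = 0.0001)   #rounds to 18.7% (no older brothers)
P2 <- seq(0.0875, 0.0884, by = 0.0001)   #rounds to 8.8% (older brothers)

#Unpooled Z for every pairing, keeping the balanced 93/92 split
Z <- outer(P1, P2, function(p1, p2) {
  (p1 - p2) / sqrt(p1 * (1 - p1) / 93 + p2 * (1 - p2) / 92)
})

#Share of the 100 pairings that still reach Z >= 1.96
mean(Z >= 1.96)
```

The original pair (.187, .088) passes and the 18.65%/8.84% pair does not, so the verdict depends on digits that were never reported.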

Note also that the optimistic balance assumption must be wrong: 18.7% and 8.8% of 93 or 92 are not whole numbers, and we cannot have partial people. Nor does any value carried to one more digit that still rounds to the reported figures, from 18.65% to 18.74% or from 8.75% to 8.84%, give a whole number. This should be obvious, but it can be shown as well.

0.1865 * 92
## [1] 17.158
0.1866 * 92
## [1] 17.1672
0.1867 * 92
## [1] 17.1764
0.1868 * 92
## [1] 17.1856
0.1869 * 92
## [1] 17.1948
0.1870 * 92
## [1] 17.204
0.1871 * 92
## [1] 17.2132
0.1872 * 92
## [1] 17.2224
0.1873 * 92
## [1] 17.2316
0.1874 * 92
## [1] 17.2408
0.1865 * 93
## [1] 17.3445
0.1866 * 93
## [1] 17.3538
0.1867 * 93
## [1] 17.3631
0.1868 * 93
## [1] 17.3724
0.1869 * 93
## [1] 17.3817
0.1870 * 93
## [1] 17.391
0.1871 * 93
## [1] 17.4003
0.1872 * 93
## [1] 17.4096
0.1873 * 93
## [1] 17.4189
0.1874 * 93
## [1] 17.4282
0.0875 * 92
## [1] 8.05
0.0876 * 92
## [1] 8.0592
0.0877 * 92
## [1] 8.0684
0.0878 * 92
## [1] 8.0776
0.0879 * 92
## [1] 8.0868
0.0880 * 92
## [1] 8.096
0.0881 * 92
## [1] 8.1052
0.0882 * 92
## [1] 8.1144
0.0883 * 92
## [1] 8.1236
0.0884 * 92
## [1] 8.1328
0.0875 * 93
## [1] 8.1375
0.0876 * 93
## [1] 8.1468
0.0877 * 93
## [1] 8.1561
0.0878 * 93
## [1] 8.1654
0.0879 * 93
## [1] 8.1747
0.0880 * 93
## [1] 8.184
0.0881 * 93
## [1] 8.1933
0.0882 * 93
## [1] 8.2026
0.0883 * 93
## [1] 8.2119
0.0884 * 93
## [1] 8.2212
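The same check can be written as a single vectorized expression. This sketch covers the four-decimal ranges given above (18.65% to 18.74% and 8.75% to 8.84%) for groups of 92 and 93. It uses a small tolerance rather than exact equality because floating-point products are rarely stored exactly.

```r
#Four-decimal values that round to the reported 18.7% and 8.8%
High <- seq(0.1865, 0.1874, by = 0.0001)
Low  <- seq(0.0875, 0.0884, by = 0.0001)

#Implied head counts for every candidate proportion in groups of 92 and 93
Counts <- outer(c(High, Low), c(92, 93))

#Distance from each implied count to the nearest whole number
Gap <- abs(Counts - round(Counts))
min(Gap)
any(Gap < 1e-9)
```

The smallest gap should be about 0.05 of a person (0.0875 * 92 = 8.05), and no candidate comes within the tolerance of a whole number.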

Moreover, even if the maximum sample (n = 185) is split between the groups (with versus without older brothers) in any proportion, each exact proportion multiplied by its group’s size must yield a whole number. Among group sizes from 1 to 125, 8.8% gives a whole number only at 125 (125 × 0.088 = 11), which would leave 60 men in the other group. Meanwhile, 18.7% gives a whole number at no group size from 1 to 125, including 60.

Nums <- 1:125

#Group sizes at which exactly 8.8% is a whole number of people
#(a small tolerance avoids relying on exact floating-point equality)
which(abs(Nums * .088 - round(Nums * .088)) < 1e-9)
## [1] 125
#Group sizes at which exactly 18.7% is a whole number of people
which(abs(Nums * .187 - round(Nums * .187)) < 1e-9)
## integer(0)
#The 60 men left over if 125 are in the 8.8% group
60 * .187
## [1] 11.22
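A more general version of this argument, sketched here, asks whether any split of all 185 males can reproduce both reported percentages with whole numbers of people. It assumes half-up rounding to one decimal place. Under that rule, a true rate must lie in [18.65%, 18.75%) to be reported as 18.7% and in [8.75%, 8.85%) to be reported as 8.8%. HasWholeCount() is a hypothetical helper written only for this illustration.

```r
#Hypothetical helper: can some whole number of people k out of n produce a
#proportion in [Lower, Upper), i.e. one that rounds half-up to the reported value?
HasWholeCount <- function(n, Lower, Upper) {
  k <- 0:n
  any(k / n >= Lower & k / n < Upper)
}

N1 <- 1:184        #no-older-brothers group, reported as 18.7%
N2 <- 185 - N1     #older-brothers group, reported as 8.8%

FitsHigh <- vapply(N1, HasWholeCount, logical(1), Lower = 0.1865, Upper = 0.1875)
FitsLow  <- vapply(N2, HasWholeCount, logical(1), Lower = 0.0875, Upper = 0.0885)

#Splits of all 185 males that reproduce both reported percentages
Consistent <- data.frame(N1 = N1, N2 = N2)[FitsHigh & FitsLow, ]
Consistent
```

Under these assumptions the result should have no rows. The outcome is sensitive to the rounding rule. If 18.75% is allowed to round down to 18.7% (changing `<` to `<=`), two splits appear: 48/137 (9 of 48 and 12 of 137) and 128/57 (24 of 128 and 5 of 57).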

Bearman & Brueckner’s results cannot rest on the maximally favorable scenario. Because even that scenario is as tenuous as shown above, it is unlikely that they achieved a significant result, so they did not adequately justify their conclusion statistically. Short of finding a larger sample with the same result, the only way I am aware of to recover their conclusion is to argue for a one-tailed test. But if a directional hypothesis were viable, its sign would probably point the other way. Other studies, which they were aware of, found that individuals with older brothers were more likely to be homosexual, which predicts a higher rate in the group with older brothers rather than the lower rate observed. Choosing a one-tailed test in the observed direction would therefore be an unjustifiable analytic decision, and even then the result would remain marginal.
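To make the one-tailed argument concrete, here is a short sketch that recomputes the balanced 93/92 statistic and compares the two-tailed p-value with the two possible one-tailed p-values. As before, it assumes the reported proportions of 18.7% and 8.8% are exact.

```r
#Balanced 93/92 scenario (assumption: the reported proportions are exact)
SE <- sqrt(.187 * (1 - .187) / 93 + .088 * (1 - .088) / 92)
Z <- (.187 - .088) / SE

TwoTailed <- 2 * pnorm(Z, lower.tail = FALSE)
#Direction predicted in advance: higher rate WITHOUT older brothers (observed direction)
OneTailedObserved <- pnorm(Z, lower.tail = FALSE)
#Direction predicted in advance: higher rate WITH older brothers (opposite direction)
OneTailedOpposite <- pnorm(Z)

round(c(TwoTailed = TwoTailed, Observed = OneTailedObserved, Opposite = OneTailedOpposite), 3)
```

These come to roughly 0.048, 0.024 and 0.976. A one-tailed test halves the p-value only if the predicted direction favored the group without older brothers. A prior based on the older-brother findings would predict the opposite direction, and under that prediction the observed difference offers no support.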

References

Bearman, P. S., & Brueckner, H. (2002). Opposite‐Sex Twins and Adolescent Same‐Sex Attraction. American Journal of Sociology, 107(5), 1179–1205. https://doi.org/10.1086/341906