Why “bathroom bills” and regulations to protect single-sex spaces are a problem for all women

Introduction

According to The Sunday Times, UK equalities minister Liz Truss is planning to impose new regulations to “protect” single-sex women’s spaces and publish new trans-exclusive guidelines on public toilets.

Transgender people who attempt to use public toilets and other facilities are frequently subjected to verbal harassment, physical and sexual assault, and forcible removal (and even arrest) by police. This danger confronts not only trans people who use the facility corresponding to their gender identity, but also, and equally, those who are forced by a “bathroom bill” to use the facility corresponding to the gender they were “assigned at birth”. In short, bathroom bills make it both humiliating and potentially dangerous for transgender people to use any public toilet at all. Fear and avoidance of public toilets have caused social and physical distress for many transgender people, who simply need a safe place to tend to basic needs. Further, some “masculine-looking” cisgender women have been denied access and harassed or beaten out of the same fear¹.

How will single-sex spaces be protected?

We must ask how such a bathroom bill or regulation would be implemented. Will we be expected to produce identifying documents, such as a passport, a driver’s licence, or an ID card, before being allowed to use a public toilet? This does not seem plausible. Nor does it seem likely that a guard or attendant will be posted outside every public toilet to check everyone before they enter. The most realistic scenario is that single-sex spaces will be self-policed by the people who use them. We may then ask: how well will they be able to distinguish users whose “assigned at birth” gender corresponds with the facility from those for whom it does not? That is to say, how well can people tell whether a woman is cis or trans? In the following, we show that many more cis women will be misidentified as trans than trans women will be correctly identified.

Hypothesis

Suppose we have a test to identify trans women. For simplicity, we use the word “test” as shorthand for any means of identifying trans women. The test is performed at random on users of a facility by other users of that facility; we say “at random” because not everyone who enters is likely to be subjected to scrutiny. The test might be based on:

shoe size,
height,
voice pitch,
gait,
or some other physical attribute.

However, in this situation it is far more likely that the “test” consists of nothing more than an observation that a woman “looks trans”: an observer makes an on-the-spot decision based purely on a woman’s appearance and/or mannerisms. Such a judgement is, of course, likely to be highly subjective. The details of how an observer arrives at their decision are not important for what follows. It will soon become clear that any plausible means of identifying trans women will misidentify more cis women as trans than it correctly identifies trans women.

In this analysis, let’s be charitable to those who advocate for bathroom bills and assume that the test is very accurate, far more accurate than is actually plausible under realistic circumstances. Let’s say that it can identify trans women with 99% accuracy. That is, the probability that the test result is positive, given that the tested individual is indeed trans, is 99%, while the probability of a wrong (negative) result is 1%. In mathematical notation we write this as:

\[ P(+|trans) = 0.99, \quad P(-|trans) = 0.01 \]

We say that the true positive rate is 99% and the false negative rate is 1%. We’re also interested in the probability that the test result is negative, given that the tested individual is in fact cis. This is not necessarily equal to the accuracy of the test for identifying trans women: we may hypothesise that the test performs differently for the two groups of women.

Let’s again assume that the test is really good: the probability that the test result is negative, given that the tested individual is cis, is 97%. It then follows that the probability of a wrong (positive) result when the woman is actually cis is 3%:

\[ P(-|cis) = 0.97, \quad P(+|cis) = 0.03 \]

We say that the true negative rate is 97% and the false positive rate is 3%.

We know that the prevalence of trans people in the general population is very small, although estimates depend heavily on the case definitions used in individual studies. In the US, it’s estimated that about 0.6% of the population are trans². According to Amnesty International, 1.5 million transgender people live in the European Union, making up 0.3% of the population³. A 2011 survey conducted by the Equality and Human Rights Commission in the UK found that, of 10,026 respondents, 1.4% would be classified into a gender minority group⁴. A 2016 systematic review and meta-analysis of the empirical literature on the prevalence of transgender people found that estimates range from under 0.01% to almost 0.9%, depending on the case definition⁵.

For this analysis it doesn’t matter much which of these figures we use: they are all small enough that the conclusions below do not change. For the moment, let’s arbitrarily pick 0.6% as the prevalence of transgender people. This means that, prior to any testing or observation, if you picked someone at random from the population, the probability of picking a trans person would be 0.6%, and the probability of picking a cis person would be 99.4%:

\[ P(trans) = 0.006, \quad P(cis) = 0.994 \]

In lieu of separate prevalences for trans women and trans men, we’ll assume that the figure is the same for both, which is not unrealistic: a 1996 study of Swedes estimated a sex ratio (male:female) of 1.4:1 among trans women and trans men requesting sex reassignment surgery, and a ratio of 1:1 among those who proceeded⁶.
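Anticipating the Bayes’ theorem calculation in the Analysis section, the short Python sketch below checks the claim that the choice of prevalence figure doesn’t change the picture much. It holds the test fixed at the charitable rates assumed above (true positive rate 0.99, false positive rate 0.03) and computes \(P(trans|+)\) for each quoted prevalence. The function name `posterior_trans` and the labels are illustrative only.

```python
# Sketch: sensitivity of P(trans|+) to the assumed prevalence, with the test
# held fixed at the charitable rates assumed above.

def posterior_trans(prevalence: float, tpr: float = 0.99, fpr: float = 0.03) -> float:
    """Bayes' theorem with two hypotheses (trans, cis) and a positive result."""
    true_positives = tpr * prevalence            # P(+|trans) P(trans)
    false_positives = fpr * (1.0 - prevalence)   # P(+|cis) P(cis)
    return true_positives / (true_positives + false_positives)


# Prevalence figures quoted above, as fractions rather than percentages.
PREVALENCES = {
    "0.01% (meta-analysis, lower end)": 0.0001,
    "0.3% (EU)": 0.003,
    "0.6% (US)": 0.006,
    "1.4% (EHRC survey)": 0.014,
}

if __name__ == "__main__":
    for label, p in PREVALENCES.items():
        print(f"{label:<34} P(trans|+) = {posterior_trans(p):.3f}")
```

With these inputs the posterior stays well below one half for every figure (roughly 0.003 at 0.01% up to roughly 0.32 at 1.4%), so under these assumptions a positive result is more likely to point at a cis woman than at a trans woman whichever estimate is used.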

Analysis

Let’s say that a random woman enters a public toilet or other single-sex facility and is judged to be trans by one or more of the other women there. To simplify notation, we’ll say that the woman who entered has “tested positive for being trans”. As a reminder, the “test” might in fact be no more than an observer’s impression that the woman “looked trans”.

What’s the probability that the woman judged to be trans is indeed trans? Is the test right? Could the observer(s) be wrong? In other words, we want the probability that the woman is trans, given that the test result was positive: \(P(trans|+)\).
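One way to build intuition for \(P(trans|+)\) before doing any algebra is to simulate it. The Python sketch below, assuming the same rates as above (prevalence 0.006, \(P(+|trans) = 0.99\), \(P(+|cis) = 0.03\)), sends many random women into a facility, applies the hypothetical “test” to each one, and estimates the answer as the fraction of flagged women who are actually trans. The function name and the fixed seed are illustrative choices.

```python
import random


def simulate_flagged(n_women: int, prevalence: float = 0.006, tpr: float = 0.99,
                     fpr: float = 0.03, seed: int = 1) -> tuple[int, int]:
    """Return (trans_flagged, cis_flagged) among n_women random entrants."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    trans_flagged = cis_flagged = 0
    for _ in range(n_women):
        is_trans = rng.random() < prevalence
        # A trans woman is flagged with probability P(+|trans);
        # a cis woman is flagged with probability P(+|cis).
        if rng.random() < (tpr if is_trans else fpr):
            if is_trans:
                trans_flagged += 1
            else:
                cis_flagged += 1
    return trans_flagged, cis_flagged


if __name__ == "__main__":
    t, c = simulate_flagged(1_000_000)
    print(f"women flagged: {t + c}, of whom trans: {t}, cis: {c}")
    print(f"estimated P(trans|+) = {t / (t + c):.3f}")
```

The estimate should land close to the exact value obtained with Bayes’ theorem below; increasing `n_women` tightens it.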

Nota bene: \(P(trans|+)\) is not the same as \(P(+|trans)\)! This can be proved mathematically, but you can understand this intuitively as follows: \(P(rain|umbrella)\), the probability that it is raining, given that you took an umbrella with you outdoors, is not the same as \(P(umbrella|rain)\), the probability that you took an umbrella with you outdoors, given that it is raining!

To calculate \(P(trans|+)\), we use Bayes’ theorem. The general form of the equation is:

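\[ P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_i{P(B|A_i)P(A_i)}} \]

where the sum in the denominator runs over all mutually exclusive possibilities \(A_i\) (here, just trans and cis).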

This is textbook maths that’s been known since 1763. For our specific example, we have:

\[ P(trans|+) = \frac{P(+|trans)P(trans)}{P(+|trans)P(trans)+P(+|cis)P(cis)} = \frac{0.99\times0.006}{0.99\times0.006 + 0.03\times0.994} = 0.166 \]

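So the probability that the woman is trans, given that their test result was positive, is only 16.6%. This may seem surprising, given the high accuracy of the test. The result is small because the prevalence of trans people in the population is so tiny: a 3% false positive rate applied to the 99.4% of women who are cis produces many more positive results (0.03 × 0.994 ≈ 0.030) than a 99% true positive rate applied to the 0.6% who are trans (0.99 × 0.006 ≈ 0.006).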
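The same calculation can be written as a small, general Python function implementing the Bayes’ theorem formula above for any set of mutually exclusive hypotheses. This is a sketch: the function name `bayes_posterior` is hypothetical, and the inputs are the article’s assumed values.

```python
def bayes_posterior(likelihoods: dict[str, float], priors: dict[str, float]) -> dict[str, float]:
    """Return P(A_j|B) for every hypothesis A_j, given P(B|A_i) and P(A_i)."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}  # P(B|A_i) P(A_i)
    evidence = sum(joint.values())                           # the denominator
    return {h: j / evidence for h, j in joint.items()}


# P(+|trans), P(+|cis) and P(trans), P(cis) as assumed in the Hypothesis section.
likelihood_positive = {"trans": 0.99, "cis": 0.03}
prior = {"trans": 0.006, "cis": 0.994}

post = bayes_posterior(likelihood_positive, prior)
print(f"P(trans|+) = {post['trans']:.3f}")  # 0.166
print(f"P(cis|+)   = {post['cis']:.3f}")    # 0.834
```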

Similarly, the probability that a woman is cis, given they got a positive test result, \(P(cis|+)\), is:

\[ P(cis|+) = \frac{P(+|cis) P(cis)}{P(+|cis) P(cis) + P(+|trans) P(trans)} = \frac{0.03\times0.994}{0.03\times0.994 + 0.99\times0.006} = 0.834 \]

Therefore, 83.4% of the women this test flags as trans are in fact cis. With this test, about five times as many cis women are misidentified as trans as trans women are correctly identified (0.834 / 0.166 ≈ 5)! It should also be clear that this would be the case irrespective of the details of how the test works, which exposes the fallacy that it’s possible to tell that a woman is trans from their shoe size, height, voice pitch, gait, appearance, etc.: we required our “test” to be far more accurate than is plausible, yet it has proven unfit for purpose.
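The “five times” figure is perhaps easiest to see with whole numbers. The Python sketch below, assuming a hypothetical group of 100,000 women and the same rates as above, counts how many trans and cis women the test would be expected to flag.

```python
# Expected counts among a hypothetical group of 100,000 women.
N = 100_000
prevalence, tpr, fpr = 0.006, 0.99, 0.03   # P(trans), P(+|trans), P(+|cis)

trans = N * prevalence          # 600 trans women
cis = N - trans                 # 99,400 cis women
trans_flagged = trans * tpr     # 594 trans women correctly flagged
cis_flagged = cis * fpr         # 2,982 cis women wrongly flagged

print(f"trans women flagged: {trans_flagged:.0f}")
print(f"cis women flagged:   {cis_flagged:.0f}")
print(f"cis flagged per trans flagged: {cis_flagged / trans_flagged:.2f}")  # 5.02
```

Of the 3,576 women flagged, only 594 are trans, which is the 16.6% computed above.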

Considering all possible tests

How dependent are this result and our conclusions on our choice of numbers? We’d like to make sure that we can’t be accused of cherry-picking numbers to support our claim.

The inputs to our analysis are:

\(P(+|trans)\), the probability that a trans woman is judged to be trans (the true positive rate),
\(P(-|cis)\), the probability that a cis woman is judged to be cis (the true negative rate),
\(P(trans)\), the prevalence of trans women in the population.

As a reminder, the first two inputs may be equal, but they need not be. Also, the prevalence of trans women in the population is uncertain but not completely unknown; importantly, we know that it is very small.

Let’s see what happens when we try different values of \(P(+|trans)\) and \(P(-|cis)\). Let’s say that the test, or an observer, judges a trans woman to be trans at least 50% of the time, and likewise judges a cis woman to be cis at least 50% of the time; that is, it does at least as well as random guessing. Consequently, \(P(+|trans)\) and \(P(-|cis)\) both lie in the range \([0.5, 1]\).

For any test with \(P(+|trans)\) and \(P(-|cis)\) in this range, how many times more cis women will the test misidentify as trans than trans women it correctly identifies? That is, we want to know:

\[ \frac{P(cis|+)}{P(trans|+)} \]

where \(P(cis|+)\) is the probability that a woman who tests positive is actually cis (a misidentification) and \(P(trans|+)\) is the probability that she is actually trans (a correct identification). Because the two posteriors share the same denominator, it cancels:

\[\begin{aligned} \frac{P(cis|+)}{P(trans|+)} &= \frac{\frac{P(+|cis) P(cis)}{P(+|cis) P(cis) + P(+|trans) P(trans)}}{\frac{P(+|trans) P(trans)}{P(+|trans) P(trans) + P(+|cis) P(cis)}}\\ &= \frac{P(+|cis) P(cis)}{P(+|trans) P(trans)}. \end{aligned}\]

Since \(P(+|cis) = 1 - P(-|cis)\) and \(P(cis) = 1 - P(trans)\), this can be written entirely in terms of our three inputs:

\[ \frac{P(cis|+)}{P(trans|+)} = \frac{\left(1 - P(-|cis)\right)\left(1 - P(trans)\right)}{P(+|trans)\, P(trans)}. \]

The final expression above provides a simple way to determine how many times more cis women a test will misidentify as trans than trans women it correctly identifies: we only need to plug in the transgender prevalence \(P(trans)\), the true positive rate \(P(+|trans)\) and the true negative rate \(P(-|cis)\).
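As a check on the algebra, the Python sketch below computes the ratio twice: once from the simplified expression, and once the long way from both posteriors. The function names are hypothetical; \(P(+|cis)\) and \(P(cis)\) are derived as \(1 - P(-|cis)\) and \(1 - P(trans)\), so the only inputs are the true positive rate, the true negative rate and the prevalence.

```python
def misidentification_ratio(tpr: float, tnr: float, prevalence: float) -> float:
    """P(cis|+) / P(trans|+) = P(+|cis) P(cis) / (P(+|trans) P(trans))."""
    fpr = 1.0 - tnr             # P(+|cis)
    p_cis = 1.0 - prevalence    # P(cis)
    return (fpr * p_cis) / (tpr * prevalence)


def ratio_via_posteriors(tpr: float, tnr: float, prevalence: float) -> float:
    """The same quantity computed the long way, from both posteriors."""
    fpr, p_cis = 1.0 - tnr, 1.0 - prevalence
    evidence = tpr * prevalence + fpr * p_cis   # shared denominator P(+)
    p_trans_given_pos = tpr * prevalence / evidence
    p_cis_given_pos = fpr * p_cis / evidence
    return p_cis_given_pos / p_trans_given_pos


# The worked example: TPR 0.99, TNR 0.97, prevalence 0.6%.
print(f"{misidentification_ratio(0.99, 0.97, 0.006):.2f}")  # 5.02
```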

Now let’s consider all possible tests by allowing \(P(+|trans)\) and \(P(-|cis)\) to vary continuously. Each pair of values defines the performance of one test. We can plot these pairs as points on a graph, with \(P(+|trans)\) on the x-axis and \(P(-|cis)\) on the y-axis, and colour them by how many times more cis women are misidentified than trans women are identified. We’ll also draw annotated contour lines to show this information more clearly, and mark the region of the plot corresponding to tests that identify more trans women than they misidentify cis women as trans. This is the plot for a transgender prevalence of 0.6%:

It is immediately clear that for most tests the number of cis women that are misidentified as trans is overwhelmingly greater than the number of trans women that are identified. In this “space of all possible tests”, the region corresponding to tests that identify more trans women than misidentify cis women as trans (the dark blue region at the top of the graph) is exceedingly tiny.
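The main message of the plot can also be checked numerically. The Python sketch below is a stand-in for the figure, not the code used to draw it: it samples a regular grid of hypothetical tests with \(P(+|trans)\) and \(P(-|cis)\) in \([0.5, 1]\) and reports what fraction of them flag more trans women than cis women. The grid resolution (`steps`) is an arbitrary assumption; the same grid could be passed to a plotting library such as matplotlib’s `contourf` to draw a map of this kind.

```python
def misidentification_ratio(tpr: float, tnr: float, prevalence: float) -> float:
    """P(cis|+) / P(trans|+) for a test with the given rates."""
    return (1.0 - tnr) * (1.0 - prevalence) / (tpr * prevalence)


def favourable_fraction(prevalence: float, steps: int = 501) -> float:
    """Share of grid tests in [0.5, 1] x [0.5, 1] whose ratio is below 1,
    i.e. tests that flag more trans women than cis women."""
    grid = [0.5 + 0.5 * i / (steps - 1) for i in range(steps)]
    favourable = sum(
        1
        for tpr in grid
        for tnr in grid
        if misidentification_ratio(tpr, tnr, prevalence) < 1.0
    )
    return favourable / (steps * steps)


if __name__ == "__main__":
    share = favourable_fraction(0.006)
    print(f"tests flagging more trans than cis women: {share:.2%}")
```

At 0.6% prevalence, roughly one test in a hundred on this grid comes out favourable, and every one of them needs a true negative rate above about 99.4%.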

We might next ask whether the results depend on our assumed prevalence of transgender people. We checked this by repeating the procedure used to obtain the previous plot for other values of transgender prevalence that appear in the relevant literature:

In each graph we have “zoomed in” on the region corresponding to tests that identify more trans women than they misidentify cis women as trans, which in each case is very small. The largest of these regions corresponds to a transgender prevalence of 1.4%, for which \(P(-|cis)\), the true negative rate, must exceed about 98.6% for more trans women to be identified than cis women misidentified, even for a test that correctly identifies every trans woman. This is equivalent to misjudging no more than about 14 in every 1,000 cis women. For a transgender prevalence of 0.3%, the true negative rate must exceed about 99.7%, allowing no more than about 3 mistakes per 1,000 cis women. This is exceptionally difficult to achieve, and is likely impossible.
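The thresholds quoted here follow directly from the simplified ratio. Setting it below 1 and rearranging gives \(P(-|cis) > 1 - P(+|trans)\,P(trans)/(1 - P(trans))\), and the most lenient case is a test that catches every trans woman, \(P(+|trans) = 1\). The Python sketch below evaluates this bound for the self-reported prevalence figures; the function name is hypothetical.

```python
def min_true_negative_rate(prevalence: float, tpr: float = 1.0) -> float:
    """Smallest P(-|cis) for which more trans women are flagged than cis women.

    Derived from (1 - tnr) * (1 - prevalence) < tpr * prevalence.
    """
    return 1.0 - tpr * prevalence / (1.0 - prevalence)


if __name__ == "__main__":
    for p in (0.003, 0.006, 0.014):
        tnr = min_true_negative_rate(p)
        mistakes = 1000 * (1.0 - tnr)  # upper limit on cis women misjudged per 1,000
        print(f"prevalence {p:.1%}: P(-|cis) > {tnr:.2%}, "
              f"i.e. fewer than {mistakes:.1f} mistakes per 1,000 cis women")
```

Lower true positive rates only push these bounds higher.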

A 2016 meta-analysis of studies on the prevalence of transgender people found that roughly 0.01% of the population had received surgical or hormonal gender affirmation therapy or had transgender-related diagnoses⁵. If we use this figure instead of those obtained from studies of self-reported transgender identity, we find that any hypothetical test would require impossibly high performance:

In this case, \(P(-|cis)\), the true negative rate, must be at least 99.99% for more trans women to be identified than cis women misidentified, meaning that only about one mistake in 10,000 is allowed. Such performance is not feasible.
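The 99.99% figure can be checked by hand. Using \(P(+|cis) = 1 - P(-|cis)\), requiring the ratio \(P(cis|+)/P(trans|+)\) to be below 1 and rearranging, then taking the most lenient case \(P(+|trans) = 1\) with \(P(trans) = 0.0001\):

\[ \frac{\left(1 - P(-|cis)\right) P(cis)}{P(+|trans)\, P(trans)} < 1 \iff P(-|cis) > 1 - \frac{P(+|trans)\, P(trans)}{P(cis)} \]

\[ P(-|cis) > 1 - \frac{1 \times 0.0001}{0.9999} \approx 0.9999 \]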

Conclusion

All means to detect trans people using their physical attributes and/or mannerisms are flawed because there is more than enough variation in the population to make robust identification impossible. For example: tall cis women exist. Cis women with above-average size feet exist. Some cis women have deep voices. And so on.
Using simple high-school maths, we have demonstrated that any regulation intended to “protect” single-sex spaces from trans women will have the exact opposite effect, because cis women will be disproportionately affected by it.
Scenarios in which this is not the case require levels of performance that are extremely unlikely to be achievable.
The required performance is not achievable with AI either: present-day commercial facial analysis services from Amazon, Clarifai, Microsoft, and others perform consistently worse on transgender people than on cisgender people, misclassifying trans men, for example, up to 38% of the time⁷.
People are not as good as they might think at identifying trans women. People will make mistakes. Women will be denied access to facilities, harassed or beaten. The majority of these women will be cis. This is a mathematical certainty.
There is plenty of evidence that supports the assertion that bathroom bills and regulations to protect single-sex spaces are not needed. There is simply no reason to be concerned about sharing toilets and other facilities with transgender people.
Transgender men and women shouldn’t have to worry about violence or harassment when deciding where to relieve themselves, and they shouldn’t have to spend more than a millisecond figuring out which toilet to use.

1. Bathroom Bills: Frequently Asked Questions.
2. Flores, Andrew (June 2016). “How Many Adults Identify as Transgender in the United States” (PDF). Williams Institute, UCLA School of Law.
3. M.H. (1 September 2017). “Why transgender people are being sterilised in some European countries”. The Economist. Archived from the original on 1 September 2017. Retrieved 2 September 2017.
4. Glen, Fiona; Hurrell, Karen (2012). “Technical note: Measuring Gender Identity” (PDF). Equality and Human Rights Commission. Retrieved 30 May 2019.
5. Collin L, Reisner SL, Tangpricha V, Goodman M. “…Results: Thirty-two studies met the inclusion criteria for systematic review. Of those, 27 studies provided necessary data for a meta-analysis. Overall mP [meta-prevalence] estimates per 100,000 population were 9.2 (95% CI = 4.9–13.6) for surgical or hormonal gender affirmation therapy and 6.8 (95% CI = 4.6–9.1) for transgender-related diagnoses. Of studies assessing self-reported transgender identity, the mP was 871 (95% CI = 519–1,224); however, this result was influenced by a single outlier study. After removal of that study, the mP changed to 355 (95% CI = 144–566).” in Prevalence of Transgender Depends on the “Case” Definition: A Systematic Review, Journal of Sexual Medicine, Volume 13, Issue 4, April 2016, Pages 613-626.
6. Landén, M., Wålinder, J., Lundström, B. (1996) “…Results: During the 20-year period of the study, 233 requests for sex reassignment were processed, and the incidence data were calculated on the basis of this group. This means that the average annual frequency was 11.6 cases. The number of inhabitants in Sweden over 15 years of age increased during the study period from 6.5 million to 7.1 million, i.e. there was a mean population of 6.8 million (12), which gives an annual incidence of request for sex reassignment of 0.17 per 100,000 inhabitants. The sex ratio (male:female) is 1.4:1. To resolve the question of whether transsexualism increases or decreases, we divided the group into two 10-year periods. As can be seen from Table 1, not only do our results agree with the Swedish incidence data published in the 1970s, but also they remain remarkably stable over time. Separating from all applications the group with primary transsexualism yielded 188 cases, i.e. 9.4 cases annually. As is shown in Table 2, this corresponds to an incidence of primary transsexualism of 0.14 per 100,000 inhabitants over 15 years of age. It should also be noted that primary transsexualism is equally common in women and men…” in Incidence and sex ratio of transsexualism in Sweden, Acta Psychiatrica Scandinavica, Volume 93, pages 261-263.
7. Scheuerman, Morgan Klaus and Paul, Jacob M. and Brubaker, Jed R. How Computers See Gender: An Evaluation of Gender Classification in Commercial Facial Analysis Services, Proceedings of the ACM on Human-Computer Interaction, Volume 3, Article 144, November 2019.