Session Prep

library(tidyverse)   # also attaches dplyr and ggplot2
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(psych)
library(e1071)       # skewness()
library(vioplot)     # vioplot()

transpop2 <- read.csv("/Users/nicoleborunda/Downloads/ICPSR_37938 3/DS0005/TransPopDS0005.csv")
# Recode gender status (TRANS_CIS: 1 = trans, 2 = cis) so that trans = 1 and cis = 0.
New_TRANS_CIS <- ifelse(transpop2$TRANS_CIS == 2, 0, 1)
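Gender status could also be stored as a labeled factor with cis as the reference level, so that lm() output names the coefficient after the group it describes. The sketch below assumes the TRANS_CIS coding used above (1 = trans, 2 = cis); the helper name code_gender_status() and the toy vector are hypothetical.

```r
# Hypothetical helper: label TRANS_CIS codes, putting cis (2) first so it is the reference level
code_gender_status <- function(trans_cis) {
  factor(trans_cis, levels = c(2, 1), labels = c("Cis", "Trans"))
}

# Toy codes for illustration only (not TransPop data)
example_codes <- c(1, 2, 2, 1, 2)
gender_status <- code_gender_status(example_codes)
print(gender_status)

# The 0/1 indicator implied by the factor matches the New_TRANS_CIS recode
as.integer(gender_status == "Trans")
```

With the study data this would be called as code_gender_status(transpop2$TRANS_CIS). A two-level factor with cis as the reference produces the same dummy variable as New_TRANS_CIS, so the regression estimates are unchanged; only the coefficient label differs.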

Introduction and Literature Review

Sexual and gender minority (SGM) research has increased substantially in recent decades. Despite this progress, there remains a significant knowledge gap in transgender (trans) research. Unless otherwise specified, in this paper the terms “transgender” and “trans” refer to anyone whose gender does not match the gender they were assigned at birth, and the terms “cisgender” and “cis” refer to those whose gender matches the gender they were assigned at birth. Transgender research tends to focus on risks rather than strengths, and it often relies on measures validated in the general population, which is dominated by cis individuals, rather than on population-specific measures. Research has also emphasized trans youth, leaving a gap concerning the adult trans population. The TransPop Study (Meyer et al., 2021) represents a significant milestone in addressing these gaps.

The TransPop data set includes the first national probability sample of the adult U.S. transgender population as well as a comparative cisgender sample (Meyer et al., 2021). Transgender participants were recruited in two periods: April to August 2016 and June 2017 to December 2018. Cisgender participants were recruited in two periods: February 2018 and November to December 2018. Both samples were screened for gender status (trans or cis) using the same process, based on questions from the U.S. Gallup Poll. The surveys for both samples were largely the same, with some population-specific questions included in each. Combined, the data include 612 variables covering demographics, health, social characteristics, life experiences, and perceptions. The data include previously validated scales, some of which were modified and some of which were used only in part. These include the Adverse Childhood Experiences (ACE) scale, the Drug Use Disorders Identification Test (DUDIT), and the Kessler-6, which measures psychological distress.

Of key interest in this paper is the Satisfaction with Life Scale (SWLS; Diener et al., 1985). The SWLS is a five-item scale that measures overall life satisfaction. Each item is rated on a 7-point scale, where 1 is strongly disagree and 7 is strongly agree. Because little research examines which factors uniquely contribute to positive outcomes for the transgender population, I am interested in understanding which social and demographic factors contribute to life satisfaction. In the limited literature available, one study found that social interaction anxiety is a significant predictor of life satisfaction among transgender people in Pakistan (Shabbir et al., 2022). Much of the research using TransPop data focuses narrowly on specific healthcare, health, and mental health issues. One study comparing the U.S. transgender and cisgender populations found that the trans population skews younger and more racially diverse, and self-reports worse health than its cisgender counterparts (Feldman et al., 2021). Another study found that trans-nonbinary respondents in the TransPop sample receive less social affirmation and less access to preferred healthcare treatment than trans men and trans women (Lane et al., 2022). Given the stark disparities in health and wellbeing that the trans population faces, the focus on health in trans research is crucial (Grant et al., 2011; James et al., 2016; Meyer et al., 2021). However, many variables contribute to one’s quality of life, which is why I am interested in examining other factors that uniquely contribute to life satisfaction for the U.S. trans population.

This paper examines how race, education, sexual orientation, poverty, personal income, religious preference, political leaning, and urbanicity contribute to life satisfaction in the TransPop transgender and cisgender samples.

Methods and Sample

With more than 600 variables in this data set, I focus this exploratory analysis on just eight factors that may contribute to life satisfaction. I first provide descriptive statistics and then use multiple linear regression to build a model in which life satisfaction is the outcome variable and gender status (trans or cis) is one of the predictor variables.

Note that the TransPop data provide imputed versions of some variables to address missingness (Krueger et al., 2020). Where available, I used the imputed variables, which are denoted by the suffix “_I” at the end of the variable name.
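Assuming every imputed variable follows the “_I” suffix convention, the imputed variables can be listed by matching column names. The helper below is hypothetical, and any non-imputed variable that happens to end in “_I” would also be matched, so the result is worth checking against the codebook.

```r
# Hypothetical helper: return column names ending in "_I" (assumed to mark imputed variables)
find_imputed_vars <- function(df) {
  grep("_I$", names(df), value = TRUE)
}

# Toy data frame with TransPop-style column names; the values are placeholders
toy <- data.frame(LIFESAT_I = 1, PINC_I = 1, GEDUC1 = 1, Q34 = 1, GRUCA_I = 1)
find_imputed_vars(toy)
```

Applied to the study data, this would be find_imputed_vars(transpop2).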

Descriptive Statistics

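I began my analysis with an overview of average life satisfaction in the trans and cis samples. Life satisfaction is measured on a 7-point scale, so scores in both samples range from 1 to 7. Trans respondents reported lower average life satisfaction (M = 3.93, SD = 1.74) than cis respondents (M = 5.03, SD = 1.56). I also created a correlation matrix. Life satisfaction had a moderate negative correlation with poverty (-0.27) and a moderate positive correlation with personal income (0.32). Correlations involving education, sexual orientation, religion, and political affiliation are reported as NA because those variables contain missing values.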

I also checked the selected variables for missingness using a missing-data pattern matrix. Education, sexual orientation, religion, and political affiliation all had missing values (4, 38, 862, and 1,015 cases, respectively). I excluded these four variables because further work would be needed to determine how to handle their missingness appropriately. I retained the remaining four predictors: race, poverty, personal income, and urbanicity.

I assessed the distribution of each variable by examining its mean, standard deviation, and skewness, and I also examined the variables visually using histograms and violin plots.

The plots show that urbanicity (GRUCA_I) is heavily concentrated at the urban end of the scale and that race (RACE_RECODE) is a categorical variable rather than a numeric one. Poverty (POVERTY_I) is a binary indicator. Race and poverty are also positively skewed (2.59 and 2.21), so I set race aside and did not include poverty in the baseline model. Because personal income (PINC_I) is the only numeric variable with substantial variability, I retained it as the primary predictor.

# Subset the trans sample (TRANS_CIS = 1)
group_1 <- transpop2 %>%
  filter(TRANS_CIS == 1)

# Subset the cis sample (TRANS_CIS = 2)
group_2 <- transpop2 %>%
  filter(TRANS_CIS == 2)

# Summary statistics for life satisfaction (LIFESAT_I) in each sample
summary_stats <- data.frame(Group = c("Trans", "Cis"),
                            Mean = c(mean(group_1$LIFESAT_I), mean(group_2$LIFESAT_I)),
                            SD = c(sd(group_1$LIFESAT_I), sd(group_2$LIFESAT_I)),
                            Min = c(min(group_1$LIFESAT_I), min(group_2$LIFESAT_I)),
                            Max = c(max(group_1$LIFESAT_I), max(group_2$LIFESAT_I)))

# Print the table
kable(summary_stats, caption = "Summary Statistics for LIFESAT_I by TRANS_CIS",
      align = "c", format = "html", digits = 2) %>%
  kable_styling(full_width = FALSE)
Summary Statistics for LIFESAT_I by TRANS_CIS
Group   Mean   SD     Min   Max
Trans   3.93   1.74   1     7
Cis     5.03   1.56   1     7
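The two subsets above can also be summarized in one step by splitting the outcome by group, which avoids writing each statistic twice. This is a base R sketch with a hypothetical helper and toy values; with the study data it would be called as summarise_by_group(transpop2$LIFESAT_I, transpop2$TRANS_CIS).

```r
# Hypothetical helper: mean, SD, min, and max of an outcome within each group
summarise_by_group <- function(outcome, group) {
  stats <- sapply(split(outcome, group), function(x) {
    c(Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
  })
  round(t(stats), 2)  # one row per group, one column per statistic
}

# Toy example (not TransPop data): group 1 = trans, group 2 = cis
toy_lifesat <- c(3, 4, 5, 6, 5, 7)
toy_group   <- c(1, 1, 1, 2, 2, 2)
summarise_by_group(toy_lifesat, toy_group)
```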
library(mice)  # md.pattern()

# Outcome (LIFESAT_I) plus the eight candidate predictors
ninevar <- subset(transpop2, select = c(LIFESAT_I, RACE_RECODE, GEDUC1, Q34, POVERTY_I, PINC_I, GD8B, GP2, GRUCA_I))

# Correlation matrix for the nine variables. cor() uses all rows by default
# (use = "everything"), so any variable with missing values yields NA correlations.
cor_matrix <- cor(ninevar)
kable(cor_matrix, format = "markdown", digits = 2)
             LIFESAT_I  RACE_RECODE  GEDUC1  Q34  POVERTY_I  PINC_I  GD8B  GP2  GRUCA_I
LIFESAT_I         1.00        -0.10      NA   NA      -0.27    0.32    NA   NA    -0.02
RACE_RECODE      -0.10         1.00      NA   NA       0.14   -0.11    NA   NA     0.04
GEDUC1              NA           NA       1   NA         NA      NA    NA   NA       NA
Q34                 NA           NA      NA    1         NA      NA    NA   NA       NA
POVERTY_I        -0.27         0.14      NA   NA       1.00   -0.47    NA   NA     0.07
PINC_I            0.32        -0.11      NA   NA      -0.47    1.00    NA   NA    -0.07
GD8B                NA           NA      NA   NA         NA      NA     1   NA       NA
GP2                 NA           NA      NA   NA         NA      NA    NA    1       NA
GRUCA_I          -0.02         0.04      NA   NA       0.07   -0.07    NA   NA     1.00
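Because cor() defaults to use = "everything", any variable with missing values produces NA correlations, as in the matrix above. One alternative, sketched here on toy data, is pairwise deletion: each coefficient is computed from only the rows where both variables are observed. The trade-off is that different cells can rest on different subsamples, which matters for variables with heavy missingness such as GD8B and GP2.

```r
# Toy data (not TransPop): b has one missing value
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, NA),
  c = c(5, 3, 4, 1, 2)
)

cor(toy)                                 # off-diagonal cells involving b are NA
cor(toy, use = "pairwise.complete.obs")  # a-b uses the 4 rows where both are observed
```

With the study data this would be cor(ninevar, use = "pairwise.complete.obs").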
# Check for missingness: count missing values in each variable
missing_counts <- colSums(is.na(ninevar))
missing_summary <- data.frame(Variable = names(missing_counts),
                              Missing_Count = unname(missing_counts))  # unname() avoids duplicate row labels

kable(missing_summary, format = "markdown", align = "c", caption = "Missing Value Summary")
Missing Value Summary
Variable      Missing_Count
LIFESAT_I     0
RACE_RECODE   0
GEDUC1        4
Q34           38
POVERTY_I     0
PINC_I        0
GD8B          862
GP2           1015
GRUCA_I       0

# Plot the missing-data patterns across the nine variables
missingTPOP <- md.pattern(ninevar, plot = TRUE, rotate.names = TRUE)
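The counts above can also be expressed as proportions, together with the number of rows that would remain under listwise deletion. Given the 1,436 respondents in the regression models, GD8B and GP2 are missing for roughly 60% and 71% of cases. The helper name missingness_report() and the toy data are hypothetical.

```r
# Hypothetical helper: proportion missing per column and rows left after listwise deletion
missingness_report <- function(df) {
  list(
    prop_missing   = round(colMeans(is.na(df)), 3),
    complete_cases = sum(complete.cases(df))
  )
}

# Toy data (not TransPop)
toy <- data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, 3, 4), z = 1:4)
missingness_report(toy)
```

With the study data this would be missingness_report(ninevar).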

# mean(), sd(), and skewness() do not drop NAs by default, so variables with
# missing values (GEDUC1, Q34, GD8B, GP2) return NA.
means <- sapply(ninevar, function(x) round(mean(x), 2))
print(means)
##   LIFESAT_I RACE_RECODE      GEDUC1         Q34   POVERTY_I      PINC_I 
##        4.82        1.56          NA          NA        0.13        7.28 
##        GD8B         GP2     GRUCA_I 
##          NA          NA        2.34
sds <- sapply(ninevar, function(x) round(sd(x), 2))  # named sds to avoid masking sd()
print(sds)
##   LIFESAT_I RACE_RECODE      GEDUC1         Q34   POVERTY_I      PINC_I 
##        1.65        1.35          NA          NA        0.34        3.85 
##        GD8B         GP2     GRUCA_I 
##          NA          NA        2.55
skew <- sapply(ninevar, function(x) round(skewness(x), 2))  # skewness() from e1071
print(skew)
##   LIFESAT_I RACE_RECODE      GEDUC1         Q34   POVERTY_I      PINC_I 
##       -0.64        2.59          NA          NA        2.21       -0.25 
##        GD8B         GP2     GRUCA_I 
##          NA          NA        2.00
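Skewness needs careful reading for coded variables. The sketch below reimplements the sample skewness that e1071::skewness() reports by default (assumed here to be its type = 3 estimator, b1 = m3 / s^3) and applies it to a toy 0/1 indicator. For a binary variable with prevalence p, skewness is approximately (1 - 2p) / sqrt(p(1 - p)), which is about 2.2 when p = 0.13, close to the 2.21 reported above for POVERTY_I, whose mean is 0.13. The skewness of POVERTY_I therefore reflects how few respondents are in poverty rather than a distributional problem. The function name skew_type3() is hypothetical.

```r
# Hypothetical reimplementation of type-3 sample skewness: b1 = m3 / s^3,
# where m3 is the mean cubed deviation and s is the sample SD (n - 1 denominator)
skew_type3 <- function(x, na.rm = FALSE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA_real_)  # mirrors the NA shown above for variables with missing values
    x <- x[!is.na(x)]
  }
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

# Toy 0/1 "poverty" indicator with 13% prevalence (illustrative, not TransPop data)
toy_poverty <- c(rep(1, 13), rep(0, 87))
skew_type3(toy_poverty)

# Large-sample approximation for a binary variable with prevalence p
p <- 0.13
(1 - 2 * p) / sqrt(p * (1 - p))
```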
# The four retained predictors
fourvar <- subset(ninevar, select = c(RACE_RECODE, POVERTY_I, PINC_I, GRUCA_I))

# binwidth = 1 suits these coded variables and replaces the default bins = 30
ggplot(fourvar, aes(x = RACE_RECODE)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram of RACE_RECODE")

ggplot(fourvar, aes(x = POVERTY_I)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram of POVERTY_I")

ggplot(fourvar, aes(x = PINC_I)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram of PINC_I")

ggplot(fourvar, aes(x = GRUCA_I)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram of GRUCA_I")

# Create violin plots for four variables
race_data <- transpop2$RACE_RECODE
poverty_data <- transpop2$POVERTY_I
pinc_data <- transpop2$PINC_I
gruca_data <- transpop2$GRUCA_I

vioplot(race_data, poverty_data, pinc_data, gruca_data, names = c("RACE_RECODE", "POVERTY_I", "PINC_I", "GRUCA_I"))

Findings

With life satisfaction as the outcome variable and personal income as the numeric predictor, I now fit a series of multiple linear regression models to see whether adding gender status (trans or cis) and then poverty improves the prediction of life satisfaction. I then compare the nested models using ANOVA.

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
reg1 <- lm(transpop2$LIFESAT_I ~ transpop2$PINC_I)
summary(reg1)
## 
## Call:
## lm(formula = transpop2$LIFESAT_I ~ transpop2$PINC_I)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5998 -1.0635  0.3365  1.2089  3.1713 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.82867    0.08843   43.30   <2e-16 ***
## transpop2$PINC_I  0.13624    0.01074   12.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.568 on 1434 degrees of freedom
## Multiple R-squared:  0.1009, Adjusted R-squared:  0.1002 
## F-statistic: 160.9 on 1 and 1434 DF,  p-value: < 2.2e-16

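This is my basic model, without poverty or gender status: predicted LIFESAT_I = 3.82867 + 0.13624(PINC_I). Each one-unit increase in personal income (PINC_I) is associated with a 0.14-point increase in predicted life satisfaction. Personal income accounts for approximately 10% of the variance in life satisfaction (R² = 0.10), and the model is statistically significant (F(1, 1434) = 160.9, p < .001).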
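A quick check on the fitted equation: an OLS line passes through the point of means, so plugging the mean of PINC_I (7.28) into it should return roughly the mean of LIFESAT_I (4.82). The coefficients are copied from the output above and the helper name is hypothetical. Evaluating the equation by hand also sidesteps a pitfall of how these models are written: with transpop2$... inside the formula, predict() with a newdata argument does not pick up the new values, whereas a model fit as lm(LIFESAT_I ~ PINC_I, data = transpop2) would.

```r
# Coefficients copied from the reg1 summary
b0 <- 3.82867  # intercept
b1 <- 0.13624  # slope for PINC_I

# Hypothetical helper: evaluate the fitted reg1 equation
predict_lifesat_reg1 <- function(pinc) b0 + b1 * pinc

# At the sample mean of PINC_I, the prediction should sit near mean LIFESAT_I
predict_lifesat_reg1(7.28)

# Predicted difference between respondents five income units apart
predict_lifesat_reg1(10) - predict_lifesat_reg1(5)
```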

reg2 <- lm(transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS)
summary(reg2)
## 
## Call:
## lm(formula = transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6512 -0.9989  0.2997  1.1488  3.4832 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.13068    0.09438  43.766  < 2e-16 ***
## transpop2$PINC_I  0.11697    0.01078  10.847  < 2e-16 ***
## New_TRANS_CIS    -0.84783    0.10573  -8.019 2.19e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.535 on 1433 degrees of freedom
## Multiple R-squared:  0.1395, Adjusted R-squared:  0.1383 
## F-statistic: 116.1 on 2 and 1433 DF,  p-value: < 2.2e-16

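The second model adds gender status: predicted LIFESAT_I = 4.13068 + 0.11697(PINC_I) - 0.84783(Trans), where Trans is New_TRANS_CIS (1 = trans, 0 = cis). Holding personal income constant, trans respondents score about 0.85 points lower in predicted life satisfaction than cis respondents. Personal income and gender status together account for approximately 14% of the variance in life satisfaction (R² = 0.14), and the model is statistically significant.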
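Because gender status enters the model as a 0/1 indicator, the predicted trans/cis gap is the same at every income level. The sketch below evaluates the fitted equation at PINC_I = 7 (near the sample mean of 7.28) for both groups, using coefficients copied from the output above; the helper name is hypothetical.

```r
# Coefficients copied from the reg2 summary (New_TRANS_CIS: 1 = trans, 0 = cis)
coef_reg2 <- c(intercept = 4.13068, pinc = 0.11697, trans = -0.84783)

# Hypothetical helper: evaluate the fitted reg2 equation
predict_lifesat_reg2 <- function(pinc, trans) {
  coef_reg2[["intercept"]] + coef_reg2[["pinc"]] * pinc + coef_reg2[["trans"]] * trans
}

cis_pred   <- predict_lifesat_reg2(pinc = 7, trans = 0)
trans_pred <- predict_lifesat_reg2(pinc = 7, trans = 1)
round(c(cis = cis_pred, trans = trans_pred, gap = trans_pred - cis_pred), 2)
```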

reg3 <- lm(transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS + transpop2$POVERTY_I)
summary(reg3)
## 
## Call:
## lm(formula = transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS + 
##     transpop2$POVERTY_I)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5744 -0.9853  0.3142  1.1143  3.7410 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.41920    0.10912  40.497  < 2e-16 ***
## transpop2$PINC_I     0.08886    0.01201   7.400 2.31e-13 ***
## New_TRANS_CIS       -0.81547    0.10499  -7.767 1.52e-14 ***
## transpop2$POVERTY_I -0.70018    0.13631  -5.137 3.18e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.521 on 1432 degrees of freedom
## Multiple R-squared:  0.1551, Adjusted R-squared:  0.1533 
## F-statistic:  87.6 on 3 and 1432 DF,  p-value: < 2.2e-16

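The final model adds poverty: predicted LIFESAT_I = 4.41920 + 0.08886(PINC_I) - 0.81547(Trans) - 0.70018(In poverty). Holding the other predictors constant, being in poverty is associated with a 0.70-point lower predicted life satisfaction. The three predictors together account for approximately 15.5% of the variance in life satisfaction (R² = 0.155), and the model as a whole is statistically significant.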
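To make the final model concrete, the sketch below tabulates predicted life satisfaction for the four combinations of gender status and poverty at a fixed income (PINC_I = 7, chosen for illustration). Because poverty and personal income are correlated (-0.47), holding income fixed while switching poverty on is a stylized comparison. Coefficients are copied from the output above; the helper name is hypothetical.

```r
# Coefficients copied from the reg3 summary
coef_reg3 <- c(intercept = 4.41920, pinc = 0.08886, trans = -0.81547, poverty = -0.70018)

# Hypothetical helper: evaluate the fitted reg3 equation (vectorized over its arguments)
predict_lifesat_reg3 <- function(pinc, trans, poverty) {
  coef_reg3[["intercept"]] + coef_reg3[["pinc"]] * pinc +
    coef_reg3[["trans"]] * trans + coef_reg3[["poverty"]] * poverty
}

# All four trans/poverty combinations at the same income
profiles <- expand.grid(trans = c(0, 1), poverty = c(0, 1))
profiles$pinc <- 7
profiles$predicted <- round(
  predict_lifesat_reg3(profiles$pinc, profiles$trans, profiles$poverty), 2
)
profiles
```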

anova(reg1, reg2, reg3)
## Analysis of Variance Table
## 
## Model 1: transpop2$LIFESAT_I ~ transpop2$PINC_I
## Model 2: transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS
## Model 3: transpop2$LIFESAT_I ~ transpop2$PINC_I + New_TRANS_CIS + transpop2$POVERTY_I
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1434 3526.5                                  
## 2   1433 3375.0  1    151.46 65.447 1.261e-15 ***
## 3   1432 3314.0  1     61.06 26.385 3.183e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
residualPlot(reg3)

The ANOVA table compares nested models in terms of how much of the variation in life satisfaction they explain. The comparisons are sequential: Model 2, which adds gender status (New_TRANS_CIS) to personal income (PINC_I), is compared with Model 1, which includes personal income only; Model 3, which adds poverty (POVERTY_I), is then compared with Model 2. For each comparison, the table reports the residual degrees of freedom, the residual sum of squares (RSS), the reduction in RSS, the F-statistic, and the p-value, which indicates whether the added predictor significantly improves the model. Both additions are significant: adding gender status (F = 65.45, p < .001) and adding poverty (F = 26.39, p < .001) each explain additional variation in life satisfaction. The residual plot helps assess whether the linear regression model is appropriate; I read it as showing no pattern that would argue against the linear specification.
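The F-statistics in the ANOVA table can be reproduced from its RSS column. With several models, anova() divides each RSS reduction by the residual variance of the largest model (Model 3). That is why the Model 2 vs. Model 1 F (65.45) differs from the squared t-value of gender status in Model 2 (8.019² ≈ 64.30), while the Model 3 vs. Model 2 F matches the squared t-value of poverty (5.137² ≈ 26.39). The RSS values below are copied from the table and are rounded, so the results agree only to about one decimal place.

```r
# RSS and residual degrees of freedom copied from the ANOVA table
rss      <- c(m1 = 3526.5, m2 = 3375.0, m3 = 3314.0)
df_resid <- c(m1 = 1434,   m2 = 1433,   m3 = 1432)

# Denominator: residual variance of the largest model (Model 3)
sigma2_full <- rss[["m3"]] / df_resid[["m3"]]

# Each comparison adds one predictor, so the numerator df is 1
f_m2_vs_m1 <- ((rss[["m1"]] - rss[["m2"]]) / (df_resid[["m1"]] - df_resid[["m2"]])) / sigma2_full
f_m3_vs_m2 <- ((rss[["m2"]] - rss[["m3"]]) / (df_resid[["m2"]] - df_resid[["m3"]])) / sigma2_full

# Upper-tail p-values on (1, 1432) degrees of freedom
p_m2_vs_m1 <- pf(f_m2_vs_m1, 1, df_resid[["m3"]], lower.tail = FALSE)
p_m3_vs_m2 <- pf(f_m3_vs_m2, 1, df_resid[["m3"]], lower.tail = FALSE)

round(c(F_2v1 = f_m2_vs_m1, F_3v2 = f_m3_vs_m2), 2)
c(p_2v1 = p_m2_vs_m1, p_3v2 = p_m3_vs_m2)
```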

Discussion

The TransPop data set is a wealth of information. This study examined which variables may contribute to life satisfaction for the transgender and cisgender populations in the United States. A limitation of this study is that it analyzes only a small portion of the vast amount of data available. Further research is needed to determine how to conduct a large-scale analysis of the more than 600 variables in this data set.

References

Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The Satisfaction With Life Scale. Journal of Personality Assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13

Feldman, J. L., Luhur, W. E., Herman, J. L., Poteat, T., & Meyer, I. H. (2021). Health and health care access in the US transgender population health (TransPop) survey. Andrology, 9(6). https://doi.org/10.1111/andr.13052

Grant, J. M., Mottet, L. A., Tanis, J., Harrison, J., Herman, J. L., & Keisling, M. (2011). Injustice at every turn: A report of the National Transgender Discrimination Survey. National Center for Transgender Equality and National Gay and Lesbian Task Force. https://www.onlabor.org/wp-content/uploads/2016/04/ntds_full.pdf

James, S. E., Herman, J. L., Rankin, S., Keisling, M., Mottet, L., & Anafi, M. (2016). The Report of the 2015 U.S. Transgender Survey. Washington, DC: National Center for Transgender Equality. https://transequality.org/sites/default/files/docs/usts/USTS-Full-Report-Dec17.pdf

Krueger, E. A., Divsalar, S., Luhur, W., Choi, W. K., & Meyer, I. H. (2020). TransPop U.S. Transgender Population Health Survey Methodology and Technical Notes. https://static1.squarespace.com/static/55958472e4b0af241ecac34f/t/5ef1331ecc152d3b60f068d6/1592865567882/TransPop+Survey+Methods+v18+FINAL+copy.pdf

Lane, M., Waljee, J. F., & Stroumsa, D. (2022). Treatment Preferences and Gender Affirmation of Nonbinary and Transgender People in a National Probability Sample. Obstetrics & Gynecology, Publish Ahead of Print. https://doi.org/10.1097/aog.0000000000004802

Meyer, I. H., Wilson, D. M. W., & O’Neill. (2021). LGBTQ People in the US: Select findings from the Generations and TransPop Studies. https://williamsinstitute.law.ucla.edu/publications/generations-transpop-toplines/

Shabbir, A., Fatima, B., Aslam, T., & Haider, T. (2022). Social interaction anxiety and life satisfaction among transgender people. Pakistan Journal of Psychology, 53(2), 3. https://www.proquest.com/scholarly-journals/social-interaction-anxiety-life-satisfaction/docview/2800938917/se-2