1 Introduction
2 Data Description
3 Research Questions
4 Logistic Regression on DOCTRST Variable
5 ANOVA Test on DOCSAT variable
6 Chi-square Test on MEDBEST variable
7 Results
8 Discussion
9 Summary

1 Introduction

The 2022 General Social Survey (GSS) is a multimode survey with three separate ballots in which Americans are asked a wide range of census-type and public opinion questions through online surveys and in-person or over-the-phone interviews. The GSS took samples until the weighted response rate for each survey method reached 50%, making the complete sample size 3,544 from 27,591 lines of sample. In this report, we use data from the 2022 GSS, filtering out any missing values and, for Likert-type questions, neutral answers. We are looking to find information on how trust in doctors varies across social groups, using a multiple logistic regression model, an ANOVA, and a Chi-square test.

2 Data Description

The data set we will be using was gathered from gssdataexplorer.norc.org. All of the variables are coded as continuous, but some are dichotomized for analysis. The variables are as follows:

X: Identification Variable
YEAR: GSS year for this respondent. We only look at the year 2022; all other observations are dropped.
BALLOT: Ballot used for this interview. We only use Ballot C.
AGE: Respondent’s age
RACE: Respondent’s race
INCOME: Respondent’s total family income
SEX: Respondent’s sex, coded 1 as male and 2 as female
DOCTRST: The following question is asked on a Likert-type scale with 1 being strongly agree and 5 being strongly disagree: How much do you agree or disagree with the following statements about doctors in general in the United States? All things considered, doctors can be trusted.
```
** Made categorical:
   Score < 3, variable is coded as 0,
   Score > 3, variable is coded as 1,
   Score = 3 (neutral) is dropped.
```
DOCSAT1: The following question is asked on a Likert-type scale with 1 being completely satisfied and 7 being completely dissatisfied: How satisfied or dissatisfied were you with the treatment you received when you last visited a doctor?
MEDBEST: The following question is asked on a Likert-type scale with 1 being certain that the respondent would get the best treatment and 5 being certain that the respondent would not get the best treatment: How likely is it that if you become seriously ill, you would get or not get the best treatment available in the United States?
```
** Made categorical:
   Score < 3, variable is coded as "Not Likely",
   Score > 3, variable is coded as "Likely"
```
WTSSPS: post-stratification weights
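
The dichotomization described for DOCTRST and MEDBEST can be sketched in a few lines. This is only an illustration, not the code used for the report: the function name `dichotomize_likert`, the sample responses, and the use of `None` for missing values are all assumptions made for the example.

```python
from typing import Optional


def dichotomize_likert(score: Optional[int], low_label, high_label, midpoint: int = 3):
    """Map a 5-point Likert-type score to one of two labels.

    Scores below the midpoint get low_label, scores above it get high_label.
    The neutral midpoint and missing values (assumed to be None here)
    return None so they can be filtered out afterwards.
    """
    if score is None or score == midpoint:
        return None
    return low_label if score < midpoint else high_label


# Hypothetical raw DOCTRST responses (1 = strongly agree ... 5 = strongly disagree)
raw_doctrst = [1, 2, 3, None, 4, 5]

# Code agreement as 0 and disagreement as 1, then drop neutral/missing answers
coded = [dichotomize_likert(s, 0, 1) for s in raw_doctrst]
doctrst_binary = [c for c in coded if c is not None]
print(doctrst_binary)

# The same helper with the MEDBEST labels exactly as listed above
print(dichotomize_likert(4, "Not Likely", "Likely"))
```

Returning `None` for the midpoint keeps the filtering step explicit, which mirrors the decision to drop neutral answers rather than assign them to either group.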

3 Research Questions

Our practical and analytical question is how factors such as race, sex, age, and income affect United States citizens' trust in doctors. The main question is whether the social group a person belongs to has an impact on their opinion of doctors. Then, we are looking to find which of these factors have the most impact. My hypothesis is that race and sex will have the biggest impact, so we will analyze these two variables further with methods other than logistic regression.

4 Logistic Regression on DOCTRST Variable

To begin answering these questions, we first create a logistic regression model on the variable DOCTRST, where the question asked is: How much do you agree or disagree with the following statements about doctors in general in the United States? All things considered, doctors can be trusted. The question is answered on a Likert-type scale, and we dichotomized the response variable so that people who responded “Strongly Agree” or “Agree” are coded as 0 and people who responded “Strongly Disagree” or “Disagree” are coded as 1.

4.1 Analysis of Predictor Variables for Logistic Regression

We start by making scatterplots and barplots of the predictor variables that we will use for logistic regression to see the correlation, spot patterns, and assess potential violations of the model assumptions.

None of the predictors looks normally distributed. Logistic regression does not assume normally distributed predictors, but we will still continue our model building with caution.

4.2 Logistic Regression Model

To make the logistic regression model, we use the predictor variables that describe a respondent's social category (RACE, SEX, AGE, and INCOME), with DOCTRST as the response variable. The regression coefficients are shown below.

Summary of inferential statistics of the logistic regression model
	Estimate	Std. Error	z value	Pr(>|z|)
(Intercept)	-1.0578908	0.6441569	-1.6422875	0.1005304
RACE	0.2336571	0.1283577	1.8203588	0.0687044
SEX	0.1467703	0.1909326	0.7687023	0.4420701
AGE	-0.0103522	0.0054375	-1.9038604	0.0569284
INCOME	-0.0667629	0.0404246	-1.6515401	0.0986283
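
As a check on how the columns of this table relate to each other, the z value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a standard normal distribution. The sketch below re-derives both from the reported estimates and standard errors using only the Python standard library; the function name `wald_test` is an illustrative choice, and this is not the code that produced the table.

```python
import math

# Estimates and standard errors copied from the table above
coefficients = {
    "(Intercept)": (-1.0578908, 0.6441569),
    "RACE": (0.2336571, 0.1283577),
    "SEX": (0.1467703, 0.1909326),
    "AGE": (-0.0103522, 0.0054375),
    "INCOME": (-0.0667629, 0.0404246),
}


def wald_test(estimate: float, std_error: float):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


for name, (estimate, std_error) in coefficients.items():
    z, p_value = wald_test(estimate, std_error)
    print(f"{name:12s} z = {z:8.4f}   p = {p_value:.4f}")
```

Small differences from the table in the last digits are expected, because the inputs here are already rounded.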

Summary Stats of Regression Coefficients
	Estimate	Std. Error	z value	Pr(>|z|)	2.5 %	97.5 %
(Intercept)	-1.0578908	0.6441569	-1.6422875	0.1005304	-2.3535927	0.1811269
RACE	0.2336571	0.1283577	1.8203588	0.0687044	-0.0239798	0.4804855
SEX	0.1467703	0.1909326	0.7687023	0.4420701	-0.2255613	0.5242639
AGE	-0.0103522	0.0054375	-1.9038604	0.0569284	-0.0210980	0.0002480
INCOME	-0.0667629	0.0404246	-1.6515401	0.0986283	-0.1434848	0.0161201

4.3 Odds Ratio

In the table below, we added odds ratios, obtained by exponentiating the regression coefficients. These are easier to interpret in practical terms.

Summary Stats with Odds Ratios
	Estimate	Std. Error	z value	Pr(>|z|)	odds.ratio
(Intercept)	-1.0578908	0.6441569	-1.6422875	0.1005304	0.3471873
RACE	0.2336571	0.1283577	1.8203588	0.0687044	1.2632113
SEX	0.1467703	0.1909326	0.7687023	0.4420701	1.1580879
AGE	-0.0103522	0.0054375	-1.9038604	0.0569284	0.9897012
INCOME	-0.0667629	0.0404246	-1.6515401	0.0986283	0.9354169

The odds ratio associated with the RACE variable is 1.263. RACE is coded with white as 1, so each one-unit increase in RACE, that is, each step away from white, multiplies the odds of giving an answer higher on the Likert scale by 1.263. In simpler terms, when asked whether doctors can be trusted, the odds of someone answering “Disagree” or “Strongly disagree” increase by about \(26.3\%\) for each one-unit increase in RACE. This is the largest odds ratio among the predictors for the DOCTRST variable, although its p-value (0.069) falls short of the .05 level; AGE (p = 0.057) is similarly marginal, and SEX and INCOME are not significant. The odds ratios for the other predictor variables can be found in the table above and interpreted the same way. Because we are interested in all four social factors, we keep them all, and this is our final logistic regression model.
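
The conversion from a logistic regression coefficient to an odds ratio, and from an odds ratio to a percent change in the odds, can be written out explicitly. The following is a minimal sketch using the estimates reported in the table above; the function names are illustrative and not taken from the original analysis.

```python
import math


def odds_ratio(estimate: float) -> float:
    """Odds ratio for a one-unit increase in the predictor: exp(coefficient)."""
    return math.exp(estimate)


def percent_change_in_odds(estimate: float) -> float:
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(estimate) - 1.0) * 100.0


# Estimates copied from the table above
estimates = {
    "RACE": 0.2336571,
    "SEX": 0.1467703,
    "AGE": -0.0103522,
    "INCOME": -0.0667629,
}

for name, estimate in estimates.items():
    print(
        f"{name:7s} odds ratio = {odds_ratio(estimate):.4f}, "
        f"change in odds = {percent_change_in_odds(estimate):+.1f}%"
    )
```

An odds ratio above 1 corresponds to a positive percent change in the odds of answering “Disagree” or “Strongly disagree”, and an odds ratio below 1 to a negative one.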

Normally, we would also use goodness-of-fit measures to compare candidate models. Since we fit only one model, based on the likelihood function, and are not choosing between alternatives, we do not report other goodness-of-fit measures.

5 ANOVA Test on DOCSAT variable

Since the race variable has the largest odds ratio in the above model, deeper analysis using an ANOVA test was done on this predictor variable. Instead of using the same question as above, we changed the response variable to DOCSAT1. For this variable the following question is asked on a Likert-type scale with 1 being completely satisfied and 7 being completely dissatisfied: How satisfied or dissatisfied were you with the treatment you received when you last visited a doctor?

ANOVA Table for DOCSAT1 and Race
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
RACE	1	4.669051	4.669051	2.192523	0.1390297
Residuals	906	1929.357381	2.129533	NA	NA

Since the p-value in the ANOVA test is 0.139, it is above the alpha level of .05 that we chose to use for analysis. This means we fail to reject the null hypothesis: the data do not show a statistically significant effect of a person’s race on how satisfied they were at their last doctor’s visit.
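
The F value in the ANOVA table follows directly from the sums of squares and degrees of freedom, and the decision rule is a comparison of the reported p-value with alpha. The sketch below re-derives the mean squares and F value from the table and applies the .05 threshold; it takes the p-value as reported rather than recomputing it, since the F distribution is not available in the Python standard library.

```python
# Values copied from the ANOVA table above
ss_race, df_race = 4.669051, 1
ss_resid, df_resid = 1929.357381, 906
reported_p_value = 0.1390297
alpha = 0.05

# Mean square = sum of squares / degrees of freedom
ms_race = ss_race / df_race
ms_resid = ss_resid / df_resid

# F statistic = mean square of the factor / residual mean square
f_value = ms_race / ms_resid

# Reject the null hypothesis of no race effect only if p < alpha
reject_null = reported_p_value < alpha

print(f"Mean Sq (RACE)      = {ms_race:.6f}")
print(f"Mean Sq (Residuals) = {ms_resid:.6f}")
print(f"F value             = {f_value:.6f}")
print(f"Reject null at alpha = {alpha}? {reject_null}")
```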

6 Chi-square Test on MEDBEST variable

Although the sex variable was not significant in the logistic regression model, it had the second largest odds ratio, so again, there was deeper analysis on another, similar question. The MEDBEST variable raises the question: How likely is it that if you become seriously ill, you would get or would not get the best treatment available in the United States? It is asked on a Likert-type scale with 1 being certain they would get the best treatment and 5 being certain that the respondent would not get the best treatment. To test this, we will be using a Chi-square test. The null hypothesis is that sex and the MEDBEST response are independent; the alternative we are interested in is that women, coded as 2, are more likely to be certain they will not get the best treatment.

Pearson's Chi-squared test

data: GSS$SEX.cat and GSS$MEDBEST.cat
X-squared = 0.57891, df = 1, p-value = 0.4467

Pearson's Chi-squared test with Yates' continuity correction

data: t
X-squared = 0.44095, df = 1, p-value = 0.5067

This Chi-square test shows the sex variable is not statistically significant for this question, meaning we found no evidence that men and women answer differently when asked if they would get the best treatment available in the United States if they become seriously ill.
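
For a test with one degree of freedom, the p-value can be recovered from the X-squared statistic without any statistics library, because a chi-square variable with df = 1 is the square of a standard normal variable. The sketch below applies that identity to the two statistics reported above; the function name `chi2_pvalue_df1` is an illustrative choice, and the shortcut is only valid for df = 1.

```python
import math


def chi2_pvalue_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    If X ~ chi-square(1), then X = Z^2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


alpha = 0.05

# Statistics copied from the test output above
reported = {
    "Pearson": 0.57891,
    "Pearson with Yates' correction": 0.44095,
}

for label, statistic in reported.items():
    p_value = chi2_pvalue_df1(statistic)
    print(f"{label}: X-squared = {statistic}, p = {p_value:.4f}, "
          f"significant at {alpha}: {p_value < alpha}")
```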

7 Results

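Below are the summary statistics of our logistic regression model and the results from the ANOVA and Chi-square tests. None of the logistic regression coefficients is significant at the .05 level, although RACE (p = 0.069) and AGE (p = 0.057) come close, and neither the ANOVA (p = 0.139) nor the Chi-square test (p = 0.507) shows statistical significance.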

Summary Stats with Odds Ratios
	Estimate	Std. Error	z value	Pr(>|z|)	odds.ratio
(Intercept)	-1.0578908	0.6441569	-1.6422875	0.1005304	0.3471873
RACE	0.2336571	0.1283577	1.8203588	0.0687044	1.2632113
SEX	0.1467703	0.1909326	0.7687023	0.4420701	1.1580879
AGE	-0.0103522	0.0054375	-1.9038604	0.0569284	0.9897012
INCOME	-0.0667629	0.0404246	-1.6515401	0.0986283	0.9354169

ANOVA Table for DOCSAT1 and Race
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
RACE	1	4.669051	4.669051	2.192523	0.1390297
Residuals	906	1929.357381	2.129533	NA	NA

Pearson's Chi-squared test with Yates' continuity correction

data: t
X-squared = 0.44095, df = 1, p-value = 0.5067

8 Discussion

These results suggest that people of color in the United States do not trust doctors as much as white people do, which is in line with my hypothesis, although the evidence does not reach the .05 significance level. I believe this is because of the history of abuse of Black people in the medical system in the United States. This issue of systemic racism would need to be addressed starting in the education system, by trying to integrate people of color into scientific fields. Then when a non-white person goes to the doctor’s office, they can talk to and put their trust in someone who looks like they do. This seems to be one of the only solutions to the issue of race being the biggest barrier when it comes to medical care in the United States.

9 Summary

In this report we used logistic regression, an ANOVA, and a Chi-square test to examine how the population answers questions about their trust in doctors. Race showed the largest estimated effect among the predictor variables, although it was not significant at the .05 level.

Statistics Capstone Project

Emma Laughlin

2023-12-12