Data

Data description

The General Social Survey (GSS) is a nationally representative survey of adults in the United States. It is conducted as a personal-interview survey.

The results are generalizable to noninstitutionalized, English- and Spanish-speaking persons 18 years of age or older living in the United States (NORC, University of Chicago. The General Social Survey. Available at: link).

As this is an observational, cross-sectional study without random assignment, only associations can be investigated, not causal relationships.

Setup

Loading packages

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## Warning: package 'readr' was built under R version 4.2.1
## Warning: package 'forcats' was built under R version 4.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Loading data

load("~/R data/Social Survey (GSS)/_5db435f06000e694f6050a2d43fc7be3_gss (2).Rdata")

Data analysis

Research question

Confidence in medicine is likely to have several predictors, one of which may be a person's education. The objective is therefore to test whether confidence in medicine is associated with a person's education, based on the self-reported degree achieved (categorized as high school or university education). Confidence in medicine is measured by the survey question: “I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? - Medicine”.

Hypotheses

Null hypothesis, H0 (nothing going on): level of education and confidence in medicine are independent. Alternative hypothesis, Ha (something going on): level of education and confidence in medicine are dependent, that is, confidence in medicine varies with the education degree achieved.

Statistical method selected

A chi square test of independence was selected, as we are dealing with two categorical variables: education level (two levels: high school and university) and confidence in medicine (three levels: “great deal of confidence”, “only some confidence”, or “hardly any confidence”). The chosen significance level is α = 0.05. The General Social Survey Cumulative File, 1972-2012, is used as the data source. R is used for the data description and analysis (R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: link).
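To make the test statistic concrete, the following is a minimal sketch that computes the Pearson chi square statistic by hand and compares it with `chisq.test()`. The counts in `observed` are hypothetical and are not taken from the GSS; only the row and column labels mirror the variables used in this report.

```r
# Hypothetical 2 x 3 table of counts (NOT the GSS data), for illustration only
observed <- matrix(
  c(30, 50, 20,
    45, 40, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    degree_cat = c("high school", "university"),
    conmedic   = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, degrees of freedom and upper-tail p-value
x2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Built-in test for comparison (no continuity correction beyond 2 x 2 tables)
builtin <- chisq.test(observed)
c(by_hand = x2, builtin = unname(builtin$statistic))
```

With two rows and three columns the statistic has (2 - 1) * (3 - 1) = 2 degrees of freedom, which matches the `df = 2` reported for the GSS table.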

Checking conditions for Chi square test of independence

  • Two categorical variables - “degree” and “confidence in medicine”.
  • Two or more categories (groups, levels) for each variable - Degree: “high school”, “university”; Confidence in medicine - “A Great Deal”, “Only Some”, “Hardly Any”.
  • Independence of observations - random sample, observations are considered to be independent.
  • There is no relationship between the subjects in each group, not paired data - fulfilled.
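  • Relatively large sample size - in total, 57 061 respondents are included in the file, of whom 37 602 have both variables recorded (the sum of the expected counts shown below); this is less than 10% of the population.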
  • Expected frequencies should be at least 5 for the majority (80%) of the cells (see below).
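The expected-frequency condition in the list above can be checked programmatically. The sketch below uses a hypothetical helper, `expected_count_check()`, which is not part of base R or of this analysis, and a made-up table `toy` with one small cell.

```r
# Hypothetical helper: smallest expected count and share of cells meeting the threshold
expected_count_check <- function(tab, min_expected = 5) {
  # chisq.test() warns when expected counts are small; only the expected table is needed here
  expected <- suppressWarnings(chisq.test(tab)$expected)
  list(
    smallest = min(expected),
    share_ok = mean(expected >= min_expected)
  )
}

# Made-up counts, chosen so that one expected count falls just below 5
toy <- matrix(c(12, 3, 9,
                40, 25, 11), nrow = 2, byrow = TRUE)
check <- expected_count_check(toy)
check$smallest   # 4.8
check$share_ok   # 5 of 6 cells, about 0.83
```

Under the 80% rule stated above, this toy table would still pass, because five of its six cells have an expected count of at least 5.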

Inference

Number of observations:

length(gss$caseid)
## [1] 57061

Missing data in variables used for analysis:

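View(gss[is.na(gss$degree), ])
sum(is.na(gss$degree))      # number of observations with missing degree
View(gss[is.na(gss$conmedic), ])
sum(is.na(gss$conmedic))    # number of observations with missing conmedic

Checking missing data in both variables - there is no obvious pattern which could indicate a potential systematic bias.

Missing values are counted with sum(is.na(...)); length() applied to a data frame returns the number of columns (114 for gss), not the number of rows. The expected counts reported with the chi square test below sum to 37 602, so 37 602 of the 57 061 observations (~66%) have both variables recorded and the remaining 19 459 (~34%) have at least one of them missing. For the analysis, observations with missing values will be removed.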
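As an illustration of how missing values in a data frame can be counted, here is a small sketch on a made-up data frame `toy` (hypothetical values, not GSS records). It contrasts `length()`, which counts columns when given a data frame, with `nrow()`, `sum(is.na())` and `complete.cases()`.

```r
# Made-up data frame with the same column names as the variables used here
toy <- data.frame(
  caseid   = 1:6,
  degree   = c("High School", NA, "Bachelor", "Graduate", NA, "High School"),
  conmedic = c("Only Some", "A Great Deal", NA, "Hardly Any", NA, "Only Some")
)

n_cols    <- length(toy[is.na(toy$degree), ])  # 3: columns of the subset, not missing rows
n_missing <- nrow(toy[is.na(toy$degree), ])    # 2: rows with missing degree
n_na      <- sum(is.na(toy$degree))            # 2: same count without subsetting

# Missing count per variable, and rows that na.omit() would keep
per_var  <- colSums(is.na(toy[c("degree", "conmedic")]))
complete <- sum(complete.cases(toy[c("degree", "conmedic")]))  # 3
```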

gss %>%
  select(degree, conmedic) %>%
  na.omit() %>%
  # collapse the degree levels into two categories
  mutate(degree_cat = if_else(degree == "Lt High School" | degree == "High School", "high school", "university")) %>%
  ggplot(aes(x = degree_cat, fill = conmedic)) +
  geom_bar(position = "fill") +  # stacked bars scaled to 1 within each category
  labs(title = "Confidence in medicine according to level of education", x = "Degree category", y = "Proportion")
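The filled bar chart displays, within each degree category, the proportion of respondents at each confidence level. The same quantities can be obtained numerically with `prop.table()`. The sketch below uses a made-up data frame `toy`, so the proportions are illustrative only and do not describe the GSS.

```r
# Made-up responses: 10 per degree category (NOT GSS data)
toy <- data.frame(
  degree_cat = c(rep("high school", 10), rep("university", 10)),
  conmedic = c(rep("A Great Deal", 4), rep("Only Some", 5), rep("Hardly Any", 1),
               rep("A Great Deal", 6), rep("Only Some", 3), rep("Hardly Any", 1))
)

counts <- table(toy$degree_cat, toy$conmedic)

# margin = 1 divides each row by its total, matching geom_bar(position = "fill")
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```

Each row of `row_props` sums to 1, which corresponds to the full height of one bar in the plot.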

Chi square test of independence

chi_test<-gss %>% 
  select(degree, conmedic) %>% 
  na.omit() %>% 
  mutate(degree_cat=if_else(degree=="Lt High School"|degree =="High School", "high school", "university"))
chi_test %>% 
  select(degree_cat, conmedic) %>%
  table() %>% 
  chisq.test()
## 
##  Pearson's Chi-squared test
## 
## data:  .
## X-squared = 93.679, df = 2, p-value < 2.2e-16

Expected values condition (all expected counts are far above 5, so the condition is met):

chi_test %>% 
  select(degree_cat, conmedic) %>%
  table() %>% 
  chisq.test() %>% 
  .$expected
##              conmedic
## degree_cat    A Great Deal Only Some Hardly Any
##   high school    13064.336 12547.084  2333.5807
##   university      4514.664  4335.916   806.4193

Conclusion:

Since the p value of the chi square test is tiny (p < 2.2e-16) and below the chosen significance level of α = 0.05, we reject the null hypothesis and conclude that level of education and confidence in medicine are dependent (associated). To see whether education is really an independent, significant predictor of confidence in medicine, a more rigorous analysis should be performed (a regression model). The association found here might be the result of confounding, that is, of a variable that affects both confidence in medicine and level of education.
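One possible shape of such a follow-up analysis is sketched below with simulated data. Everything in it is an assumption made for illustration: the binary outcome `high_conf` (a dichotomized confidence level), the confounder `age`, and the simulated effect sizes. It is not the model for the GSS data, only an example of comparing an unadjusted and an adjusted logistic regression.

```r
set.seed(1)
n <- 500

# Simulated (hypothetical) data: age influences both education and confidence
age        <- round(runif(n, 18, 89))
university <- rbinom(n, 1, plogis(-0.5 - 0.01 * (age - 50)))
high_conf  <- rbinom(n, 1, plogis(-0.2 + 0.4 * university - 0.01 * (age - 50)))

# Logistic regression without and with adjustment for the assumed confounder
unadjusted <- glm(high_conf ~ university, family = binomial)
adjusted   <- glm(high_conf ~ university + age, family = binomial)

# Odds ratios for education before and after adjusting for age
or_unadjusted <- exp(coef(unadjusted)["university"])
or_adjusted   <- exp(coef(adjusted)["university"])
```

If the two odds ratios differ noticeably, part of the crude association is explained by the adjustment variable. With the three-level outcome used in this report, an ordinal or multinomial model would be a more natural choice than the dichotomized outcome assumed here.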