The General Social Survey (GSS) is a nationally representative survey of adults in the United States. It is conducted as a personal-interview survey.
Results are generalizable to noninstitutionalized, English- and Spanish-speaking persons 18 years of age or older living in the United States (NORC. University of Chicago. The General Social Survey. available at:link).
As this is an observational, cross-sectional study, only associations, not causal relationships, can be investigated.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## Warning: package 'readr' was built under R version 4.2.1
## Warning: package 'forcats' was built under R version 4.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
load("~/R data/Social Survey (GSS)/_5db435f06000e694f6050a2d43fc7be3_gss (2).Rdata")
Confidence in medicine is likely to have several predictors, one of which may be a person's education. The objective is therefore to test whether there is an association between confidence in medicine and a person's education, based on the self-reported degree achieved (categorized as high school or university education). Confidence in medicine is measured with the survey question: “I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? - Medicine”.
Null hypothesis, H0: level of education and confidence in medicine are independent. Alternative hypothesis, Ha: level of education and confidence in medicine are dependent, that is, confidence in medicine varies with the education degree achieved.
A chi-square test of independence was selected because both variables are categorical: education level (two levels: high school and some university degree) and confidence in medicine (three levels: “great deal of confidence”, “only some confidence”, or “hardly any confidence”). The chosen significance level is α = 0.05. The General Social Survey Cumulative File, 1972-2012, is used as the data source. R is used for the data description and analysis (R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. available at:link).
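The chi-square test of independence compares the observed cell counts with the counts expected if the two variables were independent. A minimal sketch of that calculation on a small hypothetical 2 x 3 table follows; the counts are made up for illustration and are not GSS data.

```r
# Hypothetical counts, NOT taken from the GSS data
observed <- matrix(
  c(30, 50, 20,
    40, 45, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    degree_cat = c("high school", "university"),
    conmedic   = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, degrees of freedom and upper-tail p-value
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# chisq.test() should reproduce the manual calculation
fit <- chisq.test(observed)
c(manual = x_squared, builtin = unname(fit$statistic))
```

For tables larger than 2 x 2, chisq.test() applies no continuity correction, so the manual and built-in statistics should agree.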
Number of observations:
length(gss$caseid)
## [1] 57061
Missing data in variables used for analysis:
View(gss[is.na(gss$degree), ])
# sum(is.na()) counts missing values; length() on a data frame would count its columns
sum(is.na(gss$degree))
View(gss[is.na(gss$conmedic), ])
sum(is.na(gss$conmedic))
Checking missing data in both variables - there is no obvious pattern which could indicate a potential systematic bias.
For the analysis, observations with a missing value in either variable are removed. This leaves 37,602 of the 57,061 observations (the expected counts in the chi-square output sum to this total), so about 34% of observations are excluded; an exclusion of this size should be kept in mind when interpreting the result.
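Counting missing values is easy to get wrong, because length() applied to a data frame returns the number of columns rather than the number of rows. A minimal sketch on a small hypothetical data frame follows; `toy` and its values are made up for illustration and only stand in for gss.

```r
# Hypothetical data frame standing in for gss (three columns, five rows)
toy <- data.frame(
  caseid   = 1:5,
  degree   = c("High School", NA, "Bachelor", "Graduate", NA),
  conmedic = c("Only Some", "A Great Deal", NA, "Hardly Any", NA)
)

# length() of a data frame is its number of columns, whatever rows are selected
n_columns <- length(toy[is.na(toy$degree), ])

# Number of missing values in one variable
n_missing_degree <- sum(is.na(toy$degree))

# Rows with complete data in both analysis variables
n_complete <- sum(complete.cases(toy[, c("degree", "conmedic")]))

c(columns = n_columns, missing_degree = n_missing_degree, complete = n_complete)
```

Here n_complete corresponds to the number of rows that would survive na.omit() on the two selected variables.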
gss %>%
  select(degree, conmedic) %>%
  na.omit() %>%
  mutate(degree_cat = if_else(degree == "Lt High School" | degree == "High School", "high school", "university")) %>%
  ggplot(aes(x = degree_cat, fill = conmedic)) +
  # position = "fill" scales each bar to 1, so the y axis shows proportions
  geom_bar(position = "fill") +
  labs(title = "Confidence in medicine according to level of education", x = "Degree category", y = "Proportion")
chi_test <- gss %>%
  select(degree, conmedic) %>%
  na.omit() %>%
  mutate(degree_cat = if_else(degree == "Lt High School" | degree == "High School", "high school", "university"))
chi_test %>%
  select(degree_cat, conmedic) %>%
  table() %>%
  chisq.test()
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 93.679, df = 2, p-value < 2.2e-16
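The reported p-value can be reproduced from the test statistic and degrees of freedom printed in the output above. A minimal sketch, using only those printed values (X-squared = 93.679 for a 2 x 3 table):

```r
# Values copied from the chisq.test() output
x_squared <- 93.679
df <- (2 - 1) * (3 - 1)  # (rows - 1) * (columns - 1) for a 2 x 3 table

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# R prints "< 2.2e-16" when a p-value falls below .Machine$double.eps
p_value < .Machine$double.eps
```

The printed "< 2.2e-16" is therefore a display threshold rather than the exact p-value.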
Expected values condition (every expected cell count should be at least 5):
chi_test %>%
  select(degree_cat, conmedic) %>%
  table() %>%
  chisq.test() %>%
  .$expected
##              conmedic
## degree_cat    A Great Deal Only Some Hardly Any
##   high school    13064.336 12547.084  2333.5807
##   university      4514.664  4335.916   806.4193
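The expected-count condition can also be checked programmatically. A minimal sketch that re-enters the expected counts printed above; in practice the same check could be run on the `$expected` element directly, and the threshold of 5 is the common rule of thumb, assumed here.

```r
# Expected counts copied from the output above
expected <- matrix(
  c(13064.336, 12547.084, 2333.5807,
     4514.664,  4335.916,  806.4193),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    degree_cat = c("high school", "university"),
    conmedic   = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

# Rule of thumb: every expected cell count should be at least 5
condition_met <- all(expected >= 5)

# The expected counts sum to the number of complete cases used in the test
n_used <- round(sum(expected))

list(condition_met = condition_met, n_used = n_used)
```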
Since the p-value of the chi-square test is very small (p < 2.2e-16) and all expected cell counts are well above 5, we reject the null hypothesis at α = 0.05 and conclude that level of education and confidence in medicine are associated. To assess whether education is an independent predictor of confidence in medicine, a more rigorous analysis (for example, a regression model) would be needed. The association found here may be the result of confounding, that is, of a variable that affects both confidence in medicine and level of education.