Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.
GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior.
According to the GSS Panel Codebook, the GSS conducted 32 surveys from 1972 to 2018, for a total of 64,814 completed interviews. Each survey from 1972 to 2004 was an independently drawn sample of English-speaking persons 18 years of age or over. Starting in 2006, Spanish-speakers were added to the target population. Block quota sampling was used for 5 of the survey years, with full probability sampling employed in the remainder of the surveys. Additionally, from 2004 onward, the GSS has used a three-wave, rolling panel sample design, which also uses non-respondent sub-sampling to keep the design unbiased.
Although the sampling methodology, questions, response options, and forms have been modified over the years, the GSS overall provides a useful sample from which to generalize to the US adult population.
However, because it is an observational study conducted through surveys and interviews with respondents, statistical inference from it cannot be used to establish causal relationships.
At this time in history, with a pandemic affecting the entire globe and climate change becoming an ever more present risk to be addressed, trust in the scientific community may be as important as ever. We will look at whether confidence in our scientific institutions has changed over a thirty-year period (1980 to 2010) and whether level of education is associated with confidence in these institutions.
For these questions, we will perform tests for the difference of two proportions. We will use a significance level (alpha) of 0.05 and define the hypotheses for each test as follows.
Test 1:
H_0 = Trust in our scientific institutions is the same in 2010 as in 1980. (p_1980 = p_2010)
H_a = Trust in our scientific institutions is different in 2010 than in 1980. (p_1980 != p_2010)
Test 2:
H_0 = Trust in our scientific institutions does not differ based on education level. (p_college = p_no_college)
H_a = Trust in our scientific institutions differs based on education level. (p_college != p_no_college)
From the survey data, we estimate that trust in scientific institutions declined by 4.7 percentage points between 1980 and 2010, with a 95% confidence interval of 0.8 to 8.5 percentage points. Additionally, we estimate that in 2010 trust among college graduates was 19.0 percentage points higher than among non-graduates, with a 95% confidence interval of 13.2 to 24.9 percentage points.
Trust in our scientific institutions has decreased even though we may be relying on them even more. Further research will be needed to see if education can be a partial solution to gaining trust in these needed institutions.
First we will subset the data to include only the variables we are analyzing. Additionally, we will simplify each variable to two levels. For confidence in science institutions, we will simplify to Trust / No_trust. For education level, we will simplify to College / No_College.
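library(dplyr)       # select, filter, mutate, group_by, summarize
library(tidyr)       # drop_na, pivot_wider
library(ggplot2)     # plotting
library(knitr)       # kable
library(kableExtra)  # kable_styling

data <- gss %>%  # assumes the `gss` data frame is already loaded in the session
  select(year, consci, degree) %>%
  drop_na() %>%
  filter(year == 1980 | year == 2010) %>%
  mutate(conf_sci = ifelse(consci == "A Great Deal", "Trust", "No_trust")) %>%
  mutate(coll_grad = ifelse(degree == "Bachelor" | degree == "Graduate",
                            "College", "No_College"))
head(data, 10) %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

| year | consci | degree | conf_sci | coll_grad |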
|---|---|---|---|---|
| 1980 | A Great Deal | Lt High School | Trust | No_College |
| 1980 | A Great Deal | Lt High School | Trust | No_College |
| 1980 | A Great Deal | Lt High School | Trust | No_College |
| 1980 | A Great Deal | High School | Trust | No_College |
| 1980 | Only Some | High School | No_trust | No_College |
| 1980 | A Great Deal | High School | Trust | No_College |
| 1980 | Only Some | High School | No_trust | No_College |
| 1980 | A Great Deal | Bachelor | Trust | College |
| 1980 | A Great Deal | Bachelor | Trust | College |
| 1980 | A Great Deal | Lt High School | Trust | No_College |
From this subsetted data, we can create tables and visualizations to analyze our variables.
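data_prop <- data %>%  # not defined in the source; reconstructed here to match the table below
  group_by(year, coll_grad) %>%
  summarize(Trust = mean(conf_sci == "Trust"), .groups = "drop")
data_prop %>%
  pivot_wider(names_from = coll_grad, values_from = Trust) %>%
  kable() %>%
  kable_styling(full_width = FALSE)

| year | College | No_College |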
|---|---|---|
| 1980 | 0.6063348 | 0.4356808 |
| 2010 | 0.5507614 | 0.3606195 |
ggplot(data, aes(factor(conf_sci), fill = factor(year))) +
  geom_bar(position = "dodge") +
  facet_wrap(~ coll_grad) +
  ggtitle("Confidence in Science Institutions by College Education") +
  labs(x = "Confidence in Science",
       y = "Respondent Count") +
  scale_fill_manual("Year",
                    values = c("1980" = "lightblue", "2010" = "blue"))

Next, we will subset our data further so that we have the information needed for each of our tests.
trust_by_education <- data %>%
  filter(year == 2010) %>%
  group_by(coll_grad) %>%
  summarize(Trust = mean(conf_sci == "Trust"),
            count = n())
trust_by_education %>%
  kable() %>%
  kable_styling(full_width = FALSE)

| coll_grad | Trust | count |
|---|---|---|
| College | 0.5507614 | 394 |
| No_College | 0.3606195 | 904 |
trust_by_year <- data %>%
  group_by(year) %>%
  summarize(Trust = mean(conf_sci == "Trust"),
            count = n())
trust_by_year %>%
  kable() %>%
  kable_styling(full_width = FALSE)

| year | Trust | count |
|---|---|---|
| 1980 | 0.4650078 | 1286 |
| 2010 | 0.4183359 | 1298 |
Before we begin to perform any statistical inference, we will need to test the conditions of the sampling distribution of the difference of proportions.
Independence
The respondents to the survey were independently drawn from the population, and each sample represents less than 10% of the US adult population, so we meet the condition for independence.
Normality
Each group in our tests has well over 10 successes and 10 failures, so the success-failure condition is met and we can safely apply the normal model.
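The success-failure condition can be checked explicitly. The following is a minimal sketch in base R, assuming the counts shown in the summary tables above; the helper name `success_failure_ok` and the variable names are hypothetical and not part of the original analysis.

```r
# Success-failure check: under H_0 each group should have at least 10
# expected successes and 10 expected failures, computed with the pooled
# proportion. `success_failure_ok` is a hypothetical helper name.
success_failure_ok <- function(successes, n, threshold = 10) {
  p_pool <- sum(successes) / sum(n)        # pooled proportion under H_0
  expected_success <- n * p_pool
  expected_failure <- n * (1 - p_pool)
  all(expected_success >= threshold, expected_failure >= threshold)
}

# Test 1: 1980 vs 2010 (Trust counts recovered as proportion * count)
trust_counts_year <- round(c(0.4650078 * 1286, 0.4183359 * 1298))
sf_year <- success_failure_ok(trust_counts_year, c(1286, 1298))

# Test 2: College vs No_College among 2010 respondents
trust_counts_edu <- round(c(0.5507614 * 394, 0.3606195 * 904))
sf_edu <- success_failure_ok(trust_counts_edu, c(394, 904))

print(c(year = sf_year, education = sf_edu))
```

Both checks use the pooled proportion because the hypothesis tests assume the two groups share a common proportion under the null.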
Because we are comparing two proportions, we will need to calculate a point estimate and a pooled proportion for each test. From the pooled proportion we can calculate a standard error and a z-score. From the z-score, we can derive a p-value to compare to our alpha level of 0.05.
Let’s perform our first test:
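The sequence just described (point estimate, pooled proportion, standard error, z-score, p-value) can be wrapped in one function. This is only a sketch in base R: the function name `two_prop_z_test` and its argument names are hypothetical, and it reports a two-sided p-value on the assumption that the alternative hypotheses are two-sided, as stated earlier.

```r
# Hypothetical helper: pooled two-proportion z-test with a two-sided p-value.
two_prop_z_test <- function(p1, n1, p2, n2) {
  pnt_est <- p2 - p1                                  # point estimate of the difference
  p_pool  <- (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled proportion under H_0
  se      <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z       <- pnt_est / se
  p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)    # both tails, since H_a uses "!="
  list(pnt_est = pnt_est, p_pool = p_pool, se = se, z = z, p_value = p_value)
}

# Example with the 1980 and 2010 proportions and counts from the table above
result_year <- two_prop_z_test(p1 = 0.4650078, n1 = 1286,
                               p2 = 0.4183359, n2 = 1298)
print(result_year)
```

Factoring `p_pool * (1 - p_pool)` out of the two variance terms gives the same standard error as summing them separately.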
H_0 = Trust in our scientific institutions is the same in 2010 as in 1980. (p_1980 = p_2010)
H_a = Trust in our scientific institutions is different in 2010 than in 1980. (p_1980 != p_2010)
p_1980 <- trust_by_year[1, 2]
p_2010 <- trust_by_year[2, 2]
n_1980 <- trust_by_year[1, 3]
n_2010 <- trust_by_year[2, 3]
pnt_est_1 <- p_2010 - p_1980
p_pool_1 <- (p_1980 * n_1980 + p_2010 * n_2010) / (n_1980 + n_2010)
SE_1 <- sqrt((p_pool_1 * (1 - p_pool_1)) / n_1980 + (p_pool_1 * (1 - p_pool_1)) / n_2010)
z_score_1 <- as.numeric(pnt_est_1 / SE_1)
p_value_1 <- 2 * pnorm(abs(z_score_1), lower.tail = FALSE)  # two-sided, to match H_a
df_table_1 <- data.frame(c(p_1980, p_2010, n_1980, n_2010,
                           pnt_est_1, p_pool_1, SE_1, z_score_1, p_value_1))
colnames(df_table_1) <- c("p_1980", "p_2010", "n_1980", "n_2010",
                          "pnt_est", "p_pool", "SE", "z_score", "p_value")
kable(df_table_1) %>%
  kable_styling()

| p_1980 | p_2010 | n_1980 | n_2010 | pnt_est | p_pool | SE | z_score | p_value |
|---|---|---|---|---|---|---|---|---|
| 0.4650078 | 0.4183359 | 1286 | 1298 | -0.0466719 | 0.4415635 | 0.0195376 | -2.388819 | 0.0169026 |

From this test, we calculate a two-sided p-value of approximately 0.017 (twice the one-tailed area of 0.0085), which is less than our alpha level of 0.05, so we reject the null hypothesis and conclude that trust in scientific institutions changed over this 30-year period. Unfortunately, trust declined over the period by about 4.7 percentage points.
Let’s look at our second test:
H_0 = Trust in our scientific institutions does not differ based on education level. (p_college = p_no_college)
H_a = Trust in our scientific institutions differs based on education level. (p_college != p_no_college)
p_coll <- trust_by_education[1, 2]
p_no_coll <- trust_by_education[2, 2]
n_coll <- trust_by_education[1, 3]
n_no_coll <- trust_by_education[2, 3]
pnt_est_2 <- p_coll - p_no_coll
p_pool_2 <- (p_coll * n_coll + p_no_coll * n_no_coll) / (n_coll + n_no_coll)
SE_2 <- sqrt((p_pool_2 * (1 - p_pool_2)) / n_coll + (p_pool_2 * (1 - p_pool_2)) / n_no_coll)
z_score_2 <- as.numeric(pnt_est_2 / SE_2)
p_value_2 <- 2 * pnorm(abs(z_score_2), lower.tail = FALSE)  # two-sided, to match H_a
df_table_2 <- data.frame(c(p_coll, p_no_coll, n_coll, n_no_coll,
pnt_est_2, p_pool_2, SE_2, z_score_2, p_value_2))
colnames(df_table_2) <- c("p_coll", "p_no_coll", "n_coll", "n_no_coll",
"pnt_est", "p_pool", "SE", "z_score", "p_value")
kable(df_table_2) %>%
  kable_styling()

| p_coll | p_no_coll | n_coll | n_no_coll | pnt_est | p_pool | SE | z_score | p_value |
|---|---|---|---|---|---|---|---|---|
| 0.5507614 | 0.3606195 | 394 | 904 | 0.190142 | 0.4183359 | 0.0297786 | 6.385196 | 0 |
From this test, we calculate a p-value of approximately 0, so we reject the null hypothesis and conclude that trust in scientific institutions differed by education level among 2010 respondents.
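As a cross-check on the hand-computed z-scores, both tests can be rerun with base R's `prop.test()`. This sketch assumes the Trust counts implied by the tables above (598 of 1286 and 543 of 1298 by year; 217 of 394 and 326 of 904 by education); the variable names are hypothetical. With `correct = FALSE` the continuity correction is turned off, so the reported X-squared statistic equals the square of the pooled z-score.

```r
# Cross-check with stats::prop.test(), using counts recovered from the tables.
# correct = FALSE disables the continuity correction so X-squared = z^2.
pt_year <- prop.test(x = c(598, 543), n = c(1286, 1298), correct = FALSE)
pt_edu  <- prop.test(x = c(217, 326), n = c(394, 904), correct = FALSE)

# Recover |z| from the chi-squared statistic
z_year <- sqrt(unname(pt_year$statistic))
z_edu  <- sqrt(unname(pt_edu$statistic))

print(c(z_year = z_year, p_year = pt_year$p.value))
print(c(z_edu = z_edu, p_edu = pt_edu$p.value))
```

The p-values returned by `prop.test()` are two-sided by default, which matches the "!=" form of both alternative hypotheses.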
Lastly, let’s calculate 95% confidence intervals for each test.
conf_int_year <- as.numeric(pnt_est_1) + c(-1, 1) * qnorm(.975) * as.numeric(SE_1)
conf_int_education <- as.numeric(pnt_est_2) + c(-1, 1) * qnorm(.975) * as.numeric(SE_2)
df_table_3 <- rbind(conf_int_year, conf_int_education)
kable(df_table_3) %>%
  kable_styling(full_width = FALSE)

| | lower | upper |
|---|---|---|
| conf_int_year | -0.0849649 | -0.0083788 |
| conf_int_education | 0.1317770 | 0.2485069 |
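The intervals above reuse the pooled standard error from the hypothesis tests. Many introductory texts instead build the confidence interval from the unpooled standard error, because the interval does not assume the two proportions are equal. The sketch below shows that variant for comparison only; the helper name `diff_prop_ci` is hypothetical, and the inputs are the proportions and counts from the tables above.

```r
# Hypothetical helper: confidence interval for p2 - p1 using the
# unpooled standard error (each group keeps its own proportion).
diff_prop_ci <- function(p1, n1, p2, n2, conf_level = 0.95) {
  se_unpooled <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # 1.96 for a 95% interval
  (p2 - p1) + c(-1, 1) * z_star * se_unpooled
}

# 2010 minus 1980
ci_year <- diff_prop_ci(p1 = 0.4650078, n1 = 1286, p2 = 0.4183359, n2 = 1298)
# College minus No_College (2010 respondents)
ci_edu <- diff_prop_ci(p1 = 0.3606195, n1 = 904, p2 = 0.5507614, n2 = 394)

print(rbind(ci_year, ci_edu))
```

With samples of this size the pooled and unpooled standard errors are close, so the two approaches should lead to the same qualitative conclusions.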
Even though we are relying more on scientific institutions to deal with catastrophic events, our trust in these institutions has eroded over time. With trust higher among college graduates, further research is needed to see if additional education can provide an increase in trust in these necessary institutions.