library(ggplot2)
library(dplyr)
library(statsr)
load("gss.rdata")
The General Social Survey (GSS) is one of the most frequently analyzed sources of data in the social sciences. It aims to monitor societal trends, attitudes, and changes in contemporary American society. This analysis will focus on the GSS Cumulative File 1972-2012, with missing values recoded to “NA” to facilitate better data exploration in R.
Methodology and Scope of Inference
The GSS is collected in even-numbered years, starting as a paper-based questionnaire and transitioning to computer-assisted personal interviewing (CAPI) in 2002. These in-person interviews have a target population of adults (18+) living in households in the U.S., randomly selected from across the country. Participation is voluntary.
Generalizability
Data is collected by random sampling across the nation. Respondents are from a mix of urban, suburban, and rural geographic areas. Thus, results of the GSS are loosely generalizable to adults living in households across the United States. This excludes incarcerated individuals and those who are homeless or living in shelters. One notable reservation about this generalizability is the hesitance participants may have in sharing personal views on sensitive subjects face-to-face.
Causality
The GSS is a survey that examines trends in societal attitudes; it is observational. It is not interventional and does not feature random assignment of participants to treatments or controls. No conclusions of causality can be drawn from the data, only associations between our variables of interest.
The Affordable Care Act (ACA) has been one of the most significant health policy enactments since Medicare and Medicaid. The huge scope of the ACA and the political energy surrounding it have made it an enormously divisive topic. Many eligible seniors have reaped significant benefits. Since 2011, Medicare beneficiaries have received free preventive screenings such as mammograms and colonoscopies through the ACA. The law also implemented an annual free wellness visit for Medicare recipients. Perhaps most importantly, the ACA has decreased the burden of Part D prescription drug costs - the so-called Medicare “donut hole”, with 9.4 million Medicare beneficiaries saving more than $15 billion on prescription drug costs from 2010 - 2015 (1). The GSS data will allow us to explore public opinion on government medical assistance in the over 65 age group during this very important time in health policy. We will focus on the 2012 GSS survey, during which all Medicare recipients had coverage influenced by the ACA.
Is there a correlation between being at or above the usual age of eligibility for Medicare (65) and views on government assistance with medical care costs?
Variables are defined in the modified General Social Survey Cumulative File, 1972 - 2012. Codebook source: https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/34802.
Variables:
load("gss.rdata")
# Select only data from 2012; filter out NA values for our variables of interest
gss_clean <- gss %>% filter(year == 2012, !is.na(age), !is.na(helpsick))
# Select only our variables of interest
gss_clean_age.sick <- gss_clean %>% select(age, helpsick)
We are mostly interested in the difference in opinion between those 65 or older versus those under 65. We will organize our data accordingly:
# Combine ages into two categories
gss_clean_age.sick <- mutate(gss_clean_age.sick, sixtyfiveorover = ifelse(age < 65, "under 65", "65 or over"))
table(gss_clean_age.sick$sixtyfiveorover)
##
## 65 or over   under 65
##        210        704
table(gss_clean_age.sick$helpsick, gss_clean_age.sick$sixtyfiveorover)
##
##                      65 or over under 65
##   Govt Should Help           64      319
##   Agree With Both           102      307
##   People Help Selves         44       78
Our total sample size is 914, with 210 respondents in the “65 or over” age group and 704 in the “under 65” age group. We can represent our data graphically to better visualize potential trends. We also want to order the levels of the age variable so that the “under 65” category precedes the “65 or over” category.
# Ordering data on the x-axis
level_order <- c('under 65', '65 or over')
ggplot(data = gss_clean_age.sick, aes(x = factor(sixtyfiveorover, levels = level_order), fill = helpsick)) +
  geom_bar() +
  labs(title = "Age and View on Govt Assistance for Medical Care Costs", x = "Age", y = "Count", fill = "Opinion") +
  theme(plot.title = element_text(hjust = 0.5))
Because of the large difference in count between the two categories, it is difficult to observe any obvious trends in the data. We will visualize the data as a proportion:
ggplot(data = gss_clean_age.sick, aes(x = factor(sixtyfiveorover, levels = level_order), fill = helpsick)) +
  geom_bar(position = "fill") +
  labs(title = "Age and View on Govt Assistance for Medical Care Costs", x = "Age", y = "Proportion of age group", fill = "Opinion") +
  theme(plot.title = element_text(hjust = 0.5))
At first glance, it seems like the typical population eligible for Medicare is proportionally more likely to believe that people should help themselves for medical care, compared to the “under 65” population. Proportionally more people under the age of 65 seem to believe that the government should assist with medical costs. This is interesting because we would expect Medicare beneficiaries to be in favor of policies that support them. We will perform inference to determine if this association between age group and belief in government assistance is indeed present.
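To put numbers on the proportions that the stacked chart displays, the within-group shares can be computed directly. The sketch below is an illustrative addition: it rebuilds the table from the counts printed earlier instead of reading `gss.rdata`, and the object names `observed` and `col_props` are assumptions, not names from the original analysis.

```r
# Observed counts copied from the contingency table printed earlier
observed <- matrix(
  c(64, 319,
    102, 307,
    44, 78),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    helpsick = c("Govt Should Help", "Agree With Both", "People Help Selves"),
    age_group = c("65 or over", "under 65")
  )
)

# margin = 2 divides each cell by its column (age group) total,
# so each column sums to 1
col_props <- prop.table(observed, margin = 2)
print(round(col_props, 3))
```

Read column by column, this gives 44/210 (about 21%) of the “65 or over” group choosing “People Help Selves”, against 78/704 (about 11%) of the “under 65” group.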
A mosaic plot is another tool for representing data from two qualitative variables. It can give us a graphical representation of both total counts and the relative proportions, which is useful for the data at hand.
# Replace spaces with line breaks for better y-axis readability
levels(gss_clean_age.sick$helpsick) <- gsub(" ", "\n", levels(gss_clean_age.sick$helpsick))
# Mosaic plot
plot(table(x = factor(gss_clean_age.sick$sixtyfiveorover, levels = level_order), y = gss_clean_age.sick$helpsick), main = "Mosaic Plot", xlab = "Age", ylab = "Govt Help")
Age groups: “under 65”, “65 or over”
Revisiting our research question, we can frame it as “is there an association between age group and views on government assistance with medical care costs?”
\(H_0:\) Age group and belief in government assistance for medical care costs are independent.
\(H_A:\) Age group and belief in government assistance for medical care costs are dependent. In other words, belief in governmental assistance for medical costs does vary by age group.
We want to perform inference on 2 categorical variables (sixtyfiveorover, helpsick), one of which (helpsick) has 3 levels. Thus, we will use a chi-square test of independence to evaluate our hypothesis.
Independence: The GSS uses random sampling without replacement, so independence is conditional on the sample size being less than 10% of the total U.S. population in 2012. Our sample size of 914 meets this condition.
Sample Size: For a chi-square test of independence, we represent our data in a contingency table. The expected frequency count for each cell of our contingency table needs to be at least 5. The expected value for each cell follows the formula \(\frac{\text{row total} \times \text{column total}}{\text{table total}}\). We can calculate these by hand or use the R function as below. Our data meets this sample size condition:
# Replace line breaks with spaces in levels for table readability
levels(gss_clean_age.sick$helpsick) <- gsub("\n", " ", levels(gss_clean_age.sick$helpsick))
# Contingency table of sample data
table(gss_clean_age.sick$helpsick, gss_clean_age.sick$sixtyfiveorover)
##
##                      65 or over under 65
##   Govt Should Help           64      319
##   Agree With Both           102      307
##   People Help Selves         44       78
# Contingency table of expected counts, showing at least 5 per cell
chisq.test(gss_clean_age.sick$helpsick, gss_clean_age.sick$sixtyfiveorover)$expected
##
## gss_clean_age.sick$helpsick 65 or over  under 65
##   Govt Should Help            87.99781 295.00219
##   Agree With Both             93.97155 315.02845
##   People Help Selves          28.03063  93.96937
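The same expected counts can be obtained by hand from the row and column totals. The following is a minimal sketch, assuming the observed counts are typed in from the contingency table above; `observed` and `expected` are illustrative names introduced here.

```r
# Observed counts copied from the contingency table of sample data
observed <- matrix(
  c(64, 319,
    102, 307,
    44, 78),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    helpsick = c("Govt Should Help", "Agree With Both", "People Help Selves"),
    age_group = c("65 or over", "under 65")
  )
)

# Expected count per cell = (row total * column total) / table total.
# outer() forms every row-total by column-total product in one step.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 5))

# Sample size condition: every expected count should be at least 5
print(all(expected >= 5))
```

For instance, the “Govt Should Help” and “65 or over” cell is 383 × 210 / 914, which matches the 87.99781 reported by `chisq.test()`.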
Our data meets all conditions, and so we can use theoretical methods to evaluate our hypothesis. Using a simulation approach such as a randomization test is not necessary.
# Table of observed sample data
gss12_analysis <- table(gss_clean_age.sick$helpsick, gss_clean_age.sick$sixtyfiveorover)
We will use the expected (E) and observed (O) sample values for our inference, summing over the k = 6 cells of the contingency table. Our degrees of freedom depend on the number of rows (R) and columns (C) in the table.
\(\chi^2 =\sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}\)
\(df = (R - 1) \times (C - 1) = 2\)
We can now perform a \(\chi^2\) test in R:
chisq.test(gss12_analysis)
##
##  Pearson's Chi-squared test
##
## data:  gss12_analysis
## X-squared = 21.199, df = 2, p-value = 2.493e-05
The \(\chi^2\) test statistic is 21.199 and, at 2 degrees of freedom, the corresponding p-value of 0.00002493 is much smaller than the 5% significance level. These data therefore provide convincing evidence to reject the null hypothesis in favor of the alternative: age group and belief in government assistance for medical care costs are associated.
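As a check on the reported output, the statistic and p-value can be reproduced from the \(\chi^2\) and \(df\) formulas. This is a sketch under the assumption that the observed counts are entered by hand from the contingency table; the names `observed`, `expected`, `chi_sq`, `df`, and `p_value` are introduced here for illustration.

```r
# Observed counts copied from the contingency table of sample data
observed <- matrix(
  c(64, 319,
    102, 307,
    44, 78),
  nrow = 3, byrow = TRUE
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all six cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (R - 1) * (C - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(chi_sq = chi_sq, df = df, p_value = p_value))
```

The hand computation agrees with `chisq.test()` because no continuity correction is applied to tables larger than 2 × 2.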
Although we cannot draw any causal conclusions, it appears that being 65 or older is associated with a tendency to believe in less government assistance for medical care. This age group is more likely to believe in personal responsibility for medical costs versus the younger group. Given the significant benefits of Medicare, we might have expected the opposite. This older population may have been raised to believe in individual rather than collective frameworks of responsibility. In contrast, younger generations have been raised with a more explicit and global focus on social responsibility for people, the environment, and sustainability. Political views could be a notable confounding factor for these differences.
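One way to see which cells drive this result is to look at Pearson residuals, \((O - E)/\sqrt{E}\), whose squares sum to the \(\chi^2\) statistic. This diagnostic is an illustrative addition and was not part of the original analysis; the counts are typed in from the contingency table, and `observed`, `expected`, and `pearson_resid` are assumed names.

```r
# Observed counts copied from the contingency table of sample data
observed <- matrix(
  c(64, 319,
    102, 307,
    44, 78),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    helpsick = c("Govt Should Help", "Agree With Both", "People Help Selves"),
    age_group = c("65 or over", "under 65")
  )
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residual per cell: positive means more respondents than
# independence would predict, negative means fewer
pearson_resid <- (observed - expected) / sqrt(expected)
print(round(pearson_resid, 2))
```

In the “65 or over” column the residual is positive for “People Help Selves” and negative for “Govt Should Help”, which is the direction of the association described above.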
Considering the complexity of the ACA and how politically charged it is as a topic, people may simply not be aware of the benefits they are receiving. This suggests the need for increasing health literacy in the elderly population, as well as promoting fuller understanding of the benefits and drawbacks of the ACA. Further research would include comparing these findings to trends before the implementation of the ACA in 2010.
Chi-square tests of independence have no associated confidence intervals.
Source:
(1) Borelli MC, Bujanda M, Maier K. The Affordable Care Act Insurance Reforms: Where Are We Now, and What’s Next? Clin Diabetes. 2016;34(1):58-64. doi:10.2337/diaclin.34.1.58