Have America’s Views on Abortion Changed with the Times?

Submitted by: Preetha Rajan

Email: praj016@aucklanduni.ac.nz

Setup

Load packages

library(ggplot2)
library(dplyr)
library(statsr)
library(janitor)
## Warning: package 'janitor' was built under R version 3.5.2
library(varhandle)
## Warning: package 'varhandle' was built under R version 3.5.2

Load data

load("be3_gss.Rdata")

Part 1: Data

Background

The General Social Survey (GSS) is a sociological survey of American adults that collects information on a wide range of demographic characteristics and attitudes towards a variety of social and political issues such as abortion, crime and punishment, race relations, gender roles and spending priorities. According to Wikipedia: The General Social Survey has three main purposes:

  1. To gather data to monitor and explain trends, changes, and constants in attitudes, behaviors, and attributes as well as examine the structure, development, and functioning of society in general as well as the role of various sub-groups.

  2. To compare the United States to other societies to place American society in comparative perspective and develop cross-national models of human society.

  3. To make up-to-date, important, high-quality data easily accessible to scholars, students, policy makers, and others with minimal cost and waiting.

The data set considered for this project on statistical inference is an extract of the General Social Survey (GSS) Cumulative File, spanning four decades (1972-2012). It consists of 57,061 observations of 114 variables. Each variable corresponds to a specific question asked of the respondent.

Data Collection Methodology

The GSS is, in essence, the only full-probability, personal-interview survey currently being conducted in the United States that is designed to monitor changes in both social characteristics and attitudes. During the sampling process, one adult is randomly selected from each sampled household, and “the sample is a multi-stage area probability sample to the block or segment level. At the block level, however, quota sampling is used with quotas based on sex, age, and employment status” - Appendix A: Sampling Design (GSS) (page 2097).

“The GSS has six components, which include a replicating core, topical modules, cross-national modules, experiments, re-interviews, and follow-up studies. The replicating core emphasizes collection of data on social trends, through exact replication of question wording over time, wherein core items fall into two major categories- socio-demographic/background measures, and replicated measurements on social and political attitudes and behaviors. The topical modules are used to introduce new topics not previously investigated by the GSS and to cover existing topics in greater detail with more fully specified models. The replicating core makes up one-third of the GSS and the topical and cross-national modules the other two-thirds. Experiments are conducted in both the core and supplemental modules. Re-interview and follow-up studies are completed through additional data collections” - Overview of the GSS (National Science Foundation, 2007 - Page 11).

“Each respondent is asked the replicating core of socio-demographic background items, along with replicated measurements of socio-political attitudes and behaviors. Many of the latter are measured by way of a ‘ballot’ design such that each item is answered by a random 2/3 of each sample. Each GSS sample (A and B) includes an International Social Survey Program module (ISSP). Each sample is also asked to respond to several topical modules that may be supported by the National Science Foundation or others” - Overview of the GSS (National Science Foundation, 2007 - Page 11).

“The GSS includes a panel study component in order to allow the direct observation of change over time in the same individuals” - Overview of the GSS (National Science Foundation, 2007 - Page 14).

The scope of inference - generalizability and causality

Because the GSS combines sampling designs such as full-probability sampling and block quota sampling (a non-probability method in which quotas are filled within each sampled block), and because it relies on well-defined sampling frames (updated over time so that the survey continues to target the population of US adults), the GSS claims that “its samples closely resemble US population distributions reported in the Census and other authoritative sources” - Appendix A: Sampling Design (GSS) (page 2109).

However, due to a variety of factors such as “survey non-response and sampling variation, the GSS sample does deviate from known population figures for some variables. The GSS does not calculate any post-stratification weights to adjust for such differences” - Appendix A: Sampling Design (GSS) (page 2109).

There have also been instances of underrepresentation of certain demographic segments of the population (including males and males in full time employment) for certain sampling frames and studies across the years. Such instances do cause uncertainties with regard to the external validity of the data.

Furthermore, since the GSS is an observational study in which individuals are observed and/or certain outcomes are measured (in contrast to an experiment where there are treatment and control groups), the data collected cannot be utilized in the establishment of causal relations. Because observational studies are not randomized, they cannot control for all the other un-measurable and confounding factors that may actually be driving the results. Thus, any “link” between cause and effect in observational studies is speculative at best.

Observational studies are descriptive in nature and often lead to the determination of associations. Such studies are often important for generating hypotheses for subsequent interventional studies. They typically do not have well-defined mechanistic hypotheses, but rather have a stated goal to obtain data or determine an association.

Part 2: Research question

According to a report by PBS from July 2018, “Most Americans think that women should have legal access to abortion, and that opinion has shifted little since the U.S. Supreme Court ruled in favor of women’s reproductive rights more than four decades ago”.

This report goes on to cite results from Gallup’s annual values and beliefs poll, comparing public opinion on abortion in 1975 (two years after the Supreme Court’s landmark ruling in the Roe vs. Wade case) with that of 2018: “In 1973, the U.S. Supreme Court ruled in Roe v. Wade that a woman had the right to abortion. Since then, public opinion hasn’t shifted much. Two years after the court’s decision, 54% of U.S. adults said they supported abortion under certain circumstances and another 21% said abortion always should be legal, according to Gallup polling from 1975, while 22% of Americans said it should be illegal. By 2018, Gallup pollsters found little change, with 50% of Americans supporting abortion under certain conditions, another 29% of respondents supporting abortion no matter what and 18% of respondents saying it should be against the law”. Further, according to the Scholars Strategy Network: “In all three democracies (US, Canada and Britain), public opinion favors the right to an abortion in cases of rape or fetal abnormality or to protect a woman’s health, and is much less supportive when family size, poverty, or marital status are at issue”.

Given this evidence that public opinion favours the right to an abortion in cases of rape or fetal abnormality or to protect a woman’s health, and is much less supportive when family size, poverty, or marital status are at issue, my research question is divided into two parts:

  1. Has the American public’s opinion on a woman’s right to legally avail of an abortion under certain circumstances (such as protecting the woman’s health) changed over the nearly four decades since the Supreme Court’s landmark ruling on the Roe vs. Wade case (1975 vs. 2012)?

  2. Has the American public’s opinion on a woman’s right to legally avail of an abortion, given that the woman’s family cannot afford to have more children, changed over the nearly four decades since the Supreme Court’s landmark ruling on the Roe vs. Wade case (1975 vs. 2012)?

Both parts of the research question will be answered using the inference technique for categorical data known as “estimating the difference between two proportions”. In this case we have two categorical variables: an explanatory variable (also known as a grouping variable) and a response variable. The variable ‘year’ in the GSS data set is the grouping variable for both parts of the research question. The variable ‘abhlth’ is the response variable for part (a), and the variable ‘abpoor’ is the response variable for part (b).

Given the scope of the GSS data under consideration (with data being available only up to 2012) and the scope of the Gallup polling study, we’ll first subset the GSS data set to include only the years 1975 and 2012 and then consider the original features abhlth and abpoor.

The feature abhlth is a categorical variable with two levels, ‘Yes’ and ‘No’. It records respondents’ answers to the following question: “Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if the woman’s own health is seriously endangered by the pregnancy?”.

The feature abpoor is a categorical variable with two levels, ‘Yes’ and ‘No’. It records respondents’ answers to the following question: “Please tell me whether or not you think it should be possible for a pregnant woman to obtain legal abortion if the family has a very low income and cannot afford any more children?”.

The next step for the first part of the research question is to calculate, separately for the years 1975 and 2012, the proportion of respondents who are in favour of a woman legally availing of an abortion on the grounds that the pregnancy seriously endangers her health, and the proportion who are not. Success in this context is defined as being in favour (the ‘Yes’ level of the abhlth variable).

The next step for the second part of the research question is to calculate, separately for the years 1975 and 2012, the proportion of respondents who are in favour of a woman legally availing of an abortion on the grounds that her family has a low income and cannot afford more children, and the proportion who are not. Success in this context is defined as being in favour (the ‘Yes’ level of the abpoor variable).

Purpose and significance

It is evident that abortion is so much more than a social issue. For example, a news report from February, 2018 cited the latest findings of Advancing New Standards in Reproductive Health’s (ANSIRH) longitudinal “Turnaway Study,” which recruited participants from 30 abortion facilities across the United States for nearly 8,000 interviews between 2008 and 2015. Findings indicate that limiting women’s access to abortion increases their chances of poverty, unemployment, and dependence on public assistance programs. The research published in the American Journal of Public Health, found that those denied abortion access because they were too far along in a pregnancy were nearly four times as likely to be below the federal poverty level compared to those who received care.

It is also worth noting that while research organizations such as Gallup found that public opinion regarding abortion has not changed significantly in the last 40 years or so, there are big differences in people’s attitudes about abortion from state to state. States are hence a critical battleground for abortion rights.

The purpose of this research is not only to verify whether the conclusions drawn from these statistical inference methods are consistent with the conclusions drawn from the study by Gallup; the research question can also lead to the examination of abortion attitudes by state. Examining such research questions is critical for policy decision making. To quote the National Women’s Law Center: “States that are hostile to abortion (have 4 or more abortion restrictions) have a worse wage gap than states that aren’t hostile. In states that are not hostile to abortion women, on average, make 20 cents less for every dollar a man makes. But, in hostile states women make 23 cents less for every dollar a man makes. Put simply, economic security and reproductive justice go hand in hand. Women can’t have one without the other. Abortion restrictions, pay discrimination, unaffordable health care, lack of paid sick days, inaccessible childcare, unfair scheduling practices all make it harder for women to have the children they want, not have children, and parent their children they have in safe, healthy environments. Economic justice is deeply interconnected to all other forms of justice, including reproductive justice”.

Subsetting the data

The following code subsets the data to only include the years 1975 and 2012 and also selects the original features abhlth and abpoor to answer both subparts of the research question:

#Step 1: Selecting the relevant features from the GSS data set to answer the first sub-part of the research question

#The final data set will be named as Abortion.Views.Health

subset.data.1 <- gss %>% select(year, abhlth)

#Step 2: Selecting the relevant features from the GSS data set to answer the second sub-part of the research question

#The final data set will be named as Abortion.Views.Poor

subset.data.2 <- gss %>% select(year, abpoor)

#Step 3: Subset both data sets to include only the years 1975 and 2012
Abortion.Views.Health <-subset.data.1 %>% filter(year==1975 | year==2012) 
Abortion.Views.Poor <-subset.data.2 %>% filter(year==1975 | year==2012) 

#Step 4: Convert both categorical variables that have been imported as factors to character variables so as to be able to obtain the counts
#It is for this purpose that the varhandle library has been imported. We use the unfactor function
Abortion.Views.Health$abhlth <- unfactor(Abortion.Views.Health$abhlth)
Abortion.Views.Poor$abpoor <- unfactor(Abortion.Views.Poor$abpoor)

#The grouping variable year has been imported as a numeric variable and needs to be converted to a character variable in order to enable proper construction
#of the x-axis of the bar plot in the exploratory data analysis section of the project
Abortion.Views.Health$year <- as.character(Abortion.Views.Health$year)
Abortion.Views.Poor$year <- as.character(Abortion.Views.Poor$year)

#Step 5: Include only those rows in both data sets that do not have null values
Abortion.Views.Health <- Abortion.Views.Health[complete.cases(Abortion.Views.Health),]
Abortion.Views.Poor <- Abortion.Views.Poor[complete.cases(Abortion.Views.Poor),]

Part 3: Exploratory data analysis

EDA - Research Question Part A

The first step is to obtain a contingency table (also known as a cross-tabulation), in order to depict the frequency distribution of the two variables year and abhlth. In other words, this contingency table gives us the counts of the respondents who favour and do not favour abortion on the grounds that the pregnancy is endangering the woman’s health, as grouped by the two years 1975 and 2012.

The second step is to create a summary table depicting the proportion of respondents who favour and do not favour abortion as grouped by the two years 1975 and 2012.

#Create a frequency distribution table (a table of counts for the categorical variable abhlth by year)
#Using the library janitor and the tabyl function
tabyl(Abortion.Views.Health, year, abhlth) %>% 
  adorn_totals(c('row', 'col'))
##   year  No  Yes Total
##   1975 135 1314  1449
##   2012 146 1083  1229
##  Total 281 2397  2678
#Create a summary table depicting the proportion of respondents who favour and do not favour abortion as grouped by the two years 1975 and 2012
Abortion.Health.Views.Table <- table(Abortion.Views.Health$year, Abortion.Views.Health$abhlth)
#Get the row percentages
prop.table(Abortion.Health.Views.Table, 1)
##       
##               No       Yes
##   1975 0.0931677 0.9068323
##   2012 0.1187958 0.8812042

Each table summarizes the number of respondents in each combination of categories: as counts in the first table and as proportions within each level of the explanatory (grouping) variable, year, in the second. The research question is primarily concerned with the proportion of respondents who are in favour of abortion in 2012 (given the pregnancy is endangering the woman’s health) relative to 1975 (two years after the landmark judgement in the Roe vs. Wade case).

From the above contingency table of frequency counts, we can calculate the proportion of respondents who are in favour of a woman obtaining an abortion legally on the grounds that the pregnancy is seriously endangering her health for the years 1975 and 2012:

\[\hat{p}_{1975health} = 1314/1449 = 0.907 \]

\[\hat{p}_{2012health} = 1083/1229 = 0.881 \]

It appears that the proportion of respondents in our sample data who are in favour of a woman availing of an abortion legally on the grounds that the pregnancy is seriously endangering her health declined by nearly 3 percentage points in 2012 relative to 1975.
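The same arithmetic can be reproduced in R directly from the counts in the contingency table. This is only an illustrative sketch: the names `yes_health`, `n_health`, `p_hat_health` and `diff_health` are introduced here and are not part of the original analysis.

```r
# Counts copied from the contingency table above (abhlth by year)
yes_health <- c("1975" = 1314, "2012" = 1083)  # respondents answering 'Yes'
n_health   <- c("1975" = 1449, "2012" = 1229)  # respondents with a recorded answer

# Sample proportion of 'Yes' answers in each year
p_hat_health <- yes_health / n_health
round(p_hat_health, 3)

# Difference in sample proportions (2012 minus 1975), in proportion units
diff_health <- unname(p_hat_health["2012"] - p_hat_health["1975"])
round(diff_health, 3)
```

Multiplying the difference by 100 expresses the change in percentage points.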

Before going into the mechanics of hypothesis testing and confidence interval construction, let’s first visualize the difference in proportions with a bar plot.

ggplot(Abortion.Views.Health, aes(x=year, fill=abhlth))+
  theme(text = element_text(size = 14)) +
  scale_y_continuous(breaks = seq(0, 1, 0.25), labels = c("0%", "25%", "50%", "75%", "100%")) +
  labs(x = 'Year', y='Proportion') +
  ggtitle('Proportion of Respondents who are in favour of abortion as pregnancy endangers health')+
  geom_bar(position='fill', color='black')

The bar chart depicts a small difference of about 3 percentage points between the proportion of adults who favour abortion in 2012 and the proportion of adults who favoured abortion in 1975 (on the grounds that the pregnancy is endangering the woman’s health), based on GSS survey data. But the question remains: is this difference statistically significant? Does this data contradict the findings of the study by Gallup? The analysis will proceed with hypothesis testing to determine the statistical significance of this possible association.

EDA - Research Question Part B

The first step is to obtain a contingency table (also known as a cross-tabulation), in order to depict the frequency distribution of the two variables year and abpoor. In other words, this contingency table gives us the counts of the respondents who favour and do not favour abortion on the grounds that the woman’s family is too poor to have more children, as grouped by the two years 1975 and 2012.

The second step is to create a summary table, depicting the proportion of respondents who favour and do not favour abortion as grouped by the two years 1975 and 2012.

#Create a frequency distribution table (a table of counts for the categorical variable abpoor by year)
#Using the library janitor and the tabyl function
tabyl(Abortion.Views.Poor, year, abpoor) %>% 
  adorn_totals(c('row', 'col'))
##   year   No  Yes Total
##   1975  663  753  1416
##   2012  691  549  1240
##  Total 1354 1302  2656
#Create a summary table depicting the proportion of respondents who favour and do not favour abortion as grouped by the two years 1975 and 2012
Abortion.Poor.Views.Table <- table(Abortion.Views.Poor$year, Abortion.Views.Poor$abpoor)
#Get the row percentages
prop.table(Abortion.Poor.Views.Table, 1)
##       
##               No       Yes
##   1975 0.4682203 0.5317797
##   2012 0.5572581 0.4427419

Each table summarizes the number of respondents in each combination of categories: as counts in the first table and as proportions within each level of the explanatory (grouping) variable, year, in the second. The research question is primarily concerned with the proportion of respondents who are in favour of abortion in 2012 (given that the woman’s family is too poor to have more children) relative to 1975 (two years after the landmark judgement in the Roe vs. Wade case).

From the above contingency table of frequency counts, we can calculate the proportion of respondents who are in favour of a woman obtaining an abortion legally on the grounds that the woman’s family is too poor to have more children for the years 1975 and 2012:

\[\hat{p}_{1975poor} = 753/1416 = 0.531 \]

\[\hat{p}_{2012poor} = 549/1240 = 0.442 \]

It appears that the proportion of respondents in our sample data who are in favour of a woman availing of an abortion legally on the grounds that the woman’s family is too poor to have more children declined by nearly 9 percentage points in 2012 relative to 1975.

It is worth noting that comparing the sample proportions of success from both sub-parts of the research question does seem to reflect the following statement: “In all three democracies (US, Canada and Britain), public opinion favors the right to an abortion in cases of rape or fetal abnormality or to protect a woman’s health, and is much less supportive when family size, poverty, or marital status are at issue”.

Before going into the mechanics of hypothesis testing and confidence interval construction, let’s first visualize the difference in proportions with a bar plot.

ggplot(Abortion.Views.Poor, aes(x=year, fill=abpoor))+
  theme(text = element_text(size = 14)) +
  scale_y_continuous(breaks = seq(0, 1, 0.25), labels = c("0%", "25%", "50%", "75%", "100%")) +
  labs(x = 'Year', y='Proportion') +
  ggtitle('Proportion of Respondents who are in favour of abortion as the family is too poor to have more children')+
  geom_bar(position='fill', color='black')

The bar chart depicts a larger difference of about 9 percentage points between the proportion of adults who favour abortion in 2012 and the proportion of adults who favoured abortion in 1975 (on the grounds that the woman’s family is too poor to have more children), based on GSS survey data. But once again, the question remains: is this difference statistically significant? Does this data contradict the findings of the study by Gallup? The analysis will proceed with hypothesis testing to determine the statistical significance of this possible association.

Part 4: Inference

Inference - Research Question Part A and Part B

Difference Between Two Proportions

Checking if the data meets the required conditions to conduct the appropriate statistical inference test

The next step for both part (a) and part (b) of the research question, is to evaluate the data for the necessary conditions for a valid inferential analysis.

The proper method for conducting an inferential analysis for two categorical variables with two levels each, is a two-sample z-test for population proportions.

In the context of part (a) of the research question, the two-sample z-test for proportions determines the statistical significance of the difference in the proportions of adults in 1975 favouring abortion on the grounds that the pregnancy endangers the woman’s health vs. the proportions of adults in 2012 favouring abortion on the same grounds.

In the context of part (b) of the research question, the two-sample z-test for proportions determines the statistical significance of the difference in the proportions of adults in 1975 favouring abortion on the grounds that the woman’s family is too poor to have any more children vs. the proportions of adults in 2012 favouring abortion on the same grounds.

In the context of part (a) of the research question, the test treats the sample proportions as estimates of the actual proportions of US adults in 1975 and 2012 who were in favour of, or not in favour of, abortion on the grounds that the pregnancy endangers the woman’s health.

The z-test proceeds by calculating whether the difference in the sample proportions could have arisen by chance if the two population proportions were actually equal, i.e. if public opinion on abortion (either being in favour or not in favour) on the grounds that the pregnancy is endangering the woman’s health had not changed over the years.

In the context of part (b) of the research question, the test treats the sample proportions as estimates of the actual proportions of US adults in 1975 and 2012 who were in favour of, or not in favour of, abortion on the grounds that the woman’s family is too poor to have more children.

The z-test proceeds by calculating whether the difference in the sample proportions could have arisen by chance if the two population proportions were actually equal, i.e. if public opinion on abortion (either being in favour or not in favour) on the grounds that the woman’s family is too poor to have more children had not changed over the years.

The first condition needed for a valid two-sample z-test is that the data must represent random samples or, more specifically, independent, identically distributed (IID) observations. Each observation in the data set represents a single, unique adult, and the knowledge of the sampling procedures suggests that the 1975 and 2012 samples are independent of each other. In addition, the population of all US adults is far more than 10-20 times larger than either sample. Together, these points support treating the adults in the data as IID observations.

The second condition needed for a valid two-sample z-test is that the sampling distribution of the sample proportion must be approximately normal for both samples. For proportions, the rule that satisfies this condition is that the number of successes and the number of failures should each be at least 10 in each of the samples.

In the case of part (a) of the research question, the observed numbers of successes are 1,314 and 1,083 adults and the observed numbers of failures are 135 and 146 adults, so the requirement is met.

In the case of part (b) of the research question, the observed numbers of successes are 753 and 549 adults and the observed numbers of failures are 663 and 691 adults, so the requirement is met.
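The success-failure check can also be written as a small R helper. The sketch below is illustrative only: the function `meets_success_failure()` and the matrices `health_counts` and `poor_counts` are hypothetical names, with the counts copied from the two contingency tables.

```r
# Observed counts from the contingency tables (rows: year, columns: answer)
health_counts <- matrix(c(1314, 135,
                          1083, 146),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(year = c("1975", "2012"),
                                        answer = c("Yes", "No")))
poor_counts <- matrix(c(753, 663,
                        549, 691),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(year = c("1975", "2012"),
                                      answer = c("Yes", "No")))

# Hypothetical helper: TRUE when every success and failure count
# is at least `min_count` (10 by the usual rule of thumb)
meets_success_failure <- function(counts, min_count = 10) {
  all(counts >= min_count)
}

meets_success_failure(health_counts)
meets_success_failure(poor_counts)
```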

The exploratory data analysis for both parts of the research question, did identify a difference in public opinion on abortion, with the passage of time. The two sample z-test for proportions will be two-sided to address both parts of the research question namely determining any difference between the two proportions.

Having satisfied the above conditions, we can continue with our analysis and assume that the sampling distribution of the difference between the proportions follows a normal distribution.

Confidence Interval

We’ll use a 95% confidence level to create the confidence interval for the difference in our two proportions. Our point estimate is simply the difference in the sample proportions.

In the context of research question part (a) this is:

\[\hat{p}_{2012health} - \hat{p}_{1975health} = -0.026\]

In the context of research question part (b) this is:

\[\hat{p}_{2012poor} - \hat{p}_{1975poor} = -0.089\]

The standard error for part (a) of the research question can be calculated using the equation below:

\[SE = \sqrt {\frac {0.881(1-0.881)}{1229} + \frac {0.907(1-0.907)}{1449}} = 0.012\]

The standard error for part (b) of the research question can be calculated using the equation below:

\[SE = \sqrt {\frac {0.442(1-0.442)}{1240} + \frac {0.531(1-0.531)}{1416}} = 0.019\]

Please note that the above numbers are approximate, since they are computed from rounded proportions.

The CI at the 95% confidence level is then calculated using our point estimate, SE, and the critical value for the 95% confidence level (1.96).

For part (a) of the research question, the 95% confidence interval is constructed as follows:

\[point~estimate \pm z^\star \times SE\]

\[-0.026 \pm 1.96 \times 0.012 = (-0.0495,-0.0025)\]

For part (b) of the research question, the 95% confidence interval is constructed as follows:

\[point~estimate \pm z^\star \times SE\]

\[-0.089 \pm 1.96 \times 0.019 = (-0.1262,-0.0518)\]
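To avoid the rounding that creeps into the hand calculation, the standard error and interval can be computed from the raw counts. The following is a minimal sketch, assuming the unpooled (Wald) standard error used in the formulas above; `diff_prop_ci()` is a hypothetical helper written for illustration and is not a statsr function.

```r
# Hypothetical helper: Wald confidence interval for p1 - p2 from raw counts
diff_prop_ci <- function(x1, n1, x2, n2, conf_level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  # Unpooled standard error of the difference in sample proportions
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  # Critical value, about 1.96 for a 95% confidence level
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  estimate <- p1 - p2
  c(estimate = estimate,
    se       = se,
    lower    = estimate - z_star * se,
    upper    = estimate + z_star * se)
}

# Group 1 is 2012 and group 2 is 1975, so the difference is 2012 - 1975
ci_health <- diff_prop_ci(1083, 1229, 1314, 1449)  # part (a)
ci_poor   <- diff_prop_ci(549, 1240, 753, 1416)    # part (b)
round(ci_health, 4)
round(ci_poor, 4)
```

Working from counts rather than rounded proportions keeps the interval endpoints accurate to more decimal places than the hand calculation.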

Using the inference() function of the statsr package we see that, aside from some rounding error, our calculated result is in agreement with the inference() function output for both parts of the research question:

Code and output from the inference function - part (a) of the research question:

inference(y = abhlth,
          x = year,
          order = c(2012, 1975),
          data=Abortion.Views.Health,
          conf_level = 0.95,
          statistic = "proportion",
          type = "ci",
          method = "theoretical",
          success = "Yes",
          show_eda_plot = FALSE)
## Response variable: categorical (2 levels, success: Yes)
## Explanatory variable: categorical (2 levels) 
## n_2012 = 1229, p_hat_2012 = 0.8812
## n_1975 = 1449, p_hat_1975 = 0.9068
## 95% CI (2012 - 1975): (-0.0491 , -0.0022)

Code and output from the inference function - part (b) of the research question:

inference(y = abpoor,
          x = year,
          order = c(2012, 1975),
          data=Abortion.Views.Poor,
          conf_level = 0.95,
          statistic = "proportion",
          type = "ci",
          method = "theoretical",
          success = "Yes",
          show_eda_plot = FALSE)
## Response variable: categorical (2 levels, success: Yes)
## Explanatory variable: categorical (2 levels) 
## n_2012 = 1240, p_hat_2012 = 0.4427
## n_1975 = 1416, p_hat_1975 = 0.5318
## 95% CI (2012 - 1975): (-0.127 , -0.0511)
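As an optional cross-check that does not depend on statsr, base R's `prop.test()` can produce an interval for a difference in two proportions from the counts alone. This sketch is an addition to the original analysis; the object names `health_test` and `poor_test` are hypothetical, and `correct = FALSE` is set on the assumption that the theoretical interval above applies no continuity correction.

```r
# Counts are given in the order 2012, then 1975, so the interval is for 2012 - 1975
# correct = FALSE switches off the continuity correction
health_test <- prop.test(x = c(1083, 1314), n = c(1229, 1449), correct = FALSE)
poor_test   <- prop.test(x = c(549, 753),   n = c(1240, 1416), correct = FALSE)

# 95% confidence intervals for the difference in proportions
round(health_test$conf.int, 4)
round(poor_test$conf.int, 4)
```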

Confidence Interval - Conclusion

Specifically, in terms of part (a) of the research question, we are 95% confident that the proportion of US adults who support abortion (on the grounds that the pregnancy seriously endangers the woman’s health) was 0.2 to 4.9 percentage points lower in 2012 than in 1975.

Specifically, in terms of part (b) of the research question, we are 95% confident that the proportion of US adults who support abortion (on the grounds that the woman’s family is too poor to have more children) was 5.1 to 12.7 percentage points lower in 2012 than in 1975.

The difference in proportions seems to be even stronger with regard to public views on favouring abortion on the grounds that the woman’s family is too poor to have more children (part b of the research question), with there being less public sympathy towards supporting abortion on that ground in 2012, relative to 1975.

Hypothesis Test

As we did with the confidence interval analysis, we need to meet certain conditions in order to use the normal model for the hypothesis test.

The independence requirements are the same and we’ve already demonstrated that those conditions have been met.

There is a slight difference in how the success/failure counts are calculated for a hypothesis test when the null hypothesis states that the two proportions are equal. In this case, we use the pooled proportion to calculate success/failure counts for both samples.

As we’ll see, the pooled proportion is also used for calculating the standard error.

The pooled proportion is calculated by adding up all ‘successes’ across both samples and dividing by the combined sample size.

In the case of part (a) of the research question:

\[{n}_{1975} = 1449\]

Observed Number of successes for 1975 (the ‘yes’ level of abhlth) = 1314

\[{n}_{2012} = 1229\]

Observed Number of successes for 2012 (the ‘yes’ level of abhlth) = 1083

\[\hat{p}_{pooled(health)} = {\frac{1083 + 1314}{1229 + 1449}}= 0.89\]

Next for part (a) of the research question, we’ll demonstrate the calculation of success and failure counts using the pooled proportion for the year 2012. We first calculate the number of successes for the year 2012:

\[\hat{p}_{pooled(health)}n_{2012} = 0.89(1229) = 1094 \]

We next calculate the number of failures for the year 2012:

\[(1-\hat{p}_{pooled(health)})n_{2012} = 0.11(1229) = 135 \]

We also demonstrate the calculation of success and failure counts using the pooled proportion for the year 1975. We first calculate the number of successes for the year 1975:

\[\hat{p}_{pooled(health)}n_{1975} = 0.89(1449) = 1290 \]

We next calculate the number of failures for the year 1975:

\[(1-\hat{p}_{pooled(health)})n_{1975} = 0.11(1449) = 159 \]

In the case of part (b) of the research question:

\[{n}_{1975} = 1416\]

Observed Number of successes for 1975 (the ‘yes’ level of abpoor) = 753

\[{n}_{2012} = 1240\]

Observed Number of successes for 2012 (the ‘yes’ level of abpoor) = 549

\[\hat{p}_{pooled(poor)} = {\frac{549 + 753}{1240 + 1416}}= 0.49\]

Next for part (b) of the research question, we’ll demonstrate the calculation of success and failure counts using the pooled proportion for the year 2012.

We first calculate the number of successes for the year 2012:

\[\hat{p}_{pooled(poor)}n_{2012} = 0.49(1240) = 608 \]

We next calculate the number of failures for the year 2012:

\[(1-\hat{p}_{pooled(poor)})n_{2012} = 0.51(1240) = 632 \]

We also demonstrate the calculation of success and failure counts using the pooled proportion for the year 1975.

We first calculate the number of successes for the year 1975:

\[\hat{p}_{pooled(poor)}n_{1975} = 0.49(1416) = 694\]

We next calculate the number of failures for the year 1975:

\[(1-\hat{p}_{pooled(poor)})n_{1975} = 0.51(1416) = 722\]

Note: Success = Favouring abortion (the ‘Yes’ level)

Note: Failure = Opposing abortion (the ‘No’ level)

We see that we’ve met the success/failure conditions for both the years 1975 and 2012.
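The pooled calculations above can be sketched in R using the observed success counts and sample sizes. The helper `pooled_check()` is hypothetical (it is not part of statsr), and the threshold of 10 expected successes and failures is the conventional rule of thumb, assumed here. Because the code keeps the unrounded pooled proportion (about 0.895 for part (a), about 0.490 for part (b)), the expected counts differ by a few units from the hand-rounded values shown above.

```r
# Hypothetical helper: pooled proportion and expected success/failure counts
pooled_check <- function(x_2012, n_2012, x_1975, n_1975, min_count = 10) {
  # Pooled proportion: all successes divided by the combined sample size
  p_pool <- (x_2012 + x_1975) / (n_2012 + n_1975)

  # Expected counts under the null hypothesis of equal proportions
  counts <- c(success_2012 = p_pool * n_2012,
              failure_2012 = (1 - p_pool) * n_2012,
              success_1975 = p_pool * n_1975,
              failure_1975 = (1 - p_pool) * n_1975)

  list(p_pool = p_pool,
       counts = round(counts),
       condition_met = all(counts >= min_count))
}

# Part (a): abhlth; part (b): abpoor
health_check <- pooled_check(1083, 1229, 1314, 1449)
poor_check   <- pooled_check(549, 1240, 753, 1416)

health_check
poor_check
```

In both cases every expected count is far above the threshold, so the conclusion about the success/failure condition does not depend on the rounding.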

Having satisfied the conditions, we can move ahead with our hypothesis test.

The first thing we need to do is to establish our null and alternative hypotheses. Our null hypothesis is that there is no difference in the proportions while the alternative hypothesis states that there is a difference in the proportions. The alternative hypothesis will not be stipulating whether the proportion increased or decreased in 2012. We’ll be conducting a two-tailed test.

In the context of part (a) of the research question:

\[H_{0}:{p}_{2012(health)} - {p}_{1975(health)} = 0\]

\[H_{a}:{p}_{2012(health)} - {p}_{1975(health)} \neq 0\]

In the context of part (b) of the research question:

\[H_{0}:{p}_{2012(poor)} - {p}_{1975(poor)} = 0\]

\[H_{a}:{p}_{2012(poor)} - {p}_{1975(poor)} \neq 0\]

We use the equation below to calculate the test statistic for our hypothesis test.

\[{Z} = {\frac{point~estimate - null~value}{SE}}\]

As alluded to earlier, the calculation of the standard error for a hypothesis test is slightly different from that of a confidence interval, in the sense that the pooled proportion is used. A hypothesis test is carried out under the assumption that the null hypothesis is true, so it relies on expected counts and expected proportions rather than observed ones. For a test of the difference between two proportions, these expected values are not immediately available.

The null hypothesis only states that the two population proportions are equal to each other, or equivalently that their difference is 0. It never specifies what the common value of the two proportions is, so there is no readily available null value for each proportion. The pooled proportion serves as our best estimate of that common value.
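As a sketch of the general form, and using the same symbols as the rest of this section, substituting the pooled proportion for both sample proportions gives the standard error under the null hypothesis:

\[SE = \sqrt {\frac {\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_{2012}} + \frac {\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_{1975}}}\]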

The standard error then becomes (in the context of part(a) of the research question):

\[SE = \sqrt {\frac {0.89(1-0.89)}{1229} + \frac {0.89(1-0.89)}{1449}} = 0.012\]

The standard error then becomes (in the context of part(b) of the research question):

\[SE = \sqrt {\frac {0.49(1-0.49)}{1240} + \frac {0.49(1-0.49)}{1416}} = 0.019\]

Finally our test statistic is calculated below.

For research question part (a), our z statistic (named as Zhealth - to distinguish this from what will be calculated for part (b)) is as follows:

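\[{Z}_{health} = {\frac{-0.026 - 0}{0.012}} = -2.167, \qquad |{Z}_{health}| = 2.167\]

We take the absolute value of the test statistic because the test is two-tailed: the p-value is twice the upper-tail area beyond the absolute value of Z.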

Similarly, for research question part (b), our z statistic (named as Zpoor - to distinguish this from what was calculated for part (a)) is as follows:

\[{Z}_{poor} = {\frac{-0.089 - 0}{0.019}} = -4.684, \qquad |{Z}_{poor}| = 4.684\]

As before, we take the absolute value of the test statistic because the test is two-tailed.
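The hand calculations above round the pooled proportion, the standard error and the point estimate before dividing, so the resulting Z values are approximate. The following is a minimal sketch of the same calculation carried out on unrounded values; the helper `pooled_z()` is hypothetical and simply applies the pooled standard error and the Z formula from this section to the observed counts.

```r
# Hypothetical helper: pooled standard error and Z statistic for the
# difference between two proportions (2012 - 1975), with null value 0
pooled_z <- function(x_2012, n_2012, x_1975, n_1975) {
  p_2012 <- x_2012 / n_2012
  p_1975 <- x_1975 / n_1975
  p_pool <- (x_2012 + x_1975) / (n_2012 + n_1975)

  # Standard error under the null hypothesis uses the pooled proportion
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n_2012 + 1 / n_1975))

  # Z = (point estimate - null value) / SE
  z <- ((p_2012 - p_1975) - 0) / se

  c(se = se, z = z)
}

z_health <- pooled_z(1083, 1229, 1314, 1449)  # part (a): abhlth
z_poor   <- pooled_z(549, 1240, 753, 1416)    # part (b): abpoor

round(z_health, 4)
round(z_poor, 4)
```

The unrounded Z values are slightly smaller in magnitude than 2.167 and 4.684, but lead to the same decisions.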

We can now calculate the p-value from our Z-score.

For part (a) of the research question:

#calculation of p-value for two-tailed test
2*pnorm(2.167, lower.tail = FALSE)
## [1] 0.03023485
#We can also calculate the relative risk of phat1 to phat2
#phat1 is defined as the proportion of respondents in 2012 who are in favour of abortion on the grounds that the pregnancy is endangering the woman's health
#phat2 is defined as the proportion of respondents in 1975 who are in favour of abortion on the grounds that the pregnancy is endangering the woman's health

phat1 <- 0.881
  
phat2 <- 0.907

relative.risk.ratio <- phat1/phat2

relative.risk.ratio
## [1] 0.9713341

p-value interpretation (research question part(a)):

The p-value of about 0.03 means that, if the null hypothesis were true (that is, if the proportion of US adults favouring abortion on the grounds that the pregnancy seriously endangers the woman’s health were the same in 1975 and 2012), there would be only about a 3% probability of observing a difference in sample proportions at least as extreme as the one observed. Since this is below the 5% significance level, the observed difference is unlikely to be the result of chance or sampling variability alone.

For part (b) of the research question:

#calculation of p-value for two-tailed test
2*pnorm(4.684, lower.tail = FALSE)
## [1] 2.8133e-06
#We can also calculate the relative risk of phat1 to phat2
#phat1 is defined as the proportion of respondents in 2012 who are in favour of abortion on the grounds that the woman's family is too poor to have more children
#phat2 is defined as the proportion of respondents in 1975 who are in favour of abortion on the grounds that the woman's family is too poor to have more children

phat1 <- 0.442
  
phat2 <- 0.531

relative.risk.ratio <- phat1/phat2

relative.risk.ratio
## [1] 0.8323917

p-value interpretation (research question part(b)):

The p-value of about 0.000003 means that, if the null hypothesis were true (that is, if the proportion of US adults favouring abortion on the grounds that the woman’s family is too poor to have any more children were the same in 1975 and 2012), there would be a probability of only about 3 in a million of observing a difference in sample proportions at least as extreme as the one observed. Since this is far below the 5% significance level, the observed difference is very unlikely to be the result of chance or sampling variability alone.

Hypothesis Test- Conclusion

With p < 0.05 for both parts of the research question, we reject the null hypothesis at the 5% significance level. The data provide convincing evidence that the proportion of US adults in favour of abortion (on the grounds of either the pregnancy endangering the woman’s health or the woman’s family being too poor to have more children) differed between 2012 and 1975. This supports the alternative hypothesis that public opinion on abortion changed between 1975 and 2012.

Based on the data, there is convincing evidence that the proportion of US adults in support of abortion (on grounds stated previously) declined in 2012, relative to 1975.

The confidence interval and the hypothesis test are in agreement: the 95% confidence interval did not contain 0, and the hypothesis test rejected the null hypothesis that the difference in proportions was 0.

Finally, rather than going through the above calculations, we can run a hypothesis test in one chunk of code with the inference() function. The output also provides a visualization of the shaded regions of the normal curve corresponding to the p-value for our hypothesis test for both parts of the research question.

For part (a) of the research question:

#hypothesis test with inference function
inference(y = abhlth,
          x = year,
          order=c(2012,1975),
          data=Abortion.Views.Health,
          statistic = "proportion",
          type = "ht",
          method = "theoretical",
          success = "Yes",
          null=0,
          alternative = 'twosided')
## Response variable: categorical (2 levels, success: Yes)
## Explanatory variable: categorical (2 levels) 
## n_2012 = 1229, p_hat_2012 = 0.8812
## n_1975 = 1449, p_hat_1975 = 0.9068
## H0: p_2012 =  p_1975
## HA: p_2012 != p_1975
## z = -2.1565
## p_value = 0.031

For part (b) of the research question:

#hypothesis test with inference function
inference(y = abpoor,
          x = year,
          order=c(2012,1975),
          data=Abortion.Views.Poor,
          statistic = "proportion",
          type = "ht",
          method = "theoretical",
          success = "Yes",
          null=0,
          alternative = 'twosided')
## Response variable: categorical (2 levels, success: Yes)
## Explanatory variable: categorical (2 levels) 
## n_2012 = 1240, p_hat_2012 = 0.4427
## n_1975 = 1416, p_hat_1975 = 0.5318
## H0: p_2012 =  p_1975
## HA: p_2012 != p_1975
## z = -4.5795
## p_value = < 0.0001
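As a cross-check that does not depend on statsr, the same two-sided tests can be sketched with base R's prop.test(), using the observed success counts and sample sizes quoted earlier. This is an illustrative alternative, not the method used in the analysis above. With `correct = FALSE` (no continuity correction), prop.test() reports a chi-squared statistic that equals the square of the pooled Z statistic, so its square root can be compared with the magnitude of the z value printed by inference().

```r
# Part (a): abhlth, successes and sample sizes in the order 2012, 1975
health_test <- prop.test(x = c(1083, 1314), n = c(1229, 1449),
                         alternative = "two.sided", correct = FALSE)

# Part (b): abpoor, successes and sample sizes in the order 2012, 1975
poor_test <- prop.test(x = c(549, 753), n = c(1240, 1416),
                       alternative = "two.sided", correct = FALSE)

# |z| is the square root of the chi-squared statistic
abs_z_health <- sqrt(unname(health_test$statistic))
abs_z_poor   <- sqrt(unname(poor_test$statistic))

abs_z_health
health_test$p.value

abs_z_poor
poor_test$p.value
```

prop.test() also returns a confidence interval for the difference in `conf.int`, which can be compared with the intervals reported in the confidence interval section.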

Conclusions

Research Question part(a): Has the American public’s opinion on the woman’s right to legally avail of an abortion under certain circumstances (such as protecting a woman’s health) changed in the last 40 years (1975 vs. 2012), since the Supreme Court’s landmark ruling on the Roe vs. Wade case?

Research Question part(b): Has the American public’s opinion on the woman’s right to legally avail of an abortion given that the woman’s family cannot afford to have more children, changed in the last 40 years (1975 vs. 2012), since the Supreme Court’s landmark ruling on the Roe vs. Wade case?

Yes, in the context of both parts of the research question, a statistically significant difference in proportions exists between US adults in 1975 who are in favour of abortion (on the grounds that either the pregnancy is endangering the woman’s health or the woman’s family is too poor to have more children) vs. US adults who are in favour of abortion (on the same grounds) in 2012, thereby signifying that the data does seem to present convincing evidence that abortion views have changed with the passage of time.

Specifically, in the case of part (a) of the research question, the proportion of US adults who say yes to abortion on the grounds that the pregnancy is seriously endangering the woman’s health is about 3 percentage points lower in 2012 than in 1975, based on the GSS data.

Moreover, US adults in 2012 are only 0.97 times as likely to favour abortion (on the grounds that the pregnancy endangers the woman’s health) as US adults in 1975.

Also, in the case of part (b) of the research question, the proportion of US adults who say yes to abortion on the grounds that the woman’s family is too poor to have more children is about 9 percentage points lower in 2012 than in 1975, based on the GSS data.

Moreover, US adults in 2012 are only 0.83 times as likely to favour abortion (on the grounds that the woman’s family is too poor to have more children) as US adults in 1975.

The main conclusion that can be drawn from this research is that the findings from these statistical inference techniques seem to contradict the findings of the study by Gallup, namely that public opinion on abortion has remained unchanged over the years.

It is worth noting that this research question on abortion views confined itself to data for the years 1975 and 2012. This is a constraint, as the study by Gallup compared public opinion on abortion for the years 1975 and 2018. By 2014, all provisions of the Affordable Care Act were in place, and it would be interesting to extend both parts of the research question to compare public views on abortion using data for the years 1975 and 2018 instead. This aspect is particularly important as the Affordable Care Act broadened the health coverage for women, when it came to availing of an abortion on the grounds that the pregnancy is seriously endangering the woman’s health.

Also, a caveat as far as the GSS data is concerned is the lack of information concerning the US state to which the respondent belongs. It has been reported by PBS that while abortion views have remained unchanged with the passage of time, public opinion on abortion differs largely by state.

Further randomized studies should assess the extent to which opinion on abortion is affected by the US state in which the respondent lives. These studies should control for the possible confounding variables present in the GSS data.

References

https://www.pbs.org/newshour/health/how-has-public-opinion-about-abortion-changed-since-roe-v-wade

http://gss.norc.org/DOCUMENTS/CODEBOOK/A.pdf

https://en.wikipedia.org/wiki/General_Social_Survey

https://scholars.org/brief/why-abortion-controversies-are-so-central-us-politics

https://www.nsf.gov/pubs/2007/nsf0748/nsf0748_3.pdf

https://www.kff.org/womens-health-policy/issue-brief/coverage-for-abortion-services-and-the-aca/

https://rewire.news/article/2018/02/07/economic-impact-denying-abortion-care-may-bigger-think/