Due by 11:00pm on 9/30, submitted through Canvas

In this activity, you will analyze data from the 2022 wave of the Supreme Court Public Opinion Project, A.K.A. SCOTUSpoll, (research conducted by Jessee, Malhotra and Sen) which is a survey asking ordinary Americans for their views about the Supreme Court as well as their views on specific high profile cases that the Court hears each term. See more here.

Since this is a nationally representative survey of American adults, each observation is a survey respondent (an individual person who took the survey).

The variables we will use in the dataset are:

Note that the dataset contains weights but we will not use them in any of our calculations here.

You should begin by downloading both the dataset and the Lab 3 R Markdown (.Rmd) template to your computer, saving them in the same folder. Then double-click the .Rmd template file to start RStudio.

Question 1: Loading and Exploring the Dataset

Load the dataset. Because it is in .RData format (R’s native data format) you will just use the command load("2022SCOTUSpoll.RData") which will put the dataset into R’s workspace as an object named SCOTUSpoll. You don’t have to assign it to a name yourself (i.e. you don’t have to use <- as you would with read.csv or other commands) but R will just load it automatically under the name of the object it was saved from.

You should then attach this object so that R will know to look in this dataset whenever you reference a variable name.

Finally, have R print out all the variable names in the dataset using the names command.

load("2022SCOTUSpoll.RData")
attach(SCOTUSpoll)
names(SCOTUSpoll)
## [1] "pid7"                 "idealg"               "idealg_courtmajority"
## [4] "perception"           "roe_per"              "gender"
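`load()` needs no `<-` because an .RData file stores each object together with its name. The sketch below illustrates this with a small made-up data frame saved to a temporary file. `demo_df` is a hypothetical name, and the data are not the SCOTUSpoll file.

```r
# Small made-up data frame (hypothetical; not the SCOTUSpoll data)
demo_df <- data.frame(pid7 = c(1, 4, 7, 8), gender = c(1, 2, 2, 1))

# Save it to a temporary .RData file; the object's name is stored with it
tmp <- tempfile(fileext = ".RData")
save(demo_df, file = tmp)

# Remove it from the workspace, then load it back without any assignment
rm(demo_df)
loaded <- load(tmp)    # load() invisibly returns the names of the restored objects

print(loaded)          # [1] "demo_df"
print(names(demo_df))  # [1] "pid7"   "gender"
```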

Question 2: Recoding Party ID Variable

Make a table of the variable pid7. Note that the value of 8 corresponds to a missing value. If we do calculations with this variable, R will inappropriately treat 8s like a real value rather than missing data. So we’d like to fix that.

Using the recode command in the car library, create a new variable called pid7.new which is the same as the original variable except that it recodes values of 8 to NA but keeps values of 1 through 7 as they were in the original variable.

(Hint: Remember that you will need to install the car package if you haven’t already done this. To do this, type install.packages("car") in the console (the pane in the bottom left, after the > prompt). This is a rare time when you should not put the command in your code, because you only need to install the package once and then it will stay on your computer. Then, in each new R session where you want to use the package, type library(car) to load it into your workspace; this command should go in your source code as usual.) Then make a table of the original variable against the new one, adding the option exclude=NULL to show NA values in the table, to make sure the recoding worked as planned.

Finally, calculate the mean and make a histogram of the new party ID variable pid7.new and comment briefly on what you see. (Note: the hist command might choose odd bin divisions but the bars should represent the frequency of values 1, 2, …, 7. You don’t need to worry about making the histogram look pretty here.)

table(pid7)
## pid7
##   1   2   3   4   5   6   7   8 
## 469 196 170 465 183 193 368  86
library(car)
## Loading required package: carData
pid7.new <- recode(pid7, "8=NA")   # recode 8 to NA; values 1 through 7 are kept as they are
table(pid7, pid7.new, exclude=NULL)
##      pid7.new
## pid7   1   2   3   4   5   6   7 <NA>
##   1  469   0   0   0   0   0   0    0
##   2    0 196   0   0   0   0   0    0
##   3    0   0 170   0   0   0   0    0
##   4    0   0   0 465   0   0   0    0
##   5    0   0   0   0 183   0   0    0
##   6    0   0   0   0   0 193   0    0
##   7    0   0   0   0   0   0 368    0
##   8    0   0   0   0   0   0   0   86
hist(pid7.new)

mean(pid7.new, na.rm = TRUE)
## [1] 3.855186

Every category has a frequency in the hundreds. The distribution has three peaks rather than one:
- 1 (Strong Democrat, 469 respondents)
- 4 (Independent, 465)
- 7 (Strong Republican, 368)

Each of the in-between categories holds roughly 170 to 200 respondents. The mean of about 3.86 sits just below the scale midpoint of 4, so respondents lean slightly Democratic overall. Still, they are spread across the whole scale rather than concentrated in one or two groups.
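A common trap is to give the party codes text labels and then convert them back with `as.numeric()`. That call returns the factor's internal level codes, and by default those levels are sorted alphabetically rather than in scale order. The toy vector below uses made-up codes for five hypothetical respondents to show the effect. It also shows two base-R ways to keep the 1 to 7 values intact. `replace()` here stands in for `car::recode(pid, "8=NA")` and is an illustrative alternative, not the lab's required method.

```r
# Made-up party-ID codes for five hypothetical respondents (8 = missing)
pid <- c(1, 4, 7, 8, 2)
labels <- c("Strong Democrat", "Weak Democrat", "Independent leaning Democrat",
            "Independent", "Independent leaning Republican",
            "Weak Republican", "Strong Republican")

# The trap: text labels -> factor (alphabetical levels) -> level codes
pid_chr <- ifelse(pid == 8, NA, labels[pid])
pid_fac <- factor(pid_chr)
as.numeric(pid_fac)                  # 2 1 3 NA 4, not 1 4 7 NA 2

# Fix 1: stay numeric and only turn 8 into NA
pid_num <- replace(pid, pid == 8, NA)
table(pid, pid_num, exclude = NULL)  # the NA column shows that only the 8 changed

# Fix 2: if labels are wanted, give the levels explicitly in scale order
pid_fac2 <- factor(pid_chr, levels = labels)
as.numeric(pid_fac2)                 # 1 4 7 NA 2
```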

Question 3: Examining Association Between Variables

The variable idealg gives the estimated ideological position of each survey respondent based on their views on each of the Supreme Court cases asked in the survey. idealg is constructed by using an “ideal point model” to uncover the structure of this ideological dimension underlying people’s views (obviously this type of model is beyond the scope of this course so you should simply take this variable as a measure of each person’s ideological position). Higher values represent more conservative ideologies and lower values represent more liberal ones. The mean of this variable is 0 and the standard deviation is 1.

Make a histogram of the variable idealg and then make a boxplot of idealg (on the vertical axis) against pid7.new. Briefly comment on what you see.

hist(idealg)

boxplot(idealg ~ pid7.new)   # one box of idealg for each party ID value

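Unlike the histogram of pid7.new, the boxplot does not show how many respondents fall into each party category. Instead, it shows the spread of idealg within each category. The comparison to make is therefore whether the median and box shift upward, toward more conservative values, as we move from Strong Democrats (1) to Strong Republicans (7).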
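The two calls `boxplot(y, g)` and `boxplot(y ~ g)` draw different things. The first draws one box for `y` and another box for `g` itself. The formula version draws one box of `y` for each value of `g`. The sketch below uses made-up scores for six hypothetical respondents. It passes `plot = FALSE` to count the boxes each call would draw without producing a plot.

```r
# Made-up ideology scores for six hypothetical respondents in three party groups
score <- c(-1.2, -0.8, 0.1, 0.0, 0.9, 1.4)
party <- c(1, 1, 4, 4, 7, 7)

# Formula interface: one box of `score` per value of `party`
by_group <- boxplot(score ~ party, plot = FALSE)
by_group$names            # "1" "4" "7"

# Two separate vectors: one box for `score`, one box for `party` itself
side_by_side <- boxplot(score, party, plot = FALSE)
length(side_by_side$n)    # 2
```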

Question 4: Estimating a mean

The variable perception gives an estimate of each respondent’s perception of the ideological position of the Supreme Court on the same ideological scale as respondents’ own estimated ideologies (so smaller values mean a respondent perceives the Court to be more liberal while larger values mean they perceive the Court to be more conservative).

First, estimate the average perception of the Supreme Court’s ideology (that is, calculate the sample mean).

Then, construct a 95% confidence interval for this mean in two ways: (1) calculating it yourself based on the mean and standard deviation of this variable (which you can use the mean and sd functions to calculate) and (2) using the t.test function. (Note: these two ways might give very slightly different values, but should be identical to at least a couple decimal places. Also note there are no missing values in this variable so length(perception) will give you the sample size \(N\).)

mean(perception)
## [1] 0.1341852
sd(perception)
## [1] 0.7525446
length(perception)
## [1] 2130
sum(perception)
## [1] 285.8144
mean(perception) - 1.96*sd(perception)/sqrt(length(perception))
## [1] 0.1022258
mean(perception) + 1.96*sd(perception)/sqrt(length(perception))
## [1] 0.1661446
t.test(perception)
## 
##  One Sample t-test
## 
## data:  perception
## t = 8.2293, df = 2129, p-value = 3.241e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.1022082 0.1661622
## sample estimates:
## mean of x 
## 0.1341852
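The manual interval above and the `t.test()` interval differ only in the fourth decimal place. The manual version uses the normal critical value 1.96. `t.test()` uses the t critical value `qt(0.975, df = n - 1)`, which with 2129 degrees of freedom is just above 1.96 (roughly 1.961). The sketch below uses a short made-up vector `x`, not the perception variable, to show that the t-based manual calculation reproduces `t.test()` exactly.

```r
# Made-up sample (hypothetical values, not the perception variable)
x <- c(0.3, -0.1, 0.5, 0.2, 0.0, 0.4, -0.2, 0.6)

n  <- length(x)
se <- sd(x) / sqrt(n)

# Normal approximation, as in the manual calculation above
ci_z <- mean(x) + c(-1, 1) * 1.96 * se

# t critical value with n - 1 degrees of freedom, as t.test() uses
ci_t <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

ci_t
t.test(x)$conf.int        # same interval as ci_t
```

With only 8 observations the gap between the two intervals is noticeable. With 2130 observations, as in the survey, it almost disappears.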

Question 5: Putting the Court’s Position in Perspective

The variable idealg_courtmajority gives the estimated actual position of the Court based on its rulings during the 2020-2021 term. In other words, this is an estimate of the Court’s position rather than ordinary Americans’ perceptions of the Court’s position. (It’s a little odd because this variable has the exact same value for every respondent. It is not the respondents’ perceptions of the Court’s position, but the estimated actual position of the Court on this ideological dimension.)

Make a boxplot of the variable idealg (on the vertical axis) against pid7.new (on the horizontal axis) as you did above and then add a horizontal line at the estimate of the actual position of the Court using the command abline(h=idealg_courtmajority). (This will actually add a horizontal line for each respondent for their value of idealg_courtmajority, but since this variable is the same for every respondent it will look like one horizontal line.)

Next, make a boxplot of the variable perception (on the vertical axis) against pid7.new (on the horizontal axis) and then add a horizontal line at the estimate of the actual position of the Court using the command abline(h=idealg_courtmajority).

Then comment briefly on what the first boxplot tells you about how the Court’s ideological position compares to the positions of ordinary Americans (specifically, how it compares to the positions of Democrats, independents and Republicans). Also comment on what the second boxplot tells you about how the Court’s actual ideological position compares to respondents’ perceptions of the Court’s position (specifically, how it compares to the perceptions of the Court held by Democrats, independents and Republicans).

boxplot(idealg ~ pid7.new)
abline(h=idealg_courtmajority)

boxplot(perception ~ pid7.new)
abline(h=idealg_courtmajority)

mean(idealg_courtmajority)
## [1] 0.6145682
mean(idealg)
## [1] -0.01189128
mean(perception)
## [1] 0.1341852

The first boxplot shows that respondents are, on average, more liberal than the Court. This is confirmed by the horizontal line and by comparing the means: the mean of respondents' estimated ideologies (idealg) is about -0.01, while the Court's estimated actual position is about 0.61.

The second boxplot shows that perceptions of the Court also fall below its actual position (mean perception of about 0.13 versus 0.61). In other words, respondents tend to perceive the Court as more liberal than it actually is. The plot also suggests that Democrats, independents, and Republicans all tend to see the Court as more conservative than their own estimated ideologies. However, each group's perception is pulled somewhat toward its own ideology.
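The party-by-party comparison with the Court's line can also be checked numerically. One way is to compute each group's median and its distance from the Court's position, plus the share of each group lying to the right of the Court. The sketch below uses made-up ideal points for nine hypothetical respondents. The Court value 0.6145682 is the `mean(idealg_courtmajority)` output printed above. With the real data, the same `tapply()` calls could be applied to `idealg` (or `perception`) and `pid7.new`; that step is an assumption and is not run here.

```r
# Made-up ideal points for nine hypothetical respondents in three party groups
ideal <- c(-1.1, -0.6, -0.9, 0.0, 0.3, -0.2, 0.8, 1.2, 0.5)
party <- c(1, 1, 1, 4, 4, 4, 7, 7, 7)
court <- 0.6145682   # Court position printed above by mean(idealg_courtmajority)

# Median of each group and its distance from the Court
group_median <- tapply(ideal, party, median)
group_median - court          # negative = group median is more liberal than the Court

# Share of each group whose ideal point is more conservative than the Court
share_right <- tapply(ideal > court, party, mean)
share_right                   # 0, 0, and 2/3 for groups 1, 4, 7
```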

Question 6: Estimating a proportion

The variable roe_per gives respondents’ views on the following question: “Should the Supreme Court overrule Roe v. Wade, the 1973 decision that established a constitutional right to abortion and prohibited states from banning abortion before the fetus can survive outside the womb, at around 23 weeks of pregnancy?” where the value of 1 indicates a response of “Yes, Roe v. Wade should be overturned” and a value of 2 indicates a response of “No, Roe v. Wade should NOT be overturned”.

First, create a new variable called roe_per.new that is 1 for the “Yes” response and 0 for the “No” response. (Hint: you can use the recode function as above but you can do this even more easily by making the new variable 2 minus the old variable)

Next, make a table of this new variable.

Finally, calculate the proportion of respondents giving the “Yes” response and calculate a 95% confidence interval for this proportion separately in two ways: (1) calculating it yourself using the sample proportion and (2) using the prop.test function. (Hint: These two ways should give nearly identical answers, but they might differ after the first couple of decimal places. Also note there are no missing values in this variable, so length(roe_per.new) will give you the sample size N.)

Briefly comment on what you learned from this.

roe_per.new <- 2 - roe_per
table(roe_per.new)
## roe_per.new
##    0    1 
## 1318  812
mean(roe_per.new)
## [1] 0.3812207
mean(roe_per.new) - 1.96*sd(roe_per.new)/sqrt(length(roe_per.new))
## [1] 0.3605895
mean(roe_per.new) + 1.96*sd(roe_per.new)/sqrt(length(roe_per.new))
## [1] 0.4018519
prop.test(812, 812+1318)
## 
##  1-sample proportions test with continuity correction
## 
## data:  812 out of 812 + 1318, null probability 0.5
## X-squared = 119.73, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3605941 0.4022797
## sample estimates:
##         p 
## 0.3812207

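From these data, we can be 95% confident that the proportion of American adults who supported overturning Roe v. Wade was between about 0.361 and 0.402 (sample proportion 0.381). Equivalently, roughly 60% to 64% opposed overturning it, which is a clear majority rather than a slight one. This is consistent with the table of pid7 in Question 2, which shows somewhat more Democratic identifiers (codes 1 to 3) than Republican ones (codes 5 to 7). However, the answer to this one question does not by itself tell us respondents' party or ideology.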
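Calculating the interval "yourself" from the sample proportion usually means the large-sample formula p̂ ± 1.96·√(p̂(1 − p̂)/n). The manual code above instead uses `sd()`, which divides by n − 1 and so differs only far after the decimal point. `prop.test()` reports a Wilson score interval with a continuity correction, which explains its slightly different endpoints. The sketch below rebuilds the 0/1 indicator from the counts in the table above and applies the formula directly. It also checks the "2 minus the old variable" recoding on a few made-up raw codes.

```r
# "2 minus the old variable": 1 (Yes) -> 1 and 2 (No) -> 0, on made-up raw codes
roe_raw <- c(1, 2, 2, 1)
2 - roe_raw                   # 1 0 0 1

# Counts from the table of roe_per.new above
yes <- 812
no  <- 1318
n   <- yes + no

# Rebuild the 0/1 indicator; its mean is the sample proportion
roe   <- c(rep(1, yes), rep(0, no))
p_hat <- mean(roe)

# Large-sample interval: p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n)
se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-1, 1) * 1.96 * se
round(ci, 4)                  # 0.3606 0.4018
```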

Question 7: Estimating proportions separately for men and women

Make a table of roe_per.new against gender by typing table(roe_per.new, gender).

Next calculate sample proportions and 95% confidence intervals for the proportions of men and women (separately) who supported overturning Roe v. Wade. You can just do this using the prop.test and don’t have to calculate it manually. Then comment briefly on what you learn from this about possible differences in support for overturning Roe by gender, including both estimates and confidence intervals.

table(roe_per.new, gender)
##            gender
## roe_per.new   1   2
##           0 595 723
##           1 425 387
prop.test(425, 425+595)
## 
##  1-sample proportions test with continuity correction
## 
## data:  425 out of 425 + 595, null probability 0.5
## X-squared = 28.001, df = 1, p-value = 1.213e-07
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3862960 0.4476728
## sample estimates:
##         p 
## 0.4166667
prop.test(387, 387+723)
## 
##  1-sample proportions test with continuity correction
## 
## data:  387 out of 387 + 723, null probability 0.5
## X-squared = 101.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3207394 0.3776186
## sample estimates:
##         p 
## 0.3486486

For men (gender = 1), the sample estimate of the proportion who support overturning Roe v. Wade is 0.4167, with a 95% confidence interval of (0.3863, 0.4477). For women (gender = 2), the estimate is 0.3486, with a 95% confidence interval of (0.3207, 0.3776).

Men in this sample were more likely than women to support overturning the ruling. The two intervals do not overlap: the upper end for women, 0.3776, is below the lower end for men, 0.3863. So the difference is unlikely to be due to sampling error alone. This suggests that women held more liberal views than men on this issue. A single question, however, is not enough to conclude that women are more liberal than men overall.
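The question only asks for separate intervals, but the gender difference can also be checked directly. Given two counts and two group sizes, base R's `prop.test()` tests whether the proportions are equal. It also reports a confidence interval for their difference (group 1 minus group 2). The sketch below reuses the counts from `table(roe_per.new, gender)` above and assumes, as the answer above does, that gender 1 is men and gender 2 is women.

```r
# Counts from table(roe_per.new, gender) above (assumed: 1 = men, 2 = women)
yes_by_gender <- c(g1 = 425, g2 = 387)
n_by_gender   <- c(g1 = 425 + 595, g2 = 387 + 723)

# Two-sample test of equal proportions; conf.int is for p1 - p2
res <- prop.test(yes_by_gender, n_by_gender)
res$estimate    # the two sample proportions
res$conf.int    # interval for the difference in proportions
```

With these counts, the interval for the difference runs from roughly 0.03 to 0.11 and lies entirely above zero. This agrees with the non-overlapping separate intervals.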