In this activity, you will analyze data from the 2022 wave of the Supreme Court Public Opinion Project, also known as SCOTUSpoll (research conducted by Jessee, Malhotra and Sen), a survey that asks ordinary Americans for their views about the Supreme Court as well as their views on specific high-profile cases that the Court hears each term. See more here.
Since this is a nationally representative survey of American adults, each observation is a survey respondent (an individual person who took the survey).
The variables we will use in the dataset are:
pid7: respondents’ self-identified party affiliations (also called party ID). 1=“Strong Democrat”, 2=“Weak Democrat”, 3=“Independent leaning Democrat”, 4=“Independent”, 5=“Independent leaning Republican”, 6=“Weak Republican”, 7=“Strong Republican”, 8=no response
idealg: estimated ideology of respondents based on their views on Supreme Court cases. Lower (higher) values indicate more liberal (conservative) ideological positions
perception: estimate of respondents’ perception of the Supreme Court’s ideology on the same scale as the variable idealg
idealg_courtmajority: the estimated actual position of the Court based on its rulings during the 2020-2021 term (note: this variable has the same value for all respondents in the survey)
roe_per: respondents’ views on whether the Supreme Court should overturn the decision Roe v. Wade, which created a constitutional right to an abortion
gender: respondents’ stated gender, 1=male and 2=female
Note that the dataset contains weights, but we will not use them in any of our calculations here.
You should begin by downloading both the dataset and the Lab 3 R
Markdown (.Rmd) template to your computer, saving them in
the same folder. Then double-click the .Rmd template file
to start RStudio.
Load the dataset. Because it is in .RData format (R’s
native data format) you will just use the command
load("2022SCOTUSpoll.RData") which will put the dataset
into R’s workspace as an object named SCOTUSpoll. You don’t
have to assign it to a name yourself (i.e. you don’t have to use
<- as you would with read.csv or other
commands) because R loads it automatically under the name of the
object it was saved from.
You should then attach this object so that R will know to look in this dataset whenever you reference a variable name.
Finally, have R print out all the variable names in the dataset using
the names command.
load("2022SCOTUSpoll.RData")
attach(SCOTUSpoll)
names(SCOTUSpoll)
## [1] "pid7" "idealg" "idealg_courtmajority"
## [4] "perception" "roe_per" "gender"
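The behavior of load described above can be tried without the survey file. The sketch below is only an illustration: toy_poll, its values, and the temporary file are made up, and it assumes nothing beyond base R.

```r
# Toy data frame standing in for the survey (values are made up).
toy_poll <- data.frame(pid7 = c(1, 4, 7, 8), gender = c(1, 2, 2, 1))

# Save it to a temporary .RData file, then remove it from the workspace.
tmp_file <- tempfile(fileext = ".RData")
save(toy_poll, file = tmp_file)
rm(toy_poll)

# load() restores the object under its saved name and (invisibly)
# returns the names of everything it loaded.
loaded_names <- load(tmp_file)
print(loaded_names)
print(names(toy_poll))
```

Capturing the return value of load is a convenient way to find out what an unfamiliar .RData file contains.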
Make a table of the variable pid7. Note that the value
of 8 corresponds to a missing value. If we do calculations with this
variable, R will inappropriately treat 8s like a real value rather than
missing data. So we’d like to fix that.
Using the recode command in the car
library, create a new variable called pid7.new which is the
same as the original variable except that it recodes values of 8 to
NA but keeps values of 1 through 7 as they were in the
original variable.
(Hint: Remember that you will need to install the car
package if you haven’t already done this. To do so, type
install.packages("car") in the console – this is a rare
time when you should not put a command in your code but should instead
type it in the console in the bottom left after the > prompt.
This is because you only need to install the package once, after which
it stays on your computer. Then, in each new R session where you want to use the
package, type library(car) – this command should go
in your source code as usual – to load it into your workspace.) Then
make a table of the original variable against the new one adding the
option exclude=NULL to show NA values in the
table in order to make sure the recoding worked as planned.
Finally, calculate the mean and make a histogram of the new party ID
variable pid7.new and comment briefly on what you see.
(Note: the hist command might choose odd bin divisions but
the bars should represent the frequency of values 1, 2, …, 7. You don’t
need to worry about making the histogram look pretty here.)
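To see what the recoding should accomplish, here is a small sketch on a made-up vector. It uses base R indexing rather than the car recode command the lab asks for, so treat it as an illustration of the intended result only; toy_pid7 and its values are hypothetical.

```r
# Made-up party ID codes; 8 stands for "no response".
toy_pid7 <- c(1, 4, 8, 7, 2, 8)

# Copy the variable, then overwrite the 8s with NA.
toy_pid7_new <- toy_pid7
toy_pid7_new[toy_pid7_new == 8] <- NA

# Cross-tabulate new against old; exclude = NULL keeps the NA row visible.
print(table(toy_pid7_new, toy_pid7, exclude = NULL))

# na.rm = TRUE drops the missing values before averaging.
print(mean(toy_pid7_new, na.rm = TRUE))
```

In the cross-tabulation, every original 8 should land in the NA row and every other value should sit on the diagonal.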
table(pid7)
## pid7
## 1 2 3 4 5 6 7 8
## 469 196 170 465 183 193 368 86
library(car)
## Loading required package: carData
# Recode 8 (no response) to NA; values 1 through 7 are left unchanged,
# so pid7.new stays numeric and keeps its ordering.
pid7.new <- recode(pid7, "8=NA")
# exclude = NULL shows the NA row so the recoding can be checked.
table(pid7.new, pid7, exclude = NULL)
## pid7
## pid7.new 1 2 3 4 5 6 7 8
## 1 469 0 0 0 0 0 0 0
## 2 0 196 0 0 0 0 0 0
## 3 0 0 170 0 0 0 0 0
## 4 0 0 0 465 0 0 0 0
## 5 0 0 0 0 183 0 0 0
## 6 0 0 0 0 0 193 0 0
## 7 0 0 0 0 0 0 368 0
## <NA> 0 0 0 0 0 0 0 86
hist(pid7.new)
mean(pid7.new, na.rm = TRUE)
## [1] 3.855186
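Each bar represents several hundred respondents. The distribution has more than one peak: the largest groups are Strong Democrats (469 respondents) and Independents (465), followed by Strong Republicans (368). In this sample, then, respondents most often identify as Strong Democrats or Independents.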
The variable idealg gives the estimated ideological
position of each survey respondent based on their views on each of the
Supreme Court cases asked in the survey. idealg is
constructed by using an “ideal point model” to uncover the structure of
this ideological dimension underlying people’s views (obviously this
type of model is beyond the scope of this course so you should simply
take this variable as a measure of each person’s ideological position).
Higher values represent more conservative ideologies and lower values
represent more liberal ones. The mean of this variable is 0 and the
standard deviation is 1.
Make a histogram of the variable idealg and then make a
boxplot of idealg (on the vertical axis) against
pid7.new. Briefly comment on what you see.
hist(idealg)
boxplot(idealg ~ pid7.new)
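Looking at the boxplot alongside the histogram of pid7.new, we reach a similar conclusion: the largest groups of respondents in this survey identify as Strong Democrats or Independents. Note that a boxplot summarizes the spread of idealg within each party ID group, so group sizes are better read from table(pid7).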
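A boxplot of one variable "against" another needs the formula interface, y ~ group. The sketch below uses simulated stand-ins (sim_pid and sim_ideal are made up, not the survey data) to contrast that with passing two plain vectors.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-ins for the survey variables (not the real data):
# party ID codes 1-7 and an ideology score that rises with party ID.
sim_pid <- sample(1:7, size = 500, replace = TRUE)
sim_ideal <- rnorm(500, mean = (sim_pid - 4) / 3, sd = 0.7)

# Formula interface: one box of sim_ideal per value of sim_pid.
by_party <- boxplot(sim_ideal ~ sim_pid,
                    xlab = "Party ID (1 = Strong Democrat, 7 = Strong Republican)",
                    ylab = "Simulated ideology")

# Two plain vectors: one box per vector, which is not a comparison by group.
two_boxes <- boxplot(sim_ideal, sim_pid)

print(ncol(by_party$stats))   # number of boxes drawn by the formula call
print(ncol(two_boxes$stats))  # number of boxes drawn by the two-vector call
```

The formula call places the grouping variable on the horizontal axis and the outcome on the vertical axis, which is the layout the question asks for.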
The variable perception gives an estimate of each
respondent’s perception of the ideological position of the Supreme Court
on the same ideological scale as respondents’ own estimated ideologies
(so smaller values mean a respondent perceives the Court to be more
liberal while larger values mean they perceive the Court to be more
conservative).
First, estimate the average perception of the Supreme Court’s ideology (that is, calculate the sample mean).
Then, construct a 95% confidence interval for this mean in two ways:
(1) calculating it yourself based on the mean and standard deviation of
this variable (which you can use the mean and
sd functions to calculate) and (2) using the
t.test function. (Note: these two ways might give very
slightly different values, but should be identical to at least a couple
decimal places. Also note there are no missing values in this variable
so length(perception) will give you the sample size \(N\).)
mean(perception)
## [1] 0.1341852
sd(perception)
## [1] 0.7525446
length(perception)
## [1] 2130
sum(perception)
## [1] 285.8144
mean(perception) - 1.96*sd(perception)/sqrt(length(perception))
## [1] 0.1022258
mean(perception) + 1.96*sd(perception)/sqrt(length(perception))
## [1] 0.1661446
t.test(perception)
##
## One Sample t-test
##
## data: perception
## t = 8.2293, df = 2129, p-value = 3.241e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.1022082 0.1661622
## sample estimates:
## mean of x
## 0.1341852
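The small gap between the two intervals comes from the critical value. As a sketch, the calculation below starts from the summary statistics printed above and compares the fixed 1.96 multiplier with the t critical value on N - 1 degrees of freedom that t.test uses; the variable names are illustrative only.

```r
# Summary statistics copied from the output above.
x_bar <- 0.1341852   # sample mean of perception
s     <- 0.7525446   # sample standard deviation
n     <- 2130        # sample size

se <- s / sqrt(n)    # standard error of the mean

# Normal approximation: critical value fixed at 1.96.
ci_normal <- x_bar + c(-1, 1) * 1.96 * se

# t-based interval: critical value from the t distribution with n - 1 df,
# which is what t.test() uses.
t_crit <- qt(0.975, df = n - 1)
ci_t <- x_bar + c(-1, 1) * t_crit * se

print(ci_normal)
print(ci_t)
```

With a sample this large the t critical value is only slightly above 1.96, so the two intervals agree to about four decimal places.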
The variable idealg_courtmajority gives the estimated
actual position of the Court based on its rulings during the 2020-2021
term. In other words, this is an estimate of the Court’s position rather
than ordinary Americans’ perceptions of the Court’s position. (It’s a
little odd because this variable has the exact same value for every
respondent since it’s not the respondents’ perceptions of the Court’s
position, but the estimated actual position of the Court on this
ideological dimension).
Make a boxplot of the variable idealg (on the vertical
axis) against pid7.new (on the horizontal axis) as you did
above and then add a horizontal line at the estimate of the actual
position of the Court using the command
abline(h=idealg_courtmajority). (This will actually add a
horizontal line for each respondent for their value of
idealg_courtmajority, but since this variable is the same
for every respondent it will look like one horizontal line.)
Next, make a boxplot of the variable perception (on the
vertical axis) against pid7.new (on the horizontal axis)
and then add a horizontal line at the estimate of the actual position of
the Court using the command
abline(h=idealg_courtmajority).
Then comment briefly on what each plot shows. From the first boxplot, describe how the Court’s ideological position compares to the positions of ordinary Americans (specifically Democrats, independents and Republicans). From the second boxplot, describe how the Court’s actual ideological position compares to respondents’ perceptions of the Court’s position (specifically the perceptions held by Democrats, independents and Republicans).
boxplot(idealg ~ pid7.new)
abline(h=idealg_courtmajority)
boxplot(perception ~ pid7.new)
abline(h=idealg_courtmajority)
mean(idealg_courtmajority)
## [1] 0.6145682
mean(idealg)
## [1] -0.01189128
mean(perception)
## [1] 0.1341852
The first boxplot shows that the Court’s estimated actual ideology is more conservative than the estimated ideology of the typical respondent. The horizontal line marking the Court’s position and the means confirm this: the Court’s position is about 0.61, while the mean respondent ideology is about -0.01. The second boxplot shows that respondents’ perceptions of the Court are, on average, more liberal than the Court’s actual position (a mean perception of about 0.13 versus 0.61). Comparing the two plots, Democrats, independents and Republicans all tend to view the Court as more conservative than they are themselves, but each group’s perception of the Court appears to be pulled toward its own ideology.
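The size of these gaps can be worked out directly from the three means printed above. This is a sketch of the arithmetic only; the variable names are illustrative.

```r
# Means copied from the output above.
court_actual    <- 0.6145682    # idealg_courtmajority (same for everyone)
mean_respondent <- -0.01189128  # mean of idealg
mean_perceived  <- 0.1341852    # mean of perception

# Positive gaps mean the Court's actual position is more conservative.
gap_vs_respondents <- court_actual - mean_respondent
gap_vs_perception  <- court_actual - mean_perceived

print(round(gap_vs_respondents, 3))
print(round(gap_vs_perception, 3))
```

Because idealg has a standard deviation of 1, the first gap can be read as roughly 0.63 standard deviations of respondent ideology.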
The variable roe_per gives respondents’ views on the
following question: “Should the Supreme Court overrule Roe v. Wade, the
1973 decision that established a constitutional right to abortion and
prohibited states from banning abortion before the fetus can survive
outside the womb, at around 23 weeks of pregnancy?” where the value of 1
indicates a response of “Yes, Roe v. Wade should be overturned” and a
value of 2 indicates a response of “No, Roe v. Wade should NOT be
overturned”.
First, create a new variable called roe_per.new that is
1 for the “Yes” response and 0 for the “No” response. (Hint: you can use
the recode function as above but you can do this even more
easily by making the new variable 2 minus the old variable)
Next, make a table of this new variable.
Finally, calculate the proportion of respondents giving the “Yes”
response and calculate a 95% confidence interval for this proportion
separately in two ways: (1) calculating it yourself using the sample
proportion and (2) using the prop.test function. (Hint:
These two ways should give nearly identical answers, but they might differ after
the first couple decimal places. Also note there are no missing values
in this variable so length(roe_per.new) will give
you the sample size N.)
Briefly comment on what you learned from this.
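The "2 minus the old variable" hint can be checked on a handful of made-up responses. The vector toy_roe below is hypothetical and only illustrates the arithmetic.

```r
# Made-up responses: 1 = "Yes, overturn", 2 = "No, do not overturn".
toy_roe <- c(1, 2, 2, 1, 2)

# 2 - 1 = 1 (Yes) and 2 - 2 = 0 (No).
toy_roe_new <- 2 - toy_roe
print(toy_roe_new)

# The mean of a 0/1 variable is the proportion of 1s.
print(mean(toy_roe_new))
```

Coding the variable as 0/1 is what lets mean return the sample proportion directly.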
roe_per.new <- 2 - roe_per
table(roe_per.new)
## roe_per.new
## 0 1
## 1318 812
mean(roe_per.new) - 1.96*sd(roe_per.new)/sqrt(length(roe_per.new))
## [1] 0.3605895
mean(roe_per.new) + 1.96*sd(roe_per.new)/sqrt(length(roe_per.new))
## [1] 0.4018519
prop.test(812, 812+1318)
##
## 1-sample proportions test with continuity correction
##
## data: 812 out of 812 + 1318, null probability 0.5
## X-squared = 119.73, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.3605941 0.4022797
## sample estimates:
## p
## 0.3812207
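For comparison, the hand calculation can also be written with the usual standard error of a proportion, sqrt(p(1 - p)/N), using the counts from the table above. This is a sketch with illustrative variable names; it differs very slightly from the sd-based version because sd divides by N - 1.

```r
yes <- 812          # respondents coded 1 in the table above
n   <- 812 + 1318   # total respondents

p_hat <- yes / n                        # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion

# 95% interval using the normal approximation.
ci <- p_hat + c(-1, 1) * 1.96 * se
print(p_hat)
print(ci)
```

prop.test reports a slightly different interval because it uses a different construction with a continuity correction.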
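From these data, we estimate that about 38% of respondents supported overturning Roe v. Wade, so roughly 62% opposed it. The 95% confidence interval, approximately (0.36, 0.40), lies entirely below 0.5, so we can be confident that a clear majority of American adults opposed overturning Roe v. Wade. This pattern is consistent with respondents leaning Democratic or independent rather than Republican, but the proportion alone does not establish respondents’ ideology.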
Make a table of roe_per.new against gender
by typing table(roe_per.new, gender).
Next calculate sample proportions and 95% confidence intervals for
the proportions of men and women (separately) who supported overturning
Roe v. Wade. You can just do this using the prop.test and
don’t have to calculate it manually. Then comment briefly on what you
learn from this about possible differences in support for overturning
Roe by gender, including both estimates and confidence intervals.
table(roe_per.new, gender)
## gender
## roe_per.new 1 2
## 0 595 723
## 1 425 387
prop.test(425, 425+595)
##
## 1-sample proportions test with continuity correction
##
## data: 425 out of 425 + 595, null probability 0.5
## X-squared = 28.001, df = 1, p-value = 1.213e-07
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.3862960 0.4476728
## sample estimates:
## p
## 0.4166667
prop.test(387, 387+723)
##
## 1-sample proportions test with continuity correction
##
## data: 387 out of 387 + 723, null probability 0.5
## X-squared = 101.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.3207394 0.3776186
## sample estimates:
## p
## 0.3486486
The sample estimate for men who support overturning Roe v. Wade is 0.4167, with a 95% confidence interval of (0.3863, 0.4477). The sample estimate for women is 0.3486, with a confidence interval of (0.3207, 0.3776). Because the two intervals do not overlap, the data suggest that a larger share of men than women favor overturning the ruling, although a majority of both groups oppose it. This is consistent with women in this sample being somewhat more liberal than men on this issue, though a single question cannot establish overall ideology.
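The lab only asks for separate intervals, but prop.test can also compare the two groups directly when given two counts and two totals. The sketch below is an optional extension using the counts from table(roe_per.new, gender); the variable names are illustrative.

```r
# Counts copied from table(roe_per.new, gender) above.
support <- c(men = 425, women = 387)              # favor overturning Roe
totals  <- c(men = 425 + 595, women = 387 + 723)  # all respondents by gender

# Two-sample test of equal proportions (prop.test with two groups).
gender_test <- prop.test(support, totals)

print(gender_test$estimate)  # the two sample proportions
print(gender_test$conf.int)  # 95% CI for the difference (men minus women)
print(gender_test$p.value)
```

A confidence interval for the difference that excludes 0 points the same way as the non-overlapping intervals above.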