Tables, Means, and Proportions with the 2024 ANES and 2024 CES surveys
Key Steps:
- Load data.
- Load packages.
- Create any new data columns as needed.
- Set the survey design.
- Create tables, figures, and estimates.
This handout sketches a few things we did in class. In another handout we'll see how to do this with AI tools. To use AI tools like ChatGPT effectively, get the analysis set up first, then ask for help with additional tasks.
2024 ANES
First we will work directly with the survey package for analysing survey data. Then we will review some options in the gvanes package.
Loading the data
If you plan to use the gvanes package, you need to load the data through the package:
library(gvanes)
data(anes24)
Then load a copy of the data for use outside the package:
load("~/SharedProjects/PLS 350/ANES 2024/anes24.rdata")
In the Environment pane there is now anes24, a data frame.
Loading required packages
Two more packages are needed for working with the survey design directly. We'll return to the gvanes package later.
library(tidyverse)
library(survey)
We'll look at examples with a few different questions. One is V241313, "How willing should the United States be to use military force to solve international problems?" It uses a five-point response scale ranging from "Extremely willing" to "Not at all willing".
Setting the survey design tells R to treat the data as a complex sample (clustered PSUs, strata, and weights) rather than a simple random sample:
anes2024 <- svydesign(
ids = ~V240107c, # PSU
strata = ~V240107d, # strata
weights = ~V240107b, # weight
data = anes24, # data
nest = TRUE
)
With any of the svy functions (svytable(), svymean(), and so on), you refer to the survey design object, not the data frame. Note that below it is anes2024, not anes24.
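To see what each svydesign() argument does without the ANES file, here is a minimal sketch on a small simulated data set. The data frame toy and its columns (stratum, psu, wt, y) are hypothetical stand-ins for the ANES strata, PSU, weight, and response variables; only svydesign() and svymean() come from the survey package. It compares the complex design with one that keeps the weights but ignores PSUs and strata.

```r
library(survey)

# Hypothetical toy data standing in for anes24:
# 4 strata x 2 PSUs per stratum x 10 respondents per PSU, with unequal weights
set.seed(350)
toy <- data.frame(
  stratum = rep(1:4, each = 20),
  psu     = rep(1:2, each = 10, times = 4),  # PSU ids 1 and 2 are reused in every stratum
  wt      = rep(c(0.5, 1.5), times = 40),
  y       = sample(1:5, 80, replace = TRUE)
)

# Complex design; nest = TRUE says PSU ids are only unique within a stratum
toy_design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                        data = toy, nest = TRUE)

# Same weights, but treated as a simple random sample (no PSUs, no strata)
toy_srs <- svydesign(ids = ~1, weights = ~wt, data = toy)

svymean(~y, toy_design)  # weighted mean with a design-based standard error
svymean(~y, toy_srs)     # same weighted mean, standard error ignores the design
mean(toy$y)              # unweighted mean, generally different
```

The two svymean() calls give the same point estimate; only the standard errors differ, which is why the design matters for confidence intervals and tests.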
svytable() produces a weighted tabulation of a variable:
svytable(~V241313, anes2024)
V241313
-8. Don't know 1. Extremely willing 2. Very willing
33.75612 191.90715 510.73966
3. Moderately willing 4. A little willing 5. Not at all willing
2329.41065 1314.87385 354.69505
Because the tabulations are weighted, the counts contain decimals. We can round them to whole numbers with round=TRUE:
svytable(~V241313, anes2024, round=TRUE)
V241313
-8. Don't know 1. Extremely willing 2. Very willing
34 192 511
3. Moderately willing 4. A little willing 5. Not at all willing
2329 1315 355
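Two details about svytable() can be checked directly: round = TRUE rounds each weighted count to the nearest whole number (not always up), and the weighted counts add up to the sum of the weights. This is a minimal sketch using the apistrat data that ships with the survey package (a stratified sample of California schools) as a stand-in for the ANES file; the object names tab_raw and tab_rounded are illustrative.

```r
library(survey)
data(api)  # loads apistrat and related example data from the survey package

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

tab_raw     <- svytable(~awards, dstrat)                # weighted counts, with decimals
tab_rounded <- svytable(~awards, dstrat, round = TRUE)  # rounded to nearest integer
tab_raw
tab_rounded

sum(tab_raw)           # total of the weighted counts...
sum(weights(dstrat))   # ...equals the total of the sampling weights
```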
Here, the responses are labeled categories. To treat the variable as a numeric five-point scale, we need to drop the ‘Don’t know’ responses and then use the numeric codes instead of the labels. First, drop the DK responses and create a new variable, int_force:
anes24 <- anes24 %>%
mutate(int_force = na_if(V241313, "-8. Don't know")) %>%
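mutate(int_force = forcats::fct_drop(int_force))
The second line turns ‘Don’t know’ into NA; the third line drops the now-unused ‘Don’t know’ level, so the remaining levels are numbered 1 through 5.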
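Why the fct_drop() step matters: as.numeric() on a factor returns the position of each level, not the number printed in the label. A minimal sketch with a hypothetical four-level factor (lv, resp, step1, and step2 are illustrative names) shows the codes before and after dropping the ‘Don’t know’ level.

```r
library(dplyr)
library(forcats)

# Hypothetical responses coded like the ANES labels, including a DK level
lv <- c("-8. Don't know", "1. Extremely willing",
        "2. Very willing", "3. Moderately willing")
resp <- factor(lv, levels = lv)

step1 <- na_if(resp, "-8. Don't know")  # DK becomes NA, but its level is kept
levels(step1)
as.numeric(step1)                       # NA 2 3 4: codes are off by one

step2 <- fct_drop(step1)                # drop the now-empty DK level
levels(step2)
as.numeric(step2)                       # NA 1 2 3: codes match the labels
```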
Any time we create a new variable in anes24, we have to re-specify the survey design so that the design object includes the new variable:
anes2024 <- svydesign(
ids = ~V240107c, # PSU
strata = ~V240107d, # strata
weights = ~V240107b, # weight
data = anes24, # data
nest = TRUE
)
svytable(~int_force, anes2024, round=TRUE)
int_force
1. Extremely willing 2. Very willing 3. Moderately willing
192 511 2329
4. A little willing 5. Not at all willing
1315 355
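Re-running svydesign() works, but the survey package also provides an update() method that adds a variable directly to an existing design object. A minimal sketch on the built-in apistrat data (the variable names high_api and high_api2 are hypothetical) shows both routes give the same estimate. The same approach should carry over to anes2024, though that has not been tested here.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Route 1 (as in the handout): add the column to the data frame, then rebuild the design
apistrat$high_api <- as.numeric(apistrat$api00 >= 700)
dstrat1 <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                     data = apistrat, fpc = ~fpc)

# Route 2: add the column to the existing design object with update()
dstrat2 <- update(dstrat, high_api2 = as.numeric(api00 >= 700))

svymean(~high_api, dstrat1)   # weighted share of schools with api00 >= 700
svymean(~high_api2, dstrat2)  # same estimate and standard error
```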
Party identification: V241221
svytable(~V241221, anes2024, round=TRUE)
V241221
-8. Don't know 0. No preference {VOL, FtF}
27 30
1. Democrat 2. Republican
1659 1450
3. Independent 5. Other party {SPECIFY}
1421 150
Recode ‘Don’t know’, ‘No preference’, and ‘Other party’ responses as Independent:
anes24 <- anes24 %>%
mutate(
party_id = case_when(
V241221 %in% c("-8. Don't know",
"0. No preference {VOL, FtF}",
"5. Other party {SPECIFY}") ~ "3. Independent",
TRUE ~ as.character(V241221)), # keep the original label, as text to match the other branch
party_id = factor(party_id,
levels = c("1. Democrat", "2. Republican", "3. Independent")))
Respondent sex:
svytable(~V241550, anes2024, round=TRUE)
V241550
-8. Don't know 1. Male 2. Female
27 2394 2543
# dropping don't know
anes24 <- anes24 %>%
mutate(sex = na_if(V241550, "-8. Don't know"))
# resetting survey design
anes2024 <- svydesign(
ids = ~V240107c, # PSU
strata = ~V240107d, # strata
weights = ~V240107b, # weight
data = anes24, # data
nest = TRUE
)
A t-test comparing mean willingness to use military force (scored 1 = Extremely willing to 5 = Not at all willing) by sex:
svyttest(as.numeric(int_force) ~ sex, anes2024, na.rm=TRUE)
Design-based t-test
data: as.numeric(int_force) ~ sex
t = 1.3032, df = 1782, p-value = 0.1927
alternative hypothesis: true difference in mean is not equal to 0
95 percent confidence interval:
-0.02339177 0.11603570
sample estimates:
difference in mean
0.04632196
There is no statistically significant difference in means. The 95% confidence interval on the difference, -0.023 to 0.116, includes 0, and the p-value of 0.19 is well above 0.05, so we do not reject the null hypothesis of no difference.
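The numbers in the t-test printout can also be pulled out by name, because svyttest() returns a standard htest object. This minimal sketch uses the survey package's built-in apistrat data, with api00 and awards standing in for int_force and sex. It also checks that the difference in means matches the slope from svyglm() with a two-group predictor, one way to see how the t-test relates to regression.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Design-based t-test: mean api00 for schools with vs. without awards
tt <- svyttest(api00 ~ awards, dstrat)
tt$estimate   # difference in means (Yes minus No)
tt$conf.int   # 95% confidence interval on the difference
tt$p.value    # two-sided p-value

# Decision rule from the handout: reject "no difference" when p < 0.05
tt$p.value < 0.05

# The same difference as a regression slope on the group indicator
fit <- svyglm(api00 ~ awards, design = dstrat)
coef(fit)["awardsYes"]
```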
Party Identification
svytable(~party_id, anes2024, round=TRUE)
party_id
1. Democrat 2. Republican 3. Independent
1659 1450 1627
Proportions of party identifiers:
prop.table(svytable(~party_id, anes2024, round=TRUE))
party_id
1. Democrat 2. Republican 3. Independent
0.3502956 0.3061655 0.3435389
Confidence intervals on proportions of party ID:
confint(svymean(~party_id, anes2024, na.rm=TRUE))
2.5 % 97.5 %
party_id1. Democrat 0.3301705 0.3703910
party_id2. Republican 0.2879911 0.3242358
party_id3. Independent 0.3236011 0.3636104
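confint(svymean()) gives Wald intervals for proportions. svyciprop() offers alternatives (for example method = "logit" or "likelihood") whose bounds stay inside 0 and 1, which matters most for small proportions. A minimal sketch on the built-in apistrat data, with the share of schools receiving awards standing in for a party ID share:

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Wald intervals for every category of a factor
confint(svymean(~awards, dstrat))

# Interval for a single proportion, computed on the logit scale
p_yes <- svyciprop(~I(awards == "Yes"), dstrat, method = "logit")
p_yes
attr(p_yes, "ci")   # lower and upper bounds
```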
Mean willingness to use military force by party ID
A t-test compares only two groups. With three party identification groups, we can instead construct a confidence interval around each group's mean level of support:
svyby(~as.numeric(int_force),~party_id, anes2024, svymean, na.rm=TRUE)
party_id as.numeric(int_force) se
1. Democrat 1. Democrat 3.257307 0.03327848
2. Republican 2. Republican 3.054149 0.03437336
3. Independent 3. Independent 3.389749 0.03069103
The se column is the standard error. For a 95% confidence interval, we multiply the standard error by 1.96 (the margin of error), then add it to and subtract it from the sample mean. For Democrats:
1.96*0.03327848
[1] 0.06522582
So the upper and lower bounds of the 95% confidence interval for the Democrats' mean are:
3.257307 + 1.96*0.03327848
[1] 3.322533
3.257307 - 1.96*0.03327848
[1] 3.192081
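Rather than computing each interval by hand, confint() can be applied to a svyby() result. This minimal sketch on the built-in apistrat data (api00 by awards standing in for military-force scores by party ID) reproduces the mean ± 1.96 × SE calculation; confint() uses qnorm(0.975) ≈ 1.959964 rather than exactly 1.96, so bounds can differ slightly in the last decimals. For the ANES, the analogous call would presumably be confint(svyby(~as.numeric(int_force), ~party_id, anes2024, svymean, na.rm=TRUE)), which has not been run here.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Group means and standard errors; covmat = TRUE stores the
# variance-covariance matrix that confint() works from
by_awards <- svyby(~api00, ~awards, dstrat, svymean, covmat = TRUE)
by_awards

# Manual 95% intervals: mean +/- 1.96 * SE
est <- coef(by_awards)
se  <- as.vector(SE(by_awards))
cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# The same intervals from confint(), using qnorm(0.975) instead of 1.96
confint(by_awards)
```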
Cross-tabulation: willingness to use military force by party identification:
svytable(~int_force + party_id, anes2024)
party_id
int_force 1. Democrat 2. Republican 3. Independent
1. Extremely willing 65.68608 81.10112 45.11995
2. Very willing 171.82734 234.69888 104.21343
3. Moderately willing 797.08669 737.97103 794.35293
4. A little willing 509.49945 300.25271 505.12169
5. Not at all willing 109.60530 87.34867 157.74109
Column proportions (margin=2; use margin=1 for row proportions). Multiply by 100 for percentages:
prop.table(svytable(~int_force + party_id, anes2024), margin=2)
party_id
int_force 1. Democrat 2. Republican 3. Independent
1. Extremely willing 0.03972056 0.05626660 0.02808501
2. Very willing 0.10390448 0.16283015 0.06486788
3. Moderately willing 0.48200057 0.51199192 0.49444672
4. A little willing 0.30809576 0.20831029 0.31441410
5. Not at all willing 0.06627863 0.06060104 0.09818629
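With margin = 2 each column sums to 1; margin = 1 makes each row sum to 1 instead. Multiplying by 100 and rounding turns proportions into the percentages usually reported. A minimal sketch on the built-in apistrat data, crossing award status with school type in place of military force by party ID:

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

tab <- svytable(~awards + stype, dstrat)   # rows: awards, columns: school type

col_prop <- prop.table(tab, margin = 2)    # each column sums to 1
row_prop <- prop.table(tab, margin = 1)    # each row sums to 1

round(100 * col_prop, 1)                   # column percentages
round(100 * row_prop, 1)                   # row percentages
```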
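Useful functions in the survey package:
- `confint(svymean())`: confidence interval around a survey mean or proportion
- `svyciprop()`: confidence interval for a proportion, with several methods to choose from
- `svytable()`: weighted tabulation and cross-tabulation
- `svychisq()`: chi-square test of independence for a cross-tabulation
- `svyglm()`: regression models (linear, logistic, and other generalized linear models)
- `svyttest()`: t-test for a difference in means between two groups
Testing whether willingness to use military force is independent of party identification:
svychisq(~int_force + party_id, anes2024, na.rm=TRUE)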
Pearson's X^2: Rao & Scott adjustment
data: svychisq(~int_force + party_id, anes2024, na.rm = TRUE)
F = 8.5593, ndf = 7.764, ddf = 15528.032, p-value = 2.013e-11
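It is reported as an F statistic (the Rao & Scott adjustment for the survey design), but it is interpreted the same way as the chi-square test we reviewed in class. With p = 2.013e-11, far below 0.05, we reject the null hypothesis of independence: willingness to use military force is associated with party identification.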
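svychisq() defaults to the Rao-Scott F version shown above; statistic = "Chisq" requests a Pearson chi-square adjusted by an estimated design effect instead, and the two usually lead to the same conclusion. A minimal sketch on the built-in apistrat data, with award status by school type as the stand-in cross-tabulation:

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Default: Rao-Scott second-order correction, reported as an F statistic
test_F <- svychisq(~awards + stype, dstrat)
test_F

# Pearson chi-square adjusted by an estimated design effect
test_X2 <- svychisq(~awards + stype, dstrat, statistic = "Chisq")
test_X2

# Decision at the 5% level for each version
c(F = test_F$p.value, Chisq = test_X2$p.value) < 0.05
```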
gvanes package
The xtab() function from the gvanes package:
xtab(int_force, party_id, column)
int_force × party_id
Weighted frequencies and column percentages (columns sum to 100)
| int_force | 1. Democrat n | 1. Democrat % | 2. Republican n | 2. Republican % | 3. Independent n | 3. Independent % |
|---|---|---|---|---|---|---|
| 1. Extremely willing | 65 | 4 | 80 | 6 | 45 | 3 |
| 2. Very willing | 171 | 10 | 230 | 16 | 99 | 6 |
| 3. Moderately willing | 801 | 48 | 721 | 51 | 791 | 50 |
| 4. A little willing | 526 | 31 | 289 | 21 | 499 | 31 |
| 5. Not at all willing | 109 | 7 | 82 | 6 | 157 | 10 |
See the handout for other examples.
CES 2024
To load the CES data, use the package built for it, ces2024:
library(ces2024)
Attaching package: 'ces2024'
The following objects are masked from 'package:gvanes':
tab, xtab
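data(ces24)
Then load the dataset file directly with:
load("~/SharedProjects/PLS 350/CES 2024/ces24_survey.rdata")
Use it as in the handout. Because ces2024 masks the tab() and xtab() functions from gvanes, the xtab() calls below use the ces2024 version.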
xtab(pid7, educ)
pid7 × educ
Weighted frequencies (counts rounded up)
| pid7 | No HS | High school graduate | Some college | 2-year | 4-year | Post-grad |
|---|---|---|---|---|---|---|
| Strong Democrat | 591 | 3156 | 2160 | 1036 | 2596 | 1935 |
| Not very strong Democrat | 308 | 1374 | 1015 | 533 | 1021 | 727 |
| Lean Democrat | 128 | 1059 | 1068 | 542 | 1116 | 836 |
| Independent | 653 | 2099 | 1297 | 683 | 1181 | 732 |
| Lean Republican | 292 | 1339 | 1171 | 565 | 1227 | 629 |
| Not very strong Republican | 337 | 1547 | 916 | 497 | 1000 | 514 |
| Strong Republican | 681 | 3430 | 2088 | 1039 | 2251 | 1063 |
| Not sure | 139 | 410 | 208 | 67 | 151 | 45 |
Column percentages:
xtab(pid7, educ, column)
pid7 × educ
Weighted frequencies and column percentages (columns sum to 100)
| pid7 | No HS n | No HS % | High school graduate n | High school graduate % | Some college n | Some college % | 2-year n | 2-year % | 4-year n | 4-year % | Post-grad n | Post-grad % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Strong Democrat | 591 | 19 | 3156 | 22 | 2160 | 22 | 1036 | 21 | 2596 | 25 | 1935 | 30 |
| Not very strong Democrat | 308 | 10 | 1374 | 10 | 1015 | 10 | 533 | 11 | 1021 | 10 | 727 | 11 |
| Lean Democrat | 128 | 4 | 1059 | 7 | 1068 | 11 | 542 | 11 | 1116 | 11 | 836 | 13 |
| Independent | 653 | 21 | 2099 | 15 | 1297 | 13 | 683 | 14 | 1181 | 11 | 732 | 11 |
| Lean Republican | 292 | 9 | 1339 | 9 | 1171 | 12 | 565 | 11 | 1227 | 12 | 629 | 10 |
| Not very strong Republican | 337 | 11 | 1547 | 11 | 916 | 9 | 497 | 10 | 1000 | 9 | 514 | 8 |
| Strong Republican | 681 | 22 | 3430 | 24 | 2088 | 21 | 1039 | 21 | 2251 | 21 | 1063 | 16 |
| Not sure | 139 | 4 | 410 | 3 | 208 | 2 | 67 | 1 | 151 | 1 | 45 | 1 |