Tables, Means, and Proportions with the 2024 ANES and 2024 CES surveys

Key Steps:

  1. Load data.
  2. Load packages.
  3. Create any new data columns as needed.
  4. Set the survey design.
  5. Create tables, figures, and estimates.

This handout sketches a few things we did in class; another handout shows how to do this with AI tools. To use AI tools like ChatGPT effectively, first get the analysis set up, then ask for help with additional tasks.

2024 ANES

First we work directly with the packages for analysing survey data. Then we review some options in the gvanes package.

Loading the data

If you plan to use the gvanes package, first load the data through the package:

library(gvanes)
data(anes24)

Then load the data outside the package:

load("~/SharedProjects/PLS 350/ANES 2024/anes24.rdata")

In the Environment there is now anes24, a data frame.

Loading required packages

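Two more packages are needed: tidyverse for data manipulation and survey for design-based estimation. (The gvanes package, loaded above, is used again at the end of the ANES section.)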

library(tidyverse)
library(survey)

We’ll look at examples with a few different questions. One is V241313, “How willing should the United States be to use military force to solve international problems?” Responses use a five-point scale ranging from “Extremely willing” to “Not at all willing.”

Setting the survey design tells R that the ANES is a complex sample (clustered, stratified, and weighted) rather than a simple random sample:

anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)
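A minimal sketch of what nest = TRUE does, on simulated data (toy, stratum, psu, wt, and y are hypothetical names, not ANES variables). When PSU codes are reused across strata, PSU 1 in stratum A is a different cluster from PSU 1 in stratum B; nest = TRUE tells svydesign() to treat each stratum and PSU combination as its own cluster:

```r
library(survey)

set.seed(350)
# Hypothetical data: 2 strata x 2 PSUs x 5 respondents.
# PSU codes 1 and 2 are reused inside each stratum.
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 10),
  psu     = rep(rep(1:2, each = 5), times = 2),
  wt      = runif(20, min = 0.5, max = 2),
  y       = rnorm(20)
)

# nest = TRUE: PSU 1 in stratum A and PSU 1 in stratum B are different clusters
toy_des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                     data = toy, nest = TRUE)

# Design degrees of freedom = number of PSUs (4) minus number of strata (2)
degf(toy_des)

# Without nest = TRUE, svydesign() refuses the reused PSU codes
no_nest <- try(svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                         data = toy), silent = TRUE)
inherits(no_nest, "try-error")
```

With nest = TRUE the toy design has four clusters in two strata, giving 2 design degrees of freedom; without it, svydesign() stops with an error.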

With any of the svy*() functions, you refer to the survey design object, not the data frame. Note below that it is anes2024, not anes24.

svytable() tabulates a variable.

svytable(~V241313, anes2024)
V241313
       -8. Don't know  1. Extremely willing       2. Very willing 
             33.75612             191.90715             510.73966 
3. Moderately willing   4. A little willing 5. Not at all willing 
           2329.41065            1314.87385             354.69505 

Because the tabulations are weighted, the counts contain decimals. We can round them to the nearest integer:

svytable(~V241313, anes2024, round=TRUE)
V241313
       -8. Don't know  1. Extremely willing       2. Very willing 
                   34                   192                   511 
3. Moderately willing   4. A little willing 5. Not at all willing 
                 2329                  1315                   355 
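Why decimals? For a survey design, svytable() adds up the survey weights of the respondents in each category rather than counting respondents. A minimal sketch with five made-up respondents (toy, answer, and wt are hypothetical) shows this, and also that round = TRUE rounds to the nearest integer, which can go down as well as up:

```r
library(survey)

# Five hypothetical respondents with survey weights
toy <- data.frame(
  answer = factor(c("yes", "no", "yes", "yes", "no")),
  wt     = c(1.2, 0.8, 2.5, 0.7, 1.3)
)
# ids = ~1: no clustering, only weights
des <- svydesign(ids = ~1, weights = ~wt, data = toy)

weighted <- svytable(~answer, des)                # no = 2.1, yes = 4.4
by_hand  <- tapply(toy$wt, toy$answer, sum)       # the same sums of weights
rounded  <- svytable(~answer, des, round = TRUE)  # no = 2, yes = 4 (both rounded down)
```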

Here the responses have labels. To treat the variable as a numeric five-point scale, we need to drop the ‘Don’t know’ responses and the response labels. First, drop the DK responses by creating a new variable, int_force:

anes24 <- anes24 %>%
  mutate(int_force = na_if(V241313, "-8. Don't know")) %>%
  mutate(int_force = forcats::fct_drop(int_force)) 

The third line drops the now-unused “-8. Don’t know” factor level.
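Why drop the unused level? as.numeric() on a factor returns each value’s level position, not the number printed in its label, so while “-8. Don’t know” is still a level, “1. Extremely willing” is coded 2. A base-R sketch on a short made-up vector (x, lv, before, and after are hypothetical names); droplevels() plays the role of fct_drop() and direct NA assignment the role of na_if():

```r
# Level labels copied from V241313; the vector itself is made up
lv <- c("-8. Don't know", "1. Extremely willing", "2. Very willing",
        "3. Moderately willing", "4. A little willing", "5. Not at all willing")
x <- factor(c(lv, "3. Moderately willing"), levels = lv)

before <- as.numeric(x)          # 1 2 3 4 5 6 4: "Extremely willing" is coded 2
x[x == "-8. Don't know"] <- NA   # like na_if()
x <- droplevels(x)               # like forcats::fct_drop()
after <- as.numeric(x)           # NA 1 2 3 4 5 3: codes now match the labels
```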

Any time we create a new variable, we have to re-specify the survey design so that the design object includes it:

anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)
svytable(~int_force, anes2024, round=TRUE)
int_force
 1. Extremely willing       2. Very willing 3. Moderately willing 
                  192                   511                  2329 
  4. A little willing 5. Not at all willing 
                 1315                   355 
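Re-running svydesign() works. The survey package also provides an update() method that adds a variable to an existing design object without rebuilding it. A minimal sketch on hypothetical data (toy, answer, and answer_clean are made-up names):

```r
library(survey)

set.seed(1)
# Hypothetical data: 10 PSUs, weights, and a labelled answer with a DK code
toy <- data.frame(
  psu    = 1:10,
  wt     = runif(10, min = 0.5, max = 2),
  answer = sample(c("-8. Don't know", "1. Yes", "2. No"), 10, replace = TRUE),
  stringsAsFactors = FALSE
)
des <- svydesign(ids = ~psu, weights = ~wt, data = toy)

# update() evaluates the expression inside the design's data
# and stores the result as a new column of the design
des <- update(des,
              answer_clean = factor(ifelse(answer == "-8. Don't know",
                                           NA, answer)))

svytable(~answer_clean, des)
```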

Party identification: V241221

svytable(~V241221, anes2024, round=TRUE)
V241221
             -8. Don't know 0. No preference {VOL, FtF} 
                         27                          30 
                1. Democrat               2. Republican 
                       1659                        1450 
             3. Independent    5. Other party {SPECIFY} 
                       1421                         150 

Recode “Don’t know,” “No preference,” and “Other party” responses as Independent:

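anes24 <- anes24 %>%
  mutate(
    # fold DK, no preference, and other-party responses into Independent;
    # as.character() keeps both case_when() branches the same type
    party_id = case_when(
      V241221 %in% c("-8. Don't know",
                     "0. No preference {VOL, FtF}",
                     "5. Other party {SPECIFY}") ~ "3. Independent",
      TRUE ~ as.character(V241221)
    ),
    party_id = factor(party_id,
                      levels = c("1. Democrat", "2. Republican", "3. Independent"))
  )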
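A quick way to check a recode like this is to cross-tabulate the old and new variables. A base-R sketch of the same recoding rule on a short made-up vector (old, new, and to_independent are hypothetical names):

```r
# Hypothetical responses using the V241221 labels
old <- c("1. Democrat", "-8. Don't know", "5. Other party {SPECIFY}",
         "2. Republican", "0. No preference {VOL, FtF}", "3. Independent")
to_independent <- c("-8. Don't know", "0. No preference {VOL, FtF}",
                    "5. Other party {SPECIFY}")

new <- ifelse(old %in% to_independent, "3. Independent", old)
new <- factor(new, levels = c("1. Democrat", "2. Republican", "3. Independent"))

# Each original category should map to exactly one new category
table(old, new)
```

With the real data, table(anes24$V241221, anes24$party_id) gives the same kind of check.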

Respondent sex:

svytable(~V241550, anes2024, round=TRUE)
V241550
-8. Don't know        1. Male      2. Female 
            27           2394           2543 
# dropping don't know
anes24 <- anes24 %>%
  mutate(sex = na_if(V241550, "-8. Don't know"))
# resetting survey design
anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)

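A design-based t-test of mean willingness to use military force by sex. as.numeric() turns the int_force factor into its level codes, from 1 (“Extremely willing”) to 5 (“Not at all willing”):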

svyttest(as.numeric(int_force) ~ sex, anes2024, na.rm=TRUE)

    Design-based t-test

data:  as.numeric(int_force) ~ sex
t = 1.3032, df = 1782, p-value = 0.1927
alternative hypothesis: true difference in mean is not equal to 0
95 percent confidence interval:
 -0.02339177  0.11603570
sample estimates:
difference in mean 
        0.04632196 

There is no statistically significant difference in means. The 95% confidence interval on the difference (-0.023 to 0.116) includes 0, and the p-value is 0.19, so we do not reject the null hypothesis of no difference.
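The interval and p-value follow from the difference, its standard error, and a t distribution with the design degrees of freedom. A base-R sketch that backs the standard error out of the printed t and difference (so it inherits their rounding) and rebuilds both:

```r
# Values copied from the svyttest() output above
diff_mean <- 0.04632196
t_stat    <- 1.3032
df        <- 1782

se      <- diff_mean / t_stat               # t = difference / standard error
t_crit  <- qt(0.975, df)                    # about 1.961 with 1782 df
ci      <- diff_mean + c(-1, 1) * t_crit * se   # 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df)         # two-sided p-value

round(ci, 4)       # -0.0234  0.1160
round(p_value, 4)  # 0.1927
```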

Party Identification

svytable(~party_id, anes2024, round=TRUE)
party_id
   1. Democrat  2. Republican 3. Independent 
          1659           1450           1627 

Proportions of party identifiers (computed here from the rounded counts; drop round=TRUE for slightly more precise values):

prop.table(svytable(~party_id, anes2024, round=TRUE))
party_id
   1. Democrat  2. Republican 3. Independent 
     0.3502956      0.3061655      0.3435389 
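prop.table() simply divides each count by the total. A base-R check using the rounded counts printed above:

```r
# Rounded weighted counts copied from the svytable() output above
counts <- c("1. Democrat" = 1659, "2. Republican" = 1450, "3. Independent" = 1627)

props <- prop.table(counts)   # each count divided by sum(counts) = 4736
props
```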

Confidence intervals on the proportions of party identifiers:

confint(svymean(~party_id, anes2024, na.rm=TRUE))
                           2.5 %    97.5 %
party_id1. Democrat    0.3301705 0.3703910
party_id2. Republican  0.2879911 0.3242358
party_id3. Independent 0.3236011 0.3636104
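For a single proportion, svyciprop() is an alternative to confint(svymean()). Its method = "logit" interval is built on the log-odds scale, so it stays between 0 and 1, which matters for proportions near 0 or 1. A sketch using the apistrat example data that ships with the survey package (not the ANES), with that package’s standard stratified example design:

```r
library(survey)

# apistrat: a stratified sample of California schools included with survey
data(api)
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Share of schools with a 2000 API score above 700, with a logit-based 95% CI
p_high <- svyciprop(~I(api00 > 700), dstrat, method = "logit")
p_high
confint(p_high)
```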

Mean willingness to use military force by party ID

A t-test compares means across two groups. With three party-ID groups, we can instead construct a confidence interval on each group’s mean:

svyby(~as.numeric(int_force),~party_id, anes2024, svymean, na.rm=TRUE) 
                     party_id as.numeric(int_force)         se
1. Democrat       1. Democrat              3.257307 0.03327848
2. Republican   2. Republican              3.054149 0.03437336
3. Independent 3. Independent              3.389749 0.03069103

The se column is the standard error. For a 95% confidence interval, multiply the standard error by 1.96, then add that amount to and subtract it from the sample mean. For Democrats:

1.96*0.03327848
[1] 0.06522582

So the upper and lower bounds of the 95% confidence interval for Democrats’ mean willingness are:

3.257307 + 1.96*0.03327848
[1] 3.322533
3.257307 - 1.96*0.03327848
[1] 3.192081

Cross-tabulation: willingness to use military force by party identification:

svytable(~int_force + party_id, anes2024)
                       party_id
int_force               1. Democrat 2. Republican 3. Independent
  1. Extremely willing     65.68608      81.10112       45.11995
  2. Very willing         171.82734     234.69888      104.21343
  3. Moderately willing   797.08669     737.97103      794.35293
  4. A little willing     509.49945     300.25271      505.12169
  5. Not at all willing   109.60530      87.34867      157.74109

Column proportions (margin=2, so each party’s column sums to 1; use margin=1 for row proportions):

prop.table(svytable(~int_force + party_id, anes2024), margin=2)
                       party_id
int_force               1. Democrat 2. Republican 3. Independent
  1. Extremely willing   0.03972056    0.05626660     0.02808501
  2. Very willing        0.10390448    0.16283015     0.06486788
  3. Moderately willing  0.48200057    0.51199192     0.49444672
  4. A little willing    0.30809576    0.20831029     0.31441410
  5. Not at all willing  0.06627863    0.06060104     0.09818629
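The margin argument decides which total each cell is divided by: margin = 2 divides by the column total (column proportions, as above) and margin = 1 by the row total. A base-R sketch using the weighted counts printed above (counts, col_p, and row_p are hypothetical names):

```r
# Weighted counts copied from the svytable() cross-tab above
# (matrix() fills column by column, so each block of five is one party)
counts <- matrix(
  c(65.68608, 171.82734, 797.08669, 509.49945, 109.60530,   # Democrat
    81.10112, 234.69888, 737.97103, 300.25271,  87.34867,   # Republican
    45.11995, 104.21343, 794.35293, 505.12169, 157.74109),  # Independent
  nrow = 5,
  dimnames = list(
    int_force = c("1. Extremely willing", "2. Very willing",
                  "3. Moderately willing", "4. A little willing",
                  "5. Not at all willing"),
    party_id = c("1. Democrat", "2. Republican", "3. Independent")
  )
)

col_p <- prop.table(counts, margin = 2)  # each column sums to 1
row_p <- prop.table(counts, margin = 1)  # each row sums to 1
round(100 * col_p, 1)                    # column percentages
```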
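Useful functions from the survey package:

  confint(svymean()): confidence interval around a survey mean or proportion
  svyciprop(): another way to construct a confidence interval, for a proportion
  svytable(): weighted tabulation and cross-tabulation
  svychisq(): chi-square test of independence
  svyglm(): regression (generalized linear models, including linear regression)
  svyttest(): design-based t-test

The chi-square test of independence for willingness to use force by party ID: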
svychisq(~int_force + party_id, anes2024, na.rm=TRUE)

    Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~int_force + party_id, anes2024, na.rm = TRUE)
F = 8.5593, ndf = 7.764, ddf = 15528.032, p-value = 2.013e-11

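The Rao and Scott adjusted statistic is reported as an F test, but it is interpreted the same way as the chi-square test we reviewed in class. With p = 2.0e-11, we reject the null hypothesis of independence: willingness to use military force is related to party identification.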

gvanes package

The xtab() function from the gvanes package:

xtab(int_force, party_id, column)
int_force × party_id
Weighted frequencies and column percentages (columns sum to 100); cells show n (%)

int_force              1. Democrat  2. Republican  3. Independent
1. Extremely willing        65 (4)         80 (6)          45 (3)
2. Very willing           171 (10)       230 (16)          99 (6)
3. Moderately willing     801 (48)       721 (51)        791 (50)
4. A little willing       526 (31)       289 (21)        499 (31)
5. Not at all willing      109 (7)         82 (6)        157 (10)

See the handout for other examples.

CES 2024

To load the CES data, use its dedicated package, ces2024:

library(ces2024)

Attaching package: 'ces2024'
The following objects are masked from 'package:gvanes':

    tab, xtab
data(ces24)

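Loading ces2024 masks the tab() and xtab() functions from gvanes, so xtab() now calls the ces2024 version; gvanes::xtab() still calls the ANES version. Then load a copy of the dataset for use outside the package: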

load("~/SharedProjects/PLS 350/CES 2024/ces24_survey.rdata")

Use it as in the handout.

xtab(pid7, educ)
pid7 × educ
Weighted frequencies (counts rounded up)

pid7                        No HS  High school graduate  Some college  2-year  4-year  Post-grad
Strong Democrat               591                  3156          2160    1036    2596       1935
Not very strong Democrat      308                  1374          1015     533    1021        727
Lean Democrat                 128                  1059          1068     542    1116        836
Independent                   653                  2099          1297     683    1181        732
Lean Republican               292                  1339          1171     565    1227        629
Not very strong Republican    337                  1547           916     497    1000        514
Strong Republican             681                  3430          2088    1039    2251       1063
Not sure                      139                   410           208      67     151         45

Then the same table with column percentages:

xtab(pid7, educ, column)
pid7 × educ
Weighted frequencies and column percentages (columns sum to 100); cells show n (%)

pid7                            No HS  High school graduate  Some college     2-year     4-year  Post-grad
Strong Democrat              591 (19)             3156 (22)     2160 (22)  1036 (21)  2596 (25)  1935 (30)
Not very strong Democrat     308 (10)             1374 (10)     1015 (10)   533 (11)  1021 (10)   727 (11)
Lean Democrat                 128 (4)              1059 (7)     1068 (11)   542 (11)  1116 (11)   836 (13)
Independent                  653 (21)             2099 (15)     1297 (13)   683 (14)  1181 (11)   732 (11)
Lean Republican               292 (9)              1339 (9)     1171 (12)   565 (11)  1227 (12)   629 (10)
Not very strong Republican   337 (11)             1547 (11)       916 (9)   497 (10)   1000 (9)    514 (8)
Strong Republican            681 (22)             3430 (24)     2088 (21)  1039 (21)  2251 (21)  1063 (16)
Not sure                      139 (4)               410 (3)       208 (2)     67 (1)    151 (1)     45 (1)