Tables, Means, and Proportions with the 2024 ANES and 2024 CES surveys

Key Steps:

Load data.
Load packages.
Create any new data columns as needed.
Set the survey design.
Create tables, figures, and estimates.

This handout sketches a few things we did in class. In another handout we’ll see how to do this with AI tools. To use AI tools like ChatGPT effectively, get the analysis set up first, then ask for help performing additional tasks.

2024 ANES

First we will work more directly with packages for analysing survey data. Then we will review some options with the gvanes package.

Loading the data

If you plan to use the gvanes package, first load the data through the package:

library(gvanes)
data(anes24)

Then load the data outside the package:

load("~/SharedProjects/PLS 350/ANES 2024/anes24.rdata")

In the Environment there is now anes24, a data frame.

Loading required packages

Two more packages are needed for the survey functions below. We return to the gvanes package later.

library(tidyverse)
library(survey)

We’ll look at some examples with a few different questions. One is V241313, “How willing should the United States be to use military force to solve international problems?” It has a five-point response scale ranging from “Extremely willing” to “Not at all willing”.

Setting the survey design means telling R how to treat the sample as complex rather than simple:

anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)
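To see what the design object changes, here is a minimal sketch using the apistrat example data that ships with the survey package (not the ANES). The object names dstrat, unweighted, and weighted are chosen only for illustration. It compares a plain mean with a design-based mean.

```r
library(survey)

# Example data shipped with the survey package: a stratified sample of schools
data(api)

# Stratified design: strata = school type, pw = sampling weight
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# A simple mean ignores the weights and strata
unweighted <- mean(apistrat$api00)

# A design-based mean uses the weights and reports a design-based standard error
weighted <- svymean(~api00, dstrat)

unweighted
weighted
```

The second call takes the design object (dstrat), not the data frame, which is the same pattern used with anes2024 below.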

With any of the svy() functions, you refer to the survey design object, not the data. Note below it is anes2024 not anes24.

svytable() tabulates a variable.

svytable(~V241313, anes2024)

V241313
       -8. Don't know  1. Extremely willing       2. Very willing 
             33.75612             191.90715             510.73966 
3. Moderately willing   4. A little willing 5. Not at all willing 
           2329.41065            1314.87385             354.69505

Because the tabulations are weighted, the counts contain decimals. We can round them to the nearest integer:

svytable(~V241313, anes2024, round=TRUE)

V241313
       -8. Don't know  1. Extremely willing       2. Very willing 
                   34                   192                   511 
3. Moderately willing   4. A little willing 5. Not at all willing 
                 2329                  1315                   355
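As a quick check on what round=TRUE does to the weighted counts, this base R sketch applies round() and ceiling() to the values from the unrounded table. The vector name w is introduced only for illustration.

```r
# Weighted counts copied from the unrounded svytable() output
w <- c("-8. Don't know"        = 33.75612,
       "1. Extremely willing"  = 191.90715,
       "2. Very willing"       = 510.73966,
       "3. Moderately willing" = 2329.41065,
       "4. A little willing"   = 1314.87385,
       "5. Not at all willing" = 354.69505)

round(w)    # nearest integer: matches the round=TRUE table
ceiling(w)  # always rounds upward: differs for "3. Moderately willing"
```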

Here, the responses have labels. To treat the variable as a numeric five-point scale, we need to drop the ‘Don’t know’ responses and the unused label. We first set the DK responses to missing and store the result in a new variable, int_force:

anes24 <- anes24 %>%
  mutate(int_force = na_if(V241313, "-8. Don't know")) %>%
  mutate(int_force = forcats::fct_drop(int_force))

The third line drops unused labels.
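Why dropping the unused label matters can be seen in a small base R sketch. The toy vector force is invented for illustration, and the base R steps are stand-ins for na_if() and fct_drop(), not the handout's own code.

```r
lv <- c("-8. Don't know", "1. Extremely willing", "2. Very willing",
        "3. Moderately willing", "4. A little willing", "5. Not at all willing")

# Toy responses standing in for V241313 (invented for illustration)
force <- factor(lv[c(1, 2, 3, 4, 5, 6, 4)], levels = lv)

# With the "Don't know" level still present, the numeric codes run 1 to 6
as.numeric(force)

# Base R counterpart of na_if(): set the DK responses to NA
int_force <- force
int_force[int_force == "-8. Don't know"] <- NA

# Base R counterpart of fct_drop(): remove levels that no longer occur
int_force <- droplevels(int_force)

# Now the codes run 1 to 5 and line up with the response labels
as.numeric(int_force)
levels(int_force)
```

Without the drop step, "1. Extremely willing" would be coded 2 rather than 1 when as.numeric() is applied.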

Any time we create a new variable, we have to re-specify the survey design so that it includes the most recent variables:

anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)

svytable(~int_force, anes2024, round=TRUE)

int_force
 1. Extremely willing       2. Very willing 3. Moderately willing 
                  192                   511                  2329 
  4. A little willing 5. Not at all willing 
                 1315                   355
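Re-running svydesign() as above always works. As an alternative sketch, the survey package also provides an update() method that adds a variable directly to an existing design object. The example uses the package's apistrat data and a hypothetical variable high_api, not the ANES.

```r
library(survey)
data(api)

dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Add a new variable inside the design object instead of rebuilding it
dstrat <- update(dstrat,
                 high_api = factor(ifelse(api00 > 700, "High", "Not high")))

svytable(~high_api, dstrat, round = TRUE)
```

Either route is fine; the point is that the design object, not just the data frame, must contain the new variable.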

Party identification: V241221

svytable(~V241221, anes2024, round=TRUE)

V241221
             -8. Don't know 0. No preference {VOL, FtF} 
                         27                          30 
                1. Democrat               2. Republican 
                       1659                        1450 
             3. Independent    5. Other party {SPECIFY} 
                       1421                         150

Recode the ‘Don’t know’, ‘No preference’, and ‘Other party’ responses as Independent:

anes24 <- anes24 %>%
  mutate(
    party_id = case_when(
    V241221 %in% c("-8. Don't know",
     "0. No preference {VOL, FtF}",
     "5. Other party {SPECIFY}") ~ "3. Independent",
      TRUE ~ V241221),
    party_id = factor(party_id,
        levels = c("1. Democrat", "2. Republican", "3. Independent")))
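The same collapse can be sketched in base R on a toy vector (invented for illustration). Converting to character first is an assumption made here as a safeguard, since some older dplyr versions may complain when case_when() mixes text labels with a factor.

```r
# Toy responses standing in for V241221 (invented for illustration)
pid <- factor(c("1. Democrat", "-8. Don't know", "2. Republican",
                "5. Other party {SPECIFY}", "3. Independent",
                "0. No preference {VOL, FtF}"))

to_independent <- c("-8. Don't know",
                    "0. No preference {VOL, FtF}",
                    "5. Other party {SPECIFY}")

# Work on the labels as text, then rebuild a three-level factor
pid_chr <- as.character(pid)
pid_chr[pid_chr %in% to_independent] <- "3. Independent"
party_id <- factor(pid_chr,
                   levels = c("1. Democrat", "2. Republican", "3. Independent"))

table(party_id)
```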

Respondent sex:

svytable(~V241550, anes2024, round=TRUE)

V241550
-8. Don't know        1. Male      2. Female 
            27           2394           2543

# dropping don't know
anes24 <- anes24 %>%
  mutate(sex = na_if(V241550, "-8. Don't know"))

# resetting survey design
anes2024 <- svydesign(
  ids = ~V240107c,     # PSU
  strata = ~V240107d,  # strata
  weights = ~V240107b, # weight
  data = anes24,    # data
  nest = TRUE
)

A t-test on the difference in mean willingness to use military force by sex:

svyttest(as.numeric(int_force) ~ sex, anes2024, na.rm=TRUE)


    Design-based t-test

data:  as.numeric(int_force) ~ sex
t = 1.3032, df = 1782, p-value = 0.1927
alternative hypothesis: true difference in mean is not equal to 0
95 percent confidence interval:
 -0.02339177  0.11603570
sample estimates:
difference in mean 
        0.04632196

The difference in means is not statistically significant. The 95% confidence interval on the difference, -0.023 to 0.116, includes 0, and the p-value is 0.19, so we do not reject the null hypothesis of no difference.
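The reported p-value and interval can be reconstructed from the t statistic and degrees of freedom. This is a base R sketch using the values printed in the output; the variable names are illustrative, and the standard error is backed out from t = difference / SE rather than taken from the package.

```r
# Values copied from the svyttest() output
t_stat <- 1.3032
dof    <- 1782
diff   <- 0.04632196

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

# Standard error implied by t = difference / SE, then the 95% interval
se   <- diff / t_stat
crit <- qt(0.975, dof)
ci   <- c(lower = diff - crit * se, upper = diff + crit * se)

round(p_value, 4)
round(ci, 4)
```

Because the printed t statistic is rounded, the results agree with the output only to about three or four decimals.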

Party Identification

svytable(~party_id, anes2024, round=TRUE)

party_id
   1. Democrat  2. Republican 3. Independent 
          1659           1450           1627

Proportions of party identifiers:

prop.table(svytable(~party_id, anes2024, round=TRUE))

party_id
   1. Democrat  2. Republican 3. Independent 
     0.3502956      0.3061655      0.3435389

Confidence intervals on the proportions of party ID:

confint(svymean(~party_id, anes2024, na.rm=TRUE))

                           2.5 %    97.5 %
party_id1. Democrat    0.3301705 0.3703910
party_id2. Republican  0.2879911 0.3242358
party_id3. Independent 0.3236011 0.3636104

Mean willingness to use military force by party ID

A t-test is for testing differences across two groups. With multiple party identifications, we could construct confidence intervals on each mean level of support:

svyby(~as.numeric(int_force),~party_id, anes2024, svymean, na.rm=TRUE)

                     party_id as.numeric(int_force)         se
1. Democrat       1. Democrat              3.257307 0.03327848
2. Republican   2. Republican              3.054149 0.03437336
3. Independent 3. Independent              3.389749 0.03069103

The se column is the standard error. For an approximate 95% confidence interval, we multiply the standard error by 1.96 and add or subtract that amount from the sample mean. For Democrats:

1.96*0.03327848

[1] 0.06522582

So the upper and lower bounds of the 95% confidence interval for the Democrats' mean are:

3.257307 + 1.96*0.03327848

[1] 3.322533

3.257307 - 1.96*0.03327848

[1] 3.192081
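The same arithmetic can be done for all three party groups at once. A base R sketch, using the means and standard errors printed in the svyby() output; the object names est, se, and ci are illustrative.

```r
# Means and standard errors copied from the svyby() output
est <- c("1. Democrat"    = 3.257307,
         "2. Republican"  = 3.054149,
         "3. Independent" = 3.389749)
se  <- c(0.03327848, 0.03437336, 0.03069103)

# Approximate 95% interval: mean plus or minus 1.96 standard errors
ci <- cbind(mean = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
round(ci, 3)
```

With these numbers the Republican interval lies entirely below the Democratic one, while the Independent interval sits just above it. Higher values mean less willingness to use force.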

Cross-tabulation: willingness to use military force by party identification:

svytable(~int_force + party_id, anes2024)

                       party_id
int_force               1. Democrat 2. Republican 3. Independent
  1. Extremely willing     65.68608      81.10112       45.11995
  2. Very willing         171.82734     234.69888      104.21343
  3. Moderately willing   797.08669     737.97103      794.35293
  4. A little willing     509.49945     300.25271      505.12169
  5. Not at all willing   109.60530      87.34867      157.74109

Column proportions (margin=2 divides by column totals; use margin=1 for row proportions):

prop.table(svytable(~int_force + party_id, anes2024), margin=2)

                       party_id
int_force               1. Democrat 2. Republican 3. Independent
  1. Extremely willing   0.03972056    0.05626660     0.02808501
  2. Very willing        0.10390448    0.16283015     0.06486788
  3. Moderately willing  0.48200057    0.51199192     0.49444672
  4. A little willing    0.30809576    0.20831029     0.31441410
  5. Not at all willing  0.06627863    0.06060104     0.09818629
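To see what the margin argument does, this base R sketch rebuilds the weighted cross-tabulation as a plain matrix (values copied from the svytable() output) and computes both versions. The object names are illustrative.

```r
# Weighted counts copied from the svytable() cross-tabulation
counts <- matrix(c( 65.68608,  81.10112,  45.11995,
                   171.82734, 234.69888, 104.21343,
                   797.08669, 737.97103, 794.35293,
                   509.49945, 300.25271, 505.12169,
                   109.60530,  87.34867, 157.74109),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(
                   int_force = c("1. Extremely willing", "2. Very willing",
                                 "3. Moderately willing", "4. A little willing",
                                 "5. Not at all willing"),
                   party_id  = c("1. Democrat", "2. Republican", "3. Independent")))

col_prop <- prop.table(counts, margin = 2)  # each column sums to 1
row_prop <- prop.table(counts, margin = 1)  # each row sums to 1

round(100 * col_prop, 1)  # column percentages
colSums(col_prop)
rowSums(row_prop)
```

Column proportions answer "among Democrats, what share is extremely willing?"; row proportions answer "among the extremely willing, what share is Democrat?".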

Useful functions from the survey package:

`confint(svymean())` # for a confidence interval around a survey mean
`svyciprop()` # another way to construct a confidence interval, for a proportion
`svytable()` # for the cross-tabulation
`svychisq()` # for the chi-square test of independence
`svyglm()` # linear regression
`svyttest()` # t-test
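Since svyciprop() is not demonstrated elsewhere in this handout, here is a minimal sketch using the apiclus1 example data from the survey package (not the ANES). The object names dclus1 and p_no_ell are illustrative, and the "logit" method is one of several the function offers.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts from the survey package
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Proportion of schools with no English language learners,
# with a logit-based 95% confidence interval
p_no_ell <- svyciprop(~I(ell == 0), dclus1, method = "logit")
p_no_ell
```

The I() wrapper turns the condition into a TRUE/FALSE variable, so the estimate is the weighted share of TRUE cases.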

svychisq(~int_force + party_id, anes2024, na.rm=TRUE)


    Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~int_force + party_id, anes2024, na.rm = TRUE)
F = 8.5593, ndf = 7.764, ddf = 15528.032, p-value = 2.013e-11

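The Rao-Scott adjusted test is reported as an F statistic, but it is interpreted the same way as the chi-square test we reviewed in class. The p-value is far below 0.05, so we reject the null hypothesis of independence: willingness to use military force is related to party identification.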
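For a version that runs without the ANES file, the sketch below applies svychisq() to the apiclus1 example data from the survey package. The object names dclus1 and chi are illustrative, and the two variables are simply ones that exist in that dataset.

```r
library(survey)
data(api)

dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Is meeting the school-wide growth target related to school type?
chi <- svychisq(~sch.wide + stype, dclus1)
chi

# The result is a standard "htest" object, so the p-value can be extracted
chi$p.value < 0.05
```

The last line returns TRUE when the null hypothesis of independence would be rejected at the 5% level.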

gvanes package

The xtab() function from the gvanes package:

xtab(int_force, party_id, column)

int_force × party_id
Weighted frequencies and column percentages (columns sum to 100)
int_force	1. Democrat		2. Republican		3. Independent
int_force	n	%	n	%	n	%
1. Extremely willing	65	4	80	6	45	3
2. Very willing	171	10	230	16	99	6
3. Moderately willing	801	48	721	51	791	50
4. A little willing	526	31	289	21	499	31
5. Not at all willing	109	7	82	6	157	10

See the handout for other examples.

CES 2024

To load the CES data, use the package built for it, ces2024.

library(ces2024)


Attaching package: 'ces2024'

The following objects are masked from 'package:gvanes':

    tab, xtab

data(ces24)

Then load a copy of the dataset for use outside the package:

load("~/SharedProjects/PLS 350/CES 2024/ces24_survey.rdata")

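Use it as in the handout. As the message above shows, ces2024 masks tab() and xtab() from gvanes, so xtab() now calls the ces2024 version; the ANES version should still be reachable as gvanes::xtab().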

xtab(pid7, educ)

pid7 × educ
Weighted frequencies (counts rounded up)
pid7	No HS	High school graduate	Some college	2-year	4-year	Post-grad
pid7	n	n	n	n	n	n
Strong Democrat	591	3156	2160	1036	2596	1935
Not very strong Democrat	308	1374	1015	533	1021	727
Lean Democrat	128	1059	1068	542	1116	836
Independent	653	2099	1297	683	1181	732
Lean Republican	292	1339	1171	565	1227	629
Not very strong Republican	337	1547	916	497	1000	514
Strong Republican	681	3430	2088	1039	2251	1063
Not sure	139	410	208	67	151	45

Here are the column percentages:

xtab(pid7, educ, column)

pid7 × educ
Weighted frequencies and column percentages (columns sum to 100)
pid7	No HS		High school graduate		Some college		2-year		4-year		Post-grad
pid7	n	%	n	%	n	%	n	%	n	%	n	%
Strong Democrat	591	19	3156	22	2160	22	1036	21	2596	25	1935	30
Not very strong Democrat	308	10	1374	10	1015	10	533	11	1021	10	727	11
Lean Democrat	128	4	1059	7	1068	11	542	11	1116	11	836	13
Independent	653	21	2099	15	1297	13	683	14	1181	11	732	11
Lean Republican	292	9	1339	9	1171	12	565	11	1227	12	629	10
Not very strong Republican	337	11	1547	11	916	9	497	10	1000	9	514	8
Strong Republican	681	22	3430	24	2088	21	1039	21	2251	21	1063	16
Not sure	139	4	410	3	208	2	67	1	151	1	45	1