Bias in Self-reported Turnout

Surveys are frequently used to measure political behavior such as voter turnout, but some researchers are concerned about the accuracy of self-reports. In particular, they worry about social desirability bias: in post-election surveys, respondents who did not vote may falsely report having voted because they feel that they should have. Is such a bias present in the American National Election Studies (ANES)? The ANES is a nationwide survey that has been conducted for every election since 1948, using face-to-face interviews with a nationally representative sample of adults. The table below displays the names and descriptions of the variables in the turnout.csv data file.

Name	Description
`year`	Election year
`VEP`	Voting Eligible Population (in thousands)
`VAP`	Voting Age Population (in thousands)
`total`	Total ballots cast for highest office (in thousands)
`ANES`	Turnout estimated from the American National Election Survey (in percentages)
`felons`	Total ineligible felons (in thousands)
`noncit`	Total non-citizens (in thousands)
`overseas`	Total eligible overseas voters (in thousands)
`osvoters`	Total ballots counted by overseas voters (in thousands)

turnout <- read.csv("C:/Users/Mr Laptop/Desktop/QM/turnout.csv")
turnout

##    year    VEP    VAP  total ANES felons noncit overseas osvoters
## 1  1980 159635 164445  86515   71    802   5756     1803       NA
## 2  1982 160467 166028  67616   60    960   6641     1982       NA
## 3  1984 167702 173995  92653   74   1165   7482     2361       NA
## 4  1986 170396 177922  64991   53   1367   8362     2216       NA
## 5  1988 173579 181955  91595   70   1594   9280     2257       NA
## 6  1990 176629 186159  67859   47   1901  10239     2659       NA
## 7  1992 179656 190778 104405   75   2183  11447     2418       NA
## 8  1994 182623 195258  75106   56   2441  12497     2229       NA
## 9  1996 186347 200016  96263   73   2586  13601     2499       NA
## 10 1998 190420 205313  72537   52   2920  14988     2937       NA
## 11 2000 194331 210623 105375   73   3083  16218     2937       NA
## 12 2002 198382 215462  78382   62   3168  17237     3308       NA
## 13 2004 203483 220336 122295   77   3158  18068     3862       NA
## 14 2008 213314 230872 131304   78   3145  19392     4972      263

library(qss)
data("turnout")
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(qsslearnr)

Question 1

Load the data into R and check the dimensions of the data. Also, obtain a summary of the data. How many observations are there? What is the range of years covered in this data set?

dim(turnout)

## [1] 14  9

summary(turnout)

##       year           VEP              VAP             total       
##  Min.   :1980   Min.   :159635   Min.   :164445   Min.   : 64991  
##  1st Qu.:1986   1st Qu.:171192   1st Qu.:178930   1st Qu.: 73179  
##  Median :1993   Median :181140   Median :193018   Median : 89055  
##  Mean   :1993   Mean   :182640   Mean   :194226   Mean   : 89778  
##  3rd Qu.:2000   3rd Qu.:193353   3rd Qu.:209296   3rd Qu.:102370  
##  Max.   :2008   Max.   :213314   Max.   :230872   Max.   :131304  
##                                                                   
##       ANES           felons         noncit         overseas       osvoters  
##  Min.   :47.00   Min.   : 802   Min.   : 5756   Min.   :1803   Min.   :263  
##  1st Qu.:57.00   1st Qu.:1424   1st Qu.: 8592   1st Qu.:2236   1st Qu.:263  
##  Median :70.50   Median :2312   Median :11972   Median :2458   Median :263  
##  Mean   :65.79   Mean   :2177   Mean   :12229   Mean   :2746   Mean   :263  
##  3rd Qu.:73.75   3rd Qu.:3042   3rd Qu.:15910   3rd Qu.:2937   3rd Qu.:263  
##  Max.   :78.00   Max.   :3168   Max.   :19392   Max.   :4972   Max.   :263  
##                                                                NA's   :13

Answer: There are 14 observations, one per election year. The data cover 1980 to 2008 in two-year steps, except that 2006 is not included.
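
As a minimal sketch, the same answer can be read off programmatically. The `year` vector is retyped from the printed data so that the snippet runs without `turnout.csv`; with the file loaded, `turnout$year` would take its place. The names `n_obs`, `year_range`, and `missing_years` are illustrative only.

```r
# Election years as printed in the data above (retyped for a self-contained example)
year <- c(1980, 1982, 1984, 1986, 1988, 1990, 1992, 1994,
          1996, 1998, 2000, 2002, 2004, 2008)

n_obs <- length(year)        # number of observations (rows)
year_range <- range(year)    # first and last election year

# Biennial election years within that span that are absent from the data
missing_years <- setdiff(seq(year_range[1], year_range[2], by = 2), year)

n_obs
year_range
missing_years
```

The `setdiff()` call makes the gap in the series explicit, which is relevant later when the elections are split by type.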

Question 2

Calculate the turnout rate based on the voting age population or VAP. Note that for this data set, we must add the total number of eligible overseas voters since the VAP variable does not include these individuals in the count. Next, calculate the turnout rate using the voting eligible population or VEP. What difference do you observe?

TurnoutRate1 <- turnout$total/(turnout$VAP+turnout$overseas)*100
TurnoutRate1

##  [1] 52.03972 40.24522 52.53748 36.07845 49.72260 35.93884 54.04097 38.03086
##  [9] 47.53376 34.83169 49.34211 35.82850 54.54777 55.67409

TurnoutRate2 <- turnout$total/turnout$VEP*100
TurnoutRate2

##  [1] 54.19551 42.13701 55.24860 38.14115 52.76848 38.41895 58.11384 41.12625
##  [9] 51.65793 38.09316 54.22449 39.51064 60.10084 61.55433

data.frame (TurnoutRate1,TurnoutRate2)

##    TurnoutRate1 TurnoutRate2
## 1      52.03972     54.19551
## 2      40.24522     42.13701
## 3      52.53748     55.24860
## 4      36.07845     38.14115
## 5      49.72260     52.76848
## 6      35.93884     38.41895
## 7      54.04097     58.11384
## 8      38.03086     41.12625
## 9      47.53376     51.65793
## 10     34.83169     38.09316
## 11     49.34211     54.22449
## 12     35.82850     39.51064
## 13     54.54777     60.10084
## 14     55.67409     61.55433

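Answer: The turnout rates calculated using VAP are lower than those calculated using VEP in every year, because the VAP denominator includes people who cannot vote, such as non-citizens and ineligible felons. The gap grows from about 2 percentage points in 1980 to nearly 6 points in 2008.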
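
To make the two denominators concrete, here is a small sketch for a single year. The 2008 values are retyped from the printed data, and the names `vap_rate` and `vep_rate` are illustrative rather than part of the original analysis.

```r
# 2008 values retyped from the printed data (all counts in thousands)
total    <- 131304   # ballots cast for highest office
VAP      <- 230872   # voting age population
VEP      <- 213314   # voting eligible population
overseas <- 4972     # eligible overseas voters, not counted in VAP

# Same numerator, different denominators
vap_rate <- total / (VAP + overseas) * 100
vep_rate <- total / VEP * 100

round(c(VAP = vap_rate, VEP = vep_rate, gap = vep_rate - vap_rate), 2)
```

Because the numerator is identical, the gap between the two rates comes entirely from the larger VAP-based denominator.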

Question 3

Compute the difference between VAP and ANES estimates of turnout rate. How big is the difference on average? What is the range of the difference? Conduct the same comparison for the VEP and ANES estimates of voter turnout. Briefly comment on the results.

differenceVAP <- (turnout$ANES-TurnoutRate1)
differenceVAP

##  [1] 18.96028 19.75478 21.46252 16.92155 20.27740 11.06116 20.95903 17.96914
##  [9] 25.46624 17.16831 23.65789 26.17150 22.45223 22.32591

mean(differenceVAP)

## [1] 20.32914

range(differenceVAP)

## [1] 11.06116 26.17150

differenceVEP <- (turnout$ANES-TurnoutRate2)
differenceVEP

##  [1] 16.804491 17.862987 18.751404 14.858846 17.231520  8.581054 16.886160
##  [8] 14.873745 21.342072 13.906838 18.775507 22.489359 16.899156 16.445672

mean(differenceVEP)

## [1] 16.83634

range(differenceVEP)

## [1]  8.581054 22.489359

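Answer: The ANES estimate exceeds the official turnout rate in every year. On average, it is 20.3 percentage points above the VAP turnout rate (range 11.1 to 26.2) and 16.8 points above the VEP turnout rate (range 8.6 to 22.5). The gap is therefore larger and slightly more variable for VAP than for VEP, but in both cases the ANES substantially overestimates turnout, which is consistent with overreporting by survey respondents.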

Question 4

Compare the VEP turnout rate with the ANES turnout rate separately for presidential elections and midterm elections. Note that the data set excludes the year 2006. Does the bias of the ANES vary across election types?

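turnout <- turnout %>%
  mutate(type = if_else(year %in% c(1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008), 
                        'pres', 
                        'midterm'),
         # store the differences as columns so that summarise() averages within each group
         differenceVAP = ANES - total / (VAP + overseas) * 100,
         differenceVEP = ANES - total / VEP * 100)


turnout %>% group_by(type) %>% summarise(differenceVAP=mean(differenceVAP),differenceVEP=mean(differenceVEP))

## # A tibble: 2 × 3
##   type    differenceVAP differenceVEP
##   <chr>           <dbl>         <dbl>
## 1 midterm          18.2          15.4
## 2 pres             21.9          17.9

Answer: The bias of the ANES is greater in presidential elections. Relative to the VEP turnout rate, the ANES overestimates turnout by about 17.9 percentage points on average in presidential years, compared with about 15.4 points in midterm years.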
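
The split by election type can also be sketched in base R with `tapply()`. The vectors below are retyped from the printed data so the example is self-contained. The modulo rule for presidential years is an assumption that holds for this period, and `bias_vep` and `bias_by_type` are illustrative names.

```r
# Columns retyped from the printed data (counts in thousands, ANES in percent)
year  <- c(1980, 1982, 1984, 1986, 1988, 1990, 1992, 1994,
           1996, 1998, 2000, 2002, 2004, 2008)
VEP   <- c(159635, 160467, 167702, 170396, 173579, 176629, 179656,
           182623, 186347, 190420, 194331, 198382, 203483, 213314)
total <- c(86515, 67616, 92653, 64991, 91595, 67859, 104405,
           75106, 96263, 72537, 105375, 78382, 122295, 131304)
ANES  <- c(71, 60, 74, 53, 70, 47, 75, 56, 73, 52, 73, 62, 77, 78)

# Presidential elections in this period fall on years divisible by 4
type <- ifelse(year %% 4 == 0, "pres", "midterm")

# ANES overestimate relative to the VEP turnout rate, in percentage points
bias_vep <- ANES - total / VEP * 100

# Average overestimate within each election type
bias_by_type <- tapply(bias_vep, type, mean)
round(bias_by_type, 1)
```

The key point is that the differences are grouped element by element with `type`, so each mean is taken over only the elections of that type.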

Question 5

Divide the data into half by election years such that you subset the data into two periods. Calculate the difference between the VEP turnout rate and the ANES turnout rate separately for each year within each period. Has the bias of the ANES increased over time?

FirstPeriod <- turnout %>% slice(1:7)
SecondPeriod <- turnout %>% slice(8:14)
(FirstPeriod$total/FirstPeriod$VEP*100) - FirstPeriod$ANES

## [1] -16.804491 -17.862987 -18.751404 -14.858846 -17.231520  -8.581054 -16.886160

(SecondPeriod$total/SecondPeriod$VEP*100) - SecondPeriod$ANES

## [1] -14.87375 -21.34207 -13.90684 -18.77551 -22.48936 -16.89916 -16.44567

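Answer: The differences above are VEP minus ANES, so negative values mean that the ANES overestimates turnout. The average overestimate rises from about 15.9 percentage points in the first period (1980 to 1992) to about 17.8 points in the second (1994 to 2008), so the bias of the ANES appears to have increased modestly over time. The spread did not widen, however: the differences range from 8.6 to 18.8 points in the first period and from 13.9 to 22.5 points in the second.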
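
A short sketch of how the two periods can be compared numerically rather than by eye. The difference vectors are retyped from the two outputs above, and `summarise_bias()` is a hypothetical helper introduced only for this illustration.

```r
# VEP turnout minus ANES turnout, retyped from the outputs above (percentage points)
first_period  <- c(-16.804491, -17.862987, -18.751404, -14.858846,
                   -17.231520, -8.581054, -16.886160)
second_period <- c(-14.87375, -21.34207, -13.90684, -18.77551,
                   -22.48936, -16.89916, -16.44567)

# Negative values mean the ANES figure exceeds the VEP rate, so flip the sign
# to express the bias as an overestimate (hypothetical helper)
summarise_bias <- function(x) c(mean = mean(-x), min = min(-x), max = max(-x))

period_summary <- rbind(`1980-1992` = summarise_bias(first_period),
                        `1994-2008` = summarise_bias(second_period))
round(period_summary, 2)
```

Comparing the means addresses whether the bias grew, while the minimum and maximum show whether the spread changed.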

Question 6

The ANES does not interview overseas voters and prisoners. Calculate an adjustment to the 2008 VAP turnout rate. Begin by subtracting the total number of ineligible felons and non-citizens from the VAP to calculate an adjusted VAP. Next, calculate an adjusted VAP turnout rate, taking care to subtract the number of overseas ballots counted from the total ballots in 2008. Compare the adjusted VAP turnout with the unadjusted VAP, VEP, and the ANES turnout rate. Briefly discuss the results.

AdjustedVAP <- turnout$VAP-turnout$felons-turnout$noncit

AdjustedTurnout <- turnout$total/AdjustedVAP*100
turnout$osvoters[14] <- 263  # no-op: the 2008 value is already 263 in the data
AdjustedTurnout

##  [1] 54.79552 42.67959 56.03515 38.64073 53.53897 38.99517 58.93660 41.65151
##  [9] 52.36551 38.70601 55.07730 40.18415 61.42082 63.02542

UnadjustedTurnout <- (turnout$total/turnout$VAP)*100
UnadjustedTurnout

##  [1] 52.61030 40.72566 53.25038 36.52780 50.33937 36.45217 54.72591 38.46501
##  [9] 48.12765 35.32996 50.03015 36.37857 55.50387 56.87307

VEPTurnout <- (turnout$total/turnout$VEP)*100
VEPTurnout

##  [1] 54.19551 42.13701 55.24860 38.14115 52.76848 38.41895 58.11384 41.12625
##  [9] 51.65793 38.09316 54.22449 39.51064 60.10084 61.55433

# ANES is already a turnout rate in percent, so no conversion is needed
ANESTurnout <- turnout$ANES
ANESTurnout

##  [1] 71 60 74 53 70 47 75 56 73 52 73 62 77 78

data.frame (AdjustedTurnout, UnadjustedTurnout, VEPTurnout, turnout$ANES)

##    AdjustedTurnout UnadjustedTurnout VEPTurnout turnout.ANES
## 1         54.79552          52.61030   54.19551           71
## 2         42.67959          40.72566   42.13701           60
## 3         56.03515          53.25038   55.24860           74
## 4         38.64073          36.52780   38.14115           53
## 5         53.53897          50.33937   52.76848           70
## 6         38.99517          36.45217   38.41895           47
## 7         58.93660          54.72591   58.11384           75
## 8         41.65151          38.46501   41.12625           56
## 9         52.36551          48.12765   51.65793           73
## 10        38.70601          35.32996   38.09316           52
## 11        55.07730          50.03015   54.22449           73
## 12        40.18415          36.37857   39.51064           62
## 13        61.42082          55.50387   60.10084           77
## 14        63.02542          56.87307   61.55433           78

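Answer: The adjusted VAP turnout rate is close to the VEP turnout rate and is nearer to the ANES estimate than the unadjusted VAP rate is. For 2008, the rates above are 63.0% (adjusted VAP), 56.9% (unadjusted VAP), 61.6% (VEP), and 78% (ANES). The adjusted rate shown does not yet subtract the overseas ballots from the total; doing so for 2008 gives (131304 - 263) / (230872 - 3145 - 19392) * 100 ≈ 62.9%. Even after the adjustment, the ANES estimate remains about 15 percentage points higher, so differences in the population covered explain only a small part of the gap.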
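
The 2008 adjustment described in the question can be sketched as follows, with the overseas ballots removed from the numerator. The values are retyped from the printed 2008 row, and the names `adj_vap`, `adj_total`, `adj_rate`, and `rates` are illustrative.

```r
# 2008 values retyped from the printed data (counts in thousands)
total    <- 131304   # total ballots cast for highest office
VAP      <- 230872   # voting age population
VEP      <- 213314   # voting eligible population
felons   <- 3145     # ineligible felons
noncit   <- 19392    # non-citizens
osvoters <- 263      # overseas ballots counted
ANES     <- 78       # ANES turnout estimate, in percent

adj_vap   <- VAP - felons - noncit   # drop ineligible groups from the denominator
adj_total <- total - osvoters        # drop overseas ballots from the numerator
adj_rate  <- adj_total / adj_vap * 100

# Side-by-side comparison of the four 2008 turnout measures
rates <- c(adjusted_VAP   = adj_rate,
           unadjusted_VAP = total / VAP * 100,
           VEP            = total / VEP * 100,
           ANES           = ANES)
round(rates, 2)
```

Restricting the calculation to 2008 avoids the missing `osvoters` values in earlier years, which would otherwise propagate `NA` through the subtraction.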