Overview and first look at the data

The FiveThirtyEight article “Voter Registrations Are Way, Way Down During The Pandemic” https://fivethirtyeight.com/features/voter-registrations-are-way-way-down-during-the-pandemic/, published June 26, 2020, by Kaleigh Rogers and Nathaniel Rakich, compares new voter registrations in the spring of 2020 with the same period in 2016. Both periods fall in the run-up to a presidential election, when registration activity is typically high, yet the 2020 figures show an apparent decline relative to 2016.

The raw data is available on GitHub https://github.com/fivethirtyeight/data/tree/master/voter-registration and was imported into R for further analysis. The data covers new registrations from January through April (and through May for five of the jurisdictions) in both 2016 and 2020 for 11 states and the District of Columbia. The structure of the dataset is shown below:

gitURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/voter-registration/new-voter-registrations.csv"
voterReg <- read.csv(gitURL)
str(voterReg)
## 'data.frame':    106 obs. of  4 variables:
##  $ Jurisdiction         : chr  "Arizona" "Arizona" "Arizona" "Arizona" ...
##  $ Year                 : int  2016 2016 2016 2016 2020 2020 2020 2020 2016 2016 ...
##  $ Month                : chr  "Jan" "Feb" "Mar" "Apr" ...
##  $ New.registered.voters: int  25852 51155 48614 30668 33229 50853 31872 10249 87574 103377 ...
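
A quick way to check which months each jurisdiction reports is to tabulate rows by jurisdiction and year. The sketch below is only an illustration: `sampleReg` is a hypothetical stand-in holding the Arizona and District of Columbia rows copied from the file, so that it runs without a network connection. With the full data, `voterReg` could be used in its place.

```r
# Stand-in rows copied from the source file (Arizona and District of
# Columbia only); sampleReg is a hypothetical name used for illustration.
sampleReg <- data.frame(
  Jurisdiction = rep(c("Arizona", "District of Columbia"), times = c(8, 10)),
  Year = c(rep(c(2016, 2020), each = 4), rep(c(2016, 2020), each = 5)),
  Month = c(rep(c("Jan", "Feb", "Mar", "Apr"), 2),
            rep(c("Jan", "Feb", "Mar", "Apr", "May"), 2)),
  New.registered.voters = c(25852, 51155, 48614, 30668,
                            33229, 50853, 31872, 10249,
                            2840, 2954, 4706, 4157, 5714,
                            3334, 3348, 2225, 1281, 1925),
  stringsAsFactors = FALSE
)

# Rows per jurisdiction and year: 4 means Jan-Apr, 5 means Jan-May
coverage <- table(sampleReg$Jurisdiction, sampleReg$Year)
print(coverage)

# Jurisdictions that report a May figure
mayJurisdictions <- unique(sampleReg$Jurisdiction[sampleReg$Month == "May"])
print(mayJurisdictions)
```

Equal counts in the 2016 and 2020 columns for every jurisdiction also confirm that each month has a matching row in both years, which any year-over-year comparison depends on.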

Munging the data for analysis

To sort the data by month, each month abbreviation is first converted to an integer. Because the Year variable holds both 2016 and 2020 data, the new-registration counts are then split into two variables, one for 2016 and one for 2020. This makes side-by-side comparison in the data table easier and allows a simple calculated variable showing the change (delta) from 2016 to 2020 in each jurisdiction for each month reported. The split pairs the two years by row position, so it relies on the 2016 and 2020 rows appearing in the same jurisdiction and month order in the source file. The results of these transformations are stored in a new data frame.

library(dplyr)

# Map each month abbreviation to its month number so months sort correctly
monthName <- voterReg$Month
numMonth <- sapply(monthName, switch, "Jan" = 1, "Feb" = 2, "Mar" = 3, "Apr" = 4, "May" = 5)
voterReg$numMonth <- numMonth

newReg2016 <- voterReg$New.registered.voters[which(voterReg$Year == 2016)]
newReg2016
##  [1]  25852  51155  48614  30668  87574 103377 174278 185478  17024  20707
## [11]  25627  22204   3007   3629   5124   3818   2840   2954   4706   4157
## [21]   5714  50231  87351  73627  52508  34952  40976  44150  37028  44040
## [31]  99674  52782  76098  19580  29122  40497  26655   5828  35213  84357
## [41]  58272  73341  29374 132860 143795 170607 143199  91205  20032  36911
## [51]  44171  20460  26239
newReg2020 <- voterReg$New.registered.voters[which(voterReg$Year == 2020)]
newReg2020
##  [1]  33229  50853  31872  10249 151595 238281 176810  38970  20260  33374
## [11]  18990   6034   3276   3353   2535    589   3334   3348   2225   1281
## [21]   1925  77466 109859  54872  21031  38573  55386  26284  15484  44443
## [31]  68455  47899  21332  21532  20708  23864  10061  23488 111990  54053
## [41]  54807  35484  23517 134559 130080 129424  34694  35678  25934  29507
## [51]  31492   5467   8239
newRegDelta <- newReg2020 - newReg2016
newRegDelta
##  [1]    7377    -302  -16742  -20419   64021  134904    2532 -146508    3236
## [10]   12667   -6637  -16170     269    -276   -2589   -3229     494     394
## [19]   -2481   -2876   -3789   27235   22508  -18755  -31477    3621   14410
## [28]  -17866  -21544     403  -31219   -4883  -54766    1952   -8414  -16633
## [37]  -16594   17660   76777  -30304   -3465  -37857   -5857    1699  -13715
## [46]  -41183 -108505  -55527    5902   -7404  -12679  -14993  -18000
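# Combine one row per jurisdiction/month with the 2016, 2020, and delta columns
# (assumes both years list jurisdictions and months in the same order)
subsetVoterReg <- data.frame(distinct(voterReg, Jurisdiction, Month, numMonth),
                             newReg2016, newReg2020, newRegDelta)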
subsetVoterReg
##            Jurisdiction Month numMonth newReg2016 newReg2020 newRegDelta
## 1               Arizona   Jan        1      25852      33229        7377
## 2               Arizona   Feb        2      51155      50853        -302
## 3               Arizona   Mar        3      48614      31872      -16742
## 4               Arizona   Apr        4      30668      10249      -20419
## 5            California   Jan        1      87574     151595       64021
## 6            California   Feb        2     103377     238281      134904
## 7            California   Mar        3     174278     176810        2532
## 8            California   Apr        4     185478      38970     -146508
## 9              Colorado   Jan        1      17024      20260        3236
## 10             Colorado   Feb        2      20707      33374       12667
## 11             Colorado   Mar        3      25627      18990       -6637
## 12             Colorado   Apr        4      22204       6034      -16170
## 13             Delaware   Jan        1       3007       3276         269
## 14             Delaware   Feb        2       3629       3353        -276
## 15             Delaware   Mar        3       5124       2535       -2589
## 16             Delaware   Apr        4       3818        589       -3229
## 17 District of Columbia   Jan        1       2840       3334         494
## 18 District of Columbia   Feb        2       2954       3348         394
## 19 District of Columbia   Mar        3       4706       2225       -2481
## 20 District of Columbia   Apr        4       4157       1281       -2876
## 21 District of Columbia   May        5       5714       1925       -3789
## 22              Florida   Jan        1      50231      77466       27235
## 23              Florida   Feb        2      87351     109859       22508
## 24              Florida   Mar        3      73627      54872      -18755
## 25              Florida   Apr        4      52508      21031      -31477
## 26              Georgia   Jan        1      34952      38573        3621
## 27              Georgia   Feb        2      40976      55386       14410
## 28              Georgia   Mar        3      44150      26284      -17866
## 29              Georgia   Apr        4      37028      15484      -21544
## 30             Illinois   Jan        1      44040      44443         403
## 31             Illinois   Feb        2      99674      68455      -31219
## 32             Illinois   Mar        3      52782      47899       -4883
## 33             Illinois   Apr        4      76098      21332      -54766
## 34             Maryland   Jan        1      19580      21532        1952
## 35             Maryland   Feb        2      29122      20708       -8414
## 36             Maryland   Mar        3      40497      23864      -16633
## 37             Maryland   Apr        4      26655      10061      -16594
## 38             Maryland   May        5       5828      23488       17660
## 39       North Carolina   Jan        1      35213     111990       76777
## 40       North Carolina   Feb        2      84357      54053      -30304
## 41       North Carolina   Mar        3      58272      54807       -3465
## 42       North Carolina   Apr        4      73341      35484      -37857
## 43       North Carolina   May        5      29374      23517       -5857
## 44                Texas   Jan        1     132860     134559        1699
## 45                Texas   Feb        2     143795     130080      -13715
## 46                Texas   Mar        3     170607     129424      -41183
## 47                Texas   Apr        4     143199      34694     -108505
## 48                Texas   May        5      91205      35678      -55527
## 49             Virginia   Jan        1      20032      25934        5902
## 50             Virginia   Feb        2      36911      29507       -7404
## 51             Virginia   Mar        3      44171      31492      -12679
## 52             Virginia   Apr        4      20460       5467      -14993
## 53             Virginia   May        5      26239       8239      -18000
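
The table above is built by subtracting the 2016 vector from the 2020 vector position by position. As a hedged alternative, the sketch below pairs the two years by key (jurisdiction and month) using base R `merge`, so the result does not depend on row order. It is not the method used in this report: `sampleReg`, `reg2016`, `reg2020`, and `wideReg` are hypothetical names, and the rows are the Arizona values from the table, deliberately shuffled.

```r
# Arizona rows copied from the table above, shuffled on purpose so that
# positional subtraction would pair the wrong months.
sampleReg <- data.frame(
  Jurisdiction = "Arizona",
  Year = c(2020, 2016, 2016, 2020, 2016, 2020, 2020, 2016),
  Month = c("Apr", "Jan", "Mar", "Feb", "Feb", "Jan", "Mar", "Apr"),
  New.registered.voters = c(10249, 25852, 48614, 50853,
                            51155, 33229, 31872, 30668),
  stringsAsFactors = FALSE
)

# Month abbreviation -> month number via R's built-in month.abb constant
sampleReg$numMonth <- match(sampleReg$Month, month.abb)

# Split by year, keeping the key columns plus the count
keys <- c("Jurisdiction", "Month", "numMonth")
reg2016 <- sampleReg[sampleReg$Year == 2016, c(keys, "New.registered.voters")]
reg2020 <- sampleReg[sampleReg$Year == 2020, c(keys, "New.registered.voters")]
names(reg2016)[4] <- "newReg2016"
names(reg2020)[4] <- "newReg2020"

# Join the two years on the keys, then compute the delta row by row
wideReg <- merge(reg2016, reg2020, by = keys)
wideReg$newRegDelta <- wideReg$newReg2020 - wideReg$newReg2016
wideReg <- wideReg[order(wideReg$Jurisdiction, wideReg$numMonth), ]
print(wideReg)
```

Because `merge` keeps only keys present in both inputs by default, a month missing from one year would be dropped rather than silently paired with the wrong row.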

Next, explore the data visually with one facet per jurisdiction. The 2016 values are drawn in red and the 2020 values in blue, plotted against the numeric month so that the months stay in the proper order.

library(ggplot2)

Cfgraph <- ggplot(subsetVoterReg) +
  geom_line(aes(x = numMonth, y = newReg2016), color = "red") +
  geom_line(aes(x = numMonth, y = newReg2020), color = "blue") +
  xlab('Month') +
  ylab('New.Registrations') +
  facet_wrap(~Jurisdiction) +
  labs(title = "Comparison of 2016 to 2020 New Voter Registrations")
print(Cfgraph)
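
The plot above sets each line's color by hand, so it has no legend. One possible variation, sketched here under the assumption that the data is kept in its original long format (one row per jurisdiction, year, and month), maps the year to the color aesthetic so that ggplot2 draws a legend automatically. `sampleReg` and `legendGraph` are hypothetical names, and only the Arizona rows are included to keep the example self-contained.

```r
library(ggplot2)

# Arizona rows copied from the data shown earlier, in long format
sampleReg <- data.frame(
  Jurisdiction = "Arizona",
  Year = rep(c(2016, 2020), each = 4),
  numMonth = rep(1:4, times = 2),
  New.registered.voters = c(25852, 51155, 48614, 30668,
                            33229, 50853, 31872, 10249)
)

# Mapping factor(Year) to color produces one line per year plus a legend
legendGraph <- ggplot(sampleReg,
                      aes(x = numMonth, y = New.registered.voters,
                          color = factor(Year))) +
  geom_line() +
  scale_color_manual(values = c("2016" = "red", "2020" = "blue"),
                     name = "Year") +
  scale_x_continuous(breaks = 1:5, labels = month.abb[1:5]) +
  facet_wrap(~Jurisdiction) +
  labs(x = "Month", y = "New registrations",
       title = "Comparison of 2016 to 2020 New Voter Registrations")
print(legendGraph)
```

The `scale_x_continuous` call also replaces the numeric month on the axis with its abbreviation while keeping the numeric ordering.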

Also graph the net change in new voter registrations for each jurisdiction over each month.

Netgraph <- ggplot(subsetVoterReg, aes(x = numMonth, y = newRegDelta)) +
  geom_line() +
  xlab('Month') +
  ylab('Net.Change.in.New.Registrations') +
  facet_wrap(~Jurisdiction) +
  labs(title = "Net Change of 2016 to 2020 New Voter Registrations")
print(Netgraph)

Conclusions

Based on the comparison of the 2016 and 2020 data, it appears from the graphs that California, Florida, Illinois, and Texas saw the largest drops in new voter registrations in the spring before the presidential election. Because the data reports discrete new registrations by month, the jurisdictions with the largest net change can be verified by summing the monthly deltas for each jurisdiction over the period:

# Sum the monthly deltas within each jurisdiction (base R aggregate)
aggregate(newRegDelta ~ Jurisdiction, subsetVoterReg, sum)
##            Jurisdiction newRegDelta
## 1               Arizona      -30086
## 2            California       54949
## 3              Colorado       -6904
## 4              Delaware       -5825
## 5  District of Columbia       -8258
## 6               Florida        -489
## 7               Georgia      -21379
## 8              Illinois      -90465
## 9              Maryland      -22029
## 10       North Carolina        -706
## 11                Texas     -217231
## 12             Virginia      -47174

The aggregated totals show that, across the four or five months of data, the largest net declines in new registrations occurred in Texas, Illinois, Virginia, and Arizona. California’s large net gain reflects that more people registered in January and February of 2020 than in the same months of 2016, which can be seen in the first faceted visualization; its April figure still fell sharply. Overall, new registrations show a net decline in 11 of the 12 jurisdictions studied, which is consistent with the pandemic having depressed registration. It would be interesting to see whether the size of the decline in each state correlates with the extent of its social-distancing measures to control the virus.
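
The ranking above is by absolute counts, which naturally favors populous states. As a complementary check, the sketch below computes the net change as a percentage of the 2016 total for each jurisdiction. It is an illustration rather than part of the original analysis: `sampleWide` and `totals` are hypothetical names, and only the Delaware and Texas rows from the table are included.

```r
# Delaware and Texas rows copied from the side-by-side table
sampleWide <- data.frame(
  Jurisdiction = rep(c("Delaware", "Texas"), times = c(4, 5)),
  newReg2016 = c(3007, 3629, 5124, 3818,
                 132860, 143795, 170607, 143199, 91205),
  newReg2020 = c(3276, 3353, 2535, 589,
                 134559, 130080, 129424, 34694, 35678),
  stringsAsFactors = FALSE
)

# Total registrations per jurisdiction for each year
totals <- aggregate(cbind(newReg2016, newReg2020) ~ Jurisdiction,
                    data = sampleWide, FUN = sum)

# Net change and net change relative to the 2016 baseline, in percent
totals$newRegDelta <- totals$newReg2020 - totals$newReg2016
totals$pctChange <- round(100 * totals$newRegDelta / totals$newReg2016, 1)
print(totals)
```

On these rows, Delaware's net loss of 5,825 registrations is a larger share of its 2016 total than Texas's net loss of 217,231, so the absolute and relative rankings need not agree.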