Primary Election Participation - Hypothesis 1
1. Creating Data Set
Summary: I created my final data set by inner-joining the May 2012 and
May 2014 Voter Snapshot files on the voter registration number. This
gave me a data set with each voter's party affiliation in 2012 and in
2014. I then used mutate() to add a column indicating whether the voter
switched parties within that two-year period. Next, I joined this data
with the North Carolina Voter History file and used filter() to keep
only primary elections, which lets me see each voter's primary
participation over the past ten years. For relevancy, I narrowed the
analysis to the two primaries that followed the 2012-2014 window (March
2016 and May 2018) and created a true/false column for each one
indicating whether the voter participated.
Note: Pre-processing each data set included grouping
the data by voter registration number and using distinct() to remove
the duplicate entries introduced first by joining the May files and
then by joining the voter history file.
mayjoin <- inner_join(may2012, may2014, by = "voter_reg_num")
mayjoin <- mayjoin %>% mutate(party_changes = ifelse(party_cd.x == party_cd.y, 1, 2))
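To make the join-and-flag step concrete, here is a minimal sketch on made-up data. The two toy data frames and their values are hypothetical stand-ins for the snapshot files, not real voter records; only the column names voter_reg_num and party_cd and the 1/2 coding come from the code above.

```r
library(dplyr)

# Hypothetical toy snapshots (the real files have many more columns).
may2012_toy <- data.frame(voter_reg_num = c("001", "002", "003"),
                          party_cd = c("DEM", "UNA", "REP"),
                          stringsAsFactors = FALSE)
may2014_toy <- data.frame(voter_reg_num = c("001", "002", "004"),
                          party_cd = c("DEM", "REP", "UNA"),
                          stringsAsFactors = FALSE)

# inner_join() keeps only voters present in BOTH snapshots (001 and 002).
# The shared party_cd column gets the default .x (2012) / .y (2014) suffixes.
toy_join <- inner_join(may2012_toy, may2014_toy, by = "voter_reg_num") %>%
  distinct(voter_reg_num, .keep_all = TRUE) %>%            # one row per voter
  mutate(party_changes = ifelse(party_cd.x == party_cd.y, 1, 2))

toy_join
```

Voters who appear in only one snapshot (003 and 004 here) drop out, so the resulting data set describes only voters registered at both points in time.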
Here I loaded in voter history files
finaldata <- inner_join(mayjoin, swhist, by = "voter_reg_num")
finaldata <- filter(finaldata, grepl("primary", election_desc, ignore.case = TRUE))
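mar2016 <- "03/15/2016 PRIMARY"
# TRUE for every row of a voter who has a history entry for the March 2016 primary
finaldata <- finaldata %>% group_by(voter_reg_num) %>% mutate(mar2016 = any(election_desc == mar2016))
may2018 <- "05/08/2018 PRIMARY"
# TRUE for every row of a voter who has a history entry for the May 2018 primary
finaldata <- finaldata %>% group_by(voter_reg_num) %>% mutate(may2018 = any(election_desc == may2018))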
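The per-voter participation flags can be illustrated on a small, hypothetical history table. This sketch uses summarise() instead of mutate(), which is an assumption on my part rather than the method used above: it collapses the history to one row per voter, so no separate de-duplication step is needed afterwards. The names toy_hist, mar2016_label, and may2018_label are made up for the example.

```r
library(dplyr)

# Hypothetical voter history: one row per voter per election voted in.
toy_hist <- data.frame(
  voter_reg_num = c("001", "001", "002", "003"),
  election_desc = c("03/15/2016 PRIMARY", "05/08/2018 PRIMARY",
                    "05/08/2018 PRIMARY", "05/06/2014 PRIMARY"),
  stringsAsFactors = FALSE
)

mar2016_label <- "03/15/2016 PRIMARY"
may2018_label <- "05/08/2018 PRIMARY"

# any() is TRUE if at least one of the voter's rows matches the election.
toy_flags <- toy_hist %>%
  group_by(voter_reg_num) %>%
  summarise(mar2016 = any(election_desc == mar2016_label),
            may2018 = any(election_desc == may2018_label))

toy_flags
```

Giving the label variables different names from the new columns also avoids any ambiguity about whether mar2016 refers to the string or to the column.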
I created subsets for those who switched parties and those who did
not. This made it easier when I started to create visuals for the
data.
noswitchdata <- subset(finaldata, party_changes == 1) %>% group_by(voter_reg_num)
switchdata <- subset(finaldata, party_changes == 2) %>% group_by(voter_reg_num)
I used the following code to count voters who participated in each
election based on whether they had switched party affiliation or not. I
did this to identify potential trends before graphing data.
Note: I repeated this for both subsets of data and for
both elections (4 total).
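The counting code itself is not reproduced here, so the following is only a sketch of one way such counts could be produced, assuming one row per voter. The toy data frame is hypothetical; the column names party_changes and mar2016 follow the code above.

```r
library(dplyr)

# Hypothetical one-row-per-voter data.
toy_voters <- data.frame(
  voter_reg_num = c("001", "002", "003", "004", "005"),
  party_changes = c(1, 1, 1, 2, 2),          # 1 = no switch, 2 = switch
  mar2016       = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Count each (switch status, participation) combination, then convert the
# counts to shares within each switch-status group.
toy_counts <- toy_voters %>%
  count(party_changes, mar2016) %>%
  group_by(party_changes) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

toy_counts
```

Grouping by party_changes in a single table covers both subsets at once; repeating the call with may2018 in place of mar2016 would cover the second election.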
Findings:
March 2016 participation among those who did NOT switch
parties between 2012 and 2014: 93.5%
## Did not participate: 34116
## Did participate: 491390
March 2016 participation among those who switched parties
between 2012 and 2014: 92.1%
## Did not participate: 4460
## Did participate: 52149
May 2018 participation among those who did NOT switch
parties: 56.5%
## Did not participate: 228773
## Did participate: 296733
May 2018 participation among those who switched parties:
48.9%
## Did not participate: 28928
## Did participate: 27681
Summary: In the 2016 presidential primary,
participation rates were nearly identical for voters who switched
parties between 2012 and 2014 and those who did not (92.1% versus
93.5%, a gap of about 1.4 percentage points). In the 2018 primary the
gap was wider: 56.5% of voters who did NOT switch parties participated,
compared to 48.9% of those who did switch (both rates computed from the
counts above). In both elections, then, voters who did not switch
parties participated at a higher rate than those who did. The 2016 gap
is small, and the 2018 gap runs in the opposite direction from my
original hypothesis, so the data does not support it. If anything, it
suggests that strong-partisan voters participate in primary elections
at higher rates.
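One way to check whether a gap like the 2018 one is statistically distinguishable from zero is a two-sample test of proportions. The sketch below plugs the counts reported in the Findings into base R's prop.test(); the variable names are my own, and the choice of test is an assumption, not part of the original analysis. With samples this large even modest gaps tend to come out statistically significant, so the size of the gap is usually the more informative number.

```r
# Counts copied from the Findings above (May 2018 primary).
participated <- c(no_switch = 296733, switch = 27681)
total        <- c(no_switch = 228773 + 296733, switch = 28928 + 27681)

# Participation rate per group and the gap in percentage points.
rates      <- participated / total
gap_points <- (rates[["no_switch"]] - rates[["switch"]]) * 100

# Two-sample test for equality of proportions.
test_2018 <- prop.test(x = participated, n = total)

round(rates * 100, 1)
round(gap_points, 1)
test_2018$p.value
```

The same call with the March 2016 counts would test the smaller gap in that election.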
2. Visualizing Data
Note: Because the differences between groups were
small, it was hard to decide on appropriate visualization methods.
Below, I used simple bar graphs to show the percentage of voters who
did and did not participate in each election, split by party-switching
behavior.
This code creates a bar graph from the subset of voters who did not
switch party affiliation, displaying their participation in the 2016
primary. I used geom_text() to label each bar with its percentage,
labs() for the title and axis labels, and scale_y_continuous() to
change the y-axis from scientific notation to exact numbers for clear
visualization.
ggplot(noswitchdata, aes(x = mar2016, fill = mar2016)) +
  geom_bar(color = "black") +
  geom_text(stat = "count", aes(label = paste0(round((..count..)/sum(..count..)*100, 1), "%")), position = position_stack(vjust = 0.5), size = 3) +
  labs(title = "March 2016 Presidential Primary Participation Among non-Party-Switchers", subtitle = "Voters who did not switch party-affiliation in-between the previous two primary elections", x = "Participation", y = "Count", fill = "Key") +
  theme_minimal() +
  scale_y_continuous(labels = scales::number_format())
Note: Here I provide code for each subset, but not
for each election, to avoid repeating nearly identical code four times.
ggplot(switchdata, aes(x = mar2016, fill = mar2016)) +
  geom_bar(color = "black") +
  geom_text(stat = "count", aes(label = paste0(round((..count..)/sum(..count..)*100, 1), "%")), position = position_stack(vjust = 0.5), size = 3) +
  labs(title = "March 2016 Presidential Primary Participation Among Party-Switchers", subtitle = "Voters who switched party-affiliation in-between the previous two primary elections", x = "Participation", y = "Count", fill = "Key") +
  theme_minimal() +
  scale_y_continuous(labels = scales::number_format())
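Since the four plots differ only in the data subset, the election column, and the titles, the repetition could be avoided with a small helper function. This is a sketch, not the code used to produce the graphs: the function name plot_participation and the toy data are hypothetical, and it uses after_stat(count), the current ggplot2 spelling of ..count...

```r
library(ggplot2)

# Hypothetical helper: draws the participation bar graph for one subset
# and one election column (passed as a string, e.g. "mar2016").
plot_participation <- function(data, election_col, title, subtitle) {
  ggplot(data, aes(x = .data[[election_col]], fill = .data[[election_col]])) +
    geom_bar(color = "black") +
    geom_text(stat = "count",
              aes(label = after_stat(paste0(round(count / sum(count) * 100, 1), "%"))),
              position = position_stack(vjust = 0.5), size = 3) +
    labs(title = title, subtitle = subtitle,
         x = "Participation", y = "Count", fill = "Key") +
    theme_minimal() +
    scale_y_continuous(labels = scales::number_format())
}

# Toy usage with made-up data standing in for one of the subsets.
toy_subset <- data.frame(mar2016 = c(TRUE, TRUE, TRUE, FALSE))
p <- plot_participation(toy_subset, "mar2016",
                        title = "Toy example",
                        subtitle = "Hypothetical data, not real voters")
```

Calling the helper once per subset and election (four calls in total) would then reproduce the full set of graphs.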
Resulting Graphs
In the graphs below, “true” represents voters who did participate,
and “false” represents those who did not.
1. March 2016, No Party-Switch

2. March 2016, Party-Switch

3. May 2018, No Party-Switch

4. May 2018, Party-Switch

3. H1 Conclusion
The data and graphs above show little relationship between
party-switching and primary election participation among North Carolina
voters between 2012 and 2018. My original hypothesis, that
party-switching would be positively correlated with participation in
subsequent primary elections, is not supported by the data so far. In
2016, there was little difference in participation rates between those
who switched parties and those who did not. In 2018, the gap was
roughly 7.6 percentage points (56.5% versus 48.9%, computed from the
counts above), but the higher rate belonged to the voters who did not
switch parties, suggesting higher political engagement overall among
stronger partisans. It is also worth noting how much higher overall
participation was in the 2016 presidential primary than in the 2018
primaries. This could potentially be attributed to voters' higher rates
of political engagement during a presidential election year.
Continuation: Up until this point I have only tested
my original hypothesis. The remainder of my project looks into
relationships between party-switching and demographics, including age,
race, and gender, to identify whether any particular demographic is more
or less likely to engage in party-switching than its counterparts.
Originally, I expected to find a positive correlation between
party-switching and primary election participation, and then to use
trends in voter demographics to gain further insight into potential
explanations for why people switch parties. Instead, I will shift my
focus away from election participation in general and zero in on
identifying which demographics, if any, engage in party-switching at
higher rates.
Demographic Trends - Hypotheses 2 and 3
To test my second and third hypotheses by evaluating trends in
party-switching among certain demographics, I used left_join() to
combine the data set from my first hypothesis with the North Carolina
voter registration files. This added demographic information, including
age, race, and gender, to my existing data set.
demodataset <- left_join(finaldata, votereg, by = "voter_reg_num")
Note: I used the count() function to count the
values in the “party_changes” column in both my original data set and
the data set that resulted from joining the voter registration data and
subsequent preprocessing (analyzing/eliminating duplicates and missing
values) to ensure that they matched and I was continuing to analyze the
same set of voters.
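The consistency check described in the note can be sketched as follows. The toy data frames are hypothetical; the duplicate registration row for voter 002 is included deliberately to show why a left join can add rows and why the counts are worth comparing.

```r
library(dplyr)

# Hypothetical stand-ins for the H1 data set and the registration file.
base_toy <- data.frame(voter_reg_num = c("001", "002", "003"),
                       party_changes = c(1, 2, 1))
reg_toy  <- data.frame(voter_reg_num   = c("001", "002", "002", "003"),
                       age_at_year_end = c(25, 40, 40, 67))

# Without distinct(), voter 002 would appear twice after the join.
joined_toy <- left_join(base_toy, reg_toy, by = "voter_reg_num") %>%
  distinct(voter_reg_num, .keep_all = TRUE)

before <- count(base_toy, party_changes)
after  <- count(joined_toy, party_changes)

# Stops with an error if the join changed the number of voters per group.
stopifnot(identical(before$n, after$n))
```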
Age
To calculate which voters switch party affiliation at the highest
rate based on age, I began with my “demodataset”, which consists of my
original data set plus the added demographic information. I used
mutate() to create an additional column of age ranges, which I defined
with the cut() function. I then used group_by() and summarise() to
count each combination of age range and the party_changes variable (1
indicates no party switch, 2 indicates a party switch), and
pivot_wider() to put the two counts side by side. Finally, I added a
column with mutate() that calculates the switch rate for each age range
using the following formula: # of individuals who switched parties /
the total number of individuals in each age range. The exact code
follows, along with the resulting data set, which I use to graph the
data later on.
agedata <- demodataset %>% mutate(age_range = cut(age_at_year_end, breaks = c(18, 30, 45, 60, 75, Inf), labels = c("18-29", "30-44", "45-59", "60-74", "75+")))
agedata <- agedata %>% group_by(age_range, party_changes) %>% summarise(count = n()) %>% pivot_wider(names_from = party_changes, values_from = count, values_fill = 0) %>% mutate(switch_rate = `2`/(`1` + `2`))
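| age_range | 1 (no switch) | 2 (switch) | switch_rate |
|-----------|---------------|------------|-------------|
| 18-29     | 59869         | 9821       | 0.1409      |
| 30-44     | 95924         | 13936      | 0.12685     |
| 45-59     | 128572        | 12918      | 0.0912      |
| 60-74     | 146414        | 12459      | 0.0784      |
| 75+       | 91758         | 7140       | 0.0722      |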
The table (and the graph below) shows that the 18-29 age range has the
highest rate of party switching at 14.1%. This means that of the voters
between the ages of 18 and 29 in our data set, 14.1% switched parties.
As age increases, the percentage of individuals who switched parties
decreases, all the way through to ages 75+, which has the lowest rate
of switching at 7.2%. My second hypothesis seems true: the younger
population engages in higher rates of party-switching than the older
population.
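One detail worth checking when reproducing the age ranges: cut() builds right-closed intervals by default, so with breaks at 18, 30, 45, 60, and 75 a 30-year-old is labeled "18-29" and an exact 18-year-old falls outside the first bin. The sketch below, on made-up ages, compares the default with right = FALSE, which is the setting that matches labels such as "18-29". Whether this affects the reported rates depends on how many voters sit exactly on a boundary, which I have not checked.

```r
# Hypothetical ages chosen to sit on or near the bin boundaries.
ages       <- c(18, 29, 30, 44, 45, 75, 90)
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# Default (right = TRUE): intervals (18,30], (30,45], ...
default_bins <- cut(ages, breaks = age_breaks, labels = age_labels)

# right = FALSE: intervals [18,30), [30,45), ...
left_closed  <- cut(ages, breaks = age_breaks, labels = age_labels,
                    right = FALSE)

data.frame(ages, default_bins, left_closed)
```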
Below is the code used to create the graph for this data, and the
graph itself. The code uses geom_bar() to create a bar graph with age
ranges on the x-axis and switch rates on the y-axis. geom_text() labels
each bar with its exact percentage, aes(fill = age_range) gives each
bar its own fill color, and labs() labels each component of the graph
for readability.
ggplot(agedata, aes(x = age_range, y = switch_rate, fill = age_range)) + geom_bar(stat = "identity", position = "dodge") + geom_text(aes(label = scales::percent(switch_rate)), position = position_dodge(width = 0.9), vjust = -0.5) + labs(title = "Switch Rates by Age Range", subtitle = "The percentage of each age-range who switched party-affiliations in between 2012 and 2014 in North Carolina", x = "Age Range", y = "Switch Rate") +theme_minimal()

Looking at this graph, it is easy to see that the rate at
which voters switch party affiliation decreases as age increases,
suggesting that hypothesis 2 was accurate.
Race
To calculate switch rates based on race, I used a similar strategy to
the one described under “Age”, minus the step of identifying age ranges.
Below is the code used to create a data set with switch rates for each
unique value under the “race_code” variable, as well as the resulting
data frame.
racedata <- demodataset %>% group_by(race_code, party_changes) %>% summarise(count = n()) %>% pivot_wider(names_from = party_changes, values_from = count, values_fill = 0) %>% mutate(switch_rate = `2`/(`1`+`2`))
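| race_code | 1 (no switch) | 2 (switch) | switch_rate |
|-----------|---------------|------------|-------------|
| A         | 3939          | 779        | 0.1651      |
| B         | 94594         | 9096       | 0.0877      |
| I         | 3416          | 323        | 0.0862      |
| M         | 2070          | 304        | 0.128       |
| O         | 11130         | 1757       | 0.1363      |
| P         | 18            | 3          | 0.1429      |
| U         | 26936         | 4372       | 0.1396      |
| W         | 383366        | 39975      | 0.0944      |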
I used the data frame above and the code below to graph the data
similar to how the age data was graphed.
ggplot(racedata, aes(x = race_code, y = switch_rate, fill = race_code)) + geom_bar(stat = "identity", position = "dodge") + geom_text(aes(label = scales::percent(switch_rate)), position = position_dodge(width = 0.9), vjust = -0.5) + labs(title = "Switch Rates by Race", subtitle = "The percentage of each race group who switched party-affiliations in between 2012 and 2014 in North Carolina", x = "Race", y = "Switch Rate", fill = "Race") + theme_minimal()

No particular hypothesis was tested here, but the graph above shows
that Asian (A) voters changed party affiliation at the highest rate,
while American Indian (I) and Black (B) voters switched at the lowest
rates, with White (W) voters also switching at a low rate. Group P
contains only 21 voters, so its rate should be read with caution.
Party switching based on original affiliation
Aside from demographic trends, I was interested in looking at
original party affiliations (in this case, party affiliation in 2012)
and the rates at which each group switched parties between 2012 and
2014. Note that this only captures switching within this particular
two-year window. Voters who switched parties before 2012 are not
captured, so the affiliation recorded in 2012 is not necessarily a
voter's true original affiliation, and the results are not
representative of long-run trends. They might be indicative of an
overall trend, but confirming that would require analysis across a
larger time frame (a point where further research could be suggested).
Below is the code used to create the data set for analyzing switch
rates across party affiliations at point A (2012). The results will
display the rates at which each original party affiliation switched in
between 2012 and 2014. Here I can gain evidence for/against my third
hypothesis: those who originally identified as “unaffiliated” will
switch party-affiliation at higher rates than those who identify with a
party. This would support the theory that voters use “unaffiliated” as a
placeholder rather than a permanent identification.
partyaffdata <- demodataset %>% group_by(party_cd.x) %>% summarise(total_count = n(), switch_rate = sum(party_changes == 2)/total_count) %>% arrange(desc(switch_rate))
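To show what this summary produces, here is the same pipeline run on a small, made-up data set. The values are hypothetical and chosen so that one group has a single voter, which illustrates why total_count is worth keeping next to switch_rate: a very small group can show an extreme rate.

```r
library(dplyr)

# Hypothetical voters: 2012 affiliation and switch flag (1 = no, 2 = yes).
toy_aff <- data.frame(
  party_cd.x    = c("DEM", "DEM", "DEM", "DEM", "REP", "REP",
                    "UNA", "UNA", "UNA", "LIB"),
  party_changes = c(1, 1, 1, 2, 1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Share of each 2012 affiliation that switched, largest first.
toy_rates <- toy_aff %>%
  group_by(party_cd.x) %>%
  summarise(total_count = n(),
            switch_rate = sum(party_changes == 2) / total_count) %>%
  arrange(desc(switch_rate))

toy_rates
```

In this toy data the top-ranked group consists of one voter, so its 100% rate says very little; comparing group sizes before comparing rates guards against that.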
The graph below shows that between 2012 and 2014, Libertarians (party
code LIB) in North Carolina switched party affiliation at a rate nearly
double that of Democrats, Republicans, or unaffiliated voters. This
does not provide support for hypothesis 3, but it could be used to make
inferences about the political environment at the time that might
explain why Libertarians switched at such a high rate. Alternatively,
further analysis might show that this is a typical trend across time.
ggplot(partyaffdata, aes(x = party_cd.x, y = switch_rate, fill = party_cd.x)) + geom_bar(stat = "identity") + geom_text(aes(label = scales::percent(switch_rate)), position = position_dodge(width = 0.9), vjust = -0.5) + labs(title = "Switch Rates by `Original` Party Affiliation", subtitle = "The percentage of each original party identification (id in 2012) who switched party-affiliations in between 2012 and 2014 in North Carolina", x = "Original ID", y = "Switch Rate", fill = "Original ID") + theme_minimal()

Final Conclusion/Suggestions for Further Research
Conclusion
Primary Election Participation (Hypothesis 1):
Contrary to the initial hypothesis, there was little to no positive
relationship between party-switching and subsequent primary election
participation. Participation rates in the 2016 primary were nearly
identical across the two groups, and in the 2018 primary it was the
non-switchers who participated at the higher rate, so neither election
supported the hypothesis.
Age/Demographics (Hypothesis 2): The analysis
revealed a clear trend in party-switching rates based on age. Younger
individuals, aged 18-29, exhibited the highest rate of party-switching
at 14.1%, and the rate decreased with each older age group, reaching
7.2% for individuals aged 75 and above. This finding supported the
second hypothesis that younger populations are more likely to engage in
party-switching.
Original Party Affiliation (Hypothesis 3): The
analysis of party-switching rates based on original party affiliation
(in 2012) revealed that Libertarians (party code LIB) had the highest
switching rate, nearly double that of Democrats, Republicans, and the
unaffiliated. This result did not support the hypothesis that voters
initially registered as “unaffiliated” would switch parties at higher
rates.
Discussion/Suggestions
This analysis only scratched the surface of overall trends in
party-switching by examining a subset of voters in North Carolina.
Further research could contribute to a more comprehensive understanding
of party-switching dynamics and the incentives for switching. Extending
the analysis over a longer time frame would help identify temporal
trends in party-switching and show how political dynamics and voter
behavior evolve over time. Specifically, this project could be extended
from 2014 to 2022 for a better understanding of party-switching in the
most recent decade. It might also be interesting to investigate
external factors, such as major political events, that might influence
party-switching. Understanding these contextual elements could offer a
more nuanced interpretation of the observed trends.
Ultimately, party-switching is a nuanced topic to study, with various
external sociopolitical factors that may influence why an individual
chooses to change parties at a given point in time. Predicting
electoral participation from party-switching, or identifying
demographic trends in the behavior, cannot by itself explain why people
switch parties. Still, identifying trends in voter behavior could be
applied to predicting electoral outcomes: understanding how and why
voters change affiliations may help anticipate shifts in voting
patterns, which can be valuable for political strategists and
pollsters. Identifying these trends may also provide insight into how
specific subgroups engage with the political environment, which is
essential for tailoring messages and campaigns to diverse audiences.