I’m interested in learning more about the population of registered NYC voters who are least likely to vote in a given election: nonpartisan, independent voters. According to the 2018 Voter Analysis Report, produced by the Mayor’s Office in conjunction with the Campaign Finance Board, the strongest indicator that a person will not vote in a given election is lack of party affiliation. As a registered independent voter myself, I wanted to learn more about this population as we approach the highly anticipated 2020 Presidential election.

The data set provided by NYC Open Data contains location and electoral information for over 4 million people. I first narrowed the data to eligible voters who, as of 2018, were not enrolled in any party. Besides Democrats (DEM) and Republicans (REP), this meant excluding the minor-party codes GRE, WEP, WOR, OTH, CON, and IND. I then dropped the primary-election participation columns, since New York’s closed primaries exclude unaffiliated voters, and kept the general elections from 2008 through 2018, which cover both Presidential and federal midterm years.

library(dplyr)
library(hablar)

# Party codes to exclude: the two major parties plus the minor parties.
excluded_parties <- c("REP", "DEM", "GRE", "WEP", "WOR", "OTH", "CON", "IND")

Voter_Analysis_select <- Voter_Analysis_2008_2018_3_ %>%
  select("random_ID", "age", "RegistrationYear", "PoliticalParty.08", "BoroCode.08", "NTA.08", "PoliticalParty.10", "BoroCode.10", "CommunityDistrict.10", "NTA.10", "PoliticalParty.12", "BoroCode.12", "NTA.12", "PoliticalParty.14", "BoroCode.14", "NTA.14", "PoliticalParty.16", "BoroCode.16", "NTA.16", "PoliticalParty.18", "BoroCode.18", "NTA.18", "GE08", "GE10", "GE12", "GE14", "GE16", "GE18", "eligible", "participated", "participation_score_2018", "weights") %>% 
  hablar::convert(num("random_ID", "age", "eligible", "participated", "participation_score_2018", "weights")) %>% 
  # Keep eligible voters with no party enrollment in 2018; rows with a
  # missing party code are dropped as well.
  filter(eligible > 0,
         !is.na(PoliticalParty.18),
         !(PoliticalParty.18 %in% excluded_parties))
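Because this filter works by excluding party codes, it can be worth confirming what remains afterwards. The sketch below runs the same filter on a few made-up rows and tabulates the surviving codes with `count()`. The `BLK` code for voters with no party enrollment is an assumption here, so check the data dictionary for the value the file actually uses.

```r
library(dplyr)

# Made-up rows standing in for Voter_Analysis_2008_2018_3_.
# "BLK" is ASSUMED to be the code for voters with no party enrollment.
voters <- tibble(
  random_ID = 1:6,
  eligible = c(1, 1, 1, 0, 1, 1),
  PoliticalParty.18 = c("DEM", "BLK", "WOR", "BLK", "BLK", NA)
)

excluded_parties <- c("REP", "DEM", "GRE", "WEP", "WOR", "OTH", "CON", "IND")

unaffiliated <- voters %>%
  filter(eligible > 0,
         !is.na(PoliticalParty.18),
         !(PoliticalParty.18 %in% excluded_parties))

# Tabulate the codes that survive the filter; only one should remain.
party_counts <- count(unaffiliated, PoliticalParty.18)
print(party_counts)
```

On the real data, `count(Voter_Analysis_select, PoliticalParty.18)` should likewise return a single row if only unaffiliated voters remain.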

The next step will require pivoting the data to extract the years from the Neighborhood Tabulation Area (NTA) columns. I will also group voters by age to categorize them by generation, then make bar graphs comparing voter participation from 2008 to 2018. At some point, I plan to use the NTA data to create a map and analyze participation spatially, but that may have to wait for the QGIS section.
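One way to do the pivoting step, sketched here for the general-election columns rather than the NTA columns, is `tidyr::pivot_longer()`: it turns `GE08` through `GE18` into one row per voter per election, so a single `group_by()` covers every year. The rows are made up, and the sketch assumes the `GE` columns are coded 1 for voted and 0 for did not vote.

```r
library(dplyr)
library(tidyr)

# Made-up wide rows: one row per voter, one 0/1 column per general election.
voters <- tibble(
  random_ID = 1:4,
  Generation = c("Boomers", "Boomers", "Millennials", "Millennials"),
  GE08 = c(1, 1, 0, NA),
  GE12 = c(1, 0, 1, 1),
  GE16 = c(1, 1, 1, 0)
)

# Reshape to one row per voter per election and turn "GE08" into 2008.
voters_long <- voters %>%
  pivot_longer(starts_with("GE"), names_to = "year", names_prefix = "GE",
               values_to = "voted") %>%
  mutate(year = 2000L + as.integer(year))

# One summary table covering every generation and year at once.
participation_by_year <- voters_long %>%
  group_by(Generation, year) %>%
  summarise(Percent_Participation = 100 * mean(voted, na.rm = TRUE),
            .groups = "drop")

print(participation_by_year)
```

The long table can go straight into `ggplot()` with `year` on the x axis and `Generation` as the fill or facet.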

Voter_Ages_NA <- Voter_Analysis_select %>%
  select("random_ID", "age", "GE08", "GE10", "GE12", "GE14", "GE16", "GE18", "participation_score_2018") %>%
  mutate(Generation = case_when(age >= 54 ~ 'Boomers',
    age >= 38 ~ 'GenX',
    age >= 22 ~ 'Millennials',
    age >= 18 ~ 'GenZ'))
# Citation for the case_when code structure: https://stackoverflow.com/questions/15016723/create-categories-by-comparing-a-numeric-column-with-a-fixed-value
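The same cut points can also be written with base R's `cut()`, which returns a factor whose levels run from youngest to oldest; ggplot2 then draws the bars in age order instead of alphabetical order. A minimal sketch on a handful of example ages:

```r
# Bin ages into generations with cut(); right = FALSE closes each interval
# on the left, so 54 lands in "Boomers" just as with age >= 54 above.
# As in the case_when() version, everyone 54 and older is labeled Boomers.
ages <- c(17, 18, 21, 22, 37, 38, 53, 54, 80)
generation <- cut(ages,
                  breaks = c(18, 22, 38, 54, Inf),
                  labels = c("GenZ", "Millennials", "GenX", "Boomers"),
                  right = FALSE)
print(generation)  # ages under 18 fall outside every interval and become NA
```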
Boomer_Av_Partic <- Voter_Ages_NA %>%
  filter(Generation == "Boomers") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

GenX_Av_Partic <- Voter_Ages_NA %>%
  filter(Generation == "GenX") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

Mill_Av_Partic <- Voter_Ages_NA %>%
  filter(Generation == "Millennials") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

GenZ_Av_Partic <- Voter_Ages_NA %>%
  filter(Generation == "GenZ") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

https://dplyr.tidyverse.org/reference/summarise_all.html

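# List the labels explicitly, in the same order as the rbind() below.
# distinct() returns values in order of first appearance in the data, which
# need not match that order and could attach the wrong label to an average.
Generations <- data.frame(Generation = c("Boomers", "GenX", "Millennials", "GenZ"))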
  
INDVoter_Generations <- rbind(Boomer_Av_Partic, GenX_Av_Partic, Mill_Av_Partic, GenZ_Av_Partic)

Indep_Generation_Part <- cbind(Generations, INDVoter_Generations) %>%
  rename(Average_Participation = participation_score_2018)

https://www.datacamp.com/community/tutorials/merging-datasets-r
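The four filter-and-summarize blocks and the `cbind()` step above can be collapsed into a single `group_by()` and `summarise()`. This keeps each generation label on the same row as its own average, whereas `distinct()` lists labels in order of first appearance in the data, which need not match the order of the `rbind()`. A sketch on made-up rows standing in for `Voter_Ages_NA`:

```r
library(dplyr)

# Made-up rows standing in for Voter_Ages_NA.
voters <- tibble(
  age = c(70, 60, 45, 30, 25, 19, 15),
  participation_score_2018 = c(40, 30, 20, 10, 20, 5, NA)
)

Indep_Generation_Part <- voters %>%
  mutate(Generation = case_when(age >= 54 ~ "Boomers",
                                age >= 38 ~ "GenX",
                                age >= 22 ~ "Millennials",
                                age >= 18 ~ "GenZ")) %>%
  filter(!is.na(Generation)) %>%   # ages under 18 get no generation
  group_by(Generation) %>%
  summarise(Average_Participation = mean(participation_score_2018, na.rm = TRUE))

print(Indep_Generation_Part)
```

Since `summarize_at()` is superseded in current dplyr, `summarise()` (optionally with `across()`) is the more idiomatic choice for new code.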

ggplot(Indep_Generation_Part) +
 aes(x = Generation, weight = Average_Participation) +
 geom_bar(fill = "#41ab5d") +
 labs(y = "Percent Election Participation", title = "NYC Independent Voter Participation", subtitle = "by Generation", caption = "Source: NYC Campaign Finance Board") +
 theme_classic() +
 ylim(0L, 40L)

ggsave("Generation.png", width = 6, height = 4)

source: https://wagner-mspp-2020.github.io/r-demos/r-demo.html#data-manipulation-with-dplyr
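With `aes(weight = ...)`, `geom_bar()` sums the weight within each bar, so it draws the stored value here only because each generation has exactly one row. `geom_col()` maps the value to `y` directly. Note also that `ylim()` drops any bar extending past the limit (with a warning), while `coord_cartesian(ylim = ...)` only zooms the view. A minimal sketch with placeholder values:

```r
library(ggplot2)

# Placeholder values for illustration, not results from the voter file.
plot_data <- data.frame(
  Generation = factor(c("GenZ", "Millennials", "GenX", "Boomers"),
                      levels = c("GenZ", "Millennials", "GenX", "Boomers")),
  Average_Participation = c(1, 2, 3, 4)
)

p <- ggplot(plot_data, aes(x = Generation, y = Average_Participation)) +
  geom_col(fill = "#41ab5d") +
  labs(y = "Percent Election Participation",
       title = "NYC Independent Voter Participation",
       subtitle = "by Generation",
       caption = "Source: NYC Campaign Finance Board") +
  theme_classic() +
  coord_cartesian(ylim = c(0, 40))  # zoom without dropping bars

print(p)
```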

The graph above reflects one of the top findings of the Voter Analysis Report: voters over age 50 were the most likely to have high participation scores. Next, I look at the data by borough.

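Voter_Boroughs <- Voter_Analysis_select %>%
  select("random_ID", "BoroCode.18", "participation_score_2018") %>%
  # The codes below are mapped alphabetically. The NYC Department of City
  # Planning convention is 1 = Manhattan, 2 = Bronx, 3 = Brooklyn,
  # 4 = Queens, 5 = Staten Island, so confirm this data set's coding in its
  # data dictionary before relying on the labels.
  mutate(Borough = case_when(BoroCode.18 == 1 ~ 'Bronx',
    BoroCode.18 == 2 ~ 'Brooklyn',
    BoroCode.18 == 3 ~ 'Manhattan',
    BoroCode.18 == 4 ~ 'Queens',
    BoroCode.18 == 5 ~ 'Staten Island'))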
Bronx_AVP <- Voter_Boroughs %>%
  filter(Borough == "Bronx") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

Brook_AVP <- Voter_Boroughs %>%
  filter(Borough == "Brooklyn") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

Manhat_AVP <- Voter_Boroughs %>%
  filter(Borough == "Manhattan") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

Queens_AVP <- Voter_Boroughs %>%
  filter(Borough == "Queens") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)

Staten_AVP <- Voter_Boroughs %>%
  filter(Borough == "Staten Island") %>%
  summarize_at(c("participation_score_2018"), mean, na.rm = TRUE)
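# List the labels explicitly, in the same order as the rbind() below;
# distinct() would return them in order of first appearance in the data.
Boroughs <- data.frame(Borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"))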
  
INDVoter_Boroughs <- rbind(Bronx_AVP, Brook_AVP, Manhat_AVP, Queens_AVP, Staten_AVP)

Indep_Boroughs_Part <- cbind(Boroughs, INDVoter_Boroughs) %>%
  rename(Average_Participation = participation_score_2018) %>%
  mutate(Percent_Participation = Average_Participation * 100)

source: https://www.datacamp.com/community/tutorials/merging-datasets-r

ggplot(Indep_Boroughs_Part) +
 aes(x = Borough, weight = Average_Participation) +
 geom_bar(fill = "#41ab5d") +
 labs(y = "Percent Election Participation", title = "NYC Independent Voter Participation", subtitle = "by Borough", caption = "Source: NYC Campaign Finance Board") +
 theme_classic() +
 ylim(0L, 40L)

ggsave("Borough.png", width = 6, height = 4)
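Whether `BoroCode.18 == 1` really means the Bronx depends on how this data set codes boroughs; the NYC Department of City Planning convention is 1 = Manhattan, 2 = Bronx, 3 = Brooklyn, 4 = Queens, 5 = Staten Island. Assuming the `NTA` columns hold 2010 NTA codes, which begin with a two-letter borough prefix (BK, BX, MN, QN, SI), a cross-tabulation of code against prefix is a quick check. The rows below are made up and follow the City Planning convention only for illustration.

```r
library(dplyr)

# Made-up rows; on the real data, use Voter_Analysis_select instead.
voters <- tibble(
  BoroCode.18 = c(1, 1, 2, 3, 4, 5),
  NTA.18 = c("MN17", "MN23", "BX14", "BK09", "QN68", "SI01")
)

# If the coding is consistent, each BoroCode pairs with exactly one prefix.
boro_check <- voters %>%
  mutate(nta_prefix = substr(NTA.18, 1, 2)) %>%
  count(BoroCode.18, nta_prefix)

print(boro_check)
```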

The findings reflected in the graph above are in line with those published by the Campaign Finance Board: voters in Manhattan are the most likely to have high participation scores, and voters in the Bronx the least likely. Next, I look at the data for these groups in each of the presidential elections from 2008 to 2016, to see whether the trends over time differ from the general findings.
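Rather than repeating the filter-and-summarize blocks for every year, the per-election averages could come from one small function that takes the grouping column and the vote column as strings. `participation_by()` below is a hypothetical helper, sketched on made-up rows and assuming 0/1 `GE` columns. It also reports `n`, the number of voters behind each bar, which helps flag averages that rest on small groups.

```r
library(dplyr)

# Hypothetical helper: percent of voters in each group who voted, given the
# grouping column and a 0/1 vote column as strings. Rows with no group are dropped.
participation_by <- function(df, group_col, vote_col) {
  df %>%
    filter(!is.na(.data[[group_col]])) %>%
    group_by(across(all_of(group_col))) %>%
    summarise(n = n(),
              Percent_Participation = 100 * mean(.data[[vote_col]], na.rm = TRUE),
              .groups = "drop")
}

# Made-up rows.
voters <- tibble(
  Borough = c("Bronx", "Bronx", "Manhattan", "Manhattan", NA),
  GE12 = c(1, 0, 1, 1, 1),
  GE16 = c(1, 1, 0, 1, 0)
)

res12 <- participation_by(voters, "Borough", "GE12")
res16 <- participation_by(voters, "Borough", "GE16")
print(res12)
print(res16)
```

The same function then serves every year and grouping, e.g. `participation_by(df, "Generation", "GE08")` for a data frame `df` that has a `Generation` column.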

Voter_Analysis_2008 <- Voter_Analysis_select %>%
  select("random_ID", "age", "PoliticalParty.08", "BoroCode.08", "NTA.08", "GE08", "eligible", "participated", "participation_score_2018") %>%
  mutate(Generation = case_when(age >= 54 ~ 'Boomers',
    age >= 38 ~ 'GenX',
    age >= 22 ~ 'Millennials',
    age >= 18 ~ 'GenZ'))

Boomer_Av08 <- Voter_Analysis_2008 %>%
  filter(Generation == "Boomers") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

GenX_Av08 <- Voter_Analysis_2008 %>%
  filter(Generation == "GenX") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

Mill_Av08 <- Voter_Analysis_2008 %>%
  filter(Generation == "Millennials") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

# GenZ is excluded for 2008. List the labels explicitly, in the same order as
# the rbind() below; distinct() would return them in order of first appearance.
Generations08 <- data.frame(Generation = c("Boomers", "GenX", "Millennials"))
  
Voter_Generations08 <- rbind(Boomer_Av08, GenX_Av08, Mill_Av08)

Generation_Part08 <- cbind(Generations08, Voter_Generations08) %>%
  rename(Average_Participation = GE08) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Generation_Part08) +
 aes(x = Generation, weight = Percent_Participation) +
 geom_bar(fill = "#d8576b") +
 labs(y = "Percent Election Participation", title = "NYC Independent Voter Participation (2008)", subtitle = "by Generation", caption = "Source: NYC Campaign Finance Board") +
 theme_classic() +
 coord_flip() +
 ylim(0L, 100L)

ggsave("Generation2008.png", width = 6, height = 4)

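In the 2008 election, the trends shown above persisted. However, the Millennial generation had lower participation than expected. Part of that gap may be mechanical: if ages in the data are measured in 2018, the youngest Millennials (22 in 2018) were only 12 in 2008, and any of them counted as non-voters in GE08 would pull the average down.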
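Because generations are assigned from a single `age` column, some members of a generation may not have been old enough to vote in earlier elections. The hypothetical helper below assumes `age` is each voter's age in 2018 and checks how much of the Millennial bin (ages 22 to 37) was 18 or older in 2008, treating every age as equally common.

```r
# Hypothetical helper: was a voter at least 18 in a given election year?
# ASSUMES age_2018 is the voter's age in 2018; birthdays are ignored, so
# results near the cutoff can be off by a year.
old_enough_in <- function(age_2018, year) {
  age_2018 - (2018 - year) >= 18
}

# Every age in the Millennial bin used above, each counted once
# (real data will not have equal numbers of voters at each age).
millennial_ages <- 22:37
share_eligible_2008 <- mean(old_enough_in(millennial_ages, 2008))
print(share_eligible_2008)  # 10 of 16 ages, i.e. 0.625
```

Weighting by the actual number of voters at each age, or restricting the 2008 average to voters who pass this check, would give a fairer comparison across generations.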

Voter_Analysis_2012 <- Voter_Analysis_select %>%
  select("random_ID", "age", "PoliticalParty.12", "BoroCode.12", "NTA.12", "GE12", "eligible", "participated", "participation_score_2018") %>%
  mutate(Generation = case_when(age >= 54 ~ 'Boomers',
    age >= 38 ~ 'GenX',
    age >= 22 ~ 'Millennials',
    age >= 18 ~ 'GenZ'))
  
Boomer_Av12 <- Voter_Analysis_2012 %>%
  filter(Generation == "Boomers") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

GenX_Av12 <- Voter_Analysis_2012 %>%
  filter(Generation == "GenX") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

Mill_Av12 <- Voter_Analysis_2012 %>%
  filter(Generation == "Millennials") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

# GenZ is excluded for 2012. List the labels explicitly, in the same order as
# the rbind() below; distinct() would return them in order of first appearance.
Generations12 <- data.frame(Generation = c("Boomers", "GenX", "Millennials"))
  
Voter_Generations12 <- rbind(Boomer_Av12, GenX_Av12, Mill_Av12)

Generation_Part12 <- cbind(Generations12, Voter_Generations12) %>%
  rename(Average_Participation = GE12) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Generation_Part12) +
 aes(x = Generation, weight = Percent_Participation) +
 geom_bar(fill = "#d8576b") +
 labs(y = "Percent Participation in Election", title = "NYC Independent Voter Participation (2012)", subtitle = "by Generation", caption = "Source: NYC Campaign Finance Board") +
 coord_flip() +
 theme_classic() +
 ylim(0L, 100L)

ggsave("Generation2012.png", width = 6, height = 4)

The general trend held in 2012: Boomers participated at a higher rate than the other groups. However, Millennial participation rose, edging out Generation X by a few percentage points.

Voter_Analysis_2016 <- Voter_Analysis_select %>%
  select("random_ID", "age", "PoliticalParty.16", "BoroCode.16", "NTA.16", "GE16", "eligible", "participated", "participation_score_2018") %>%
  mutate(Generation = case_when(age >= 54 ~ 'Boomers',
    age >= 38 ~ 'GenX',
    age >= 22 ~ 'Millennials',
    age >= 18 ~ 'GenZ'))

Boomer_Av16 <- Voter_Analysis_2016 %>%
  filter(Generation == "Boomers") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

GenX_Av16 <- Voter_Analysis_2016 %>%
  filter(Generation == "GenX") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

Mill_Av16 <- Voter_Analysis_2016 %>%
  filter(Generation == "Millennials") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

GenZ_Av16 <- Voter_Analysis_2016 %>%
  filter(Generation == "GenZ") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

# List the labels explicitly, in the same order as the rbind() below;
# distinct() would return them in order of first appearance in the data.
Generations16 <- data.frame(Generation = c("Boomers", "GenX", "Millennials", "GenZ"))
  
Voter_Generations16 <- rbind(Boomer_Av16, GenX_Av16, Mill_Av16, GenZ_Av16)

Generation_Part16 <- cbind(Generations16, Voter_Generations16) %>%
  rename(Average_Participation = GE16) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Generation_Part16) +
 aes(x = Generation, weight = Percent_Participation) +
 geom_bar(fill = "#d8576b") +
 labs(y = "Percent Participation in Election", title = "NYC Independent Voter Participation (2016)", subtitle = "by Generation", caption = "Source: NYC Campaign Finance Board") +
 coord_flip() +
 theme_classic() +
 ylim(0L, 100L)

ggsave("Generation2016.png", width = 6, height = 4)

In 2016, a new generation of voters appears in the data for the first time: Generation Z. This group got off to a strong start, participating at a higher rate than Generation X, and Millennials held their ground in this election as well.

Voter_Boro_2008 <- Voter_Analysis_select %>%
  select("random_ID", "BoroCode.08", "participation_score_2018", "GE08") %>%
  mutate(Borough = case_when(BoroCode.08 == 1 ~ 'Bronx',
    BoroCode.08 == 2 ~ 'Brooklyn',
    BoroCode.08 == 3 ~ 'Manhattan',
    BoroCode.08 == 4 ~ 'Queens',
    BoroCode.08 == 5 ~ 'Staten Island'))

Bronx_AVP08 <- Voter_Boro_2008 %>%
  filter(Borough == "Bronx") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

Brook_AVP08 <- Voter_Boro_2008 %>%
  filter(Borough == "Brooklyn") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

Manhat_AVP08 <- Voter_Boro_2008 %>%
  filter(Borough == "Manhattan") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

Queens_AVP08 <- Voter_Boro_2008 %>%
  filter(Borough == "Queens") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

Staten_AVP08 <- Voter_Boro_2008 %>%
  filter(Borough == "Staten Island") %>%
  summarize_at(c("GE08"), mean, na.rm = TRUE)

# List the labels explicitly, in the same order as the rbind() below;
# distinct() would return them in order of first appearance in the data.
Boroughs08 <- data.frame(Borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"))
  
Voter_Boroughs08 <- rbind(Bronx_AVP08, Brook_AVP08, Manhat_AVP08, Queens_AVP08, Staten_AVP08)

Boroughs_Part08 <- cbind(Boroughs08, Voter_Boroughs08) %>%
  rename(Average_Participation = GE08) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Boroughs_Part08) +
 aes(x = Borough, weight = Percent_Participation) +
 geom_bar(fill = "#26828e") +
 labs(y = "Percent Participation in Election", title = "NYC Independent Voter Participation (2008)", subtitle = "by Borough", caption = "Source: NYC Campaign Finance Board") +
 coord_flip() +
 theme_classic() +
 ylim(0L, 100L)

ggsave("Borough2008.png", width = 6, height = 4)

Generally, the 2008 election reflected the expectations set by the Voter Analysis Report: voters in Manhattan had the highest participation score, and voters in Queens and the Bronx had the lowest.

# Named Voter_Boro_2012 so the generation-level Voter_Analysis_2012 above
# is not overwritten.
Voter_Boro_2012 <- Voter_Analysis_select %>%
  select("random_ID", "age", "PoliticalParty.12", "BoroCode.12", "NTA.12", "GE12", "eligible", "participated", "participation_score_2018") %>%
  mutate(Borough = case_when(BoroCode.12 == 1 ~ 'Bronx',
    BoroCode.12 == 2 ~ 'Brooklyn',
    BoroCode.12 == 3 ~ 'Manhattan',
    BoroCode.12 == 4 ~ 'Queens',
    BoroCode.12 == 5 ~ 'Staten Island'))

Bronx_AVP12 <- Voter_Boro_2012 %>%
  filter(Borough == "Bronx") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

Brook_AVP12 <- Voter_Boro_2012 %>%
  filter(Borough == "Brooklyn") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

Manhat_AVP12 <- Voter_Boro_2012 %>%
  filter(Borough == "Manhattan") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

Queens_AVP12 <- Voter_Boro_2012 %>%
  filter(Borough == "Queens") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

Staten_AVP12 <- Voter_Boro_2012 %>%
  filter(Borough == "Staten Island") %>%
  summarize_at(c("GE12"), mean, na.rm = TRUE)

# List the labels explicitly, in the same order as the rbind() below;
# distinct() would return them in order of first appearance in the data.
Boroughs12 <- data.frame(Borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"))
  
Voter_Boroughs12 <- rbind(Bronx_AVP12, Brook_AVP12, Manhat_AVP12, Queens_AVP12, Staten_AVP12)

Boroughs_Part12 <- cbind(Boroughs12, Voter_Boroughs12) %>%
  rename(Average_Participation = GE12) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Boroughs_Part12) +
 aes(x = Borough, weight = Percent_Participation) +
 geom_bar(fill = "#26828e") +
 labs(y = "Percent Participation in Election", title = "NYC Independent Voter Participation (2012)", subtitle = "by Borough", caption = "Source: NYC Campaign Finance Board") +
 coord_flip() +
 theme_classic() +
 ylim(0L, 100L)

ggsave("Borough2012.png", width = 6, height = 4)

The graph above is surprising: in 2012, the Bronx went from the lowest participation score to the highest, and Manhattan’s score fell considerably.
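A swing this large is worth confirming against the raw rows before reading too much into it. One hedged way is to recompute each label's mean directly and compare it with the value printed beside that label in the summary table. `labels_match()` below is a hypothetical helper, shown on made-up rows with one correctly labeled table and one with two labels swapped.

```r
# Hypothetical helper: recompute each group's mean from the raw rows and check
# that it matches the value stored beside that label in the summary table.
labels_match <- function(summary_df, raw_df, group_col, vote_col,
                         value_col = "Average_Participation") {
  recomputed <- vapply(
    summary_df[[group_col]],
    function(g) mean(raw_df[[vote_col]][raw_df[[group_col]] %in% g], na.rm = TRUE),
    numeric(1)
  )
  isTRUE(all.equal(unname(recomputed), summary_df[[value_col]]))
}

# Made-up raw rows and two summary tables built from them.
raw <- data.frame(Borough = c("Bronx", "Bronx", "Manhattan", "Queens"),
                  GE12 = c(1, 0, 1, 0))
correct <- data.frame(Borough = c("Bronx", "Manhattan", "Queens"),
                      Average_Participation = c(0.5, 1, 0))
swapped <- data.frame(Borough = c("Manhattan", "Bronx", "Queens"),
                      Average_Participation = c(0.5, 1, 0))

labels_match(correct, raw, "Borough", "GE12")  # TRUE
labels_match(swapped, raw, "Borough", "GE12")  # FALSE
```

On the real tables, the same call would take `Boroughs_Part12` as `summary_df` and the 2012 borough-level data frame as `raw_df`.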

# Named Voter_Boro_2016 so the generation-level Voter_Analysis_2016 above
# is not overwritten.
Voter_Boro_2016 <- Voter_Analysis_select %>%
  select("random_ID", "age", "PoliticalParty.16", "BoroCode.16", "NTA.16", "GE16", "eligible", "participated", "participation_score_2018") %>%
  mutate(Borough = case_when(BoroCode.16 == 1 ~ 'Bronx',
    BoroCode.16 == 2 ~ 'Brooklyn',
    BoroCode.16 == 3 ~ 'Manhattan',
    BoroCode.16 == 4 ~ 'Queens',
    BoroCode.16 == 5 ~ 'Staten Island'))

Bronx_AVP16 <- Voter_Boro_2016 %>%
  filter(Borough == "Bronx") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

Brook_AVP16 <- Voter_Boro_2016 %>%
  filter(Borough == "Brooklyn") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

Manhat_AVP16 <- Voter_Boro_2016 %>%
  filter(Borough == "Manhattan") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

Queens_AVP16 <- Voter_Boro_2016 %>%
  filter(Borough == "Queens") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

Staten_AVP16 <- Voter_Boro_2016 %>%
  filter(Borough == "Staten Island") %>%
  summarize_at(c("GE16"), mean, na.rm = TRUE)

# List the labels explicitly, in the same order as the rbind() below;
# distinct() would return them in order of first appearance in the data.
Boroughs16 <- data.frame(Borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"))
  
Voter_Boroughs16 <- rbind(Bronx_AVP16, Brook_AVP16, Manhat_AVP16, Queens_AVP16, Staten_AVP16)

Boroughs_Part16 <- cbind(Boroughs16, Voter_Boroughs16) %>%
  rename(Average_Participation = GE16) %>%
  mutate(Percent_Participation = Average_Participation * 100)
ggplot(Boroughs_Part16) +
 aes(x = Borough, weight = Percent_Participation) +
 geom_bar(fill = "#26828e") +
 labs(y = "Percent Participation in Election", title = "NYC Independent Voter Participation (2016)", subtitle = "by Borough", caption = "Source: NYC Campaign Finance Board") +
 coord_flip() +
 theme_classic() +
 ylim(0L, 100L)

ggsave("Borough2016.png", width = 6, height = 4)

Finally, the 2016 election produced results similar to those of 2012. Looking at the Generation and Borough groups over time, I noticed a few things. First, data aggregated over 10 years glosses over election-by-election trends in voting participation. Second, in both the Borough and Generation groups, when the events or messaging around a specific election push a group to participate more often than is typical for it, that group is likely to keep voting at a higher rate in the following election as well. This happened for Millennials and for the Bronx in 2012 and 2016. Third, although the general trends for independent voters follow those of the larger NYC voting population, there are key differences in their behavior when charted over time. Independents (whose participation scores are typically lower than those of partisan voters) seem quite responsive to the specific events of each election, and their voting behavior may run counter to expectations, as with turnout from younger voters or from voters in low-scoring areas.
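The claim that a group which turns out unusually strongly in one election keeps voting in the next can be checked at the individual level, since each row follows the same voter across elections. A minimal sketch on made-up rows, assuming 0/1 `GE12` and `GE16` columns: within each borough, it compares the 2016 participation of people who voted in 2012 with that of people who did not.

```r
library(dplyr)

# Made-up 0/1 participation for six voters in two consecutive presidential elections.
voters <- tibble(
  Borough = c("Bronx", "Bronx", "Bronx", "Manhattan", "Manhattan", "Manhattan"),
  GE12 = c(1, 1, 0, 1, 0, 0),
  GE16 = c(1, 0, 0, 1, 1, 0)
)

# Share of 2012 voters who voted again in 2016, next to the share of
# 2012 non-voters who voted in 2016 (NaN if a group has no such voters).
repeat_voting <- voters %>%
  filter(!is.na(GE12), !is.na(GE16)) %>%
  group_by(Borough) %>%
  summarise(voted_16_given_voted_12 = mean(GE16[GE12 == 1]),
            voted_16_given_skipped_12 = mean(GE16[GE12 == 0]),
            .groups = "drop")

print(repeat_voting)
```

A large gap between the two columns for a group would support the repeat-participation pattern; the same summary grouped by `Generation` covers the generational version of the claim.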