Data Visualization Final: 2014 North Carolina Early Voting Experiment

Group Information

– North Carolina Early Voting Experiment 2014

– Emil Ghitman-Gilkes and Izzy Fischer

Data and Subgroup Description

We obtained our data from an experiment that Professor Christopher Mann conducted in 2014 on voter mobilization tactics for early voting turnout in North Carolina. Previous research related to this experiment includes a study in Michigan that sent voters a social pressure letter containing a copy of their own public vote histories, along with their neighbors’, and a threat to deliver an updated set after the election (Issenberg, 2016). That study compared the turnout of voters who received the letter with the turnout of those who did not. Other related research includes past experiments by Professor Mann that tested different types of voter mobilization treatments, such as vote by mail, mail ballot mobilization, and the mobilization of specific demographics (Mann, 2013 & 2014). The dataset for Professor Mann’s experiment also contained data on total turnout, but we were only concerned with making visualizations that showed trends in early voting turnout. As for the logistics of the experiment, Professor Mann and the NCLCV (North Carolina League of Conservation Voters) sent a treatment letter intended to encourage North Carolina voters to turn out at an early voting polling place sometime in the two weeks before Election Day. This letter was sent to 247,623 households in 27 counties across the state. Turnout in this treatment group was compared with turnout in the control group, which comprised 54,338 households that did not receive the letter, in order to analyze whether the letter effectively mobilized voters before Election Day (November 4th). Since we were interested in evaluating whether the treatment letter had disproportionate effects across demographics, we pulled data containing the treatment effects by age group (18-30, 31-60, 61-99) and race (African-American/Black and White).
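As a minimal sketch of how a treatment effect of this kind can be estimated, the R snippet below takes the difference between the early voting turnout rates of the treatment and control groups and attaches an approximate standard error. The group sizes are the ones reported above; the early-voter counts and the helper name treatment_effect() are hypothetical placeholders for illustration, not results or code from Professor Mann’s data.

```r
# Group sizes from the experiment; the early-voter counts are HYPOTHETICAL.
n_treat       <- 247623  # households sent the treatment letter
n_control     <- 54338   # households in the control group
early_treat   <- 49525   # hypothetical early voters in the treatment group
early_control <- 10324   # hypothetical early voters in the control group

# Hypothetical helper: difference in turnout rates with a
# normal-approximation 95% interval.
treatment_effect <- function(x_t, n_t, x_c, n_c) {
  p_t <- x_t / n_t                  # treatment turnout rate
  p_c <- x_c / n_c                  # control turnout rate
  effect <- p_t - p_c               # estimated treatment effect
  se <- sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
  list(effect = effect, se = se,
       lower = effect - 1.96 * se,
       upper = effect + 1.96 * se)
}

result <- treatment_effect(early_treat, n_treat, early_control, n_control)
```

The same calculation applied county by county uses much smaller groups, which widens the interval around each county’s estimate.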

We were working with a Stata dataset that contained far more variables than we needed, since we only wanted to look at a few. Thus, we had to do a great deal of data cleaning: we pulled certain variables out of Stata and put them into Excel, and then, once we learned how to load the Stata dataset into RStudio, we changed column names and the data frame layout in R. Aside from Professor Mann’s experiment on voter mobilization, we were also interested in examining the effects of polling place accessibility on early voting turnout. Therefore, we compiled data containing the number of polling places per county, polling place location, polling place hours (total hours, number of weekend hours, number of evening hours, and number of early morning hours), and the number of registered voters in each county. With our visualizations, we investigated many of the questions that Professor Mann wanted to ask (included in our project proposal) as well as questions that interested us, such as the relationship between early voting turnout and the number of hours polling places were open in each county. Furthermore, we explored whether the number of weekend hours had different effects on early voting turnout among African-American/Black and White voters. Lastly, we explored the spatial relationship between polling place location and household location, and whether the location of polling places relative to where voters lived had an effect on early voting turnout.

Graph 1

Will the treatment letter increase turnout during the early voting period?

This graph shows the treatment effect for the 27 North Carolina counties included in Professor Mann’s experiment. The graph shows that the majority of counties had treatment effects of at least 1 percent. However, there are extremely high levels of uncertainty, as factors unrelated to the experiment could have affected turnout (e.g. weather, mobilization by other organizations, variation in county size). Further contributing to the uncertainty, certain counties are extremely small, so their sample sizes were not very large. Two counties show a negative treatment effect. This is potentially because they have a large population of older voters, who are unlikely to change their voting habits because of a mobilization letter, and/or a large population of highly educated voters, whose voting habits are fairly consistent because of their civic engagement.

To create this graph, we first mutated a column to put the counties in descending order of treatment effect. We also mutated a column flagging the counties that had negative treatment effects so that we could highlight those counties in a different color. We then used ggplot to create a bar chart and flipped the axes, as it is easier to discern differences in treatment effect when the bars are horizontal. For aesthetics, we used a separate color for the bars with treatment effects below 0 in order to point out that the treatment did not work in every county and to show the prevalence of negative treatment effects.
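The steps described above could look roughly like the following in R. This is a minimal sketch, not our original script: the county names and effect sizes are hypothetical, and the column names (effect, direction) and colors are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# HYPOTHETICAL county names and treatment effects (percentage points).
effects <- data.frame(
  county = c("County A", "County B", "County C", "County D", "County E"),
  effect = c(2.1, 1.4, -0.6, 0.9, -0.2)
)

effects <- effects %>%
  mutate(
    # Order the county factor by treatment effect so the bars are sorted.
    county = reorder(county, effect),
    # Flag negative effects so they can be drawn in a different color.
    direction = ifelse(effect < 0, "Negative", "Positive")
  )

p1 <- ggplot(effects, aes(x = county, y = effect, fill = direction)) +
  geom_col() +
  coord_flip() +  # horizontal bars are easier to compare
  scale_fill_manual(
    values = c("Positive" = "steelblue", "Negative" = "firebrick"),
    name = "Treatment effect"
  ) +
  labs(x = NULL, y = "Treatment effect (percentage points)")
```

Because coord_flip() draws the first factor level at the bottom, sorting the levels in ascending order puts the largest treatment effect at the top of the chart.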

Graph 2

Will the treatment letter be more effective among certain age groups?

This graph shows the average treatment effects (across the 27 counties) for three age groups. In politics, voters are conventionally broken down into three age groups: 1) elderly voters (who generally do not change their voting habits); 2) Millennials (who are usually affected more by mobilization tactics other than direct mail); and 3) middle-aged adult voters. Our error bars are large because we averaged by county, which left us with a small sample size of just 27. If we had used the statewide data instead, we most likely would have seen the same trend that the bars show in this graph: the treatment mobilized middle-aged voters more than it did the other two age groups. This is intuitive given the nature of these age groups: young voters do not open or check their mail as often as older voters, and elderly voters are generally unaffected by treatment letters because they are unlikely to change their voting habits simply because of a social pressure letter.

To create this graph, we used geom_bar with error bars in ggplot2. We first had to mutate a column to create factor levels for each age group. Beyond that and loading the dataset, this was one of the graphs that did not require much data manipulation. We used an RColorBrewer palette to choose colors that made sense with the data, and we used subtitles and appropriate labels to emphasize aspects of the graph. Additionally, we adjusted other aesthetics, such as the width of the bars and the theme, to make the graph more visually appealing.
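A minimal sketch of this kind of chart is shown below. The county-level effects are hypothetical, only four counties per age group are used to keep the example short, and the error bars use a rough normal approximation (mean plus or minus 1.96 standard errors), which is an assumption rather than the exact calculation behind our graph.

```r
library(dplyr)
library(ggplot2)

# HYPOTHETICAL county-level treatment effects (percentage points).
county_effects <- data.frame(
  age_group = rep(c("18-30", "31-60", "61-99"), each = 4),
  effect    = c(0.2, 1.0, -0.4, 0.8,
                1.5, 2.1,  0.9, 1.9,
                0.1, 0.7, -0.3, 0.3)
)

age_summary <- county_effects %>%
  # Fix the order of the age groups with explicit factor levels.
  mutate(age_group = factor(age_group,
                            levels = c("18-30", "31-60", "61-99"))) %>%
  group_by(age_group) %>%
  summarise(
    mean_effect = mean(effect),
    se          = sd(effect) / sqrt(n()),
    .groups     = "drop"
  )

p2 <- ggplot(age_summary,
             aes(x = age_group, y = mean_effect, fill = age_group)) +
  geom_col(width = 0.6) +
  geom_errorbar(aes(ymin = mean_effect - 1.96 * se,
                    ymax = mean_effect + 1.96 * se),
                width = 0.2) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(x = "Age group",
       y = "Average treatment effect (percentage points)",
       subtitle = "Error bars show approximate 95% intervals")
```

Here geom_col() is used because the bar heights are already computed; geom_bar(stat = "identity") is equivalent.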

Graph 3

Will the treatment letter be more effective among certain demographic groups?

Graph 3 shows early voting turnout for the treatment and control groups among Black and White voters in each county. Although it appears that the treatment mobilized Black voters more than White voters, and that Black turnout was higher in general (as shown by the longer bars in the control groups), there are extremely high levels of uncertainty for the same reasons as in Graphs 1 and 2. Nevertheless, the difference in early voting turnout between Black and White voters appears to be substantial. A possible way to show this difference more clearly would be to use individual-level statewide data instead of county-level data; however, for the purposes of Professor Mann’s experiment, we compared treatment effects among counties.

To create this chart, we first had to create individual data frames for each category of voter (Black control, Black treatment, White control, and White treatment). We then combined these data frames with rbind() so that we could compare turnout for each type of voter on one graph. We set the column names for this data frame as “county,” “turnout,” and “type.” Once we had created the necessary data frame, we used ggplot2 to create a bar chart and used scale_fill_manual() to set custom colors for each type of voter and to create a legend showing the corresponding color. We removed the y-axis labels in the theme, as they would not provide meaningful information given that there is a legend. Finally, we faceted by county and flipped the axes to make the bars horizontal, as this makes it easier to discern differences in turnout between the types of voters.
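The data frame construction and chart described above can be sketched as follows. The turnout values, county names, and colors are hypothetical; only the column names (county, turnout, type) and the overall approach come from our description.

```r
library(ggplot2)

# HYPOTHETICAL turnout rates (percent) for two placeholder counties.
counties <- c("County A", "County B")
black_control   <- data.frame(county = counties, turnout = c(21.0, 18.5),
                              type = "Black control")
black_treatment <- data.frame(county = counties, turnout = c(22.4, 19.1),
                              type = "Black treatment")
white_control   <- data.frame(county = counties, turnout = c(17.2, 15.8),
                              type = "White control")
white_treatment <- data.frame(county = counties, turnout = c(17.9, 16.0),
                              type = "White treatment")

# Stack the four data frames into one long data frame.
turnout_long <- rbind(black_control, black_treatment,
                      white_control, white_treatment)

p3 <- ggplot(turnout_long, aes(x = type, y = turnout, fill = type)) +
  geom_col() +
  facet_wrap(~ county) +  # one panel per county
  coord_flip() +          # horizontal bars
  scale_fill_manual(
    values = c("Black control"   = "grey60",
               "Black treatment" = "black",
               "White control"   = "lightblue",
               "White treatment" = "steelblue"),
    name = "Voter type"
  ) +
  labs(x = NULL, y = "Early voting turnout (%)") +
  # The legend already identifies each bar, so drop the redundant labels.
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```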

Graph 4

Will the treatment letter only be effective if the polling places are in close proximity to the voters?

For this graph, we were interested in seeing whether the treatment effect was more prominent among households that were closer to polling places. We chose four counties, two with high treatment effects and two with low treatment effects, to look at the density of voters (red dots) relative to the location of polling places (black dots). As shown in the chart, the counties whose polling places were located in areas densely populated with registered voters had higher treatment effects. In Gaston and Wake there are high densities of voters in places that do not have polling places, while in Alamance and Buncombe there are polling places wherever there is a high density of voters. Keep in mind, however, that as with many of our other graphs, there is a high level of uncertainty in the treatment effect.

In order to create this graph, we first had to create a new data frame. We had to merge the datasets by county so that the polling place latitudes and longitudes and the household latitudes and longitudes were all in one data frame. Additionally, we had to (with the help of Professor Lopez, of course) reshape the data frame so that households were matched with the polling places in the same county. We did this by giving each set of locations a type (household or polling place) and then binding the two sets together. We used custom colors to get the red versus black contrast, and different point sizes to emphasize the difference between households and polling places. This was a really cool graph to make!
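One way to build a combined data frame like this is sketched below. All coordinates are hypothetical, "County X" is a placeholder for a county outside the selection, and the column names (lat, lon, type) and point sizes are assumptions; the sketch only illustrates the idea of tagging each set of locations with a type and binding them together.

```r
library(ggplot2)

# HYPOTHETICAL household and polling place coordinates.
households <- data.frame(
  county = c("Alamance", "Alamance", "Wake", "Wake", "Wake"),
  lat    = c(36.05, 36.09, 35.78, 35.82, 35.90),
  lon    = c(-79.40, -79.44, -78.64, -78.70, -78.55)
)
polling_places <- data.frame(
  county = c("Alamance", "Wake", "County X"),
  lat    = c(36.07, 35.80, 35.20),
  lon    = c(-79.42, -78.66, -80.10)
)

# Tag each set of locations with its type.
households$type     <- "Household"
polling_places$type <- "Polling place"

# Keep only polling places in counties that also have household data.
polling_places <- polling_places[polling_places$county %in%
                                   households$county, ]

# Bind the two sets into one data frame for plotting.
locations <- rbind(households, polling_places)

p4 <- ggplot(locations, aes(x = lon, y = lat, color = type, size = type)) +
  geom_point(alpha = 0.7) +
  facet_wrap(~ county, scales = "free") +  # each county gets its own axes
  scale_color_manual(values = c("Household" = "red",
                                "Polling place" = "black"),
                     name = NULL) +
  scale_size_manual(values = c("Household" = 1, "Polling place" = 3),
                    name = NULL) +
  labs(x = "Longitude", y = "Latitude")
```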

Graph 5

Will better polling place accessibility affect turnout?

In this graph, we wanted to test whether the total number of open hours during the two weeks of early voting had an effect on overall turnout in each county. We expected more hours to produce higher turnout rates; however, this graph has too much uncertainty to show a clear trend. Perhaps if we had plotted all registered voters, or had more counties, we would see a linear trend, but we cannot make that assumption based on the data.

For this graph, we had to create a new dataset containing both the polling place hours and the turnout information by merging on county name. In ggplot2, we used the size and label aesthetics to draw each county as a labeled point sized by its number of registered voters. To keep small counties from having the same influence as large ones, we also weighted the trend line by the number of registered voters, so that it accounts for the different sizes of the counties.
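The merge and the weighted trend line can be sketched as follows. The hours, turnout rates, and registration counts are hypothetical, and the column names are assumptions. The separate lm() call is included only to make explicit what the weight aesthetic in geom_smooth() is doing, namely a weighted least squares fit.

```r
library(ggplot2)

# HYPOTHETICAL polling place hours and turnout data.
hours <- data.frame(
  county      = c("County A", "County B", "County C", "County D"),
  total_hours = c(120, 300, 210, 450)
)
turnout <- data.frame(
  county     = c("County A", "County B", "County C", "County D"),
  turnout    = c(14.2, 16.8, 15.1, 17.5),   # percent
  registered = c(20000, 150000, 60000, 400000)
)

# Merge the two datasets on county name.
county_data <- merge(hours, turnout, by = "county")

# Weighted least squares: larger counties count for more in the fit.
fit <- lm(turnout ~ total_hours, data = county_data, weights = registered)

p5 <- ggplot(county_data, aes(x = total_hours, y = turnout)) +
  geom_point(aes(size = registered), alpha = 0.6) +
  geom_text(aes(label = county), vjust = -1, size = 3) +
  geom_smooth(aes(weight = registered), method = "lm", formula = y ~ x) +
  scale_size_continuous(name = "Registered voters") +
  labs(x = "Total early voting hours",
       y = "Early voting turnout (%)")
```

With only a handful of counties, the confidence band around the fitted line is wide, which matches the uncertainty noted above.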

Graph 6

Will polling place accessibility disproportionately affect certain demographic groups?

Graph 6 shows the relationship between the number of weekend hours polling places were open and early voting turnout for Black and White voters. Considering the uncertainty, a higher number of weekend hours appears to have had no effect on early voting turnout among either group. However, like Graph 3, this chart shows that Black voters turned out at higher levels than White voters during the early voting period. The size of each county’s points (one for Black turnout and one for White turnout) represents the number of registered voters in the county. We chose to look at weekend hours in addition to total hours because we wanted to see whether turnout would increase with weekend availability, since voters have more free time on weekends than during the work or school week.

To create this graph, we used geom_jitter, as certain counties overlapped and we wanted to see each observation. We chose a linear trend line because we were interested in finding a linear relationship between weekend accessibility and early voting turnout (which we did not find). We also weighted the trend line to account for the number of registered voters in each county. To get the turnout data and trend lines for both Black and White voters, we put the aesthetic mappings in the geom_jitter and geom_smooth functions rather than in ggplot(). Finally, we created a scale so that the size of each point represented the number of registered voters in the county.
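A minimal sketch of this layering approach is shown below. The data are hypothetical, and the column names, colors, and jitter width are assumptions. The key point is that each layer carries its own aes() mapping, so the Black and White turnout columns can be drawn from the same data frame.

```r
library(ggplot2)

# HYPOTHETICAL weekend hours, turnout rates (percent), and registrations.
weekend <- data.frame(
  county        = c("County A", "County B", "County C",
                    "County D", "County E"),
  weekend_hours = c(8, 8, 16, 24, 16),
  black_turnout = c(20.1, 22.3, 19.8, 21.5, 23.0),
  white_turnout = c(16.4, 17.0, 15.9, 16.8, 17.3),
  registered    = c(20000, 150000, 60000, 400000, 90000)
)

p6 <- ggplot(weekend) +
  # One jitter layer per group, each with its own aesthetic mapping.
  geom_jitter(aes(x = weekend_hours, y = black_turnout,
                  size = registered, color = "Black"),
              width = 0.5, height = 0, alpha = 0.6) +
  geom_jitter(aes(x = weekend_hours, y = white_turnout,
                  size = registered, color = "White"),
              width = 0.5, height = 0, alpha = 0.6) +
  # Linear trend lines weighted by the number of registered voters.
  geom_smooth(aes(x = weekend_hours, y = black_turnout,
                  weight = registered, color = "Black"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(x = weekend_hours, y = white_turnout,
                  weight = registered, color = "White"),
              method = "lm", formula = y ~ x, se = FALSE) +
  scale_color_manual(values = c("Black" = "darkorange",
                                "White" = "steelblue"),
                     name = "Race") +
  scale_size_continuous(name = "Registered voters", range = c(2, 8)) +
  labs(x = "Weekend hours open", y = "Early voting turnout (%)")
```

Setting height = 0 in geom_jitter() moves the points only horizontally, so the plotted turnout values stay exact.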

Conclusion

We spent a semester studying field experiments in Political Science while simultaneously wrangling data and creating visualizations for treatments that we came to know well. Although the noisy data kept us from identifying any definite trends, we were able to see how complex it really is to draw trends from Social Science experiments. Overall, we are able to say that the treatment letter did indeed work. However, the more focused trends that we were trying to see by comparing age groups or demographic groups, such as the treatment working better for middle-aged voters or mobilizing African-American voters more than White voters, may not hold because of the noise in the data. The variability made it very difficult to look at these visualizations with an unbiased perspective; however, we learned a lesson from it.

This project is not over for us. Since we agreed to help Professor Mann create visualizations for the journal articles he is planning to write, we will be working on this data throughout winter break and adding more visualizations that explore even more hypotheses. Both Professor Mann and we are interested in a graph similar to Graph 5 that compares the treatment effect, rather than overall turnout, to the total number of weekend hours. We also want to use statewide, individual-level voter data for the age graph, so that it covers more subjects and has less error, and to look further into the data from Graph 4 by including factors such as geographical limitations, population size, and demographics, and how they affect mobilization based on distance. Overall, this project was extremely beneficial for understanding how data visualizations can help or hinder the findings of an experiment, and how looking at data from different perspectives can change the way visualizations are made and analyzed.