Module 4: Dumbbell Plot for Burlington Election 2024

Data Description

The bton_elec data set has the results of the 2024 election for Burlington mayor and for the 8 city councilors, by ward (wards 1-8). All residents vote for the mayor, but each resident votes only for the city councilor of their own ward.

Joan Shannon (D) lost the mayoral election even though five of the eight city councilors belong to her party. A split like this isn’t rare, but the winner of the larger race typically matches the overall results of the smaller races. When that doesn’t happen, we’re often interested in finding out why.

One possible explanation is gerrymandering: drawing district boundaries so that the vast majority of voters in a few districts support a single party, which keeps those votes from influencing the other districts.

Another explanation is that a candidate in the larger race (mayor, in this example) is less popular than their party’s candidates in the smaller races (councilors) within the individual districts or wards.

Let’s compare Joan Shannon’s vote percentage to the councilors’ vote percentages by ward.

Direct Way:

We can create a dumbbell plot using two geom_point() layers and one geom_segment() layer:
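The direct approach might look something like the sketch below. It is not the module's original code: bton_demo is a hypothetical two-ward stand-in for bton_elec, with values copied from the output printed later in this module, and the wide column names (mayor_dem, council_dem, and so on) are assumed from the role_party values shown there. The colors and point size are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for bton_elec (wards 1 and 2 only).
# Values are copied from the long-format output printed later in the module;
# the wide column names are an assumption based on that output.
bton_demo <- data.frame(
  ward         = c(1, 2),
  mayor_dem    = c(0.347, 0.245),
  mayor_prog   = c(0.620, 0.724),
  council_dem  = c(0.511, 0.000),
  council_prog = c(0.478, 1.000)
)

dumbbell_wide <- ggplot(bton_demo, aes(y = factor(ward))) +
  # The bar of the dumbbell: joins the two Democratic proportions in each ward
  geom_segment(aes(x = mayor_dem, xend = council_dem, yend = factor(ward))) +
  # One end: the Democratic mayoral candidate's vote proportion
  geom_point(aes(x = mayor_dem), color = "blue", size = 3) +
  # Other end: the Democratic council candidate's vote proportion
  geom_point(aes(x = council_dem), color = "orange", size = 3) +
  labs(x = "Vote proportion", y = "Ward")

dumbbell_wide
```

Each end of the dumbbell needs its own geom_point() call here because the two proportions live in separate columns, and the colors are set by hand rather than mapped from the data.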

An important part of the graph above is missing. What is it?

Improved dumbbell plot: Using the long format data

If we want ggplot() to automatically make a legend, we need to map a column to the color or fill aesthetic.

So we need to stack the values of the non-ward columns into a single column (called vote_prop) and store their column names in a second column (called role_party). That’s where the pivot_longer() function comes to the rescue!

We’ll use 3 arguments of pivot_longer():

cols = the columns we want to stack (pivot) on top of one another
names_to = the name of the column that stores the old column names
values_to = the name of the column that stores the values of the old columns

Let’s put the data into the long format and save it as bton_long. pivot_longer() is a “big” function that we can pipe our data sets into!
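A minimal sketch of the pivot, assuming the wide column names implied by the output that follows. bton_demo and bton_long_demo are hypothetical two-ward stand-ins, so this produces 8 rows rather than the 32 rows of the full bton_long.

```r
library(tidyr)

# Hypothetical stand-in for bton_elec (wards 1 and 2 only);
# the wide column names are assumed from the role_party values in the output.
bton_demo <- data.frame(
  ward         = c(1, 2),
  mayor_dem    = c(0.347, 0.245),
  mayor_prog   = c(0.620, 0.724),
  council_dem  = c(0.511, 0.000),
  council_prog = c(0.478, 1.000)
)

bton_long_demo <- bton_demo |>
  pivot_longer(
    cols      = -ward,          # stack every column except ward
    names_to  = "role_party",   # old column names go here
    values_to = "vote_prop"     # old cell values go here
  )

bton_long_demo
```

Writing cols = -ward selects every column other than ward, which avoids listing the four vote columns by name.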

## # A tibble: 32 × 3
##     ward role_party   vote_prop
##    <int> <chr>            <dbl>
##  1     1 mayor_dem        0.347
##  2     1 mayor_prog       0.62 
##  3     1 council_dem      0.511
##  4     1 council_prog     0.478
##  5     2 mayor_dem        0.245
##  6     2 mayor_prog       0.724
##  7     2 council_dem      0    
##  8     2 council_prog     1    
##  9     3 mayor_dem        0.326
## 10     3 mayor_prog       0.639
## # ℹ 22 more rows

Unfortunately, the role (mayor/councilor) is currently in the same column as the party. We need some way of separating the role_party column into two columns: role and party. Fortunately, the separate() function from tidyr does just that!
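One way the split could be written, shown on a hypothetical four-row stand-in (ward 1 only) rather than the full bton_long. The sep = "_" argument is an assumption that matches the underscore in values such as mayor_dem.

```r
library(tidyr)

# Hypothetical stand-in for bton_long: the four ward 1 rows from the output above
bton_long_demo <- data.frame(
  ward       = c(1, 1, 1, 1),
  role_party = c("mayor_dem", "mayor_prog", "council_dem", "council_prog"),
  vote_prop  = c(0.347, 0.620, 0.511, 0.478)
)

bton_sep_demo <- bton_long_demo |>
  separate(
    col  = role_party,          # the column to split
    into = c("role", "party"),  # names of the two new columns
    sep  = "_"                  # split at the underscore
  )

bton_sep_demo
```

By default separate() drops the original role_party column and puts the new columns in its place, which gives the four-column layout shown in the output that follows.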

## # A tibble: 32 × 4
##     ward role    party vote_prop
##    <int> <chr>   <chr>     <dbl>
##  1     1 mayor   dem       0.347
##  2     1 mayor   prog      0.62 
##  3     1 council dem       0.511
##  4     1 council prog      0.478
##  5     2 mayor   dem       0.245
##  6     2 mayor   prog      0.724
##  7     2 council dem       0    
##  8     2 council prog      1    
##  9     3 mayor   dem       0.326
## 10     3 mayor   prog      0.639
## # ℹ 22 more rows

Now let’s try again to make the dumbbell plot from the long format data, this time using one geom_point() layer and one geom_line() layer.
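A sketch of what the long-format version could look like. The data frame is a hypothetical two-ward stand-in built from the printed values, and the layout choices (one panel per party via facet_wrap(), color mapped to role) are assumptions, not necessarily how the module's original plot was drawn.

```r
library(ggplot2)

# Hypothetical stand-in for the separated long data (wards 1 and 2 only),
# with values copied from the output printed above
bton_long_demo <- data.frame(
  ward      = rep(c(1, 2), each = 4),
  role      = rep(c("mayor", "mayor", "council", "council"), times = 2),
  party     = rep(c("dem", "prog"), times = 4),
  vote_prop = c(0.347, 0.620, 0.511, 0.478,
                0.245, 0.724, 0.000, 1.000)
)

dumbbell_long <- ggplot(bton_long_demo, aes(x = vote_prop, y = factor(ward))) +
  # The bar: one line per ward joining the mayor and council points
  geom_line(aes(group = ward), color = "grey50") +
  # Both ends at once; mapping role to color makes ggplot build the legend
  geom_point(aes(color = role), size = 3) +
  # One panel per party so each ward compares candidates of the same party
  facet_wrap(~ party) +
  labs(x = "Vote proportion", y = "Ward", color = "Role")

dumbbell_long
```

Because role is mapped inside aes(), a single geom_point() draws both ends of every dumbbell and the legend comes for free.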

The plot above shows that Joan (Dem mayor) was less popular than the corresponding Democratic councilor in 6 of the 8 wards, while Emma (Prog mayor) was more popular than the corresponding Progressive councilor in 6 of the 8 wards (with ward 3 being nearly identical).

Joan’s loss seems to stem from her being less liked, trusted, or popular than the Democrats running in the individual wards, even though more Democratic councilors were elected than Progressive ones.

Jacob Martin

DS 1870
