The bton_elec data set has the results of the 2024 election for Burlington mayor and the 8 city councilors by ward (wards 1 - 8). All residents get to vote for the mayor, but each resident only gets to vote for the city councilor of their own ward.
Joan Shannon (D) ended up losing the mayoral election even though five of the elected city councilors belong to her party. A split like that isn’t rare, but typically the winner of the larger race matches the overall results of the smaller elections. When that doesn’t happen, we’re often interested in finding out why.
One possible explanation is gerrymandering: drawing districts so that the vast majority of voters in a few of them vote for a single party, which keeps those votes out of the other districts.
Another explanation is that one candidate in the larger race (mayor in this example) is less popular than their party’s counterparts in the smaller races (councilors) across the districts or wards.
Let’s compare Joan Shannon’s vote percentage to the councilors’ vote percentages by ward.
We can create a dumbbell plot using two geom_point() layers and one geom_segment() layer, as seen below:
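A minimal sketch of that approach, assuming bton_elec is in wide format with one column per role and party (mayor_dem, mayor_prog, council_dem, council_prog). It uses only wards 1 and 2, with values taken from the output printed later in this section. The object names bton_wide and dumbbell_wide are hypothetical, and the colors are arbitrary choices.

```r
library(ggplot2)

# Wards 1 and 2 only; values copied from the printed output in this section.
# The wide layout and column names are assumptions about bton_elec.
bton_wide <- data.frame(
  ward         = c(1L, 2L),
  mayor_dem    = c(0.347, 0.245),
  mayor_prog   = c(0.620, 0.724),
  council_dem  = c(0.511, 0.000),
  council_prog = c(0.478, 1.000)
)

dumbbell_wide <- ggplot(bton_wide, aes(y = factor(ward))) +
  # Bar of the dumbbell: joins the Democratic mayor and council proportions
  geom_segment(aes(x = mayor_dem, xend = council_dem, yend = factor(ward))) +
  # Ends of the dumbbell: one geom_point() per column, colors set as constants
  geom_point(aes(x = mayor_dem), color = "blue", size = 3) +
  geom_point(aes(x = council_dem), color = "skyblue", size = 3) +
  labs(x = "Vote proportion", y = "Ward")

dumbbell_wide
```

Each end of the dumbbell needs its own geom_point() call here because the two proportions live in separate columns.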
An important part of the graph above is missing. What is it?
If we want ggplot() to automatically make a legend, we need to map a column to the color or fill aesthetic.
So we need to stack the values of the non-ward columns into a single column (called vote_prop) and put the column names into a second column (called role_party). That’s where the pivot_longer() function comes to the rescue!
pivot_longer() has 3 main arguments:
- cols = the columns we want to stack (pivot) on top of one another
- names_to = the name of the column that stores the old column names
- values_to = the name of the column that stores the values of the old columns

Let’s put the data into the long format and save it as bton_long. pivot_longer() is a “big” function that we can pipe our data sets into!
## # A tibble: 32 × 3
## ward role_party vote_prop
## <int> <chr> <dbl>
## 1 1 mayor_dem 0.347
## 2 1 mayor_prog 0.62
## 3 1 council_dem 0.511
## 4 1 council_prog 0.478
## 5 2 mayor_dem 0.245
## 6 2 mayor_prog 0.724
## 7 2 council_dem 0
## 8 2 council_prog 1
## 9 3 mayor_dem 0.326
## 10 3 mayor_prog 0.639
## # ℹ 22 more rows
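A minimal sketch of the pivot, assuming the wide data has the four columns named in role_party above. Only wards 1 and 2 are included (values copied from the output above), so the result has 8 rows rather than 32. The name bton_wide is hypothetical.

```r
library(tidyr)

# Wards 1 and 2 only; the wide layout is an assumption about bton_elec.
bton_wide <- data.frame(
  ward         = c(1L, 2L),
  mayor_dem    = c(0.347, 0.245),
  mayor_prog   = c(0.620, 0.724),
  council_dem  = c(0.511, 0.000),
  council_prog = c(0.478, 1.000)
)

bton_long <- bton_wide |>
  pivot_longer(
    cols      = -ward,          # stack every column except ward
    names_to  = "role_party",   # old column names go here
    values_to = "vote_prop"     # old column values go here
  )

bton_long
```

Writing cols = -ward selects every column except ward, which avoids listing the four vote columns by hand.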
Unfortunately, the role (mayor/councilor) is currently in the same column as the party. We need some way of separating the role_party column into two columns: role and party. Fortunately, the separate() function seen below does just that!
## # A tibble: 32 × 4
## ward role party vote_prop
## <int> <chr> <chr> <dbl>
## 1 1 mayor dem 0.347
## 2 1 mayor prog 0.62
## 3 1 council dem 0.511
## 4 1 council prog 0.478
## 5 2 mayor dem 0.245
## 6 2 mayor prog 0.724
## 7 2 council dem 0
## 8 2 council prog 1
## 9 3 mayor dem 0.326
## 10 3 mayor prog 0.639
## # ℹ 22 more rows
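A minimal sketch of the split, rebuilding the first 8 rows of the long data from the output shown earlier. The result name bton_sep is hypothetical; the original may simply overwrite bton_long.

```r
library(tidyr)

# First 8 rows of the long data (wards 1 and 2), copied from the output above.
bton_long <- data.frame(
  ward       = rep(1:2, each = 4),
  role_party = rep(c("mayor_dem", "mayor_prog", "council_dem", "council_prog"),
                   times = 2),
  vote_prop  = c(0.347, 0.620, 0.511, 0.478, 0.245, 0.724, 0, 1)
)

bton_sep <- bton_long |>
  separate(
    col  = role_party,          # column to split
    into = c("role", "party"),  # names of the two new columns
    sep  = "_"                  # split at the underscore
  )

bton_sep
```

By default separate() drops the original role_party column, which is why the output has ward, role, party, and vote_prop only.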
Now let’s try again to make the dumbbell plot with the long format data, this time using one geom_point() layer and one geom_line() layer.
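A minimal sketch of the long-format version, assuming the data has the columns ward, role, party, and vote_prop shown in the output above. Only wards 1 and 2 are included. Faceting by party is an assumption about the layout, and the names bton_sep and dumbbell_long are hypothetical.

```r
library(ggplot2)

# Wards 1 and 2 only; values copied from the printed output in this section.
bton_sep <- data.frame(
  ward      = rep(1:2, each = 4),
  role      = rep(c("mayor", "mayor", "council", "council"), times = 2),
  party     = rep(c("dem", "prog"), times = 4),
  vote_prop = c(0.347, 0.620, 0.511, 0.478, 0.245, 0.724, 0, 1)
)

dumbbell_long <- ggplot(bton_sep, aes(x = vote_prop, y = factor(ward))) +
  # One line per ward in each panel joins the mayor and council points
  geom_line(aes(group = ward), color = "grey50") +
  # Mapping role to color makes ggplot() build the legend automatically
  geom_point(aes(color = role), size = 3) +
  # One panel per party (a layout assumption, not taken from the original)
  facet_wrap(~ party) +
  labs(x = "Vote proportion", y = "Ward", color = "Role")

dumbbell_long
```

Because role is now a column, a single geom_point() draws both ends of every dumbbell and the legend comes for free.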
The plot above shows that Joan (the Democratic mayoral candidate) was less popular than the corresponding Democratic council candidate in 6 of the 8 wards, while Emma (the Progressive mayoral candidate) was more popular than the corresponding Progressive council candidate in 6 of the 8 wards (with ward 3 being nearly identical).
Joan’s loss seems to come down to her being less liked/trusted/popular than the Democratic candidates running in the wards, even though more Democratic councilors were elected than Progressive ones.