The Nineteenth Amendment was ratified in August 1920. Before then, women were generally not allowed to vote, although the rules varied by state. Their disenfranchisement did not, however, limit their involvement in the political process. In fact, many women ran for, and won, public office. From County Superintendent for Education all the way to the Presidency, women campaigned for office across the country. Exploring this often-overlooked history offers greater insight into the political process and gender in the late nineteenth and early twentieth centuries.
I am pulling my data from Her Hat Was In The Ring!, a website and database built in collaboration between scholars and the HistoryIT company [1]. The project aggregated information on 3,327 women who ran a total of 4,572 campaigns for office prior to the passage of the Nineteenth Amendment. The biographical and election data come from newspapers, government reports, biographies, county histories, and other sites. Each woman in the database has detailed campaign information, a short biography, an image (if available), and citations to the primary source(s) used. The user can also search the database for specific candidates or by office, state, and party.
For the sake of this project, I decided to transcribe the data for all women who ran for their State Senate or State House of Representatives/Assembly. I chose these two offices because they would give me a sizable dataset without the transcription process becoming unwieldy. They also provide a window into women's involvement in state-level government. Which states were more receptive to female candidates? Which years saw the most female candidates? Which parties backed female candidates? I hope to provide at least a partial answer to each of these questions through my analysis.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(broom)
library(tidyr)
library(readr)
library(stringr)
library(USAboundaries)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(sp)
library(maptools)
## Checking rgeos availability: TRUE
library(rgdal)
## rgdal: version: 1.1-8, (SVN revision 616)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.9.2, released 2012/10/08
## Path to GDAL shared files: /usr/share/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)
## Linking to sp version: 1.2-2
library(ggmap)
library(RColorBrewer)
library(scales)
##
## Attaching package: 'scales'
## The following objects are masked from 'package:readr':
##
## col_factor, col_numeric
I begin by reading in the CSV file for my female politicians. I also created a data_id_list table that will be used to connect my politicians dataframe with my states dataframe later. Essentially, it has the state names (the key for my politicians dataframe) and an id variable (the key for my states dataframe). I also chose to exclude both the Alaskan and Hawaiian Territories from my map. My reasoning was twofold. First, of the 312 rows of data I have, no one ran for office in those territories. Second, by focusing only on the continental United States, the maps are “easier on the eyes” because the US is oriented the way many people visualize it.
# Read in my CSV files, load the 1920 state boundaries (for the lower 48), and create my states_df
politicians <- read_csv("her_hat_state_reps.csv")
data_id_list<- read_csv("data_id_list.csv")
continental_us_1920 <- c("Maine", "New Hampshire", "Vermont", "Massachusetts", "Rhode Island", "Connecticut", "New York", "New Jersey", "Pennsylvania", "Maryland", "Delaware", "Virginia", "West Virginia", "North Carolina", "South Carolina", "Georgia", "Florida", "Ohio", "Kentucky", "Tennessee", "Indiana", "Michigan", "Mississippi", "Alabama", "Iowa", "Wisconsin", "Louisiana", "Arkansas", "Missouri", "Illinois", "Minnesota", "North Dakota", "South Dakota", "Nebraska", "Kansas", "Texas", "Montana", "Colorado", "Wyoming", "New Mexico","Arizona", "Utah", "Idaho", "Washington", "Oregon", "California","Nevada", "Oklahoma")
states_1920<-us_states("1920-12-31", states = continental_us_1920)
states_df <- tidy(states_1920, region = "id")
My politicians dataset contains mostly qualitative data. That is to say, it does not provide numeric variables that translate easily into maps or graphs.
politicians
## Source: local data frame [312 x 12]
##
## last_name first_name_initial election_year elected city county
## (chr) (chr) (int) (lgl) (chr) (chr)
## 1 Algeo Sara MacCormack 1920 NA Providence NA
## 2 Anthony Mabel E. 1914 FALSE NA NA
## 3 Brewer Mary G. 1918 FALSE NA NA
## 4 Brewer Frances 1918 NA Staten Island NA
## 5 Brown Margaret Tobin 1901 FALSE NA NA
## 6 Brown Margaret Tobin 1914 FALSE NA NA
## 7 Cannon Martha Hughes 1896 TRUE NA NA
## 8 Clark Lucy A. 1896 FALSE NA NA
## 9 Clarke Kathryn 1915 TRUE NA NA
## 10 Cryderman Dora W. 1914 FALSE NA NA
## .. ... ... ... ... ... ...
## Variables not shown: district (chr), state (chr), title (chr), position
## (chr), political_party_1 (chr), political_party_2 (chr)
So, before I left_join my politicians dataset with my states dataframe, I need to create some variables that quantify the results for the women I transcribed.
# Creates a unique name_id for each observation
politicians$name_id <- with(politicians, paste0(last_name, first_name_initial, election_year))
# Total number of campaigns per state
politicians <-
politicians %>%
group_by(state) %>%
mutate(total_campaigns_per_state = n())
# Total number of successful campaigns per state
politicians <- politicians %>%
group_by(state) %>%
mutate(total_winning_campaigns_per_state = sum(elected, na.rm = TRUE))
# Total campaigns per year
politicians <- politicians %>%
group_by(election_year) %>%
mutate(total_campaigns_per_year = n())
# Percent winning campaigns per state
politicians <- politicians %>%
group_by(state) %>%
mutate(percent_winning_per_state = round( (total_winning_campaigns_per_state/total_campaigns_per_state)*100, 2))
It is also important to remember that the data is incomplete. Below I show that while I have 312 campaigns in my dataset, I am missing the election outcome for 67 of them. Johanna Drucker would want us to remind our readers that the data we have is incomplete because the records themselves are often incomplete. While I agree, I am not referring to my data as capta…
#Checking on NA values for the elected variable
count_na <- function(x) sum(is.na(x))
count_na(politicians$elected)
## [1] 67
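Because the per-state percentages above divide wins by all campaigns, any campaign with a missing outcome is effectively counted as a loss. The sketch below, on a small made-up table (the `toy` data frame and its values are hypothetical, not drawn from my dataset), shows how much the rate can move when only campaigns with known outcomes are used as the denominator.

```r
library(dplyr)

# Hypothetical toy data: Utah has one campaign with an unknown outcome.
toy <- data.frame(
  state   = c("Utah", "Utah", "Utah", "Idaho"),
  elected = c(TRUE, FALSE, NA, TRUE),
  stringsAsFactors = FALSE
)

win_rates <- toy %>%
  group_by(state) %>%
  summarise(
    campaigns      = n(),                         # every campaign, known or not
    known_outcomes = sum(!is.na(elected)),        # campaigns with a recorded result
    wins           = sum(elected, na.rm = TRUE),  # recorded wins only
    pct_of_all     = round(100 * wins / campaigns, 2),
    pct_of_known   = round(100 * wins / known_outcomes, 2)
  )

print(win_rates)
```

Reporting both columns side by side is one way to keep the 67 missing outcomes visible instead of silently folding them into the losses.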
Here I join my politicians dataset with my data id table. Then I join the states dataframe to my politicians+data_id table thus connecting my politicians data with the states dataframe.
politicians_b <- politicians %>%
left_join(data_id_list, by = c("state" = "state_name"))
states_df_data<- states_df %>%
left_join(politicians_b, by = c("id" = "state_id"))
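The two-step join can be sketched on toy data. Everything in this example is a hypothetical stand-in (the `campaigns`, `lookup`, and `shapes` tables, the id values, and the misspelled state name); it only illustrates how a lookup table bridges two different keys, and how an `anti_join` can flag state names that would silently fail to match.

```r
library(dplyr)

# Hypothetical campaign rows, keyed by state name (one name is deliberately off).
campaigns <- data.frame(
  state   = c("Colorado", "Colorado", "Utah", "N. Dakota"),
  elected = c(TRUE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: state name -> id used by the boundary data.
lookup <- data.frame(
  state_name = c("Colorado", "Utah", "Wyoming"),
  state_id   = c("co", "ut", "wy"),
  stringsAsFactors = FALSE
)

# Hypothetical boundary rows, keyed by id (one point per state for brevity).
shapes <- data.frame(
  id   = c("co", "ut", "wy"),
  long = c(-105.5, -111.9, -107.5),
  lat  = c(39.0, 39.3, 43.0),
  stringsAsFactors = FALSE
)

# Step 1: attach the id to each campaign.
campaigns_keyed <- campaigns %>%
  left_join(lookup, by = c("state" = "state_name"))

# Sanity check: campaigns whose state name has no entry in the lookup table.
unmatched <- campaigns %>%
  anti_join(lookup, by = c("state" = "state_name"))

# Step 2: attach campaigns to the boundary rows.
shapes_data <- shapes %>%
  left_join(campaigns_keyed, by = c("id" = "state_id"))

print(unmatched)
print(shapes_data)
```

Note that the joined table has one row per boundary point per campaign, so states with many campaigns are repeated many times, and states without campaigns keep a single row of missing values.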
Now we get to the fun part: mapping. I begin by mapping the total number of campaigns by state. This lets me see, generally, which states had campaigns and where those campaigns were concentrated. Then I create a map of the total number of winning campaigns, followed by a map of the percentage of campaigns that ended with a win.
# Map of the total campaigns by state
ggplot() +
geom_map(data = states_df_data, map = states_df_data,
aes(x = long, y = lat, map_id = id, group = group,
fill = total_campaigns_per_state),
color = "gray", size = 0.25) +
coord_map(projection = "albers", lat0 = 29.5, lat1 = 45.5) +
theme_nothing(legend = TRUE) +
scale_fill_distiller(name="Campaigns", palette = "OrRd", breaks = pretty_breaks(n = 5))+
labs(title="Total Campaigns per State")
# Map of the winning campaigns by state
ggplot() +
geom_map(data = states_df_data, map = states_df_data,
aes(x = long, y = lat, map_id = id, group = group,
fill = total_winning_campaigns_per_state),
color = "gray", size = 0.25) +
coord_map(projection = "albers", lat0 = 29.5, lat1 = 45.5) +
theme_nothing(legend = TRUE) +
scale_fill_distiller(name="Wins", palette = "OrRd", breaks = pretty_breaks(n = 5))+
labs(title="Total Winning Campaigns per State")
# Map of percentage winning campaigns per state
ggplot() +
geom_map(data = states_df_data, map = states_df_data,
aes(x = long, y = lat, map_id = id, group = group,
fill = percent_winning_per_state),
color = "gray", size = 0.25) +
coord_map(projection = "albers", lat0 = 29.5, lat1 = 45.5) +
theme_nothing(legend = TRUE) +
scale_fill_distiller(name="Percent", palette = "OrRd", breaks = pretty_breaks(n = 5))+
labs(title="Percent Winning Campaigns per State")
Colorado clearly stands out as the state with the most campaigns by female candidates prior to 1920. It also has the most winning campaigns, which makes sense, but it does not have the highest winning percentage. A few states had only a handful of campaigns, all of which were successful.
politicians %>%
select (state, percent_winning_per_state, total_campaigns_per_state) %>%
unique() %>%
arrange(desc(percent_winning_per_state))
## Source: local data frame [19 x 3]
## Groups: state [19]
##
## state percent_winning_per_state total_campaigns_per_state
## (chr) (dbl) (int)
## 1 Arizona 90.91 11
## 2 California 25.00 24
## 3 Colorado 26.09 92
## 4 Connecticut 100.00 2
## 5 Idaho 100.00 5
## 6 Kansas 100.00 4
## 7 Michigan 100.00 1
## 8 Montana 100.00 5
## 9 Nebraska 0.00 1
## 10 Nevada 50.00 2
## 11 New York 10.17 59
## 12 North Carolina 100.00 1
## 13 Oklahoma 100.00 2
## 14 Oregon 14.29 35
## 15 Rhode Island 0.00 1
## 16 Utah 45.16 31
## 17 Vermont 100.00 1
## 18 Washington 17.24 29
## 19 Wyoming 83.33 6
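The table above comes out in alphabetical order by state rather than sorted by win rate, most likely because the data frame was still grouped by state when `arrange()` was called. A minimal sketch of one way around this, using a hypothetical `toy` data frame and column name `pct`, is to drop the grouping before sorting.

```r
library(dplyr)

# Hypothetical toy data, grouped by state like the politicians data frame.
toy <- data.frame(
  state = c("Arizona", "Arizona", "Colorado", "Colorado", "Idaho"),
  pct   = c(90.91, 90.91, 26.09, 26.09, 100),
  stringsAsFactors = FALSE
) %>%
  group_by(state)

ranked <- toy %>%
  ungroup() %>%        # remove the grouping so the sort applies to the whole table
  distinct() %>%       # one row per state
  arrange(desc(pct))   # highest win rate first

print(ranked)
```

Whether the grouping affects `arrange()` depends on the dplyr version, so calling `ungroup()` explicitly is simply the safer habit.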
So it is interesting that some states had a 100% success rate for their handful of campaigns. It raises the question of who these women were, and why more women were not running if their counterparts were winning public office.
Finally, let’s look at the party breakdown for these political campaigns. It would appear that the party with the most female campaigns is the Democratic Party, followed by the Republican Party. However, the Socialist and Prohibition parties also contributed a sizable number of female candidates.
# Total campaigns per party
party_counts <- subset(politicians, select = c("name_id", "political_party_1", "political_party_2")) %>%
gather(affiliation, party_, -name_id) %>%
count(party_) %>%
filter(!is.na(party_))
# 3D Exploded Pie Chart
library(plotrix)
##
## Attaching package: 'plotrix'
## The following object is masked from 'package:scales':
##
## rescale
slices <- party_counts$n
pie_chart <- pie3D(slices, explode=0.1, main="Pie Chart of Party Affiliation ")
lbls <- party_counts$party_
pie3D.labels(pie_chart, labels = lbls, labelcex = 1)
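The reshaping step behind these party counts can be sketched on toy data. The `toy` table and its candidate ids below are hypothetical; the point is that gathering both party columns into one means a campaign backed by two parties is counted once under each, so the counts can add up to more than the number of campaigns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three campaigns, one of them backed by two parties.
toy <- data.frame(
  name_id           = c("a1896", "b1914", "c1918"),
  political_party_1 = c("Democratic", "Republican", "Democratic"),
  political_party_2 = c("Populist", NA, NA),
  stringsAsFactors  = FALSE
)

party_counts_toy <- toy %>%
  gather(affiliation, party_, -name_id) %>%  # stack both party columns into one
  filter(!is.na(party_)) %>%                 # drop campaigns' empty second-party slots
  count(party_)                              # campaigns per party

print(party_counts_toy)
```

Here three campaigns yield four party affiliations, which is worth keeping in mind when reading the pie chart as shares of campaigns.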
Which party won the most campaigns? And which party was dominant in each state?
politicians %>% subset(select = c("name_id", "elected", "political_party_1", "political_party_2")) %>%
gather(affiliation, party_, -name_id, -elected) %>%
filter(elected, !is.na(party_)) %>%
count(party_) %>%
arrange(party_)
## Source: local data frame [6 x 2]
##
## party_ n
## (chr) (int)
## 1 Citizen 2
## 2 Democratic 45
## 3 Populist 6
## 4 Progressive 1
## 5 Republican 38
## 6 Silver Republican 1
state_party <- politicians %>% subset(select = c("elected","state", "political_party_1", "political_party_2")) %>%
gather(affiliation, party_, -state, -elected) %>%
filter(!is.na(party_))
state_party_b <- state_party %>%
left_join(data_id_list, by = c("state" = "state_name"))
state_party_df<- states_df %>%
left_join(state_party_b, by = c("id" = "state_id"))
ggplot() +
geom_map(data = state_party_df, map = state_party_df,
aes(x = long, y = lat, map_id = id, group = group,
fill = party_),
color = "gray", size = 0.25) +
coord_map(projection = "albers", lat0 = 29.5, lat1 = 45.5) +
theme_nothing(legend = TRUE) +
labs(title="Most Popular Party per State")
# Keep only the winning campaigns
winning_state_party_df <- state_party_df %>%
filter(elected)
# The first layer draws every state; the second fills states that had a winning campaign by party
ggplot() +
geom_map(data = state_party_df, map = state_party_df,
aes(x = long, y = lat, map_id = id, group = group),
color = "gray", size = 0.25) +
geom_map(data = winning_state_party_df, map = state_party_df,
aes(x = long, y = lat, map_id = id, group = group,
fill = party_),
color = "gray", size = 0.25) +
coord_map(projection = "albers", lat0 = 29.5, lat1 = 45.5) +
theme_nothing(legend = TRUE) +
labs(title="Dominant Party per State")
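One caveat about the two party maps: they pass every campaign row to geom_map, so the color a state ends up with may reflect row order rather than which party actually appeared most often. A minimal sketch of computing the most common party per state before mapping is shown below; the `toy` data and the `top_party` name are hypothetical, and ties are deliberately left as multiple rows rather than broken arbitrarily.

```r
library(dplyr)

# Hypothetical toy data in the same long shape as state_party.
toy <- data.frame(
  state  = c("Colorado", "Colorado", "Colorado", "Utah", "Utah", "Utah"),
  party_ = c("Democratic", "Democratic", "Republican",
             "Republican", "Republican", "Populist"),
  stringsAsFactors = FALSE
)

top_party <- toy %>%
  count(state, party_) %>%    # campaigns per state and party
  group_by(state) %>%
  filter(n == max(n)) %>%     # keep the party (or tied parties) with the most campaigns
  ungroup()

print(top_party)
```

Joining a table like this to the state boundaries, instead of the full campaign list, would give each state a single, well-defined fill value.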
What these maps and tables ultimately show me is that the western states tended to produce more female candidates for state legislative office. I would surmise that this was because the eastern states were more established in their political ways, while states such as Colorado, Oregon, and Utah were just coming into their own politically. Of the various political parties involved, the Democratic Party got the most women elected, followed closely by the Republican Party. One of these two parties dominated in each of the 19 states.