The dataset I chose originates from the article titled Trump’s Endorsements: A Deeper Dive, which examines the influence of Trump’s endorsements during the 2022 election cycle. While Trump frequently claims that his endorsements virtually guarantee victory, this analysis demonstrates that several factors, such as unopposed candidates and incumbents, artificially boost his endorsement success rate. The dataset shows that 32% of Trump-endorsed candidates ran unopposed and 44% were incumbents, inflating the reported success rate to 95%. A more realistic measure, accounting for these factors, lowers his success rate to 82%. In this analysis, I will focus specifically on Republican candidates in New York State during the 2022 election and explore the outcome of Trump’s endorsements in that region.
To begin, the dataset was imported from a publicly available GitHub repository.
# Import necessary libraries
library(readr)
# Load the dataset directly from GitHub
rep_candidates <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/rep_candidates.csv"))
# Display the first few rows to inspect the data
head(rep_candidates)
# Check the dimensions of the dataset
dim(rep_candidates)
## [1] 1599 27
#rep_candidates is a 1599 x 27 dataframe
The dataset consists of 1599 rows and 27 columns. Since I am primarily interested in candidates from New York, I will filter the data to include only rows where the state is New York.
# Load necessary library for data manipulation
# Filter the dataset to include only Republican candidates from New York
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ny_rep_candidates = rep_candidates %>% filter(State == 'New York')
dim(ny_rep_candidates)
## [1] 43 27
Now, our updated data frame ny_rep_candidates contains
43 rows and 27 columns, representing the 43 Republican candidates who
ran in New York in 2020. Since not all of the columns are relevant to
our analysis, I will now remove the unnecessary ones to focus on the key
variables.
#There are 43 Republican Candidates running in new york
#Getting rid of columns that we dont need
ny_rep_candidates$Race.2=NULL
ny_rep_candidates$Race.3=NULL
ny_rep_candidates$Primary.Date=NULL
ny_rep_candidates$Club.for.Growth=NULL
ny_rep_candidates$Party.Committee=NULL
ny_rep_candidates$Renew.America=NULL
ny_rep_candidates$VIEW.PAC=NULL
ny_rep_candidates$Maggie.s.List=NULL
ny_rep_candidates$Winning.for.Women=NULL
ny_rep_candidates$E.PAC=NULL
ny_rep_candidates$Candidate=NULL
ny_rep_candidates$Trump.Date=NULL
ny_rep_candidates
dim(ny_rep_candidates)
## [1] 43 15
#Now we have 43 rows and 17 columns
Now, the data frame has been reduced to 17 columns. I will rename these columns to make them shorter and more intuitive for easier reference and analysis.
#Now lets rename all of the column into more useful names
head(ny_rep_candidates)
colnames(ny_rep_candidates) <- c("gender", "race", "inc", "inc chal", "st", "pos","dist", "n_prim","rat_prim","out_prim",
"n_run","rat_run","out_run","insurrectionist","trump_endorse")
head(ny_rep_candidates)
Now, I will modify the insurrectionist column to
represent a Boolean value (True or False). If a candidate denied the
results of the 2020 Presidential election, the value will be set to
True. To achieve this, I first reviewed all the unique values in this
column, and then used an if statement to replace each value
accordingly.
unique(ny_rep_candidates$insurrectionist)
## [1] "Fully accepted" "Fully denied"
## [3] "No comment" "Accepted with reservations"
## [5] "Raised questions" "Avoided answering"
if (any(ny_rep_candidates$insurrectionist == "Fully accepted")) {
ny_rep_candidates$insurrectionist=FALSE
} else {
ny_rep_candidates$insurrectionist = TRUE
}
ny_rep_candidates
Now, I will filter the data to include only candidates who won their races. Additionally, I will remove a few more columns that are no longer needed for our analysis to streamline the dataset.
#Filtering out only winners.
ny_rep_candidates = ny_rep_candidates %>% filter(out_prim == 'Won')
#removing other columns
ny_rep_candidates$st=NULL
ny_rep_candidates$out_prim=NULL
ny_rep_candidates$n_run=NULL
ny_rep_candidates$rat_run=NULL
ny_rep_candidates$out_run=NULL
ny_rep_candidates
I will now update the relevant column to indicate whether Trump endorsed each candidate. If a candidate was endorsed by Trump, the value will be set to True; otherwise, it will be set to False. This will help in analyzing the impact of Trump’s endorsements.
if (any(ny_rep_candidates$trump_endorse == "Yes")) {
ny_rep_candidates$trump_endorse=TRUE
} else {
ny_rep_candidates$trump_endorse = FALSE
}
Now, I will perform some exploratory data analysis, starting with a pie chart to visualize the racial distribution of Republican candidates in New York who won their races. This will give us a better understanding of the diversity within the winning candidates.
ny_rep_candidates
#race distribution of NY Republican candidates that won in 2020
pie(table(ny_rep_candidates$race))
Barplot showing candidate incumbent count
#Was Candidate incumbent?
barplot(table(ny_rep_candidates$inc))
To measure the mean ratio of votes, I will first need to remove the
% symbol from the values in the vote ratio column, as these
values are currently stored as characters rather than numeric data. Once
the % sign is removed, I can convert the values to a
numeric format to perform calculations like the mean.
without_percent=c()
for(i in ny_rep_candidates$rat_prim){
i=substr(i, start=1, stop=nchar(i)-1)
without_percent=append(without_percent,i)
}
without_percent
## [1] "100" "43" "47" "53" "100" "100" "100" "100" "100" "100" "100" "78"
## [13] "100" "67" "100" "100" "75" "100" "100" "100" "100" "57" "51" "54"
## [25] "100" "100"
Now that the % sign has been successfully removed from
the vote ratio column, we can convert the values to numeric format.
After this conversion, I’ll adjust the values to represent actual ratios
by dividing them by 100, making them ready for further analysis.
ny_rep_candidates$rat_prim=as.numeric(without_percent)
#Now lets make this an actual ratio
ny_rep_candidates$rat_prim=(ny_rep_candidates$rat_prim)/100
ny_rep_candidates
#finally, lets measure the mean ratio of votes one during the primary
mean(ny_rep_candidates$rat_prim)
## [1] 0.8557692
85%
NY Republican candidates won 85% of the primary vote in 2022.
This is a really robust data set. The 2022 election cycle was an interesting one. Right wing media and polls boasted about a red wave that was supposed to take over the house and senate that season. In reality, the results were complete opposite of what was expected.
There was an unexpected red wave in New York State, where a lot of republicans won seats in districts that weren’t expected to be republican. This is likely an effect of the change in electoral map from the previous season to the current. The current electoral map is gerrymandered to favor republicans. It would be interesting in the future to compare these results with the results of the previous election.
It would also be interesting to compare NY results vs results from other states to see if there were any differences.