Overview

The article I chose was "Trump's Endorsees Have Started Losing More. But Don't ReadInto That For 2024."

This article is an analysis of the 2022 election data and gives us a glimpse into how effective Trump's endorsements are. While Trump boasts that his endorsements are a guaranteed win, the election candidate tells a different story. Trump's endorsement has an overall success rate of 95%. However, there are reasons other than Trump endorsement that had influenced this number. 32% of Trump's endorsements went to candidates that had no opposition when running. Also, 44% of all of Trump's endorsements were incumbents. These two factors boost Trump's endorsement success rate and allow him to misinform his followers. In reality, his endorsement's success rate is a lower 82% overall. The author also analyzes the statistics in different states and saw differences in success rates based on many different factors. I will work with the data provided to modify it and see if I can come up with any conclusions for NY Republican Candidates that ran in 2022.

Cleaning the Data

First I imported the data set from the linked GITHUB.

#importing the data set


library(readr)

rep_candidates=read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/rep_candidates.csv"))

head(rep_candidates)

dim(rep_candidates)

## [1] 1599   27

#rep_candidates is a 1599 x 27 dataframe

rep_candidates is a data frame that is 1599 x 27.

I am only interested in looking at NY Republican candidates.

I will narrow it down using dplyr, and filter out only rows with New York under State

#Getting a subset of the data that we are interested in (NY)
#using dplyr because its easier to filter rows with it in my opinion

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

ny_rep_candidates = rep_candidates %>% filter(State == 'New York')

dim(ny_rep_candidates)

## [1] 43 27

Now, our new data frame ny_rep_candidates has 43 rows and 27 columns.

There are 43 Republican Candidates that ran in NY in 2020.

I’m not interested in all of the columns so now I’m going to remove some of them.

#There are 43 Republican Candidates running in new york

#Getting rid of columns that we dont need
ny_rep_candidates$Race.2=NULL
ny_rep_candidates$Race.3=NULL
ny_rep_candidates$Primary.Date=NULL
ny_rep_candidates$Club.for.Growth=NULL
ny_rep_candidates$Party.Committee=NULL
ny_rep_candidates$Renew.America=NULL
ny_rep_candidates$VIEW.PAC=NULL
ny_rep_candidates$Maggie.s.List=NULL
ny_rep_candidates$Winning.for.Women=NULL
ny_rep_candidates$E.PAC=NULL
ny_rep_candidates$Candidate=NULL
ny_rep_candidates$Trump.Date=NULL
ny_rep_candidates

dim(ny_rep_candidates)

## [1] 43 15

#Now we have 43 rows and 17 columns

Now we have 17 columns.

I will rename all the columns because they are currently too long.

#Now lets rename all of the column into more useful names
head(ny_rep_candidates)

colnames(ny_rep_candidates) <- c("gender", "race", "inc", "inc chal", "st", "pos","dist", "n_prim","rat_prim","out_prim",
                                 "n_run","rat_run","out_run","insurrectionist","trump_endorse")
head(ny_rep_candidates)

Now I will make the insurrectionist column either True or False. If the candidate denied the results of the 2020 Presidential election, it will be set to True. To do this, I looked for all the unique possibilities in this column and replaced it using an if statement.

#Now  will replace the values in the col insurrectionist
#Column now is going to be boolean

unique(ny_rep_candidates$insurrectionist)

## [1] "Fully accepted"             "Fully denied"              
## [3] "No comment"                 "Accepted with reservations"
## [5] "Raised questions"           "Avoided answering"

if (any(ny_rep_candidates$insurrectionist == "Fully accepted")) {
  ny_rep_candidates$insurrectionist=FALSE
} else {
  ny_rep_candidates$insurrectionist = TRUE
}

ny_rep_candidates

Now, I will filter out so that only candidates that won are in our data frame.

I will also remove some other columns that we don’t need anymore.

#We only care about the winners. Filtering only winners.

ny_rep_candidates = ny_rep_candidates %>% filter(out_prim == 'Won')

#now some columns are useless so removing
ny_rep_candidates$st=NULL
ny_rep_candidates$out_prim=NULL
ny_rep_candidates$n_run=NULL
ny_rep_candidates$rat_run=NULL
ny_rep_candidates$out_run=NULL
ny_rep_candidates

Replacing values under a column. True if Trump endorsed the candidate and false if he didn’t.

#Now  will replace the values in the trump endorse
#To True or false

if (any(ny_rep_candidates$trump_endorse == "Yes")) {
  ny_rep_candidates$trump_endorse=TRUE
} else {
  ny_rep_candidates$trump_endorse = FALSE
}

Data Exploration and Visualization

Now I will play around with the data a little bit.

Pie chart of the racial distribution of republican candidates in NY that won:

#now lets analyze and visualize
ny_rep_candidates

#race distribution of NY Republican candidates that won in 2020

pie(table(ny_rep_candidates$race))

Barplot showing candidate incumbent count

#Was Candidate incumbent?

barplot(table(ny_rep_candidates$inc))

I want to measure the mean ratio of votes. For that, I have to first remove the % symbol from the value because it is a character and not a numeric.

without_percent=c()

for(i in ny_rep_candidates$rat_prim){
  i=substr(i, start=1, stop=nchar(i)-1)
  without_percent=append(without_percent,i)
}
without_percent

##  [1] "100" "43"  "47"  "53"  "100" "100" "100" "100" "100" "100" "100" "78" 
## [13] "100" "67"  "100" "100" "75"  "100" "100" "100" "100" "57"  "51"  "54" 
## [25] "100" "100"

Now that the % sign is removed, we can convert it back to numeric and make it into an actual ratio.

ny_rep_candidates$rat_prim=as.numeric(without_percent)

#Now lets make this an actual ratio

ny_rep_candidates$rat_prim=(ny_rep_candidates$rat_prim)/100
ny_rep_candidates

#finally, lets measure the mean ratio of votes one during the primary
mean(ny_rep_candidates$rat_prim)

## [1] 0.8557692

NY Republican candidates won 85% of the primary vote in 2022.

Findings and Recommendations

NY Republican candidates won 85% of the primary vote in 2022.

This is a really robust data set. The 2022 election cycle was an interesting one. Right wing media and polls boasted about a red wave that was supposed to take over the house and senate that season. In reality, the results were complete opposite of what was expected.

There was an unexpected red wave in New York State, where a lot of republicans won seats in districts that weren’t expected to be republican. This is likely an effect of the change in electoral map from the previous season to the current. The current electoral map is gerrymandered to favor republicans. It would be interesting in the future to compare these results with the results of the previous election.

It would also be interesting to compare NY results vs results from other states to see if there were any differences.

Data 607 Homework 1

Jean Jimenez

2023-09-01

Overview

Cleaning the Data

Data Exploration and Visualization

Findings and Recommendations