How does campaign spending affect election outcomes? I will look at a dataset of Senate elections from the 2022 election cycle and see which candidates won. Then I will examine whether campaign contributions help explain their victories, or whether other factors do. This analysis covers only Senate general elections. Imagine the insights I could find from Texas district primaries.

Description of Data Obtained

The dataset used for this analysis contains financial and electoral outcome data for U.S. Senate candidates in the 2022 election. The data was obtained from the Federal Election Commission (FEC) website, which tracks campaign finance reports for all federal candidates. Each observation in the dataset represents an individual Senate candidate, including incumbents, challengers, and candidates for open seats. The dataset includes financial information such as total receipts, individual and PAC contributions, and cash on hand. The data set also includes party affiliation, incumbency status, state, and general election performance for each candidate.

Description of key variables

I am interested in these variables: WIN_LOSE (whether the candidate lost or won the election, coded 0 = loss and 1 = win), incum_stat (whether the candidate is the incumbent, a challenger, or running for an open seat, coded I, C, or O), PTY_CD (the party of the candidate, coded 1 = Democrat and 2 = Republican), TTL_RECEIPTS (the total funds raised by the candidate), PAC_CASH (funds raised from PACs), CAND_CONTRIB (how much personal money the candidate donated), Ending_cash (the amount of cash left at the end of the election), and GENERAL (the candidate's percentage of the general election vote). I have already recoded everything I want. I will analyze these variables with a logistic regression, side-by-side boxplots, bar charts or density plots, and scatterplots.

# Load dataset (assuming it's in CSV format after extraction)
data <- read.csv("2022Senate - Sheet1.csv")
# Columns are referenced as data$... below, so attach() is not needed
# Inspect first few rows
head(data)
##     CAND_ID           CAND_NAME WIN_LOSE incum_stat PTY_CD
## 1 S2AL00145        BRITT, Katie        1          O      2
## 2 S2AL00251 BOYD, Willie Eugene        0          O      1
## 3 S4AK00099     MURKOWSKI, Lisa        1          I      2
## 4 S2AK00127     TSHIBAKA, KELLY        0          C      2
## 5 S2AK00226   CHESBRO, PATRICIA        0          C      1
## 6 S0AZ00350         KELLY, Mark        1          I      1
##   NONCAND_PTY_AFFILIATION TTL_RECEIPTS  PAC_CASH Begin_Cash Ending_cash
## 1                     REP   11452928.7  711044.0          0  1708731.48
## 2                     DEM     134567.3       0.0          0    13058.22
## 3                     REP    9319257.3  818872.0    1023945   657686.46
## 4                     REP    6011431.6  191714.4          0    51251.95
## 5                     DEM     188577.2       0.0          0     7630.44
## 6                     DEM   92771344.1 3081428.2    1402523  1457875.18
##   CAND_CONTRIB TTL_INDIV_CONTRIB CAND_OFFICE_ST GENERAL
## 1         0.00         9080849.3             AL  66.62%
## 2      5259.21          112154.0             AL  30.88%
## 3         0.00         5813100.3             AK  43.37%
## 4         0.00         5733782.0             AK  42.60%
## 5      9131.42          178485.8             AK  10.37%
## 6         0.00        86182200.1             AZ  51.39%

Basic table of data

I am making two scatterplots, two side-by-side boxplots, and one logistic regression model. The scatterplots plot PAC cash and total receipts against general election vote share, colored by outcome. The first boxplot compares the total funds raised by candidates who won vs. those who lost. The second boxplot makes the same comparison for PAC money. Lastly, the model tests whether TTL_RECEIPTS, PAC_CASH, TTL_INDIV_CONTRIB, incum_stat, or Ending_cash has a statistically significant effect on the outcome of an election.

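library(ggplot2)  # provides ggplot(), used for every plot below
library(scales)   # provides label_dollar() for the axis labels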

# Clean and prep variables
data$GENERAL_clean <- as.numeric(gsub("%", "", data$GENERAL))
data$PAC_CASH_M <- data$PAC_CASH / 1e6
data$Color <- ifelse(data$WIN_LOSE == 1, "Winner", "Loser")

# Scatterplot with readable formatting
ggplot(data, aes(x = PAC_CASH_M, y = GENERAL_clean, color = Color)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("Winner" = "red", "Loser" = "blue")) +
  scale_x_continuous(labels = label_dollar(suffix = "M", prefix = "$")) +
  labs(
    title = "PAC Cash vs. General Vote %",
    x = "PAC Cash Raised",
    y = "General Election Vote %",
    color = "Election Result"
  ) +
  theme_minimal()
## Warning: Removed 623 rows containing missing values or values outside the scale range
## (`geom_point()`).
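
The warning above reports 623 rows removed, so most candidates never appear in the plot. A minimal sketch of how the usable rows could be counted before plotting or modeling follows. The helper name `count_usable` and the `toy` data frame are made up for illustration, and the toy values are not from the FEC file.

```r
# Count how many rows have every listed column present (non-missing).
# `count_usable` is a hypothetical helper, not part of the original analysis.
count_usable <- function(df, cols) {
  ok <- complete.cases(df[, cols, drop = FALSE])
  c(total = nrow(df), usable = sum(ok), dropped = sum(!ok))
}

# Toy data frame with made-up values, only to show the mechanics
toy <- data.frame(
  PAC_CASH_M    = c(0.7, NA, 0.8, 0.2),
  GENERAL_clean = c(66.6, 30.9, NA, 42.6)
)
res <- count_usable(toy, c("PAC_CASH_M", "GENERAL_clean"))
print(res)

# On the real data (assumes `data` is loaded and prepped as above):
# count_usable(data, c("PAC_CASH_M", "GENERAL_clean"))
```

Running the same check on the model's variables would show up front how many candidates the regression can actually use.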

# Convert TTL_RECEIPTS to millions for readability
data$TTL_RECEIPTS_M <- data$TTL_RECEIPTS / 1e6

# Scatterplot: Total Receipts vs General %, color by win/loss
ggplot(data, aes(x = TTL_RECEIPTS_M, y = GENERAL_clean, color = Color)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(values = c("Winner" = "red", "Loser" = "blue")) +
  scale_x_continuous(labels = label_dollar(suffix = "M", prefix = "$")) +
  labs(
    title = "Total Receipts vs. General Vote %",
    x = "Total Receipts Raised",
    y = "General Election Vote %",
    color = "Election Result"
  ) +
  theme_minimal()
## Warning: Removed 623 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(data, aes(x = as.factor(WIN_LOSE), y = TTL_RECEIPTS, fill = as.factor(WIN_LOSE))) +
  geom_boxplot() +
  labs(title = "Total Receipts by Election Outcome",
       x = "Election Outcome: 0 = Lost, 1 = Won",
       y = "Total Receipts") +
  theme_minimal()
## Warning: Removed 619 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

ggplot(data, aes(x = as.factor(WIN_LOSE), y = PAC_CASH, fill = as.factor(WIN_LOSE))) +
  geom_boxplot() +
  labs(title = "PAC Cash by Election Outcome",
       x = "Election Outcome: 0 = Lost and 1 = Won",
       y = "PAC Cash") +
  theme_minimal()
## Warning: Removed 619 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
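
Reading medians off a boxplot by eye is error-prone when a few very large campaigns stretch the axis, so it may help to compute them directly. The sketch below uses only the six rows printed by `head(data)` above, so its numbers are not representative of the full dataset. The names `demo` and `med` are introduced here for illustration.

```r
# The six candidates shown in the head(data) output above
demo <- data.frame(
  WIN_LOSE     = c(1, 0, 1, 0, 0, 1),
  TTL_RECEIPTS = c(11452928.7, 134567.3, 9319257.3,
                   6011431.6, 188577.2, 92771344.1),
  PAC_CASH     = c(711044.0, 0.0, 818872.0, 191714.4, 0.0, 3081428.2)
)

# Median of each funding variable within each outcome group
med <- aggregate(cbind(TTL_RECEIPTS, PAC_CASH) ~ WIN_LOSE,
                 data = demo, FUN = median)
print(med)

# On the real data (assumes `data` is loaded as above):
# aggregate(cbind(TTL_RECEIPTS, PAC_CASH) ~ WIN_LOSE, data = data, FUN = median)
```

The comparison to read from this table is winners vs. losers within one column, which is the same comparison each boxplot makes.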

logis_model <- glm(WIN_LOSE ~ TTL_RECEIPTS + PAC_CASH + TTL_INDIV_CONTRIB + incum_stat + Ending_cash, 
                   data = data, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(logis_model)
## 
## Call:
## glm(formula = WIN_LOSE ~ TTL_RECEIPTS + PAC_CASH + TTL_INDIV_CONTRIB + 
##     incum_stat + Ending_cash, family = binomial, data = data)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)       -3.187e+01  3.523e+03  -0.009    0.993
## TTL_RECEIPTS      -2.441e-07  1.769e-07  -1.380    0.167
## PAC_CASH           5.022e-07  5.116e-07   0.982    0.326
## TTL_INDIV_CONTRIB  2.537e-07  1.867e-07   1.359    0.174
## incum_statI        3.188e+01  3.523e+03   0.009    0.993
## incum_statO        3.095e+01  3.523e+03   0.009    0.993
## Ending_cash        2.738e-06  2.106e-06   1.300    0.194
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 78.157  on 56  degrees of freedom
## Residual deviance: 16.048  on 50  degrees of freedom
##   (619 observations deleted due to missingness)
## AIC: 30.048
## 
## Number of Fisher Scoring iterations: 20
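
The dollar coefficients above are changes in the log-odds of winning per single dollar, which is why they look so small. One way to make them readable is to rescale to $1 million and exponentiate, giving an odds ratio. The sketch below copies the estimates from the summary output. The names `est` and `or_per_million` are introduced here, and none of these estimates is statistically significant, so the ratios are illustrative only.

```r
# Estimates copied from the Coefficients table above (log-odds per dollar)
est <- c(
  TTL_RECEIPTS      = -2.441e-07,
  PAC_CASH          =  5.022e-07,
  TTL_INDIV_CONTRIB =  2.537e-07,
  Ending_cash       =  2.738e-06
)

# Multiply by 1e6 to get log-odds per $1 million, then exponentiate
or_per_million <- exp(est * 1e6)
print(round(or_per_million, 2))

# With the fitted model in memory, the same numbers could come from:
# exp(coef(logis_model)[names(est)] * 1e6)
```

An odds ratio above 1 means the odds of winning rise with that variable, holding the others fixed, and a ratio below 1 means they fall.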

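One tempting conclusion is that money from a PAC matters more than other sources of funding, but the output above does not support it. Each boxplot compares winners with losers, so the comparison that matters is between the two boxes within one plot. Comparing the winners' median for PAC_CASH with the winners' median for TTL_RECEIPTS says nothing about which source matters more, because PAC money is counted within total receipts.

The logistic regression model does not confirm the claim either. The coefficient for PAC_CASH is 5.022e-07 (5.116e-07 is its standard error), which is the change in the log-odds of winning per additional dollar, or about 0.50 per additional $1 million. It is not the largest coefficient in the model, since Ending_cash is larger at 2.738e-06, and its p-value of 0.326 is far from significant. In fact, no predictor is statistically significant at the 0.05 level. The warning about fitted probabilities of 0 or 1, the standard errors of 3.523e+03 on the incum_stat terms, and the 20 Fisher scoring iterations suggest that incumbency status almost perfectly separates winners from losers among the 57 candidates the model used, which makes the estimates unstable.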

However, I think there are many flaws with this data. The sample is too small: the model used only 57 of the 676 candidates because the rest had missing values. I think a more common-sense explanation is that incumbents were more likely to win anyway, so PACs donated to the candidates more likely to win, and those candidates did win most of the time. Of course, more testing, data, and research will be needed to check that.
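
A first check on the incumbency explanation could be to tabulate wins by incumbency status and compare PAC money across the same groups. The sketch below again uses only the six rows printed by `head(data)`, so it shows the mechanics, not a finding. The names `demo`, `tab`, and `pac_by_status` are introduced here for illustration.

```r
# The six candidates shown in the head(data) output above
demo <- data.frame(
  incum_stat = c("O", "O", "I", "C", "C", "I"),
  WIN_LOSE   = c(1, 0, 1, 0, 0, 1),
  PAC_CASH   = c(711044.0, 0.0, 818872.0, 191714.4, 0.0, 3081428.2)
)

# Wins and losses within each incumbency status (C, I, O)
tab <- table(incum_stat = demo$incum_stat, WIN_LOSE = demo$WIN_LOSE)
print(tab)

# Median PAC money within each incumbency status
pac_by_status <- aggregate(PAC_CASH ~ incum_stat, data = demo, FUN = median)
print(pac_by_status)

# On the real data (assumes `data` is loaded as above):
# table(data$incum_stat, data$WIN_LOSE)
# aggregate(PAC_CASH ~ incum_stat, data = data, FUN = median)
```

If incumbents both win more often and receive more PAC money in the full data, then PAC money and incumbency are confounded, and the regression cannot cleanly separate their effects.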