
Data Source: data.world

Which member(s) of Congress raised the most money while spending the least?

The Democrats control the House and Senate, though only by a small margin.

Check for Normality

Neither the Raised nor the Spent variable is normally distributed, so the rank-based Spearman correlation is used in the next step instead of Pearson's.
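The source does not state which normality check was used. As a minimal sketch, assuming a Shapiro-Wilk test is acceptable, the check could look like the following. The vector `raised_demo` is synthetic, right-skewed stand-in data, not the real Raised column.

```r
# Synthetic, right-skewed stand-in for a fundraising variable (NOT the real data)
set.seed(42)
raised_demo <- rlnorm(500, meanlog = 14, sdlog = 1)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
sw <- shapiro.test(raised_demo)
print(sw)

# A p-value below 0.05 is taken here as evidence against normality
not_normal <- sw$p.value < 0.05
print(not_normal)
```

A histogram or Q-Q plot (`hist()`, `qqnorm()`) is a useful visual companion to the test, since with several hundred observations even small departures from normality become significant.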

Correlation Test

Is there a relationship between Raised and Spent?

## 
##  Spearman's rank correlation rho
## 
## data:  party$Spent and party$Raised
## S = 2162931, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9161945

Scatter Plot

With a Spearman's rho of 0.92, there is a very strong positive monotonic relationship between Spent and Raised.
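For readers who want to reproduce this kind of test, here is a hedged sketch of a Spearman correlation in R. The vectors `spent_demo` and `raised_demo` are synthetic stand-ins and are assumptions for illustration only; the original analysis used `party$Spent` and `party$Raised`.

```r
# Synthetic stand-in data (NOT the real data set): raised tracks spent with noise
set.seed(1)
spent_demo  <- rlnorm(200, meanlog = 13, sdlog = 1)
raised_demo <- spent_demo * exp(rnorm(200, sd = 0.3))

# Spearman's rank correlation test, the same kind of call that produces output like the above
res <- cor.test(spent_demo, raised_demo, method = "spearman")
rho <- unname(res$estimate)
print(res)

# Spearman's rho is Pearson's r computed on the ranks of the two variables
rho_ranks <- cor(rank(spent_demo), rank(raised_demo))
print(all.equal(rho, rho_ranks))
```

Because the statistic depends only on ranks, it is not distorted by the extreme values that make the raw variables non-normal.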

Insight

Let’s see which member(s) had the best return on investment (ROI) and compare how much they Raised vs. Spent.

Senator Joe Manchin had a 932% ROI for the 2022 election cycle, with ROI computed here as Raised as a percentage of Spent: (7790164 / 835794) * 100 = 932.0675%. One limitation of this data set is that it does not reveal how Senator Manchin achieved such an ROI, but it does flag his fundraising strategy as worth a closer look.
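The arithmetic can be checked directly. The sketch below uses the two figures quoted above; the variable names are illustrative. It also shows the more conventional net ROI, (Raised - Spent) / Spent, which is exactly 100 percentage points lower than the ratio used in this post.

```r
# Figures quoted in the text for the 2022 election cycle
raised <- 7790164
spent  <- 835794

# Ratio used in this post: Raised as a percentage of Spent
ratio_pct <- raised / spent * 100
print(round(ratio_pct, 4))

# Conventional net ROI for comparison: gain relative to the amount spent
net_roi_pct <- (raised - spent) / spent * 100
print(round(net_roi_pct, 4))
```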

Generalized Linear Model

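A predictive model can be built from this data set to estimate Raised for given value(s) of Spent. A generalized linear model (GLM) is fitted with glm() using its default Gaussian family, which is equivalent to an ordinary least-squares linear regression. Note that this fit does not by itself compensate for outlier values, so extreme observations can still influence the estimates.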

## 
## Call:
## glm(formula = Raised ~ Spent, data = party)
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.903e+05  5.880e+04   4.936 1.07e-06 ***
## Spent       1.148e+00  8.462e-03 135.712  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.539027e+12)
## 
##     Null deviance: 2.9169e+16  on 536  degrees of freedom
## Residual deviance: 8.2338e+14  on 535  degrees of freedom
## AIC: 16597
## 
## Number of Fisher Scoring iterations: 2

## We fitted a linear model (estimated using ML) to predict Raised with Spent
## (formula: Raised ~ Spent). The model's explanatory power is substantial (R2 =
## 0.97). The model's intercept, corresponding to Spent = 0, is at 2.90e+05 (95%
## CI [1.75e+05, 4.06e+05], t(535) = 4.94, p < .001). Within this model:
## 
##   - The effect of Spent is statistically significant and positive (beta = 1.15,
## 95% CI [1.13, 1.17], t(535) = 135.71, p < .001; Std. beta = 0.99, 95% CI [0.97,
## 1.00])
## 
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.

With a p-value below 2e-16, Spent is a statistically significant predictor of Raised.
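The summary above reports a Gaussian family, and the reported R2 of 0.97 follows from the two deviances: 1 - 8.2338e14 / 2.9169e16. The sketch below illustrates both points on synthetic data. The `demo` data frame and its parameters are assumptions for illustration, not the real `party` data.

```r
# Synthetic stand-in data (NOT the real data set)
set.seed(123)
demo <- data.frame(Spent = rlnorm(300, meanlog = 13, sdlog = 1))
demo$Raised <- 3e5 + 1.15 * demo$Spent + rnorm(300, sd = 5e5)

# glm() defaults to family = gaussian with an identity link
fit_glm <- glm(Raised ~ Spent, data = demo)
fit_lm  <- lm(Raised ~ Spent, data = demo)

# The Gaussian GLM and ordinary least squares give the same coefficients
same_coefs <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))
print(same_coefs)

# For a Gaussian GLM, R-squared is 1 - residual deviance / null deviance
r2_glm <- 1 - fit_glm$deviance / fit_glm$null.deviance
r2_lm  <- summary(fit_lm)$r.squared

# The same formula applied to the deviances reported in the summary above
r2_reported <- 1 - 8.2338e14 / 2.9169e16
print(round(r2_reported, 2))
```

If resistance to outliers is the goal, a robust regression such as `MASS::rlm()` or a model on log-transformed values would be alternatives to consider; neither is used in the original analysis.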

Predicting the Outcome

How much money would be Raised if $500,000, $1,000,000, and $2,000,000 were Spent? Let’s see what the model shows:

##         1         2         3 
##  864492.9 1438711.8 2587149.4

The model predicts $864,492.92, $1,438,711.76, and $2,587,149.45, respectively. In the interactive plot, hover over or tap the data points on the blue trend line to see more details.
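These predictions are simply intercept + slope * Spent. As a rough check, the sketch below plugs in the rounded coefficients printed in the model summary (2.903e+05 and 1.148). Because those coefficients are rounded, the results differ from the predict() output above by a few hundred dollars. The variable names are illustrative.

```r
# Rounded coefficients as printed in the model summary
intercept <- 2.903e5
slope     <- 1.148

# Hypothetical spending levels from the question above
spent_new <- c(5e5, 1e6, 2e6)

# Linear prediction: Raised = intercept + slope * Spent
pred <- intercept + slope * spent_new
print(pred)

# Gap between this hand calculation and the model output quoted in the text
reported <- c(864492.9, 1438711.8, 2587149.4)
print(reported - pred)
```

With the fitted model object available, the usual route is `predict()` with a `newdata` data frame containing a `Spent` column, which uses the unrounded coefficients.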

## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gvlma_1.0.0.3      plotly_4.10.1      gridExtra_2.3      DT_0.28           
##  [5] rio_0.5.29         performance_0.10.3 readxl_1.4.2       janitor_2.2.0     
##  [9] lubridate_1.9.2    forcats_1.0.0      stringr_1.5.0      dplyr_1.1.1       
## [13] purrr_1.0.1        readr_2.1.4        tidyr_1.3.0        tibble_3.2.1      
## [17] tidyverse_2.0.0    report_0.5.7       ggplot2_3.4.1     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0  viridisLite_0.4.1 farver_2.1.1      fastmap_1.1.1    
##  [5] lazyeval_0.2.2    bayestestR_0.13.1 digest_0.6.31     timechange_0.2.0 
##  [9] lifecycle_1.0.3   ellipsis_0.3.2    magrittr_2.0.3    compiler_4.3.1   
## [13] rlang_1.1.0       sass_0.4.5        tools_4.3.1       utf8_1.2.3       
## [17] yaml_2.3.7        data.table_1.14.8 knitr_1.42        labeling_0.4.2   
## [21] htmlwidgets_1.6.2 curl_5.0.0        withr_2.5.0       foreign_0.8-84   
## [25] grid_4.3.1        datawizard_0.7.1  fansi_1.0.4       colorspace_2.1-0 
## [29] scales_1.2.1      MASS_7.3-60       insight_0.19.2    cli_3.6.1        
## [33] rmarkdown_2.21    generics_0.1.3    rstudioapi_0.14   httr_1.4.5       
## [37] tzdb_0.3.0        parameters_0.21.1 cachem_1.0.7      splines_4.3.1    
## [41] effectsize_0.8.3  cellranger_1.1.0  vctrs_0.6.1       Matrix_1.5-4.1   
## [45] jsonlite_1.8.4    hms_1.1.3         patchwork_1.1.2   crosstalk_1.2.0  
## [49] see_0.7.5         jquerylib_0.1.4   glue_1.6.2        stringi_1.7.12   
## [53] gtable_0.3.3      munsell_0.5.0     pillar_1.9.0      htmltools_0.5.5  
## [57] R6_2.5.1          evaluate_0.20     lattice_0.21-8    haven_2.5.2      
## [61] highr_0.10        openxlsx_4.2.5.2  snakecase_0.11.0  bslib_0.4.2      
## [65] Rcpp_1.0.10       zip_2.3.0         nlme_3.1-162      mgcv_1.8-42      
## [69] xfun_0.38         pkgconfig_2.0.3