Loading in the data.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
library(tidyr)
fec <- read.csv("fec_independent_expenditures.csv", header = TRUE)

# Opposition spending on presidential candidates ("P"), per candidate and year, 2013 onward
opposition <- fec %>%
  filter(support_oppose_indicator == "O", report_year >= 2013, candidate_office == "P",
         !candidate_id %in% c("P80002801", "")) %>%
  group_by(candidate_id, report_year) %>%
  summarise(expenditure_amount_Opp = sum(expenditure_amount, na.rm = TRUE),
            n_Opp = n(), .groups = "drop")

# Supporting spending on presidential candidates ("P"), per candidate and year, 2013 onward
support <- fec %>%
  filter(support_oppose_indicator == "S", report_year >= 2013, candidate_office == "P",
         !candidate_id %in% c("P00547984", "P20002721", "P60007671", "P60007895",
                              "P60009354", "P60019239", "P60021102", "P60022118",
                              "P60023215", "P80003353", "")) %>%
  group_by(candidate_id, report_year) %>%
  summarise(expenditure_amount_Supp = sum(expenditure_amount, na.rm = TRUE),
            n_Supp = n(), .groups = "drop")

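# Keep only candidate-years that have both opposing and supporting expenditures
tog <- opposition %>%
  inner_join(support, by = c("candidate_id", "report_year")) %>%
  mutate(ratio = n_Opp / n_Supp)   # opposing expenditures per supporting expenditure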
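The next example illustrates what the summarise() and inner_join() steps do. It is a minimal sketch on a tiny made-up table; the candidate IDs A, B, C and all amounts are hypothetical, not FEC data. It shows that a candidate-year survives the join only if it has both supporting and opposing expenditures:

```r
library(dplyr)

# Hypothetical toy data standing in for the FEC file (all values made up)
toy <- tibble(
  candidate_id = c("A", "A", "A", "B", "B", "C"),
  report_year  = c(2015, 2015, 2015, 2016, 2016, 2016),
  support_oppose_indicator = c("O", "S", "S", "O", "O", "S"),
  expenditure_amount = c(100, 50, 25, 10, NA, 5)
)

# Total amount and number of opposing expenditures per candidate-year
opp <- toy %>%
  filter(support_oppose_indicator == "O") %>%
  group_by(candidate_id, report_year) %>%
  summarise(expenditure_amount_Opp = sum(expenditure_amount, na.rm = TRUE),
            n_Opp = n(), .groups = "drop")

# Total amount and number of supporting expenditures per candidate-year
supp <- toy %>%
  filter(support_oppose_indicator == "S") %>%
  group_by(candidate_id, report_year) %>%
  summarise(expenditure_amount_Supp = sum(expenditure_amount, na.rm = TRUE),
            n_Supp = n(), .groups = "drop")

# inner_join keeps only keys present in BOTH tables: here just A in 2015
both <- opp %>%
  inner_join(supp, by = c("candidate_id", "report_year")) %>%
  mutate(ratio = n_Opp / n_Supp)
print(both)
```

Note that n() counts rows, including a row whose expenditure_amount is NA. For candidate B, n_Opp is therefore 2 even though only one amount enters the sum.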

Milestone 4: Multiple Linear Regression

Using the numeric response variable, now incorporate more numeric explanatory variables (at least 3).

For this milestone, we had to tweak our data set slightly to accommodate some new variables. Because of this, we refit our simple linear regression model, predicting opposition spending from support spending:

slr<-lm(tog$expenditure_amount_Opp~tog$expenditure_amount_Supp)
summary(slr)

## 
## Call:
## lm(formula = tog$expenditure_amount_Opp ~ tog$expenditure_amount_Supp)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -45597712  -1650922   1860864   3355416  34496404 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -4.216e+06  2.744e+06  -1.537    0.136    
## tog$expenditure_amount_Supp  1.050e+00  8.466e-02  12.404  6.8e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13790000 on 28 degrees of freedom
## Multiple R-squared:  0.846,  Adjusted R-squared:  0.8405 
## F-statistic: 153.9 on 1 and 28 DF,  p-value: 6.796e-13

State what these three variables are and your rationale for including them.

We chose three variables: report year (report_year), the number of expenditures made in support of a candidate (n_Supp), and the number of expenditures made in opposition to a candidate (n_Opp). For year, we thought that as the 2016 presidential election approached, people might make more last-minute expenditures to support or oppose their candidate of choice. For the other two variables, we thought there might be a connection between how many expenditures are made for or against a candidate and the total amount spent in opposition (does the number of expenditures determine how much is spent in opposition?).

Finding the beta matrix

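# Response vector: total opposition spending per candidate-year
YMat <- tog$expenditure_amount_Opp
# Design matrix: a column of ones (intercept) followed by the three predictors
XMat <- cbind(1, tog$report_year, tog$n_Supp, tog$n_Opp)

# Least-squares estimates: (X'X)^-1 X'Y
betaMat <- solve(t(XMat) %*% XMat) %*% t(XMat) %*% YMat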
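The betaMat line is the closed-form least-squares solution. Here X stands for XMat (a column of ones, then report year, n_Supp and n_Opp) and Y for YMat. Minimizing the residual sum of squares gives the normal equations:

$$
\begin{aligned}
S(\beta) &= (Y - X\beta)^\top (Y - X\beta) \\
\frac{\partial S}{\partial \beta} &= -2X^\top (Y - X\beta) = 0
\;\Longrightarrow\; X^\top X\,\hat\beta = X^\top Y \\
\hat\beta &= (X^\top X)^{-1} X^\top Y
\end{aligned}
$$

This requires $X^\top X$ to be invertible, meaning no predictor is an exact linear combination of the others. In R, solve(crossprod(XMat), crossprod(XMat, YMat)) solves the same system without forming the inverse explicitly.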

Showing our beta values:

betaMat

##               [,1]
## [1,] -6.107048e+08
## [2,]  3.031536e+05
## [3,]  1.649807e+03
## [4,]  6.834127e+03

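Next, we compute the residuals and use them to estimate the standard error of each coefficient:

eMat <- YMat - (XMat %*% betaMat)            # residuals: observed minus fitted
n <- dim(tog)[1]                             # number of candidate-years
p <- 3                                       # number of predictors (excluding the intercept)
ms_res <- sum(eMat^2) / (n - p - 1)          # residual mean square
covMat <- ms_res * solve(t(XMat) %*% XMat)   # estimated covariance matrix of the coefficients
err <- sqrt(diag(covMat))                    # standard errors of the coefficients
err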

## [1] 1.051444e+09 5.217172e+05 5.538653e+01 7.356215e+01
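The code above follows the usual formulas, written with the same symbols as the normal equations:

$$
e = Y - X\hat\beta, \qquad
MS_{\text{res}} = \frac{e^\top e}{n - p - 1}, \qquad
\widehat{\operatorname{Cov}}(\hat\beta) = MS_{\text{res}}\,(X^\top X)^{-1}, \qquad
\operatorname{se}(\hat\beta_j) = \sqrt{\left[MS_{\text{res}}\,(X^\top X)^{-1}\right]_{jj}}
$$

With $n = 30$ candidate-years and $p = 3$ predictors, the denominator is $n - p - 1 = 30 - 3 - 1 = 26$. This matches the 26 degrees of freedom that lm() reports, and $\sqrt{MS_{\text{res}}}$ is what lm() labels the residual standard error.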

Our fitted model based on these values is Ŷ = XMat × betaMat. Each observed value equals its fitted value plus its residual, so Y = (XMat × betaMat) + eMat. The vector err holds the standard errors of the four coefficients, not the 30 residuals, so it does not belong in this equation.

Computing the fitted values and checking that adding the residuals back recovers the observed response:

yHat <- XMat %*% betaMat                   # fitted values, one per candidate-year
all.equal(as.vector(yHat + eMat), YMat)    # should return TRUE
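A least-squares fit has two checkable properties. The residuals add back to the fitted values to give the observed response. They are also orthogonal to every column of the design matrix, which is the normal-equations condition. The sketch below checks both on simulated data. The seed, sample size and true coefficients are arbitrary assumptions, not taken from the FEC data:

```r
set.seed(1)                                   # simulated data, not the FEC file
n <- 30
X <- cbind(1, rnorm(n), rnorm(n), rnorm(n))   # intercept + 3 hypothetical predictors
y <- drop(X %*% c(2, 1, -0.5, 3) + rnorm(n))  # hypothetical true coefficients plus noise

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
y_hat <- X %*% beta_hat                       # fitted values
e <- y - y_hat                                # residuals: observed minus fitted

# Observed = fitted + residuals (up to floating-point error)
all.equal(as.vector(y_hat + e), y)
# Residuals are orthogonal to each column of X, so t(X) %*% e is essentially zero
max(abs(t(X) %*% e))
# One residual per observation, but one standard error per coefficient
c(residuals = length(e), coefficients = length(beta_hat))
```

Because there are n residuals but only p + 1 standard errors, the two vectors cannot stand in for each other.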

Fitting the same model with lm() to verify our results:

mlr<-lm(tog$expenditure_amount_Opp~tog$report_year+tog$n_Supp+tog$n_Opp)
summary(mlr)

## 
## Call:
## lm(formula = tog$expenditure_amount_Opp ~ tog$report_year + tog$n_Supp + 
##     tog$n_Opp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2718176  -689041  -391122  -158051  5863348 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -6.107e+08  1.051e+09  -0.581    0.566    
## tog$report_year  3.032e+05  5.217e+05   0.581    0.566    
## tog$n_Supp       1.650e+03  5.539e+01  29.787   <2e-16 ***
## tog$n_Opp        6.834e+03  7.356e+01  92.903   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1839000 on 26 degrees of freedom
## Multiple R-squared:  0.9975, Adjusted R-squared:  0.9972 
## F-statistic:  3398 on 3 and 26 DF,  p-value: < 2.2e-16

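The estimates and standard errors from lm() match our matrix calculations. For example, n_Supp has an estimate of 1.650e+03 with a standard error of 5.539e+01, versus 1.649807e+03 and 5.538653e+01 from betaMat and err.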
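The same comparison can be made programmatically instead of by eye. This sketch mirrors the matrix steps on simulated data and checks both the estimates and the standard errors against summary(lm()). The variables x1, x2 and x3 and their distributions are hypothetical stand-ins for report_year, n_Supp and n_Opp:

```r
set.seed(42)                          # simulated stand-in data, not the FEC file
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rpois(n, 50), x3 = rpois(n, 20))  # hypothetical predictors
d$y <- 5 + 0.3 * d$x1 + 2 * d$x2 + 7 * d$x3 + rnorm(n, sd = 10)

# Matrix route, mirroring the steps above
X <- cbind(1, d$x1, d$x2, d$x3)
p <- ncol(X) - 1                      # number of predictors
XtX_inv <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% d$y
e <- d$y - X %*% beta_hat
ms_res <- sum(e^2) / (n - p - 1)
se <- sqrt(diag(ms_res * XtX_inv))

# lm() route
fit <- lm(y ~ x1 + x2 + x3, data = d)
coefs <- summary(fit)$coefficients

# Both comparisons should return TRUE
all.equal(as.vector(beta_hat), unname(coefs[, "Estimate"]))
all.equal(se, unname(coefs[, "Std. Error"]))
```

Passing data = to lm() also keeps the coefficient names short (x1 rather than d$x1), which makes the summary table easier to read.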

Project Milestone 4

Ben Jaffe and Nate Howard

4/6/2021
