library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(tidyr)
fec<-read.csv("fec_independent_expenditures.csv", header = TRUE)
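# Opposition spending on presidential candidates, 2013 onward, per candidate and year
# (report_year is compared with the number 2013; comparing it with the string "2013"
#  would make R compare the years as text)
opposition<-fec%>%
  filter(support_oppose_indicator=="O", report_year>=2013, candidate_office=="P",
         candidate_id!="P80002801",
         candidate_id!="")%>%
  group_by(candidate_id, report_year)%>%
  summarise(expenditure_amount_Opp=sum(expenditure_amount, na.rm = TRUE),
            n_Opp=n())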
## `summarise()` has grouped output by 'candidate_id'. You can override using the `.groups` argument.
# Candidate IDs excluded from the support data
excluded_ids<-c("P00547984", "P20002721", "P60007671", "P60007895", "P60009354",
                "P60019239", "P60021102", "P60022118", "P60023215", "P80003353")
# Support spending on presidential candidates, 2013 onward, per candidate and year
support<-fec%>%
  filter(support_oppose_indicator=="S", report_year>=2013, candidate_office=="P",
         !(candidate_id %in% excluded_ids),
         candidate_id!="")%>%
  group_by(candidate_id, report_year)%>%
  summarise(expenditure_amount_Supp=sum(expenditure_amount, na.rm = TRUE),
            n_Supp=n())
## `summarise()` has grouped output by 'candidate_id'. You can override using the `.groups` argument.
tog<-opposition%>%
inner_join(support)%>%
mutate(ratio=n_Opp/n_Supp)
## Joining, by = c("candidate_id", "report_year")
For this milestone, we had to tweak our data set slightly to accommodate some new variables, such as the expenditure counts n_Supp and n_Opp. Because of this, our simple linear regression of opposition spending on support spending has changed:
slr<-lm(tog$expenditure_amount_Opp~tog$expenditure_amount_Supp)
summary(slr)
##
## Call:
## lm(formula = tog$expenditure_amount_Opp ~ tog$expenditure_amount_Supp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45597712 -1650922 1860864 3355416 34496404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.216e+06 2.744e+06 -1.537 0.136
## tog$expenditure_amount_Supp 1.050e+00 8.466e-02 12.404 6.8e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13790000 on 28 degrees of freedom
## Multiple R-squared: 0.846, Adjusted R-squared: 0.8405
## F-statistic: 153.9 on 1 and 28 DF, p-value: 6.796e-13
For our multiple linear regression model of the amount spent in opposition to a candidate (expenditure_amount_Opp), we chose three predictors: report year (report_year), the number of expenditures made in support of the candidate (n_Supp), and the number of expenditures made in opposition to the candidate (n_Opp). For year, we thought that as the 2016 presidential election approached, people might make more last-minute expenditures to support or oppose their candidate of choice. For the two count variables, we thought there might be a connection between how many expenditures are made for or against a candidate and the total amount spent in opposition (does the number of expenditures predict how much was spent in opposition?). We estimated the coefficients by hand from the normal equations, using a design matrix XMat whose first column is all ones.
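Written out in standard notation, the model and the least-squares estimate that the next code block computes are shown below. The symbols are ours, chosen to mirror the R objects.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1\,\mathrm{report\_year}_i + \beta_2\,\mathrm{n\_Supp}_i + \beta_3\,\mathrm{n\_Opp}_i + \varepsilon_i, \qquad i = 1,\dots,n,\\
\hat{\beta} &= (X^\top X)^{-1} X^\top Y.
\end{aligned}
```

Here row i of X is (1, report_year_i, n_Supp_i, n_Opp_i), so X corresponds to XMat, Y to YMat, and the estimate to betaMat.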
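# Response vector: total amount spent in opposition
YMat<-tog$expenditure_amount_Opp
# Design matrix: a column of ones (intercept), then report_year, n_Supp, n_Opp
XMat<-matrix(c(rep(1, dim(tog)[1]), tog$report_year, tog$n_Supp, tog$n_Opp), nrow=dim(tog)[1])
# Least-squares estimates from the normal equations: (X'X)^(-1) X'Y
betaMat<-solve(t(XMat)%*%XMat)%*%t(XMat)%*%YMat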
Our estimated coefficients, in the order intercept, report_year, n_Supp, n_Opp:
betaMat
## [,1]
## [1,] -6.107048e+08
## [2,] 3.031536e+05
## [3,] 1.649807e+03
## [4,] 6.834127e+03
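Next, we compute the residuals (eMat), the residual mean square (ms_res), and from it the standard error of each coefficient (err):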
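# Residuals: observed minus fitted values
eMat<-YMat-(XMat%*%betaMat)
n<-dim(tog)[1]  # number of observations
p<-3            # number of predictors, not counting the intercept
# Residual mean square: SSE / (n - p - 1)
ms_res<-sum(eMat^2)/(n-p-1)
# Estimated covariance matrix of the coefficient estimates
covMat<-ms_res*solve(t(XMat)%*%XMat)
# Standard errors: square roots of the diagonal of covMat
err<-sqrt(diag(covMat))
err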
## [1] 1.051444e+09 5.217172e+05 5.538653e+01 7.356215e+01
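In formulas, with n = 30 observations and p = 3 predictors, the quantities computed above are as follows. The notation is ours and mirrors the R objects.

```latex
\begin{aligned}
e &= Y - X\hat{\beta}, \\
MS_{\mathrm{Res}} &= \frac{e^\top e}{n - p - 1}, \\
\widehat{\operatorname{Cov}}(\hat{\beta}) &= MS_{\mathrm{Res}}\,(X^\top X)^{-1}, \\
\operatorname{se}(\hat{\beta}_j) &= \sqrt{\big[\widehat{\operatorname{Cov}}(\hat{\beta})\big]_{jj}}, \qquad j = 0,\dots,3.
\end{aligned}
```

Here e is eMat, MS_Res is ms_res, the covariance matrix is covMat, and the four standard errors are the entries of err. With n = 30 and p = 3, the denominator n - p - 1 is 26, the same residual degrees of freedom that lm() reports.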
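We weren't entirely sure how to code this part. The fitted model uses only the estimated coefficients, Y-hat = XMat x betaMat, which with our estimates is

Y-hat = -6.107e8 + 3.032e5*report_year + 1649.8*n_Supp + 6834.1*n_Opp

The observed values then satisfy Y = Y-hat + eMat, where eMat holds the residuals computed above. Our attempt below instead adds err, the four coefficient standard errors, to the 30 fitted values. R recycles the length-4 vector err down the column, which triggers the warning. As a result, every fourth row starting with row 1 is inflated by the intercept's standard error of about 1.05e9, so the printed numbers are not fitted values: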
(XMat%*%betaMat) + err
## Warning in (XMat %*% betaMat) + err: longer object length is not a multiple of
## shorter object length
## [,1]
## [1,] 1.060663e+09
## [2,] 7.146562e+07
## [3,] 5.308151e+05
## [4,] 1.864346e+05
## [5,] 1.051809e+09
## [6,] 1.125768e+06
## [7,] 2.875244e+05
## [8,] 1.147096e+06
## [9,] 1.052369e+09
## [10,] 1.236064e+06
## [11,] 3.999407e+05
## [12,] 1.111714e+06
## [13,] 1.053870e+09
## [14,] 2.793999e+06
## [15,] 3.464477e+05
## [16,] 1.902293e+06
## [17,] 1.051759e+09
## [18,] 1.164418e+06
## [19,] 4.629382e+05
## [20,] 4.023134e+05
## [21,] 1.052489e+09
## [22,] 1.490375e+06
## [23,] 2.408600e+05
## [24,] 8.206597e+05
## [25,] 1.052367e+09
## [26,] 1.390792e+06
## [27,] 1.787377e+08
## [28,] 8.963331e+04
## [29,] 1.051707e+09
## [30,] 1.014297e+06
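A corrected version of the fitted-value step is sketched below. The FEC file is not reproduced here, so the sketch runs on a small synthetic data frame with the same column names as tog. The data frame (sim), its made-up values, and the name yHat are hypothetical. The sketch computes the fitted values as XMat %*% betaMat without adding err, checks that fitted values plus residuals give back Y, and compares the fitted values with fitted() from lm().

```r
# Hypothetical synthetic stand-in for `tog` (made-up numbers, same column names)
set.seed(42)
n <- 30
sim <- data.frame(
  report_year = rep(2015:2020, each = 5),
  n_Supp      = rpois(n, 200),
  n_Opp       = rpois(n, 100)
)
sim$expenditure_amount_Opp <- 3e5 * (sim$report_year - 2015) +
  1650 * sim$n_Supp + 6800 * sim$n_Opp + rnorm(n, sd = 2e6)

# Response vector and design matrix (column of ones, then the three predictors)
YMat <- sim$expenditure_amount_Opp
XMat <- cbind(1, sim$report_year, sim$n_Supp, sim$n_Opp)

# Least-squares coefficients from the normal equations
betaMat <- solve(t(XMat) %*% XMat) %*% t(XMat) %*% YMat

# Fitted values use only the coefficients; no standard errors are added
yHat <- XMat %*% betaMat   # 30 x 1 matrix, no recycling warning
eMat <- YMat - yHat        # residuals

# Observed values decompose exactly as fitted + residual
all.equal(as.vector(yHat + eMat), YMat)

# The hand-computed fitted values agree with lm()
fit <- lm(expenditure_amount_Opp ~ report_year + n_Supp + n_Opp, data = sim)
all.equal(as.vector(yHat), unname(fitted(fit)), tolerance = 1e-6)
```

With the real data, the same two lines that define yHat and eMat apply to tog's XMat and betaMat unchanged.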
To verify our hand calculations, we fit the same model with lm():
mlr<-lm(tog$expenditure_amount_Opp~tog$report_year+tog$n_Supp+tog$n_Opp)
summary(mlr)
##
## Call:
## lm(formula = tog$expenditure_amount_Opp ~ tog$report_year + tog$n_Supp +
## tog$n_Opp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2718176 -689041 -391122 -158051 5863348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.107e+08 1.051e+09 -0.581 0.566
## tog$report_year 3.032e+05 5.217e+05 0.581 0.566
## tog$n_Supp 1.650e+03 5.539e+01 29.787 <2e-16 ***
## tog$n_Opp 6.834e+03 7.356e+01 92.903 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1839000 on 26 degrees of freedom
## Multiple R-squared: 0.9975, Adjusted R-squared: 0.9972
## F-statistic: 3398 on 3 and 26 DF, p-value: < 2.2e-16
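The lm() estimates and standard errors match our hand-computed betaMat and err to every digit shown, so our matrix calculations check out. The residual standard error lm() reports (1839000 on 26 degrees of freedom) is the square root of our ms_res.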
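The visual comparison above can also be done in code. This is a sketch, again on a hypothetical synthetic data frame (sim, made-up values) standing in for tog. The helper name ols_by_hand is ours and is not part of any package. The sketch repeats the hand computation and compares the estimates, standard errors, and residual standard error with the Estimate and Std. Error columns of coef(summary(fit)) and with summary(fit)$sigma.

```r
# Hypothetical helper: OLS estimates and standard errors via the normal equations
ols_by_hand <- function(X, y) {
  XtX_inv <- solve(t(X) %*% X)                # (X'X)^(-1)
  beta    <- XtX_inv %*% t(X) %*% y           # (X'X)^(-1) X'y
  e       <- y - X %*% beta                   # residuals
  ms_res  <- sum(e^2) / (nrow(X) - ncol(X))   # n - p - 1, since ncol(X) = p + 1
  list(beta = as.vector(beta),
       se   = sqrt(diag(ms_res * XtX_inv)),   # standard errors
       rse  = sqrt(ms_res))                   # residual standard error
}

# Hypothetical synthetic stand-in for `tog` (made-up numbers, same column names)
set.seed(7)
n <- 30
sim <- data.frame(report_year = rep(2015:2020, each = 5),
                  n_Supp = rpois(n, 200), n_Opp = rpois(n, 100))
sim$expenditure_amount_Opp <- 1650 * sim$n_Supp + 6800 * sim$n_Opp +
  rnorm(n, sd = 2e6)

X <- cbind(1, sim$report_year, sim$n_Supp, sim$n_Opp)
hand <- ols_by_hand(X, sim$expenditure_amount_Opp)

fit <- lm(expenditure_amount_Opp ~ report_year + n_Supp + n_Opp, data = sim)
tab <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Each comparison returns TRUE when the hand results match lm()
all.equal(hand$beta, unname(tab[, "Estimate"]),   tolerance = 1e-6)
all.equal(hand$se,   unname(tab[, "Std. Error"]), tolerance = 1e-6)
all.equal(hand$rse,  summary(fit)$sigma,          tolerance = 1e-6)
```

A relative tolerance is used instead of exact equality because lm() solves the least-squares problem with a QR decomposition rather than by inverting X'X, so the two routes can differ in the last few digits.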