fec<-read.csv("fec_independent_expenditures.csv", header = TRUE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v stringr 1.4.0
## v tidyr 1.1.2 v forcats 0.5.1
## v readr 1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
# Total opposition ("O") spending per presidential candidate, 2013 onward
opposition <- fec %>%
  filter(support_oppose_indicator == "O", report_year >= 2013, candidate_office == "P",
         candidate_id != "P80002801",
         candidate_id != "") %>%
  group_by(candidate_id) %>%
  summarise(expenditure_amount = sum(expenditure_amount, na.rm = TRUE))
# Total support ("S") spending per presidential candidate, 2013 onward
support <- fec %>%
  filter(support_oppose_indicator == "S", report_year >= 2013, candidate_office == "P",
         candidate_id != "P00547984",
         candidate_id != "P20002721",
         candidate_id != "P60007671",
         candidate_id != "P60007895",
         candidate_id != "P60009354",
         candidate_id != "P60019239",
         candidate_id != "P60021102",
         candidate_id != "P60022118",
         candidate_id != "P60023215",
         candidate_id != "P80003353",
         candidate_id != "") %>%
  group_by(candidate_id) %>%
  summarise(expenditure_amount = sum(expenditure_amount, na.rm = TRUE))
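The two pipelines above differ only in the indicator and the list of excluded IDs, so the shared logic could be factored into one function. The following is a minimal sketch on made-up rows, not the FEC file; `total_by_candidate`, `toy_fec`, and the IDs in it are hypothetical names introduced for illustration.

```r
library(dplyr)

# Toy stand-in for the FEC table; every value here is invented.
toy_fec <- data.frame(
  candidate_id = c("P1", "P1", "P2", "P3", "", "P2"),
  support_oppose_indicator = c("S", "S", "S", "S", "S", "O"),
  report_year = c(2014, 2016, 2015, 2012, 2016, 2016),
  candidate_office = c("P", "P", "P", "P", "P", "P"),
  expenditure_amount = c(100, 50, NA, 70, 10, 30)
)

# Hypothetical helper: total spending per candidate for one indicator ("S" or "O").
# Blank IDs are always dropped; excluded_ids lists further IDs to leave out.
total_by_candidate <- function(data, indicator, excluded_ids = character(0)) {
  data %>%
    filter(
      support_oppose_indicator == indicator,
      report_year >= 2013,
      candidate_office == "P",
      !candidate_id %in% c(excluded_ids, "")
    ) %>%
    group_by(candidate_id) %>%
    summarise(expenditure_amount = sum(expenditure_amount, na.rm = TRUE))
}

toy_support <- total_by_candidate(toy_fec, "S", excluded_ids = "P3")
toy_support
```

Using `%in%` keeps the exclusion list in one vector, which is easier to audit than a chain of `!=` conditions.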
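# cbind() pairs rows by position, so this assumes both tables list the same candidates in the same order
fec <- cbind(support, opposition)
fec$opp_amt <- fec[, 4]   # opposition total (4th column after binding)
fec$sup_amt <- fec[, 2]   # support total (2nd column after binding)
fec <- fec[-c(2:4)]       # drop the duplicated ID and the two original amount columns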
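Because `cbind()` pairs rows purely by position, one way to make the pairing explicit is to join the two summaries on `candidate_id`. The following is a minimal sketch with invented totals, using base R's `merge()`; the toy data frames and their IDs are assumptions for illustration, not the FEC figures.

```r
# Toy per-candidate totals; IDs and amounts are invented for illustration.
toy_support <- data.frame(
  candidate_id = c("P1", "P2", "P3"),
  expenditure_amount = c(150, 20, 75)
)
toy_opposition <- data.frame(
  candidate_id = c("P3", "P1", "P4"),   # different order, different members
  expenditure_amount = c(40, 300, 10)
)

# Give each amount column a distinct name before joining.
names(toy_support)[2] <- "sup_amt"
names(toy_opposition)[2] <- "opp_amt"

# Inner join: keeps only candidates present in both tables, matched by ID.
toy_joined <- merge(toy_support, toy_opposition, by = "candidate_id")
toy_joined
```

With a join, candidates that appear in only one table drop out automatically, and each support total is guaranteed to sit next to the opposition total for the same ID.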
x<-fec$sup_amt
y<-fec$opp_amt
mod<-lm(y~x)
AN<-anova(mod)
ssres<-AN$`Sum Sq`[2]
ssreg<-AN$`Sum Sq`[1]
n<-dim(fec)[1]
AN
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 2.5267e+16 2.5267e+16 45.936 3.227e-06 ***
## Residuals 17 9.3510e+15 5.5006e+14
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Regression:
- Degrees of Freedom: 1
- Sum of Squares: 2.5267e+16
- Mean Squares: 2.5267e+16
- F-Value: 45.936
- P-Value: 3.227e-06

Residual:
- Degrees of Freedom: 17
- Sum of Squares: 9.3510e+15
- Mean Squares: 5.5006e+14

Total:
- Degrees of Freedom: 18
- Sum of Squares: 3.4618e+16
Our F-Statistic value is:
f_stat<-(ssreg/1)/(ssres/(n-2))
f_stat
## [1] 45.93577
This matches the F value reported in the ANOVA table.
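As a cross-check, the F value and its p-value can be rebuilt by hand from the rounded sums of squares printed in the table. This is a small sketch; the names `ss_reg`, `ss_res`, and `n_obs` are introduced here for illustration, and because the inputs are rounded the result agrees with the table only to about three decimal places.

```r
# Values copied from the ANOVA table above (rounded as printed).
ss_reg <- 2.5267e16   # regression sum of squares, 1 df
ss_res <- 9.3510e15   # residual sum of squares, 17 df
n_obs  <- 19          # 17 residual df + 2 estimated coefficients

ms_reg <- ss_reg / 1             # mean square for regression
ms_res <- ss_res / (n_obs - 2)   # mean square for residuals
f_value <- ms_reg / ms_res

# Upper-tail probability of the F distribution with (1, n - 2) df.
p_value <- pf(f_value, df1 = 1, df2 = n_obs - 2, lower.tail = FALSE)

c(F = f_value, p = p_value)
```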
Our R-Squared value is:
R2<-(AN$`Sum Sq`[1]/3.4618e+16)
R2
## [1] 0.7298875
This tells us that roughly 73% of the variance in our “y” variable (opposition spending) is explained by the regression on “x” (support spending).
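R-squared can also be computed without typing the total sum of squares by hand. The sketch below uses simulated data (not the FEC data; all `toy_` names are illustrative) to show that three routes agree for a simple regression with an intercept: the ANOVA sums of squares, `summary()`, and the squared correlation.

```r
# Simulated data for illustration only.
set.seed(1)
toy_x <- runif(20, 0, 100)
toy_y <- 3 + 2 * toy_x + rnorm(20, sd = 25)
toy_mod <- lm(toy_y ~ toy_x)

toy_an <- anova(toy_mod)
toy_ssreg <- toy_an$`Sum Sq`[1]
toy_ssres <- toy_an$`Sum Sq`[2]

# Total sum of squares is the sum of the regression and residual parts.
r2_from_anova   <- toy_ssreg / (toy_ssreg + toy_ssres)
r2_from_summary <- summary(toy_mod)$r.squared
r2_from_cor     <- cor(toy_x, toy_y)^2   # valid for one predictor only

all.equal(r2_from_anova, r2_from_summary)
```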
plot(mod)
The three plots we’re interested in are the Residuals vs. Fitted plot, the Normal Q-Q plot, and the Residuals vs. Leverage plot. Based on these, the residuals look roughly normal, with few outliers that would affect our regression model. Three observations stand out as potentially influential in the Q-Q and Residuals vs. Fitted plots. However, the Residuals vs. Leverage plot shows that one of them affects the model less than we first thought. Two observations do lie beyond the outermost Cook’s distance contour, which suggests we might need to exclude them to get a more “accurate” regression model.
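The visual reading of the Residuals vs. Leverage plot can be backed up numerically with `cooks.distance()` and `hatvalues()`. The sketch below uses simulated data with one deliberately influential point (not the FEC data; all names are illustrative). The cutoff of 1 is a common rule of thumb and is treated here as an assumption, not a fixed standard.

```r
# Simulated data with one deliberately influential point (observation 19).
set.seed(42)
toy_x <- c(1:18, 60)
toy_y <- c(2 * (1:18) + rnorm(18, sd = 2), 20)
toy_mod <- lm(toy_y ~ toy_x)

cooks <- cooks.distance(toy_mod)   # one value per observation
lev   <- hatvalues(toy_mod)        # leverage of each observation

# plot.lm() draws Cook's distance contours at 0.5 and 1 by default;
# using 1 as a cutoff is a rule of thumb, not a formal test.
flagged <- which(cooks > 1)
data.frame(obs = flagged, cooks_d = cooks[flagged], leverage = lev[flagged])
```

To draw only the three plots discussed here, `plot(mod, which = c(1, 2, 5))` selects Residuals vs. Fitted, Normal Q-Q, and Residuals vs. Leverage.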
Overall, our regression model fits the data fairly well. Our R-squared value tells us that almost three-quarters of the variance in our “y” variable is explained by the model; this might rise if we remove the influential observations identified in the diagnostic plots, especially the two observations outside the outermost Cook’s distance contour.
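Whether removing influential observations actually raises R-squared can be checked by refitting and comparing. The sketch below again uses simulated data (not the FEC data), and the Cook's distance cutoff of 1 is an assumed rule of thumb; `toy`, `full_mod`, and `trimmed_mod` are illustrative names.

```r
# Simulated data with one influential point (illustrative only).
set.seed(7)
toy <- data.frame(x = c(1:18, 60))
toy$y <- c(2 * (1:18) + rnorm(18, sd = 2), 20)

full_mod <- lm(y ~ x, data = toy)

# Assumed cutoff: keep observations with Cook's distance of at most 1.
keep <- cooks.distance(full_mod) <= 1
trimmed_mod <- lm(y ~ x, data = toy[keep, ])

r2_full    <- summary(full_mod)$r.squared
r2_trimmed <- summary(trimmed_mod)$r.squared
c(full = r2_full, trimmed = r2_trimmed)
```

A higher R-squared after trimming does not by itself justify dropping the points; here each row is a real candidate, so any exclusion would need a substantive reason as well.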