library(tidyverse)
library(infer)
library(tigris)
library(sf)
library(kableExtra)
library(cowplot)
set.seed(888)
In this paper, I investigate the relationship between the competitiveness of electoral districts and the partisan leanings of candidates elected in those districts. The predominating narrative on partisan gerrymandering posits that redistricting efforts led by one-sided state legislatures leads to one-sided, uncompetitive congressional races, in which one party is heavily favored in most districts. In turn this lack of competition contributes to polarization, as the “real” races take place during primary elections, in which candidates are incentivized to take more extreme positions to appeal to their parties’ political bases. This narrative is compelling, but I have found very limited empirical evidence to support it. Therein lies the goal of the analysis I put forth below: is there empirical support linking a lack of competitiveness with more polarizing candidates?
Because these phenomena are not directly observable — referred to as “latent” variables — I consider six different proxy variables that attempt to quantify competitiveness and partisanship. To attempt to measure competitiveness, I consider three variables: the results of a 2022 study measuring voter policy preferences (referred to below as the “Ideology” measure; see [1]), the Cook Partisan Voting Index (referred to as “Cook PVI”; see [2]), and the average, district-level “efficiency gap” for all congressional elections since 2010 (see [3]). As a proxy of how far left or right candidates lean, I consider another three variables: the NOMINATE measure of ideological leanings in Congressional voting, popularized by Poole and Rosenthal (referred to as “Nominate”; see [4]), GovTrack’s measure of partisan cooperation in Congress (referred to as “Govtrack”; see [5]) and a “homemade” measure of partisan rhetoric based on predictive modeling performed on a corpus candidate tweets.
Testing is performed on all nine combinations of the above six variables. Because of the latent nature of these variables, it is difficult to assign a high degree of confidence to any one test. I therefore found it useful to conduct multiple test simultaneously and compare results, attempting to assess the strength of any relationship holistically. I first conduct null hypothesis test to compare the mean partisanship score of candidates from districts with a high degree of competitiveness versus those from districts with a low degree of competitiveness. Results appear mixed, but tend to indicate only a weak relationship, if any, with three of the nine pairs of variables showing a statistically significant relationship. To follow up on this finding, I fit a series of linear regression models. Of the nine models, only three had statistically significant predictors, corresponding to the same three significant pairs from the first round of tests. Moreover, those three models offered only limited explanation of the variance in the dependent variable, as evidenced by low \(R^2\) (all under 15%). Effect sizes also appear minimal, with beta coefficients near zero in all but once case.
In conclusion, this analysis provides limited .
LIMITATIONS
CITATIONS
Primary data
candidates <- read_csv('data/candidates_partisan_compete.csv')
## Rows: 4297 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): fec_id, name, state, district, party, office, incumbent_challenge,...
## dbl (7): govtrack_id, partisan_score_twitter, partisan_score_nominate, part...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(candidates)
## Rows: 4,297
## Columns: 18
## $ fec_id <chr> "H2CO07170", "H2UT03280", "H2CA30291", "H2MN07…
## $ name <chr> "AADLAND, ERIK", "AALDERS, TIM", "AAZAMI, SHER…
## $ state <chr> "CO", "UT", "CA", "MN", "NY", "MN", "OH", "CA"…
## $ district <chr> "07", "03", "32", "07", "12", "05", "09", "50"…
## $ party <chr> "REPUBLICAN PARTY", "REPUBLICAN PARTY", "DEMOC…
## $ office <chr> "House", "House", "House", "House", "House", "…
## $ incumbent_challenge <chr> "Open seat", "Challenger", "Challenger", "Chal…
## $ candidate_status <chr> "P", "C", "N", "P", "P", "P", "N", "C", "N", "…
## $ bioguide_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ ballotpedia_id <chr> "Erik_Aadland", "Tim_Aalders", "Shervin_Aazami…
## $ govtrack_id <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ twitter_name <chr> "aadlandforco", NA, "aazamishervin", NA, NA, N…
## $ partisan_score_twitter <dbl> 0.4659687, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ partisan_score_nominate <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ partisan_score_govtrack <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ compete_ideology <dbl> 0.03679439, 0.37553384, 0.59131146, 0.54685173…
## $ compete_cookpvi <dbl> 0.01369863, 0.45205479, 0.45205479, 0.61643836…
## $ compete_eff_gap <dbl> 0.43740634, 0.03439195, 0.50331208, 0.46466341…
candidates <- candidates %>%
mutate(moc = if_else(is.na(bioguide_id), 'No', 'Yes')) %>%
select(name, state, district, party, incumbent_challenge, moc,
contains('partisan'), contains('compete'))
candidates <- candidates[!duplicated(candidates),]
candidates[duplicated(candidates$name) | duplicated(candidates$name, fromLast = TRUE),]
## # A tibble: 127 × 12
## name state district party incumbent_challenge moc partisan_score_twitter
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 AUSTIN… MT 00 REPU… Challenger No NA
## 2 AUSTIN… MT 00 REPU… Challenger No NA
## 3 AUSTIN… MT 00 REPU… Challenger No NA
## 4 AUSTIN… MT 00 REPU… Challenger No NA
## 5 BALLAR… MT 00 DEMO… Challenger No NA
## 6 BALLAR… MT 00 DEMO… Challenger No NA
## 7 BALLAR… MT 00 DEMO… Challenger No NA
## 8 BALLAR… MT 00 DEMO… Challenger No NA
## 9 BERMAN… MN 01 REPU… Open seat No NA
## 10 BERMAN… MN 06 REPU… Challenger No NA
## # ℹ 117 more rows
## # ℹ 5 more variables: partisan_score_nominate <dbl>,
## # partisan_score_govtrack <dbl>, compete_ideology <dbl>,
## # compete_cookpvi <dbl>, compete_eff_gap <dbl>
Mapping Data
data('fips_codes')
districts_sf <- congressional_districts() %>%
st_simplify(dTolerance = 10000) %>%
data.frame() %>%
left_join(
fips_codes %>%
select(state, state_code) %>%
unique(),
by = c('STATEFP' = 'state_code')
)
## Retrieving data for the year 2021
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
states_sf <- states(cb = TRUE) %>%
st_simplify(dTolerance = 1000) %>%
data.frame() %>%
filter(STUSPS != 'AS',
STUSPS != 'GU',
STUSPS != 'MP',
STUSPS != 'VI',
STUSPS != 'PR',)
## Retrieving data for the year 2021
##
|
| | 0%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================= | 25%
|
|================== | 26%
|
|=================== | 27%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 33%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 50%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 73%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 77%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 100%
candidates_sf <- candidates %>%
left_join(
select(districts_sf, state, CD116FP, geometry),
by = c('state', 'district' = 'CD116FP'),
) %>%
left_join(
select(states_sf, STUSPS, geometry), suffix = c('_distr','_state'),
by = c('state' = 'STUSPS')
)
Spread across states
state_bar_plot <- function (data, filter_column) {
data %>%
mutate(dupe = duplicated(name)) %>%
filter(!is.na(!! sym(filter_column)),
dupe == FALSE) %>%
group_by(state) %>%
summarise(count = n()) %>%
ggplot(aes(x = reorder(state, count), y = count)) +
geom_bar(stat = 'identity') +
scale_x_discrete(guide = guide_axis(n.dodge = 3)) +
xlab('') +
ylab('') +
labs(title = str_remove_all(filter_column, 'partisan_score_')) +
coord_flip()
}
state_plots <- lapply(c('name','partisan_score_twitter',
'partisan_score_nominate','partisan_score_govtrack'),
state_bar_plot, data = candidates)
plot_grid(plotlist = state_plots, nrow = 2, ncol = 2)
state_plots <- lapply(c('partisan_score_twitter',
'partisan_score_nominate',
'partisan_score_govtrack'),
state_bar_plot, data = candidates)
plot_grid(plotlist = state_plots, nrow = 1, ncol = 3)
Spread across parties
party_bar_plot <- function (data, filter_column) {
data %>%
mutate(dupe = duplicated(name)) %>%
filter(!is.na(!! sym(filter_column)), dupe == FALSE,
party == 'DEMOCRATIC PARTY' | party == 'REPUBLICAN PARTY') %>%
group_by(party) %>%
summarise(count = n()) %>%
ggplot(aes(x = reorder(party, count), y = count, fill = party)) +
geom_bar(stat = 'identity', show.legend = FALSE) +
scale_fill_manual(values = c('DEMOCRATIC PARTY' = 'deepskyblue2',
'REPUBLICAN PARTY' = 'firebrick3')) +
xlab('') +
ylab('') +
labs(title = str_remove_all(filter_column, 'partisan_score_')) +
coord_flip()
}
party_plots <- lapply(c('name','partisan_score_twitter',
'partisan_score_nominate','partisan_score_govtrack'),
party_bar_plot, data = candidates)
plot_grid(plotlist = party_plots, nrow = 2, ncol = 2)
party_plots <- lapply(c('partisan_score_twitter',
'partisan_score_nominate',
'partisan_score_govtrack'),
party_bar_plot, data = candidates)
plot_grid(plotlist = party_plots, nrow = 3, ncol = 1)
Geographic Visualization
map_plot <- function(fill,
color = 'blue',
data = candidates_sf) {
data %>%
filter(state != 'HI',
state != 'AK',
state != 'PR') %>%
ggplot() +
geom_sf(aes(geometry = geometry_distr, fill = !!sym(fill))) +
geom_sf(aes(geometry = geometry_state), color = 'purple', size = 0.4, fill = NA) +
scale_fill_gradient(low = 'gray100',
high = color,
limits = c(0,1),
name = str_remove_all(fill, 'compete_')) +
theme(legend.position='bottom')
}
map_plot(fill = 'compete_ideology',
data = candidates_sf)
map_plot(fill = 'compete_cookpvi',
data = candidates_sf)
map_plot(fill = 'compete_eff_gap',
data = candidates_sf)
Efficiency and Partisanship
scatterplot <- function(x,y) {
candidates %>%
filter(!is.na(!! sym(x)),
!is.na(!! sym(y))) %>%
ggplot(aes(x = !! sym(x), y = !! sym(y))) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ x') +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 10)) +
labs(title = str_c(str_remove_all(x, 'compete|_'), '-',
str_remove_all(y, 'partisan_score|_')))
}
var_grid_continous <- expand_grid(
x = colnames(select(candidates, contains('compete'))),
y = colnames(select(candidates, contains('partisan')))
)
plot_list <- list()
for (row in 1:nrow(var_grid_continous)) {
plot <- scatterplot(as.character(var_grid_continous[row,'x']),
as.character(var_grid_continous[row,'y']))
plot_list[[row]] <- plot
}
plot_grid(plotlist = plot_list[c(4,7,1,9,3,6,8,2,5)], nrow = 3, ncol = 3)
Summary Stats
cat('District Ideology\n')
## District Ideology
summary(candidates$compete_ideology)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00104 0.13089 0.28782 0.31187 0.48419 0.93941 108
cat('\nDistrict Cook PVI\n')
##
## District Cook PVI
summary(candidates$compete_cookpvi)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0137 0.1507 0.3425 0.3657 0.5343 1.0000 131
cat('\nDistrict Effiency Gap\n')
##
## District Effiency Gap
summary(candidates$compete_eff_gap)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2233 0.3785 0.3932 0.5339 1.0000 192
summary(candidates$compete_cookpvi)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0137 0.1507 0.3425 0.3657 0.5343 1.0000 131
summary(candidates$compete_eff_gap)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2233 0.3785 0.3932 0.5339 1.0000 192
summary(candidates$compete_ideology)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00104 0.13089 0.28782 0.31187 0.48419 0.93941 108
Binary Independent Variables
candidates <- candidates %>%
mutate(
compete_ideology_binary = factor(if_else(compete_ideology > median(
candidates$compete_ideology, na.rm = TRUE), 'high', 'low'),
levels = c('low','high'),
ordered = TRUE),
compete_cookpvi_binary = factor(if_else(compete_cookpvi > median(
candidates$compete_cookpvi, na.rm = TRUE), 'high', 'low'),
levels = c('low','high'),
ordered = TRUE),
compete_eff_gap_binary = factor(if_else(compete_eff_gap > median(
candidates$compete_eff_gap, na.rm = TRUE), 'high', 'low'),
levels = c('low','high'),
ordered = TRUE))
boxplot <- function(x,y) {
candidates %>%
filter(!is.na(!! sym(x)),
!is.na(!! sym(y))) %>%
ggplot(aes(x = !! sym(x), y = !! sym(y))) +
geom_boxplot() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 10)) +
labs(title = str_c(str_remove_all(x, 'compete|_|binary'), '_',
str_remove_all(y, 'partisan_score|_')))
}
var_grid <- expand_grid(
x = colnames(select(candidates, (contains('compete') & contains('binary')))),
y = colnames(select(candidates, contains('partisan')))
) %>%
arrange(x,y)
plot_list <- list()
for (row in 1:nrow(var_grid)) {
plot <- boxplot(as.character(var_grid[row,'x']),
as.character(var_grid[row,'y']))
plot_list[[row]] <- plot
}
plot_grid(plotlist = plot_list[c(3,6,9,2,5,8,1,4,7)], nrow = 3, ncol = 3)
Define functions
hyp_diff_means <- function(x, y,
data = candidates,
conf_level = 0.95,
bootstraps = 1000) {
mean_diff <- data %>%
filter(!is.na(!! sym(y)),
!is.na(!! sym(x))) %>%
specify(response = (!!sym(y)), explanatory = (!!sym(x))) %>%
calculate(stat = 'diff in means', order = c('high', 'low'))
null_dist <- data %>%
filter(!is.na(!! sym(y)),
!is.na(!! sym(x))) %>%
specify(response = (!!sym(y)), explanatory = (!!sym(x))) %>%
hypothesize(null = 'independence') %>%
generate(reps = bootstraps, type = 'permute') %>%
calculate(stat = 'diff in means', order = c('high', 'low'))
pval <- null_dist %>%
get_p_value(obs_stat = mean_diff, direction = 'two-sided')
ci <- data %>%
filter(!is.na(!! sym(y)),
!is.na(!! sym(x))) %>%
specify(response = (!!sym(y)), explanatory = (!!sym(x))) %>%
generate(reps = bootstraps, type = 'bootstrap') %>%
calculate(stat = 'diff in means', order = c('high', 'low')) %>%
get_ci(level = conf_level)
results <- list(p_value = pval$p_value,
lower_ci = ci$lower_ci,
upper_ci = ci$upper_ci,
mean_diff = mean_diff$stat)
return(results)
}
grid_hyp_test <- function(data, grid, ...) {
results_df <- data.frame(grid, p_value = NA, lower_ci = NA,
upper_ci = NA, mean_diff = NA)
for (row in 1:nrow(grid)) {
test_name <- str_c(str_remove_all(as.character(grid[row,'x']),
'compete|_|binary'), '_',
str_remove_all(as.character(grid[row,'y']),
'partisan_score|_'))
result <- hyp_diff_means(as.character(grid[row,'x']),
as.character(grid[row,'y']),
data = data)
results_df <- results_df %>%
mutate(
p_value = replace(p_value,
x == as.character(grid[row,'x']) &
y == as.character(grid[row,'y']),
result$p_value),
lower_ci = replace(lower_ci,
x == as.character(grid[row,'x']) &
y == as.character(grid[row,'y']),
result$lower_ci),
upper_ci = replace(upper_ci,
x == as.character(grid[row,'x']) &
y == as.character(grid[row,'y']),
result$upper_ci),
mean_diff = replace(mean_diff,
x == as.character(grid[row,'x']) &
y == as.character(grid[row,'y']),
result$mean_diff))
}
return(results_df)
}
hyp_test_heatmap <- function(results_df, upper_limit = 0.1) {
results_df %>%
mutate(x = str_remove_all(x, 'compete|binary|_'),
y = str_remove_all(y, 'partisan_score|_')) %>%
ggplot(aes(x, y)) +
geom_tile(aes(fill = p_value)) +
geom_text(aes(label = str_c('Obs. Diff.: ', round(mean_diff,3),
'\np-value: ',
if_else(p_value == 0, '<0.000',
format(round(p_value,3), nsmall=3)),
'\n95% CI: ', round(lower_ci,3),
' to ', round(upper_ci,3)))) +
xlab('District Competition Measures') +
ylab('Candidate Partisanship Measures') +
theme(axis.text.y = element_text(angle = 90, hjust = 0.5)) +
scale_fill_gradient(low = 'darkslategray1',
high = 'gray100',
limits = c(0,upper_limit),
oob = scales::squish)
# na.value = 'grey50')
}
Run test on full data
results <- grid_hyp_test(data = candidates, grid = var_grid)
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
hyp_test_heatmap(results)
Split by party
results_dem <- grid_hyp_test(data = filter(candidates,
party == 'DEMOCRATIC PARTY'),
grid = var_grid,
conf_level = 0.975)
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
results_rep <- grid_hyp_test(data = filter(candidates,
party == 'REPUBLICAN PARTY'),
grid = var_grid,
conf_level = 0.975)
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
hyp_test_heatmap(results_dem, upper_limit = 0.05) +
labs(title = 'Democrats')
hyp_test_heatmap(results_rep, upper_limit = 0.05) +
labs(title = 'Republicans')
results_df <- data.frame(var_grid_continous,
beta_coef = NA,
predictor_pval = NA,
r_squared = NA)
model_list <- list()
for (row in 1:nrow(var_grid_continous)) {
x = as.character(var_grid_continous[row,'x'])
y = as.character(var_grid_continous[row,'y'])
model <- lm(data = candidates, formula = as.formula(paste(y,'~',x)))
model_list[[row]] <- model
results_df <- results_df %>%
mutate(
beta_coef = replace(beta_coef,
x == as.character(var_grid_continous[row,'x']) &
y == as.character(var_grid_continous[row,'y']),
summary(model)$coefficients[2]),
predictor_pval = replace(predictor_pval,
x == as.character(var_grid_continous[row,'x']) &
y == as.character(var_grid_continous[row,'y']),
summary(model)$coefficients[8]),
r_squared = replace(r_squared,
x == as.character(var_grid_continous[row,'x']) &
y == as.character(var_grid_continous[row,'y']),
summary(model)$r.squared))
}
results_df %>%
filter(predictor_pval < 0.1) %>%
arrange(desc(r_squared))
## x y beta_coef predictor_pval
## 1 compete_cookpvi partisan_score_nominate 0.3040729 6.683363e-26
## 2 compete_ideology partisan_score_nominate 0.3184230 1.037703e-20
## 3 compete_cookpvi partisan_score_govtrack 0.1099427 5.712276e-03
## 4 compete_eff_gap partisan_score_govtrack -0.0785671 7.348125e-02
## r_squared
## 1 0.222758558
## 2 0.179384003
## 3 0.014323040
## 4 0.006066025
results_df %>%
mutate(x = str_remove_all(x, 'compete|_'),
y = str_remove_all(y, 'partisan_score|_')) %>%
ggplot(aes(x,y)) +
geom_tile(aes(fill = ifelse(predictor_pval < 0.1, r_squared, NA))) +
geom_text(aes(label = str_c('R^2: ',round(r_squared,3),
'\nPredictor pval: ',format(round(predictor_pval,3), nsmall=3),
'\nBeta: ',round(beta_coef,3)))) +
xlab('District Competition Measures') +
ylab('Candidate Partisanship Measures') +
scale_fill_gradient(low = 'gray100',
high = 'darkslategray1',
limits = c(0,0.5),
na.value = 'grey50') +
theme(axis.text.y = element_text(angle = 90, hjust = 0.5)) +
labs(fill = 'R^2')
Residual Plots
for (i in c(5, 6, 2, 9)) {
p1 <- ggplot(data = model_list[[i]], aes(x = .resid)) +
geom_histogram(bins = 25) +
xlab("Residuals")
p2 <- ggplot(data = model_list[[i]], aes(sample = .resid)) +
stat_qq() +
geom_abline(intercept = mean(model_list[[i]]$residuals),
slope = sd(model_list[[i]]$residuals),
color = 'red', linetype = 'dashed')
p3 <- ggplot(data = model_list[[i]], aes(x = .fitted, y = .resid)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
xlab("Fitted values") +
ylab("Residuals")
formula <- str_c(as.character(model_list[[i]]$terms[1])[2],' ',
as.character(model_list[[i]]$terms[1])[1],' ',
as.character(model_list[[i]]$terms[1])[3])
title <- ggdraw() +
draw_label(formula, fontface = 'bold', x = 0, hjust = -0.1)
print(plot_grid(title, plot_grid(p1, p2), p3,
nrow = 3, ncol = 1, rel_heights = c(0.15,1,1)))
}
for (i in c(5, 6, 2, 9)) {
print(summary(model_list[[i]]))
}
##
## Call:
## lm(formula = as.formula(paste(y, "~", x)), data = candidates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.31291 -0.09256 -0.02076 0.07328 0.51309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.33912 0.01232 27.54 <2e-16 ***
## compete_cookpvi 0.30407 0.02708 11.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1365 on 440 degrees of freedom
## (3285 observations deleted due to missingness)
## Multiple R-squared: 0.2228, Adjusted R-squared: 0.221
## F-statistic: 126.1 on 1 and 440 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = as.formula(paste(y, "~", x)), data = candidates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45740 -0.16191 -0.00229 0.16626 0.56915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.38416 0.01767 21.742 < 2e-16 ***
## compete_cookpvi 0.10994 0.03962 2.775 0.00571 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2219 on 530 degrees of freedom
## (3195 observations deleted due to missingness)
## Multiple R-squared: 0.01432, Adjusted R-squared: 0.01246
## F-statistic: 7.702 on 1 and 530 DF, p-value: 0.005712
##
##
## Call:
## lm(formula = as.formula(paste(y, "~", x)), data = candidates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37994 -0.09915 -0.01600 0.08063 0.53963
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.35218 0.01256 28.044 <2e-16 ***
## compete_ideology 0.31842 0.03243 9.818 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1401 on 441 degrees of freedom
## (3284 observations deleted due to missingness)
## Multiple R-squared: 0.1794, Adjusted R-squared: 0.1775
## F-statistic: 96.4 on 1 and 441 DF, p-value: < 2.2e-16
##
##
## Call:
## lm(formula = as.formula(paste(y, "~", x)), data = candidates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44154 -0.16717 -0.00503 0.16262 0.57069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.45839 0.01981 23.141 <2e-16 ***
## compete_eff_gap -0.07857 0.04381 -1.793 0.0735 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2249 on 527 degrees of freedom
## (3198 observations deleted due to missingness)
## Multiple R-squared: 0.006066, Adjusted R-squared: 0.00418
## F-statistic: 3.216 on 1 and 527 DF, p-value: 0.07348
Correlation
candidates_sf <- candidates_sf %>%
mutate(
cookpvi_nominate = abs(compete_cookpvi - partisan_score_nominate),
cookpvi_govtrack = abs(compete_cookpvi - partisan_score_govtrack),
ideology_nominate = abs(compete_ideology - partisan_score_nominate))
candidates_sf_summarized <- candidates_sf %>%
group_by(state, district, geometry_state, geometry_distr) %>%
summarize(cookpvi_nominate = 1 - mean(cookpvi_nominate, na.rm = T),
cookpvi_govtrack = 1 - mean(cookpvi_govtrack, na.rm = T),
ideology_nominate = 1 - mean(ideology_nominate, na.rm = T),
.groups = 'keep')
map_plot(fill = 'cookpvi_govtrack',
data = candidates_sf_summarized)