Summary

This document contains four new tables that build on the configuration 3 table from previous updates. The new configurations (4-7) are:

  1. Configuration 4 - this uses own_party_vote_share_jacknife instead of the win record (as in configuration 3). This is a party's average vote share in a state over all years excluding the current one.

  2. Configuration 5 - this uses own_party_vote_share_ratio, the ratio of the individual's own party's average vote share to the opponent party's average vote share in a state, over all years excluding the current one.

  3. Configuration 6 - this uses an interaction of both own_party_vote_share_jacknife and own_party_vote_share_ratio

  4. Configuration 7 - this uses non-jackknife features. Here I compute own_party_vote_share_total as the average vote share of a party over the entire sample in the training_df, then take those values for each state-party grouping and merge them into the validation_df. The feature now includes the vote share of a party over all years for a particular state, but it is computed on data outside the validation_df. I repeat the same procedure to get own_party_vote_share_ratio_total. This will still overfit, but hopefully not as blatantly as computing these features directly on the validation_df.

NOTE I also plot the densities of these features conditional on did_win. This gives a potential explanation for their lack of explanatory power (all of their adjusted R squared values are quite low): none of the distributions varies much between win outcomes.

NOTE Throughout this analysis each table is split to predict both (1) Win and (2) Vote share. This should be clear from the column-group sub-headings. For predicting vote_share the baseline model is linear.
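For reference, the two fit measures named in the table headings can be computed roughly as in the sketch below. This is a minimal, dependency-free illustration and not the code that produced the tables: the function names and arguments (adjusted_r_squared, roc_auc, n_predictors) are hypothetical, and it assumes the standard adjusted R squared formula and the pairwise-comparison definition of ROC AUC.

```python
def adjusted_r_squared(y_true, y_pred, n_predictors):
    """Adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    Assumes more observations than predictors plus one, and a non-constant y_true.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


def roc_auc(labels, scores):
    """ROC AUC as the share of (winner, loser) pairs in which the winner
    has the higher score; ties count as one half."""
    winners = [s for y, s in zip(labels, scores) if y]
    losers = [s for y, s in zip(labels, scores) if not y]
    favourable = 0.0
    for w in winners:
        for l in losers:
            if w > l:
                favourable += 1.0
            elif w == l:
                favourable += 0.5
    return favourable / (len(winners) * len(losers))
```

Under this formula the adjusted R squared is negative whenever the raw R squared is smaller than k / (n - 1), which is how a weak feature set can produce the small negative entries that appear in the tables.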

Definitions

Before jumping into regression output, I will put all definitions of important terms here:

  1. Vote-Share : In all tables below, when vote share is the left-hand-side (LHS) variable, it means the vote share of a specific candidate in a specific election, i.e. how much of the total vote that candidate received in their election.

  2. Own-Party-Vote-Share-Jackknife (own_party_vote_share_jacknife) : This is a feature computed as the “average vote share of my party in this state over all years excluding the current”

  3. Own-Party-Vote-Share-Ratio (own_party_vote_share_ratio) : This is a feature computed as the “average vote share of my party in this state over all years excluding the current / opponent party's average vote share in this state over all years excluding the current”

  4. Own-Party-Vote-Share (own_party_vote_share_total) : In configuration (7) I compute a non-jackknife version of this feature. It is computed as the “average vote share of my party in this state over all years including the current”. However, it is computed on the train_df and NOT directly on the val_df. I then transfer these values to the val_df by their (state, party) key. This is meant to reduce the amount of overfitting.

  5. Own-Party-Vote-Share-Ratio (own_party_vote_share_ratio_total) : In configuration (7) this is computed with the same method as item (4) above, now taking the ratio of “average vote share of my party in this state over all years / opponent party's average vote share in this state over all years”
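The two leave-one-year-out features in items (2) and (3) can be sketched as follows. This is only an illustration of the definitions above, not the code used for the tables. It assumes rows are plain dicts with hypothetical keys "state", "party", "year" and "vote_share", and that the opponent of a Democrat is the Republican party in the same state (and vice versa). The function name add_jackknife_features is also hypothetical.

```python
from collections import defaultdict

# Two-party setting, as in the tables (Democrat vs. Republican).
OPPONENT = {"Democrat": "Republican", "Republican": "Democrat"}


def add_jackknife_features(rows):
    """Return copies of rows with the two leave-one-year-out features added."""
    total = defaultdict(float)       # (state, party) -> sum of vote shares
    count = defaultdict(int)         # (state, party) -> number of rows
    year_total = defaultdict(float)  # (state, party, year) -> sum of vote shares
    year_count = defaultdict(int)    # (state, party, year) -> number of rows
    for r in rows:
        key = (r["state"], r["party"])
        total[key] += r["vote_share"]
        count[key] += 1
        year_total[key + (r["year"],)] += r["vote_share"]
        year_count[key + (r["year"],)] += 1

    def mean_excluding_year(state, party, year):
        key = (state, party)
        n = count.get(key, 0) - year_count.get(key + (year,), 0)
        if n <= 0:
            return None  # no other years observed for this party and state
        return (total[key] - year_total.get(key + (year,), 0.0)) / n

    out = []
    for r in rows:
        own = mean_excluding_year(r["state"], r["party"], r["year"])
        opp = mean_excluding_year(r["state"], OPPONENT[r["party"]], r["year"])
        # The ratio is left undefined when either average is missing or the
        # opponent average is zero.
        ratio = own / opp if own is not None and opp else None
        out.append({**r,
                    "own_party_vote_share_jacknife": own,
                    "own_party_vote_share_ratio": ratio})
    return out
```

Subtracting the current year's contribution from the per-group totals avoids recomputing the average from scratch for every row. The current year never enters a row's own feature, which is the point of the jackknife construction.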

NOTE I am keeping the definition of variables above each table configuration as before. This section will be in all new markdowns and is meant to make it easier to find definitions.



Table 1 - Configuration 4

This configuration replaces own_party_win_rate with own_party_vote_share_jacknife in the election_lm. The model now includes:

  1. Party (limited to Democrat vs. Republican)
  2. Vote-share

Own Party Vote-share-Jacknife is computed as: “average vote share of my party in this state over all years excluding the current”
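Written as a formula, under the assumption that there is one vote-share observation $v_{p,s,t}$ per party $p$, state $s$ and year $t$, and with $T_{p,s}$ denoting the set of years observed for that party and state (this notation is introduced here for illustration only), the definition reads:

$$
\bar{v}^{(-t)}_{p,s} \;=\; \frac{1}{\lvert T_{p,s} \setminus \{t\} \rvert} \sum_{t' \in T_{p,s},\; t' \neq t} v_{p,s,t'}
$$

If there are several candidate-level observations per party, state and year, the sum would instead run over all observations from years other than $t$.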

Table 01 - Version 04 - Election Regressions
Fit measured in adjusted R squared and AUC
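| Model Configuration | Election Outcome: Adjusted R Squared | Election Outcome: ROC AUC | Vote Share: Adjusted R Squared |
|---|---|---|---|
| Single Variable Model | | | |
| Election LM | 0.0258 | 0.6015 | 0.0163 |
| Lower 95% C.I. | 0.0128 | 0.5665 | 0.0053 |
| Upper 95% C.I. | 0.0444 | 0.6364 | 0.0334 |
| Sex | 0.0153 | 0.5605 | 0.0181 |
| Lower 95% C.I. | 0.0052 | 0.5315 | 0.0067 |
| Upper 95% C.I. | 0.0312 | 0.5895 | 0.0352 |
| Skin-Tone | −0.0049 | 0.5454 | −0.0055 |
| Lower 95% C.I. | −0.0013 | 0.5107 | −0.0015 |
| Upper 95% C.I. | 0.0259 | 0.5801 | 0.0278 |
| MTurk Features | 0.0017 | 0.5434 | 0.0026 |
| Lower 95% C.I. | −0.0016 | 0.5078 | −0.0014 |
| Upper 95% C.I. | 0.0156 | 0.5790 | 0.0172 |
| P_hat_cnn | 0.0972 | 0.6815 | 0.1016 |
| Lower 95% C.I. | 0.0677 | 0.6487 | 0.0743 |
| Upper 95% C.I. | 0.1284 | 0.7142 | 0.1313 |
| Combined Variable Model | | | |
| Election LM + P_hat_cnn | 0.1172 | 0.7005 | 0.1185 |
| Lower 95% C.I. | 0.0872 | 0.6685 | 0.0918 |
| Upper 95% C.I. | 0.1541 | 0.7325 | 0.1535 |
| Election LM + Sex | 0.0388 | 0.6215 | 0.0367 |
| Lower 95% C.I. | 0.0218 | 0.5870 | 0.0201 |
| Upper 95% C.I. | 0.0643 | 0.6559 | 0.0595 |
| Election LM + Sex + P_hat_cnn | 0.1197 | 0.7042 | 0.1244 |
| Lower 95% C.I. | 0.0918 | 0.6722 | 0.0949 |
| Upper 95% C.I. | 0.1554 | 0.7361 | 0.1578 |
| Election LM + Sex + Skin-Tone | 0.0326 | 0.6330 | 0.0304 |
| Lower 95% C.I. | 0.0307 | 0.5989 | 0.0266 |
| Upper 95% C.I. | 0.0728 | 0.6672 | 0.0775 |
| Election LM + Sex + Skin-Tone + P_hat_cnn | 0.1150 | 0.7123 | 0.1205 |
| Lower 95% C.I. | 0.1015 | 0.6807 | 0.1076 |
| Upper 95% C.I. | 0.1666 | 0.7439 | 0.1736 |
| Election LM + Sex + Skin-Tone + MTurk | 0.0332 | 0.6359 | 0.0326 |
| Lower 95% C.I. | 0.0315 | 0.6019 | 0.0326 |
| Upper 95% C.I. | 0.0808 | 0.6700 | 0.0825 |
| Election LM + Sex + Skin-Tone + MTurk + P_hat_cnn | 0.1158 | 0.7161 | 0.1221 |
| Lower 95% C.I. | 0.1030 | 0.6847 | 0.1104 |
| Upper 95% C.I. | 0.1726 | 0.7475 | 0.1819 |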

Table 1 - Configuration 5

This configuration replaces own_party_win_rate with own_party_vote_share_ratio in the election_lm. The model now includes:

  1. Party (limited to Democrat vs. Republican)
  2. Vote-share-ratio

Own Party Vote-share-ratio is computed as: “average vote share of my party in this state over all years excluding the current / opponent party's average vote share in this state over all years excluding the current”
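As a formula, with notation introduced here for illustration only: let $\bar{v}^{(-t)}_{q,s}$ be the mean of party $q$'s vote shares in state $s$ over all years other than $t$, let $p$ be the candidate's own party and let $p'$ be the opposing party. The ratio for a candidate running in year $t$ is then:

$$
r^{(-t)}_{p,s} \;=\; \frac{\bar{v}^{(-t)}_{p,s}}{\bar{v}^{(-t)}_{p',s}}
$$

A value above 1 means the candidate's party has historically outperformed its opponent in that state. The ratio is undefined when the opposing party has no observations in other years.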

Table 01 - Version 05 - Election Regressions
Fit measured in adjusted R squared and AUC
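| Model Configuration | Election Outcome: Adjusted R Squared | Election Outcome: ROC AUC | Vote Share: Adjusted R Squared |
|---|---|---|---|
| Single Variable Model | | | |
| Election LM | 0.0243 | 0.5959 | 0.0079 |
| Lower 95% C.I. | 0.0114 | 0.5608 | 0.0006 |
| Upper 95% C.I. | 0.0421 | 0.6310 | 0.0216 |
| Sex | 0.0153 | 0.5605 | 0.0181 |
| Lower 95% C.I. | 0.0051 | 0.5315 | 0.0074 |
| Upper 95% C.I. | 0.0312 | 0.5895 | 0.0369 |
| Skin-Tone | −0.0049 | 0.5454 | −0.0055 |
| Lower 95% C.I. | −0.0010 | 0.5107 | −0.0015 |
| Upper 95% C.I. | 0.0260 | 0.5801 | 0.0272 |
| MTurk Features | 0.0017 | 0.5434 | 0.0026 |
| Lower 95% C.I. | −0.0017 | 0.5078 | −0.0010 |
| Upper 95% C.I. | 0.0163 | 0.5790 | 0.0179 |
| P_hat_cnn | 0.0972 | 0.6815 | 0.1016 |
| Lower 95% C.I. | 0.0696 | 0.6487 | 0.0747 |
| Upper 95% C.I. | 0.1301 | 0.7142 | 0.1332 |
| Combined Variable Model | | | |
| Election LM + P_hat_cnn | 0.1177 | 0.6993 | 0.1115 |
| Lower 95% C.I. | 0.0925 | 0.6672 | 0.0824 |
| Upper 95% C.I. | 0.1516 | 0.7314 | 0.1464 |
| Election LM + Sex | 0.0375 | 0.6202 | 0.0278 |
| Lower 95% C.I. | 0.0205 | 0.5856 | 0.0128 |
| Upper 95% C.I. | 0.0610 | 0.6547 | 0.0486 |
| Election LM + Sex + P_hat_cnn | 0.1202 | 0.7027 | 0.1171 |
| Lower 95% C.I. | 0.0940 | 0.6708 | 0.0882 |
| Upper 95% C.I. | 0.1554 | 0.7347 | 0.1522 |
| Election LM + Sex + Skin-Tone | 0.0311 | 0.6350 | 0.0214 |
| Lower 95% C.I. | 0.0267 | 0.6009 | 0.0202 |
| Upper 95% C.I. | 0.0745 | 0.6692 | 0.0657 |
| Election LM + Sex + Skin-Tone + P_hat_cnn | 0.1152 | 0.7113 | 0.1131 |
| Lower 95% C.I. | 0.1020 | 0.6796 | 0.0997 |
| Upper 95% C.I. | 0.1668 | 0.7429 | 0.1658 |
| Election LM + Sex + Skin-Tone + MTurk | 0.0318 | 0.6381 | 0.0235 |
| Lower 95% C.I. | 0.0294 | 0.6041 | 0.0235 |
| Upper 95% C.I. | 0.0789 | 0.6721 | 0.0726 |
| Election LM + Sex + Skin-Tone + MTurk + P_hat_cnn | 0.1162 | 0.7153 | 0.1147 |
| Lower 95% C.I. | 0.1043 | 0.6839 | 0.1050 |
| Upper 95% C.I. | 0.1729 | 0.7468 | 0.1712 |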

Density plots

After seeing how weak the election_lm is with both the own_party_vote_share_jacknife and own_party_vote_share_ratio features, I plot their conditional densities below. The distributions for the did_win_conditional == True and did_win_conditional == False groups overlap heavily, so there does not seem to be enough variation between the two groups for these features to have real explanatory power.
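A conditional density comparison of this kind can be sketched without any plotting library, as below. This is an illustrative sketch only: the function names (gaussian_kde, conditional_densities) are hypothetical, the Gaussian kernel and the normal-reference bandwidth are assumptions rather than the settings behind the actual plots, and each group needs at least two distinct values.

```python
import math


def gaussian_kde(sample, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of sample at each grid point."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sd * n^(-1/5).
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in sample)
        for g in grid
    ]


def conditional_densities(values, did_win, grid):
    """Estimate the density of a feature separately for winners and losers."""
    won = [v for v, w in zip(values, did_win) if w]
    lost = [v for v, w in zip(values, did_win) if not w]
    return {True: gaussian_kde(won, grid), False: gaussian_kde(lost, grid)}
```

Plotting the two returned curves on the same axes gives the kind of figure described above. If the curves lie almost on top of each other, the feature carries little information about did_win.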

Table 1 - Configuration 6

I now include both own_party_vote_share_jacknife and own_party_vote_share_ratio in the election_lm such that the model becomes:

y = party + vote_share_jacknife + vote_share_ratio
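Read literally, the formula above is additive. Written out with coefficients (the symbols $\beta_k$ and $\varepsilon_i$ are notation introduced here, and $y_i$ stands for either did_win or vote share for candidate-election $i$), it would be:

$$
y_i \;=\; \beta_0 + \beta_1\,\mathrm{party}_i + \beta_2\,\mathrm{vote\_share\_jacknife}_i + \beta_3\,\mathrm{vote\_share\_ratio}_i + \varepsilon_i
$$

If a true interaction between the two vote-share features is intended, the specification would also carry a product term $\beta_4\,(\mathrm{vote\_share\_jacknife}_i \times \mathrm{vote\_share\_ratio}_i)$. The formula as written does not include one.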

Table 01 - Version 06 - Election Regressions
Fit measured in adjusted R squared and AUC
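| Model Configuration | Election Outcome: Adjusted R Squared | Election Outcome: ROC AUC | Vote Share: Adjusted R Squared |
|---|---|---|---|
| Single Variable Model | | | |
| Election LM | 0.0253 | 0.5988 | 0.0172 |
| Lower 95% C.I. | 0.0113 | 0.5639 | 0.0059 |
| Upper 95% C.I. | 0.0441 | 0.6338 | 0.0337 |
| Sex | 0.0153 | 0.5605 | 0.0181 |
| Lower 95% C.I. | 0.0043 | 0.5315 | 0.0070 |
| Upper 95% C.I. | 0.0307 | 0.5895 | 0.0358 |
| Skin-Tone | −0.0049 | 0.5454 | −0.0055 |
| Lower 95% C.I. | −0.0015 | 0.5107 | −0.0016 |
| Upper 95% C.I. | 0.0260 | 0.5801 | 0.0278 |
| MTurk Features | 0.0017 | 0.5434 | 0.0026 |
| Lower 95% C.I. | −0.0014 | 0.5078 | −0.0012 |
| Upper 95% C.I. | 0.0152 | 0.5790 | 0.0160 |
| P_hat_cnn | 0.0972 | 0.6815 | 0.1016 |
| Lower 95% C.I. | 0.0702 | 0.6487 | 0.0740 |
| Upper 95% C.I. | 0.1277 | 0.7142 | 0.1325 |
| Combined Variable Model | | | |
| Election LM + P_hat_cnn | 0.1177 | 0.7000 | 0.1192 |
| Lower 95% C.I. | 0.0897 | 0.6679 | 0.0909 |
| Upper 95% C.I. | 0.1536 | 0.7320 | 0.1535 |
| Election LM + Sex | 0.0384 | 0.6213 | 0.0376 |
| Lower 95% C.I. | 0.0217 | 0.5868 | 0.0211 |
| Upper 95% C.I. | 0.0630 | 0.6557 | 0.0614 |
| Election LM + Sex + P_hat_cnn | 0.1202 | 0.7036 | 0.1251 |
| Lower 95% C.I. | 0.0913 | 0.6716 | 0.0960 |
| Upper 95% C.I. | 0.1553 | 0.7355 | 0.1627 |
| Election LM + Sex + Skin-Tone | 0.0321 | 0.6336 | 0.0314 |
| Lower 95% C.I. | 0.0283 | 0.5994 | 0.0276 |
| Upper 95% C.I. | 0.0759 | 0.6677 | 0.0782 |
| Election LM + Sex + Skin-Tone + P_hat_cnn | 0.1153 | 0.7118 | 0.1212 |
| Lower 95% C.I. | 0.0992 | 0.6802 | 0.1071 |
| Upper 95% C.I. | 0.1687 | 0.7434 | 0.1751 |
| Election LM + Sex + Skin-Tone + MTurk | 0.0328 | 0.6367 | 0.0335 |
| Lower 95% C.I. | 0.0311 | 0.6027 | 0.0317 |
| Upper 95% C.I. | 0.0798 | 0.6707 | 0.0838 |
| Election LM + Sex + Skin-Tone + MTurk + P_hat_cnn | 0.1163 | 0.7156 | 0.1228 |
| Lower 95% C.I. | 0.1057 | 0.6841 | 0.1120 |
| Upper 95% C.I. | 0.1735 | 0.7470 | 0.1819 |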

Table 1 - Configuration 7

I now compute non-jackknifed versions of own_party_vote_share and own_party_vote_share_ratio on the train_df. Then I transfer those values, based on (state, party) groups, to the validation set. The hope is that the overfitting will be limited, as the validation set contains ‘technically’ new observations.

This new election_lm thus uses the own_party_vote_share_total and own_party_vote_share_ratio_total variables as interactions, neither of which excludes the current year. However, as outlined above, these features are computed on different data (the train_df) from the data they are evaluated on.
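The train-then-transfer step can be sketched as a lookup keyed on (state, party). As before, this is a minimal illustration under stated assumptions rather than the actual pipeline: rows are plain dicts with hypothetical keys "state", "party" and "vote_share", the two function names are made up for this example, and validation rows whose state-party pair never appears in the training data simply get None.

```python
from collections import defaultdict

# Two-party setting, as in the tables (Democrat vs. Republican).
OPPONENT = {"Democrat": "Republican", "Republican": "Democrat"}


def fit_party_state_averages(train_rows):
    """Average vote share per (state, party), computed on training rows only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in train_rows:
        key = (r["state"], r["party"])
        sums[key] += r["vote_share"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def transfer_to_validation(val_rows, averages):
    """Attach the training-set averages to validation rows by (state, party)."""
    out = []
    for r in val_rows:
        own = averages.get((r["state"], r["party"]))
        opp = averages.get((r["state"], OPPONENT[r["party"]]))
        # Undefined when either average is missing or the opponent average is zero.
        ratio = own / opp if own is not None and opp else None
        out.append({**r,
                    "own_party_vote_share_total": own,
                    "own_party_vote_share_ratio_total": ratio})
    return out
```

Because the averages are fitted on the training rows alone, no validation outcome enters a validation row's feature. The averages do still cover every year in the training sample, including the year of the election being predicted, which is why some optimism in the fit remains.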

Table 01 - Version 07 - Election Regressions
Fit measured in adjusted R squared and AUC
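| Model Configuration | Election Outcome: Adjusted R Squared | Election Outcome: ROC AUC | Vote Share: Adjusted R Squared |
|---|---|---|---|
| Single Variable Model | | | |
| Election LM | 0.0590 | 0.6395 | 0.0589 |
| Lower 95% C.I. | 0.0365 | 0.6054 | 0.0391 |
| Upper 95% C.I. | 0.0853 | 0.6735 | 0.0837 |
| Sex | 0.0153 | 0.5605 | 0.0181 |
| Lower 95% C.I. | 0.0050 | 0.5315 | 0.0063 |
| Upper 95% C.I. | 0.0314 | 0.5895 | 0.0336 |
| Skin-Tone | −0.0049 | 0.5454 | −0.0055 |
| Lower 95% C.I. | −0.0008 | 0.5107 | −0.0022 |
| Upper 95% C.I. | 0.0243 | 0.5801 | 0.0274 |
| MTurk Features | 0.0017 | 0.5434 | 0.0026 |
| Lower 95% C.I. | −0.0015 | 0.5078 | −0.0011 |
| Upper 95% C.I. | 0.0159 | 0.5790 | 0.0166 |
| P_hat_cnn | 0.0972 | 0.6815 | 0.1016 |
| Lower 95% C.I. | 0.0682 | 0.6487 | 0.0731 |
| Upper 95% C.I. | 0.1302 | 0.7142 | 0.1332 |
| Combined Variable Model | | | |
| Election LM + P_hat_cnn | 0.1422 | 0.7197 | 0.1499 |
| Lower 95% C.I. | 0.1118 | 0.6885 | 0.1186 |
| Upper 95% C.I. | 0.1750 | 0.7509 | 0.1854 |
| Election LM + Sex | 0.0692 | 0.6544 | 0.0756 |
| Lower 95% C.I. | 0.0480 | 0.6208 | 0.0524 |
| Upper 95% C.I. | 0.0998 | 0.6880 | 0.1056 |
| Election LM + Sex + P_hat_cnn | 0.1437 | 0.7220 | 0.1543 |
| Lower 95% C.I. | 0.1161 | 0.6908 | 0.1203 |
| Upper 95% C.I. | 0.1807 | 0.7531 | 0.1914 |
| Election LM + Sex + Skin-Tone | 0.0620 | 0.6622 | 0.0687 |
| Lower 95% C.I. | 0.0532 | 0.6288 | 0.0603 |
| Upper 95% C.I. | 0.1109 | 0.6956 | 0.1186 |
| Election LM + Sex + Skin-Tone + P_hat_cnn | 0.1380 | 0.7279 | 0.1497 |
| Lower 95% C.I. | 0.1219 | 0.6970 | 0.1314 |
| Upper 95% C.I. | 0.1917 | 0.7587 | 0.2065 |
| Election LM + Sex + Skin-Tone + MTurk | 0.0623 | 0.6646 | 0.0706 |
| Lower 95% C.I. | 0.0573 | 0.6314 | 0.0641 |
| Upper 95% C.I. | 0.1153 | 0.6979 | 0.1231 |
| Election LM + Sex + Skin-Tone + MTurk + P_hat_cnn | 0.1386 | 0.7314 | 0.1510 |
| Lower 95% C.I. | 0.1267 | 0.7008 | 0.1349 |
| Upper 95% C.I. | 0.1971 | 0.7621 | 0.2098 |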

More Density plots

I repeat the same conditional density plots for the new non-jackknife own_party_vote_share_total and own_party_vote_share_ratio_total variables.