Election Project Log

This file contains a log of the progress on the faces project. Each log-entry contains:



Date: Saturday, March 6, 2021

Topic: New Image quality control algorithm

One Sentence Summary

This table 01 uses a new image quality control that is based on a no-reference image quality assessment (IQA) algorithm rather than on an MTurk survey. The R-squared signal from our MTurk features increases from 0.015 to 0.021 on a validation set with over 1000 images.

Overview: An overview of what the markdown contains

I repeat our table 01, controlling for images with above-average quality. This is (1) to see whether we can get our MTurk labels to increase in significance and (2) to align with what Todorov et al. (2015) do.

  1. Configuration 4 - Here the election_lm is based on the new historical election data and we control for image quality.

NOTE We observe an increase in the MTurk R-squared signal from 0.015 to 0.021.
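
The above-average quality control can be sketched as a simple filter. This is a minimal Python sketch under stated assumptions, not the project code: the `brisque_score` field, the image IDs, and the scores are hypothetical, and the scores are assumed to have been computed beforehand. It also assumes the usual BRISQUE convention that a lower score means better perceptual quality, so "above average quality" is read as "score below the sample mean".

```python
from statistics import mean

def keep_above_average_quality(images, score_key="brisque_score"):
    """Return the images whose quality is better than the sample average.

    Assumption: lower BRISQUE scores mean better quality, so we keep
    images whose score is strictly below the mean score.
    """
    cutoff = mean(img[score_key] for img in images)
    return [img for img in images if img[score_key] < cutoff]

# Hypothetical scores, for illustration only.
images = [
    {"image_id": "a", "brisque_score": 20.0},
    {"image_id": "b", "brisque_score": 35.0},
    {"image_id": "c", "brisque_score": 50.0},
    {"image_id": "d", "brisque_score": 55.0},
]
kept = keep_above_average_quality(images)
print([img["image_id"] for img in kept])
```

If the quality measure in use runs the other way (higher is better), the comparison in the filter would need to be flipped.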

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01: https://rpubs.com/JonasKnecht/election_table01_brisque_control

Next Steps

  1. Face Editing - I want to set up a working environment for the GAN port and the CNN predictor on AWS, and make sure we’re doing something akin to version control (so that I can come back to this down the road). I am planning to have a prototype gradient algorithm by the end of next week.


Date: Friday, March 5, 2021

Topic: Election LM with rolling window vote-share

One Sentence Summary

Using the new historical election data, I construct a 3-year rolling-window own-party jackknife vote-share feature for the election model, and we see adjusted R-squared increase from 0.067 to 0.098.

Overview: An overview of what the markdown contains

I repeat our table 01 with the inclusion of a new own_party_vote_share_jacknife feature.

  1. Configuration 4 - Here the election_lm is based on the new historical election data and now includes a backward-looking three-year rolling window of own-party jackknife vote share from the House of Representatives elections in that state.

NOTE We observe an increase in the election_lm R-squared signal from 0.067 to 0.098.
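
A backward-looking rolling-window feature of this kind could look roughly like the sketch below. This is a minimal Python illustration, not the project code. It assumes one record per state, party, and year, and it reads "three year window" as the three calendar years strictly before the current election year (it could equally mean the previous three elections). The record layout and the example values are made up.

```python
from statistics import mean

def rolling_vote_share(records, window=3):
    """Map (state, party, year) -> mean vote share over the `window`
    calendar years before `year`.

    The current year is excluded, so the feature only looks backwards.
    The value is None when no earlier election falls inside the window.
    """
    by_group = {}
    for r in records:
        key = (r["state"], r["party"])
        by_group.setdefault(key, []).append((r["year"], r["vote_share"]))

    feature = {}
    for (state, party), rows in by_group.items():
        for year, _ in rows:
            past = [share for y, share in rows if year - window <= y < year]
            feature[(state, party, year)] = mean(past) if past else None
    return feature

# Made-up records, for illustration only.
records = [
    {"state": "CA", "party": "D", "year": 2012, "vote_share": 0.60},
    {"state": "CA", "party": "D", "year": 2014, "vote_share": 0.50},
    {"state": "CA", "party": "D", "year": 2016, "vote_share": 0.70},
]
feature = rolling_vote_share(records)
print(feature[("CA", "D", 2016)])
```

Because the window never includes the current year, the feature for an election cannot leak that election's own outcome.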

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01: https://rpubs.com/JonasKnecht/election_table01_hist_vote_share

Next Steps

  1. I did some digging into the label distribution from the MTurk image quality survey, and it seems the labels may not be quite as informative and accurate as we had hoped. As such, I’m now looking into some deterministic computer-vision-based measures of image quality. There is some interesting stuff out there, and I have a prime candidate for a usable algorithm (called BRISQUE). I will try to make this work (it’s all a bit experimental) with our pipeline in the coming days to see whether we can control for image quality a bit more systematically.


Date: Wednesday, March 3, 2021

Topic: Image quality controls for MTurk labels

One Sentence Summary

I now include a control for the quality of the election images, which reduces the sample to an 800-image subset. We see that the MTurk signal increases from 0.012 to 0.015.

Overview: An overview of what the markdown contains

I repeat our table 01 with the inclusion of an image quality control. I do this for our old MTurk labels with 4 voters per image as otherwise we would not have enough data for regressions. We see that MTurk signal increases from 0.012 to 0.015 on a data-set with 800 observations.

  1. Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party’s average vote share in a state over all years excluding the current.

NOTE Here we observe an increase in the MTurk r-squared signal captured from 0.012 to 0.015 over the previous side-by-side survey results. This now excludes some of the bottom quality images for which we have a quality rating.
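
The jackknife feature in Configuration 4 is a leave-one-out mean. A minimal Python sketch follows, assuming one record per state, party, and year. The function name and record layout are hypothetical and the values are made up.

```python
def jackknife_vote_share(records):
    """Map (state, party, year) -> mean vote share of that party in that
    state over all years except `year` (leave-one-out mean).

    Assumes one record per (state, party, year). The value is None when
    the group has no other year to average over.
    """
    totals = {}
    for r in records:
        key = (r["state"], r["party"])
        running_sum, count = totals.get(key, (0.0, 0))
        totals[key] = (running_sum + r["vote_share"], count + 1)

    feature = {}
    for r in records:
        group_sum, count = totals[(r["state"], r["party"])]
        # Subtract the current year's value before averaging.
        feature[(r["state"], r["party"], r["year"])] = (
            (group_sum - r["vote_share"]) / (count - 1) if count > 1 else None
        )
    return feature

# Made-up records, for illustration only.
records = [
    {"state": "OH", "party": "R", "year": 2010, "vote_share": 0.40},
    {"state": "OH", "party": "R", "year": 2012, "vote_share": 0.50},
    {"state": "OH", "party": "R", "year": 2014, "vote_share": 0.60},
]
jk = jackknife_vote_share(records)
print(jk[("OH", "R", 2012)])
```

Unlike a prior-years-only feature, this average also uses years after the current one; only the current year itself is left out.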

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_mturk_quality_control


Date: Monday, March 1, 2021

Topic: High detail side-by-side MTurk results

Overview: An overview of what the markdown contains

This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new high detail MTurk results that use a side-by-side system with higher quality and 10 voters per image. The markdown contains two tables:

  1. Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party’s average vote share in a state over all years excluding the current.

NOTE Here we observe an increase in the MTurk r-squared signal captured from 0.012 to 0.078 over the previous side-by-side survey results. Previous iterations had used an average of 4 MTurkers per image.

  1. Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own

The second link contains a markdown with single-variable plots to replicate some of the correlation results from the Todorov et al. (2005) paper (https://science.sciencemag.org/content/308/5728/1623). Here we do single-variable regressions of each of our MTurk labels on vote-share and see significant coefficients for all labels. The markdown includes:

  1. Vote-Share Plots for single MTurk labels. For comparison I include a plot with the difference in vote share and competence, which is the only feature that Todorov et al. plot.

  2. Single-variable MTurk regression

  3. Conditional density plots for MTurk labels
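
The single-variable regressions listed above can be sketched as one simple least-squares fit per MTurk label. This is a minimal Python illustration, not the code behind the linked markdown. It treats vote share as the outcome and the label as the only regressor, which is my reading of "on vote-share". The ratings and vote shares are made up, and the sketch reports only the slope and R-squared, not significance.

```python
from statistics import mean

def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single regressor, R^2 is the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return intercept, slope, r_squared

# Made-up ratings and vote shares, for illustration only.
vote_share = [0.40, 0.45, 0.50, 0.55]
mturk_labels = {
    "competence": [1.0, 2.0, 3.0, 4.0],
    "dominance": [2.0, 1.0, 4.0, 3.0],
}
fits = {name: simple_ols(ratings, vote_share) for name, ratings in mturk_labels.items()}
for name, (intercept, slope, r2) in fits.items():
    print(f"{name}: slope={slope:.3f}, R^2={r2:.3f}")
```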

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/mturk_sbs_high_detail

  2. Todorov plots: https://rpubs.com/JonasKnecht/mturk_sbs_plots



Date: Wednesday, February 24, 2021

Topic: New Side-by-Side MTurk Results

Overview: An overview of what the markdown contains

This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new MTurk results that use a side-by-side system with higher quality. The markdown contains two tables:

  1. Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party’s average vote share in a state over all years excluding the current.

  2. Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own

The second link contains a markdown with single-variable plots to replicate plot no. 1 from the Todorov et al. (2005) paper. Here we do single-variable regressions of each of our MTurk labels on vote-share and see significant coefficients for Dominance and Attractiveness. The markdown includes:

  1. Vote-Share Plots for single MTurk labels

  2. Single-variable MTurk regression

  3. Conditional density plots for MTurk labels

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_table01_mturk_sbs

  2. Todorov plots: https://rpubs.com/JonasKnecht/mturk_sbs_plots

Next Steps

  1. I am waiting for new historical vote-record data to construct a valid rolling-window state and party level win-record feature that we can include in our election_lm model


Date: Friday, February 19, 2021

Topic: TABLE 01 - New MTurk Results

Overview: An overview of what the markdown contains

This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new MTurk results that use a side-by-side system. The markdown contains two tables:

  1. Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party’s average vote share in a state over all years excluding the current.

  2. Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_table01_mturk_sbs

Next Steps

  1. I am waiting for new historical vote-record data to construct a valid rolling-window state and party level win-record feature that we can include in our election_lm model

  2. Since these MTurk labels are still insignificant I think the next step includes:

  • Controlling for side by side quality of images in our survey
  • Figuring out what impact more voters will have on our statistical significance
  • Potentially splitting our regressions at a more granular level (e.g. gender split, race split, etc.)


Date: Friday, January 29, 2021

Topic: TABLE 01 - Vote Share Configurations

Overview: An overview of what the markdown contains

This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3.

  1. Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party’s average vote share in a state over all years excluding the current.

  2. Configuration 5 - this uses the own_party_vote_share_ratio, the ratio of the individual’s own party’s average vote share to the opponent party’s average vote share in a state over all years excluding the current.

  3. Configuration 6 - this uses an interaction of both own_party_vote_share_jacknife and own_party_vote_share_ratio

  4. Configuration 7 - this uses non-jackknife features. Here I compute own_party_vote_share_total as the average vote share of a party over the entire sample in the training_df and then take those values for each state-party grouping and merge this with the validation_df. This now includes the vote-share of a party over all years for a particular state, but is computed on data outside the validation df. I repeat the same thing to get an own_party_vote_share_ratio_total. This is still going to overfit, but hopefully not as blatantly as just computing these directly on the validation df.
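
The Configuration 7 idea (compute the party-state averages on the training data only, then attach them to the validation rows) can be sketched as below. This is a minimal Python illustration, not the project code. The helper names and the two-party `opponent` mapping are hypothetical, the rows are made up, and plain dicts stand in for training_df and validation_df.

```python
from statistics import mean

def party_state_means(training_rows):
    """Mean vote share per (state, party), computed on training rows only."""
    grouped = {}
    for r in training_rows:
        grouped.setdefault((r["state"], r["party"]), []).append(r["vote_share"])
    return {key: mean(values) for key, values in grouped.items()}

def add_total_features(validation_rows, means, opponent):
    """Attach the training-set averages to each validation row.

    `opponent` maps a party to its opponent (an assumption for this sketch).
    Features are None when the training data has no matching group.
    """
    out = []
    for r in validation_rows:
        own = means.get((r["state"], r["party"]))
        opp = means.get((r["state"], opponent[r["party"]]))
        ratio = own / opp if own is not None and opp else None
        out.append({**r,
                    "own_party_vote_share_total": own,
                    "own_party_vote_share_ratio_total": ratio})
    return out

# Made-up rows, for illustration only.
training_rows = [
    {"state": "TX", "party": "R", "vote_share": 0.60},
    {"state": "TX", "party": "R", "vote_share": 0.50},
    {"state": "TX", "party": "D", "vote_share": 0.40},
    {"state": "TX", "party": "D", "vote_share": 0.50},
]
validation_rows = [{"state": "TX", "party": "R"}, {"state": "NY", "party": "D"}]
means = party_state_means(training_rows)
merged = add_total_features(validation_rows, means, opponent={"R": "D", "D": "R"})
print(merged[0])
```

The validation rows never contribute to the averages, which is the point of the configuration: the features still pool all years, but they are estimated on data outside the validation set.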

For more detailed definitions of the terms please see the table sections on the markdown linked below

Link:

  1. Baseline table 01: https://rpubs.com/JonasKnecht/election_table01_win_voteshare

Experiments

This section will house links to markdown files that are not project code. This is intended to make sure we don’t confuse actual regression tables with smaller explorations.

  1. Since we are trying to make sure we are capturing the signal reported in the literature (esp. Todorov et al. (2005)), I tried to replicate their figure 1(b) with our MTurk labels. For this I plot the vote-share for individual candidates together with their competence score, as well as the other MTurk labels. I also consider the relationship between a winning candidate’s vote-share margin (i.e. by how much they won) and their competence label (table 2 is titled as such). I also plot the difference in MTurk label distributions between win/lose.

Experiments Link

  1. MTurk Sanity Checks: https://rpubs.com/JonasKnecht/election_mturk_plots

Next Steps

  1. Since our historical win predictors are super weak, which we believe stems from the lack of historical data in our dataset, we aim to include:
  • Party-state level historical win record
  • Separate this by local vs. House of Representatives (hor) election type
  • This will be work for an undergrad RA
  2. Mugshot tables that split by gender and include extra columns for further MTurk detail

  3. Comparing the three different CNNs we have access to now: (1) Logan’s baseline (2) Minority class sampler (3) Jim’s CNN



Date: Monday, January 25, 2021

Topic: TABLE 01 - Configuration 2

Overview: An overview of what the markdown contains

This update contains links to an updated table 01 with new variables and a stronger baseline lm.

  1. Table 1 Configuration 2: I now include an updated linear model for elections. It includes a state x party interaction term, which leads to a sizable increase in adjusted-\(R^2\) to 0.0692. This table also includes two new variables capturing the party’s win rate prior to the current election and the party’s total win rate excluding the current one. In summary we have:
  1. Single Variable Model:
  • Election LM
    • This is a GLM based on the three election features we have: state, party, year
    • Includes interaction state x party
  • Win-Rate-Prior
    • The win rate of that party in that state over all previous election years
  • Win-Rate-Total
    • The win rate of that party in that state over all election years excluding the current
  • Gender
  • Skin-Tone: 18 skin-tone variants from MTurk
  • MTurk Features
  • p_hat_cnn
    • This is interpreted the same way as in mugshots, now the CNN is trained on our elections data
  2. Multiple Variable Model:
  • Election LM + Win-Rate-Prior
  • Election LM + Win-Rate-Total
  • Election LM + (…) + Sex
    • p_hat_cnn
  • Election LM + (…) + Skin-Tone
    • p_hat_cnn
  • Election LM + (…) + MTurk Features
    • p_hat_cnn
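
The difference between Win-Rate-Prior and Win-Rate-Total in the list above can be sketched as follows. This is a minimal Python illustration, not the project code. The record layout and the `won` indicator (1 for a win, 0 for a loss) are hypothetical and the records are made up.

```python
def win_rates(records):
    """Map (state, party, year) -> (win_rate_prior, win_rate_total).

    win_rate_prior uses only election years before `year`;
    win_rate_total uses every election year except `year`.
    Either value is None when no qualifying elections exist.
    """
    by_group = {}
    for r in records:
        key = (r["state"], r["party"])
        by_group.setdefault(key, []).append((r["year"], r["won"]))

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    out = {}
    for (state, party), rows in by_group.items():
        for year, _ in rows:
            prior = [won for y, won in rows if y < year]
            total = [won for y, won in rows if y != year]
            out[(state, party, year)] = (rate(prior), rate(total))
    return out

# Made-up records, for illustration only.
records = [
    {"state": "FL", "party": "R", "year": 2010, "won": 1},
    {"state": "FL", "party": "R", "year": 2012, "won": 0},
    {"state": "FL", "party": "R", "year": 2014, "won": 1},
]
rates = win_rates(records)
print(rates[("FL", "R", 2012)])
```

For the earliest year in a group the prior rate is undefined, which is one way a short historical record can leave this predictor weak.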

Link:

  1. Baseline table 01: https://rpubs.com/JonasKnecht/election_table01

Next Steps

  1. Since our historical win predictors are super weak, which we believe stems from the lack of historical data in our dataset, we aim to include:
  • Party-state level historical win record
  • Separate this by local vs. House of Representatives (hor) election type
  2. Automate the table-making process (this is currently taking too long to iterate at any speed)


Date: Saturday, January 23, 2021

Topic: TABLE 01 - Configuration 1 - FIRST DRAFT

Overview: An overview of what the markdown contains

This update contains links to a first pass at a table 01 for the elections data.

  1. Table 1 Configuration 1: This table is based on the agreed upon outline. As such it displays the adjusted R^2 and AUC from regressions of individual variables and combined models on the final release outcome. The models are:
  1. Single Variable Model:
  • Election LM
    • This is a GLM based on the three election features we have: state, party, year
  • Gender
  • Skin-Tone: 18 skin-tone variants from MTurk
  • MTurk Features
  • p_hat_cnn
    • This is interpreted the same way as in mugshots, now the CNN is trained on our elections data
  2. Multiple Variable Model:
  • Election LM + Sex
    • p_hat_cnn
  • Election LM + Sex + Skin-Tone
    • p_hat_cnn
  • Election LM + Sex + Skin-Tone + MTurk Features
    • p_hat_cnn
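
For reference, the two metrics reported in this table can be computed as in the sketch below. This is a minimal Python illustration using the standard textbook definitions, not the code that produced the table. The inputs are made up, and the outcome is assumed to be coded as 1/0.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    Quadratic in the sample size, which is fine for a small illustration.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes and predicted scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(auc(labels, scores))
print(adjusted_r_squared(0.5, n_obs=101, n_predictors=10))
```

Adjusted R-squared penalizes extra predictors, which matters when comparing the single-variable models against the combined models with many state and party terms.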

Link:

  1. Baseline table 01: https://rpubs.com/JonasKnecht/election_table01

Next Steps

Let’s discuss this on Tuesday.