Election Project Log
This file contains a log of the progress on the faces project. Each log-entry contains:
- An overview of the file's contents
- Links to the result-markdowns
- A summary of updates
- Overview of next steps
Date: Saturday, March 6, 2021
Topic: New Image quality control algorithm
One Sentence Summary
This iteration of table 01 uses a new image quality control that is based on a no-reference image quality assessment (IQA) algorithm rather than an MTurk survey. The R-squared of our MTurk features increases from 0.015 to 0.021 on a validation set with over 1000 images.
Overview: An overview of what the markdown contains
I repeat our table 01 controlling for images with above average quality. This is (1) to see whether we can get our MTurk labels to increase in significance and (2) to align with what Todorov et al. (2015) do.
- Configuration 4 - Here the election_lm is based on the new historical election data and we control for image quality.
NOTE We observe an increase in MTurk R-squared signal from 0.015 to 0.021
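A minimal sketch of what an "above average quality" filter could look like. This is an illustration under assumptions, not the project code: the scores below are made-up toy values, the function name is hypothetical, and I assume a BRISQUE-style score where lower means better quality, so "above average quality" translates to "score below the sample mean".

```python
from statistics import mean


def above_average_quality(scores):
    """Return the ids of images whose score is better (lower) than the mean.

    `scores` maps an image id to a BRISQUE-style quality score,
    where a lower score is assumed to mean better perceptual quality.
    """
    cutoff = mean(scores.values())
    return sorted(img for img, score in scores.items() if score < cutoff)


# Hypothetical toy scores, not taken from the election images.
example_scores = {"img_a": 20.0, "img_b": 35.0, "img_c": 50.0, "img_d": 55.0}
kept = above_average_quality(example_scores)
print(kept)
```

Whether the regressions subset to these images or include the score as a covariate is a separate design choice; the sketch only shows the subsetting variant.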
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
- Table 01: https://rpubs.com/JonasKnecht/election_table01_brisque_control
Next Steps
- Face Editing - I want to set up a working environment for the GAN port and the CNN predictor on AWS, and make sure we’re doing something akin to version control (so that I can come back to this down the road). I am planning to have a prototype gradient algorithm by the end of next week.
Date: Friday, March 5, 2021
Topic: Election LM with rolling window vote-share
One Sentence Summary
Using the new historical election data, I construct a 3-year rolling-window own-party jackknife vote-share feature for the election model, and adjusted R-squared increases from 0.067 to 0.098.
Overview: An overview of what the markdown contains
I repeat our table 01 with the inclusion of a new own_party_vote_share_jacknife feature.
- Configuration 4 - Here the election_lm is based on the new historical election data and now includes a backwards-looking three-year rolling window of own-party jackknife vote-share from the House of Representatives elections in that state.
NOTE We observe an increase in election_lm R-squared signal from 0.067 to 0.098
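A rough sketch of a backwards-looking rolling-window vote-share feature, to make the construction concrete. The record layout, the function name, and the toy numbers are assumptions for illustration, and I read "three year window" as "elections held in the three calendar years before the current one"; the actual feature may define the window differently.

```python
def rolling_vote_share(records, state, party, year, window=3):
    """Mean vote share of `party` in `state` over the `window` years before `year`.

    Only elections strictly before `year` are used, so the current
    election never leaks into its own feature. Returns None when no
    earlier election falls inside the window.
    """
    past = [
        r["vote_share"]
        for r in records
        if r["state"] == state
        and r["party"] == party
        and year - window <= r["year"] < year
    ]
    return sum(past) / len(past) if past else None


# Hypothetical toy records, not real election results.
records = [
    {"state": "CA", "party": "D", "year": 2012, "vote_share": 0.60},
    {"state": "CA", "party": "D", "year": 2014, "vote_share": 0.56},
    {"state": "CA", "party": "D", "year": 2016, "vote_share": 0.62},
    {"state": "CA", "party": "D", "year": 2018, "vote_share": 0.64},
]
feature_2018 = rolling_vote_share(records, "CA", "D", 2018)
print(feature_2018)
```

With biennial elections a three-year window often contains a single earlier election, so widening `window` trades recency for a less noisy average.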
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
- Table 01: https://rpubs.com/JonasKnecht/election_table01_hist_vote_share
Next Steps
- I did some digging into the label distribution from the MTurk image quality survey, and it seems the labels may not be quite as informative and accurate as we had hoped. As such, I’m now looking into some deterministic computer-vision based measures of image quality. There is some interesting stuff out there, and I have a prime candidate for a usable algorithm (called BRISQUE). I will try to make this work (it’s all a bit experimental) with our pipeline in the coming days to see whether we can control for image quality a bit more systematically.
Date: Wednesday, March 3, 2021
Topic: Image quality controls for MTurk labels
One Sentence Summary
I now include a control for the quality of the election images, which reduces us to an 800 image subset. We see that MTurk signal increases from 0.012 to 0.015.
Overview: An overview of what the markdown contains
I repeat our table 01 with the inclusion of an image quality control. I do this for our old MTurk labels with 4 voters per image as otherwise we would not have enough data for regressions. We see that MTurk signal increases from 0.012 to 0.015 on a data-set with 800 observations.
- Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party's average vote share in a state over all years excluding the current.
NOTE Here we observe an increase in the MTurk r-squared signal captured from 0.012 to 0.015 over the previous side-by-side survey results. This now excludes some of the bottom quality images for which we have a quality rating.
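The jackknife (leave-one-out) vote share described above can be sketched as follows. This is a minimal illustration and not the pipeline code: the record layout, the function name, and the toy values are assumptions, and it assumes one record per state-party-year.

```python
from collections import defaultdict


def jackknife_vote_share(records):
    """Map (state, party, year) to the mean vote share over all other years.

    For each state-party group, the current year's vote share is removed
    from the group total before averaging. Groups with a single year get
    None because there is nothing left to average.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["state"], r["party"])].append((r["year"], r["vote_share"]))

    feature = {}
    for (state, party), rows in groups.items():
        total = sum(share for _, share in rows)
        n_other = len(rows) - 1
        for year, share in rows:
            feature[(state, party, year)] = (
                (total - share) / n_other if n_other else None
            )
    return feature


# Hypothetical toy records, not real election results.
toy = [
    {"state": "OH", "party": "R", "year": 2014, "vote_share": 0.5},
    {"state": "OH", "party": "R", "year": 2016, "vote_share": 0.6},
    {"state": "OH", "party": "R", "year": 2018, "vote_share": 0.7},
]
jk = jackknife_vote_share(toy)
print(jk[("OH", "R", 2016)])
```

Note that, unlike a prior-only average, this uses future elections as well, which is why it should be read as a control rather than a forecasting feature.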
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
- Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_mturk_quality_control
Date: Monday, March 1, 2021
Topic: High detail side-by-side MTurk results
Overview: An overview of what the markdown contains
This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new high detail MTurk results that use a side-by-side system with higher quality and 10 voters per image. The markdown contains two tables:
- Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party's average vote share in a state over all years excluding the current.
NOTE Here we observe an increase in the MTurk r-squared signal captured from 0.012 to 0.078 over the previous side-by-side survey results. Previous iterations had used an average of 4 MTurkers per image.
- Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own
The second link contains a markdown with single-variable plots to replicate some of the correlation results from the Todorov et al. (2005) paper (https://science.sciencemag.org/content/308/5728/1623). Here we do single-variable regressions of each of our MTurk labels on vote-share and see significant coefficients for all labels. The markdown includes:
Vote-Share Plots for single MTurk labels. For comparison I include a plot with the difference in vote share and competence, which is the only feature that Todorov et al. plot.
Single-variable MTurk regression
Conditional density plots for MTurk labels
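For reference, a single-variable regression of this kind can be sketched with closed-form OLS. The function name and the toy numbers are illustrative assumptions, and I assume vote share is the outcome and the MTurk label is the regressor; the linked markdown is the authoritative version.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical toy data: one MTurk label score and a vote share per candidate.
label_score = [1.0, 2.0, 3.0, 4.0]
vote_share = [0.1, 0.3, 0.2, 0.4]
slope, intercept, r2 = simple_ols(label_score, vote_share)
print(slope, intercept, r2)
```

Running this once per label (attractiveness, competence, dominance, trustworthiness) gives the per-label slopes and R-squared values that the plots summarise; significance tests would need standard errors on top of this.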
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/mturk_sbs_high_detail
Todorov plots: https://rpubs.com/JonasKnecht/mturk_sbs_plots
Date: Wednesday, February 24, 2021
Topic: New Side-by-Side MTurk Results
Overview: An overview of what the markdown contains
This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new MTurk results that use a side-by-side system with higher quality. The markdown contains two tables:
Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party's average vote share in a state over all years excluding the current.
Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own
The second link contains a markdown with single-variable plots to replicate plot no.1 from the Todorov et al. (2005) paper. Here we do single-variable regressions of each of our MTurk labels on vote-share and see significant coefficients for Dominance and Attractiveness. The markdown includes:
Vote-Share Plots for single MTurk labels
Single-variable MTurk regression
Conditional density plots for MTurk labels
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_table01_mturk_sbs
Todorov plots: https://rpubs.com/JonasKnecht/mturk_sbs_plots
Next Steps
- I am waiting for new historical vote-record data to construct a valid rolling-window state and party level win-record feature that we can include in our election_lm model
Date: Friday, February 19, 2021
Topic: TABLE 01 - New MTurk Results
Overview: An overview of what the markdown contains
This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3. Here we now make use of new MTurk results that use a side-by-side system. The markdown contains two tables:
Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party's average vote share in a state over all years excluding the current.
Configuration 4 with individual MTurk labels - this is the same as above, with MTurk labels (attractiveness, competence, dominance, trustworthiness) entering the single-variable models on their own
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
- Table 01 with new MTurk survey results: https://rpubs.com/JonasKnecht/election_table01_mturk_sbs
Next Steps
I am waiting for new historical vote-record data to construct a valid rolling-window state and party level win-record feature that we can include in our election_lm model
Since these MTurk labels are still insignificant, I think the next steps include:
- Controlling for side by side quality of images in our survey
- Figuring out what impact more voters will have on our statistical significance
- Potentially splitting our regressions at a more granular level (e.g. gender split, race split, etc.)
Date: Friday, January 29, 2021
Topic: TABLE 01 - Vote Share Configurations
Overview: An overview of what the markdown contains
This update contains further iterations on the election table 01, using vote-share instead of the win-rate as in configuration 3.
Configuration 4 - this uses the own_party_vote_share_jacknife instead of the win-record (as in config.3). This is a party's average vote share in a state over all years excluding the current.
Configuration 5 - this uses the own_party_vote_share_ratio of the individual's own/opponent party average vote share in a state over all years excluding the current.
Configuration 6 - this uses an interaction of both own_party_vote_share_jacknife and own_party_vote_share_ratio
Configuration 7 - this uses non-jackknife features. Here I compute own_party_vote_share_total as the average vote share of a party over the entire sample in the training_df and then take those values for each state-party grouping and merge this with the validation_df. This now includes the vote-share of a party over all years for a particular state, but is computed on data outside the validation df. I repeat the same thing to get an own_party_vote_share_ratio_total. This is still going to overfit, but hopefully not as blatantly as computing these directly on the validation df.
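The train-then-merge step in configuration 7 can be sketched as below. This is an illustration under assumptions: plain lists of dicts stand in for training_df and validation_df, the helper names are hypothetical, and the numbers are toy values.

```python
from collections import defaultdict


def train_group_means(training_rows):
    """Average vote share per (state, party), computed on training rows only."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in training_rows:
        acc = sums[(r["state"], r["party"])]
        acc[0] += r["vote_share"]
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def attach_total_feature(validation_rows, means):
    """Left-join the training means onto validation rows.

    State-party groups never seen in training get None, mirroring the
    missing value a left merge would produce.
    """
    return [
        dict(r, own_party_vote_share_total=means.get((r["state"], r["party"])))
        for r in validation_rows
    ]


# Hypothetical toy rows standing in for training_df and validation_df.
training_rows = [
    {"state": "TX", "party": "R", "vote_share": 0.5},
    {"state": "TX", "party": "R", "vote_share": 0.7},
    {"state": "TX", "party": "D", "vote_share": 0.4},
]
validation_rows = [
    {"state": "TX", "party": "R"},
    {"state": "NY", "party": "D"},
]
validation_with_feature = attach_total_feature(
    validation_rows, train_group_means(training_rows)
)
print(validation_with_feature)
```

Because the means come only from the training rows, the validation outcomes never enter their own feature, which is the point of computing them outside the validation df.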
For more detailed definitions of the terms please see the table sections on the markdown linked below
Link:
- Baseline table 01: https://rpubs.com/JonasKnecht/election_table01_win_voteshare
Experiments
This section will house links to markdown files that are not project code. This is intended to make sure we don’t confuse actual regression tables with smaller explorations.
- Since we are trying to make sure we are capturing signal reported in the literature (esp. Todorov et al. (2005)), I tried to replicate their figure 1(b) with our MTurk labels. For this I plot the vote-share for individual candidates together with their competence score, as well as the other MTurk labels. I also consider the relationship between a winning candidate's vote-share margin (i.e. by how much they won) and their competence label (table 2 is titled as such). I also plot the difference in MTurk label distributions between win/lose.
Experiments Link
- MTurk Sanity Checks: https://rpubs.com/JonasKnecht/election_mturk_plots
Next Steps
- Since our historical win predictors are super weak, which we believe stems from the lack of historical data in our dataset, we aim to include:
- Party-state level historical win record
- Separate this by local vs. hor election type
- This will be work for an undergrad RA
Mugshot tables that split by gender and include extra columns for further MTurk detail
Comparing the three different CNNs we have access to now: (1) Logan's baseline (2) Minority class sampler (3) Jim's CNN
Date: Monday, January 25, 2021
Topic: TABLE 01 - Configuration 2
Overview: An overview of what the markdown contains
This update contains links to an updated table 01 with new variables and a stronger baseline lm.
- Table 1 Configuration 2: I now include an updated linear model for elections. This now includes an interaction term between state x party, which leads to a sizable increase in adjusted-\(R^2\) to 0.0692. This table also includes two new variables capturing the party's win-rate prior to the current election, and the party's total win rate excluding the current one. In summary we have:
- Single Variable Model:
- Election LM
- This is a GLM based on the three election features we have: state, party, year
- Includes interaction state x party
- Win-Rate-Prior
- The win rate of that party in that state over all previous election years
- Win-Rate-Total
- The win rate of that party in that state over all election years excluding the current
- Gender
- Skin-Tone: 18 skin-tone variants from MTurk
- MTurk Features
- p_hat_cnn
- This is interpreted the same way as in mugshots, now the CNN is trained on our elections data
- Multiple Variable Model:
- Election LM + Win-Rate-Prior
- Election LM + Win-Rate-Total
- Election LM + (…) + Sex
- Election LM + (…) + Skin-Tone
- Election LM + (…) + MTurk Features
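To make the difference between Win-Rate-Prior and Win-Rate-Total concrete, here is a small sketch for a single party in a single state. The function name and the win/loss history are hypothetical; only the two definitions come from the list above.

```python
def win_rates(results, year):
    """Return (win_rate_prior, win_rate_total) for one party in one state.

    `results` maps an election year to 1 (win) or 0 (loss).
    - prior: win rate over all years strictly before `year`
    - total: win rate over all years except `year` itself
    Either value is None when no elections qualify.
    """

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    prior = [won for y, won in results.items() if y < year]
    others = [won for y, won in results.items() if y != year]
    return rate(prior), rate(others)


# Hypothetical win/loss history, not real election results.
history = {2012: 1, 2014: 0, 2016: 1, 2018: 1}
prior_2016, total_2016 = win_rates(history, 2016)
print(prior_2016, total_2016)
```

The prior version only uses information available before the election, while the total version also uses later elections, so the two can diverge noticeably for early years in a short panel.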
Link:
- Baseline table 01: https://rpubs.com/JonasKnecht/election_table01
Next Steps
- Since our historical win predictors are super weak, which we believe stems from the lack of historical data in our dataset, we aim to include:
- Party-state level historical win record
- Separate this by local vs. hor election type
- Automate the table-making-process (this is currently taking too long to iterate at any speed)
Date: Saturday, January 23, 2021
Topic: TABLE 01 - Configuration 1 - FIRST DRAFT
Overview: An overview of what the markdown contains
This update contains links to a first pass at a table 01 for the elections data.
- Table 1 Configuration 1: This table is based on the agreed upon outline. As such it displays the adjusted R^2 and AUC from regressions of individual variables and combined models on the final release outcome. The models are:
- Single Variable Model:
- Election LM
- This is a GLM based on the three election features we have: state, party, year
- Gender
- Skin-Tone: 18 skin-tone variants from MTurk
- MTurk Features
- p_hat_cnn
- This is interpreted the same way as in mugshots, now the CNN is trained on our elections data
- Multiple Variable Model:
- Election LM + Sex
- Election LM + Sex + Skin-Tone
- Election LM + Sex + Skin-Tone + MTurk Features
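As a reminder of what the two reported metrics measure, here is a small sketch using the standard textbook definitions. The function names and toy inputs are illustrative, and this is not the code that produced the table.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with `p` predictors fit on `n` observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one, counting ties as 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy values for illustration only.
example_adj = adjusted_r2(0.5, n=11, p=2)
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
print(example_adj, example_auc)
```

The adjustment matters here because the combined models add many columns (for example 18 skin-tone variants), and plain R^2 can only go up as predictors are added.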
Link:
- Baseline table 01: https://rpubs.com/JonasKnecht/election_table01
Next Steps
Let's discuss this on Tuesday