AEP 2026 Control Survey — Quota Planning

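# Packages used throughout this document.
library(dplyr)       # %>%, slice, filter, rename, mutate, count, joins
library(tibble)      # tribble, tibble
library(knitr)       # kable
library(kableExtra)  # kable_styling, row_spec, collapse_rows

# Same loader as AEP_2026_PRE.Rmd: pre_anon.csv off Drive, drop the two
# Qualtrics meta rows, keep only real GenerateLink responses.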
pre_csv_path <- "/Users/annemariegreen/Library/CloudStorage/GoogleDrive-annemarie_green@berkeley.edu/.shortcut-targets-by-id/1Uhd-WKcvvFI4EZWYZO1MmfDkGwsfw7xb/AEP Impact Evaluation/A. Data/AEP_2026_pre_anon.csv"

if (!file.exists(pre_csv_path)) {
  stop("AEP_2026_pre_anon.csv not found at:\n  ", pre_csv_path,
       "\nRun AEP_2026_ID_Cleaning.Rmd (in 2026 Matching/) first.")
}

pre_raw <- read.csv(pre_csv_path,
                    header = TRUE, sep = ",",
                    na.strings = c("", "NA"),
                    stringsAsFactors = FALSE)

pre <- pre_raw %>% slice(-1, -2) %>% filter(DistributionChannel == "gl")

# state_from is duplicated in the export (numeric code + text label).
# check.names tags the second one as state_from.1 -- the text label.
pre <- pre %>%
  rename(state_from_num  = state_from,
         state_from_name = state_from.1)

message("Loaded pre_anon.csv: ", nrow(pre), " responses")
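The rename above depends on how read.csv() handles a duplicated header. As a minimal sketch, using an invented two-column CSV rather than the AEP export, the default check.names = TRUE behavior looks like this:

```r
# Toy CSV with a duplicated header, mimicking the numeric-code + text-label pair.
# The row values are invented for illustration only.
toy_csv <- "state_from,state_from\n5,California\n44,Texas\n"

toy <- read.csv(text = toy_csv, stringsAsFactors = FALSE)

# check.names = TRUE (the default) makes names unique by appending .1, .2, ...
print(names(toy))
```

If the column order in the export ever changed, the numeric code and the text label would swap roles, so a quick look at the column types after loading is a cheap safeguard.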
# Survey-side only (per task spec): home_region from state_from_name,
# travel state straight from state_travel. No _sf columns used here.
# Map mirrors AEP_2026_PRE.Rmd's state_region_map (5-region scheme).
state_region_map <- tribble(
  ~state, ~home_region,
  "AL", "South",          "AK", "West Coast",     "AZ", "Mountain West",
  "AR", "South",          "CA", "West Coast",     "CO", "Mountain West",
  "CT", "Northeast",      "DE", "Northeast",      "DC", "South",
  "FL", "South",          "GA", "South",          "HI", "West Coast",
  "ID", "Mountain West",  "IL", "Midwest",        "IN", "Midwest",
  "IA", "Midwest",        "KS", "Midwest",        "KY", "South",
  "LA", "South",          "ME", "Northeast",      "MD", "South",
  "MA", "Northeast",      "MI", "Midwest",        "MN", "Midwest",
  "MS", "South",          "MO", "Midwest",        "MT", "Mountain West",
  "NE", "Midwest",        "NV", "Mountain West",  "NH", "Northeast",
  "NJ", "Northeast",      "NM", "Mountain West",  "NY", "Northeast",
  "NC", "South",          "ND", "Midwest",        "OH", "Midwest",
  "OK", "South",          "OR", "West Coast",     "PA", "Northeast",
  "RI", "Northeast",      "SC", "South",          "SD", "Midwest",
  "TN", "South",          "TX", "South",          "UT", "Mountain West",
  "VT", "Northeast",      "VA", "South",          "WA", "West Coast",
  "WV", "South",          "WI", "Midwest",        "WY", "Mountain West"
)

state_name_to_code <- c(setNames(state.abb, state.name),
                        "Washington D.C." = "DC")

pre <- pre %>%
  mutate(home_state = unname(state_name_to_code[state_from_name])) %>%
  left_join(state_region_map, by = c("home_state" = "state"))

REGION_LEVELS <- c("South", "Northeast", "West Coast", "Midwest", "Mountain West")
pre$home_region <- factor(pre$home_region, levels = REGION_LEVELS)

message(sum(!is.na(pre$home_region)), " of ", nrow(pre),
        " responses have home_region; ",
        sum(is.na(pre$home_region)), " missing/unmappable.")
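When the missing/unmappable count is nonzero, it helps to see which labels failed the lookup. The sketch below repeats the lookup on a toy vector; the labels "Washington DC" and "Puerto Rico" are invented examples of values that would not match, not values known to be in the survey.

```r
# Same lookup as above: full state names -> postal codes, plus the survey's DC label.
state_name_to_code <- c(setNames(state.abb, state.name),
                        "Washington D.C." = "DC")

# Invented labels for illustration; two of them are not in the lookup.
toy_names <- c("California", "Washington D.C.", "Washington DC", "Puerto Rico", NA)

codes <- unname(state_name_to_code[toy_names])

# Labels that were supplied but did not map to a code (true NAs are excluded).
unmapped <- unique(toy_names[is.na(codes) & !is.na(toy_names)])
print(unmapped)
```

Separating true blanks from unmatched labels shows whether the fix belongs in the lookup table or in the data.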

1 What this is

This document plans the control survey that will be compared against AEP 2026 students. We’re recruiting ~700 18-year-olds on Prolific, and each respondent is randomly assigned a “travel state” to evaluate. The goal is to mirror where AEP students from each region actually went, so the control comparison is anchored on the same destinations.

This Rmd pulls the AEP pre-survey to figure out (a) the home-region distribution and (b) within each region, where students actually traveled — that’s what the Qualtrics destination randomizer is built from.

Note: everything below uses survey-self-reported state_from_name and state_travel, not AEP’s internal data. For the control study we only need to mirror what the students themselves said.

2 Distribution of respondents by home region

Home-region distribution from the AEP pre-survey.

region_tbl <- pre %>%
  filter(!is.na(home_region)) %>%
  count(home_region, .drop = FALSE, name = "N") %>%
  mutate(`%` = sprintf("%.1f%%", 100 * N / sum(N))) %>%
  arrange(home_region)

# Total row at the bottom for quick sanity check.
region_tbl <- bind_rows(
  region_tbl,
  tibble(home_region = "Total",
         N           = sum(region_tbl$N),
         `%`         = "100.0%")
)

kable(region_tbl,
      col.names = c("Home region", "N", "%"),
      caption = "AEP students by home region (survey-reported)",
      align = c("l", "r", "r")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover")) %>%
  row_spec(nrow(region_tbl), bold = TRUE)
AEP students by home region (survey-reported)

| Home region | N | % |
|:--|--:|--:|
| South | 193 | 28.1% |
| Northeast | 159 | 23.2% |
| West Coast | 139 | 20.3% |
| Midwest | 90 | 13.1% |
| Mountain West | 105 | 15.3% |
| **Total** | **686** | **100.0%** |

3 Travel-state distribution within each home region

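Within each home region, this is where students actually went. N is the count and % is the share within that home region, so each region’s percentages sum to 100%. Responses with a missing state_travel are dropped, which is why a region’s total here can fall slightly below its count in the previous table (West Coast: 138 recorded destinations versus 139 students). This is the source for the Qualtrics destination randomizer: popular destinations should come up more often, and travel states that drew only 1-3% of students less often, so the control distribution looks AEP-shaped.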

region_state_tbl <- pre %>%
  filter(!is.na(home_region), !is.na(state_travel)) %>%
  count(home_region, state_travel, name = "N") %>%
  group_by(home_region) %>%
  mutate(`%` = sprintf("%.1f%%", 100 * N / sum(N))) %>%
  arrange(home_region, desc(N), state_travel) %>%
  ungroup()

kable(region_state_tbl,
      col.names = c("Home region", "Travel state", "N", "% of region"),
      caption = "Travel destinations by home region (survey-reported)",
      align = c("l", "l", "r", "r")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover")) %>%
  collapse_rows(columns = 1, valign = "top")
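Travel destinations by home region (survey-reported)

| Home region | Travel state | N | % of region |
|:--|:--|--:|--:|
| South | California | 33 | 17.1% |
| | Utah | 22 | 11.4% |
| | New York | 15 | 7.8% |
| | Oregon | 14 | 7.3% |
| | Colorado | 11 | 5.7% |
| | New Jersey | 10 | 5.2% |
| | New Hampshire | 8 | 4.1% |
| | Ohio | 8 | 4.1% |
| | Connecticut | 7 | 3.6% |
| | Delaware | 7 | 3.6% |
| | Minnesota | 6 | 3.1% |
| | Pennsylvania | 6 | 3.1% |
| | Maine | 5 | 2.6% |
| | Vermont | 5 | 2.6% |
| | Kansas | 4 | 2.1% |
| | Massachusetts | 4 | 2.1% |
| | Texas | 4 | 2.1% |
| | Washington | 4 | 2.1% |
| | Alaska | 3 | 1.6% |
| | Arizona | 3 | 1.6% |
| | Nevada | 3 | 1.6% |
| | Hawaii | 2 | 1.0% |
| | Illinois | 2 | 1.0% |
| | New Mexico | 2 | 1.0% |
| | Florida | 1 | 0.5% |
| | Idaho | 1 | 0.5% |
| | Montana | 1 | 0.5% |
| | South Carolina | 1 | 0.5% |
| | Virginia | 1 | 0.5% |
| Northeast | Utah | 25 | 15.7% |
| | Texas | 17 | 10.7% |
| | California | 16 | 10.1% |
| | Oregon | 15 | 9.4% |
| | Tennessee | 15 | 9.4% |
| | Florida | 10 | 6.3% |
| | Kansas | 10 | 6.3% |
| | Minnesota | 7 | 4.4% |
| | Ohio | 5 | 3.1% |
| | Idaho | 4 | 2.5% |
| | Alaska | 3 | 1.9% |
| | Arizona | 3 | 1.9% |
| | Arkansas | 3 | 1.9% |
| | Illinois | 3 | 1.9% |
| | Virginia | 3 | 1.9% |
| | Wisconsin | 3 | 1.9% |
| | Louisiana | 2 | 1.3% |
| | Mississippi | 2 | 1.3% |
| | Montana | 2 | 1.3% |
| | Nevada | 2 | 1.3% |
| | South Carolina | 2 | 1.3% |
| | Washington | 2 | 1.3% |
| | Colorado | 1 | 0.6% |
| | Georgia | 1 | 0.6% |
| | Hawaii | 1 | 0.6% |
| | New York | 1 | 0.6% |
| | North Carolina | 1 | 0.6% |
| West Coast | Texas | 13 | 9.4% |
| | Tennessee | 11 | 8.0% |
| | Utah | 10 | 7.2% |
| | Virginia | 10 | 7.2% |
| | New York | 8 | 5.8% |
| | Maryland | 6 | 4.3% |
| | New Hampshire | 6 | 4.3% |
| | New Jersey | 6 | 4.3% |
| | Colorado | 5 | 3.6% |
| | Florida | 5 | 3.6% |
| | Connecticut | 4 | 2.9% |
| | Louisiana | 4 | 2.9% |
| | Minnesota | 4 | 2.9% |
| | Pennsylvania | 4 | 2.9% |
| | California | 3 | 2.2% |
| | Georgia | 3 | 2.2% |
| | Hawaii | 3 | 2.2% |
| | Kansas | 3 | 2.2% |
| | Massachusetts | 3 | 2.2% |
| | North Carolina | 3 | 2.2% |
| | Vermont | 3 | 2.2% |
| | Wisconsin | 3 | 2.2% |
| | Arkansas | 2 | 1.4% |
| | Illinois | 2 | 1.4% |
| | Mississippi | 2 | 1.4% |
| | Montana | 2 | 1.4% |
| | Ohio | 2 | 1.4% |
| | South Carolina | 2 | 1.4% |
| | Washington D.C. | 2 | 1.4% |
| | Idaho | 1 | 0.7% |
| | Maine | 1 | 0.7% |
| | Nevada | 1 | 0.7% |
| | New Mexico | 1 | 0.7% |
| Midwest | California | 8 | 8.9% |
| | Texas | 8 | 8.9% |
| | New Jersey | 7 | 7.8% |
| | Utah | 7 | 7.8% |
| | Massachusetts | 4 | 4.4% |
| | New Hampshire | 4 | 4.4% |
| | Oregon | 4 | 4.4% |
| | Arkansas | 3 | 3.3% |
| | Colorado | 3 | 3.3% |
| | Florida | 3 | 3.3% |
| | Nevada | 3 | 3.3% |
| | North Carolina | 3 | 3.3% |
| | Tennessee | 3 | 3.3% |
| | Washington | 3 | 3.3% |
| | Washington D.C. | 3 | 3.3% |
| | Arizona | 2 | 2.2% |
| | Connecticut | 2 | 2.2% |
| | Maryland | 2 | 2.2% |
| | New Mexico | 2 | 2.2% |
| | New York | 2 | 2.2% |
| | Pennsylvania | 2 | 2.2% |
| | Vermont | 2 | 2.2% |
| | Virginia | 2 | 2.2% |
| | Alaska | 1 | 1.1% |
| | Georgia | 1 | 1.1% |
| | Idaho | 1 | 1.1% |
| | Louisiana | 1 | 1.1% |
| | Maine | 1 | 1.1% |
| | Ohio | 1 | 1.1% |
| | South Carolina | 1 | 1.1% |
| | Wisconsin | 1 | 1.1% |
| Mountain West | New York | 9 | 8.6% |
| | Texas | 8 | 7.6% |
| | Ohio | 7 | 6.7% |
| | New Hampshire | 6 | 5.7% |
| | Arkansas | 5 | 4.8% |
| | Minnesota | 5 | 4.8% |
| | Oregon | 5 | 4.8% |
| | California | 4 | 3.8% |
| | Kansas | 4 | 3.8% |
| | Tennessee | 4 | 3.8% |
| | Virginia | 4 | 3.8% |
| | Alaska | 3 | 2.9% |
| | Louisiana | 3 | 2.9% |
| | Maryland | 3 | 2.9% |
| | Massachusetts | 3 | 2.9% |
| | New Jersey | 3 | 2.9% |
| | Pennsylvania | 3 | 2.9% |
| | South Carolina | 3 | 2.9% |
| | Vermont | 3 | 2.9% |
| | Connecticut | 2 | 1.9% |
| | Delaware | 2 | 1.9% |
| | Georgia | 2 | 1.9% |
| | Hawaii | 2 | 1.9% |
| | Maine | 2 | 1.9% |
| | Mississippi | 2 | 1.9% |
| | North Carolina | 2 | 1.9% |
| | Washington | 2 | 1.9% |
| | Illinois | 1 | 1.0% |
| | Utah | 1 | 1.0% |
| | Washington D.C. | 1 | 1.0% |
| | Wisconsin | 1 | 1.0% |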

4 Tiered randomizer projection

Qualtrics doesn’t let you randomize by proportion directly, so to get the 50/30/20 tier weighting I duplicate the randomizer slots: 10 slots total, picked uniformly (10% each), with the slots split as:

  • 5 slots all point to Tier 1 (top destinations, ≥7% of region): 50% combined weight
  • 3 slots point to Tier 2 (at least 3% but under 7%): 30% combined
  • 2 slots point to Tier 3 (long tail, <3%): 20% combined
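The tier boundaries matter for destinations that sit exactly on a cutoff. Below is a minimal sketch of the classification rule, assuming the same "greater than or equal to" comparisons the projection code uses; assign_tier is a hypothetical helper named here for illustration, and the percentages are chosen to hit the boundaries.

```r
TIER_CUTOFFS <- c(7, 3)  # lower bounds for Tier 1 and Tier 2, in % of region

# Hypothetical helper (not part of the original code): map shares to tiers.
assign_tier <- function(pct, cutoffs = TIER_CUTOFFS) {
  ifelse(pct >= cutoffs[1], 1L,
         ifelse(pct >= cutoffs[2], 2L, 3L))
}

# Illustrative shares, including values exactly on each cutoff.
pct   <- c(17.1, 7.0, 6.9, 3.0, 2.9, 0.5)
tiers <- assign_tier(pct)
print(data.frame(pct, tier = tiers))
```

A share of exactly 7% lands in Tier 1 and exactly 3% in Tier 2. Because the real shares are unrounded ratios, a destination displayed as "7.0%" could still fall on either side of the cutoff.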

Inside the chosen slot, a Randomizer picks one of that tier’s states with flat probability. So per-state probability = tier weight / number of states in tier. For South that works out to ~12.5% per Tier 1 state (50% / 4), ~3.75% per Tier 2 (30% / 8), ~1.18% per Tier 3 (20% / 17). All destinations a region’s students went to are included — nothing’s truncated.
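The two-stage draw described above can be checked with a few lines of arithmetic. This sketch assumes the 5/3/2 slot split and the South tier sizes quoted in the text (4, 8 and 17 states); it computes the probabilities exactly rather than simulating Qualtrics.

```r
# Ten equally likely slots, duplicated 5/3/2 across the tiers.
slots       <- rep(c("Tier 1", "Tier 2", "Tier 3"), times = c(5, 3, 2))
tier_weight <- table(slots) / length(slots)   # 0.5, 0.3, 0.2

# Number of South destinations per tier, as quoted in the text.
n_states <- c("Tier 1" = 4, "Tier 2" = 8, "Tier 3" = 17)

# P(state) = P(slot lands in tier) * P(state | tier), with a flat pick inside the tier.
per_state <- as.numeric(tier_weight[names(n_states)]) / n_states
print(round(100 * per_state, 2))

# Sanity check: the probabilities over all 29 South destinations sum to 1.
print(sum(per_state * n_states))
```

The same calculation applies to any region once its tier sizes are known. Only n_states changes.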

Projected N is the expected per-state count under the recruitment targets (150 per region, 100 for Midwest, although Prolific fill rates aren’t guaranteed). % of region is the AEP share within all destinations (each region’s column sums to 100%).
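As a worked example of the projection, the sketch below reproduces the Mountain West figures from the tier sizes alone (2, 9 and 20 states, counted from the destination table) and the 150-respondent target. It is a check on the arithmetic, not a replacement for the pipeline code.

```r
TIER_WEIGHTS <- c(0.50, 0.30, 0.20)   # combined weight per tier
target_N     <- 150                   # Mountain West recruitment target

# Mountain West tier sizes: 2 / 9 / 20 destinations in Tier 1 / 2 / 3.
n_in_tier <- c(2, 9, 20)

proj_pct <- 100 * TIER_WEIGHTS / n_in_tier   # per-state share, in %
proj_N   <- proj_pct / 100 * target_N        # expected respondents per state

print(round(proj_pct, 1))
print(round(proj_N, 1))

# Expected counts across all states add back up to the regional target.
print(sum(proj_N * n_in_tier))
```

Because each tier's weight is split evenly, a small Tier 1 concentrates a lot of weight on each state. New York accounts for 8.6% of AEP Mountain West travel but is projected at 25% of the control assignments. Projected N is an expectation, so realized counts will vary around these non-integer values.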

# Knobs -- tweak to retune. TIER_CUTOFFS define which destinations land
# in each tier; TIER_WEIGHTS are the combined probabilities (achieved in
# Qualtrics via slot duplication: 5/3/2 of 10 evenly-randomized slots).
TIER_CUTOFFS   <- c(7, 3)                # Tier 1 / Tier 2 lower bounds (% of region)
TIER_WEIGHTS   <- c(0.50, 0.30, 0.20)    # Tier 1 / 2 / 3 combined weights
region_targets <- c("South" = 150, "Northeast" = 150, "West Coast" = 150,
                    "Midwest" = 100, "Mountain West" = 150)

# Per-state tier + projected counts. Within each (region, tier) the
# combined tier weight is split equally across the tier's states.
proj <- region_state_tbl %>%
  group_by(home_region) %>%
  mutate(pct_num = 100 * N / sum(N)) %>%
  ungroup() %>%
  mutate(Tier = case_when(
    pct_num >= TIER_CUTOFFS[1] ~ "1",
    pct_num >= TIER_CUTOFFS[2] ~ "2",
    TRUE                        ~ "3"
  )) %>%
  group_by(home_region, Tier) %>%
  mutate(n_in_tier = n()) %>%
  ungroup() %>%
  mutate(target_N     = unname(region_targets[as.character(home_region)]),
         proj_pct_num = TIER_WEIGHTS[as.integer(Tier)] / n_in_tier * 100,
         proj_N_num   = proj_pct_num / 100 * target_N)

# Per-region TOTAL row (sums of N + projected counts; both should hit 100% / target).
totals <- proj %>%
  group_by(home_region) %>%
  summarise(state_travel = "TOTAL",
            Tier         = "",
            N            = sum(N),
            pct_num      = sum(pct_num),
            proj_N_num   = sum(proj_N_num),
            proj_pct_num = sum(proj_pct_num),
            .groups      = "drop")

# Stack states + totals; TOTAL pinned to the bottom of each region.
projection_tbl <- bind_rows(
    proj   %>% mutate(is_total = FALSE),
    totals %>% mutate(is_total = TRUE)
  ) %>%
  arrange(home_region, is_total, desc(N)) %>%
  mutate(`% of region` = sprintf("%.1f%%", pct_num),
         `Projected N` = sprintf("%.1f",   proj_N_num),
         `Projected %` = sprintf("%.1f%%", proj_pct_num)) %>%
  select(`Home region`  = home_region,
         `Travel state` = state_travel,
         Tier, N, `% of region`, `Projected N`, `Projected %`)

kable(projection_tbl,
      caption = paste0("Tiered randomizer projection. Tier cutoffs: >=",
                       TIER_CUTOFFS[1], "% / ",
                       TIER_CUTOFFS[2], "-", TIER_CUTOFFS[1], "% / <",
                       TIER_CUTOFFS[2], "%. Combined tier weights (via Qualtrics 5/3/2 slot duplication): ",
                       paste(TIER_WEIGHTS * 100, collapse = "/"), "%."),
      align = c("l", "l", "c", "r", "r", "r", "r")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover")) %>%
  collapse_rows(columns = 1, valign = "top") %>%
  row_spec(which(projection_tbl$`Travel state` == "TOTAL"), bold = TRUE)
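Tiered randomizer projection. Tier cutoffs: >=7% / 3-7% / <3%. Combined tier weights (via Qualtrics 5/3/2 slot duplication): 50/30/20%.

| Home region | Travel state | Tier | N | % of region | Projected N | Projected % |
|:--|:--|:-:|--:|--:|--:|--:|
| South | California | 1 | 33 | 17.1% | 18.8 | 12.5% |
| | Utah | 1 | 22 | 11.4% | 18.8 | 12.5% |
| | New York | 1 | 15 | 7.8% | 18.8 | 12.5% |
| | Oregon | 1 | 14 | 7.3% | 18.8 | 12.5% |
| | Colorado | 2 | 11 | 5.7% | 5.6 | 3.8% |
| | New Jersey | 2 | 10 | 5.2% | 5.6 | 3.8% |
| | New Hampshire | 2 | 8 | 4.1% | 5.6 | 3.8% |
| | Ohio | 2 | 8 | 4.1% | 5.6 | 3.8% |
| | Connecticut | 2 | 7 | 3.6% | 5.6 | 3.8% |
| | Delaware | 2 | 7 | 3.6% | 5.6 | 3.8% |
| | Minnesota | 2 | 6 | 3.1% | 5.6 | 3.8% |
| | Pennsylvania | 2 | 6 | 3.1% | 5.6 | 3.8% |
| | Maine | 3 | 5 | 2.6% | 1.8 | 1.2% |
| | Vermont | 3 | 5 | 2.6% | 1.8 | 1.2% |
| | Kansas | 3 | 4 | 2.1% | 1.8 | 1.2% |
| | Massachusetts | 3 | 4 | 2.1% | 1.8 | 1.2% |
| | Texas | 3 | 4 | 2.1% | 1.8 | 1.2% |
| | Washington | 3 | 4 | 2.1% | 1.8 | 1.2% |
| | Alaska | 3 | 3 | 1.6% | 1.8 | 1.2% |
| | Arizona | 3 | 3 | 1.6% | 1.8 | 1.2% |
| | Nevada | 3 | 3 | 1.6% | 1.8 | 1.2% |
| | Hawaii | 3 | 2 | 1.0% | 1.8 | 1.2% |
| | Illinois | 3 | 2 | 1.0% | 1.8 | 1.2% |
| | New Mexico | 3 | 2 | 1.0% | 1.8 | 1.2% |
| | Florida | 3 | 1 | 0.5% | 1.8 | 1.2% |
| | Idaho | 3 | 1 | 0.5% | 1.8 | 1.2% |
| | Montana | 3 | 1 | 0.5% | 1.8 | 1.2% |
| | South Carolina | 3 | 1 | 0.5% | 1.8 | 1.2% |
| | Virginia | 3 | 1 | 0.5% | 1.8 | 1.2% |
| | **TOTAL** | | **193** | **100.0%** | **150.0** | **100.0%** |
| Northeast | Utah | 1 | 25 | 15.7% | 15.0 | 10.0% |
| | Texas | 1 | 17 | 10.7% | 15.0 | 10.0% |
| | California | 1 | 16 | 10.1% | 15.0 | 10.0% |
| | Oregon | 1 | 15 | 9.4% | 15.0 | 10.0% |
| | Tennessee | 1 | 15 | 9.4% | 15.0 | 10.0% |
| | Florida | 2 | 10 | 6.3% | 11.2 | 7.5% |
| | Kansas | 2 | 10 | 6.3% | 11.2 | 7.5% |
| | Minnesota | 2 | 7 | 4.4% | 11.2 | 7.5% |
| | Ohio | 2 | 5 | 3.1% | 11.2 | 7.5% |
| | Idaho | 3 | 4 | 2.5% | 1.7 | 1.1% |
| | Alaska | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Arizona | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Arkansas | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Illinois | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Virginia | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Wisconsin | 3 | 3 | 1.9% | 1.7 | 1.1% |
| | Louisiana | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | Mississippi | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | Montana | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | Nevada | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | South Carolina | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | Washington | 3 | 2 | 1.3% | 1.7 | 1.1% |
| | Colorado | 3 | 1 | 0.6% | 1.7 | 1.1% |
| | Georgia | 3 | 1 | 0.6% | 1.7 | 1.1% |
| | Hawaii | 3 | 1 | 0.6% | 1.7 | 1.1% |
| | New York | 3 | 1 | 0.6% | 1.7 | 1.1% |
| | North Carolina | 3 | 1 | 0.6% | 1.7 | 1.1% |
| | **TOTAL** | | **159** | **100.0%** | **150.0** | **100.0%** |
| West Coast | Texas | 1 | 13 | 9.4% | 18.8 | 12.5% |
| | Tennessee | 1 | 11 | 8.0% | 18.8 | 12.5% |
| | Utah | 1 | 10 | 7.2% | 18.8 | 12.5% |
| | Virginia | 1 | 10 | 7.2% | 18.8 | 12.5% |
| | New York | 2 | 8 | 5.8% | 7.5 | 5.0% |
| | Maryland | 2 | 6 | 4.3% | 7.5 | 5.0% |
| | New Hampshire | 2 | 6 | 4.3% | 7.5 | 5.0% |
| | New Jersey | 2 | 6 | 4.3% | 7.5 | 5.0% |
| | Colorado | 2 | 5 | 3.6% | 7.5 | 5.0% |
| | Florida | 2 | 5 | 3.6% | 7.5 | 5.0% |
| | Connecticut | 3 | 4 | 2.9% | 1.3 | 0.9% |
| | Louisiana | 3 | 4 | 2.9% | 1.3 | 0.9% |
| | Minnesota | 3 | 4 | 2.9% | 1.3 | 0.9% |
| | Pennsylvania | 3 | 4 | 2.9% | 1.3 | 0.9% |
| | California | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Georgia | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Hawaii | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Kansas | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Massachusetts | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | North Carolina | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Vermont | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Wisconsin | 3 | 3 | 2.2% | 1.3 | 0.9% |
| | Arkansas | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Illinois | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Mississippi | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Montana | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Ohio | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | South Carolina | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Washington D.C. | 3 | 2 | 1.4% | 1.3 | 0.9% |
| | Idaho | 3 | 1 | 0.7% | 1.3 | 0.9% |
| | Maine | 3 | 1 | 0.7% | 1.3 | 0.9% |
| | Nevada | 3 | 1 | 0.7% | 1.3 | 0.9% |
| | New Mexico | 3 | 1 | 0.7% | 1.3 | 0.9% |
| | **TOTAL** | | **138** | **100.0%** | **150.0** | **100.0%** |
| Midwest | California | 1 | 8 | 8.9% | 12.5 | 12.5% |
| | Texas | 1 | 8 | 8.9% | 12.5 | 12.5% |
| | New Jersey | 1 | 7 | 7.8% | 12.5 | 12.5% |
| | Utah | 1 | 7 | 7.8% | 12.5 | 12.5% |
| | Massachusetts | 2 | 4 | 4.4% | 2.7 | 2.7% |
| | New Hampshire | 2 | 4 | 4.4% | 2.7 | 2.7% |
| | Oregon | 2 | 4 | 4.4% | 2.7 | 2.7% |
| | Arkansas | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Colorado | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Florida | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Nevada | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | North Carolina | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Tennessee | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Washington | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Washington D.C. | 2 | 3 | 3.3% | 2.7 | 2.7% |
| | Arizona | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Connecticut | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Maryland | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | New Mexico | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | New York | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Pennsylvania | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Vermont | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Virginia | 3 | 2 | 2.2% | 1.2 | 1.2% |
| | Alaska | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Georgia | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Idaho | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Louisiana | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Maine | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Ohio | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | South Carolina | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | Wisconsin | 3 | 1 | 1.1% | 1.2 | 1.2% |
| | **TOTAL** | | **90** | **100.0%** | **100.0** | **100.0%** |
| Mountain West | New York | 1 | 9 | 8.6% | 37.5 | 25.0% |
| | Texas | 1 | 8 | 7.6% | 37.5 | 25.0% |
| | Ohio | 2 | 7 | 6.7% | 5.0 | 3.3% |
| | New Hampshire | 2 | 6 | 5.7% | 5.0 | 3.3% |
| | Arkansas | 2 | 5 | 4.8% | 5.0 | 3.3% |
| | Minnesota | 2 | 5 | 4.8% | 5.0 | 3.3% |
| | Oregon | 2 | 5 | 4.8% | 5.0 | 3.3% |
| | California | 2 | 4 | 3.8% | 5.0 | 3.3% |
| | Kansas | 2 | 4 | 3.8% | 5.0 | 3.3% |
| | Tennessee | 2 | 4 | 3.8% | 5.0 | 3.3% |
| | Virginia | 2 | 4 | 3.8% | 5.0 | 3.3% |
| | Alaska | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Louisiana | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Maryland | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Massachusetts | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | New Jersey | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Pennsylvania | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | South Carolina | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Vermont | 3 | 3 | 2.9% | 1.5 | 1.0% |
| | Connecticut | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Delaware | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Georgia | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Hawaii | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Maine | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Mississippi | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | North Carolina | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Washington | 3 | 2 | 1.9% | 1.5 | 1.0% |
| | Illinois | 3 | 1 | 1.0% | 1.5 | 1.0% |
| | Utah | 3 | 1 | 1.0% | 1.5 | 1.0% |
| | Washington D.C. | 3 | 1 | 1.0% | 1.5 | 1.0% |
| | Wisconsin | 3 | 1 | 1.0% | 1.5 | 1.0% |
| | **TOTAL** | | **105** | **100.0%** | **150.0** | **100.0%** |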