City of Austin Levers of Economic Mobility Index Indicators

Author

Kaitlan Wong

Overview

This document compiles American Community Survey (ACS) tract-level indicators used in the City of Austin Levers of Economic Mobility Index (LEMI).

The purpose of this script is to produce a single harmonized dataset of ACS-derived variables for all census tracts that fall at least 10% within Austin full and limited-purpose boundaries.

Indicators include:

Demographics (population, race/ethnicity, age)
Socioeconomic status (poverty, income, employment)
Household structure (Household Support Risk)
Education (pre-K, less than high school)
Health & access (disability, insurance, internet, limited English households)

All indicators are derived primarily from ACS 5-year estimates for 2016 and 2024. Internet access variables (ACS table B28002) are only available beginning in 2017, so 2017 is used in place of 2016 for that indicator.

While most indicators are drawn from ACS detailed tables (B- and S-series), selected indicators use ACS profile tables (DP-series). Profile tables provide published percentages with Census-derived margins of error that are not always available from detailed tables. These were used where they more accurately represented the intended construct and avoided manual ratio calculations.

Specifically:

Household Support Risk components (female-headed households with children; households with children) use DP02 profile percentages.
Race/ethnicity composition uses DP05 profile percentages.

The final dataset is restricted to Austin Full and Limited Purpose census tracts using the official tract list.

Non-ACS indicators used in the LEMI (institutional group quarters, life expectancy, low physical activity, eviction filings, energy burden, concentrated poverty, and temperature difference) are processed separately.

Output:
ACS_COA_Indicators.xlsx

Methods and Indicator Definitions

Austin Full and Limited-Purpose Tract Universe

All indicators were restricted to census tracts that fall within or substantially intersect the City of Austin’s full- and limited-purpose jurisdictional boundaries. The tract universe was defined using the file Austin_LTD_FULL_Tract_List_10pct.csv, which contains 2020 census tracts with at least 10% of their land area located within Austin’s full or limited-purpose city boundaries.

This tract list was produced in a separate spatial overlay analysis. City jurisdiction polygons were intersected with 2020 Texas census tract boundaries, and the proportion of each tract’s land area falling within Austin boundaries was calculated. Tracts meeting or exceeding the 10% threshold were included in the Austin analysis universe. This threshold was selected to ensure inclusion of tracts meaningfully overlapping Austin while avoiding inclusion of tracts with only negligible boundary contact.

The spatial inputs for this analysis were:

City of Austin Jurisdictions (full and limited purpose boundaries)
City of Austin Open Data Portal
https://data.austintexas.gov/City-Government/BOUNDARIES_jurisdictions/3pzb-6mbr
2020 Census Tracts (Texas)
U.S. Census Bureau TIGER/Line Shapefiles
https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

The resulting tract list provides the geographic universe for all ACS indicator extraction and normalization in the LEMI analysis.

Use of ACS Profile (DP) Tables

ACS profile tables (DP-series) provide pre-calculated percentages and margins of error derived using Census replicate-weight methods. These were used for select indicators when the desired percentage was not directly available from detailed tables.

Using published ACS percentages improves statistical accuracy and consistency with Census methodology.

2024 DP tables are used in this dataset for:

Female-headed households with children (DP02_0011PE)
Households with children (DP02_0014PE)
Race/ethnicity composition (DP05 series)

All other indicators use detailed or subject tables.

ACS Data Profile variable naming and availability changed between earlier and later ACS releases. For 2016, equivalent Data Profile variables were obtained from the earlier ACS “PR” profile tables (e.g., DP02PR, DP05) that precede the modern DP variable structure used in later years. Conceptually equivalent indicators were selected to maintain consistency across years. These PR-series variables measure the same underlying constructs as the later DP-series percentages and are produced using the same ACS replicate-weight methodology. Harmonized variables were used in analysis so that household structure and race/ethnicity indicators are comparable across 2016 and 2024.

Under-employment

Under-employment is defined as:

Percent of workers (who worked in the past 12 months) who worked
<35 hours per week AND <27 weeks per year

ACS tables used: B23022

Numerator:
Workers 1–34 hours/week AND 1–26 weeks/year (male + female)

Denominator:
All workers in past 12 months

This captures involuntary part-time / unstable employment at tract scale given ACS limitations.

Household Support Risk Score (HSRS)

Household Support Risk Score is a composite measure designed to identify census tracts where residents may face heightened challenges related to caregiving burdens and lack of household support structures.

This indicator reflects social vulnerability related to family structure, focusing on populations who may require additional services or infrastructure to support well-being. Included in the score are the following household types:

Single parents with children
Households with children
Elderly individuals living alone
- Seniors living alone remains derived from detailed household counts (B11007) because no published percentage exists in ACS profile or subject tables.

ACS aggregate tables used for 2024 analysis:

Variable	ACS Table	Why It Matters
`% Female-headed households with children`	`DP02_0011PE`	Signals increased care giving burden and potential economic precarity.
`% Households with children under 18`	`DP02_0014PE`	Indicates general care giving demand and need for child-related services.
`% Seniors living alone`	`B11007_003 / B11007_001`	Reflects risk of social isolation and lack of informal support for aging adults.

For 2016, equivalent ACS Data Profile PR-series variables were used because the DP02 profile table structure differs in earlier ACS releases. Specifically, female-headed households with children and households with children were derived from DP02PR_0009P and DP02PR_0013P. These PR variables represent the same ACS concepts and published percentages as the later DP02_0011PE and DP02_0014PE variables used for 2024. Values were harmonized across years to ensure longitudinal comparability of the Household Support Risk Score components.

Notably, for 2016, the ACS Data Profile table uses the category label “female householder, no husband present,” whereas later ACS releases use the updated terminology “female householder, no spouse/partner present.” This change reflects modernization of Census relationship terminology rather than a substantive change in the underlying household concept. Both variables represent households headed by a female householder without a co-resident spouse or partner and with children under age 18. The 2016 PR-series variable (DP02PR_0009P) is therefore conceptually equivalent to the later DP02_0011PE variable used for 2024.

ACS aggregate tables used for 2016 analysis:

Variable	ACS Table	Why It Matters
`% Female-headed households with children`	`DP02PR_0009P`	Signals increased care giving burden and potential economic precarity.
`% Households with children under 18`	`DP02PR_0013PE`	Indicates general care giving demand and need for child-related services.
`% Seniors living alone`	`B11007_003 / B11007_001`	Reflects risk of social isolation and lack of informal support for aging adults.

HSRS Scores

The score is calculated by averaging percentile-ranked values of three variables at the census tract level. Each variable is converted to a percentile (0–100) using percent_rank() within census tracts that fall at least 10% within Austin full and limited-purpose boundaries, and the final score is a simple average.

HSRS = mean(percentile_femaleHHkids,
            percentile_HHkids,
            percentile_seniorsAlone)

A higher hh_support_risk_score indicates greater social vulnerability due to caregiving burdens or social isolation.

Tracts scoring high may benefit from programs like: wraparound family services, child care and early childhood investments, and senior outreach or home support programs.

This score complements economic indicators by highlighting non-economic vulnerability tied to household structure.

Additional Background

The request was to capture the vulnerability of different type of household structures as requested by the City of Austin EI staff.

Internet Access Availability

ACS household internet access (Table B28002) is only available beginning in 2017. For consistency across indicators, 2017 estimates are used in place of 2016 for internet variables. Internet values are therefore unavailable for 2016 in the combined dataset.

Margin of Error (MOE) for Derived Indicators

For indicators calculated from multiple ACS estimates (e.g., ratios and sums), margins of error (MOEs) were computed following U.S. Census Bureau guidance for derived ACS statistics. Published ACS variables include replicate-weight–based MOEs provided directly by the Census Bureau. However, several LEMI indicators require ratios or aggregates not available as published percentages (e.g., percent households without internet, percent seniors living alone, and underemployment rate).

For these indicators, MOEs were propagated using standard ACS formulas implemented in the tidycensus functions moe_ratio() and moe_sum(), which reproduce Census Bureau–recommended calculations.

Ratio-derived percentages

For indicators calculated as a ratio of two ACS estimates (numerator / denominator × 100), MOEs were computed using the Census Bureau ratio approximation:

[ MOE_{ratio} = ]

where

(N) = numerator estimate
(D) = denominator estimate
(R = N/D) = ratio
(MOE_N) = numerator MOE
(MOE_D) = denominator MOE

This formula assumes correlated ACS estimates drawn from the same sample and is the method specified in ACS Accuracy of the Data documentation. The resulting ratio MOE was multiplied by 100 to express uncertainty in percentage units.

This approach was applied to:

Percent households without internet (B28002_013 / B28002_001)
Percent seniors living alone (B11007_003 / B11007_001)
Underemployment rate (derived from B23022 components)

Sum-derived counts

For indicators requiring aggregation across ACS categories (e.g., total underemployed workers or total workers), MOEs were calculated using the Census Bureau root-sum-of-squares method:

[ MOE_{sum} = ]

This method assumes independence across ACS categories within the same table, consistent with Census guidance for combining detailed estimates.

This approach was applied to:

Total workers (B23022_003 + B23022_027)
Underemployed workers (sum of B23022 part-time/part-year categories)
Age group percentages were derived from ACS detailed table B01001 by aggregating single-year and five-year age categories into broader analytic bands (under 18, 18–24, 25–44, 45–64, 65+)

Alignment with Census standards

The MOE calculations in this analysis directly follow ACS recommended procedures and are implemented using tidycensus, which is maintained in alignment with Census statistical documentation. As a result, derived MOEs are consistent with the uncertainty that would be reported if these indicators were published by the Census Bureau.

Using propagated MOEs ensures that:

uncertainty is preserved for all derived indicators
reliability screening can be applied consistently across variables
comparisons across tracts and years reflect ACS sampling variability

Published ACS percentages (DP and S tables) retain their Census-provided MOEs without modification.

Race and Ethnicity Aggregation

Race and ethnicity data were included so they can be added to the final LEMI map tooltip.

Fro 2024, race and ethnicity composition was derived from ACS Data Profile table DP05. Most major race/ethnicity groups were retained using published ACS percentages and margins of error (MOEs). However, several non-Hispanic race categories—American Indian and Alaska Native (AIAN), Native Hawaiian and Other Pacific Islander (NHPI), and Some Other Race—have very small population counts at the census tract level in the Austin region.

For 2016, race and ethnicity composition was derived from equivalent ACS Data Profile PR-series variables (DP05_0066P–DP05_0078P) corresponding to Hispanic/Latino and non-Hispanic race categories. These PR variables measure the same population shares as the later DP05_0090PE–DP05_0102PE variables used for 2024. Conceptually equivalent race/ethnicity groups were mapped and harmonized across years prior to aggregation and analysis to maintain consistency in demographic composition indicators.

Small denominators in these categories produce unstable percentage estimates and large relative MOEs, limiting interpretability and reliability for tract-level analysis. To improve statistical stability while preserving representation of smaller populations, these categories were combined into a single aggregated group:

Non-Hispanic AIAN + NHPI + Some Other Race (“NH Other”)

Because ACS profile tables do not publish a combined percentage for this aggregation, counts were summed from DP05 detailed estimates:

NH Other count = DP05_0098E + DP05_0100E + DP05_0101E

MOEs for the summed count were calculated using the Census Bureau root-sum-of-squares method:

MOE_sum = √(MOE₁² + MOE₂² + MOE₃²)

The combined percentage and MOE were then derived using ACS ratio formulas:

Percent = (NH Other count / total population) × 100

MOE_percent = 100 × moe_ratio(NH Other count, total population)

The equivalent was done for 2016 PR tables.

This aggregation follows standard ACS guidance for combining sparse categories to reduce sampling error while maintaining demographic coverage. Larger race/ethnicity groups (Non-Hispanic White, Non-Hispanic Black, Non-Hispanic Asian, Non-Hispanic Multiracial, and Hispanic/Latino) retain published ACS percentages and MOEs without modification.

Data Assembly

Setup

knitr::opts_chunk$set(echo = TRUE)

library(tidycensus)

Warning: package 'tidycensus' was built under R version 4.5.2

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(readr)
library(stringr)
library(tidyr)
library(writexl)

Geography + Years

years <- c(2016, 2024)
internet_years <- c(2017, 2024)
dp_years <- c(2024)

austin_counties <- c("Travis", "Williamson", "Hays")

austin_tracts <- read_csv(
  "Austin_LTD_FULL_Tract_List_10pct.csv",
  col_types = cols(.default = col_character())
) %>%
  mutate(GEOID = GEOID_clean) %>%
  distinct(GEOID)

Variable Definitions

Population and Household Totals

vars_totalpop <- "S0101_C01_001"   # total population
vars_totalhh  <- "S1101_C01_001"   # total households

Disability (S1810)

vars_disability <- c(
  "S1810_C02_001",  # disability count
  "S1810_C03_001",  # disability percent
  "S1810_C03_052",  # age 65+ ambulatory difficulty percent
  "S1810_C02_052"   # age 65+ ambulatory difficulty count
)

Household Support Risk Components

I am using pre-calculated percentage variables from the ACS instead of using the B tables, per Dr. Sparks’ recommendation. These are in the Data Profile tables: https://api.census.gov/data/2024/acs/acs5/profile/variables.html

# HH risk — DP (2024)
vars_hh_risk_dp <- c(
  "DP02_0011PE", # Percent!!HOUSEHOLDS BY TYPE!!Total households!!Female householder, no spouse/partner present!!With children of the householder under 18 years
  "DP02_0014PE" # Percent!!HOUSEHOLDS BY TYPE!!Total households!!Households with one or more people under 18 years
)

# HH risk — PR (2016)
vars_hh_risk_pr <- c(
  "DP02PR_0009P", # female HH with children
  "DP02PR_0013P" # HH with children
)

# HH risk — non-profile (all years)
vars_hh_risk_base <- c(
  "B11007_003","B11007_001" # seniors living alone / seniors households
)

Income — Median Household (B19013)

vars_income <- "B19013_001"

Internet Access (B28002)

Percent without internet is calculated as:

B28002_013 / B28002_001

vars_internet <- c(
  "B28002_013",  # households with no internet
  "B28002_001"   # total households (table denominator)
)

Less than high school educational attainment (S1501)

vars_lths <- c(
  "S1501_C01_006",  # population 25+
  "S1501_C01_007",  # <9th grade
  "S1501_C01_008"   # 9–12 no diploma
)

Limited English Speaking Households (S1602)

vars_lesh <- c(
  "S1602_C03_001",  # limited English count
  "S1602_C04_001"   # limited English percent
)

Poverty (S1701)

vars_poverty <- c(
  "S1701_C03_001",  # poverty percent
  "S1701_C02_001"   # poverty count
)

Public Pre-K Enrollment (S1401)

vars_prek <- c(
  "S1401_C04_014",  # public pre-K percent
  "S1401_C03_014",  # public pre-K count
  "S1401_C01_013"   # total population age 3–4 (denominator)
)

Uninsured (S2701)

vars_uninsured <- c(
  "S2701_C05_001",  # uninsured percent
  "S2701_C04_001"   # uninsured count
)

Under-employment (B23022)

Defined as workers who worked <35 hours/week and <27 weeks/year divided by all workers in the past 12 months.

vars_underemp <- c(
  "B23022_016","B23022_017","B23022_023","B23022_024",
  "B23022_040","B23022_041","B23022_047","B23022_048",
  "B23022_003","B23022_027"
)

Race / Ethnicity + Age

# race/eth
vars_race <- c(
  "DP05_0096PE", # NH white alone
  "DP05_0097PE", # NH Black alone
  "DP05_0098PE", # NH AIAN alone
  "DP05_0099PE", # NH Asian alone
  "DP05_0100PE", # NH NHPI alone
  "DP05_0101PE", # NH some other race alone
  "DP05_0102PE", # NH two or more races
  "DP05_0090PE", # Hispanic or Latino (of any race)
  "DP05_0089E"    # total population
)

# 2016 PR equivalents (race)
vars_race_pr <- c(
  "DP05_0066P",  # Hispanic alone
  "DP05_0072P",  # NH white alone
  "DP05_0073P",  # NH Black alone
  "DP05_0074P",  # NH AIAN alone
  "DP05_0075P",  # NH Asian alone
  "DP05_0076P",  # NH NHPI alone
  "DP05_0077P",  # NH other alone
  "DP05_0078P",  # NH multi
  "DP05_0065E"   # total pop
)
  
# age
vars_age <- c(
  
  age_total_pop = "B01001_001", # total population
  
  # Under 18
  u5_m   = "B01001_003",
  u5_f   = "B01001_027",
  a5_9_m = "B01001_004",
  a5_9_f = "B01001_028",
  a10_14_m = "B01001_005",
  a10_14_f = "B01001_029",
  a15_17_m = "B01001_006",
  a15_17_f = "B01001_030",

  # 18–24
  a18_19_m = "B01001_007",
  a18_19_f = "B01001_031",
  a20_m    = "B01001_008",  #20 years
  a20_f    = "B01001_032",
  a21_m    = "B01001_009",  #21 years
  a21_f    = "B01001_033",
  a22_24_m = "B01001_010",  #22 to 24 years
  a22_24_f = "B01001_034",

  # 25–44
  a25_29_m = "B01001_011",
  a25_29_f = "B01001_035",
  a30_34_m = "B01001_012",
  a30_34_f = "B01001_036",
  a35_39_m = "B01001_013",
  a35_39_f = "B01001_037",
  a40_44_m = "B01001_014",
  a40_44_f = "B01001_038",
  
  # 45–64
  a45_49_m = "B01001_015",
  a45_49_f = "B01001_039",
  a50_54_m = "B01001_016",
  a50_54_f = "B01001_040",
  a55_59_m = "B01001_017",
  a55_59_f = "B01001_041",
  a60_61_m = "B01001_018",
  a60_61_f = "B01001_042",
  a62_64_m = "B01001_019",
  a62_64_f = "B01001_043",

  # 65+
  a65_66_m = "B01001_020",
  a65_66_f = "B01001_044",
  a67_69_m = "B01001_021",
  a67_69_f = "B01001_045",
  a70_74_m = "B01001_022",
  a70_74_f = "B01001_046",
  a75_79_m = "B01001_023",
  a75_79_f = "B01001_047",
  a80_84_m = "B01001_024",
  a80_84_f = "B01001_048",
  a85p_m   = "B01001_025",
  a85p_f   = "B01001_049"
)

Master Variable List

all_vars <- unique(c(
  vars_totalpop,
  vars_totalhh,
  vars_disability,
  vars_hh_risk_base,
  vars_hh_risk_dp,
  vars_hh_risk_pr,
  vars_internet,
  vars_lesh,
  vars_lths,
  vars_income,
  vars_poverty,
  vars_prek,
  vars_underemp,
  vars_uninsured,
  vars_race,
  vars_race_pr,
  vars_age
))

View and Check ACS Table Names

To view the full names of tables / variables listed above, run the following code:

vars <- load_variables(2024, "acs5", cache = TRUE)

View(vars)

sub_vars <- load_variables(2024, "acs5/subject", cache = TRUE)

View(sub_vars)

profile_vars <- load_variables(2024, "acs5/profile", cache = TRUE)

View(profile_vars)

### to see what DP (equivalent) tables are available for 2016

dp_2016 <- load_variables(2016, "acs5/profile", cache = TRUE)

View(dp_2016)

Pull ACS Data (core years)

all_vars_core <- unique(c(
  vars_totalpop, vars_totalhh,
  vars_disability,
  vars_income, vars_lesh, vars_lths,
  vars_poverty, vars_prek,
  vars_uninsured, vars_underemp,
  vars_age,
  vars_hh_risk_base
))



# split DP vs non-DP variables (must split since DP race tables did not exist in 2016. I instead use PR tables for 2016)
vars_dp <- c(vars_race, vars_hh_risk_dp)     # 2024 only
vars_pr <- c(vars_race_pr, vars_hh_risk_pr)  # 2016 only


# pulls
acs_non_dp <- bind_rows(lapply(years, function(yr){
  get_acs(
    geography="tract",
    variables=all_vars_core,
    year=yr,
    survey="acs5",
    state="TX",
    county=austin_counties,
    output="wide"
  ) %>% mutate(year=yr)
}))

Getting data from the 2012-2016 5-year ACS

Fetching data by table type ("B/C", "S", "DP") and combining the result.

Getting data from the 2020-2024 5-year ACS

Fetching data by table type ("B/C", "S", "DP") and combining the result.

acs_dp <- bind_rows(lapply(dp_years, function(yr){
  get_acs(
    geography="tract",
    variables=vars_dp,
    year=yr,
    survey="acs5",
    state="TX",
    county=austin_counties,
    output="wide"
  ) %>% mutate(year=yr)
}))

Getting data from the 2020-2024 5-year ACS

Using the ACS Data Profile

acs_pr <- get_acs(
  geography="tract",
  variables=vars_pr,
  year=2016,
  survey="acs5",
  state="TX",
  county=austin_counties,
  output="wide"
) %>% mutate(year=2016)

Getting data from the 2012-2016 5-year ACS
Using the ACS Data Profile

acs_core <- acs_non_dp %>%
  left_join(acs_dp, by=c("GEOID","NAME","year")) %>%
  left_join(acs_pr, by=c("GEOID","NAME","year"))

# add 2017 rows so internet-only year exists
acs_core <- bind_rows(
  acs_core,
  acs_core %>%
    filter(year == 2016) %>%
    mutate(year = 2017)
)

Pull Internet Separately (2017+)

get_internet <- function(yr){
  get_acs(
    geography="tract",
    variables=vars_internet,
    year=yr,
    survey="acs5",
    state="TX",
    county=austin_counties,
    output="wide"
  ) %>% mutate(year=yr)
}

acs_internet <- bind_rows(lapply(internet_years, get_internet))

Getting data from the 2013-2017 5-year ACS

Getting data from the 2020-2024 5-year ACS

Merge Internet Into Main Dataset

acs_all <- acs_core %>%
  left_join(
    acs_internet %>%
      select(
        GEOID, year,
        B28002_013E, B28002_013M,
        B28002_001E, B28002_001M
      ),
    by = c("GEOID","year")
  )

Indicator Construction & Scoring

Construct Indicators

acs_final <- acs_all %>%
  rowwise() %>%
  mutate(

    # totals
    total_population = S0101_C01_001E,
    total_population_moe = S0101_C01_001M,

    total_households = S1101_C01_001E,
    total_households_moe = S1101_C01_001M,

    # disability
    disability_pct = S1810_C03_001E,
    disability_pct_moe = S1810_C03_001M,

    amb65_pct      = S1810_C03_052E,
    amb65_pct_moe  = S1810_C03_052M,
    
    # educational attainment - less than HS diploma
    pop25_n = S1501_C01_006E,
    pop25_moe = S1501_C01_006M,
    
    lths_n = S1501_C01_007E + S1501_C01_008E,
    lths_moe = moe_sum(
      moe      = c(S1501_C01_007M, S1501_C01_008M),
      estimate = c(S1501_C01_007E, S1501_C01_008E),
      na.rm    = TRUE
      ),
    
    lths_pct = if_else(
      pop25_n > 0,
      100 * lths_n / pop25_n,
      NA_real_
      ),
    
    lths_pct_moe = 100 * moe_ratio(
      num       = lths_n,
      denom     = pop25_n,
      moe_num   = lths_moe,
      moe_denom = pop25_moe
      ),

    # income
    med_hh_income = B19013_001E,
    med_hh_income_moe = B19013_001M,

    # internet
    no_internet_n = B28002_013E,
    no_internet_n_moe = B28002_013M,

    hh_total_internet = B28002_001E,
    hh_total_internet_moe = B28002_001M,
    
    no_internet_pct = if_else(
      hh_total_internet > 0,
      no_internet_n / hh_total_internet * 100,
      NA_real_
      ),

    no_internet_pct_moe = 100 * moe_ratio(
      no_internet_n,
      hh_total_internet,
      no_internet_n_moe,
      hh_total_internet_moe
    ),

    # limited English-speaking households
    lesh_pct = S1602_C04_001E,
    lesh_pct_moe = S1602_C04_001M,

    # poverty
    poverty_pct = S1701_C03_001E,
    poverty_pct_moe = S1701_C03_001M,

    # preK
    prek_pct = S1401_C04_014E,
    prek_pct_moe = S1401_C04_014M,

    # uninsured
    uninsured_pct = S2701_C05_001E,
    uninsured_pct_moe = S2701_C05_001M,

    # HH risk (harmonized 2016 PR vs 2024 DP)
    pct_female_hh_kids = if_else(
      year == 2016,
      DP02PR_0009PE,
      DP02_0011PE
      ),
    
    pct_female_hh_kids_moe = if_else(
      year == 2016,
      DP02PR_0009PM,
      DP02_0011PM
      ),
    
    pct_hh_with_kids = if_else(
      year == 2016,
      DP02PR_0013PE,
      DP02_0014PE
      ),
    
    pct_hh_with_kids_moe = if_else(
      year == 2016,
      DP02PR_0013PM,
      DP02_0014PM
      ),

    pct_seniors_alone  = B11007_003E / B11007_001E * 100,
    pct_seniors_alone_moe = 100 * moe_ratio(
      B11007_003E,
      B11007_001E,
      B11007_003M,
      B11007_001M
    ),
    

    # underemployment
    total_worked = B23022_003E + B23022_027E,
    total_worked_moe = moe_sum(
      moe      = c(B23022_003M, B23022_027M),
      estimate = c(B23022_003E, B23022_027E),
      na.rm = TRUE
    ),

    underemp_n = B23022_016E + B23022_017E + B23022_023E + B23022_024E +
                 B23022_040E + B23022_041E + B23022_047E + B23022_048E,

    underemp_n_moe = moe_sum(
      moe      = c(
        B23022_016M,B23022_017M,B23022_023M,B23022_024M,
        B23022_040M,B23022_041M,B23022_047M,B23022_048M
      ),
      estimate = c(
        B23022_016E,B23022_017E,B23022_023E,B23022_024E,
        B23022_040E,B23022_041E,B23022_047E,B23022_048E
      ),
      na.rm = TRUE
    ),
    
    underemp_pct = if_else(
      total_worked > 0,
      underemp_n / total_worked * 100,
      NA_real_
      ),

    underemp_pct_moe = 100 * moe_ratio(
      num = underemp_n,
      denom = total_worked,
      moe_num = underemp_n_moe,
      moe_denom = total_worked_moe
    ),
    
    # ======================
    # RACE / ETH (DP05)
    # ======================

    # race harmonized (2016 PR vs 2024 DP)
    pct_nh_white = if_else(year==2016, DP05_0072PE, DP05_0096PE),
    pct_nh_white_moe = if_else(year==2016, DP05_0072PM, DP05_0096PM),

    pct_nh_black = if_else(year==2016, DP05_0073PE, DP05_0097PE),
    pct_nh_black_moe = if_else(year==2016, DP05_0073PM, DP05_0097PM),

    pct_nh_asian = if_else(year==2016, DP05_0075PE, DP05_0099PE),
    pct_nh_asian_moe = if_else(year==2016, DP05_0075PM, DP05_0099PM),

    pct_nh_multi = if_else(year==2016, DP05_0078PE, DP05_0102PE),
    pct_nh_multi_moe = if_else(year==2016, DP05_0078PM, DP05_0102PM),

    pct_hispanic = if_else(year==2016, DP05_0066PE, DP05_0090PE),
    pct_hispanic_moe = if_else(year==2016, DP05_0066PM, DP05_0090PM),

    race_total_pop = if_else(year==2016, DP05_0065E, DP05_0089E),
    race_total_pop_moe = if_else(year==2016, DP05_0065M, DP05_0089M),

    # NH Other (AIAN + NHPI + Some Other) — harmonized 2016 PR vs 2024 DP
    pct_nh_other = if_else(
      year == 2016,
      DP05_0074PE + DP05_0076PE + DP05_0077PE,
      DP05_0098PE + DP05_0100PE + DP05_0101PE
    ),

    pct_nh_other_moe = if_else(
      year == 2016,
      moe_sum(
        c(DP05_0074PM, DP05_0076PM, DP05_0077PM),
        estimate = c(DP05_0074PE, DP05_0076PE, DP05_0077PE)
      ),
      moe_sum(
        c(DP05_0098PM, DP05_0100PM, DP05_0101PM),
        estimate = c(DP05_0098PE, DP05_0100PE, DP05_0101PE)
      )
    ),
    
    # ======================
    # AGE BANDS (COUNTS)
    # ======================

    est_age_total = B01001_001E,
    moe_age_total = B01001_001M,

    est_under18 = sum(
      B01001_003E, B01001_027E,
      B01001_004E, B01001_028E,
      B01001_005E, B01001_029E,
      B01001_006E, B01001_030E,
      na.rm = TRUE
    ),

    moe_under18 = moe_sum(
      c(
        B01001_003M, B01001_027M,
        B01001_004M, B01001_028M,
        B01001_005M, B01001_029M,
        B01001_006M, B01001_030M
      ),
      estimate = c(
        B01001_003E, B01001_027E,
        B01001_004E, B01001_028E,
        B01001_005E, B01001_029E,
        B01001_006E, B01001_030E
      ),
      na.rm = TRUE
    ),

    est_18_24 = sum(
      B01001_007E, B01001_031E,
      B01001_008E, B01001_032E,
      B01001_009E, B01001_033E,
      B01001_010E, B01001_034E,
      na.rm = TRUE
    ),

    moe_18_24 = moe_sum(
      c(
        B01001_007M, B01001_031M,
        B01001_008M, B01001_032M,
        B01001_009M, B01001_033M,
        B01001_010M, B01001_034M
      ),
      estimate = c(
        B01001_007E, B01001_031E,
        B01001_008E, B01001_032E,
        B01001_009E, B01001_033E,
        B01001_010E, B01001_034E
      ),
      na.rm = TRUE
    ),

    est_25_44 = sum(
      B01001_011E, B01001_035E,
      B01001_012E, B01001_036E,
      B01001_013E, B01001_037E,
      B01001_014E, B01001_038E,
      na.rm = TRUE
    ),

    moe_25_44 = moe_sum(
      c(
        B01001_011M, B01001_035M,
        B01001_012M, B01001_036M,
        B01001_013M, B01001_037M,
        B01001_014M, B01001_038M
      ),
      estimate = c(
        B01001_011E, B01001_035E,
        B01001_012E, B01001_036E,
        B01001_013E, B01001_037E,
        B01001_014E, B01001_038E
      ),
      na.rm = TRUE
    ),

    est_45_64 = sum(
      B01001_015E, B01001_039E,
      B01001_016E, B01001_040E,
      B01001_017E, B01001_041E,
      B01001_018E, B01001_042E,
      B01001_019E, B01001_043E,
      na.rm = TRUE
    ),

    moe_45_64 = moe_sum(
      c(
        B01001_015M, B01001_039M,
        B01001_016M, B01001_040M,
        B01001_017M, B01001_041M,
        B01001_018M, B01001_042M,
        B01001_019M, B01001_043M
      ),
      estimate = c(
        B01001_015E, B01001_039E,
        B01001_016E, B01001_040E,
        B01001_017E, B01001_041E,
        B01001_018E, B01001_042E,
        B01001_019E, B01001_043E
      ),
      na.rm = TRUE
    ),

    est_65p = sum(
      B01001_020E, B01001_044E,
      B01001_021E, B01001_045E,
      B01001_022E, B01001_046E,
      B01001_023E, B01001_047E,
      B01001_024E, B01001_048E,
      B01001_025E, B01001_049E,
      na.rm = TRUE
    ),

    moe_65p = moe_sum(
      c(
        B01001_020M, B01001_044M,
        B01001_021M, B01001_045M,
        B01001_022M, B01001_046M,
        B01001_023M, B01001_047M,
        B01001_024M, B01001_048M,
        B01001_025M, B01001_049M
      ),
      estimate = c(
        B01001_020E, B01001_044E,
        B01001_021E, B01001_045E,
        B01001_022E, B01001_046E,
        B01001_023E, B01001_047E,
        B01001_024E, B01001_048E,
        B01001_025E, B01001_049E
      ),
      na.rm = TRUE
    ),

    # ======================
    # AGE PERCENTS
    # ======================

    age_under18 = 100 * est_under18 / est_age_total,
    age_moe_under18 = 100 * moe_ratio(est_under18, est_age_total, moe_under18, moe_age_total),

    age_18_24 = 100 * est_18_24 / est_age_total,
    age_moe_18_24 = 100 * moe_ratio(est_18_24, est_age_total, moe_18_24, moe_age_total),

    age_25_44 = 100 * est_25_44 / est_age_total,
    age_moe_25_44 = 100 * moe_ratio(est_25_44, est_age_total, moe_25_44, moe_age_total),

    age_45_64 = 100 * est_45_64 / est_age_total,
    age_moe_45_64 = 100 * moe_ratio(est_45_64, est_age_total, moe_45_64, moe_age_total),

    age_65p = 100 * est_65p / est_age_total,
    age_moe_65p = 100 * moe_ratio(est_65p, est_age_total, moe_65p, moe_age_total)
) %>%
  ungroup()

Join to Austin Tracts

acs_final <- acs_final %>%
  filter(GEOID %in% austin_tracts$GEOID)

Household Support Risk Score (HSRS)

acs_final <- acs_final %>%
  group_by(year) %>%
  mutate(

    norm_pct_female_hh_kids = percent_rank(pct_female_hh_kids) * 100,

    norm_pct_hh_with_kids   = percent_rank(pct_hh_with_kids) * 100,

    norm_pct_seniors_alone  = percent_rank(pct_seniors_alone) * 100

  ) %>%
  ungroup() %>%
  rowwise() %>%
  mutate(
    hh_support_risk_score = mean(
      c(norm_pct_female_hh_kids,
        norm_pct_hh_with_kids,
        norm_pct_seniors_alone),
      na.rm = TRUE
    )
  ) %>%
  ungroup()

HSRS PCA Score

acs_final <- acs_final %>%
  group_by(year) %>%
  group_modify(~{

    df <- .x

    keep <- complete.cases(
      df[, c("norm_pct_female_hh_kids",
             "norm_pct_hh_with_kids",
             "norm_pct_seniors_alone")]
    )

    df$hh_pca_score <- NA_real_

    if(sum(keep) > 1){
      pca <- prcomp(
        df[keep, c("norm_pct_female_hh_kids",
                   "norm_pct_hh_with_kids",
                   "norm_pct_seniors_alone")],
        scale. = TRUE
      )
      df$hh_pca_score[keep] <- pca$x[,1]
    }

    df
  }) %>%
  ungroup()

Final Outputs

Save Final Output

# ensure year column exists (numeric)
acs_export <- acs_final %>%
  mutate(year = as.integer(year))


# columns to keep
final_cols <- c(
  "GEOID",
  "NAME",
  "year",

  # totals
  "total_population",
  "total_population_moe",

  "total_households",
  "total_households_moe",

  # disability
  "disability_pct",
  "disability_pct_moe",

  "amb65_pct",
  "amb65_pct_moe",
  
  # education
  "lths_pct",
  "lths_pct_moe",

  # income
  "med_hh_income",
  "med_hh_income_moe",

  # internet
  "no_internet_pct",
  "no_internet_pct_moe",

  # language
  "lesh_pct",
  "lesh_pct_moe",

  # poverty
  "poverty_pct",
  "poverty_pct_moe",

  # education
  "prek_pct",
  "prek_pct_moe",

  # insurance
  "uninsured_pct",
  "uninsured_pct_moe",

  # employment
  "underemp_pct",
  "underemp_pct_moe",

  # household risk components
  "pct_female_hh_kids",
  "pct_female_hh_kids_moe",

  "pct_hh_with_kids",
  "pct_hh_with_kids_moe",

  "pct_seniors_alone",
  "pct_seniors_alone_moe",

  # ======================
  # RACE
  # ======================

  "pct_nh_white",
  "pct_nh_white_moe",

  "pct_nh_black",
  "pct_nh_black_moe",

  "pct_nh_asian",
  "pct_nh_asian_moe",

  "pct_nh_multi",
  "pct_nh_multi_moe",

  "pct_nh_other",
  "pct_nh_other_moe",

  "pct_hispanic",
  "pct_hispanic_moe",

  # ======================
  # AGE (% only)
  # ======================

  "age_under18",
  "age_moe_under18",

  "age_18_24",
  "age_moe_18_24",

  "age_25_44",
  "age_moe_25_44",

  "age_45_64",
  "age_moe_45_64",

  "age_65p",
  "age_moe_65p",

  # ======================
  # HSRS
  # ======================

  "norm_pct_female_hh_kids",
  "norm_pct_hh_with_kids",
  "norm_pct_seniors_alone",

  "hh_pca_score",
  "hh_support_risk_score"
)


acs_export <- acs_export %>%
  select(all_of(final_cols))

# create sheets by year
years_out <- c(2016, 2017, 2024)

sheets <- lapply(years_out, function(y){
  acs_export %>% filter(year == y)
})

names(sheets) <- as.character(years_out)

# write Excel workbook
write_xlsx(
  sheets,
  "ACS_COA_Indicators.xlsx"
)

Preview Final Data

head(acs_export)

# A tibble: 6 × 60
  GEOID       NAME   year total_population total_population_moe total_households
  <chr>       <chr> <int>            <dbl>                <dbl>            <dbl>
1 48453000101 Cens…  2016             4331                  282             1997
2 48453000102 Cens…  2016             2572                  172             1108
3 48453000203 Cens…  2016             1470                  187              710
4 48453000204 Cens…  2016             3090                  206             1629
5 48453000205 Cens…  2016             4227                  425             2095
6 48453000206 Cens…  2016             3201                  255             1687
# ℹ 54 more variables: total_households_moe <dbl>, disability_pct <dbl>,
#   disability_pct_moe <dbl>, amb65_pct <dbl>, amb65_pct_moe <dbl>,
#   lths_pct <dbl>, lths_pct_moe <dbl>, med_hh_income <dbl>,
#   med_hh_income_moe <dbl>, no_internet_pct <dbl>, no_internet_pct_moe <dbl>,
#   lesh_pct <dbl>, lesh_pct_moe <dbl>, poverty_pct <dbl>,
#   poverty_pct_moe <dbl>, prek_pct <dbl>, prek_pct_moe <dbl>,
#   uninsured_pct <dbl>, uninsured_pct_moe <dbl>, underemp_pct <dbl>, …

Combine All Indicator Data Into Master Spreadsheet (2024 ACS only)

List of non-ACS indicators:

Indicator	Source
Temperature Difference	https://www.climateengine.org/
Group Quarters	https://data.census.gov/table?q=P5:+GROUP+QUARTERS+POPULATION+BY+MAJOR+GROUP+QUARTERS+TYPE&g=050XX00US48209$1400000,48453$1400000,48491$1400000
Energy Burden	https://www.energy.gov/scep/slsc/lead-tool
Eviction Filings	https://bastaaustin.org/en/eviction-dashboard
Low Physical Acitvity	https://data.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2025-/yjkw-uj5s/about_data
Persistent Poverty	https://www.census.gov/library/publications/2023/acs/acs-51.html
USALEEP Life Expectancy (2010 tracts)	https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html Crosswalk analysis here: https://rpubs.com/kaitlan/1398783

library(readxl)

# load ACS 2024
acs_2024 <- acs_export %>%
  filter(year == 2024) %>%
  rename(acs_year = year) %>%
  mutate(GEOID = as.character(GEOID))
  
  
  # add in pct_in_austin variable (tells us what % of the tract falls within the LTD and full boundaries of Austin)
pct_austin <- read_csv(
  "Austin_LTD_FULL_Tract_List_10pct.csv",
  col_types = cols(.default = col_character())
) %>%
  select(GEOID_clean, pct_in_austin) %>%
  rename(GEOID = GEOID_clean) %>%
  mutate(GEOID = as.character(GEOID))


# =========================================================
# Temperature Difference (Climate Engine)
# =========================================================
heat <- read.csv(
  "non_acs_data/Austin_Greater_Heat_Full_Analysis_2024.csv",
  colClasses = c(FIPS = "character")
) %>%
  select(FIPS, Max_LST_F, Temperature_Diff) %>%
  rename(GEOID = FIPS)

# =========================================================
# Institutional Group Quarters (Decennial PL)
# =========================================================

# P5 institutionalized counts
gq_p5 <- read_excel(
  "non_acs_data/CLEANED_DECENNIALPL2020.P5-2026-02-04T212658.xlsx",
  sheet = "CLEANED"
) %>%
  mutate(
    tract  = str_extract(Label, "\\d+\\.\\d+"),
    county = str_extract(Label, "(?<=, ).+?(?= County)"),
    county_fips = case_when(
      county == "Travis"     ~ "453",
      county == "Williamson" ~ "491",
      county == "Hays"       ~ "209",
      county == "Bastrop"    ~ "021"
    ),
    GEOID = paste0(
      "48",
      county_fips,
      str_replace(tract, "\\.", "") %>% str_pad(6, pad="0")
    ),
    inst_pop = `Institutionalized population:`
  ) %>%
  select(GEOID, inst_pop)

# P1 total population
gq_p1 <- read_excel(
  "non_acs_data/cleaned_DECENNIALPL2020.P1-2026-02-09T043143.xlsx",
  sheet = "transposed"
) %>%
  mutate(
    tract  = str_extract(Label, "\\d+\\.\\d+"),
    county = str_extract(Label, "(?<=, ).+?(?= County)"),
    county_fips = case_when(
      county == "Travis"     ~ "453",
      county == "Williamson" ~ "491",
      county == "Hays"       ~ "209",
      county == "Bastrop"    ~ "021"
    ),
    GEOID = paste0(
      "48",
      county_fips,
      str_replace(tract, "\\.", "") %>% str_pad(6, pad="0")
    ),
    total_pop = `Total:`
  ) %>%
  select(GEOID, total_pop)

# combine + calculate pct of population living in institutional GQ
gq <- gq_p5 %>%
  left_join(gq_p1, by = "GEOID") %>%
  mutate(
    gq_institutional_pct = inst_pop / total_pop * 100
  ) %>%
  select(GEOID, gq_institutional_pct)

Warning in left_join(., gq_p1, by = "GEOID"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 22 of `x` matches multiple rows in `y`.
ℹ Row 22 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

# =========================================================
# Energy Burden (LEAD)
# =========================================================
lead <- read_excel(
  "non_acs_data/CLEANED_LEAD Tool Data Census Tracts (Feb 4, 2026 2_22pm).xlsx",
  sheet = "Austin Counties"
) %>%
  mutate(GEOID = as.character(`Geography ID`)) %>%
  select(GEOID, `Energy Burden (% income)`) %>%
  rename(energy_burden = `Energy Burden (% income)`)

# =========================================================
# Eviction Filings (BASTA)
# =========================================================
evict <- read_excel(
  "non_acs_data/evictions_travis_tract_basta_cleaned.xlsx"
) %>%
  mutate(
    GEOID = as.character(GEOID),
    evict_fil_rate = readr::parse_number(`Eviction filing rate`) * 100
  ) %>%
  select(GEOID, evict_fil_rate)

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `evict_fil_rate = readr::parse_number(`Eviction filing rate`) *
  100`.
Caused by warning:
! 20 parsing failures.
row col expected actual
  2  -- a number      -
 26  -- a number      -
 33  -- a number      -
 35  -- a number      -
 79  -- a number      -
... ... ........ ......
See problems(...) for more details.

# =========================================================
# PLACES (physical inactivity)
# =========================================================
places <- read.csv(
  "non_acs_data/PLACES__Census_Tract_Data_(GIS_Friendly_Format),_2025_release_20260204.csv",
  colClasses = c(TractFIPS = "character")
) %>%
  select(TractFIPS, LPA_CrudePrev, LPA_Crude95CI) %>%
  rename(
    GEOID = TractFIPS,
    low_phys_activity = LPA_CrudePrev,
    low_phys_activity_ci = LPA_Crude95CI
  )

# =========================================================
# Persistent Poverty
# =========================================================
pers_pov_list <- read_excel(
  "non_acs_data/travis_hays_census-tracts-in-persistent-poverty.xlsx",
  sheet = "cleaned"
) %>%
  mutate(GEOID = as.character(Tract)) %>%
  select(GEOID) %>%
  mutate(
    pers_pov = 1,
    pers_pov_label = "Yes"
  )

# =========================================================
# Life Expectancy
# =========================================================
le <- read_csv(
  "non_acs_data/LE_2020_pre_imputation.csv",
  col_types = cols(.default = col_guess())
) %>%
  mutate(GEOID = as.character(GEOID_TRACT_20)) %>%
  select(
    GEOID,
    le_2020_normalized,
    data_coverage
  ) %>%
  rename(
    le_data_coverage = data_coverage
  )

# =========================================================
# Join all non-ACS to ACS
# =========================================================
coa_master_2024 <- acs_2024 %>%
  left_join(pct_austin, by = "GEOID") %>% 
  left_join(heat, by = "GEOID") %>%
  left_join(gq, by = "GEOID") %>%
  left_join(lead, by = "GEOID") %>%
  left_join(evict, by = "GEOID") %>%
  left_join(places, by = "GEOID") %>%
  left_join(pers_pov_list, by = "GEOID") %>%
  left_join(le, by = "GEOID") %>%
  mutate(
    pers_pov = if_else(is.na(pers_pov), 0, pers_pov),
    pers_pov_label = if_else(is.na(pers_pov_label), "No", pers_pov_label)
  )


# =========================================================
# Save final master spreadsheet
# =========================================================
write_xlsx(
  coa_master_2024,
  "coa_master_pre_impute_021826.xlsx"
)

# preview
head(coa_master_2024)

# A tibble: 6 × 72
  GEOID    NAME  acs_year total_population total_population_moe total_households
  <chr>    <chr>    <int>            <dbl>                <dbl>            <dbl>
1 4820901… Cens…     2024             9890                 1253             3288
2 4845300… Cens…     2024             5096                  509             2588
3 4845300… Cens…     2024             2112                  293             1021
4 4845300… Cens…     2024             2814                  201             1246
5 4845300… Cens…     2024             2686                  292             1397
6 4845300… Cens…     2024             4410                  495             2799
# ℹ 66 more variables: total_households_moe <dbl>, disability_pct <dbl>,
#   disability_pct_moe <dbl>, amb65_pct <dbl>, amb65_pct_moe <dbl>,
#   lths_pct <dbl>, lths_pct_moe <dbl>, med_hh_income <dbl>,
#   med_hh_income_moe <dbl>, no_internet_pct <dbl>, no_internet_pct_moe <dbl>,
#   lesh_pct <dbl>, lesh_pct_moe <dbl>, poverty_pct <dbl>,
#   poverty_pct_moe <dbl>, prek_pct <dbl>, prek_pct_moe <dbl>,
#   uninsured_pct <dbl>, uninsured_pct_moe <dbl>, underemp_pct <dbl>, …