Estimating the Preventable Burden of Dementia in Cameron County: A Quasi-Causal Modeling Approach

Executive Summary and Overview:

Alzheimer’s Disease and Related Dementias (ADRD) pose a growing public health challenge, particularly as the global population ages. Preventing dementia through reduction of modifiable risk factors is a key strategy, but reliable local-level data are often limited by missing responses and small sample sizes. This white paper presents a proof-of-concept pipeline that applies multiple imputation to 2023 Texas Behavioral Risk Factor Surveillance System (BRFSS) data to estimate the prevalence of eleven modifiable dementia risk factors at the county level in Cameron County, Texas. By applying random forest imputation and survey-weighted analysis, we improved data completeness and produced age-stratified prevalence estimates, with midlife adults (ages 45–64) showing the highest burden of risk factors such as high cholesterol, hypertension, and obesity. Principal component analysis revealed clustering among risk factors such as diabetes, excessive alcohol consumption, and depression, indicating potential targets for integrated intervention. Modeling a modest 10% reduction across all risk factors suggests that approximately 350 dementia cases could be prevented in this population. These findings demonstrate the value of multiple imputation in enhancing small-area risk estimates and provide actionable insights for local public health programming aimed at reducing dementia risk and improving community health outcomes.

Objective:

To calculate point prevalence estimates for smaller geographic units, using multiple imputation to account for missing data, and to use those estimates to construct population attributable fractions (PAFs), potential impact fractions (PIFs), and the potential impact on dementia cases associated with a reduction in risk factor prevalence.

Methods and Design:

We estimated the prevalence and relative impact of eleven (11) modifiable dementia risk factors in Cameron County, Texas, using 2023 Texas BRFSS data. To address missing data, we used multiple imputation (random forest, m = 20) rather than multi-year BRFSS pooling, so that a greater number of Lancet Commission modifiable risk factors could be assessed within the data; diagnostic checks confirmed that the distributions of imputed categorical data matched those of the original data. Survey-weighted totals were pooled across imputations and stratified by age group for risk factors identified as relevant. We computed prevalence proportions and applied principal component analysis (PCA) to a tetrachoric correlation matrix to estimate shared variance (communalities, H²).

Results:

Prevalence point estimates were produced for each age group. Midlife adults (45 to 64 years of age) represented the majority of prevention opportunities, with high cholesterol, high blood pressure, and obesity present in >49% of those in midlife. Interrelated risk clustering was also observed: the highest communalities (i.e. variance shared with the other risk factors, as described by Lee et al. 2022a) were noted for diabetes (H² = 79.2%), excessive alcohol consumption (H² = 78.9%), and depression (H² = 75.4%). Finally, when a modest (10%) reduction in prevalence across all eleven risk factors was considered, an estimated 352 dementia cases could have been prevented.

Foreword:

In 1999, Dr. Gladys Maestre wrote:

“…people who choose neuroscience as their field of endeavor face a special burden because neural problems are not high on the list of public health priorities, little funding is available, the public does not [apply pressure] to solve these problems, [and] researchers …are often secretive about their work.

To compensate for these difficulties, the successful researcher searches for creative solutions such as strategic alliances [involving two or more teams that share resources and information to their mutual benefit].” (Maestre 1999)

Nearly 30 years later, with tremendous advances in the field of Alzheimer’s Disease and Related Dementias (ADRD) risk factor reduction and prevention, these words remain as relevant as ever, if not more so, in the face of an aging global population and a renewed interest in ADRD programming in public health.

Thus, in the spirit of these words, this white paper and the work within it have been developed to empower public health organizations large and small, in the hope that open-access, functional work built to solve problems at home can foster strategic alliances that solve problems across the world.

1. Introduction:

Alzheimer’s Disease and Related Dementias (ADRD) are increasingly recognized as a global health crisis as the population ages. As modifiable risk factor reduction across the lifespan becomes a front-line intervention to mitigate dementia onset, increasingly granular epidemiological approaches are needed to evaluate public health programming outcomes and to ensure that prevention efforts by state and local health departments align with known risk data (such as the risk factors identified by the 2024 Lancet Commission) and with the conditions of highest burden within communities. With such approaches, health departments can better contextualize community efforts in ADRD prevention and establish high-yield best practices.

Thus, to support public health programming for ADRD, a “proof-of-concept” pipeline was established to produce modifiable risk factor prevalence estimates within smaller geographic units (e.g. counties), identify which risk factors are most highly correlated, approximate the effects of modest improvement in prevalence (simulating small-scale but effective outreach impact), and ultimately calculate potentially preventable cases of dementia in that area as a result. This work aims to improve quantification of public health programming outcomes in small and large communities. In turn, health departments can meaningfully improve health outcomes in a cost-effective fashion, support policy development through analysis, and build community trust through contextually relevant outreach and programming.

2. Methods and Design:

This work focuses on using readily available “microdata” like county-level survey data found within state BRFSS datasets and U.S. Census Bureau data to estimate how much dementia prevalence could be reduced in a specific area if common modifiable risk factors could be reduced.

This work accomplishes all of this through:

  • Imputation of missing risk factor data through the use of Random Forest (“rf”) via the “mice” package in R.

  • Estimation of age-stratified prevalence through the use of the “survey” package in R.

    • A version stratified by age and binary race/ethnicity (including race/ethnicity as a predictor) was also created, but its estimates are of questionable reliability owing to limited data granularity and stability.

  • Calculation of Population Attributable Fractions (PAFs), based closely on the methodology in the Lancet 2024 supplementary material (Livingston, n.d.), through use of the “psych” package in R.

  • Calculation of Potential Impact Fractions (PIFs), and adjustments thereof, in line with the work of Ma’u et al. (Ma’u et al. 2025) and Lee et al. (Lee et al. 2022b).

  • Projection of the preventable dementia burden from these fractions, as in Ma’u et al. and Lee et al. (ibid.).

In turn, local health departments and state departments are better able to reasonably identify risk factors that contribute most to ADRD prevalence and approximate the prospective effects of outreach and programming impact across communities through a calculation of the number of dementia cases they may prevent as a result. As such, policy and the effects of public health programming in small and large organizations can be further quantified.

2.1 Data Source and Variables

This analysis utilizes publicly available data from the 2023 Behavioral Risk Factor Surveillance System (BRFSS), restricted to Cameron County, Texas. BRFSS is a state-based, cross-sectional telephone survey conducted by the CDC to collect information on health-related risk behaviors, chronic health conditions, and preventive service use among U.S. adults. The survey employs a complex sampling design, including stratification, clustering, and weighting, to ensure population-level representativeness.

Cameron County was selected due to its relevance as a prototype jurisdiction for local dementia prevention efforts and its demographic similarity to other underserved U.S.–Mexico border communities.

Study Population

Adults aged 18 and older residing in Cameron County who completed the BRFSS core module were included. For stratified analyses, age was grouped into three categories:

  • 18–44 years (agegr3 = 1)
  • 45–64 years (agegr3 = 2)
  • 65+ years (agegr3 = 3)

Key Variables

We operationalized 12 modifiable risk factors based on the 2024 Lancet Commission report on Dementia Prevention, Intervention, and Care. Each risk factor was linked to a corresponding BRFSS variable and classified as either direct (derived from an explicit survey response across one or more answer choices) or proxy (constructed from related but indirect indicators). Binary indicators were then derived using a specific coding key for each risk factor.

| Risk Factor | BRFSS Variable | BRFSS Variable Label | Coding Key | Description | Measurement Type |
|---|---|---|---|---|---|
| Vision Loss | c09q02 | …blind or [have serious difficulty seeing], even when wearing glasses? | 1 | Self-reported serious difficulty seeing, even with glasses (questionnaire) | Direct |
| High Cholesterol | cholch1 | Have had cholesterol checked and told it was high | 1 | Had cholesterol checked and told it was high (calculated) | Direct |
| Less Education | educat3a | Educational Attainment | 1 | Less than high school education (calculated) | Direct |
| Physical Inactivity | pacat | Physical activity categories | 3–4 | Sedentary or limited physical activity (calculated) | Direct |
| Smoking | rfsmok | Current Smoker | 2 | “Current Smoker”, derived from variable C11Q02, “Do you now smoke cigarettes every day, some days, or not at all?” (calculated) | Direct |
| Excessive Alcohol Use | rfdrhv2 | Heavy drinking | 2 | More than 14 drinks/week (men) or 7 drinks/week (women) (calculated) | Direct |
| High Blood Pressure | highbp | Doctor diagnosed high blood pressure | 1 | Doctor diagnosed high blood pressure (calculated) | Direct |
| Obesity | bmicat5 | BMI – 5 categories | 4–5 | BMI across five categories ranging from Underweight (1) to Extremely Obese (5) (calculated) | Direct |
| Diabetes | diabetes | (Ever told you had) diabetes? | 1 | Ever told they had diabetes (excluding gestational) | Direct |
| Hearing Loss | c09q01 | Are you deaf or do you have serious difficulty hearing? | 1 | Self-reported serious difficulty hearing | Direct |
| Depression | c07q09 | (Ever told you had) a depressive disorder (including depression, major depression, dysthymia, or minor depression)? | 1 | Ever told by a provider they had a depressive disorder | Direct |
| Infrequent Social Contact | c08q04 | Marital Status | 2–5 | Respondents who were either Divorced, Widowed, Separated, or Never Married | Proxy |

Notes:

  • All indicators were binarized based on BRFSS response categories, aligned to the coding keys shown above (e.g., response category "1" for “Yes” or "4–5" for the highest BMI brackets).
  • These harmonized variables were used in subsequent clustering, regression modeling, and population attributable fraction (PAF) analyses.
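As a minimal sketch of the binarization step, the coding keys can be held in a named list and applied column by column. The variable names and keys come from the table above; the data frame `brfss_toy` and the helper `binarize_risk()` are hypothetical names introduced for illustration, and the sketch assumes that refused or "don't know" responses have already been recoded to NA.

```r
# Coding keys from the risk factor table; listed values count as "exposed".
coding_keys <- list(
  c09q02   = 1,    # vision loss
  cholch1  = 1,    # high cholesterol
  educat3a = 1,    # less education
  pacat    = 3:4,  # physical inactivity
  rfsmok   = 2,    # current smoker
  rfdrhv2  = 2,    # heavy drinking
  highbp   = 1,    # high blood pressure
  bmicat5  = 4:5,  # obesity
  diabetes = 1,    # diabetes
  c09q01   = 1,    # hearing loss
  c07q09   = 1,    # depression
  c08q04   = 2:5   # infrequent social contact (marital status proxy)
)

# Hypothetical helper: 1 if the response is in the key, 0 otherwise; NA stays NA.
binarize_risk <- function(data, keys) {
  out <- lapply(names(keys), function(v) {
    ifelse(is.na(data[[v]]), NA_integer_, as.integer(data[[v]] %in% keys[[v]]))
  })
  names(out) <- names(keys)
  as.data.frame(out)
}

# Toy rows with invented values (not BRFSS records).
brfss_toy <- data.frame(
  bmicat5 = c(2, 4, 5, NA),
  c08q04  = c(1, 3, NA, 6),
  rfsmok  = c(1, 2, 2, 1)
)
risk_toy <- binarize_risk(brfss_toy, coding_keys[c("bmicat5", "c08q04", "rfsmok")])
```

Keeping missing responses as NA rather than coding them as unexposed leaves them available for a later imputation step.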

Ultimately, physical inactivity (“pacat”) was dropped from the analysis: non-response left no variation among the observed answers, so there was insufficient information on which to perform multiple imputation. The remaining eleven risk factors were retained.

Survey Design Variables

To account for BRFSS’s complex sampling design, the following elements were used in all weighted analyses:

  • Weight variable: llcpwt
  • Stratification: ststr

Prevalence point estimates were generated using svytotal() from the R survey package together with base R’s interaction() function. To address “lonely” or singleton primary sampling units (PSUs), which are sampling units that appear alone within a stratum and so leave no within-stratum variation from which to estimate variance, we set the options survey.lonely.psu = "adjust" and survey.adjust.domain.lonely = TRUE.

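With the "adjust" setting, the contribution of a single-PSU stratum to the variance is centered at the population mean rather than the stratum mean, a conservative correction that avoids underestimating variance (Lumley 2010); the second option applies the same adjustment to PSUs that become singletons within a domain, such as an age group. While lonely PSUs can complicate variance estimation, real-world survey data often include such singletons alongside extensive non-response. Since this analysis pipeline aims to extend data usability for smaller geographic units, adjusting for lonely PSUs was considered an appropriate and necessary step.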
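The design setup described above might look roughly like the following sketch. The weight and stratum names (llcpwt, ststr) and the two options come from the text; the toy data are invented, and `ids = ~1` (no cluster variable) is an assumption, since only the weight and stratification variables are named here.

```r
library(survey)

# Lonely-PSU handling named in the text.
options(survey.lonely.psu = "adjust")
options(survey.adjust.domain.lonely = TRUE)

# Toy data standing in for one completed dataset; all values are invented.
set.seed(1)
n <- 60
toy <- data.frame(
  ststr  = rep(1:6, each = 10),                                   # stratum
  llcpwt = runif(n, 50, 500),                                     # survey weight
  agegr3 = factor(sample(1:3, n, replace = TRUE), levels = 1:3),  # age group
  highbp = factor(sample(0:1, n, replace = TRUE), levels = 0:1)   # risk factor
)

# ids = ~1 is an assumption; the text specifies only weights and strata.
des <- svydesign(ids = ~1, strata = ~ststr, weights = ~llcpwt, data = toy)

# Weighted totals for every age group x risk status cell.
cell_totals <- svytotal(~interaction(agegr3, highbp), design = des)

# interaction() varies its first factor fastest, so filling a 3-row matrix
# column by column gives rows = age group and columns = highbp 0/1.
tot <- matrix(coef(cell_totals), nrow = 3)
prev_by_age <- tot[, 2] / rowSums(tot)
```

Dividing the exposed total by the age-group total yields the age-stratified prevalence proportion; in a multiply imputed analysis the same step would be repeated for each completed dataset before pooling.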

2.2 Multiple Imputation/Handling Missing Data

To address the missing data commonly encountered in survey data, we employed multiple imputation using the mice R package, generating 20 imputed datasets with random forest (rf) imputation, which has been demonstrated to perform well with mixed and categorical data types (Shah et al. 2014). We additionally ran sensitivity analyses using alternative imputation methods (pmm, cart, sample, and midastouch) and found that all approaches produced highly similar post-imputation distributions (mean absolute proportional differences <4%). This agreement suggests that the imputed distributions were not sensitive to the choice of imputation method, supporting the robustness of our findings.

Although imputed values minimally altered the marginal distributions of variables relative to the raw data, imputation was essential for preserving sample size and enabling stratified, survey-weighted analyses, as well as multivariate modeling (e.g., PCA on dichotomized risk factors). We therefore proceeded with the rf-imputed dataset for all downstream analyses to balance accuracy, interpretability, and completeness.
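A minimal sketch of the imputation step follows, assuming toy data in place of the BRFSS extract. The settings m = 20 and method = "rf" follow the text; the seed, the variable subset, and the simple averaging of proportions are illustrative choices, and the "rf" method requires a random forest backend package to be installed alongside mice.

```r
library(mice)

# Toy binary risk factors with values removed at random (invented data).
set.seed(2023)
n <- 200
toy <- data.frame(
  agegr3   = factor(sample(1:3, n, replace = TRUE)),
  highbp   = factor(rbinom(n, 1, 0.5)),
  diabetes = factor(rbinom(n, 1, 0.3))
)
toy$highbp[sample(n, 30)]   <- NA
toy$diabetes[sample(n, 20)] <- NA

# m = 20 and method = "rf" follow the text; the seed is arbitrary.
imp <- mice(toy, m = 20, method = "rf", seed = 1, printFlag = FALSE)

# Proportion exposed in each completed dataset, then the average across them.
completed   <- complete(imp, action = "all")
prev_each   <- sapply(completed, function(d) mean(d$diabetes == "1"))
prev_pooled <- mean(prev_each)

# Observed (complete-case) proportion, for a quick distribution check.
prev_observed <- mean(toy$diabetes == "1", na.rm = TRUE)
```

Comparing `prev_pooled` with `prev_observed` is one simple way to check that imputation has not shifted a marginal distribution; swapping the `method` argument for another mice method gives the kind of sensitivity comparison described above.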

2.3 Tetrachoric Matrices, PCA, and Communality Construction

To accurately estimate the Potential Impact Fraction (PIF) and its adjusted form, we accounted for correlation and shared variance among the retained eleven risk factors by deriving communalities for each risk factor. These communalities quantify the proportion of variance a given risk factor shares with the others and are essential for computing the adjusted PIF, which aims to avoid overestimating the preventable dementia burden by accounting for redundancy or overlap across predictors (Ma’u et al. 2025; Lee et al. 2022b).

Because all retained risk factors were binarized, we used tetrachoric correlation matrices to estimate the inter-item correlations appropriate for dichotomous variables in line with the procedures described in the Lancet 2024 supplementary materials (Livingston et al. 2024). These were calculated across each of the 20 imputed datasets.

Principal Components Analysis (PCA) was then conducted on each matrix to extract latent components underlying the shared variance structure. Across all imputations, four components had eigenvalues ≥ 1, aligning with the Kaiser criterion. Accordingly, we fixed the number of components to four in all datasets.
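The component-retention rule can be illustrated with base R alone. The sketch below uses an invented 4 x 4 correlation matrix; in the pipeline the input would instead be a tetrachoric matrix (for example the `rho` element returned by psych::tetrachoric()), and the names `R`, `n_comp`, and `communality` are chosen here for illustration.

```r
# Toy correlation matrix (invented values).
R <- matrix(c(1.0, 0.6, 0.0, 0.0,
              0.6, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("rf", 1:4), paste0("rf", 1:4)))

eig <- eigen(R, symmetric = TRUE)

# Retention rule as applied in the text: keep components with eigenvalue >= 1.
n_comp <- sum(eig$values >= 1)

# Unrotated loadings: eigenvectors scaled by the square root of their eigenvalues.
keep <- seq_len(n_comp)
loadings <- eig$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(eig$values[keep]), nrow = n_comp)
rownames(loadings) <- rownames(R)

# Communality: sum of squared loadings across the retained components.
communality <- rowSums(loadings^2)
```

For this toy matrix the eigenvalues are 1.6, 1.5, 0.5, and 0.4, so two components are retained and the communalities are 0.8, 0.8, 0.75, and 0.75.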

Because we used multiple imputation, simply pooling the communalities returned by the psych::principal() function for each imputed dataset could compound noise or obscure structure shared across imputations. We instead pooled the unrotated loadings themselves and calculated communalities from the squared pooled loadings. Pooling at the level of the loadings preserves the structural information in the underlying component solution, helps avoid inconsistencies introduced by imputation-driven variability, and closely mirrors the approach employed by Livingston et al. In principle, it should also be more stable and interpretable than an average of secondary summaries.

It’s worth noting that pooling loadings without sign alignment can underestimate communalities, potentially leading to overoptimistic adjustments to our Potential Impact Fractions (PIF) (i.e. we risk overestimating the adjusted impact of reducing that risk factor). Given our use of multiple imputed datasets to capture data uncertainty, we accepted this limitation as a reasonable trade-off in this context. To account for overly optimistic impact fractions, we introduced a 10% reduction in prevalence rather than the 15% reduction suggested in the existing literature.
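The effect of sign indeterminacy on pooled communalities can be seen in a small base R sketch. The loading matrices are invented, and `align_signs()` is a hypothetical helper shown only to illustrate what an alignment step could look like; it is not part of the pipeline described here, which pooled loadings without alignment.

```r
# Two hypothetical unrotated loading matrices (3 risk factors x 2 components),
# standing in for PCA results from two imputed datasets. The second has the
# sign of component 1 flipped, which PCA is free to do.
L1 <- matrix(c(0.8, 0.7, 0.2,
               0.1, 0.2, 0.9), ncol = 2)
L2 <- L1
L2[, 1] <- -L2[, 1]
loadings_list <- list(L1, L2)

# Average loadings element-wise, then sum squared loadings per risk factor.
pool_communality <- function(mats) {
  pooled <- Reduce(`+`, mats) / length(mats)
  rowSums(pooled^2)
}

# Hypothetical alignment: flip each component so that it points the same way
# as the matching component in a reference matrix.
align_signs <- function(mats, ref = mats[[1]]) {
  lapply(mats, function(m) {
    flip <- sign(colSums(m * ref))
    flip[flip == 0] <- 1
    sweep(m, 2, flip, `*`)
  })
}

h2_unaligned <- pool_communality(loadings_list)
h2_aligned   <- pool_communality(align_signs(loadings_list))
```

In this toy case the flipped component cancels out when averaged, so the unaligned communalities (0.01, 0.04, 0.81) fall below the aligned ones (0.65, 0.53, 0.85), which is the direction of bias noted above.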

To our knowledge, this approach has not been routinely applied in small-area or BRFSS-based modeling and offers a replicable pathway toward more accurate estimation of the impact of overlapping, modifiable risk factors.

2.4 PAF/PIF Construction and Adjustment

Population Attributable Fractions (PAFs) and Potential Impact Fractions (PIFs) were calculated to estimate the proportion of dementia cases attributable to modifiable risk factors and the impact of reducing their prevalence, respectively. PAFs quantify the fraction of cases attributable to a risk factor based on its prevalence (P) and relative risk (RR):

\[PAF=\frac{P(RR-1)}{P(RR-1)+1}\]
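As a worked illustration of the PAF formula, the following sketch uses made-up inputs (P = 0.5, RR = 1.5) rather than estimates from this analysis; the function name `paf()` is chosen here for illustration.

```r
# PAF for a single binary risk factor.
# p  = prevalence (proportion exposed), rr = relative risk of dementia.
paf <- function(p, rr) {
  stopifnot(all(p >= 0 & p <= 1), all(rr > 0))
  p * (rr - 1) / (p * (rr - 1) + 1)
}

# Illustrative inputs only: 0.5 * 0.5 / (0.5 * 0.5 + 1) = 0.25 / 1.25 = 0.2
paf_example <- paf(p = 0.5, rr = 1.5)
```

With these inputs, 20% of cases would be attributed to the risk factor.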

PIFs estimate the proportional reduction in cases if the risk factor prevalence is reduced from \(P\) to a target prevalence \(P'\):

\[PIF=\frac{(P-P')(RR-1)}{P(RR-1)+1}\]
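The PIF can be sketched in the same way. The inputs below are illustrative, the function name `pif()` is hypothetical, and treating the reduction as relative (so that a 10% reduction gives a target prevalence of 0.9 times the current prevalence) is an assumption about how the reduction is applied.

```r
# PIF for a single binary risk factor under a proportional prevalence reduction.
# p = current prevalence, rr = relative risk, reduction = relative reduction.
pif <- function(p, rr, reduction = 0.10) {
  p_target <- p * (1 - reduction)   # assumption: reduction is relative to p
  (p - p_target) * (rr - 1) / (p * (rr - 1) + 1)
}

# Illustrative inputs only: (0.05 * 0.5) / 1.25 = 0.02
pif_example <- pif(p = 0.5, rr = 1.5)

# Consistency check: eliminating the risk factor entirely (reduction = 1)
# reproduces the PAF for the same inputs, here 0.2.
pif_full <- pif(p = 0.5, rr = 1.5, reduction = 1)
```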

To account for the overlapping risk factors and avoid overestimating combined effects, PIFs were adjusted using communalities derived from principal component analysis, representing shared variance among variables. The adjusted PIF (AdjPIF) for each risk factor was calculated as:

\[AdjPIF=(1-communality)\times PIF\]

A conservative target reduction of 10% (rather than 15%) in prevalence was applied to all risk factors to mitigate overestimation bias. Physical inactivity was excluded from analysis due to data limitations.
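A short sketch of the communality adjustment, using three hypothetical risk factors with invented PIFs and communalities (none of these values come from the analysis):

```r
# Adjusted PIF: down-weight each PIF by the variance it shares with the
# other risk factors.
adj_pif <- function(pif, communality) (1 - communality) * pif

# Hypothetical inputs for three risk factors.
toy <- data.frame(
  risk_factor = c("A", "B", "C"),
  pif         = c(0.020, 0.015, 0.010),
  communality = c(0.75, 0.50, 0.20)
)
toy$adj_pif <- adj_pif(toy$pif, toy$communality)
```

A risk factor with a high communality, such as "A" here, keeps only a small share of its unadjusted PIF (0.005 of 0.020), which is why an underestimated communality leads to an overestimated adjusted impact.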

2.5 Calculate Potential Impact on Dementia Cases

The total weighted adjusted PIF across all risk factors was computed assuming multiplicative effects to estimate the overall proportional reduction in dementia cases:

\[\text{Total Weighted PIF}=1-\prod_i(1-AdjPIF_i)\]

This estimate was applied to the baseline number of dementia cases among adults aged 65 and older in Cameron County (n = 8,800 in 2020) to calculate the number of cases potentially preventable by reducing risk factor prevalence:

\[\text{Cases Prevented}=\text{Number of Dementia Cases}\times\text{Total Weighted PIF}\]

This method provides a conservative estimate of the public health impact of modifiable risk factor reduction in this population.
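The final two steps can be sketched as follows. The adjusted PIFs are hypothetical; only the baseline of 8,800 cases and the reported 352 preventable cases come from this paper, and the implied total PIF is a back-calculation from those two figures rather than a separately reported result.

```r
# Combine adjusted PIFs multiplicatively.
total_pif <- function(adj_pif) 1 - prod(1 - adj_pif)

# Hypothetical adjusted PIFs (not the values from this analysis).
adj_toy   <- c(0.005, 0.0075, 0.008)
total_toy <- total_pif(adj_toy)

# Baseline from the text: dementia cases among adults aged 65+ in 2020.
baseline_cases      <- 8800
cases_prevented_toy <- baseline_cases * total_toy

# Back-calculation from the reported result: 352 prevented cases out of 8,800
# implies a total weighted PIF of 352 / 8800 = 0.04.
implied_total_pif <- 352 / baseline_cases
```

Because the product form accounts for overlap between fractions, the combined value is always slightly smaller than the simple sum of the adjusted PIFs.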

References:

Lee, Mark, Eric Whitsel, Christy Avery, Timothy M. Hughes, Michael E. Griswold, Sanaz Sedaghat, Rebecca F. Gottesman, Thomas H. Mosley, Gerardo Heiss, and Pamela L. Lutsey. 2022a. “Variation in Population Attributable Fraction of Dementia Associated With Potentially Modifiable Risk Factors by Race and Ethnicity in the US.” JAMA Network Open 5 (7): e2219672. https://doi.org/10.1001/jamanetworkopen.2022.19672.
———. 2022b. “Variation in Population Attributable Fraction of Dementia Associated With Potentially Modifiable Risk Factors by Race and Ethnicity in the US.” JAMA Network Open 5 (7): e2219672. https://doi.org/10.1001/jamanetworkopen.2022.19672.
Livingston, Gill. n.d. “The Lancet 2024 Supplementary Appendix.” https://www.thelancet.com/cms/10.1016/S0140-6736(24)01296-0/attachment/95b06bf4-f411-4c87-b960-00b474cdd26f/mmc1.pdf.
Livingston, Gill, Jonathan Huntley, Kathy Y Liu, Sergi G Costafreda, Geir Selbæk, Suvarna Alladi, David Ames, et al. 2024. “Dementia Prevention, Intervention, and Care: 2024 Report of the Lancet Standing Commission.” The Lancet 404 (10452): 572–628. https://doi.org/10.1016/s0140-6736(24)01296-0.
Lumley, Thomas. 2010. Complex Surveys: A Guide to Analysis Using R. 1st ed. Wiley. https://doi.org/10.1002/9780470580066.
Ma’u, Etuini, Naaheed Mukadam, Gill Livingston, Sebastian Walsh, Susanne Röhr, Carol Brayne, Gary Cheung, and Sarah Cullum. 2025. “Estimating the Impact of Risk Factor Reduction on Dementia Prevalence in New Zealand.” Alzheimer’s & Dementia 21 (7). https://doi.org/10.1002/alz.70440.
Maestre, Gladys. 1999. “Strategic Alliances in Neuroscience,” August. https://research.ebsco.com/c/y5wonk/search/details/bh53z2awff?db=a9h.