Disparities in Food Access across United States Census Tracts
“Food Desert” by Shirley Cannon
Introduction
This project explores disparities in food access across U.S. census tracts using data from the USDA Food Access Research Atlas. The dataset contains 72,531 observations and 147 variables, and provides information on socioeconomic conditions, transportation access, and geographic characteristics. After I trimmed the original Excel file to the variables relevant to this project, 17 variables remained.
Research Question: How are socioeconomic and structural factors associated with low food access across communities?
The key predictors in this analysis are poverty rate, median family income, vehicle access, and urban classification. The analysis examines how each of these factors relates to food access, which is measured both as a continuous variable (the share of the population living far from supermarkets) and as a categorical indicator of food deserts.
I selected this topic because limited access to healthy food is associated with disparities in nutrition, chronic disease, and overall well-being. I believe that understanding these patterns at the community level can help inform public health interventions and policy decisions aimed at achieving food equity.
Background Research:
Limited access to healthy and affordable food remains an important public health issue in the United States. Areas with low food access, also known as “food deserts,” are communities where residents have difficulty accessing supermarkets or stores with nutritious food options due to distance, transportation barriers, or low income (U.S. Department of Agriculture [USDA], 2023). Research shows that transportation access plays a major role in food access disparities, especially in rural and low-income communities where residents may lack reliable access to vehicles or public transportation (Centers for Disease Control and Prevention [CDC], 2025; U.S. Hunger, 2023). In addition, food access challenges have been linked to increased risks of chronic conditions such as obesity and diabetes, as limited access to healthy food can lead to increased reliance on ultra-processed or less nutritious food options (Walker et al., 2010; Beaulac et al., 2009). According to the USDA, both income and vehicle access are key factors that influence whether households can consistently obtain healthy food (USDA, 2023). This highlights the importance of examining how poverty, transportation, and geographic location contribute to food access disparities across communities.
Variables
| Variable | Definition | Role |
| --- | --- | --- |
| CensusTract | Unique identifier for each census tract (small geographic unit used by the U.S. Census Bureau) | Identifier |
| State | State in which the census tract is located | Identifier |
| County | County in which the census tract is located | Identifier |
| Urban | Indicates whether the tract is classified as urban (1) or rural (0) | Predictor |
| PovertyRate | Percentage of the population living below the federal poverty line | Predictor |
| MedianFamilyIncome | Median family income within the census tract | Predictor |
| HUNVFlag | Indicates low vehicle access (tracts where many households do not have a vehicle and are far from a supermarket) | Predictor |
| LILATracts_1And10 | Indicates whether a tract is both low-income and has low access to supermarkets (food desert indicator) | Outcome |
| lapop1share | Proportion of the population living more than 1 mile from a supermarket | Alternative Outcome |
Load Libraries
# Load required libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load dataset
setwd("C:/Users/cathe/OneDrive/Desktop/Montgomery College Transition/2025-2026 MONTGOMERY COLLEGE TRANSITION/MC COURSES 25-26/Spring 2026/DATA 110/02. Projects/Project 3 - Final")
food_data <- read_csv("FoodAccessResearchAtlasData2019_original.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 72531 Columns: 147
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (114): CensusTract, State, County, MedianFamilyIncome, LAPOP1_10, LAPOP0...
dbl (33): Urban, Pop2010, OHU2010, GroupQuartersFlag, NUMGQTRS, PCTGQTRS, L...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 72531 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): CensusTract, State, County, MedianFamilyIncome, lapop1, lapop1share
dbl (11): Urban, Pop2010, OHU2010, GroupQuartersFlag, NUMGQTRS, PCTGQTRS, LI...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data Cleaning
Explore the data
# View structure of dataset
head(food_data_updated)
Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `low_access_share = as.numeric(low_access_share)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
CensusTract State County urban
Length:52047 Length:52047 Length:52047 0:17219
Class :character Class :character Class :character 1:34828
Mode :character Mode :character Mode :character
poverty_rate median_income low_vehicle_access food_desert
Min. : 0.0 Min. : 2499 0:39375 0:42837
1st Qu.: 6.0 1st Qu.: 54061 1:12672 1: 9210
Median :10.8 Median : 70455
Mean :13.7 Mean : 78216
3rd Qu.:18.4 3rd Qu.: 94375
Max. :99.5 Max. :250001
low_access_share
Min. : 0.00
1st Qu.: 17.86
Median : 55.04
Mean : 53.97
3rd Qu.: 93.78
Max. :100.00
To prepare the data for analysis, I first selected only the variables that were relevant to my research question and removed the unnecessary columns from the Excel file. I then renamed the variables to make them easier to understand and work with. Some variables were converted into the correct format: categorical variables became factors, and the low access variable was converted to a numeric value so it could be used in graphs and regression. Some values could not be converted to numbers, which created missing values. To keep the analysis accurate, I removed observations with missing values in either the outcome variable or the median income variable. This reduced the data from 72,531 to 52,047 census tracts. After cleaning, the data set was ready for exploratory and statistical analysis.
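The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in the project: the four rows are made up, and the "NULL" strings are an assumption standing in for whatever non-numeric entries caused the coercion warning.

```r
library(dplyr)

# Hypothetical raw rows for illustration; the real file has 72,531 tracts.
raw <- data.frame(
  CensusTract        = c("01001020100", "01001020200", "01001020300", "01001020400"),
  Urban              = c(1, 1, 0, 0),
  PovertyRate        = c(11.3, 17.9, 15.0, 2.8),
  MedianFamilyIncome = c("81250", "49000", "NULL", "75962"),
  HUNVFlag           = c(0, 1, 0, 0),
  LILATracts_1And10  = c(0, 1, 0, 0),
  lapop1share        = c("99.0", "57.2", "28.9", "NULL")
)

clean <- raw |>
  # Select and rename in one step (new_name = old_name).
  select(
    CensusTract,
    urban = Urban,
    poverty_rate = PovertyRate,
    median_income = MedianFamilyIncome,
    low_vehicle_access = HUNVFlag,
    food_desert = LILATracts_1And10,
    low_access_share = lapop1share
  ) |>
  mutate(
    # Non-numeric strings become NA; suppressWarnings() silences the coercion warning.
    across(c(median_income, low_access_share), \(x) suppressWarnings(as.numeric(x))),
    # Binary indicators are treated as categorical.
    across(c(urban, low_vehicle_access, food_desert), as.factor)
  ) |>
  # Drop tracts missing the outcome or median income.
  filter(!is.na(low_access_share), !is.na(median_income))

nrow(clean)
```

In this toy example only the first two tracts survive the filter, because the third is missing median income and the fourth is missing the outcome.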
Updated List of Variables after cleaning
| Variable | Definition | Role |
| --- | --- | --- |
| CensusTract | Unique identifier for each census tract, a small geographic unit used for statistical analysis by the U.S. Census Bureau | Identifier |
| State | U.S. state in which the census tract is located | Identifier |
| County | County in which the census tract is located | Identifier |
| urban | Indicator of whether the census tract is classified as urban (1) or rural (0) based on population density and land use | Predictor |
| poverty_rate | Percentage of individuals in the census tract living below the federal poverty line | Predictor |
| median_income | Median family income within the census tract, representing the midpoint of the family income distribution | Predictor |
| low_vehicle_access | Indicator of whether the census tract has low vehicle access, meaning a significant share of households do not have access to a vehicle and may face transportation barriers to reaching food retailers | Predictor |
| food_desert | Binary indicator of whether the census tract is classified as a low-income, low-access area (i.e., a “food desert”) based on distance to supermarkets and income thresholds | Secondary Outcome |
| low_access_share | Percentage (0 to 100) of the population in the census tract that lives more than one mile from the nearest supermarket, representing the level of geographic food access | Primary Outcome |
In this analysis, low_access_share is used as the primary outcome variable because it is a continuous measure of food access, while food_desert is included as a secondary categorical indicator.
Statistical Analysis
I plan to run a multiple linear regression. Before doing so, I will compute a correlation matrix to check for multicollinearity among the predictors.
library(corrplot)
Warning: package 'corrplot' was built under R version 4.5.3
Warning in ind1:ind2: numerical expression has 2 elements: only the first used
The correlation matrix shows that most of the relationships between the variables are weak. The strongest relationship is between poverty rate and median income (r = -0.66), a moderately strong negative relationship. This means that areas with higher poverty tend to have lower income, which is expected. This correlation is below the level (commonly cited as around |r| = 0.8) that is usually taken as a sign of problematic multicollinearity, so both predictors were kept in the initial model. However, the relationships between low food access and both poverty rate (r = -0.11) and median income (r = -0.02) are very weak. This is surprising, and it suggests that poverty and income on their own do not strongly explain differences in low food access across census tracts in this data set. Overall, the results show only a weak linear relationship between the main variables and low food access. This is examined further in the multiple linear regression.
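As a rough sketch of how a correlation check like this can be set up, the example below builds a matrix with base R's `cor()` on simulated data and flags highly correlated pairs. The simulated values, the variable relationships, and the 0.8 cutoff are all assumptions for illustration and do not reproduce the project's results.

```r
set.seed(110)

# Simulated stand-in for the numeric columns of clean_data (values are made up).
n <- 500
poverty_rate     <- runif(n, 0, 40)
median_income    <- 110000 - 1500 * poverty_rate + rnorm(n, sd = 15000)
low_access_share <- runif(n, 0, 100)
toy <- data.frame(poverty_rate, median_income, low_access_share)

# Pearson correlation matrix, ignoring rows with missing values.
cor_matrix <- cor(toy, use = "complete.obs")
round(cor_matrix, 2)

# Flag variable pairs whose |r| exceeds a rule-of-thumb cutoff (0.8 is an assumption).
high_pairs <- which(abs(cor_matrix) > 0.8 & upper.tri(cor_matrix), arr.ind = TRUE)
nrow(high_pairs)

# Optional plot, drawn only if the corrplot package is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_matrix, method = "number")
}
```

Restricting the search to the upper triangle avoids counting each pair twice and skips the diagonal, where every variable correlates perfectly with itself.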
Multiple Linear Regression
Model 1
model <- lm(low_access_share ~ poverty_rate + median_income + urban + low_vehicle_access, data = clean_data)
summary(model)
Call:
lm(formula = low_access_share ~ poverty_rate + median_income +
urban + low_vehicle_access, data = clean_data)
Residuals:
Min 1Q Median 3Q Max
-88.658 -25.990 0.538 16.765 89.460
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.675e+01 5.732e-01 151.352 < 2e-16 ***
poverty_rate -3.276e-01 1.723e-02 -19.016 < 2e-16 ***
median_income 1.805e-05 5.002e-06 3.607 0.000309 ***
urban1 -4.578e+01 2.858e-01 -160.190 < 2e-16 ***
low_vehicle_access1 3.821e+00 3.298e-01 11.587 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 29.62 on 52042 degrees of freedom
Multiple R-squared: 0.3489, Adjusted R-squared: 0.3489
F-statistic: 6972 on 4 and 52042 DF, p-value: < 2.2e-16
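Model 2
I performed backward elimination and removed median income. Although it is statistically significant in Model 1 (p = 0.0003), its effect size is very small: the coefficient of 1.805e-05 corresponds to about 0.18 percentage points of low food access per $10,000 of median family income.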
Call:
lm(formula = low_access_share ~ poverty_rate + urban + low_vehicle_access,
data = clean_data)
Residuals:
Min 1Q Median 3Q Max
-87.715 -25.959 0.543 16.691 91.562
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 88.55951 0.27821 318.32 <2e-16 ***
poverty_rate -0.36719 0.01329 -27.64 <2e-16 ***
urban1 -45.53610 0.27759 -164.04 <2e-16 ***
low_vehicle_access1 3.74428 0.32914 11.38 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 29.62 on 52043 degrees of freedom
Multiple R-squared: 0.3487, Adjusted R-squared: 0.3487
F-statistic: 9289 on 3 and 52043 DF, p-value: < 2.2e-16
Interpretation of Final Regression Model
The intercept is 88.56 (p < 2e-16), which represents the expected percentage of low food access for a rural tract with adequate vehicle access and a poverty rate of zero. While this value is statistically significant, it is not very meaningful in practice, because a poverty rate of zero sits at the extreme edge of the data (it is the minimum observed value).
For poverty rate, a 1 percentage point increase in poverty is associated, on average and holding the other predictors constant, with about a 0.37 percentage point decrease in low food access (β = -0.37, p < 2e-16). This relationship is statistically significant. However, the negative direction is unusual. Perhaps poverty on its own does not fully explain differences in food access.
For urban classification, urban areas have about 45.54 percentage points less low food access than rural areas, holding the other predictors constant (β = -45.54, p < 2e-16). This difference is statistically significant and is the largest coefficient in the model. This result indicates that urban areas generally have much better access to food than rural areas.
For low vehicle access, areas with limited access to vehicles have about 3.74 percentage points higher low food access (β = 3.74, p < 2e-16). This relationship is statistically significant and suggests that transportation barriers are an important factor influencing food access.
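To make these coefficients concrete, here is a small worked example that plugs the Model 2 estimates into the regression equation by hand. The helper `predict_share()` and the example tracts (10% poverty rate) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the Model 2 output.
b <- c(intercept = 88.55951, poverty_rate = -0.36719,
       urban1 = -45.53610, low_vehicle_access1 = 3.74428)

# Hypothetical helper: predicted low-access share (%) for one tract.
# urban and low_vehicle_access are 0/1 indicators.
predict_share <- function(poverty_rate, urban, low_vehicle_access) {
  unname(
    b["intercept"] +
      b["poverty_rate"] * poverty_rate +
      b["urban1"] * urban +
      b["low_vehicle_access1"] * low_vehicle_access
  )
}

urban_tract <- predict_share(10, urban = 1, low_vehicle_access = 0)  # about 39.35
rural_tract <- predict_share(10, urban = 0, low_vehicle_access = 0)  # about 84.89

rural_tract - urban_tract  # equals the urban coefficient in size, about 45.54
```

At the same poverty rate and vehicle access, the predicted gap between the rural and urban tract is exactly the size of the urban coefficient, which is what "holding the other predictors constant" means in practice.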
The adjusted R-squared for Model 1 (full model) is 0.3489 (34.89%), while the adjusted R-squared for Model 2 (reduced model) is 0.3487 (34.87%). Removing median income therefore changed the model’s explanatory power by only 0.0002.
Because the adjusted R-squared remained nearly the same, I selected Model 2 as the final model.
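The comparison between a full and a reduced model can be sketched as follows. This uses simulated data, so the numbers will not match the output reported here, and the data-generating values are assumptions. Note also that textbook backward elimination usually drops terms by p-value or AIC, whereas in this report median income was dropped on the basis of its small effect size.

```r
set.seed(110)

# Simulated stand-in for clean_data (all values are made up).
n <- 1000
toy <- data.frame(
  poverty_rate       = runif(n, 0, 40),
  urban              = factor(rbinom(n, 1, 0.65)),
  low_vehicle_access = factor(rbinom(n, 1, 0.25))
)
toy$median_income <- 110000 - 1500 * toy$poverty_rate + rnorm(n, sd = 15000)
# Outcome depends on poverty, urban status, and vehicle access, but not on income.
toy$low_access_share <- 85 - 0.3 * toy$poverty_rate -
  45 * (toy$urban == "1") + 4 * (toy$low_vehicle_access == "1") +
  rnorm(n, sd = 25)

full <- lm(low_access_share ~ poverty_rate + median_income + urban + low_vehicle_access,
           data = toy)

# update() refits the same model with median_income removed.
reduced <- update(full, . ~ . - median_income)

# Compare adjusted R-squared side by side.
adj_r2 <- c(full = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)
round(adj_r2, 4)

# Partial F-test: does median_income add explanatory power beyond the reduced model?
anova(reduced, full)
```

Using `update()` keeps the two models identical except for the dropped term, so any difference in adjusted R-squared can be attributed to that term alone.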
Diagnostic Plots
library(ggfortify)  # supplies the autoplot() method for lm objects
autoplot(model2, nrow = 2, ncol = 2)
Interpretation of Diagnostic Plots
Residuals vs Fitted: The residuals are mostly scattered around zero, but there is a slight curve in the smooth line. The clustered patterns are expected because of categorical predictors such as urban. Overall, linearity is mostly satisfied, with minor deviations.
Normal Q-Q: The points generally follow the reference line, indicating approximate normality. There are some deviations at the tails, suggesting minor non-normality, but no major issues.
Scale-Location: The spread of residuals is not perfectly constant and shows a slight trend. This suggests mild heteroscedasticity, but it is not severe.
Residuals vs Leverage: Most points have low leverage, with a few slightly higher ones. However, none appear highly influential, so no single observation strongly affects the model.
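The quantities behind these four plots can also be inspected numerically. The sketch below fits a small model on simulated data and extracts standardized residuals, leverage, and Cook's distance with base R. The data and the cutoffs (2p/n for leverage, 4/n for Cook's distance) are common rules of thumb used here as assumptions, not fixed thresholds.

```r
set.seed(110)

# Simulated data with one numeric and one categorical predictor (values are made up).
n <- 300
x <- runif(n, 0, 40)
g <- factor(rbinom(n, 1, 0.6))
y <- 80 - 0.4 * x - 40 * (g == "1") + rnorm(n, sd = 20)
fit <- lm(y ~ x + g)

# Collect the quantities that the diagnostic plots display.
diag_df <- data.frame(
  fitted    = fitted(fit),
  std_resid = rstandard(fit),      # Q-Q and Scale-Location plots
  leverage  = hatvalues(fit),      # Residuals vs Leverage plot
  cooks_d   = cooks.distance(fit)  # influence of each observation
)

# Rule-of-thumb flags (assumed cutoffs).
p <- length(coef(fit))
high_leverage <- diag_df$leverage > 2 * p / n
influential   <- diag_df$cooks_d > 4 / n
c(high_leverage = sum(high_leverage), influential = sum(influential))

# plot(fit) would draw the four standard diagnostic panels in base R.
```

Counting flagged observations gives a quick check on the visual impression that no single tract dominates the fit.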
Visualizations
Alluvial
I chose an alluvial plot to visualize how multiple factors, such as urban/rural classification and vehicle access, influence food access. This type of plot is useful for showing how observations flow across categories and helps reveal patterns that are not easily seen in simpler plots.
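An alluvial plot needs one row per combination of categories together with a count. The sketch below shows one way to prepare such a table and, if the ggalluvial package is installed, to draw it. The simulated data, the three food access levels, and their cut points (33 and 66) are hypothetical and may differ from the binning used for the plot in this report.

```r
set.seed(110)

# Simulated stand-in for clean_data (values are made up).
n <- 400
toy <- data.frame(
  urban = factor(rbinom(n, 1, 0.65), levels = c(0, 1), labels = c("Rural", "Urban")),
  low_vehicle_access = factor(rbinom(n, 1, 0.25), levels = c(0, 1),
                              labels = c("Adequate", "Low")),
  low_access_share = runif(n, 0, 100)
)

# Bin the continuous outcome into levels (cut points are an assumption).
toy$access_level <- cut(toy$low_access_share, breaks = c(0, 33, 66, 100),
                        labels = c("Low", "Medium", "High"), include.lowest = TRUE)

# Count tracts in each urban x vehicle x access combination.
flows <- as.data.frame(
  table(urban = toy$urban, vehicle = toy$low_vehicle_access, access = toy$access_level),
  responseName = "tracts"
)
head(flows)

# Draw the plot only if the required packages are available.
if (requireNamespace("ggalluvial", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(flows, aes(axis1 = urban, axis2 = vehicle, axis3 = access, y = tracts)) +
    geom_alluvium(aes(fill = urban), alpha = 0.7) +
    geom_stratum() +
    geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
    scale_x_discrete(limits = c("Urban/Rural", "Vehicle Access", "Food Access Issues")) +
    theme_minimal()
  print(p)
}
```

The width of each flow is driven by the `tracts` column, so wider bands correspond to combinations that contain more census tracts.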
I used a faceted density plot to better compare how low food access is distributed across urban/rural groups. This type of plot provides a clearer view of patterns than a scatterplot and allows me to compare urban and rural areas while also separating the data by vehicle access.
ggplot(clean_data, aes(x = low_access_share, fill = urban)) +
  geom_density(alpha = 0.5) +
  facet_wrap(
    ~ low_vehicle_access,
    labeller = labeller(
      low_vehicle_access = c("0" = "Adequate Vehicle Access", "1" = "Low Vehicle Access")
    )
  ) +
  scale_fill_manual(
    values = c("0" = "#AFCBFF", "1" = "#F4A6A6"),
    labels = c("Rural", "Urban")
  ) +
  labs(
    title = "Distribution of Low Food Access by Urban Status and Vehicle Access",
    x = "Low Food Access (%)",
    y = "Density",
    fill = "Urban Classification",
    caption = "This plot compares the distribution of low food access across urban and rural areas, separated by levels of vehicle access."
  ) +
  theme_minimal()
Essay
The alluvial plot shows how census tracts move across urban/rural classification, vehicle access, and levels of food access. The width of each flow represents the number of census tracts in each category. A clear pattern is that areas with low vehicle access tend to fall into higher levels of food access issues, while areas with better vehicle access are more likely to have fewer access issues. Additionally, rural areas appear to have a larger proportion of census tracts with high access issues compared to urban areas, suggesting that both transportation access and geographic location play important roles in food access disparities.
The faceted density plot shows how low food access is distributed across urban and rural areas, separated by vehicle access. In both panels, rural areas tend to have higher levels of low food access, as distributions are concentrated at higher percentages. These differences are more noticeable in areas with low vehicle access, where both urban and rural communities experience higher food access challenges. Overall, both visualizations highlight consistent patterns showing that transportation and location are key factors influencing food access.
One interesting pattern that stands out is that poverty does not appear to have a strong relationship with food access on its own, per the regression results, even though it is often assumed to be a major factor. Instead, structural factors such as transportation access and whether an area is urban or rural seem to play a more important role.
One limitation of this project is that some potential visualizations were difficult to display. For example, the initial scatterplots from exploratory data analysis were severely cluttered because of the large dataset and did not clearly show patterns. Most of them showed only weak relationships. I also considered making the alluvial plot interactive using plotly, but this was not compatible with the required format of this project (RPubs). Additionally, there may be other important variables, such as public transportation access or the number of nearby grocery stores, that were not included in the dataset but could have provided further insight. Overall, the visualizations suggest that food access disparities are complex and influenced more by structural factors than by income on its own.
References:
Beaulac, J., Kristjansson, E., & Cummins, S. (2009). A systematic review of food deserts, 1966–2007. Preventing Chronic Disease, 6(3).
Centers for Disease Control and Prevention (CDC). (2025). Food access and public health. https://www.cdc.gov/pcd/issues/2025/24_0458.htm
U.S. Department of Agriculture (USDA). (2023). Food access research atlas documentation. https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation
U.S. Hunger. (2023). Transportation and food insecurity. https://ushunger.org/blog/transportation-food-insecurity/
Walker, R. E., Keane, C. R., & Burke, J. G. (2010). Disparities and access to healthy food in the United States: A review of food deserts literature. Health & Place, 16(5), 876–884.
Visualization techniques for the faceted density plot were created with assistance from ChatGPT (OpenAI).