Food Access

Author

Chibogwu Onyeabo

Introduction

This data comes from the Economic Research Service of the U.S. Department of Agriculture (USDA) and describes, for each county, residents with limited food access. According to the USDA, low food access is determined by accessibility to sources of healthy food, individual factors that may affect accessibility, and neighborhood-level indicators. The variables include vehicle access, housing data, and the number of children, seniors, and low-income individuals who are considered to have low food access. These groups are divided by their distance to a supermarket: beyond a half mile, one mile, 10 miles, and 20 miles. Using this dataset, I plan to investigate the relationship between low food access and low-income populations.

#loading the libraries and the data
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(RColorBrewer)
foodaccess <- read_csv("food_access.csv")

Rows: 3142 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): County, State
dbl (23): Population, Housing Data.Residing in Group Quarters, Housing Data....

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(foodaccess) #brief glimpse of the dataset

# A tibble: 6 × 25
  County         Population State  Housing Data.Residin…¹ Housing Data.Total H…²
  <chr>               <dbl> <chr>                   <dbl>                  <dbl>
1 Autauga County      54571 Alaba…                    455                  20221
2 Baldwin County     182265 Alaba…                   2307                  73180
3 Barbour County      27457 Alaba…                   3193                   9820
4 Bibb County         22915 Alaba…                   2224                   7953
5 Blount County       57322 Alaba…                    489                  21578
6 Bullock County      10914 Alaba…                   1690                   3745
# ℹ abbreviated names: ¹`Housing Data.Residing in Group Quarters`,
#   ²`Housing Data.Total Housing Units`
# ℹ 20 more variables: `Vehicle Access.1 Mile` <dbl>,
#   `Vehicle Access.1/2 Mile` <dbl>, `Vehicle Access.10 Miles` <dbl>,
#   `Vehicle Access.20 Miles` <dbl>,
#   `Low Access Numbers.Children.1 Mile` <dbl>,
#   `Low Access Numbers.Children.1/2 Mile` <dbl>, …

names(foodaccess) #viewing the variables the dataset contains

 [1] "County"                                       
 [2] "Population"                                   
 [3] "State"                                        
 [4] "Housing Data.Residing in Group Quarters"      
 [5] "Housing Data.Total Housing Units"             
 [6] "Vehicle Access.1 Mile"                        
 [7] "Vehicle Access.1/2 Mile"                      
 [8] "Vehicle Access.10 Miles"                      
 [9] "Vehicle Access.20 Miles"                      
[10] "Low Access Numbers.Children.1 Mile"           
[11] "Low Access Numbers.Children.1/2 Mile"         
[12] "Low Access Numbers.Children.10 Miles"         
[13] "Low Access Numbers.Children.20 Miles"         
[14] "Low Access Numbers.Low Income People.1 Mile"  
[15] "Low Access Numbers.Low Income People.1/2 Mile"
[16] "Low Access Numbers.Low Income People.10 Miles"
[17] "Low Access Numbers.Low Income People.20 Miles"
[18] "Low Access Numbers.People.1 Mile"             
[19] "Low Access Numbers.People.1/2 Mile"           
[20] "Low Access Numbers.People.10 Miles"           
[21] "Low Access Numbers.People.20 Miles"           
[22] "Low Access Numbers.Seniors.1 Mile"            
[23] "Low Access Numbers.Seniors.1/2 Mile"          
[24] "Low Access Numbers.Seniors.10 Miles"          
[25] "Low Access Numbers.Seniors.20 Miles"

#sorting by the 10-mile low access count, largest first
#(column names with spaces need backticks; a quoted string inside desc() is treated as a constant)
tinyfoodaccess <- foodaccess |>
  arrange(desc(`Low Access Numbers.People.10 Miles`)) |>

  #eliminating unnecessary columns and only keeping relevant ones
  select(!("Housing Data.Residing in Group Quarters":"Low Access Numbers.Low Income People.1/2 Mile")) |> #keep low income (10) column
  select(!("Low Access Numbers.Low Income People.20 Miles":"Low Access Numbers.People.1/2 Mile")) |> #keep people (10) column
  select(!("Low Access Numbers.People.20 Miles":"Low Access Numbers.Seniors.20 Miles"))
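
The three range-based select() calls above depend on the column order in the CSV. A minimal sketch of an alternative, assuming the column names printed by names(foodaccess), is to pick the needed columns by name and rename them in the same call. The toy tibble below is made up for illustration and only stands in for foodaccess.

```r
library(dplyr)

# Hypothetical stand-in for foodaccess with a few of its columns (values are illustrative)
toy <- tibble(
  County = c("A County", "B County"),
  Population = c(1000, 2000),
  State = c("Alabama", "Ohio"),
  `Vehicle Access.1 Mile` = c(10, 20),
  `Low Access Numbers.Low Income People.10 Miles` = c(50, 0),
  `Low Access Numbers.People.10 Miles` = c(120, 0)
)

# select() can rename while selecting: new_name = old_name
tiny_toy <- toy |>
  select(
    County, Population, State,
    LowAccessLowIncome10Miles = `Low Access Numbers.Low Income People.10 Miles`,
    LowAccessPeople10Miles = `Low Access Numbers.People.10 Miles`
  ) |>
  # sort by the renamed column, largest first
  arrange(desc(LowAccessPeople10Miles))

names(tiny_toy)
```

Selecting by name would also make the later positional renaming with names(tinyfoodaccess)[4] and [5] unnecessary, since the columns arrive already renamed.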

Linear Regression Analysis

For my linear regression, I chose to compare two county-level rates, both at the 10-mile distance: the share of the population with low food access and the share of the population that is low income and has low food access. Low food access is my dependent variable and low income is the independent variable. I expect the two variables to have a strong relationship, since a low income can make it much harder to access food: people with low incomes are less likely to be able to afford food or to have a means of transportation to their nearest supermarket.

names(tinyfoodaccess)[4] <- "LowAccessLowIncome10Miles"
names(tinyfoodaccess)[5] <- "LowAccessPeople10Miles"

tinyfoodaccess <- tinyfoodaccess |>
  #new column for % low access, 10 miles per county population
  mutate(lowaccessrate10 = (LowAccessPeople10Miles / Population)*100) |>
  #new column for % low income, 10 miles
  mutate(lowincomerate10 = (LowAccessLowIncome10Miles / Population) * 100)

#round both rates to two decimal places
#rounding keeps the columns numeric, whereas format() would convert them to character;
#scientific notation only affects how numbers print, not the stored values
tinyfoodaccess <- tinyfoodaccess |>
  mutate(across(c(lowaccessrate10, lowincomerate10), \(x) round(x, digits = 2)))
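
As a check on the arithmetic, the two rates can be worked by hand for one county. The sketch below uses the Autauga County values that appear in this document's printed output (population 54,571; 5,119 low-access people and 2,307 low-income, low-access people beyond 10 miles). The variable names are chosen for illustration only.

```r
# Autauga County, Alabama (values copied from the printed output)
population <- 54571
low_access_people_10 <- 5119
low_access_low_income_10 <- 2307

# Percent of the county population in each group, rounded to two decimals
low_access_rate <- round(low_access_people_10 / population * 100, 2)
low_income_rate <- round(low_access_low_income_10 / population * 100, 2)

low_access_rate  # 9.38
low_income_rate  # 4.23
```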

Linear Model

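This is the resulting linear model, with low food access (beyond 10 miles) as the dependent variable and low income (beyond 10 miles) as the independent variable. As expected, there is a strong positive relationship between the two: each one-percentage-point increase in the low-income rate is associated with an increase of about 2.45 percentage points in the low-access rate. The p-value for the slope is below 2.2e-16, so the relationship is statistically significant. The adjusted R-squared is about 0.88, meaning the model explains about 88% of the variation in the low-access rate. Part of this strength is expected by construction, because the low-income count includes only people who also have low access and is therefore a subset of the low-access count.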

#create linear model and assign it to variable "foodlm"
foodlm <- lm(lowaccessrate10 ~ lowincomerate10, data = tinyfoodaccess)
summary(foodlm)


Call:
lm(formula = lowaccessrate10 ~ lowincomerate10, data = tinyfoodaccess)

Residuals:
    Min      1Q  Median      3Q     Max 
-64.635  -1.206  -1.090   0.391  61.693 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1.19682    0.13468   8.886   <2e-16 ***
lowincomerate10  2.45275    0.01603 153.045   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.534 on 3140 degrees of freedom
Multiple R-squared:  0.8818,    Adjusted R-squared:  0.8818 
F-statistic: 2.342e+04 on 1 and 3140 DF,  p-value: < 2.2e-16
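
To read the coefficients concretely, the fitted line can be evaluated by hand. The sketch below plugs the estimates from the summary above into the line equation for a hypothetical county whose low-income, low-access rate is 10%. The value 10 is an arbitrary illustration, not a county from the data.

```r
# Estimates copied from the summary(foodlm) output
intercept <- 1.19682
slope <- 2.45275

# Hypothetical low-income, low-access rate (percent of population)
lowincomerate10 <- 10

# Predicted low-access rate (percent) on the fitted line
predicted_lowaccessrate10 <- intercept + slope * lowincomerate10
predicted_lowaccessrate10  # 25.72432
```

With the fitted model object available, predict(foodlm, newdata = data.frame(lowincomerate10 = 10)) should give the same value, up to the rounding of the printed coefficients.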

Data Visualization

#group states by region to add third variable

Northeast <- c("Connecticut","Maine","Massachusetts","New Hampshire",
             "Rhode Island","Vermont","New Jersey","New York",
             "Pennsylvania")

Midwest <- c("Indiana","Illinois","Michigan","Ohio","Wisconsin",
             "Iowa","Kansas","Minnesota","Missouri","Nebraska",
             "North Dakota","South Dakota")
South <- c("Delaware","District of Columbia","Florida","Georgia",
            "Maryland","North Carolina","South Carolina","Virginia",
            "West Virginia","Alabama","Kentucky","Mississippi",
            "Tennessee","Arkansas","Louisiana","Oklahoma","Texas")
West <- c("Arizona","Colorado","Idaho","New Mexico","Montana",
            "Utah","Nevada","Wyoming","Alaska","California",
            "Hawaii","Oregon","Washington")
regionlist <- list(Northeast = Northeast, Midwest = Midwest, South = South, West = West)

tinyfoodaccess <- tinyfoodaccess |>
  mutate(Region = "x")

tinyfoodaccess$Region <- sapply(tinyfoodaccess$State, function(x) names(regionlist)[grepl(x, regionlist)])
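
The grepl() call treats each state name as a pattern and searches the text form of each region vector. That works for this list, but it relies on partial matching. A minimal sketch of an alternative, assuming State holds full state names spelled exactly as in the region vectors, is an exact-match lookup table. The shortened region list and the states vector below are illustrative stand-ins.

```r
# Shortened, illustrative version of regionlist
regions <- list(
  Northeast = c("Maine", "New York"),
  Midwest   = c("Kansas", "Ohio"),
  South     = c("Arkansas", "Virginia", "West Virginia"),
  West      = c("Washington", "Oregon")
)

# Named vector: names are states, values are regions
region_lookup <- setNames(
  rep(names(regions), lengths(regions)),
  unlist(regions, use.names = FALSE)
)

# Hypothetical State column; a name missing from the lookup becomes NA
states <- c("Kansas", "Arkansas", "West Virginia", "Puerto Rico")
unname(region_lookup[states])
# "Midwest" "South" "South" NA
```

Applied to the real data, the same idea would be a lookup built from regionlist and indexed by tinyfoodaccess$State, which always returns a character vector of the same length as the column.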

tibble(tinyfoodaccess)

# A tibble: 3,142 × 8
   County         Population State LowAccessLowIncome10…¹ LowAccessPeople10Miles
   <chr>               <dbl> <chr>                  <dbl>                  <dbl>
 1 Autauga County      54571 Alab…                   2307                   5119
 2 Baldwin County     182265 Alab…                    846                   2308
 3 Barbour County      27457 Alab…                   2440                   4643
 4 Bibb County         22915 Alab…                    102                    365
 5 Blount County       57322 Alab…                      0                      0
 6 Bullock County      10914 Alab…                   1267                   2586
 7 Butler County       20947 Alab…                    556                   1334
 8 Calhoun County     118572 Alab…                      0                      0
 9 Chambers Coun…      34215 Alab…                    292                    680
10 Cherokee Coun…      25989 Alab…                     34                     91
# ℹ 3,132 more rows
# ℹ abbreviated name: ¹LowAccessLowIncome10Miles
# ℹ 3 more variables: lowaccessrate10 <dbl>, lowincomerate10 <dbl>,
#   Region <chr>

#scatterplot of low income vs. low access
fa <- tinyfoodaccess |>
  ggplot(aes(lowaccessrate10, lowincomerate10, text = paste("State:", State, "\nCounty:", County))) +
  geom_point(aes(color = Region)) +
  labs(x = "% of Population with Low Food Access (10 Miles)",
       y = "% of Population with Low Income and Low Food Access (10 Miles)",
       title = "Low Food Access Data by County",
       caption = "Source: U.S. Department of Agriculture - Economic Research Service") +
  scale_color_brewer(palette = "PuRd") +
  theme_bw()

ggplotly(fa)
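
Because the model treats low food access as the response, one option is to put it on the y-axis and overlay the fitted line. A minimal sketch of that layout follows, using a small made-up data frame in place of tinyfoodaccess. The numbers are not from the dataset.

```r
library(ggplot2)

# Made-up rates for illustration only; not values from the dataset
toy <- data.frame(
  lowincomerate10 = c(0, 1, 2, 4, 8),
  lowaccessrate10 = c(1, 4, 6, 11, 21),
  Region = c("South", "West", "Midwest", "West", "Midwest")
)

p <- ggplot(toy, aes(x = lowincomerate10, y = lowaccessrate10)) +
  geom_point(aes(color = Region)) +
  # Least-squares line, the same kind of fit as lm(lowaccessrate10 ~ lowincomerate10)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey30") +
  labs(x = "% of Population with Low Income and Low Food Access",
       y = "% of Population with Low Food Access") +
  theme_bw()

p
```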

Source for region list code: https://stackoverflow.com/questions/46066974/add-column-to-label-u-s-states-by-their-u-s-census-region

Essay

This dataset contains information on populations with low food access in every county in the U.S., organized by residents' proximity to a supermarket, grocery store, or other source of healthy food. The counts are grouped by whether residents live beyond a half mile, one mile, ten miles, or twenty miles from a healthy food source. For my project, I chose to focus on the ten-mile level. Since I was comparing that variable with the low-income population count, I kept only the columns for the total low-access population per county and the low-income population per county, both at the ten-mile level. Initially, I wanted to compare the low-access population with the group housing data to find a connection, as those who live in group quarters almost always have food provided to them. However, because the group housing data is calculated by households instead of individuals, my linear regression analysis became a bit inconsistent, so I switched my focus to low income.

My visualization is a scatterplot comparing, for each county, the rate of low-income residents with low food access to the rate of all residents with low food access. To use as much data as possible, I didn't want to filter my data down to a few states, but I did want to include region as a third variable, which I did by adding a column that assigns each county a region based on its state. In my graph, I noticed that practically all the counties in the Northeast have very low rates of low food access, although the Northeast also has fewer counties in general. The Midwest and South were more spread out, but overall these variables have a positive linear relationship: counties with higher rates of low food access also tend to have higher rates of low-income residents with low food access.
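
The regional pattern described above could also be checked numerically with a grouped summary. The sketch below shows the idea on a made-up tibble. The values are illustrative and say nothing about the real regions.

```r
library(dplyr)

# Made-up county rates for illustration only
toy <- tibble(
  Region = c("Northeast", "Northeast", "South", "South", "West"),
  lowaccessrate10 = c(0, 0.5, 4, 10, 20)
)

# Number of counties and median low-access rate per region
region_summary <- toy |>
  group_by(Region) |>
  summarise(
    counties = n(),
    median_lowaccessrate10 = median(lowaccessrate10)
  )

region_summary
```

Running the same summary on tinyfoodaccess would show both how many counties each region contributes and how their typical low-access rates compare.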