Project 2

Author

SMukrabine

Introduction:

Food Access in the US

Background to My Topic and Dataset

For this project, I decided to research food access across the U.S. in depth. Food access matters because it affects people's physical health, their ability to buy groceries, and the quality of life of families. I am interested in this subject because, as a college student who came to the U.S., I see that people in the United States live in very different places, and that food stores and access to healthy food can differ greatly from place to place. Food access is especially important for low-income communities and immigrants because it shapes the way we live our lives, and I wanted to learn more about how location affects the quality of everyday life.

The dataset I used is the USDA Food Access Research data, which has county-level data for each state. Its variables include state, county, and population, along with housing and vehicle-access measures and counts of residents (children, seniors, low-income people, and all people) with low access to a food store at distances of 1/2, 1, 10, and 20 miles. These variables are either categorical (such as state and county) or quantitative (such as population or the number of low-access children). The original dataset contained long variable names and mixed-case state names, so I converted the state names to lowercase and trimmed extra spaces from them so that they match the state names used by the map, and I worked only with the variables I needed. I picked this dataset to find out which states have the largest populations and how many children in them have low access to food stores. I wanted to see whether high-population states struggle with food access for children.
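The long, mixed-case column names are why later code has to wrap names such as `Low Access Numbers.Children.1 Mile` in backticks. A minimal sketch of one way to lowercase column names and replace spaces, dots, and slashes with underscores, assuming dplyr 1.0 or later (for `rename_with()`); the tibble and its values are made up, and only the two column names come from the USDA file. `clean_names` is a hypothetical helper, not part of the original code.

```r
library(dplyr)

# Made-up rows; only the column names come from the USDA file
toy <- tibble(
  State = c(" Alabama", "Alabama "),
  `Low Access Numbers.Children.1 Mile` = c(100, 250)
)

# Hypothetical helper: lowercase, then turn runs of spaces, dots, or slashes into "_"
clean_names <- function(x) gsub("[ ./]+", "_", tolower(x))

toy_clean <- toy |>
  rename_with(clean_names) |>                # "State" becomes "state", etc.
  mutate(state = tolower(trimws(state)))     # lowercase, trimmed state values

names(toy_clean)
```

With these names, the y variable in the plot could be written as `low_access_numbers_children_1_mile` without backticks; the trade-off is that the names no longer match the USDA documentation word for word.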

Load the libraries and set the working directory

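library(tidyverse)  # readr, dplyr, tidyr, ggplot2
library(leaflet)    # interactive map
library(maps)       # state outlines used by map("state")
library(sf)         # st_as_sf() to convert the outlines to simple features

# Machine-specific path: change it to the folder that contains food_access.csv
setwd("C:/Users/sajut/OneDrive/Desktop/DATA_110")
food_access <- read_csv("food_access.csv")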

Check the column names

colnames(food_access)

 [1] "County"                                       
 [2] "Population"                                   
 [3] "State"                                        
 [4] "Housing Data.Residing in Group Quarters"      
 [5] "Housing Data.Total Housing Units"             
 [6] "Vehicle Access.1 Mile"                        
 [7] "Vehicle Access.1/2 Mile"                      
 [8] "Vehicle Access.10 Miles"                      
 [9] "Vehicle Access.20 Miles"                      
[10] "Low Access Numbers.Children.1 Mile"           
[11] "Low Access Numbers.Children.1/2 Mile"         
[12] "Low Access Numbers.Children.10 Miles"         
[13] "Low Access Numbers.Children.20 Miles"         
[14] "Low Access Numbers.Low Income People.1 Mile"  
[15] "Low Access Numbers.Low Income People.1/2 Mile"
[16] "Low Access Numbers.Low Income People.10 Miles"
[17] "Low Access Numbers.Low Income People.20 Miles"
[18] "Low Access Numbers.People.1 Mile"             
[19] "Low Access Numbers.People.1/2 Mile"           
[20] "Low Access Numbers.People.10 Miles"           
[21] "Low Access Numbers.People.20 Miles"           
[22] "Low Access Numbers.Seniors.1 Mile"            
[23] "Low Access Numbers.Seniors.1/2 Mile"          
[24] "Low Access Numbers.Seniors.10 Miles"          
[25] "Low Access Numbers.Seniors.20 Miles"

Clean the data

food_clean <- food_access

# Lowercase, whitespace-trimmed copy of State, to match the region names used by map("state")
food_clean$state <- tolower(trimws(food_clean$State)) 
# https://stat.ethz.ch/R-manual/R-devel/library/base/html/trimws.html

names(food_clean)

 [1] "County"                                       
 [2] "Population"                                   
 [3] "State"                                        
 [4] "Housing Data.Residing in Group Quarters"      
 [5] "Housing Data.Total Housing Units"             
 [6] "Vehicle Access.1 Mile"                        
 [7] "Vehicle Access.1/2 Mile"                      
 [8] "Vehicle Access.10 Miles"                      
 [9] "Vehicle Access.20 Miles"                      
[10] "Low Access Numbers.Children.1 Mile"           
[11] "Low Access Numbers.Children.1/2 Mile"         
[12] "Low Access Numbers.Children.10 Miles"         
[13] "Low Access Numbers.Children.20 Miles"         
[14] "Low Access Numbers.Low Income People.1 Mile"  
[15] "Low Access Numbers.Low Income People.1/2 Mile"
[16] "Low Access Numbers.Low Income People.10 Miles"
[17] "Low Access Numbers.Low Income People.20 Miles"
[18] "Low Access Numbers.People.1 Mile"             
[19] "Low Access Numbers.People.1/2 Mile"           
[20] "Low Access Numbers.People.10 Miles"           
[21] "Low Access Numbers.People.20 Miles"           
[22] "Low Access Numbers.Seniors.1 Mile"            
[23] "Low Access Numbers.Seniors.1/2 Mile"          
[24] "Low Access Numbers.Seniors.10 Miles"          
[25] "Low Access Numbers.Seniors.20 Miles"          
[26] "state"

head(food_clean)

# A tibble: 6 × 26
  County         Population State  Housing Data.Residin…¹ Housing Data.Total H…²
  <chr>               <dbl> <chr>                   <dbl>                  <dbl>
1 Autauga County      54571 Alaba…                    455                  20221
2 Baldwin County     182265 Alaba…                   2307                  73180
3 Barbour County      27457 Alaba…                   3193                   9820
4 Bibb County         22915 Alaba…                   2224                   7953
5 Blount County       57322 Alaba…                    489                  21578
6 Bullock County      10914 Alaba…                   1690                   3745
# ℹ abbreviated names: ¹`Housing Data.Residing in Group Quarters`,
#   ²`Housing Data.Total Housing Units`
# ℹ 21 more variables: `Vehicle Access.1 Mile` <dbl>,
#   `Vehicle Access.1/2 Mile` <dbl>, `Vehicle Access.10 Miles` <dbl>,
#   `Vehicle Access.20 Miles` <dbl>,
#   `Low Access Numbers.Children.1 Mile` <dbl>,
#   `Low Access Numbers.Children.1/2 Mile` <dbl>, …


I chose the three states with the largest total population for the plot

top_states <- food_clean %>%
  group_by(state) %>%
  summarise(total_pop = sum(Population, na.rm = TRUE)) %>%
  arrange(desc(total_pop)) %>%
  slice_head(n = 3) %>%    #https://dplyr.tidyverse.org/reference/slice.html use for top 3 state selection
  pull(state)

food_top <- food_clean %>%
  filter(state %in% top_states)
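Because the data are county-level, each state's rank comes from the sum of its counties' populations, not from any single county. A small sketch on made-up states and populations shows what the selection pipeline above returns; `slice_max(total_pop, n = 3)` would be a near-equivalent alternative to `arrange()` plus `slice_head()`, except that it keeps ties by default.

```r
library(dplyr)

# Made-up county rows for four hypothetical states "a" to "d"
toy <- tibble(
  state = c("a", "a", "b", "c", "c", "d"),
  Population = c(10, 20, 50, 5, 5, 40)
)

top3 <- toy %>%
  group_by(state) %>%
  summarise(total_pop = sum(Population, na.rm = TRUE)) %>%  # a = 30, b = 50, c = 10, d = 40
  arrange(desc(total_pop)) %>%
  slice_head(n = 3) %>%
  pull(state)

top3
```

State "a" makes the top three only because its two counties add up to 30; neither county alone is larger than state "d".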

Create visualization

My main visualization compares county population with the number of low-access children (children living more than 1 mile from a food store) in the three most populous states in the dataset. These faceted scatterplots show one point per county, with a separate panel for each state. I used three distinct colors from the viridis palette, the built-in theme_bw() theme, and a clear title. The plot showed me that some states have large communities but also large numbers of children who live far from food stores. That pattern also suggests that population alone does not fully account for food access problems. Looking more closely, I found that even in the most populous states, food stores are still not easily accessible to many children. With more time, I would also look at income and at rural versus urban areas, since I think both are relevant to food access.

ggplot(food_top, aes(x = Population, y = `Low Access Numbers.Children.1 Mile`, color = State)) +
  # I used ChatGPT after getting an error, and it recommended backticks for the y variable (its name contains spaces)
  geom_point(alpha = 0.5) +   # one semi-transparent point per county
  facet_wrap(~State) +        # one panel per state
  scale_color_viridis_d(option = "D") +
  labs(
    title = "Population vs Low-Access Children (1 Mile) for Top States",
    x = "Population",
    y = "Low-Access Children (1 Mile)",
    caption = "Source: USDA"
  ) +
  theme_bw()
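The observation that population alone does not fully account for food access can also be checked numerically. A minimal sketch on made-up county rows (the states "X" and "Y" and all values are invented; only the column names come from the USDA file): it computes, per state, the correlation between population and low-access children, and the average ratio of low-access children to total population. That ratio divides a child count by the whole population, so it is a rough rate rather than the share of children.

```r
library(dplyr)

# Made-up county rows for two hypothetical states
toy <- tibble(
  State = c("X", "X", "X", "Y", "Y", "Y"),
  Population = c(1000, 2000, 4000, 1000, 2000, 4000),
  `Low Access Numbers.Children.1 Mile` = c(100, 200, 400, 300, 100, 50)
)

by_state <- toy %>%
  # Rough rate: low-access children divided by the county's total population
  mutate(rate = `Low Access Numbers.Children.1 Mile` / Population) %>%
  group_by(State) %>%
  summarise(
    r = cor(Population, `Low Access Numbers.Children.1 Mile`),  # per-state correlation
    mean_rate = mean(rate)
  )

by_state
```

In state X the child count rises exactly with population (r = 1), while in state Y it falls as population grows, even though both states have identical county populations. Running the same summary on `food_top` would give these numbers for the three states in the plot.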

Create the map (with more mutate steps)

My map has markers for all 50 states, and the data are divided into two interactive layers:

State markers – One marker per state, placed at the state's center, with the state name, population, and number of children with low access to food stores. Clicking a marker opens a pop-up with these figures, and hovering over it shows the state name.

State polygons – Each state is shaded according to its total number of children with low access to food stores. Darker colors show higher numbers, which makes it easy to compare states visually. Only the contiguous states are shaded, because the maps package's "state" outlines do not include Alaska or Hawaii.
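The pop-ups in both layers are HTML strings built with `paste0()`, with `<br>` tags as line breaks. A minimal sketch of a helper that builds the same kind of string and also adds thousands separators with base R's `formatC()`, so large counts are easier to read; `make_popup` is a hypothetical name, not part of the original code, and the numbers are made up.

```r
# Hypothetical helper: builds one HTML pop-up string per element (vectorised)
make_popup <- function(state, population, low_access_children) {
  paste0(
    "State: ", state,
    "<br>Population: ", formatC(population, format = "d", big.mark = ","),
    "<br>Low-Access Children: ", formatC(low_access_children, format = "d", big.mark = ",")
  )
}

# Made-up numbers, for illustration only
make_popup("alabama", 1234567, 89012)
```

Because `paste0()` and `formatC()` are vectorised, the same helper could be passed whole columns, for example `popup = ~make_popup(state, population, low_access_children)` inside `addMarkers()`.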

This map showed me that some states have far more low-access children than others, and browsing the pop-ups showed which states have the greatest need. A small challenge I encountered was matching the state names in my dataset to the state shapes in the map; I fixed this by converting all the names to lowercase.
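The name-matching problem is easy to reproduce. A minimal sketch with made-up state totals (the stray trailing space in "New York " is also invented) shows that a join on the raw title-case names matches nothing, while lowercasing and trimming first lines the two tables up, mirroring the `tolower(trimws(...))` step used on the real data.

```r
library(dplyr)

# Made-up state totals, named as in the CSV (title case, one stray space)
totals <- tibble(
  State = c("Alabama", "New York "),
  low_access_children = c(5, 7)
)

# Region names as map("state") returns them: all lowercase
shapes <- tibble(ID = c("alabama", "new york"))

# Joining on the raw names finds no matches, so every value is NA
raw_join <- left_join(shapes, totals, by = c("ID" = "State"))

# Lowercasing and trimming the names first makes the keys match
totals_clean <- totals %>%
  mutate(state = tolower(trimws(State))) %>%
  select(state, low_access_children)

fixed_join <- left_join(shapes, totals_clean, by = c("ID" = "state"))
```

A left join keeps every map shape even when no data match, which is why a name mismatch shows up as unshaded (NA) states rather than as an error.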

Although the map shows food access trends across the country clearly, there is room for improvement. For instance, county-level boundaries and a distinction between rural and urban counties could be useful. Overall, the map gives an informative picture of food access and points out the areas that may require the most intervention.

#https://cran.r-project.org/web/packages/maps/refman/maps.html#internal2
#https://stat.ethz.ch/R-manual/R-devel/library/base/html/sample.html
#https://github.com/rstudio/leaflet/blob/main/inst/examples/emptyData.R
#https://github.com/rstudio/leaflet/blob/main/inst/examples/highlight-polygons.R
#https://github.com/rstudio/leaflet/blob/main/inst/examples/icons.R
#I also used AI (OpenAI, TextGuard) because I kept getting errors for a long time; I then redid the mutate() steps with its help


states_map <- st_as_sf(map("state", plot = FALSE, fill = TRUE))

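# State centers from base R; state.center places Alaska and Hawaii
# just off the West Coast rather than at their true locations
state_coords <- data.frame(
  state = tolower(state.name),
  lat = state.center$y,
  long = state.center$x
)

# State totals summed from the USDA county rows
state_totals <- food_clean |>
  group_by(state) |>
  summarise(
    population = sum(Population, na.rm = TRUE),
    low_access_children = sum(`Low Access Numbers.Children.1 Mile`, na.rm = TRUE)
  )

# One row per state: center coordinates plus that state's totals
food_points <- state_coords |>
  left_join(state_totals, by = "state")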

states_map <- states_map |>
  mutate(ID = tolower(ID)) |>
  left_join(food_points, by = c("ID" = "state"))

mypal <- colorNumeric("YlOrRd", domain = states_map$low_access_children)

poly_popup <- paste0(
  states_map$ID, "<br>",
  "Population: ", states_map$population, "<br>",
  "Low-Access Children: ", states_map$low_access_children
)

leaflet() |>
  addProviderTiles("CartoDB.Positron") |>
  
  addMarkers(
    data = food_points,
    lng = ~long, lat = ~lat,
    popup = ~paste0("State: ", state,
                    "<br>Population: ", population,
                    "<br>Low-Access Children: ", low_access_children),
    label = ~state,
    clusterOptions = markerClusterOptions(),
    group = "Food Locations" ) |>
  
  addPolygons(
    data = states_map,
    fillColor = ~mypal(low_access_children),
    color = "#b2aeae",
    weight = 1,
    fillOpacity = 0.7,
    smoothFactor = 0.2,
    popup = poly_popup,
    label = ~ID,
    group = "Low-Access Children Score" ) |>
  
  addLegend(
    pal = mypal,
    values = states_map$low_access_children,
    position = "bottomright",
    title = "Low-Access Children" ) |>
  
  addLayersControl(
    overlayGroups = c("Food Locations", "Low-Access Children Score"),
    options = layersControlOptions(collapsed = FALSE)
  )

Warning: sf layer has inconsistent datum (+proj=longlat +ellps=clrk66 +no_defs).
Need '+proj=longlat +datum=WGS84'
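The warning shows that the `map("state")` outlines carry a longitude/latitude CRS on the Clarke 1866 ellipsoid (`+ellps=clrk66`) rather than the WGS84 datum leaflet expects. One way to address it, sketched here under the assumption that a reprojection is acceptable (at this national scale any shift should not be noticeable), is to transform the outlines to EPSG:4326 with sf's `st_transform()` right after `st_as_sf()`.

```r
library(sf)
library(maps)

# Same outlines as in the map code above
states_map <- st_as_sf(map("state", plot = FALSE, fill = TRUE))

# Reproject to WGS84 longitude/latitude (EPSG:4326), which leaflet expects
states_map <- st_transform(states_map, crs = 4326)

st_crs(states_map)$epsg
```

Applied right after `st_as_sf()`, this leaves the later `left_join()` and leaflet code unchanged.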