Final project data 110

Author

Ayan Elmi

Introduction

My data set contains information about US county’s ability to access supermarkets, supercenters, grocery stores, or other sources of healthy and affordable food. The name of the data set is food_acces.csv. The organization that collected the data is the United States Department of Agriculture’s Economic Research Service.

The data set additionally has information on measures of how individuals and neighborhoods are able to access food are based on the following indicators:
Accessibility to sources of healthy food, as measured by distance to a store or by the number of stores in an area. - Individual-level resources that may affect accessibility, such as family income or vehicle availability. - Neighborhood-level indicators of resources, such as the average income of the neighborhood and the availability of public transportation.

My data set has 3142 observations and 13 variables.

Background on data and why I choose it:

According to (World Bank Group, 2024), food security is defined when all people, at all times, have physical and economic access to sufficient safe and nutritious food that meets their dietary needs and food preferences for an active and healthy life. The reason why I choose this data is because food security is a real issue that affects millions people around the world and this data set observes, number of stores, family income, state, population vehicle transport all which affect food security which I found to be impactful.

World Bank Group. (2024). What is food security. In World Bank. https://www.worldbank.org/en/topic/agriculture/brief/food-security-update/what-is-food-security

Methodology:

The USDA’s (United States Department of Agriculture) methodology on their website is: They first collected geographic data from 2010 census and other sources not listed. Next, they used distance to stores, the number of stores and availability of transport to measure for accessibility. Futhermore, they aggregated the individual and community factors such as vehicle access and income. Lastly, they cleaned the data and standardized the values for accuracy.

(Food Access Research Atlas - About the Atlas | Economic Research Service, n.d.)

Variables used:

state county population low_access_numbers_children_1_mile low_access_numbers_people_1_mile low_access_numbers_seniors_1_mile low_access_numbers_low_income_people_1_mile vehicle_access_1_mile

Research question:

“How does the percentage of people with low access to supermarkets (within 1 mile) vary across U.S. Virginia, Maryland and Pensslyvania states?”

Loading library

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggfortify)
library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Setting working directory

setwd("~/Desktop/Data 110")
food_access<-read_csv("food_access (1).csv")

Checking the head and structure of the dataset

str(food_access)

spc_tbl_ [3,142 × 25] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ County                                       : chr [1:3142] "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
 $ Population                                   : num [1:3142] 54571 182265 27457 22915 57322 ...
 $ State                                        : chr [1:3142] "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ Housing Data.Residing in Group Quarters      : num [1:3142] 455 2307 3193 2224 489 ...
 $ Housing Data.Total Housing Units             : num [1:3142] 20221 73180 9820 7953 21578 ...
 $ Vehicle Access.1 Mile                        : num [1:3142] 834 1653 545 312 752 ...
 $ Vehicle Access.1/2 Mile                      : num [1:3142] 1045 2178 742 441 822 ...
 $ Vehicle Access.10 Miles                      : num [1:3142] 222 32 201 0 0 80 23 0 12 1 ...
 $ Vehicle Access.20 Miles                      : num [1:3142] 0 0 0 0 0 0 0 0 0 0 ...
 $ Low Access Numbers.Children.1 Mile           : num [1:3142] 9973 30633 3701 4198 12575 ...
 $ Low Access Numbers.Children.1/2 Mile         : num [1:3142] 13281 38278 4943 4824 13583 ...
 $ Low Access Numbers.Children.10 Miles         : num [1:3142] 1199 516 791 90 0 ...
 $ Low Access Numbers.Children.20 Miles         : num [1:3142] 0 0 0 0 0 0 0 0 0 0 ...
 $ Low Access Numbers.Low Income People.1 Mile  : num [1:3142] 12067 38848 9290 6480 18188 ...
 $ Low Access Numbers.Low Income People.1/2 Mile: num [1:3142] 15518 48117 11901 8349 19806 ...
 $ Low Access Numbers.Low Income People.10 Miles: num [1:3142] 2307 846 2440 102 0 ...
 $ Low Access Numbers.Low Income People.20 Miles: num [1:3142] 0 0 0 0 0 0 0 0 0 0 ...
 $ Low Access Numbers.People.1 Mile             : num [1:3142] 37424 132442 19007 17560 50848 ...
 $ Low Access Numbers.People.1/2 Mile           : num [1:3142] 49497 165616 23762 20989 54933 ...
 $ Low Access Numbers.People.10 Miles           : num [1:3142] 5119 2308 4643 365 0 ...
 $ Low Access Numbers.People.20 Miles           : num [1:3142] 0 0 0 0 0 0 0 0 0 0 ...
 $ Low Access Numbers.Seniors.1 Mile            : num [1:3142] 4393 21828 2537 2262 7114 ...
 $ Low Access Numbers.Seniors.1/2 Mile          : num [1:3142] 5935 27241 3348 2630 7810 ...
 $ Low Access Numbers.Seniors.10 Miles          : num [1:3142] 707 390 629 72 0 488 259 0 144 12 ...
 $ Low Access Numbers.Seniors.20 Miles          : num [1:3142] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   County = col_character(),
  ..   Population = col_double(),
  ..   State = col_character(),
  ..   `Housing Data.Residing in Group Quarters` = col_double(),
  ..   `Housing Data.Total Housing Units` = col_double(),
  ..   `Vehicle Access.1 Mile` = col_double(),
  ..   `Vehicle Access.1/2 Mile` = col_double(),
  ..   `Vehicle Access.10 Miles` = col_double(),
  ..   `Vehicle Access.20 Miles` = col_double(),
  ..   `Low Access Numbers.Children.1 Mile` = col_double(),
  ..   `Low Access Numbers.Children.1/2 Mile` = col_double(),
  ..   `Low Access Numbers.Children.10 Miles` = col_double(),
  ..   `Low Access Numbers.Children.20 Miles` = col_double(),
  ..   `Low Access Numbers.Low Income People.1 Mile` = col_double(),
  ..   `Low Access Numbers.Low Income People.1/2 Mile` = col_double(),
  ..   `Low Access Numbers.Low Income People.10 Miles` = col_double(),
  ..   `Low Access Numbers.Low Income People.20 Miles` = col_double(),
  ..   `Low Access Numbers.People.1 Mile` = col_double(),
  ..   `Low Access Numbers.People.1/2 Mile` = col_double(),
  ..   `Low Access Numbers.People.10 Miles` = col_double(),
  ..   `Low Access Numbers.People.20 Miles` = col_double(),
  ..   `Low Access Numbers.Seniors.1 Mile` = col_double(),
  ..   `Low Access Numbers.Seniors.1/2 Mile` = col_double(),
  ..   `Low Access Numbers.Seniors.10 Miles` = col_double(),
  ..   `Low Access Numbers.Seniors.20 Miles` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

head(food_access)

# A tibble: 6 × 25
  County         Population State  Housing Data.Residin…¹ Housing Data.Total H…²
  <chr>               <dbl> <chr>                   <dbl>                  <dbl>
1 Autauga County      54571 Alaba…                    455                  20221
2 Baldwin County     182265 Alaba…                   2307                  73180
3 Barbour County      27457 Alaba…                   3193                   9820
4 Bibb County         22915 Alaba…                   2224                   7953
5 Blount County       57322 Alaba…                    489                  21578
6 Bullock County      10914 Alaba…                   1690                   3745
# ℹ abbreviated names: ¹`Housing Data.Residing in Group Quarters`,
#   ²`Housing Data.Total Housing Units`
# ℹ 20 more variables: `Vehicle Access.1 Mile` <dbl>,
#   `Vehicle Access.1/2 Mile` <dbl>, `Vehicle Access.10 Miles` <dbl>,
#   `Vehicle Access.20 Miles` <dbl>,
#   `Low Access Numbers.Children.1 Mile` <dbl>,
#   `Low Access Numbers.Children.1/2 Mile` <dbl>, …

Cleaning the data

names(food_access) <- gsub("[(). \\-]", "_", names(food_access)) # replace ., (), space, with dash
names(food_access) <- gsub("_$", "", names(food_access))  # remove trailing underscore
names(food_access) <- tolower(names(food_access))         # lowercase

head(food_access) #verify

# A tibble: 6 × 25
  county         population state  housing_data_residin…¹ housing_data_total_h…²
  <chr>               <dbl> <chr>                   <dbl>                  <dbl>
1 Autauga County      54571 Alaba…                    455                  20221
2 Baldwin County     182265 Alaba…                   2307                  73180
3 Barbour County      27457 Alaba…                   3193                   9820
4 Bibb County         22915 Alaba…                   2224                   7953
5 Blount County       57322 Alaba…                    489                  21578
6 Bullock County      10914 Alaba…                   1690                   3745
# ℹ abbreviated names: ¹housing_data_residing_in_group_quarters,
#   ²housing_data_total_housing_units
# ℹ 20 more variables: vehicle_access_1_mile <dbl>,
#   `vehicle_access_1/2_mile` <dbl>, vehicle_access_10_miles <dbl>,
#   vehicle_access_20_miles <dbl>, low_access_numbers_children_1_mile <dbl>,
#   `low_access_numbers_children_1/2_mile` <dbl>,
#   low_access_numbers_children_10_miles <dbl>, …

Checking for Na’s

colSums(is.na(food_access))

                                       county 
                                            0 
                                   population 
                                            0 
                                        state 
                                            0 
      housing_data_residing_in_group_quarters 
                                            0 
             housing_data_total_housing_units 
                                            0 
                        vehicle_access_1_mile 
                                            0 
                      vehicle_access_1/2_mile 
                                            0 
                      vehicle_access_10_miles 
                                            0 
                      vehicle_access_20_miles 
                                            0 
           low_access_numbers_children_1_mile 
                                            0 
         low_access_numbers_children_1/2_mile 
                                            0 
         low_access_numbers_children_10_miles 
                                            0 
         low_access_numbers_children_20_miles 
                                            0 
  low_access_numbers_low_income_people_1_mile 
                                            0 
low_access_numbers_low_income_people_1/2_mile 
                                            0 
low_access_numbers_low_income_people_10_miles 
                                            0 
low_access_numbers_low_income_people_20_miles 
                                            0 
             low_access_numbers_people_1_mile 
                                            0 
           low_access_numbers_people_1/2_mile 
                                            0 
           low_access_numbers_people_10_miles 
                                            0 
           low_access_numbers_people_20_miles 
                                            0 
            low_access_numbers_seniors_1_mile 
                                            0 
          low_access_numbers_seniors_1/2_mile 
                                            0 
          low_access_numbers_seniors_10_miles 
                                            0 
          low_access_numbers_seniors_20_miles 
                                            0

Checking how many states are in the data set

unique(food_access$state)

 [1] "Alabama"              "Alaska"               "Arizona"             
 [4] "Arkansas"             "California"           "Colorado"            
 [7] "Connecticut"          "Delaware"             "District of Columbia"
[10] "Florida"              "Georgia"              "Hawaii"              
[13] "Idaho"                "Illinois"             "Indiana"             
[16] "Iowa"                 "Kansas"               "Kentucky"            
[19] "Louisiana"            "Maine"                "Maryland"            
[22] "Massachusetts"        "Michigan"             "Minnesota"           
[25] "Mississippi"          "Missouri"             "Montana"             
[28] "Nebraska"             "Nevada"               "New Hampshire"       
[31] "New Jersey"           "New Mexico"           "New York"            
[34] "North Carolina"       "North Dakota"         "Ohio"                
[37] "Oklahoma"             "Oregon"               "Pennsylvania"        
[40] "Rhode Island"         "South Carolina"       "South Dakota"        
[43] "Tennessee"            "Texas"                "Utah"                
[46] "Vermont"              "Virginia"             "Washington"          
[49] "West Virginia"        "Wisconsin"            "Wyoming"

Selecting possible predictors for the model

food_access_2<-food_access |>
  select (state,county, population, low_access_numbers_children_1_mile, low_access_numbers_people_1_mile, low_access_numbers_seniors_1_mile, low_access_numbers_low_income_people_1_mile,vehicle_access_1_mile)
head(food_access_2)

# A tibble: 6 × 8
  state   county        population low_access_numbers_c…¹ low_access_numbers_p…²
  <chr>   <chr>              <dbl>                  <dbl>                  <dbl>
1 Alabama Autauga Coun…      54571                   9973                  37424
2 Alabama Baldwin Coun…     182265                  30633                 132442
3 Alabama Barbour Coun…      27457                   3701                  19007
4 Alabama Bibb County        22915                   4198                  17560
5 Alabama Blount County      57322                  12575                  50848
6 Alabama Bullock Coun…      10914                   1872                   8709
# ℹ abbreviated names: ¹low_access_numbers_children_1_mile,
#   ²low_access_numbers_people_1_mile
# ℹ 3 more variables: low_access_numbers_seniors_1_mile <dbl>,
#   low_access_numbers_low_income_people_1_mile <dbl>,
#   vehicle_access_1_mile <dbl>

Reason for the predictors chosen

In order to explore my potential predictors, I used a correlation heatmap of all numeric variables. The heatmap displayed a very strong correlation between, low_access_numbers_children_1_mile, low_access_numbers_seniors_1_mile and low_access_numbers_people_1_2_mile and since they are the same variable calculated at different distances and age groups it would be redundant and show co linearity they were not good predictors for my final model. The predictors were selected because they had the highest correlation to low access to supermarket (1 mile) and were exploring different aspects such as population demographics, transport and economic disadvantages. Based on my exploration, population, vehicle_access_1_mile, low_access_numbers_low_income_people_1_mile were the best fitted predictors for low_access_numbers_people_1_mile.

Multiple Regression Model

model<- lm(  low_access_numbers_people_1_mile  ~ population + vehicle_access_1_mile+ low_access_numbers_low_income_people_1_mile  , data = food_access_2)
autoplot(model, 1:4,nrow=2,ncol=2) ##Got this from correlation scatter plots and regressions tutorial, to see the diagnostic plots

Warning: `fortify(<lm>)` was deprecated in ggplot2 3.6.0.
ℹ Please use `broom::augment(<lm>)` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.

Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggfortify package.
  Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.

Summary of model

summary(model)


Call:
lm(formula = low_access_numbers_people_1_mile ~ population + 
    vehicle_access_1_mile + low_access_numbers_low_income_people_1_mile, 
    data = food_access_2)

Residuals:
    Min      1Q  Median      3Q     Max 
-198059   -5256   -1704     849  174853 

Coefficients:
                                             Estimate Std. Error t value
(Intercept)                                 1.680e+03  4.440e+02   3.784
population                                  4.624e-02  1.630e-03  28.373
vehicle_access_1_mile                       1.687e+01  7.153e-01  23.592
low_access_numbers_low_income_people_1_mile 1.937e+00  4.618e-02  41.953
                                            Pr(>|t|)    
(Intercept)                                 0.000157 ***
population                                   < 2e-16 ***
vehicle_access_1_mile                        < 2e-16 ***
low_access_numbers_low_income_people_1_mile  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20590 on 3138 degrees of freedom
Multiple R-squared:  0.899, Adjusted R-squared:  0.8989 
F-statistic:  9314 on 3 and 3138 DF,  p-value: < 2.2e-16

Linear regression analysis

This regression model predicts for the number of people in a county that are living 1 mile away from a supermarket based on the predictors population size,vehicle availability, and the number of low-income residents that have limited food access.

Model equation

The model equation for this multiple linear regression is as follows: y= a+b1x1+b2x2+ b3x3

where: a= intercept y= low_access_numbers_people_1_mile b1= effect of population b2= effect of vehicle access(1 mile) b3= effect of low-income population with low food access (1 mile)

Fitted equation low_access_numbers_people_1_mile= 1680+ 0.046(population)+16.87(vehichle acces (1 mile))+ 1.94 (low-income population with low food access)

Interpretation This means that: The number of people with low food access increase by about 0.046 for each additionally person in a county this a small effect per person but meaningful overall. The number of people with low food access increases as people without vehicle access increases. Low income population has a moderate effect on people with low food access, number of people with low food access increases as the low income population increases.

P-value and adjusted R^2 value

The three predictors were all very statistically significant with p value= < 2e-16. Thus, showing strong evidence that vehicle access, population and income are strongly associated with number of people with low food access.

The Adjusted R-squared is 0.8989 for this linear regression model. This means that the model explains 89.89% of the variation in food access across the country. The model overall is statistically significant at p value =< 2.2e-16.

Diagnostic plots

Residuals vs fitted plot : the blue line is slightly bent downwards, the residuals are centered around 0 however the wider spread of residuals are around the counties with large fitted values suggesting Heteroscedasticity in extreme high fitted residual counties.

Normal Q-Q plot : Majority of the points on the graph follow the diagonal line, meaning the residuals are approximately normal, but there are a few outliers on both tails (such as observations 205, 611,and 2681) that deviate from normality.

Scale-Location plot : The blue line upwards shows an increase in variance and the spread of residuals increases with fitted values, indicating that there is mild heteroscedasticity.

Cook’s Distance : some of the few points that are influential and could affect the model results are (205,611, and 2631) indicating counties that strongly affect the model but dont invalidate the entire model.

Source of information:

https://rpubs.com/rsaidi/950425

Final visualization 1:

food_access_filtered <-food_access_2 |>
  filter(population>= 10000) |>
  filter(state %in% c("Virginia", "Maryland", "Pennsylvania")) |>
  select(county, state, population, vehicle_access_1_mile,low_access_numbers_people_1_mile, low_access_numbers_low_income_people_1_mile)      # I choose 10,000 since I didnt want small counties skewing results
food_access_filtered

# A tibble: 204 × 6
   county          state population vehicle_access_1_mile low_access_numbers_p…¹
   <chr>           <chr>      <dbl>                 <dbl>                  <dbl>
 1 Allegany County Mary…      75087                  1051                  46218
 2 Anne Arundel C… Mary…     537656                  2276                 209639
 3 Baltimore Coun… Mary…     805029                  4094                 228432
 4 Baltimore city  Mary…     620961                  2867                  28080
 5 Calvert County  Mary…      88737                   583                  71158
 6 Caroline County Mary…      33066                   456                  25269
 7 Carroll County  Mary…     167134                  1320                 108830
 8 Cecil County    Mary…     101108                  1033                  73410
 9 Charles County  Mary…     146551                   850                  94730
10 Dorchester Cou… Mary…      32618                   543                  22745
# ℹ 194 more rows
# ℹ abbreviated name: ¹low_access_numbers_people_1_mile
# ℹ 1 more variable: low_access_numbers_low_income_people_1_mile <dbl>

Making percent of low access

food_access_filtered<-food_access_filtered |>
  mutate(
    percent_low_access= (low_access_numbers_low_income_people_1_mile/population)*100) # I choose percents since its gives a fairer comparison
food_access_filtered

# A tibble: 204 × 7
   county          state population vehicle_access_1_mile low_access_numbers_p…¹
   <chr>           <chr>      <dbl>                 <dbl>                  <dbl>
 1 Allegany County Mary…      75087                  1051                  46218
 2 Anne Arundel C… Mary…     537656                  2276                 209639
 3 Baltimore Coun… Mary…     805029                  4094                 228432
 4 Baltimore city  Mary…     620961                  2867                  28080
 5 Calvert County  Mary…      88737                   583                  71158
 6 Caroline County Mary…      33066                   456                  25269
 7 Carroll County  Mary…     167134                  1320                 108830
 8 Cecil County    Mary…     101108                  1033                  73410
 9 Charles County  Mary…     146551                   850                  94730
10 Dorchester Cou… Mary…      32618                   543                  22745
# ℹ 194 more rows
# ℹ abbreviated name: ¹low_access_numbers_people_1_mile
# ℹ 2 more variables: low_access_numbers_low_income_people_1_mile <dbl>,
#   percent_low_access <dbl>

Final visualization: static

food_access_plot <- food_access_filtered |>
  filter(
    vehicle_access_1_mile > 0,
    low_access_numbers_people_1_mile > 0
  )

options (scipen= 999) # to remove significant figures
final_plot_static <-ggplot (food_access_filtered, aes(x= vehicle_access_1_mile, y= low_access_numbers_people_1_mile, color = percent_low_access,
                  )) +

   geom_point(alpha=0.6) +
  scale_x_log10()+
  scale_y_log10()+
  labs(x= "Households without vehicle beyond 1 mile(log-scale)", y= "People with low food access 1 mile (log-scale per 100)",
       title= "    Vehicle access vs low food access Mid-atlantic (log scale)",
       caption= "Source:United States Department of Agriculture’s Economic Research Service.", color= "% Low Access", )+
   facet_wrap(~state)+
  scale_colour_viridis_c()+
theme_bw()+
  theme(plot.title = element_text(face="bold",size=14,hjust=0.5,family = "mono"),
        axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 9))

final_plot_static

Final visualization: using plotly

options (scipen= 999) # to remove significant figures
final_plot <-ggplot (food_access_filtered, aes(x= vehicle_access_1_mile, y= low_access_numbers_people_1_mile, color = percent_low_access,
                  text= paste( 
                    "County:",county, "\n",
                    "People with low food access:", low_access_numbers_people_1_mile, "\n",
                     "No vehicle household:",vehicle_access_1_mile, "\n",
                     "low access:",round(percent_low_access, 1)))) +

   geom_point(alpha=0.8) +
  scale_x_log10()+
  scale_y_log10()+
  labs(x= "Households without vehicle beyond 1 mile(log-scale)", y= "People with low food access 1 mile (log-scale per 100)",
       title= "Vehicle access vs low food access Mid-atlantic (log scale)",
       caption= "Source:United States Department of Agriculture’s Economic Research Service.", color= "% Low Access", )+
   facet_wrap(~state)+
  scale_colour_viridis_c()+ #https://ggplot2.tidyverse.org/reference/scale_viridis.html found the colour theme using the reference.
theme_bw()+
  theme(plot.title = element_text(face="bold",size=14,hjust=0.5,family = "mono"),
        axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 9),
        legend.title = element_blank()) 

final_plot<- ggplotly(final_plot, tooltip="text")

Warning in scale_x_log10(): log-10 transformation introduced infinite values.

Warning in scale_y_log10(): log-10 transformation introduced infinite values.

final_plot

Visualization write up

Grouping by state

state_summary <- food_access_filtered |>
  group_by(state) |>
  summarize(
    avg_pct_low_access = mean(percent_low_access, na.rm = TRUE)
  )
state_summary

# A tibble: 3 × 2
  state        avg_pct_low_access
  <chr>                     <dbl>
1 Maryland                   12.0
2 Pennsylvania               16.8
3 Virginia                   19.6

Final visualization 2

library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

Highcharts (www.highcharts.com) is a Highsoft software product which is

not free for commercial and Governmental use

highchart() |>
  hc_chart(type = "column") |>
  hc_title(text = "Average Percent of Low Food Access by State") |>
  hc_xAxis(
    categories = state_summary$state,
    title = list(text = "State")
  ) |>
  hc_yAxis(
    title = list(text = "Average Percent Low Food Access")
  ) |>
  hc_add_series(
    name = "Average %",
    data = as.numeric(state_summary$avg_pct_low_access), #had to convert to numeric for highcharter.
    colorByPoint = TRUE) |>
   hc_colors(c("#4C72B0", "#DD8452", "#55A868")) |># added colour 
  hc_tooltip(pointFormat = "{point.y:.1f}%") |>
  hc_caption(text = "Source: USDA Economic Research Service") |>
  hc_add_theme(hc_theme_538())

Final visualization 2:static

ggplot(state_summary, aes(x = state, y = avg_pct_low_access, fill = state)) +
  geom_col(width = 0.7) +
  labs(
    title = "Average Percent of Low Food Access by State",
    x = "State",
    y = "Average Percent of Population with Low Food Access",
    fill = "State",
    caption = "Source: USDA Economic Research Service"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_bw() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right"
  )

Making a column with all the different food low access groups

demo_long <- food_access_2 |>
  select(
    state,
    county,
    vehicle_access_1_mile,
    low_access_numbers_children_1_mile,
    low_access_numbers_seniors_1_mile,
    low_access_numbers_low_income_people_1_mile
  ) |>
  pivot_longer(
    cols = starts_with("low_access_numbers"),
    names_to = "group",
    values_to = "low_access_count"
  )
demo_long

# A tibble: 9,426 × 5
   state   county         vehicle_access_1_mile group           low_access_count
   <chr>   <chr>                          <dbl> <chr>                      <dbl>
 1 Alabama Autauga County                   834 low_access_num…             9973
 2 Alabama Autauga County                   834 low_access_num…             4393
 3 Alabama Autauga County                   834 low_access_num…            12067
 4 Alabama Baldwin County                  1653 low_access_num…            30633
 5 Alabama Baldwin County                  1653 low_access_num…            21828
 6 Alabama Baldwin County                  1653 low_access_num…            38848
 7 Alabama Barbour County                   545 low_access_num…             3701
 8 Alabama Barbour County                   545 low_access_num…             2537
 9 Alabama Barbour County                   545 low_access_num…             9290
10 Alabama Bibb County                      312 low_access_num…             4198
# ℹ 9,416 more rows

Visualization 3:

** I used J Amaya’s tutorial**

highchart() |>
  hc_add_series(
    data = demo_long,
    type = "bubble",  
    hcaes(
      x = vehicle_access_1_mile,
      y = low_access_count,
      group = group,
      size = low_access_count
    )
  ) |>
  hc_title(text = "Vehicle Access and Low Food Access by Demographic Group (1 Mile)") |>
  hc_xAxis(title = list(text = "Households without vehicles beyond 1 mile")) |>
  hc_yAxis(title = list(text = "Population with low food access (1 mile)")) |>
  hc_colors(c("#8CE4FF", "#F08787", "#5D866C")) |>
  hc_tooltip(
    shared = TRUE,
    pointFormat = paste(
      "Group: {point.group}<br>",
      "Vehicle access: {point.x}<br>",
      "Low food access: {point.y}"
    )
  ) |>
  hc_add_theme(hc_theme_538())

Visualization write up’s

or tablaeu

##Conclusion