The Data

For this final homework assignment, I’ll be combining the three datasets I’ll be utilizing in my final project.

  1. Missoula Food Retailers. Since I created this dataset with this analysis in mind, this one will be nice and easy to work with from the start.
  2. Population. I retrieved population data for each zip code in Missoula from the U.S. Census Bureau. Since there are only five zip codes in Missoula, this is actually a very simple table.
  3. Survey responses from the 2021 Rapid Community Food Assessment, conducted by the Missoula City-County Food Policy Advisory Board.

I’ll be using the first two datasets to calculate the density of local food retailers (farm stands/stores and farmers markets). The third dataset will tell me the percentage of people who primarily source their food directly from local farms and I’ll be combining the three datasets to see if those two variables are correlated.

# set wd
setwd("~/Documents/R Projects/Data")

# read data
foodRetailers <- read_excel("Missoula Food System.xlsx")

# view
foodRetailers
## # A tibble: 26 × 6
##    Category         Name          Neighborhood    Ward Address          Duration
##    <chr>            <chr>         <chr>          <dbl> <chr>            <chr>   
##  1 farm stand/store Clark Fork O… <NA>              NA 542 Tower St, M… summer  
##  2 farm stand/store Frank's Litt… <NA>              NA 203 N Curtis St… summer  
##  3 farm stand/store Oxbow F2M     <NA>              NA 6215 Haugan Dr,… year-ro…
##  4 farm stand/store Turner Farms  <NA>              NA 3515 S 3rd St W… year-ro…
##  5 farmers market   Clark Fork M… <NA>              NA 101 Carousel Dr… summer  
##  6 farmers market   Missoula Far… Heart of Miss…    NA 534 N Higgins A… summer  
##  7 farmers market   Missoula Val… <NA>              NA 2901 Brooks St,… winter  
##  8 farmers market   Orchard Home… <NA>              NA 2537 S 3rd St W… summer  
##  9 farmers market   Target Range… <NA>              NA 4095 South Ave … summer  
## 10 grocery store    Albertsons    Heart of Miss…     1 1003 E Broadway… year-ro…
## # … with 16 more rows

I won’t be working with all of the variables in the dataset so I’ll start by creating a subset with just the variables I’m interested in.

# create subset as datatable
foodRetailersDT <- foodRetailers %>%
  subset(select = c("Category", "Name", "Address", "Duration"))

# view
foodRetailersDT
## # A tibble: 26 × 4
##    Category         Name                          Address               Duration
##    <chr>            <chr>                         <chr>                 <chr>   
##  1 farm stand/store Clark Fork Organics           542 Tower St, Missou… summer  
##  2 farm stand/store Frank's Little Farm           203 N Curtis St, Mis… summer  
##  3 farm stand/store Oxbow F2M                     6215 Haugan Dr, Miss… year-ro…
##  4 farm stand/store Turner Farms                  3515 S 3rd St W, Mis… year-ro…
##  5 farmers market   Clark Fork Market             101 Carousel Dr, Mis… summer  
##  6 farmers market   Missoula Farmers Market       534 N Higgins Ave, M… summer  
##  7 farmers market   Missoula Valley Winter Market 2901 Brooks St, Miss… winter  
##  8 farmers market   Orchard Homes Farmers Market  2537 S 3rd St W, Mis… summer  
##  9 farmers market   Target Range Farmers Market   4095 South Ave W, Mi… summer  
## 10 grocery store    Albertsons                    1003 E Broadway St, … year-ro…
## # … with 16 more rows
# calculate number in summer
localSummer <- foodRetailers %>%
  separate(Address, c("Street Address", "Zip"), sep = -5) %>%
  filter(Category == "farm stand/store" | Category == "farmers market" & Duration == "summer") %>%
  count(Zip) %>%
  rename("Summer" = n)

# calculate number in winter
localWinter <- foodRetailers %>%
  separate(Address, c("Street Address", "Zip"), sep = -5) %>%
  filter(Category == "farm stand/store" | Category == "farmers market" & Duration == "winter") %>%
  count(Zip) %>%
  rename("Winter" = n)

# join
local <- full_join(localSummer, localWinter, by = "Zip")

# view
local
## # A tibble: 4 × 3
##   Zip   Summer Winter
##   <chr>  <int>  <int>
## 1 59801      1      2
## 2 59802      2     NA
## 3 59803      1      1
## 4 59804      4      2

Now I can read the population data and join the two datasets together. This will allow me to calculate the density per 1,000 people, which is the standard measure of grocery store density.

# set wd
setwd("~/Documents/R Projects/Data")

# read data
zipPop <- read_excel("Decennial 2010 Pop by Zip.xlsx", sheet = "Data") %>%
  rename("59801" = "ZCTA5 59801",
         "59802" = "ZCTA5 59802",
         "59803" = "ZCTA5 59803",
         "59804" = "ZCTA5 59804",
         "59808" = "ZCTA5 59808") %>%
 pivot_longer(
    cols = c(`59801`, `59802`, `59803`, `59804`, `59808`),
    names_to = "Zip",
    values_to = "Total"
  ) %>%
  rename("Population" = "Total") %>%
  subset(select = c("Zip", "Population"))

# join local food retailer data with population data
zipDT <- full_join(local, zipPop, by = "Zip")

# calculate density per 1000 people for each zip code
zipDensityDT <- zipDT %>%
  mutate("Summer Density" = round(Summer/(Population/1000), digits = 2),
         "Winter Density" = round(Winter/(Population/1000), digits = 2))

# view
zipDensityDT
## # A tibble: 5 × 6
##   Zip   Summer Winter Population `Summer Density` `Winter Density`
##   <chr>  <int>  <int>      <dbl>            <dbl>            <dbl>
## 1 59801      1      2      30940             0.03             0.06
## 2 59802      2     NA      18407             0.11            NA   
## 3 59803      1      1      16203             0.06             0.06
## 4 59804      4      2       7533             0.53             0.27
## 5 59808     NA     NA      17904            NA               NA
# set wd
setwd("~/Documents/R Projects/Data/Rapid CFA data")

# read data
CFA <- read_excel("FPAB Community Food Assessment Survey Data.xlsx", sheet = "Raw") %>%
  rename("Zip" = "Zip Code",
         "Identify As" = 14,
         "Primary Source in Summer" = 31,
         "Primary Source in Winter" = 39) %>%
  select(c("Zip",
           "Identify As",
           "Primary Source in Summer",
           "Primary Source in Winter")) %>%
  filter(`Identify As` == "Consumer", `Zip` == 59801 | `Zip` == 59802 | `Zip` == 59803 | `Zip` == 59804 | `Zip` == 59808) %>%
  replace_na(list("Primary Source in Summer" = "Other", "Primary Source in Winter" = "Other")) %>%
  select(c("Zip", "Primary Source in Summer", "Primary Source in Winter"))

# use table to get counts
summerSource <- CFA %>%
  group_by(`Zip`) %>% # group by zip code
  select("Primary Source in Summer") %>% # select specific column
  table() %>%
  as.data.frame.matrix(row.names = FALSE) # coerce back to dataframe

# create vector of zip codes
Zip <- c("59801", "59802", "59803", "59804", "59808")

# bind zip code vector to front of dataframe
summerSource <- cbind(Zip, summerSource)

# mutate to get percentage of people in each zip code who primarily buy from local farmers in the summer
summerSource %>%
  mutate("Total" = `Direct from local farmers` + `Other`) %>%
  mutate("Percent Local" = round(`Direct from local farmers`/`Total` * 100)) %>%
  select(c("Zip", "Percent Local")) # use select to effectively remove unnecessary columns
##     Zip Percent Local
## 1 59801            64
## 2 59802            69
## 3 59803            56
## 4 59804            83
## 5 59808            48
# join density and percent data
seasonFull <- full_join(zipDensityDT, summerSource, by = "Zip")

# create subset
summerDT <- seasonFull %>%
  select(c("Zip", "Summer Density", "Direct from local farmers")) %>%
  rename("Density" = "Summer Density",
         "Percent" = "Direct from local farmers") %>%
  replace_na(list("Density" = 0))

# view
summerDT
## # A tibble: 5 × 3
##   Zip   Density Percent
##   <chr>   <dbl>   <int>
## 1 59801    0.03      84
## 2 59802    0.11      74
## 3 59803    0.06      19
## 4 59804    0.53      19
## 5 59808    0         14

Analysis

Now that I have everything in one place, I can use the cor() function to see if the two variables are correlated.

cor(summerDT$Density, summerDT$Percent)
## [1] -0.2956464

I can also get a sense of the range of each variable.

summarize(summerDT, label = min(Density))
## # A tibble: 1 × 1
##   label
##   <dbl>
## 1     0
summarize(summerDT, label = max(Density))
## # A tibble: 1 × 1
##   label
##   <dbl>
## 1  0.53
summarize(summerDT, label = min(Percent))
## # A tibble: 1 × 1
##   label
##   <int>
## 1    14
summarize(summerDT, label = max(Percent))
## # A tibble: 1 × 1
##   label
##   <int>
## 1    84

For my final project, I’ll play around with visualizing the above and discussing the implications.