For this final homework assignment, I’ll be combining the three datasets I’ll be utilizing in my final project.
I’ll be using the first two datasets to calculate the density of local food retailers (farm stands/stores and farmers markets). The third dataset will tell me the percentage of people who primarily source their food directly from local farms and I’ll be combining the three datasets to see if those two variables are correlated.
# set wd
setwd("~/Documents/R Projects/Data")
# read data
foodRetailers <- read_excel("Missoula Food System.xlsx")
# view
foodRetailers
## # A tibble: 26 × 6
## Category Name Neighborhood Ward Address Duration
## <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 farm stand/store Clark Fork O… <NA> NA 542 Tower St, M… summer
## 2 farm stand/store Frank's Litt… <NA> NA 203 N Curtis St… summer
## 3 farm stand/store Oxbow F2M <NA> NA 6215 Haugan Dr,… year-ro…
## 4 farm stand/store Turner Farms <NA> NA 3515 S 3rd St W… year-ro…
## 5 farmers market Clark Fork M… <NA> NA 101 Carousel Dr… summer
## 6 farmers market Missoula Far… Heart of Miss… NA 534 N Higgins A… summer
## 7 farmers market Missoula Val… <NA> NA 2901 Brooks St,… winter
## 8 farmers market Orchard Home… <NA> NA 2537 S 3rd St W… summer
## 9 farmers market Target Range… <NA> NA 4095 South Ave … summer
## 10 grocery store Albertsons Heart of Miss… 1 1003 E Broadway… year-ro…
## # … with 16 more rows
I won’t be working with all of the variables in the dataset so I’ll start by creating a subset with just the variables I’m interested in.
# create subset as datatable
foodRetailersDT <- foodRetailers %>%
subset(select = c("Category", "Name", "Address", "Duration"))
# view
foodRetailersDT
## # A tibble: 26 × 4
## Category Name Address Duration
## <chr> <chr> <chr> <chr>
## 1 farm stand/store Clark Fork Organics 542 Tower St, Missou… summer
## 2 farm stand/store Frank's Little Farm 203 N Curtis St, Mis… summer
## 3 farm stand/store Oxbow F2M 6215 Haugan Dr, Miss… year-ro…
## 4 farm stand/store Turner Farms 3515 S 3rd St W, Mis… year-ro…
## 5 farmers market Clark Fork Market 101 Carousel Dr, Mis… summer
## 6 farmers market Missoula Farmers Market 534 N Higgins Ave, M… summer
## 7 farmers market Missoula Valley Winter Market 2901 Brooks St, Miss… winter
## 8 farmers market Orchard Homes Farmers Market 2537 S 3rd St W, Mis… summer
## 9 farmers market Target Range Farmers Market 4095 South Ave W, Mi… summer
## 10 grocery store Albertsons 1003 E Broadway St, … year-ro…
## # … with 16 more rows
# calculate number in summer
localSummer <- foodRetailers %>%
separate(Address, c("Street Address", "Zip"), sep = -5) %>%
filter(Category == "farm stand/store" | Category == "farmers market" & Duration == "summer") %>%
count(Zip) %>%
rename("Summer" = n)
# calculate number in winter
localWinter <- foodRetailers %>%
separate(Address, c("Street Address", "Zip"), sep = -5) %>%
filter(Category == "farm stand/store" | Category == "farmers market" & Duration == "winter") %>%
count(Zip) %>%
rename("Winter" = n)
# join
local <- full_join(localSummer, localWinter, by = "Zip")
# view
local
## # A tibble: 4 × 3
## Zip Summer Winter
## <chr> <int> <int>
## 1 59801 1 2
## 2 59802 2 NA
## 3 59803 1 1
## 4 59804 4 2
Now I can read the population data and join the two datasets together. This will allow me to calculate the density per 1,000 people, which is the standard measure of grocery store density.
# set wd
setwd("~/Documents/R Projects/Data")
# read data
zipPop <- read_excel("Decennial 2010 Pop by Zip.xlsx", sheet = "Data") %>%
rename("59801" = "ZCTA5 59801",
"59802" = "ZCTA5 59802",
"59803" = "ZCTA5 59803",
"59804" = "ZCTA5 59804",
"59808" = "ZCTA5 59808") %>%
pivot_longer(
cols = c(`59801`, `59802`, `59803`, `59804`, `59808`),
names_to = "Zip",
values_to = "Total"
) %>%
rename("Population" = "Total") %>%
subset(select = c("Zip", "Population"))
# join local food retailer data with population data
zipDT <- full_join(local, zipPop, by = "Zip")
# calculate density per 1000 people for each zip code
zipDensityDT <- zipDT %>%
mutate("Summer Density" = round(Summer/(Population/1000), digits = 2),
"Winter Density" = round(Winter/(Population/1000), digits = 2))
# view
zipDensityDT
## # A tibble: 5 × 6
## Zip Summer Winter Population `Summer Density` `Winter Density`
## <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 59801 1 2 30940 0.03 0.06
## 2 59802 2 NA 18407 0.11 NA
## 3 59803 1 1 16203 0.06 0.06
## 4 59804 4 2 7533 0.53 0.27
## 5 59808 NA NA 17904 NA NA
# set wd
setwd("~/Documents/R Projects/Data/Rapid CFA data")
# read data
CFA <- read_excel("FPAB Community Food Assessment Survey Data.xlsx", sheet = "Raw") %>%
rename("Zip" = "Zip Code",
"Identify As" = 14,
"Primary Source in Summer" = 31,
"Primary Source in Winter" = 39) %>%
select(c("Zip",
"Identify As",
"Primary Source in Summer",
"Primary Source in Winter")) %>%
filter(`Identify As` == "Consumer", `Zip` == 59801 | `Zip` == 59802 | `Zip` == 59803 | `Zip` == 59804 | `Zip` == 59808) %>%
replace_na(list("Primary Source in Summer" = "Other", "Primary Source in Winter" = "Other")) %>%
select(c("Zip", "Primary Source in Summer", "Primary Source in Winter"))
# use table to get counts
summerSource <- CFA %>%
group_by(`Zip`) %>% # group by zip code
select("Primary Source in Summer") %>% # select specific column
table() %>%
as.data.frame.matrix(row.names = FALSE) # coerce back to dataframe
# create vector of zip codes
Zip <- c("59801", "59802", "59803", "59804", "59808")
# bind zip code vector to front of dataframe
summerSource <- cbind(Zip, summerSource)
# mutate to get percentage of people in each zip code who primarily buy from local farmers in the summer
summerSource %>%
mutate("Total" = `Direct from local farmers` + `Other`) %>%
mutate("Percent Local" = round(`Direct from local farmers`/`Total` * 100)) %>%
select(c("Zip", "Percent Local")) # use select to effectively remove unnecessary columns
## Zip Percent Local
## 1 59801 64
## 2 59802 69
## 3 59803 56
## 4 59804 83
## 5 59808 48
# join density and percent data
seasonFull <- full_join(zipDensityDT, summerSource, by = "Zip")
# create subset
summerDT <- seasonFull %>%
select(c("Zip", "Summer Density", "Direct from local farmers")) %>%
rename("Density" = "Summer Density",
"Percent" = "Direct from local farmers") %>%
replace_na(list("Density" = 0))
# view
summerDT
## # A tibble: 5 × 3
## Zip Density Percent
## <chr> <dbl> <int>
## 1 59801 0.03 84
## 2 59802 0.11 74
## 3 59803 0.06 19
## 4 59804 0.53 19
## 5 59808 0 14
Now that I have everything in one place, I can use the cor() function to see if the two variables are correlated.
cor(summerDT$Density, summerDT$Percent)
## [1] -0.2956464
I can also get a sense of the range of each variable.
summarize(summerDT, label = min(Density))
## # A tibble: 1 × 1
## label
## <dbl>
## 1 0
summarize(summerDT, label = max(Density))
## # A tibble: 1 × 1
## label
## <dbl>
## 1 0.53
summarize(summerDT, label = min(Percent))
## # A tibble: 1 × 1
## label
## <int>
## 1 14
summarize(summerDT, label = max(Percent))
## # A tibble: 1 × 1
## label
## <int>
## 1 84
For my final project, I’ll play around with visualizing the above and discussing the implications.