Voronoi Tessellation as a Spatial Price Analysis Approach: The Case of Tahini Prices in Central Tel Aviv

This accompaniment documentation is part of the final assessment in module CASA0005: Geographic Information Systems and Science (19/20)

Overview

Since 2014, all the big retailers in Israel are obliged by law to publish on an hourly basis a full data set of stores and prices of all the products that are being sold in each store. According to the Israeli anti-trust authority, the intention of this act is to mitigate the cost of living in Israel and to encourage competition between different retailers, supermarkets and pharmacies.

This data set allows us to examine the relationship between location and product pricing in the retail sector, and even detect geographic monopolies, where a single retailer provides product or service to a local area.

The Research Question

This paper addresses and examines the following question: What is the impact of the distance from a retailer to their competitors on product pricing? The hypothesis which is being examined accordingly is that the pricing gets higher as the retailer is farther from their competitors and vice versa.

Methodology

A common practice of defining catchment areas of stores is based on network analysis of roads and other transportation infrastructure. This work takes a different approach, and suggests using Voronoi tessellation as a simpler interpolation for this purpose. Voronoi tessellation builds a polygon around each point in a data set, this polygon covers all the points that are the closest to the point seed rather than to any other point in the data set.

This analysis compares the prices of a single package of 500g raw sesame Tahini, a popular middle eastern condiment, between 68 different stores in central Tel Aviv, Israel. The analysis was carried out in the R environment for statistical computing and visualisation.

The following flowchart summarises the three steps of the process: data preparation, spatial analysis and descriptive and inferential statistics:

Data Preparation

First, let’s load our packages

library(methods)
library(httr)
library(dplyr)
library(tidyr)
library(tmap)
library(tmaptools)
library(sp)
library(sf)
library(rgdal)
library(maptools)
library(rgeos)
library(raster)
library(dismo)
library(PBSmapping)
library(geojsonio)
library(slippymath)
library(ggplot2)
library(pastecs)
library(plotly)

Specifying Hebrew as input language

Sys.setlocale("LC_ALL", "Hebrew")

## [1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

Creating an Area of Interest Polygon

We will be focusing in our analysis on central Tel Aviv, the economic centre of Israel. First, we need to download the quarters shapefile which is available on the municipal QIS platform (Layers > Zones Boundaries > Quarters > Download):

Once we have downloaded the shapefile, let’s create a single polygon which represents the city centre: quarters 3,4,5 and 6:

#reading the shapefile
TLVQuarters <- st_read("TLVQuarters/Quarters.shp")
#selecting the city centre - quarters 3, 4, 6 and 5
TLVCityCentre <- TLVQuarters[TLVQuarters$krova %in% c('3','4','6','5'),]

Now let’s check that we have the right quarters and plot these two sf objects together:

#setting tmap mode to plot
tmap_mode("plot")

#plotting
tm_shape(TLVQuarters) +
  tm_polygons(col = NA) +
  tm_shape(TLVCityCentre) + 
  tm_polygons(col = "red")

Quarters of Tel Aviv-Jaffa, city centre is highlighted in red

Out area of interest is shown in red, with four different quarters. Let’s use the st_combine function to create a single polygon:

#dissolving the four quarters into one polygon
TLVCityCentreUnion <- st_combine(TLVCityCentre)

#converting the sf object to sp object
TLVCityCentreUnionSp <- as(TLVCityCentreUnion,'Spatial')

#quick plotting'
qtm(TLVCityCentreUnionSp)

Great, now we have a single sp object with a single polygon for the city centre.

Creating a Stores Data Set

We’ll be using an API service to find the attributes of each store in Tel Aviv-Jaffa. This service requires an API key, which can be obtained from here (in Hebrew).

#definig the API's URL
url <-"https://api.superget.co.il"
#specifying an API key
key <- "ENTER--API--KEY--HERE"
#specifying city code parameter
TLVCityCode <- "1180"
#specifying action parameter
getStoresAction <- "GetStoresByCityID"
#creating an API call using the httr package
TLVStores <- GET(url, query=list(api_key = key, action = getStoresAction, city_id = TLVCityCode))
#parsing the respone to JSON
parsed_stores <-  content(TLVStores,"text", encoding = "UTF-8")
#converting the JSON to data frame using the jsonlite package
parsed_storesDF <- fromJSON(parsed_stores)

summary(parsed_storesDF)

   store_id           chain_id         sub_chain_id         city_id           store_area         store_type         store_name       
 Length:160         Length:160         Length:160         Length:160         Length:160         Length:160         Length:160        
 Class :character   Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  store_code        store_address      store_zip_code     store_gps_lat      store_gps_lng      store_update_time  store_insert_time 
 Length:160         Length:160         Length:160         Length:160         Length:160         Length:160         Length:160        
 Class :character   Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  city_name          city_name2         city_name3        city_gps_lat       city_gps_lng        chain_name         chain_code       
 Length:160         Length:160         Length:160         Length:160         Length:160         Length:160         Length:160        
 Class :character   Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 chain_image        sub_chain_name     sub_chain_code     sub_chain_image    sub_chain_is_pharm
 Length:160         Length:160         Length:160         Length:160         Length:160        
 Class :character   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character

Now we need to get rid of pharmacies and online stores, using the filter function within the dplyr package:

#Excluding the "SuperPharm" chain
filteredStores <- dplyr::filter(parsed_storesDF, chain_code != "7290172900007"
  #Excluding the "Be Pharm" sub chain (belongs to Shufersal)
  & sub_chain_code != "5" 
  #Exluding the "AM:PM online" store
  & store_code != "539")

Now we need to fix manually wrong or missing XY data of some of the stores:

#Adding manualy xy coordinates for a single store that has no xy data
filteredStores[114,11] = "32.0624141"
filteredStores[114,12] = "34.7732659"

#Fixing wrong coordinates for store 1196 (line 111)
filteredStores[111,11] = "32.0824511"
filteredStores[111,12] = "34.7784227"

#Fixing wrong coordinates for store 1075 (line 93)
filteredStores[93,11] = "32.057635"
filteredStores[93,12] = "34.811125"

#Fixing wrong coordinates for store 160 (line 12)
filteredStores[12,11] = "32.0509574"
filteredStores[12,12] = "34.7516468"

#Fixing wrong coordinates for store 1222 (line 113)
filteredStores[113,11] = "32.0954439"
filteredStores[113,12] = "34.7756062"

Since the chain names are in Hebrew, it might be beneficial to add a new column with English chain names. First, let’s create a factor with unique chain names in Hebrew:

#find unique chain names in Hebrew
chainNameHeb <- unique(filteredStoressp$chain_name,incomparables = FALSE)

chainNameHeb

 [1] "שופרסל"              "אושר עד"             "דור אלון"            "טיב טעם"             "מחסני להב"           "ויקטורי"            
 [7] "פרש מרקט"            "רמי לוי שיווק השקמה" "מ. יוחננוף ובניו"    "יינות ביתן"          "מגה"                 "עדן טבע"

Now we can create a new factor with English names, create a new data frame as a dictionary, and then add a new column to filteredStores with English chain names:

#English chain names factor
chainNameEng <- c("Shufersal",
                  "Osher Ad",
                  "Dor Alon",
                  "Tiv Taam",
                  "Lahav",
                  "Victory",
                  "Fresh Market",
                  "Rami Levy",
                  "Yohananof",
                  "Bitan",
                  "Mega",
                  "Eden Teva")

#data frame as a right table
chainNameDic <- data.frame(chainNameHeb,chainNameEng)

#tabular join to add a new column to filteredStores
filteredStores$chainNameEng <- data.frame(filteredStores,chainNameDic[match(
  filteredStores[,"chain_name"],chainNameDic[,"chainNameHeb"]),])

Now we need to convert our filteredStores data frame to a sp object:

#transfer xy coordinates to numeric values
filteredStores$store_gps_lat <- as.numeric(filteredStores$store_gps_lat)
filteredStores$store_gps_lng <- as.numeric(filteredStores$store_gps_lng)

#converting the data frame to a sp object
filteredStoressp <-  SpatialPointsDataFrame(filteredStores[,c(12,11)],filteredStores[,-c(12,11)],
                                            proj4string = CRS("+proj=longlat +datum=WGS84"))
#plotting
tm_shape(TLVQuarters) +
  tm_polygons(col=NA) +
  tm_shape(filteredStoressp) +
  tm_dots(col = "blue")

Quarters of Tel Aviv-Jaffa and Store Locations

Creating a Price Data Set

Now it’s our time to find the price of Tahini in each store in the city centre. First, we need to create a new sp object for all the relevant stores:

#project TLVCityCentreUnionSp to wgs84
TLVCityCentreUnionSpWGS84 <- spTransform(TLVCityCentreUnionSp,CRS("+proj=longlat +datum=WGS84"))
#exctracting city centre stores
filteredStoresCityCentre <- filteredStoressp[TLVCityCentreUnionSpWGS84,]

Now, let’s create a new factor that holds all the relevant store IDs, and an empty factor that will hold the price of Tahini in each store:

#creating a factor that holds the store IDs
storeIDs <- filteredStoresCityCentre$store_id

#creating an empty factore to hold the prices of Tahini in each store
prices <- numeric(length(storeIDs))

Now we need to specify the parameters for our API call:

url - hasn’t changed
api_key - hasn’t changed
action is set to GetPriceByProductBarCode
product_barcode is set to "7290000112220" (The product barcode for “Tahini Achva”, a popular Tahini brand)
store_id is set to the ith element of the storeIDs factor. We will iterate over this factor using a for loop

#specifying product barcode
productBarcode <- "7290000112220"

#specifying the API function
getPriceFunc = "GetPriceByProductBarCode"

We’re all set! Now we can run a for loop that actually sends a single API call for each store, and then pushes the Tahini price in that store into the prices factor. If this product isn’t available in that store, an NA value is stored instead.

#iterating over the storeIDs factor
for (i in 1:(length(storeIDs))) {
  #getting a response of a single store
  singleStore <- GET(url, query=list(api_key = key,
                                     action = getPriceFunc,
                                     product_barcode = productBarcode,
                                     store_id = storeIDs[i]))
  
  #parsing the response content to text
  singlePrice <- content(singleStore,"text", encoding = "UTF-8")
  
  #parsing the text to JSON
  SinglePriceValue <-  fromJSON(singlePrice)
  
    #if the price is null, set an NA value
    if(is.null(SinglePriceValue$store_product_price))
    {
      prices[i] = "NA"
    }
    #else, set the product price as the ith element in the prices factor
    else {
      prices[i] = SinglePriceValue$store_product_price
    }
  }

Now we can create a new data frame with two columns:

StoreID
Price

#parsing the prices factor values to float
pricesFloat <- as.double(prices)

#combining the storeID and pricesFloat into a data frame
pricesDF <- data.frame(storeIDs,pricesFloat)

Creating a Buildings Data Set

Our last data preparation is to find all the buildings within our area of interest. This time we’ll use a different API service, https://osmbuildings.org/, which provides access to buildings polygons based on Open Street Map data. Due to the fact this service provides a single 15-zoom tile per call, we need to find all the relevant tiles in our area of interest.

First, let’s find the bounding box of our area of interest, using the st_bbox function in sf package:

#finding the bounding box of the area of interest
TLVCityCentrebbox <- st_bbox(TLVCityCentreUnionSpWGS84)

TLVCityCentrebbox

    xmin     ymin     xmax     ymax 
34.75689 32.05687 34.80976 32.10244

Now we can use the bbox_to_tile_grid function in slippymath package to create a new data frame with all the relevant tiles for our area of interest. Each tile is specified by a combination of x and y values:

#finding the relevant tiles for our area of interest
tiles <- bbox_to_tile_grid(TLVCityCentrebbox,zoom=15)

#let's have a look
head(tiles$tiles)

      x     y
1 19547 13295
2 19548 13295
3 19549 13295
4 19550 13295
5 19551 13295
6 19552 13295

#how many tiles do we have?
nrow(tiles$tiles)

[1] 36

We will be using a for loop to iterate over the tiles data frame and get all the buildings within the area of interest. Lets try to get all the buildings within a single tile, x=19550, y=13298. We’ll use the readOGR function in the rgdal package to convert a geojson to sp. The ducomentation for API calls in OSM Buildings is available here.

#getting an API call response for a single tile
TLVBuildingRaw <- GET("https://data.osmbuildings.org/0.2/anonymous/tile/15/19550/13298.json")

#parsing the content to text
TLVBuildingContent <- content(TLVBuildingRaw, "text")

#parsing the geojson to sp
buildings <- rgdal::readOGR(TLBBuildingContent)

#quick tmap
qtm(buildings)

Now we can clean the Buildings object and run a for loop that iterates over the tiles object to get a full coverage of the area of interest. We’ll use the sprintf function to format the API call with the right XY values for each tile. We’ll also use the duplicated function to eliminate duplicated buildings which are included in more than one API response:

#cleaning the buildings object
buildings <-buildings[0,]

#iterating over the tiles and populating the buildings object
for(i in (1:nrow(tiles$tiles))){
  #a single API call for the ith tile in the tiles data frame
  getBuildingsRaw <- GET(sprintf("https://data.osmbuildings.org/0.2/anonymous/tile/15/%s/%s.json",
                                 tiles$tiles[i,"x"],tiles$tiles[i,"y"]))
  
  #passing forward to the next tile if an error occures
  if (getBuildingsRaw$status_code != "200")
  {
    next
  }
  
  else 
  {
    getBuildingsContent <- content(getBuildingsRaw, "text",encoding="UTF-8")
    getBuildingsSp <- rgdal::readOGR(getBuildingsContent)
    buildings <- bind(buildings,getBuildingsSp)
  }
  }

#removing duplicates
buildings <- buildings[!duplicated(buildings@data),]

#quick tmap
qtm(buildings)

Data Analysis

In this analysis, we would like to know if there is any correlation between the catchment area of each store and the price of Tahini in each store within the area of interest. In order to find the catchment area of each store, we will use a special method called voronoi tessellation, which creates a single polygon for each store. Each polygon covers all the urban area that is closer to one specific store than to any other store.

Voronoi Tessellation

To run this procedure, we’ll user the voronoi function in the dismo package, with the filteredStoressp object as input.

#creating a new voronoi tessellation
vor <- voronoi(filteredStoressp)

#quick tmap
qtm(vor, fill = "chainNameEng")

It can clearly be seen that “Mega” has the largest catchment area, but we must acknowledge the limitations of using voronoi tessellation: the outer polygons are the most prone to distortion, due to the fact that we hadn’t included any stores outside of Tel Aviv-Jaffa. These stores, in fact, affecting the size of the catchment areas of stores within Tel Aviv - because they might serve people who live or work in Tel Aviv.

In order to mitigate this limitation, the analysis includes only stores within the area of interest:

#projecting voronoi polygons to WGS84
vorWGS84 <- spTransform(vor,CRS("+proj=longlat +datum=WGS84"))

#exctracting city centre voronoi polygons
filteredVorCityCentre <- vorWGS84[TLVCityCentreUnionSpWGS84,]

#tmap
tm_shape(filteredVorCityCentre) +
  tm_polygons("chainNameEng") +
  tm_shape(TLVCityCentreUnionSpWGS84) + 
  tm_polygons(col = "red", alpha = 0.5)

Filtered Voronoi polygons with AOI in red

There are two main problems with this selected set of polygons:

Some of the polygons are stretching far beyond the area of interest
In some cases, two neighbouring polygons share the same chain, this fact makes them a single catchment area

In order to this issue, we’ll first dissolve all the polygons that share the same chain value, and then split the merged layer back into different polygons. Then, due to the fact that some of the polygons are actually covering the Mediterranean Sea, we’ll count and sum the built area within each catchment area.

Converting Voronoi Polygons to Catchment Areas

First, we need to make sure that filteredVorCityCentre is in the right projection, WGS84. Then, we’ll merge adjacent voronoi polygons who share the same chain, using the gUnaryUnion function in the rgeos package. This procedure returns an object without any attribute data, so we’ll have to restore the chain value for each merged polygon.

#projecting voronoi polygons to WGS84
filteredVorCityCentre <-spTransform(filteredVorCityCentre,CRS("+proj=longlat +datum=WGS84"))

#merging adjacent voronoi polygons who share the same chain
vorWGS84Union <- gUnaryUnion(filteredVorCityCentre,
                             id = filteredVorCityCentre@data$chainNameEng)
#restoring data frame
row.names(vorWGS84Union) <- as.character(1:length(vorWGS84Union))
chain_ids <- unique(filteredVorCityCentre@data$chain_id)
chain_ids <- as.data.frame(chain_ids)
colnames(chain_ids) <- "catchment_chain_id"
vorWGS84Union <- SpatialPolygonsDataFrame(vorWGS84Union, chain_ids)

Now we can split the vorWGS84Union layer into single polygons using the disaggregate function, and then we give a unique identifier for each catchment area.

#splitting multipart polygons
vorWGS84UnionDis <- disaggregate(vorWGS84Union)

#adding a catchment id column
vorWGS84UnionDis@data$catchment_id <- 1:nrow(vorWGS84UnionDis@data)

#quick tmap
qtm(vorWGS84UnionDis, fill="catchment_chain_id")

Summing Built Area per Catchment Area

Once we have a layer of catchment areas with chain code values, we can go further and find the sum of the built area in each catchment area.

Cropping the Buildings Layer

First, let’s crop the buildings layer and pick only the buildings inside our area of interest, using the intersect function in the raster package. Then we need to make sure that the layer is in the right projection, WGS84:

#cropping buildings by area of interest
buildingsCrop <- crop(buildings,TLVCityCentreUnionSpWGS84)

#projecting buildingsCrop to WGS84
buildingsCropWGS84 <- spTransform(buildingsCrop,CRS("+proj=longlat +datum=WGS84"))

Now, using the intersect function, we can intersect the buildings layer with the voronoi layer. Each polygon in our new layer buildingsVoronoi will have the attributes of both input layers. Then, we’ll calculate the area of each polygon in square meters using the area function.

#intersecting voronoi polygons and buildingsCrop
buildingsVoronoi <- intersect(buildingsCropWGS84,vorWGS84UnionDis)


#claculating area of each polygon (in square meters)
buildingsVoronoi@data$buildingArea <- area(buildingsVoronoi)

#quick tmap
qtm(buildingsVoronoi, fill = "catchment_chain_id", borders = NULL)

Buildings in area of interest by catchment chain ID

Summarising Built Area per Catchment Area

The next step is to sum up the built area for each catchment area by using the aggregate, the output is a single data frame, sumArea, with two columns: catchment_id and catchment_sum_area:

#summaring built area by catchment_id
sumArea <- aggregate(
  buildingsVoronoi@data$buildingArea,
  by=list(catchment_id=buildingsVoronoi@data$catchment_id),
  FUN=sum)

#updating column names
colnames(sumArea) <- c("catchment_id","catchment_sum_area")

#let's have a look
head(sumArea)

  catchment_id catchment_sum_area
1            1           4736.143
2            2          82949.349
3            3         172744.090
4            4          50512.904
5            5          47038.235
6            6         127774.424

Now we can do another tabular join and create a new column in the vorWGS84UnionDis object with the size of built area within each catchment area:

#joining sumArea to vorWGS84UnionDis
vorWGS84UnionDis@data <- data.frame(vorWGS84UnionDis@data,
                        sumArea[match(
                        vorWGS84UnionDis@data[,"catchment_id"],sumArea[,"catchment_id"]),])

Aggregating Data into a Single Store Data Frame

The final step is to aggregate the data into a single store data frame, in this data frame each record is a store within the area of interest with a value of built area (in sq meter) of the catchment area that the store falls into. We’ll use a spatial join in order to match each store to a single catchment area:

First, we’ll extract all the stores in the city centre
Using a tabular join, we’ll add to the new store data set a new column with the price of Tahini (in ILS) in each store
Then we’ll convert the catchment area polygons and the store points to sf to preform a spatial join using the st_as_sf function
The next step is to spatially join these two sf objects, using the st_join function
Then we’ll drop any stores with missing price value
Finally, we’ll create another tabular join in order to add the English chain names for each store, and then export the data to external data frame

#extracting stores in city centre
stores <- filteredStoressp[TLVCityCentreUnionSpWGS84,]


#joining prices to stores
stores@data <- data.frame(stores@data,
              pricesDF[match(
              stores@data[,"store_id"],pricesDF[,"storeIDs"]),])

#converting sp to sf for a spatial join
storesSF <- st_as_sf(stores)
catchmentSF <- st_as_sf(vorWGS84UnionDis)

#spatial join
sjoin <- sf::st_join(storesSF,catchmentSF)

#ommiting rows with missing price values
sjoinClean <- sjoin[!is.na(sjoin$pricesFloat),]

#tabular join for English chain Names
sjoinClean <- merge(sjoinClean,
                    chainNameDic,
                    by.x = "chain_name",
                    by.y = "chainNameHeb")

#converting sf to data.frame
sjoinCleanDF <- as.data.frame(sjoinClean)

Descriptive and Inferential Statistics

Once we have a single data frame, sjoinCleanDF which aggregates the data of our spatial analysis, we can dive into our statistical analysis.

Descriptive Statistics

sjoinCleanDF is consisted of 68 records and 33 variables. By creating a simple pivot table, we can understand our data better. Let’s summarise our data by chain name, with the following statistics:

Mean Tahini price by chain
Store count by chain
Mean size of built area in catchment area by chain
Sum of all built area in catchment areas by chain

WE’ll use the dplyr package and the functions select, group_by and summarise to create the pivot table.

#creating a pivot table
sumByRetailer <- sjoinCleanDF %>%
  dplyr::select(chainNameEng,pricesFloat,catchment_sum_area) %>%
  group_by(chainNameEng) %>%
  summarise(
    meanPrice = mean(pricesFloat),
    storeCount = length(chainNameEng),
    meanCatchmentArea = mean(catchment_sum_area),
    sumCatchmentArea = sum(catchment_sum_area))

sumByRetailer

chainNameEng	meanPrice	storeCount	meanCatchmentArea	sumCatchmentArea
Dor Alon	16.98	26	364,790.11	9,484,542.81
Tiv Taam	16.45	11	165,039.89	1,815,438.77
Shufersal	15.17	14	173,797.47	2,433,164.59
Eden Teva	14.90	1	61,748.04	61,748.04
Victory	14.20	4	36,411.64	145,646.57
Mega	11.33	11	469,886.63	5,168,752.93
Fresh Market	10.90	1	14,848.00	14,848.00

plot_ly package is useful for simple visualisations, like the following pie chart:

#creating a new pie chart
chainPieChart <- plot_ly(sumByRetailer,
  labels = sumByRetailer$chainNameEng,
  values = sumByRetailer$storeCount, type = 'pie',
  textposition = 'inside',
  textinfo = 'label+percent',
  insidetextfont = list(color = '#FFFFFF'),
  showlegend = TRUE) %>%
  layout(title = 'Stores within Area of Interest by Chain',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

chainPieChart

We can learn from this table that Dor Alon is the priciest chain with mean price of 16.98 NIS per single package of Tahini. This chain is also the chain with the highest number of stores within the area of interest - 26 stores. The highest mean built area within a single catchment area belongs to Mega (470 thousand square meters), but the highest total sum of built area within all the catchment areas belongs to Dor Alon as well, almost 9.5 million square meters. These findings might be an evidence to a correlation between the size of built area within catchment areas and the price of Tahini in each store.

Let’s continue with some summary statistics and basic visualisation for each of the following variables:

Tahini Price (pricesFloat) - dependent variable
Size of store’s catchment area (catchment_sum_area) - independent varialbe

Tahini Price

We’ll use the stat.desc function in the pastecs package to extract descriptive statistics:

stat.desc(sjoinCleanDF[,c("pricesFloat")])

	x
nbr.val	68.00
nbr.null	0.00
nbr.na	0.00
min	10.00
max	18.90
range	8.90
sum	1,041.90
median	16.90
mean	15.32
SE.mean	0.32
CI.mean.0.95	0.64
var	6.93
std.dev	2.63
coef.var	0.17

plot_ly package allows us to create a histogram with price distribution:

#plotting a histogram
priceHist <- plot_ly(x = sjoinCleanDF$pricesFloat, type = "histogram") %>%
  layout(title="Histogram of Tahini Prices within Area of Interest",
         xaxis = list(title="Tahini Price (in NIS)"),
         yaxis = list(title="Frequency"))

priceHist

Histogram of Tahini Prices within Area of Interest

Size of store’s catchment area

stat.desc(sjoinCleanDF[,c("catchment_sum_area")])

	x
nbr.val	68.00
nbr.null	0.00
nbr.na	0.00
min	14,848.00
max	713,576.21
range	698,728.21
sum	19,124,141.73
median	172,744.09
mean	281,237.38
SE.mean	30,494.28
CI.mean.0.95	60,866.84
var	63,233,272,984.51
std.dev	251,462.27
coef.var	0.89

catchmentArea <- plot_ly(x = sjoinCleanDF$catchment_sum_area, type = "histogram") %>%
  layout(title="Histogram of Built Area within a Single Catchment Area",
         xaxis = list(title="Built Area (in sq meters)"),
         yaxis = list(title="Frequency"))

catchmentArea

Histogram of Built Area within a Single Catchment Area

Pearson Correlation and Linear Regression

As mentioned above, the null hypothesis is that there is no linear association between the built area within the catchment area of each store, and the price of Tahini in that store. To reject this hypothesis, Pearson correlation coefficient and a simple linear regression were calculated.

Pearson Correlation Coefficient

We’ll use the cor.test function to find the Pearson correlation for these two variables:

#Pearson correlation coefficient
Pearson <- cor.test(sjoinCleanDF$pricesFloat,
               sjoinCleanDF$catchment_sum_area)

#show coefficients
Pearson

    Pearson's product-moment correlation

data:  sjoinCleanDF$pricesFloat and sjoinCleanDF$catchment_sum_area
t = 0.21397, df = 66, p-value = 0.8312
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2134361  0.2631023
sample estimates:
      cor 
0.0263288

The built area within the catchment area and the price of Tahini in each store were not found to be significantly correlated, r(66) = .03, p = .83, for significance level of 5%.

Simple Linear Regression

A simple linear regression was calculated to predict the price of Tahini in each store based on the size of built area within the catchment area which the store falls into.

We’ll use the lm function to create a linear regression model, and the plot_ly package to create a scatter plot with a trend line.

#creating a model
linearRegression <- lm(sjoinCleanDF$pricesFloat 
                       ~ sjoinCleanDF$catchment_sum_area)

#summarising the model
summary(linearRegression)

Call:
lm(formula = sjoinCleanDF$pricesFloat ~ sjoinCleanDF$catchment_sum_area)

Residuals:
   Min     1Q Median     3Q    Max 
-5.418 -1.383  1.560  1.639  3.459 

Coefficients:
                                     Estimate    Std. Error t value            Pr(>|t|)    
(Intercept)                     15.2445243931  0.4844950730  31.465 <0.0000000000000002 ***
sjoinCleanDF$catchment_sum_area  0.0000002757  0.0000012885   0.214               0.831    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.652 on 66 degrees of freedom
Multiple R-squared:  0.0006932, Adjusted R-squared:  -0.01445 
F-statistic: 0.04578 on 1 and 66 DF,  p-value: 0.8312

#creating a scatter plot
scatterPlot <- plot_ly(data = sjoinCleanDF, 
             x = sjoinCleanDF$catchment_sum_area,
             y = sjoinCleanDF$pricesFloat, 
             type = "scatter",
             mode = "markers") %>% 
      add_lines(x = ~catchment_sum_area,
                y = fitted(linearRegression)) %>%
                layout(showlegend = F,
                xaxis = list(title="Built Area within the Store's Catchment Area (in sq meters)"),
                yaxis = list(title="Tahini Price (in NIS)"))

#showing the scatter plot
scatterPlot

Scatter Plot: price of Tahini in each store based on the size of built area within the catchment area which the store falls into

As expected, no significant regression equation was found (F(1,66) = .05, p = .83), with R² = 0.

Conclusion

No evidence of association between the size of built area within catchment areas and Tahini prices in different stores in central Tel Aviv was found in this analysis. This isn’t to say that this association does not exist: since the store data set contains only retailers that are being regulated by the government (mainly big retailers), this analysis actually ignores little grocery stores that are probably affecting the real size of each catchment area. In addition, this analysis considers only one product within small geographic area - hence there is room for a wider investigation of bigger regions and wider range of products.

Another potential analysis might use the total built area within catchment areas by chain and local authority - with an updated hypothesis of correlation between the mean price of a product by retailer and local authority, and the total built area within the catchment areas of this retailer in each local authority.

Appendix: How to Make Middle Eastern Tahini?

Use this quick recipe to make you own delicious homemade Tahini. It’s super easy, and super healthy too. Enjoy!