This accompaniment documentation is part of the final assessment in module CASA0005: Geographic Information Systems and Science (19/20)
Overview
Since 2014, all the big retailers in Israel are obliged by law to publish on an hourly basis a full data set of stores and prices of all the products that are being sold in each store. According to the Israeli anti-trust authority, the intention of this act is to mitigate the cost of living in Israel and to encourage competition between different retailers, supermarkets and pharmacies.
This data set allows us to examine the relationship between location and product pricing in the retail sector, and even detect geographic monopolies, where a single retailer provides product or service to a local area.
The Research Question
This paper addresses and examines the following question: What is the impact of the distance from a retailer to their competitors on product pricing? The hypothesis which is being examined accordingly is that the pricing gets higher as the retailer is farther from their competitors and vice versa.
Methodology
A common practice of defining catchment areas of stores is based on network analysis of roads and other transportation infrastructure. This work takes a different approach, and suggests using Voronoi tessellation as a simpler interpolation for this purpose. Voronoi tessellation builds a polygon around each point in a data set, this polygon covers all the points that are the closest to the point seed rather than to any other point in the data set.
This analysis compares the prices of a single package of 500g raw sesame Tahini, a popular middle eastern condiment, between 68 different stores in central Tel Aviv, Israel. The analysis was carried out in the R environment for statistical computing and visualisation.
The following flowchart summarises the three steps of the process: data preparation, spatial analysis and descriptive and inferential statistics:
Data Preparation
First, let’s load our packages
library(methods)
library(httr)
library(dplyr)
library(tidyr)
library(tmap)
library(tmaptools)
library(sp)
library(sf)
library(rgdal)
library(maptools)
library(rgeos)
library(raster)
library(dismo)
library(PBSmapping)
library(geojsonio)
library(slippymath)
library(ggplot2)
library(pastecs)
library(plotly)Specifying Hebrew as input language
## [1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"
Creating an Area of Interest Polygon
We will be focusing in our analysis on central Tel Aviv, the economic centre of Israel. First, we need to download the quarters shapefile which is available on the municipal QIS platform (Layers > Zones Boundaries > Quarters > Download):
Once we have downloaded the shapefile, let’s create a single polygon which represents the city centre: quarters 3,4,5 and 6:
#reading the shapefile
TLVQuarters <- st_read("TLVQuarters/Quarters.shp")
#selecting the city centre - quarters 3, 4, 6 and 5
TLVCityCentre <- TLVQuarters[TLVQuarters$krova %in% c('3','4','6','5'),]Now let’s check that we have the right quarters and plot these two sf objects together:
#setting tmap mode to plot
tmap_mode("plot")
#plotting
tm_shape(TLVQuarters) +
tm_polygons(col = NA) +
tm_shape(TLVCityCentre) +
tm_polygons(col = "red")Out area of interest is shown in red, with four different quarters. Let’s use the st_combine function to create a single polygon:
#dissolving the four quarters into one polygon
TLVCityCentreUnion <- st_combine(TLVCityCentre)
#converting the sf object to sp object
TLVCityCentreUnionSp <- as(TLVCityCentreUnion,'Spatial')
#quick plotting'
qtm(TLVCityCentreUnionSp)Great, now we have a single sp object with a single polygon for the city centre.
Creating a Stores Data Set
We’ll be using an API service to find the attributes of each store in Tel Aviv-Jaffa. This service requires an API key, which can be obtained from here (in Hebrew).
#definig the API's URL
url <-"https://api.superget.co.il"
#specifying an API key
key <- "ENTER--API--KEY--HERE"
#specifying city code parameter
TLVCityCode <- "1180"
#specifying action parameter
getStoresAction <- "GetStoresByCityID"
#creating an API call using the httr package
TLVStores <- GET(url, query=list(api_key = key, action = getStoresAction, city_id = TLVCityCode))
#parsing the respone to JSON
parsed_stores <- content(TLVStores,"text", encoding = "UTF-8")
#converting the JSON to data frame using the jsonlite package
parsed_storesDF <- fromJSON(parsed_stores) store_id chain_id sub_chain_id city_id store_area store_type store_name
Length:160 Length:160 Length:160 Length:160 Length:160 Length:160 Length:160
Class :character Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
store_code store_address store_zip_code store_gps_lat store_gps_lng store_update_time store_insert_time
Length:160 Length:160 Length:160 Length:160 Length:160 Length:160 Length:160
Class :character Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
city_name city_name2 city_name3 city_gps_lat city_gps_lng chain_name chain_code
Length:160 Length:160 Length:160 Length:160 Length:160 Length:160 Length:160
Class :character Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
chain_image sub_chain_name sub_chain_code sub_chain_image sub_chain_is_pharm
Length:160 Length:160 Length:160 Length:160 Length:160
Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character
Now we need to get rid of pharmacies and online stores, using the filter function within the dplyr package:
#Excluding the "SuperPharm" chain
filteredStores <- dplyr::filter(parsed_storesDF, chain_code != "7290172900007"
#Excluding the "Be Pharm" sub chain (belongs to Shufersal)
& sub_chain_code != "5"
#Exluding the "AM:PM online" store
& store_code != "539") Now we need to fix manually wrong or missing XY data of some of the stores:
#Adding manualy xy coordinates for a single store that has no xy data
filteredStores[114,11] = "32.0624141"
filteredStores[114,12] = "34.7732659"
#Fixing wrong coordinates for store 1196 (line 111)
filteredStores[111,11] = "32.0824511"
filteredStores[111,12] = "34.7784227"
#Fixing wrong coordinates for store 1075 (line 93)
filteredStores[93,11] = "32.057635"
filteredStores[93,12] = "34.811125"
#Fixing wrong coordinates for store 160 (line 12)
filteredStores[12,11] = "32.0509574"
filteredStores[12,12] = "34.7516468"
#Fixing wrong coordinates for store 1222 (line 113)
filteredStores[113,11] = "32.0954439"
filteredStores[113,12] = "34.7756062"Since the chain names are in Hebrew, it might be beneficial to add a new column with English chain names. First, let’s create a factor with unique chain names in Hebrew:
#find unique chain names in Hebrew
chainNameHeb <- unique(filteredStoressp$chain_name,incomparables = FALSE)
chainNameHeb [1] "שופרסל" "אושר עד" "דור אלון" "טיב טעם" "מחסני להב" "ויקטורי"
[7] "פרש מרקט" "רמי לוי שיווק השקמה" "מ. יוחננוף ובניו" "יינות ביתן" "מגה" "עדן טבע"
Now we can create a new factor with English names, create a new data frame as a dictionary, and then add a new column to filteredStores with English chain names:
#English chain names factor
chainNameEng <- c("Shufersal",
"Osher Ad",
"Dor Alon",
"Tiv Taam",
"Lahav",
"Victory",
"Fresh Market",
"Rami Levy",
"Yohananof",
"Bitan",
"Mega",
"Eden Teva")
#data frame as a right table
chainNameDic <- data.frame(chainNameHeb,chainNameEng)
#tabular join to add a new column to filteredStores
filteredStores$chainNameEng <- data.frame(filteredStores,chainNameDic[match(
filteredStores[,"chain_name"],chainNameDic[,"chainNameHeb"]),])Now we need to convert our filteredStores data frame to a sp object:
#transfer xy coordinates to numeric values
filteredStores$store_gps_lat <- as.numeric(filteredStores$store_gps_lat)
filteredStores$store_gps_lng <- as.numeric(filteredStores$store_gps_lng)
#converting the data frame to a sp object
filteredStoressp <- SpatialPointsDataFrame(filteredStores[,c(12,11)],filteredStores[,-c(12,11)],
proj4string = CRS("+proj=longlat +datum=WGS84"))
#plotting
tm_shape(TLVQuarters) +
tm_polygons(col=NA) +
tm_shape(filteredStoressp) +
tm_dots(col = "blue")Creating a Price Data Set
Now it’s our time to find the price of Tahini in each store in the city centre. First, we need to create a new sp object for all the relevant stores:
#project TLVCityCentreUnionSp to wgs84
TLVCityCentreUnionSpWGS84 <- spTransform(TLVCityCentreUnionSp,CRS("+proj=longlat +datum=WGS84"))
#exctracting city centre stores
filteredStoresCityCentre <- filteredStoressp[TLVCityCentreUnionSpWGS84,]Now, let’s create a new factor that holds all the relevant store IDs, and an empty factor that will hold the price of Tahini in each store:
#creating a factor that holds the store IDs
storeIDs <- filteredStoresCityCentre$store_id
#creating an empty factore to hold the prices of Tahini in each store
prices <- numeric(length(storeIDs))Now we need to specify the parameters for our API call:
url- hasn’t changedapi_key- hasn’t changedactionis set toGetPriceByProductBarCodeproduct_barcodeis set to"7290000112220"(The product barcode for “Tahini Achva”, a popular Tahini brand)store_idis set to theith element of thestoreIDsfactor. We will iterate over this factor using aforloop
#specifying product barcode
productBarcode <- "7290000112220"
#specifying the API function
getPriceFunc = "GetPriceByProductBarCode"We’re all set! Now we can run a for loop that actually sends a single API call for each store, and then pushes the Tahini price in that store into the prices factor. If this product isn’t available in that store, an NA value is stored instead.
#iterating over the storeIDs factor
for (i in 1:(length(storeIDs))) {
#getting a response of a single store
singleStore <- GET(url, query=list(api_key = key,
action = getPriceFunc,
product_barcode = productBarcode,
store_id = storeIDs[i]))
#parsing the response content to text
singlePrice <- content(singleStore,"text", encoding = "UTF-8")
#parsing the text to JSON
SinglePriceValue <- fromJSON(singlePrice)
#if the price is null, set an NA value
if(is.null(SinglePriceValue$store_product_price))
{
prices[i] = "NA"
}
#else, set the product price as the ith element in the prices factor
else {
prices[i] = SinglePriceValue$store_product_price
}
}Now we can create a new data frame with two columns:
- StoreID
- Price
Creating a Buildings Data Set
Our last data preparation is to find all the buildings within our area of interest. This time we’ll use a different API service, https://osmbuildings.org/, which provides access to buildings polygons based on Open Street Map data. Due to the fact this service provides a single 15-zoom tile per call, we need to find all the relevant tiles in our area of interest.
First, let’s find the bounding box of our area of interest, using the st_bbox function in sf package:
#finding the bounding box of the area of interest
TLVCityCentrebbox <- st_bbox(TLVCityCentreUnionSpWGS84)
TLVCityCentrebbox xmin ymin xmax ymax
34.75689 32.05687 34.80976 32.10244
Now we can use the bbox_to_tile_grid function in slippymath package to create a new data frame with all the relevant tiles for our area of interest. Each tile is specified by a combination of x and y values:
#finding the relevant tiles for our area of interest
tiles <- bbox_to_tile_grid(TLVCityCentrebbox,zoom=15)
#let's have a look
head(tiles$tiles) x y
1 19547 13295
2 19548 13295
3 19549 13295
4 19550 13295
5 19551 13295
6 19552 13295
[1] 36
We will be using a for loop to iterate over the tiles data frame and get all the buildings within the area of interest. Lets try to get all the buildings within a single tile, x=19550, y=13298. We’ll use the readOGR function in the rgdal package to convert a geojson to sp. The ducomentation for API calls in OSM Buildings is available here.
#getting an API call response for a single tile
TLVBuildingRaw <- GET("https://data.osmbuildings.org/0.2/anonymous/tile/15/19550/13298.json")
#parsing the content to text
TLVBuildingContent <- content(TLVBuildingRaw, "text")
#parsing the geojson to sp
buildings <- rgdal::readOGR(TLBBuildingContent)
#quick tmap
qtm(buildings)Now we can clean the Buildings object and run a for loop that iterates over the tiles object to get a full coverage of the area of interest. We’ll use the sprintf function to format the API call with the right XY values for each tile. We’ll also use the duplicated function to eliminate duplicated buildings which are included in more than one API response:
#cleaning the buildings object
buildings <-buildings[0,]
#iterating over the tiles and populating the buildings object
for(i in (1:nrow(tiles$tiles))){
#a single API call for the ith tile in the tiles data frame
getBuildingsRaw <- GET(sprintf("https://data.osmbuildings.org/0.2/anonymous/tile/15/%s/%s.json",
tiles$tiles[i,"x"],tiles$tiles[i,"y"]))
#passing forward to the next tile if an error occures
if (getBuildingsRaw$status_code != "200")
{
next
}
else
{
getBuildingsContent <- content(getBuildingsRaw, "text",encoding="UTF-8")
getBuildingsSp <- rgdal::readOGR(getBuildingsContent)
buildings <- bind(buildings,getBuildingsSp)
}
}
#removing duplicates
buildings <- buildings[!duplicated(buildings@data),]
#quick tmap
qtm(buildings)Data Analysis
In this analysis, we would like to know if there is any correlation between the catchment area of each store and the price of Tahini in each store within the area of interest. In order to find the catchment area of each store, we will use a special method called voronoi tessellation, which creates a single polygon for each store. Each polygon covers all the urban area that is closer to one specific store than to any other store.
Voronoi Tessellation
To run this procedure, we’ll user the voronoi function in the dismo package, with the filteredStoressp object as input.
#creating a new voronoi tessellation
vor <- voronoi(filteredStoressp)
#quick tmap
qtm(vor, fill = "chainNameEng")It can clearly be seen that “Mega” has the largest catchment area, but we must acknowledge the limitations of using voronoi tessellation: the outer polygons are the most prone to distortion, due to the fact that we hadn’t included any stores outside of Tel Aviv-Jaffa. These stores, in fact, affecting the size of the catchment areas of stores within Tel Aviv - because they might serve people who live or work in Tel Aviv.
In order to mitigate this limitation, the analysis includes only stores within the area of interest:
#projecting voronoi polygons to WGS84
vorWGS84 <- spTransform(vor,CRS("+proj=longlat +datum=WGS84"))
#exctracting city centre voronoi polygons
filteredVorCityCentre <- vorWGS84[TLVCityCentreUnionSpWGS84,]
#tmap
tm_shape(filteredVorCityCentre) +
tm_polygons("chainNameEng") +
tm_shape(TLVCityCentreUnionSpWGS84) +
tm_polygons(col = "red", alpha = 0.5)There are two main problems with this selected set of polygons:
- Some of the polygons are stretching far beyond the area of interest
- In some cases, two neighbouring polygons share the same chain, this fact makes them a single catchment area
In order to this issue, we’ll first dissolve all the polygons that share the same chain value, and then split the merged layer back into different polygons. Then, due to the fact that some of the polygons are actually covering the Mediterranean Sea, we’ll count and sum the built area within each catchment area.
Converting Voronoi Polygons to Catchment Areas
First, we need to make sure that filteredVorCityCentre is in the right projection, WGS84. Then, we’ll merge adjacent voronoi polygons who share the same chain, using the gUnaryUnion function in the rgeos package. This procedure returns an object without any attribute data, so we’ll have to restore the chain value for each merged polygon.
#projecting voronoi polygons to WGS84
filteredVorCityCentre <-spTransform(filteredVorCityCentre,CRS("+proj=longlat +datum=WGS84"))
#merging adjacent voronoi polygons who share the same chain
vorWGS84Union <- gUnaryUnion(filteredVorCityCentre,
id = filteredVorCityCentre@data$chainNameEng)
#restoring data frame
row.names(vorWGS84Union) <- as.character(1:length(vorWGS84Union))
chain_ids <- unique(filteredVorCityCentre@data$chain_id)
chain_ids <- as.data.frame(chain_ids)
colnames(chain_ids) <- "catchment_chain_id"
vorWGS84Union <- SpatialPolygonsDataFrame(vorWGS84Union, chain_ids)Now we can split the vorWGS84Union layer into single polygons using the disaggregate function, and then we give a unique identifier for each catchment area.
#splitting multipart polygons
vorWGS84UnionDis <- disaggregate(vorWGS84Union)
#adding a catchment id column
vorWGS84UnionDis@data$catchment_id <- 1:nrow(vorWGS84UnionDis@data)
#quick tmap
qtm(vorWGS84UnionDis, fill="catchment_chain_id")Summing Built Area per Catchment Area
Once we have a layer of catchment areas with chain code values, we can go further and find the sum of the built area in each catchment area.
Cropping the Buildings Layer
First, let’s crop the buildings layer and pick only the buildings inside our area of interest, using the intersect function in the raster package. Then we need to make sure that the layer is in the right projection, WGS84:
#cropping buildings by area of interest
buildingsCrop <- crop(buildings,TLVCityCentreUnionSpWGS84)
#projecting buildingsCrop to WGS84
buildingsCropWGS84 <- spTransform(buildingsCrop,CRS("+proj=longlat +datum=WGS84"))Now, using the intersect function, we can intersect the buildings layer with the voronoi layer. Each polygon in our new layer buildingsVoronoi will have the attributes of both input layers. Then, we’ll calculate the area of each polygon in square meters using the area function.
#intersecting voronoi polygons and buildingsCrop
buildingsVoronoi <- intersect(buildingsCropWGS84,vorWGS84UnionDis)
#claculating area of each polygon (in square meters)
buildingsVoronoi@data$buildingArea <- area(buildingsVoronoi)
#quick tmap
qtm(buildingsVoronoi, fill = "catchment_chain_id", borders = NULL)Summarising Built Area per Catchment Area
The next step is to sum up the built area for each catchment area by using the aggregate, the output is a single data frame, sumArea, with two columns: catchment_id and catchment_sum_area:
#summaring built area by catchment_id
sumArea <- aggregate(
buildingsVoronoi@data$buildingArea,
by=list(catchment_id=buildingsVoronoi@data$catchment_id),
FUN=sum)
#updating column names
colnames(sumArea) <- c("catchment_id","catchment_sum_area")
#let's have a look
head(sumArea) catchment_id catchment_sum_area
1 1 4736.143
2 2 82949.349
3 3 172744.090
4 4 50512.904
5 5 47038.235
6 6 127774.424
Now we can do another tabular join and create a new column in the vorWGS84UnionDis object with the size of built area within each catchment area:
Aggregating Data into a Single Store Data Frame
The final step is to aggregate the data into a single store data frame, in this data frame each record is a store within the area of interest with a value of built area (in sq meter) of the catchment area that the store falls into. We’ll use a spatial join in order to match each store to a single catchment area:
- First, we’ll extract all the stores in the city centre
- Using a tabular join, we’ll add to the new store data set a new column with the price of Tahini (in ILS) in each store
- Then we’ll convert the catchment area polygons and the store points to
sfto preform a spatial join using thest_as_sffunction - The next step is to spatially join these two
sfobjects, using thest_joinfunction - Then we’ll drop any stores with missing price value
- Finally, we’ll create another tabular join in order to add the English chain names for each store, and then export the data to external data frame
#extracting stores in city centre
stores <- filteredStoressp[TLVCityCentreUnionSpWGS84,]
#joining prices to stores
stores@data <- data.frame(stores@data,
pricesDF[match(
stores@data[,"store_id"],pricesDF[,"storeIDs"]),])
#converting sp to sf for a spatial join
storesSF <- st_as_sf(stores)
catchmentSF <- st_as_sf(vorWGS84UnionDis)
#spatial join
sjoin <- sf::st_join(storesSF,catchmentSF)
#ommiting rows with missing price values
sjoinClean <- sjoin[!is.na(sjoin$pricesFloat),]
#tabular join for English chain Names
sjoinClean <- merge(sjoinClean,
chainNameDic,
by.x = "chain_name",
by.y = "chainNameHeb")
#converting sf to data.frame
sjoinCleanDF <- as.data.frame(sjoinClean)Descriptive and Inferential Statistics
Once we have a single data frame, sjoinCleanDF which aggregates the data of our spatial analysis, we can dive into our statistical analysis.
Descriptive Statistics
sjoinCleanDF is consisted of 68 records and 33 variables. By creating a simple pivot table, we can understand our data better. Let’s summarise our data by chain name, with the following statistics:
- Mean Tahini price by chain
- Store count by chain
- Mean size of built area in catchment area by chain
- Sum of all built area in catchment areas by chain
WE’ll use the dplyr package and the functions select, group_by and summarise to create the pivot table.
#creating a pivot table
sumByRetailer <- sjoinCleanDF %>%
dplyr::select(chainNameEng,pricesFloat,catchment_sum_area) %>%
group_by(chainNameEng) %>%
summarise(
meanPrice = mean(pricesFloat),
storeCount = length(chainNameEng),
meanCatchmentArea = mean(catchment_sum_area),
sumCatchmentArea = sum(catchment_sum_area))
sumByRetailer| chainNameEng | meanPrice | storeCount | meanCatchmentArea | sumCatchmentArea |
|---|---|---|---|---|
| Dor Alon | 16.98 | 26 | 364,790.11 | 9,484,542.81 |
| Tiv Taam | 16.45 | 11 | 165,039.89 | 1,815,438.77 |
| Shufersal | 15.17 | 14 | 173,797.47 | 2,433,164.59 |
| Eden Teva | 14.90 | 1 | 61,748.04 | 61,748.04 |
| Victory | 14.20 | 4 | 36,411.64 | 145,646.57 |
| Mega | 11.33 | 11 | 469,886.63 | 5,168,752.93 |
| Fresh Market | 10.90 | 1 | 14,848.00 | 14,848.00 |
plot_ly package is useful for simple visualisations, like the following pie chart:
#creating a new pie chart
chainPieChart <- plot_ly(sumByRetailer,
labels = sumByRetailer$chainNameEng,
values = sumByRetailer$storeCount, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
showlegend = TRUE) %>%
layout(title = 'Stores within Area of Interest by Chain',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
chainPieChartWe can learn from this table that Dor Alon is the priciest chain with mean price of 16.98 NIS per single package of Tahini. This chain is also the chain with the highest number of stores within the area of interest - 26 stores. The highest mean built area within a single catchment area belongs to Mega (470 thousand square meters), but the highest total sum of built area within all the catchment areas belongs to Dor Alon as well, almost 9.5 million square meters. These findings might be an evidence to a correlation between the size of built area within catchment areas and the price of Tahini in each store.
Let’s continue with some summary statistics and basic visualisation for each of the following variables:
- Tahini Price (
pricesFloat) - dependent variable - Size of store’s catchment area (
catchment_sum_area) - independent varialbe
Tahini Price
We’ll use the stat.desc function in the pastecs package to extract descriptive statistics:
| x | |
|---|---|
| nbr.val | 68.00 |
| nbr.null | 0.00 |
| nbr.na | 0.00 |
| min | 10.00 |
| max | 18.90 |
| range | 8.90 |
| sum | 1,041.90 |
| median | 16.90 |
| mean | 15.32 |
| SE.mean | 0.32 |
| CI.mean.0.95 | 0.64 |
| var | 6.93 |
| std.dev | 2.63 |
| coef.var | 0.17 |
plot_ly package allows us to create a histogram with price distribution:
#plotting a histogram
priceHist <- plot_ly(x = sjoinCleanDF$pricesFloat, type = "histogram") %>%
layout(title="Histogram of Tahini Prices within Area of Interest",
xaxis = list(title="Tahini Price (in NIS)"),
yaxis = list(title="Frequency"))
priceHistSize of store’s catchment area
| x | |
|---|---|
| nbr.val | 68.00 |
| nbr.null | 0.00 |
| nbr.na | 0.00 |
| min | 14,848.00 |
| max | 713,576.21 |
| range | 698,728.21 |
| sum | 19,124,141.73 |
| median | 172,744.09 |
| mean | 281,237.38 |
| SE.mean | 30,494.28 |
| CI.mean.0.95 | 60,866.84 |
| var | 63,233,272,984.51 |
| std.dev | 251,462.27 |
| coef.var | 0.89 |
catchmentArea <- plot_ly(x = sjoinCleanDF$catchment_sum_area, type = "histogram") %>%
layout(title="Histogram of Built Area within a Single Catchment Area",
xaxis = list(title="Built Area (in sq meters)"),
yaxis = list(title="Frequency"))
catchmentAreaPearson Correlation and Linear Regression
As mentioned above, the null hypothesis is that there is no linear association between the built area within the catchment area of each store, and the price of Tahini in that store. To reject this hypothesis, Pearson correlation coefficient and a simple linear regression were calculated.
Pearson Correlation Coefficient
We’ll use the cor.test function to find the Pearson correlation for these two variables:
#Pearson correlation coefficient
Pearson <- cor.test(sjoinCleanDF$pricesFloat,
sjoinCleanDF$catchment_sum_area)
#show coefficients
Pearson Pearson's product-moment correlation
data: sjoinCleanDF$pricesFloat and sjoinCleanDF$catchment_sum_area
t = 0.21397, df = 66, p-value = 0.8312
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2134361 0.2631023
sample estimates:
cor
0.0263288
The built area within the catchment area and the price of Tahini in each store were not found to be significantly correlated, r(66) = .03, p = .83, for significance level of 5%.
Simple Linear Regression
A simple linear regression was calculated to predict the price of Tahini in each store based on the size of built area within the catchment area which the store falls into.
We’ll use the lm function to create a linear regression model, and the plot_ly package to create a scatter plot with a trend line.
#creating a model
linearRegression <- lm(sjoinCleanDF$pricesFloat
~ sjoinCleanDF$catchment_sum_area)
#summarising the model
summary(linearRegression)Call:
lm(formula = sjoinCleanDF$pricesFloat ~ sjoinCleanDF$catchment_sum_area)
Residuals:
Min 1Q Median 3Q Max
-5.418 -1.383 1.560 1.639 3.459
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.2445243931 0.4844950730 31.465 <0.0000000000000002 ***
sjoinCleanDF$catchment_sum_area 0.0000002757 0.0000012885 0.214 0.831
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.652 on 66 degrees of freedom
Multiple R-squared: 0.0006932, Adjusted R-squared: -0.01445
F-statistic: 0.04578 on 1 and 66 DF, p-value: 0.8312
#creating a scatter plot
scatterPlot <- plot_ly(data = sjoinCleanDF,
x = sjoinCleanDF$catchment_sum_area,
y = sjoinCleanDF$pricesFloat,
type = "scatter",
mode = "markers") %>%
add_lines(x = ~catchment_sum_area,
y = fitted(linearRegression)) %>%
layout(showlegend = F,
xaxis = list(title="Built Area within the Store's Catchment Area (in sq meters)"),
yaxis = list(title="Tahini Price (in NIS)"))
#showing the scatter plot
scatterPlotAs expected, no significant regression equation was found (F(1,66) = .05, p = .83), with R2 = 0.
Conclusion
No evidence of association between the size of built area within catchment areas and Tahini prices in different stores in central Tel Aviv was found in this analysis. This isn’t to say that this association does not exist: since the store data set contains only retailers that are being regulated by the government (mainly big retailers), this analysis actually ignores little grocery stores that are probably affecting the real size of each catchment area. In addition, this analysis considers only one product within small geographic area - hence there is room for a wider investigation of bigger regions and wider range of products.
Another potential analysis might use the total built area within catchment areas by chain and local authority - with an updated hypothesis of correlation between the mean price of a product by retailer and local authority, and the total built area within the catchment areas of this retailer in each local authority.
Appendix: How to Make Middle Eastern Tahini?
Use this quick recipe to make you own delicious homemade Tahini. It’s super easy, and super healthy too. Enjoy!