Introduction

According to the Australian Bureau of Statistics, around 15% of household spending goes to transportation (ABS, 2017). Considering that 57% of people use a car as their primary means of transport (ABS, 2016), a high proportion of the household budget is spent on petrol. This makes people very sensitive to changes in petrol prices, which are driven mostly by international factors. However, our previous research found significant differences in petrol station prices across Greater Sydney, and competition appeared to be one of the factors behind these local variations.

However, the effect of competition on petrol prices is not constant across clusters of petrol stations; it may depend on market characteristics and even on possible collusion between retailers. Byrne & de Roos (2015) found evidence of this last factor in the Perth petrol market, although the coordination they documented was tacit rather than formal collusion.

The purpose of this research is to understand how competition affects retail petrol prices in Greater Sydney using spatial analysis, so that we can capture how this effect changes across locations. The outcome may be useful for organizations such as the Australian Competition and Consumer Commission (ACCC), whose aim is to promote competition and fair trading, by helping them focus their investigations on specific high-risk areas.

Background

In our previous analysis, we evaluated a series of factors that affect petrol prices using three years of data. We used variables that capture variability over time (seasonality and trend), environmental conditions, station attributes and market characteristics. Table 1 shows the variables that we used.

Table 1: Features that we used in our previous research
| Time | Environmental | Station characteristics | Market characteristics |
|---|---|---|---|
| International petrol price index | Air Quality Index | Brand size | Competition |
| Local price cycle | Bush fire intensity | | |
| Type of day (holiday) | | | |

That research had several limitations that could have influenced the results. Firstly, we aggregated the data of same-brand stations in each suburb, so local variation was underestimated. Secondly, we used linear regression, which assumes that the errors are not autocorrelated across space and time; this assumption is implausible here because prices on consecutive days and at nearby stations tend to be similar. Thirdly, the competition variable only indicated whether a rival brand was present in the suburb; it ignored nearby stations in neighbouring suburbs and did not reflect the stronger competition when more than one rival brand was present.

For these reasons, our previous analysis could not be used to understand the effect of competition on petrol prices. If prices are indeed spatially autocorrelated, spatial analysis and geographically weighted regression (GWR) are good alternatives.

Data Understanding

Spatial analysis requires data without temporal autocorrelation, so I changed the data and variables that we used in our first research. To remove temporal autocorrelation, I selected the single day that best represented the population of petrol stations: November 29th, 2019, which had prices for 87% of Greater Sydney petrol stations. In addition, I selected E10 fuel because it had the most observations.
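A minimal sketch of this selection step, assuming the raw prices sit in a long-format pandas table with hypothetical columns `station_id`, `date`, `fuel_type` and `price` (the real schema is not described here). It keeps one fuel type and returns the date on which the largest share of stations reported a price.

```python
import pandas as pd


def best_coverage_day(prices: pd.DataFrame, fuel: str = "E10"):
    """Return (date, coverage) for the date on which most stations reported `fuel`.

    Coverage is the share of stations with at least one `fuel` price in the
    table that also reported a price on that date. Column names are hypothetical.
    """
    fuel_prices = prices[prices["fuel_type"] == fuel]
    n_stations = fuel_prices["station_id"].nunique()
    # Number of distinct stations with a price on each date.
    stations_per_day = fuel_prices.groupby("date")["station_id"].nunique()
    best_date = stations_per_day.idxmax()
    return best_date, stations_per_day[best_date] / n_stations


# Toy data with made-up values, not the study's dataset.
toy = pd.DataFrame({
    "station_id": [1, 2, 1, 2, 3, 4, 5],
    "date": ["2019-11-28", "2019-11-28", "2019-11-29", "2019-11-29",
             "2019-11-29", "2019-11-30", "2019-11-28"],
    "fuel_type": ["E10", "E10", "E10", "E10", "E10", "E10", "U91"],
    "price": [139.9, 141.9, 138.9, 140.9, 142.9, 137.9, 150.9],
})
best_date, coverage = best_coverage_day(toy)
print(best_date, coverage)
```

In the toy table, station 5 only sells U91, so it is excluded from the E10 population of four stations; the 87% coverage quoted above is the author's figure, not an output of this sketch.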

The next map shows the petrol stations in Greater Sydney with at least one observation between November and December 2019; the yellow dots are the stations in our sample (November 29th, 2019). The spatial distribution of the sample is similar to that of the whole population of petrol stations.

The features that I used for the analysis can be divided into two groups: petrol station characteristics and market characteristics. Note that the number of competitors is the only variable that captures competition. Table 2 gives a brief description of each variable.

Table 2: Features used for spatial analysis
| Category | Feature | Definition |
|---|---|---|
| Station characteristics | Brand size | Indicates whether the brand belongs to a big company: 1 for Metro, Costco, United, 7-Eleven, BP, Caltex, Shell and Coles Express; 0 for others |
| Market characteristics | Number of competitors | Number of competitors (stations of other brands) within a radius of 1 mile. According to Lee (2009), competitors within this distance affect petrol prices |
| | IER decile | The Index of Economic Resources summarises variables related to income and wealth, excluding education and occupation. A high decile indicates relatively greater access to economic resources in general. The index is presented per Statistical Area Level 2 |
| | Cars per dwelling | Number of cars divided by the number of dwellings in each Statistical Area Level 2 |
| | Average commute distance | Average commute distance of employed people in each Statistical Area Level 2 |
| | Own car as main transportation | Percentage of people over 15 years old who used a private vehicle as their primary means of transport in each Statistical Area Level 2 |
| | Population density | Population density in people per square kilometre, per Statistical Area Level 2 |
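The “Number of competitors” feature can be computed from station coordinates. The sketch below is an illustration, not the author's code: it assumes stations are given as hypothetical `(station_id, brand, lat, lon)` tuples, uses great-circle (haversine) distance, and counts stations of a different brand within 1 mile. The brute-force double loop is O(n²), which is acceptable at city scale.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def count_competitors(stations, radius_miles=1.0):
    """For each station, count stations of a different brand within radius_miles.

    `stations` is a list of (station_id, brand, lat, lon) tuples (hypothetical layout).
    """
    counts = {}
    for sid, brand, lat, lon in stations:
        counts[sid] = sum(
            1
            for other_id, other_brand, olat, olon in stations
            if other_id != sid
            and other_brand != brand  # same-brand stations are not competitors
            and haversine_miles(lat, lon, olat, olon) <= radius_miles
        )
    return counts


# Made-up coordinates around Sydney; D is several miles south of the others.
stations = [
    ("A", "caltex", -33.8700, 151.2000),
    ("B", "bp", -33.8750, 151.2000),
    ("C", "caltex", -33.8720, 151.2000),
    ("D", "shell", -33.9500, 151.2000),
]
competitors = count_competitors(stations)
print(competitors)
```

Station B has two competitors (the two Caltex sites), while A and C only count B because they share a brand with each other.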

Spatial Autocorrelation

Spatial autocorrelation measures the degree to which nearby observations (neighbours) have similar values of a variable, in this case the E10 price. When spatial autocorrelation is present, the residuals of a linear regression model tend to be spatially dependent. Figure 1 shows the residuals of a linear regression (with E10 price as the dependent variable) across Greater Sydney, where similar residual values cluster together. I used Kriging interpolation to fill the gaps between petrol stations.

Figure 1: Distribution of the residuals from a linear regression model
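The residuals mapped in Figure 1 come from an ordinary least squares fit of E10 price, presumably on the Table 2 features. A minimal NumPy sketch of that step follows, with synthetic data standing in for the real features (names and values are hypothetical); the returned residuals are what one would then interpolate or test for spatial autocorrelation.

```python
import numpy as np


def ols_residuals(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefficients, residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef, residuals


# Synthetic example: two made-up features and a noisy linear "price".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 150 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
coef, residuals = ols_residuals(X, y)
print(coef.round(2))
```

Because the model includes an intercept, the residuals sum to zero and are orthogonal to every feature; any spatial pattern left in them is what Moran's I picks up.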

Additionally, I used a statistical test to check for spatial autocorrelation: Moran’s I, which evaluates whether the spatial pattern is clustered, dispersed or random (Esri, 2020). The null hypothesis is a random spatial distribution, and the alternative hypothesis is the presence of spatial clusters. Table 3 shows a p-value close to zero, so the null hypothesis is rejected and we conclude that there is significant global spatial autocorrelation at the 0.05 level. The Moran’s I value of 0.761, far above its expected value of -0.001 under the null hypothesis, indicates strong positive autocorrelation.

Table 3: Moran’s I statistic results for global spatial autocorrelation
| Statistic | Value |
|---|---|
| Moran’s I | 0.761 |
| Expected I | -0.001 |
| z (resampling) | 15.672 |
| z (randomization) | 15.709 |
| p-value (resampling) | 0.000 |
| p-value (randomization) | 0.000 |
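For reference, global Moran's I for values x with spatial weights w_ij is I = (n / S0) * sum_i sum_j w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean(x) and S0 is the sum of all weights; under the null hypothesis its expected value is -1/(n - 1), which is why the expected I in Table 3 is close to zero. The NumPy sketch below computes I and a one-sided permutation p-value. The weights matrix (a toy chain of neighbours) is an assumption, and a permutation test is only one way to obtain significance; Table 3 reports z-scores under resampling and randomization assumptions instead.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I for values x and an n x n weights matrix w with zero diagonal."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def moran_permutation_test(x, w, n_perm=999, seed=0):
    """Return (I, expected I, one-sided permutation p-value for positive autocorrelation)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    # Randomly reassign the values to locations and recompute I each time.
    simulated = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p_value = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, -1.0 / (len(x) - 1), p_value


# Toy example: 20 locations on a line, each neighbouring the next; values rise steadily.
w = np.eye(20, k=1) + np.eye(20, k=-1)
x = np.arange(20)
moran_i, expected_i, p_value = moran_permutation_test(x, w)
print(round(moran_i, 3), round(expected_i, 3), p_value)
```

For this steadily rising toy series, I equals 34/38 (about 0.895), well above the expected -1/19, so the permutation p-value is small.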

Moran’s I can also be calculated locally, which shows where the clusters form. Figure 2 plots each station’s E10 price against the average price of its neighbours (the spatial lag); the upward-sloping line reflects positive spatial autocorrelation. Points far from the line are stations whose relationship with their neighbours departs from this overall pattern. The four quadrants of the plot correspond to four types of relationship (high-high, low-high, low-low and high-low); the most important for this research is “high-high”, in which a station with an above-average price is surrounded by neighbours that also have above-average prices.

Figure 2: Moran’s I Scatter Plot
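A minimal NumPy sketch of the quantities behind a Moran scatter plot, assuming a binary weights matrix that is row-standardised so each station's spatial lag is the average of its neighbours' standardised prices. Each observation's local Moran's I is its standardised value times its spatial lag, and the signs of the two give the quadrant label. The toy prices and chain-of-neighbours weights are made up for illustration.

```python
import numpy as np


def local_moran_quadrants(x, w):
    """Local Moran's I and scatter-plot quadrant for each observation.

    w is an n x n non-negative weights matrix with zero diagonal; every row
    needs at least one neighbour because it is row-standardised here.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # spatial lag = average of neighbours
    z = (x - x.mean()) / x.std()          # standardised values (horizontal axis)
    lag = w @ z                           # spatial lag (vertical axis)
    local_i = z * lag                     # with standardised z, the scaling term is 1
    labels = np.where(z >= 0,
                      np.where(lag >= 0, "high-high", "high-low"),
                      np.where(lag >= 0, "low-high", "low-low"))
    return local_i, labels


# Toy example: six stations along a road, each neighbouring the adjacent ones.
prices = [160.0, 158.0, 159.0, 135.0, 136.0, 134.0]
w = np.eye(6, k=1) + np.eye(6, k=-1)
local_i, labels = local_moran_quadrants(prices, w)
print(labels.tolist())
```

The two stations at the boundary between the expensive and cheap stretches fall into the off-diagonal quadrants ("high-low" and "low-high") and have negative local I.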

However, spatial autocorrelation is not present at every location, so not all of these relationships are significant. Figure 3 shows, in intense red, the areas where the local Moran’s I statistic allows us to reject the null hypothesis, that is, where we can assume spatial autocorrelation.

Figure 3: p-values calculated using local Moran’s statistic
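Local p-values are often obtained by conditional permutation: each station's own value is held fixed while the remaining values are randomly reassigned to its neighbours, and the observed local statistic is compared with the simulated ones. The text does not say how the author computed the local p-values, so the permutation scheme, the one-sided rule and the toy data (two blocks of stations on a line, each neighbouring stations up to three positions away) are all assumptions in this NumPy sketch.

```python
import numpy as np


def local_moran_pvalues(x, w, n_perm=999, seed=0):
    """Local Moran's I with conditional-permutation pseudo p-values.

    w is an n x n non-negative weights matrix with zero diagonal; it is
    row-standardised here. The test is one-sided in the direction of each
    observed statistic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)
    z = (x - x.mean()) / x.std()
    observed = z * (w @ z)
    p_values = np.empty(len(z))
    for i in range(len(z)):
        others = np.delete(z, i)   # every value except station i's own
        w_i = np.delete(w[i], i)   # station i's weights on those values
        simulated = np.array(
            [z[i] * (w_i @ rng.permutation(others)) for _ in range(n_perm)]
        )
        if observed[i] >= 0:
            extreme = np.sum(simulated >= observed[i])
        else:
            extreme = np.sum(simulated <= observed[i])
        p_values[i] = (extreme + 1) / (n_perm + 1)
    return observed, p_values


# Toy example: 15 high-price stations followed by 15 low-price stations on a line.
prices = np.array([160.0] * 15 + [135.0] * 15)
positions = np.arange(30)
gap = np.abs(positions[:, None] - positions[None, :])
w = ((gap > 0) & (gap <= 3)).astype(float)
local_i, p_values = local_moran_pvalues(prices, w)
print(p_values[7].round(3), p_values[14].round(3))
```

A station in the middle of the expensive block (index 7) is significant, while the station at the edge of the block (index 14), whose neighbours are half expensive and half cheap, is not.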

According to Lundberg (2017), the Moran’s I test for spatial autocorrelation can be used to detect potential price coordination, which here means clusters of petrol stations with high prices that appear unaffected by competition. Table 4 shows all the clusters with significant spatial autocorrelation and their type of relationship. Only “high-high” clusters are considered possible collusion, because “low-low” clusters signal strong competition. Note that some clusters contain only stations of the same company, so they cannot be considered possible collusion.

Table 4: Clusters with significant spatial autocorrelation by type of relationship

| Cluster ID | Relationship | Station IDs | Brands | Prices (c/L) |
|---|---|---|---|---|
| 1 | high-high | 91, 131, 333 | caltex, 7-eleven, caltex woolworths | 159.9, 159.9, 159.9 |
| 3 | high-high | 276, 350 | coles express, caltex | 153.9, 156.4 |
| 4 | high-high | 287, 1474 | bp, caltex | 157.2, 157.2 |
| 5 | high-high | 452, 471, 476, 834, 839 | 7-eleven, caltex, united, coles express, coles express | 153.9, 153.9, 153.9, 153.9, 153.9 |
| 7 | high-high | 540, 964 | caltex, caltex | 153.9, 153.9 |
| 9 | high-high | 660, 1038 | metro fuel, bp | 154.9, 154.9 |
| 10 | high-high | 691, 715 | bp, caltex woolworths | 156.9, 153.9 |
| 12 | high-high | 878, 1487 | caltex woolworths, coles express | 163.9, 163.9 |
| 14 | high-high | 908, 2066 | 7-eleven, caltex | 153.9, 153.9 |
| 15 | high-high | 1104, 1891 | bp, coles express | 163.9, 163.9 |
| 17 | high-high | 1336, 1338, 1341 | 7-eleven, 7-eleven, bp | 153.9, 155.4, 156.2 |
| 18 | low-low | 2, 415 | metro fuel, independent | 136.9, 142.8 |
| 19 | low-low | 21, 891, 1399 | metro fuel, caltex woolworths, bp | 133.2, 134.9, 134.9 |
| 21 | low-low | 65, 257, 638, 1032 | caltex, bp, 7-eleven, coles express | 137.9, 134.4, 132.7, 141.9 |
| 22 | low-low | 99, 853 | caltex woolworths, coles express | 132.7, 134.9 |
| 23 | low-low | 381, 924, 1157 | westside, 7-eleven, metro fuel | 134.9, 134.9, 135.7 |
| 24 | low-low | 414, 564 | united, metro fuel | 135.9, 131.9 |
| 26 | low-low | 576, 2086 | metro fuel, shell | 128.9, 129.9 |
| 28 | low-low | 909, 1008 | 7-eleven, caltex | 139.9, 139.9 |
| 29 | low-low | 1140, 1142 | metro fuel, metro fuel | 139.4, 139.9 |
| 30 | low-low | 1231, 1807 | budget, speedway | 137.9, 137.5 |
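The screening rule described above (keep “high-high” clusters and drop those whose stations all share one brand) can be written as a short filter. This sketch uses the brand strings exactly as listed in Table 4, so “caltex” and “caltex woolworths” count as different brands; deciding which brands belong to the same company would need an ownership mapping that the text does not provide.

```python
# All high-high rows and two low-low rows from Table 4: (cluster_id, relationship, brands).
table_4 = [
    (1, "high-high", ["caltex", "7-eleven", "caltex woolworths"]),
    (3, "high-high", ["coles express", "caltex"]),
    (4, "high-high", ["bp", "caltex"]),
    (5, "high-high", ["7-eleven", "caltex", "united", "coles express", "coles express"]),
    (7, "high-high", ["caltex", "caltex"]),
    (9, "high-high", ["metro fuel", "bp"]),
    (10, "high-high", ["bp", "caltex woolworths"]),
    (12, "high-high", ["caltex woolworths", "coles express"]),
    (14, "high-high", ["7-eleven", "caltex"]),
    (15, "high-high", ["bp", "coles express"]),
    (17, "high-high", ["7-eleven", "7-eleven", "bp"]),
    (18, "low-low", ["metro fuel", "independent"]),
    (29, "low-low", ["metro fuel", "metro fuel"]),
]


def possible_collusion_clusters(rows):
    """IDs of high-high clusters that contain more than one distinct brand string."""
    return [
        cluster_id
        for cluster_id, relationship, brands in rows
        if relationship == "high-high" and len(set(brands)) > 1
    ]


print(possible_collusion_clusters(table_4))
```

Only cluster 7 (two Caltex stations) is dropped from the high-high group; low-low clusters are never kept.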

Modelling

The next step is to understand how each of the features in Table 2 affects E10 prices in the clusters. We found that spatial autocorrelation is present in the data: some E10 prices are highly correlated with those of their neighbours but unrelated to prices at distant stations. This suggests that the features may have different effects on price depending on location, and a global linear regression model, which estimates a single coefficient per feature, would not capture these local variations.

Geographically weighted regression (GWR), on the other hand, can capture spatial variation in the coefficients. This exploratory technique works like linear regression, but it assigns a weight to each observation based on its distance to the regression point: observations closer to the regression point count more in the regression than distant ones (Fotheringham & Rogerson, 2009). Therefore, the weights, and the resulting coefficients, change with every regression point.
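In matrix form, the GWR estimate at regression point i is beta_hat(i) = (X^T W_i X)^(-1) X^T W_i y, where W_i is a diagonal matrix of kernel weights that decay with distance from point i. The NumPy sketch below implements this with a fixed Gaussian kernel, w_ij = exp(-0.5 * (d_ij / b)^2), and Euclidean distance on projected coordinates. Both choices, the bandwidth value and the synthetic data are assumptions for illustration; the study used an adaptive kernel, and dedicated GWR software also selects the bandwidth, for example by cross-validation or AICc.

```python
import numpy as np


def gwr_fixed_gaussian(coords, X, y, bandwidth):
    """Local GWR coefficients at every observation with a fixed Gaussian kernel.

    coords: (n, 2) projected coordinates; X: (n, k) features; y: (n,) response.
    Returns an (n, k + 1) array: intercept, then one coefficient per feature.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), design.shape[1]))
    for i, point in enumerate(coords):
        d = np.linalg.norm(coords - point, axis=1)     # distance to regression point i
        weights = np.exp(-0.5 * (d / bandwidth) ** 2)  # closer observations weigh more
        xtw = design.T * weights                       # X^T W_i without forming diag(W_i)
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas


# Synthetic data: the slope on x1 is -2 in the western half and +2 in the eastern half.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 2))
x1 = rng.normal(size=200)
true_slope = np.where(coords[:, 0] < 5, -2.0, 2.0)
y = 150 + true_slope * x1 + rng.normal(scale=0.1, size=200)
betas = gwr_fixed_gaussian(coords, x1[:, None], y, bandwidth=1.0)
print(betas[coords[:, 0] < 3, 1].max(), betas[coords[:, 0] > 7, 1].min())
```

A single global regression on this data would estimate a slope near zero, while the local estimates recover the sign change between the two halves.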

Weighting functions can be fixed or adaptive. In this case, I used an adaptive weighting function, so the weights depend on the density of data points in the area rather than only on a fixed distance. In this dataset, this means that in areas with many petrol stations the kernel bandwidth (the distance over which observations receive substantial weight) is shorter than in areas with few stations.
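A sketch of one common adaptive scheme, assuming a bi-square kernel whose bandwidth at each regression point is the distance to its k-th nearest observation (the text does not state the kernel shape or k, so both are assumptions). Because k is fixed, the bandwidth shrinks where stations are dense and grows where they are sparse; observations at or beyond the bandwidth get zero weight.

```python
import numpy as np


def adaptive_bisquare_weights(coords, point, k):
    """Bi-square kernel weights around `point` with an adaptive (k-nearest) bandwidth.

    The bandwidth is the k-th smallest distance from `point` to the observations
    (the point itself counts if it is one of them). Returns (weights, bandwidth).
    """
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - np.asarray(point, dtype=float), axis=1)
    bandwidth = np.sort(d)[k - 1]
    weights = np.where(d < bandwidth, (1 - (d / bandwidth) ** 2) ** 2, 0.0)
    return weights, bandwidth


# Four closely spaced stations (a dense area) and four spread-out stations (a sparse area).
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
w_dense, bw_dense = adaptive_bisquare_weights(coords, coords[0], k=4)
w_sparse, bw_sparse = adaptive_bisquare_weights(coords, coords[4], k=4)
print(round(bw_dense, 3), round(bw_sparse, 3))
```

With the same k, the bandwidth in the dense group is ten times shorter than in the sparse group, which is exactly the density adaptation described above.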

Results

Table 5 summarises the coefficients of the geographically weighted regression with an adaptive weighting function, showing how each coefficient is distributed across space. Because the objective is to find signs of collusion, special attention is given to the variable “Number of competitors”.

According to Table 5, the coefficient of “Number of competitors” ranges from -1.71 to 0.86: in some places more competitors decrease petrol prices, and in others they increase them.

Table 5: Summary of the coefficients resulting from a geographically weighted regression
| Feature | Min | 1st quartile | Median | 3rd quartile | Max |
|---|---|---|---|---|---|
| Intercept | 90.3505151 | 144.5344364 | 152.2079755 | 160.0836560 | 347.2609983 |
| Brand size | -17.3993238 | -3.7450544 | -2.1226120 | -1.0312379 | 2.1690844 |
| Number of competitors | -1.7149135 | -0.3824475 | -0.1897301 | 0.0241860 | 0.8638648 |
| IER decile | -9.5726281 | -0.1971946 | 0.1672479 | 0.5662864 | 4.0334078 |
| Cars per dwelling | -39.6784506 | -5.5140819 | -2.1419560 | 0.8293799 | 132.7510396 |
| Average commute distance | -4.5429968 | -0.2672923 | 0.1447342 | 0.4252069 | 3.8149118 |
| Own car as main transportation | -351.6745029 | -11.2901661 | -2.1615927 | 3.3075972 | 33.5047926 |
| Population density | -0.0246855 | -0.0005849 | 0.0000532 | 0.0005493 | 0.0038535 |
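Table 5 is a five-number summary of each column of local coefficients (one row per station). A minimal NumPy sketch of that summary follows, together with the share of locations where a coefficient is negative; the toy coefficient matrix is made up. NumPy's default linear interpolation between order statistics is one of several quantile conventions, so quartiles from other software can differ slightly.

```python
import numpy as np


def summarise_coefficients(betas, names):
    """Min, 1st quartile, median, 3rd quartile and max of each coefficient column."""
    betas = np.asarray(betas, dtype=float)
    stats = np.percentile(betas, [0, 25, 50, 75, 100], axis=0)  # shape (5, k)
    return {name: stats[:, j] for j, name in enumerate(names)}


def share_negative(coefficients):
    """Fraction of locations where a local coefficient is below zero."""
    return float(np.mean(np.asarray(coefficients) < 0))


# Made-up local coefficients for five locations and two features.
betas = np.array([[1.0, -2.0], [2.0, -1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 2.0]])
summary = summarise_coefficients(betas, ["feature_a", "feature_b"])
print(summary["feature_b"], share_negative(betas[:, 1]))
```

Read against Table 5, the negative median (-0.19) and positive third quartile (0.02) of “Number of competitors” mean that the coefficient is negative at roughly half to three quarters of locations.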

Table 4 lists the clusters that are most relevant to possible collusion, so I analysed the coefficients only for those clusters. Table 6 shows the coefficient of “Number of competitors” for each cluster_id; no significant difference was found between the “high-high” and “low-low” clusters.

Table 6: Number of competitors coefficient for each cluster
| Cluster ID | Relationship | Number of competitors coefficient |
|---|---|---|
| 1 | high-high | -0.2937029 |
| 3 | high-high | -0.2119655 |
| 4 | high-high | -0.5209038 |
| 5 | high-high | -0.2897457 |
| 7 | high-high | -0.5787346 |
| 9 | high-high | -0.2316382 |
| 10 | high-high | -0.4326928 |
| 12 | high-high | -0.0368127 |
| 14 | high-high | -0.4687507 |
| 15 | high-high | 0.8628327 |
| 17 | high-high | -0.3175554 |
| 18 | low-low | 0.4027787 |
| 19 | low-low | -0.6659126 |
| 21 | low-low | -1.3797851 |
| 22 | low-low | -1.5399562 |
| 23 | low-low | -0.3561540 |
| 24 | low-low | -0.2069171 |
| 26 | low-low | 0.2589430 |
| 28 | low-low | -1.7064208 |
| 29 | low-low | -0.0708851 |
| 30 | low-low | -0.1787549 |
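The text does not say how the two groups were compared. One simple way to check the claim with the eleven “high-high” and ten “low-low” coefficients in Table 6 is a two-sided permutation test on the difference in mean coefficients; the choice of test is an assumption, not the author's method. The low-low clusters have the more negative mean, but with groups this small and this spread out the test does not reject equal means at the 5% level.

```python
import numpy as np

# Coefficients of "Number of competitors" from Table 6.
high_high = np.array([-0.2937029, -0.2119655, -0.5209038, -0.2897457, -0.5787346,
                      -0.2316382, -0.4326928, -0.0368127, -0.4687507, 0.8628327,
                      -0.3175554])
low_low = np.array([0.4027787, -0.6659126, -1.3797851, -1.5399562, -0.3561540,
                    -0.2069171, 0.2589430, -1.7064208, -0.0708851, -0.1787549])


def permutation_diff_test(a, b, n_perm=9999, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # random relabelling of the two groups
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)


diff, p = permutation_diff_test(high_high, low_low)
print(f"difference in means = {diff:.3f}, permutation p-value = {p:.3f}")
```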


Regarding how well the model explains E10 prices, the quasi-global R2 (the proportion of price variation captured by the model) is 0.751, much higher than the R2 of a linear regression (0.21). However, because GWR fits a separate local model at each location, its goodness of fit also varies through space. For this reason, a local R2 was calculated for each location (see Figure 4). Notably, the areas with significant spatial autocorrelation are also the areas with lower local R2.

Figure 4: Local R2 heat map for GWR model
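The two goodness-of-fit measures can be sketched as follows. The quasi-global R2 is the usual 1 - RSS/TSS computed with GWR fitted values, where each fitted value comes from the local model at that observation. The local R2 shown is one common definition, weighting the residual and total sums of squares by the kernel weights around a regression point; GWR software differs in the details, so treat it as an assumption rather than the exact formula behind Figure 4.

```python
import numpy as np


def quasi_global_r2(y, fitted):
    """1 - RSS / TSS, using each observation's fitted value from its own local model."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


def local_r2(y, fitted, weights):
    """Kernel-weighted R2 around one regression point (one of several conventions)."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    w = np.asarray(weights, dtype=float)
    weighted_mean = np.sum(w * y) / np.sum(w)
    rss = np.sum(w * (y - fitted) ** 2)
    tss = np.sum(w * (y - weighted_mean) ** 2)
    return 1 - rss / tss


y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(quasi_global_r2(y, fitted), local_r2(y, fitted, [1.0, 1.0, 1.0, 1.0]))
```

With equal weights everywhere, the local R2 reduces to the global one; concentrating the weights on a subset of observations gives a location-specific value.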

Interpreting results

The focus of this research is to find clusters where collusion could be present, so the first step was to find areas where E10 prices were spatially dependent. Using the local Moran’s I statistic, twenty-one clusters were found, in two main categories: “high-high” and “low-low”. While the “low-low” category could indicate lower prices due to competition, “high-high” could reflect price coordination aimed at achieving higher prices.

The second step consisted of understanding how the number of nearby competitors affects petrol prices in these clusters, as captured by the coefficient of the variable “Number of competitors”. The coefficient was divided into three categories, where either of the last two could reflect signs of collusion:

  • Coefficient < 0: The market has not reached an equilibrium; therefore, when a new competitor enters, the prices decrease.
  • Coefficient > 0: The market has reached an equilibrium; however, when a new competitor enters, prices increase. This could be a signal of price coordination.
  • Coefficient ≈ 0: The market has reached an equilibrium, which could be natural or artificial. This means that when a new competitor enters, the prices stay the same. The equilibrium is artificial when it delivers greater profits than those available in a static Nash equilibrium (Byrne & de Roos, 2015).
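The three rules above can be encoded as a small function. The text does not define how close to zero a coefficient must be to count as “≈ 0”, so the `tolerance` parameter and its default of 0.05 are hypothetical choices that would need justification in practice.

```python
def classify_competition_effect(coef, tolerance=0.05):
    """Map a local "Number of competitors" coefficient to one of the three categories.

    `tolerance` is a hypothetical threshold for treating a coefficient as zero.
    """
    if abs(coef) <= tolerance:
        return "approximately zero: equilibrium (natural or artificial)"
    if coef < 0:
        return "negative: no equilibrium, entry lowers prices"
    return "positive: entry raises prices, possible coordination"


def signals_possible_collusion(coef, tolerance=0.05):
    """True for the last two categories (approximately zero or positive)."""
    return coef >= -tolerance


# Coefficients of clusters 15, 12 and 1 from Table 6.
for coef in (0.8628327, -0.0368127, -0.2937029):
    print(coef, classify_competition_effect(coef), signals_possible_collusion(coef))
```

Under this hypothetical tolerance, cluster 12 (-0.037) would count as “≈ 0”, while cluster 1 (-0.294) would fall in the negative category.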

The following map shows the clusters by category together with a heat map of the “Number of competitors” coefficient. Special attention should be given to “high-high” clusters (in purple) in zones where the coefficient is close to or above zero (yellow and red zones), because these are the clusters most suspected of unfair trading practices. Table 7 shows the top five clusters associated with possible collusion.

Table 7: Top five clusters associated with possible collusion
| Cluster ID | Relationship | Station IDs | Brands | Prices (c/L) | Number of competitors coefficient |
|---|---|---|---|---|---|
| 15 | high-high | 1104, 1891 | bp, coles express | 163.9, 163.9 | 0.8628327 |
| 12 | high-high | 878, 1487 | caltex woolworths, coles express | 163.9, 163.9 | -0.0368127 |
| 3 | high-high | 276, 350 | coles express, caltex | 153.9, 156.4 | -0.2119655 |
| 9 | high-high | 660, 1038 | metro fuel, bp | 154.9, 154.9 | -0.2316382 |
| 5 | high-high | 452, 471, 476, 834, 839 | 7-eleven, caltex, united, coles express, coles express | 153.9, 153.9, 153.9, 153.9, 153.9 | -0.2897457 |
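Table 7 matches what one gets by taking the “high-high” clusters from Table 6, sorting them by the “Number of competitors” coefficient from highest to lowest and keeping the first five. The text does not state this ranking rule explicitly, so the sketch below is an inferred reconstruction rather than the author's code.

```python
# (cluster_id, relationship, coefficient of "Number of competitors") from Table 6.
table_6 = [
    (1, "high-high", -0.2937029), (3, "high-high", -0.2119655),
    (4, "high-high", -0.5209038), (5, "high-high", -0.2897457),
    (7, "high-high", -0.5787346), (9, "high-high", -0.2316382),
    (10, "high-high", -0.4326928), (12, "high-high", -0.0368127),
    (14, "high-high", -0.4687507), (15, "high-high", 0.8628327),
    (17, "high-high", -0.3175554), (18, "low-low", 0.4027787),
    (19, "low-low", -0.6659126), (21, "low-low", -1.3797851),
    (22, "low-low", -1.5399562), (23, "low-low", -0.3561540),
    (24, "low-low", -0.2069171), (26, "low-low", 0.2589430),
    (28, "low-low", -1.7064208), (29, "low-low", -0.0708851),
    (30, "low-low", -0.1787549),
]


def top_suspicious_clusters(rows, top_n=5):
    """High-high clusters ordered by coefficient, highest coefficient first."""
    high_high = [row for row in rows if row[1] == "high-high"]
    return sorted(high_high, key=lambda row: row[2], reverse=True)[:top_n]


top_five = top_suspicious_clusters(table_6)
print([cluster_id for cluster_id, _, _ in top_five])
```

The low-low clusters 18 and 26 also have positive coefficients, but they are excluded because low prices are read as a sign of competition.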

Conclusion

This research showed how the Moran’s I statistic can be used together with geographically weighted regression to identify petrol station clusters with possible price coordination. However, this methodology must not be used as the only tool to detect collusion; I recommend it only as a reference for resource allocation, so that fair-trading agencies know where to start further investigations. Moreover, any possible collusion detected will be difficult to verify, because there is no explicit communication or coordination between companies, as described by Byrne & de Roos (2015). This type of coordination, called tacit collusion, is easier to sustain when competitors’ prices are publicly available, which is the case here with platforms such as FuelWatch.

On the other hand, because of the model’s simplicity, the results could be out of touch with market reality. The methodology has three main limitations. The most important is the lack of temporal analysis: we selected only one day of data, which is not enough to reveal price coordination between petrol stations, because collusion must be sustained over time. Moreover, according to Wheeler & Tiefelsdorf (2005), multicollinearity in GWR can be stronger than in a global regression model, so interpreting the local coefficients could lead to misleading conclusions. Lastly, we used a fixed distance of 1 mile to define competitors; however, the relevant distance could vary through space, being shorter in dense areas and longer in remote places.

Nevertheless, this methodology could be improved with a spatio-temporal analysis, which can capture consistent price coordination in particular areas. The features could also be refined by creating a competition index for each petrol station that considers its location and particularities. Finally, multicollinearity diagnostic tools should be used to detect and understand its effect on the GWR parameters.

GitHub repository with the code: https://github.com/felipemonroy/Spatial-Analysis--Petrol-Price

References

ABS. (2016). Working Population Profile. https://datapacks.censusdata.abs.gov.au/datapacks/.

ABS. (2017). Main Features - Average Household Spending. https://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/6530.0Main%20Features32015-16

Byrne, D., & de Roos, N. (2015). Learning to coordinate: A study in retail gasoline. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2570637

Esri. (2020). How Spatial Autocorrelation (Global Moran’s I) works. ArcGIS Pro Documentation. https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/h-how-spatial-autocorrelation-moran-s-i-spatial-st.htm

Fotheringham, A., & Rogerson, P. (Eds.). (2009). The SAGE handbook of spatial analysis. SAGE Publications.

Lee, S.-Y. (2009). Spatial competition in the retail gasoline market: An equilibrium approach using SAR models.

Lundberg, J. (2017). On cartel detection and Moran’s I. Letters in Spatial and Resource Sciences, 10(1), 129–139. https://doi.org/10.1007/s12076-016-0176-4

Wheeler, D., & Tiefelsdorf, M. (2005). Multicollinearity and correlation among local regression coefficients in geographically weighted regression. Journal of Geographical Systems, 7(2), 161–187. https://doi.org/10.1007/s10109-005-0155-6