1 Introduction

The growth in online consumption and the shift in consumer culture towards convenience and value have affected the health of retail centres in complex ways, potentially leading to long-term structural change. The rise of online sales was particularly noticeable during the COVID-19 pandemic, as was the shift towards value for money during the current ‘cost of living crisis’. Adapting to these trends has often involved the rationalisation of store portfolios and the adoption of new business models. Attention among traditional town centre ‘brick-and-mortar’ retailers has primarily focused on supply-side factors such as competition, retail mix and vacant space. However, there has been less emphasis on demand-side factors, such as catchment demographics, socio-economic characteristics and consumers’ engagement with online shopping.
Understanding the geography of consumer behaviour at a small area level is crucial for understanding the vitality and viability of both retail centres and the retailers themselves. Such consumer insights form the basis for site location analysis, in conjunction with an analysis of key competitors.

1.1 Learning Aims and Objectives

The Internet has revolutionised the way in which people consume products and services; however, a variety of factors influence these use and engagement behaviours. Understanding the geography of these influences is complex, although one way in which it has been made tractable is through geodemographic classifications. However, these only tell us about the characteristics of the places in which people live and their environments, not where they shop.

The objectives of this practical are:
a) to examine the geography of retail supply and demand related factors;
b) to evaluate the extent to which retail centres are exposed to consumers with different income and online consumption behaviour;
c) to choose the most suitable location for a new discount supermarket in Liverpool.

In this case study you’ll take on the role of a property agent who is advising retail clients about those locations most suitable for their new premises.

In addition to reinforcing learning from earlier practicals, you will also be developing the following GIS skills and understanding:

  1. Use of a Geodemographic Classification and subdomains of the IMD
  2. How to calculate buffers and their uses
  3. Recoding variables in QGIS
  4. Creating spatial joins
  5. Basic understanding of retail catchment modelling techniques
  6. Exploring suitable locations for a retail store based on various selection criteria (socio-economic, consumer behaviour and competition)

1.2 Assignment

Taking on the role of a location analyst, in no more than 1,250 words, write a short summary report that identifies the most suitable location in Liverpool (preferably, but not exclusively, an existing shopping area such as a town centre/high street or retail park) for a new corporate ‘neighbourhood hub’ store (e.g. a large convenience store or discount supermarket) that will have a click & collect facility, using the results of your GIS analysis as evidence. In your analysis you should consider: catchment characteristics (total population, affluence, geodemographics), the Internet User Classification (IUC) and competition between existing stores. Use data from the practicals on ‘Consumption spaces’ and ‘People and Places’, establish clear selection criteria, and use various GIS tools to conduct a location analysis. Your outputs should include at least four maps, two tables and a flowchart showing the steps you have taken in your analysis.

1.3 Data

  • In your “M:” drive, create a new folder for this practical - this is your “working directory”
  • Go to Canvas and download the Practical 4 data files
  • Unzip the files into your working directory
  • For optional assignment data, go to https://data.cdrc.ac.uk/geodata-packs and download the Index of Multiple Deprivation (IMD) 2019: Liverpool and Internet User Classification 2018: Liverpool data packs. The latest 2023 dataset of GB Foodstore locations and the Z-scores for the 2018 IUC are also available from Canvas.

2 Geodemographics and consumer characteristics (Week 7)

2.1 Internet User Classification map of Liverpool

In the first part of this practical we will explore the Internet User Classification (IUC) data for Liverpool. The IUC is created from over seventy measures selected from survey and lifestyle data, alongside census and infrastructure performance statistics.

  • Locate the IUC data within your working directory (Practical 4 > IUC 2014 Liverpool > E08000012 > E08000012_Liverpool.shp). Load E08000012_Liverpool.shp onto the map interface by dragging and dropping the shapefile. Alternatively, click on the Add Vector Layer button and navigate to your working directory.

In order to display the Internet User Classification, we first need to familiarise ourselves with the IUC User Guide (included in the CDRC IUC Geodata Pack) and then with the structure of the associated database, so we can decide which field is best for displaying our data. Follow the steps below:

  • Right click the layer name and select Open Attribute Table
  • Match the names of columns against the IUC_User_Guide (it should be in your CDRC IUC Geodata Pack) and try to make sense of them
  • Decide which column would be the most sensible to display for an IUC geodemographic map of Liverpool.

In fact, we are going to create two maps: one showing aggregated classifications based on the IUC supergroup, and the other showing disaggregated classifications based on the IUC group.

  • Right click the layer and select Properties
  • Go to the Symbology tab and select Categorized from the dropdown menu
  • Then from the Value dropdown menu select supgrp_nm and press the Classify button
  • Press OK

This will create your first IUC map for Liverpool, well done! However, those default colours do not look great, do they? In this instance, we are going to use ColorBrewer palettes to display the different IUC groups and then we’ll remove the black LSOA boundaries.

  • Go to Color ramp in the Symbology tab and select from the dropdown menu Create New Color Ramp
  • Select Catalog:ColorBrewer from the drop down menu, set number of Colours to 4 and use Accent as your Scheme name
  • Press OK twice
  • Go back to Symbology in the Properties window and click on the Symbol box, select Simple fill and set the Stroke Style to No line
  • Lastly, change the layer’s name to IUC supergroup

Your map aggregated by IUC supergroup should look similar to the one shown below:
IUC geodemographic map

  • Now, create a copy of the IUC supergroup layer (right click on IUC supergroup > Duplicate Layer) and create a map of IUC in Liverpool using the IUC group variable (grp_nm) rather than the supergroup.

(Please note that you may need more colour schemes than those available by default in ColorBrewer. One option is to design your own symbol display, such as using a line pattern fill or point pattern fill for your symbology. Importantly, the colour schemes used in both maps should correspond. Can you think of why?)

  • Name the layer: IUC group
  • In Symbology, double click the symbol of each individual classified group name and explore different Fill style options (within Simple fill) such as line pattern fill or point pattern fill
  • When you display them on a map make sure that the colour scheme makes sense. For instance, if supergroup 1 is displayed in red, all corresponding nested groups (1a, 1b and 1c) could also be displayed as different shades of red
  • List the groups in the right order (1a, 1b, 1c, 2a…) and name them including their respective group code
  • Here is an example of a map displaying the IUC group. Can you comment on it? Does the colour scheme match that of the IUC supergroup map? IUC group map

Comment on the spatial patterns of internet use in Liverpool

2.2 Online shopping prevalence (theoretical background)

Nationally, the rate of online shopping equated to 53% when the research was conducted; however, there were differences between the IUC groups, as some customers are more likely to shop online than others. For example, groups 4c (low density but high connectivity), 4b (constrained by infrastructure), 4a (e-fringe) and, to an extent, 2a (next generation users) are the most likely to engage in online shopping, whereas 3a (uncommitted and casual users), 1b (e-marginals: not a necessity) and 3b (young and mobile) have lower than average propensities, as shown in the plot below. IUC ggplot

So, in this part of the practical, we will examine the prevalence of online shopping and display the areas in Liverpool with the highest prevalence. In order to do so, we’ll need to do some variable recoding; in other words create a new variable where 1 is associated with higher prevalence of online shopping and 0 with lower. Do the following tasks (Video 4.1):

  • Refer to the pen portraits of different IUC groups, available from the IUC User Guide

  • Open the Attribute Table of the IUC_group layer

  • Click on Open field calculator and make sure that the Create a new field box is ticked

  • In the Output field name type: Online_Shop

  • Choose from Functions window the Conditionals and then the CASE conditional (Check when and how we use CASE conditionals - there is some useful info provided in the window on the right hand side…)

  • Create a statement in the following format:
    “WHEN condition THEN the assigned value is ‘1’, otherwise (ELSE) the assigned value is ‘0’”. In our case we assign the value of 1 to groups 4a, 4b and 4c (high online shopping prevalence, as shown in the graph above), and all other IUC groups (lower prevalence) will be assigned the value of 0. Give it a go and then check your conditional statement against the picture below (an example expression is also sketched after this list):
    Recoding conditional

  • Click OK to create the new variable

  • Check the Attribute table - a new column showing our binary variable (0 or 1) should have been added
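
An example CASE expression for the Online_Shop field is sketched below (the group code field is assumed to be grp_cd - check the exact field name in your Attribute table):

CASE
WHEN "grp_cd" IN ('4a','4b','4c') THEN 1
ELSE 0
END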

Now, we will display the new variable and then create a map that can be printed or published; more specifically a map showing the LSOAs with the highest propensity for online shopping. Follow the steps below:

  • Right click the IUC_group layer and click on Duplicate
  • A copy of the layer has been created; name it Online shopping prevalence
  • Then, following the steps used earlier to create the IUC maps, classify the newly created binary variable so that both ‘low prevalence’ and ‘high prevalence’ areas are clearly visible
  • Save your QGIS map

For your map to be publishable, you need to add a legend, scale bar, north arrow, and possibly a title. By utilising the skills from the previous practicals, you should be able to complete it on your own. Nevertheless, the key steps are listed below:

  • Open New Print layout from the Project menu and name it: Online shopping prevalence, click OK
  • Adjust the scale to 1:120000 or similar in the Item properties tab
  • Add Legend, Scale Bar and North arrow to your map
  • Export your map using Export as Image button from the Layout menu; hopefully it looks something like the one below. (You can add labels using the RetailCentres_Liverpool layer from section 3.1 and Liverpool boundary from Practical 2)
    IU_prevalence

However, creating only a binary variable masks some variance present in our data, so in this step, we will create an ordinal variable for which the values are ordered (the online shopping prevalence is captured from high to low). This approach allows us to capture more variance in online shopping behaviour, and if desired, we could use this variable to run a regression model.

  • Open the Attribute Table of the IUC_group layer
  • Click on Open field calculator and make sure that the Create a new field box is ticked
  • Name the Output field: IUC_ordinal (there is a 10 character limit with column names so it is okay if the name is cut off)
  • Choose Conditionals from the Functions window and then CASE (an example statement is sketched after this list)
  • Assign the highest value (‘4’ - high online shopping prevalence) to the following IUC groups: 4a, 4b and 4c
  • Assign ‘3’ (above average prevalence) to groups 2a and 1c
  • Assign ‘2’ (below average prevalence) to groups 1b and 3a
  • Assign ‘1’ (low prevalence) to the remaining IUC groups
  • You may find it useful to create another (text) column to record the names (high prevalence, above the average etc.) rather than just relying on the numeric values
  • Name the layer Online Shopping Ordinal
  • Comment on the spatial variation; how does it differ from the binary variable created previously, and is spatial autocorrelation likely?
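
An example CASE expression for the IUC_ordinal field is sketched below (again, the group code field is assumed to be grp_cd - adjust it to match your Attribute table):

CASE
WHEN "grp_cd" IN ('4a','4b','4c') THEN 4
WHEN "grp_cd" IN ('2a','1c') THEN 3
WHEN "grp_cd" IN ('1b','3a') THEN 2
ELSE 1
END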

2.3 IMD income domain map

  • Open the IMD folder in your working directory and load the shapefile for Liverpool (E08000012)

  • Name the layer IMD2015

  • Go to Properties open the Symbology tab and select Graduated from the drop down menu.

  • In the Value field choose income, type 5 in the Classes field and choose Equal Count as your Mode

  • Press Classify and then OK buttons; this will close the dialog and show you the map.

  • Again, render the image by using ColorBrewer and removing the LSOA boundaries

  • Can you specify the threshold defining the lowest quintile of the IMD Income domain for Liverpool?

  • To find this out, go to the Histogram tab within Symbology. Click Load Values and check the thresholds and distribution of the Income domain

By now, you should have created a map of Income deprivation domain quintiles for Liverpool which looks something like the one below: IMD2015_quintiles

  • What do terms ‘quintile’ and ‘quantile breaks’ mean?

  • What is the mean value of the Income domain, and what is its standard deviation?

  • There are other classification methods (Modes) available in QGIS; try them and see what difference they make to the output map

  • Which method may be the most appropriate in this case?

Your next task involves displaying the LSOAs within the lowest IMD quintile and then checking visually whether there is an overlap between the IMD Income domain and the propensity for online shopping. Think about how you might proceed with this analysis… Give it a go; however, if you are stuck, follow the steps below:

  • Go to Attribute table of the IMD2015 layer and click on Select features using an expression
  • Choose Fields and Values from the functions tab
  • Specify all polygons that are within the lowest quintile (use a simple expression such as “income” <= 0.11)
  • Click Select Features and then Close buttons
    IMD2015_Expression

- How many polygons have you got selected?

Now, check whether there is any significant overlap between the LSOAs with the highest prevalence of online shopping and the least deprived IMD Income quintile areas in Liverpool.

  • Open Attribute Table of the Online shopping prevalence layer and then select all the LSOAs with the value of 1. You can do this by creating another simple expression e.g. “Online_Shop” = 1
  • Run a Select by Location from the Vector > Research Tools menu with the following setup:
    1. In the Select features from - choose the Online shopping prevalence layer
    2. Choose from the dropdown menu Contains as your Spatial Query method
    3. In the By comparing to the features from choose your IMD2015 layer and again ensure that only the selected features are used
    4. Choose Selecting within current selection from the Modify current selection by drop down menu
  • Click Run and then the Close buttons. Check how many LSOAs have been selected this time.

If we wanted to quantify the relationship between the Online shopping prevalence and the lowest quintile of the IMD2015 income variable we could divide the number of overlapping LSOAs by the number of LSOAs with highest online shopping prevalence, which in our case is 39/60 = 0.65.

So, we could conclude that in Liverpool there is a relatively strong correspondence between the two variables, as 65% of the LSOAs with the highest prevalence of online shopping fall within the lowest quintile of the IMD Income domain (the least income-deprived areas).

2.3.1 Basic R - statistical significance and relationships between variables (optional)

A more robust (statistical) analysis could also be done to explore the relationship between all IMD domains (https://www.gov.uk/government/publications/english-indices-of-deprivation-2015-technical-report) and the online shopping propensity variables. First, we create and then export a .csv table containing the different variables; second, we perform a correlation test or even run a regression model to test their statistical significance. This can be done in any statistical package, but we will use the legendary “R”.
- Go to Vector > Geoprocessing Tools > Intersection
- Choose the layers of interest, such as Online Shopping Ordinal and IMD2015 (using the Population layer from Practical 2 would also be useful), as your inputs and save the output to your working directory
- Name the output Intersect
- Once Intersect appears in your Layers Panel in QGIS, export the Intersect layer to your working directory as a .csv file (Right click > Export > Save Features As… > Comma Separated Value (CSV) as your format). Name it Intersect.csv

Note: If you’re having trouble with the previous step or your results differ from those used in here, you can use the pre-made .CSV file ‘Intersection.csv’ table for the next steps.

  • Open RStudio and follow the steps below:
# Set your working directory
setwd("M:/Liverpool/Teaching/Teaching 2023-24/ENVS609/Practicals/Practical 4")

## Tip: this has to be the path to YOUR working directory, so a simple copy and paste of the above path won't work for you as this is my working directory ##

variables <- read.csv('Intersection.csv')  # use 'Intersect.csv' if you exported your own table

# Check the structure of your dataset (this shows all variables, their respective class and examples in your dataset)
str(variables)

Now you can run a simple correlation test between various variables. R can perform correlation with the cor() function. Built into the base distribution of the program are three routines: for Pearson, Kendall and Spearman rank correlations. The default method is "pearson", so you may omit this if that is what you want. First, let’s check whether there is a correlation between the ‘income’ and ‘Online_Shop’ variables.

Note that R is case sensitive, so always check whether a variable name uses lower- or upper-case letters, otherwise you’ll get an error.

cor(variables$income, variables$Online_Shop)
# you can plot the relationship between the two variables by using the plot() function
plot(variables$income, variables$Online_Shop)

So there is a negative relationship between the two variables (Pearson coefficient = -0.53), but a correlation coefficient is generally only half the story. Most likely, you will also want to know if the relationship is significant. The cor() function in R can be extended to provide the significance testing required; the function is cor.test(). We are going to test the IUC_ordinal variable this time.

cor.test(variables$imd_score, variables$IUC_ordina)
plot(variables$imd_score, variables$IUC_ordina)

The above test indicates that the correlation is highly significant, as the p-value (reported as < 2.2e-16, i.e. 2.2 × 10^-16) is much smaller than the 0.05 required to reject the null hypothesis that there is no correlation between these two variables. If you want to check the correlation coefficients for a number of variables at once, create a correlation matrix by running the cor() function on the entire dataset: cor(my.variables, method = "pearson"). However, you have to ensure that your data consist of numerical variables only. There are various ways to remove the unwanted (non-numerical) variables (these are the initial 6 variables: lsoa_cd, lsoa_nm, supgrp_cd, supgrp_nm, grp_cd, grp_nm). First, we create a subset of the variables we need by keeping columns 7 to 20, which excludes the above 6 non-numerical variables.

my.variables <- variables[7:20]
str(my.variables)

Then remove the Factor/chr (non-numeric) variable (LSOA11CD) - in the dataset used to write this practical this is variable 3, but it may differ in your dataset (hopefully not). Run the code below to remove variable 3.

my.variables [,3] <- NULL
str(my.variables)

Now all your variables should be numeric/integers, so this means that you can create a correlation matrix.

# Create the correlation matrix and a matrix of scatterplots
cor(my.variables, method = "pearson")
plot(my.variables)

At times, you may need to save a scatter plot you have created. There is an easy way of doing this in R.

# Save the plot
png('correlation.png')
plot(my.variables)
dev.off()

Check your scatterplot against the one below:
correlation

Finally, it is possible to check the statistical significance of the effect that different explanatory variables have on the prevalence of online shopping (our dependent variable) by running a regression model. First, we’ll try a simple regression model with only one explanatory variable and then a multiple regression model with 3 different variables. This is done very easily in R with the lm() function:

# Run a simple regression model
IUC.lm = lm(variables$IUC_ordina ~ variables$imd_score)
summary(IUC.lm)

# Run a multiple regression model
IUC.lm1 = lm(variables$IUC_ordina ~ variables$income + variables$education + variables$living_env)
summary(IUC.lm1)

The summary command is particularly useful in this case as it produces the coefficients so we can see which factors are statistically significant. We can also see the overall R-squared value for our model (the ‘Goodness of fit’ measure of our regression model). For instance, our simple regression model explains approx. 27% of the variation in the dependent variable (IUC_ordinal).
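
If you want to extract the R-squared value programmatically rather than reading it off the printed summary, you can pull it straight out of the summary object:

# Extract the R-squared values from the fitted models
summary(IUC.lm)$r.squared    # simple regression
summary(IUC.lm1)$r.squared   # multiple regression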

- How much variation in online shopping prevalence is explained by the multiple regression model?

3 Retail catchments (Week 8)

A retail catchment can be defined as the areal extent from which the main patrons of a store or retail centre will typically be found. There are numerous ways in which catchments can be delineated, depending on the requirements for a particular study, available data, software used or the analytical capability of a practitioner. The simplest technique might be to draw buffer rings around a store or retail centre; however, such a technique is naive as it doesn’t consider geographical barriers or competition. More advanced methods referred to as ‘Gravity’ and ‘Spatial Interaction Models’ delineate catchment areas by considering the spatial distribution of competing locations and evaluating their relative attractiveness to different groups of the population.

In the remainder of this practical session, we will be creating retail catchment areas using both basic and more advanced methods. We’ll start by creating buffer rings around retail centres in Liverpool, followed by drive distance polygons. Lastly, we will examine catchment areas based on a gravity model (the Huff model in our case). The latter method incorporates an estimate of retail centre attractiveness to depict the possible impact of the retail hierarchy on catchment extents. In this case we will use predefined index values available from the data pack (‘Liv_Attractiveness.csv’).

3.1 Buffer rings (primary and secondary catchments)

  • First, load retail centres point data for the entire country GB_RetailCentres (located in the retail folder)
  • From the Vector menu choose Geoprocessing tools and then Clip
  • Fill in the window so that you have the GB_RetailCentres layer as your input, IMD2015 (or any other layer with Liverpool boundaries) as your Clip (Overlay) layer and save the new shapefile to your output folder. Name it RetailCentres_Liverpool
  • Click OK; the new layer should be automatically added to the Table of Contents (Layers Panel)
  • Remove the GB_RetailCentres layer
    Clip_Retail Centres

Next, we’ll define primary catchments, which are usually areas within a short distance from the retail centre and account for over 50% of patronage, and secondary catchments, which generally cover larger distances and represent patronage levels between 20% and 50%. We’ll use a straightforward approach — buffer rings — to create these catchments. Follow the steps below:

  • Create the Primary catchments by using a buffer distance of 2000m (go to Vector/Geoprocessing Tools/Buffer)

  • Fill in the dialog box as follows: select RetailCentres_Liverpool as your Input layer and type 2000 in the Distance field

  • Save the Output shapefile to your working directory; name it Buffer2000 (see the picture below)
    Buffers Window

  • Then create the secondary catchments by using Buffer distance of 4000m (repeat the above steps and name the new layer: Buffer4000)

  • Render the image by adjusting colours, transparency etc.

  • Add labels by going to Properties of the RetailCentres_Liverpool layer

    • Choose the Single Labels tab and use ‘NAME’ from the drop down menu
    • Explore the labels tab (e.g. change the font to Arial, size to 10, add buffer and shadow)

Note: if a name is too long, you can shorten it by switching on Toggle editing mode Toggle Edit, going to the Attribute Table and deleting part of the name (e.g. Wavertree). You should now have created a map that looks similar to the one below:
Buffers2_4k

Comment on the retail catchments computed by using the buffer rings method:
- Can the primary catchments be distinguished easily from the secondary ones?
- Is the hierarchy within the retail centres accounted for in any way?
- How do you think these representations could be improved?

3.1.1 Catchment estimation and retail hierarchy

Although the distinction between the primary and secondary catchment areas is reasonably clear, their extents are far from realistic. One of the major reasons for this is that the so-called hierarchy of retail centres has not been accounted for. Typically, such a hierarchy relates to centres’ size, attractiveness and the geographical extent of their influence, with those centres towards the upper end of a hierarchy typically offering a ‘multi-purpose shopping’ experience and, as such, drawing consumers from a wider area. Conversely, smaller town or district centres will typically serve a different function, and therefore be patronised more prevalently by local communities. Based on the Index of Retail Centres Attractiveness developed by Dolega et al. (2016) [https://www.sciencedirect.com/science/article/pii/S0969698915300412] and the retail hierarchies developed by Macdonald et al. (2022) [https://www.nature.com/articles/s41597-022-01556-3], at least three types of town/retail centres can be distinguished in Liverpool: Regional Centre (Liverpool), District Centres (Allerton Rd, Old Swan, Kirkdale), and Local Centres (the remaining centres). Note: this classification and data do not include Retail and Leisure Parks.

We can add information on retail hierarchy in Liverpool to the attribute table of the relevant shapefile. We will have to create a new variable that will depict the hierarchy. We can do this by using the “CASE” conditional function, which allows us to specify different values for each retail centre type directly within the attribute table. This approach enables efficient recoding, with each centre classified as either “Regional Centre,” “District Centre,” or “Local Centre” according to the hierarchy.

  • Go to the Attribute Table of RetailCentres_Liverpool. Go to Field Calculator. Put Hierarchy as the Output field name

  • Create a statement: CASE WHEN condition THEN result, ELSE result. In our case, we want to assign a value of 1 to Liverpool; value of 2 to Allerton Rd, Old Swan and Kirkdale and value of 3 to the remaining centres. Give it a go, however if you get stuck, try the following statement:
    CASE
    WHEN "NAME" = 'Liverpool' THEN '1'
    WHEN "NAME" = 'Allerton Road' THEN '2'
    WHEN "NAME" = 'Old Swan' THEN '2'
    WHEN "NAME" = 'Kirkdale' THEN '2'
    ELSE '3'
    END

  • Next, go to Vector > Geoprocessing Tools > Buffer

  • In the buffer window select the RetailCentres_Liverpool layer as your Input vector layer and this time click on Data defined override > Edit (next to the Distance window)

  • Create another CASE conditional statement (similar to the one above although the ‘condition’ will be different this time)

  • Create a buffer distance based on the hierarchy: 5,000 meters for Hierarchy 1, 2,000 meters for Hierarchy 2 and 1,000 meters for Hierarchy 3 (an example statement is sketched after this list)

  • Save the new Buffer rings to your working directory as Buffer_Hierarchy

  • The new buffers should be added automatically to the Layers Panel

  • Since you need to create a map of Buffer rings taking into account the hierarchy of retail centres you should make your map very clear. Try the following:

  • Adjust transparency of the newly created layer and try different colours and transparency, so the overlaps are visible

  • Label the centres and adjust the size of your points to reflect the hierarchy, as shown on the map below. (Use a CASE WHEN condition THEN result ELSE result END statement in the layer’s Properties > Size > Data defined override > Edit field to display different point sizes, e.g. Hierarchy 1 - point size 5, Hierarchy 2 - point size 3 and Hierarchy 3 - point size 2. Analogously, you could also use the CASE conditional function to adjust the size of labels.)
    BuffersHierarchy
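
An example data defined override expression for the buffer distance is sketched below; it assumes the Hierarchy values were stored as text (‘1’, ‘2’, ‘3’), as in the earlier statement - drop the quotes around the numbers if your field is numeric:

CASE
WHEN "Hierarchy" = '1' THEN 5000
WHEN "Hierarchy" = '2' THEN 2000
ELSE 1000
END

The same pattern can be used for the data defined point size (returning 5, 3 and 2) and for the label sizes.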

3.2 Drive distances

Despite the consideration of retail centre hierarchies, there are still significant limitations to the buffer approach, as it does not account for real-world barriers such as rivers, lakes and railway tracks. A more accurate approach is to consider road distances and drive time techniques. Both techniques are still popular amongst the major retailers, so we will make use of them too. This method is relatively easy to execute in some GIS packages such as ArcGIS Pro; in QGIS it could be done using the pgRouting extension, but that isn’t straightforward and is beyond the scope of this tutorial, so instead we will use ArcGIS Pro. Follow the steps below:

3.2.1 Setting up ArcGIS Pro

First we need to set up ArcGIS Pro and then we’ll do the network analysis:
- Open ArcGIS Pro by going to Start menu > ArcGIS Pro
- Click on the Map icon - a new map/project will open

  • Go to Project > Licensing > Configure your licensing options and then check the box next to Network Analyst
  • Then go back to your ArcGIS project/map and in the Catalog window, located on the right, right click on Folders and then Add Folder Connection
  • Browse to your Practical 4 data > Retail folder, click OK

The Retail folder should appear now in your Catalog window.

3.2.2 Creating service areas

The ArcGIS setup is done now, so you’re ready to start your analysis

  • Add the RetailCentres_Liverpool layer by dragging and dropping the .shp file to the Table of Contents
  • Then add the RoadLiv layer to the Table of Contents. (If you use a different version of ArcGIS Pro than 3.3.0, you’ll need to add the RoadLiv_ND.nd layer, which is your Liverpool road network).

Note: If the RetailCentres_Liverpool has been saved as a geopackage, resave the layer in QGIS as a shapefile first, and then add it to your Practical 4 data > Retail folder

  • Click on the Analysis tab and then Network Analysis > Service Area
    Service Area

  • A new layer called Service Area will be added to the Table of Contents

  • Click on the Facilities and then select the Service Area Layer tab located at the top of ArcGIS toolbar

  • Add the retail centre locations by clicking on the Import Facilities icon and choose RetailCentres_Liverpool as your Input Locations, click Apply and then OK

  • Once the retail centres locations have been added to the Facilities, set your Cutoffs to 1 (the distance units are preset to km) and set your Mode as Driving Distance

  • By now, the NetworkAnalyst > Service Area Layer window should look something like the picture below:
    Service Area Layer

  • Play with other settings e.g. use Polygons or Polygons and Lines under the Polygons tab drop down menu

  • Click the Run button

  • This will generate 10 service areas (retail catchments), each delineated for 1km drive distance

  • You can also generate multiple drive times/distances by specifying two or more different distances in the Cutoffs window (e.g. type 1, 2 and hit the Run button again)

  • The output below shows the map of service areas for Liverpool Retail Centres delineated for 1km and 2km
    ServiceAreas

So if you want to export for example the 1km service areas, do the following:

  • Right click on the Polygons layer > Data > Export Features and save it to your Retail or any other folder you save your work to.
  • Name it Liv_DriveDist.shp.
  • Now, you should be able to use these drive polygons in QGIS, just drag and drop the newly created file to your QGIS Practical 4 project

3.2.3 Creating hierarchical drive distances - (optional)

So this is how the drive or distance polygons are generated. At the moment we have generated 10 polygons of 1km and 2km, however if we wanted to account for the retail hierarchy, we would also need to generate a new service area of 5km for Liverpool city centre, three service areas of 2km for the district centres with the hierarchy of 2 (Old Swan, Allerton Rd and Kirkdale) and six service areas of 1km for the remaining centres.

  • Using the above guidelines, generate variable distance service areas to account for the hierarchy of Liverpool’s town centres and save them to your working directory. You may need to create separate service areas for each hierarchy and then merge them into one file. Name it Liv_Hierarchy_DriveDist
  • Next, overlay the generated service areas (Liv_Hierarchy_DriveDist) onto the Buffer_Hierarchy layer
  • Create a printable map showing the spatial extent of both types of retail catchments
  • Begin by opening a new Print Layout; add a north arrow, scale bar and legend.
  • Once ready, export your map as a picture image and answer the questions below:

- What are the limitations of these approaches?
- How could these be addressed?

3.3 Huff catchments

Another way of estimating retail catchments (much more complex and arguably more accurate) is to use a gravity model, also referred to as a Spatial Interaction Model (SIM). We will use a probabilistic SIM, more specifically the Huff model. It estimates the probability that each domicile (LSOA in our case) will use a particular retail centre by taking into account road distances and competition between centres, as well as their attractiveness and position within the retail hierarchy. As such, this information can be utilised to delineate primary, secondary and tertiary catchment areas for each retail centre at the LSOA level of granularity. The results, referred to as Huff probabilities, are available in the Practical 4 data package; however, if you wish to compute them yourself, the code is below.
Note: You won’t need Huff catchments for your assignment
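
For reference, the standard Huff formulation estimates the probability that a consumer at origin i patronises centre j as

P_{ij} = \frac{A_j^{\alpha} \, d_{ij}^{-\beta}}{\sum_{k} A_k^{\alpha} \, d_{ik}^{-\beta}}

where A_j is the attractiveness of centre j, d_ij is the road distance between origin i and centre j, alpha is the exponent applied to the attractiveness score and beta is the distance decay exponent - the same parameters used by the huff_basic function in section 3.3.1.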

3.3.1 Calculating Huff probabilities (optional)

I recommend attempting this optional section only if you have strong R skills and plenty of spare time. Some of the R packages used have been removed from the CRAN repository, so you would need to obtain them from the archives (https://cran.r-project.org/src/contrib/Archive/rgeos/; https://cran.r-project.org/src/contrib/Archive/rgdal/).

3.3.1.1 Load Huff tools and data

The huff_basic function, which is used to apply the Huff model, is part of the huff-tools package, and for that reason we need to import huff-tools into R. The “huff-tools.r” script is available in the retail folder, so you should download the ‘Huff’ file and save it in your working directory. Once you are ready, follow the steps below:

  • Open RStudio from All programmes
  • Set your working directory to where the Huff tool and your data are stored. (You can copy and paste the code below into RStudio; however, you will have to set the path to your own working directory)
  • Click Run
# Set your working directory
setwd("M:/Liverpool/Teaching/Teaching 2023-24/ENVS609/Practical 4/Practical 4 data/Practical 4 data/retail")
# Import the huff-tools library
source("huff-tools.r")

The following libraries should be loaded automatically:

library(rgdal)  
library(rgeos)  
library(igraph)  
library(FNN)  
library(dplyr)

Note: If you get an error saying that a package is unavailable (e.g. Error in library(rgdal) : there is no package called ‘rgdal’), you will need to install the missing packages by copying the following code and clicking Run (remember that rgdal and rgeos are no longer on CRAN, so they may need to be installed from the archive links given at the start of section 3.3.1):

install.packages("rgdal")
install.packages("rgeos")
install.packages("igraph")
install.packages("FNN")
install.packages("dplyr")

Next, we will use the ‘read.csv’ function to import the comma-delimited files (.csv table) into R. In most cases, the function can be applied by specifying just the name of the csv file as it assumes by default that the csv file has a header, and that the field separator character is ‘,’. These and other parameters can be modified if required (type: ‘?read.csv’ for more details). The function returns a data frame object and can be used as follows:

# load the data from your retail folder
attr_score <- read.csv('Liv_Attractiveness.csv')
distances <- read.csv('Liv_distances.csv')

We have now created two data frame objects. The “attr_score” data frame stores the attractiveness score of the retail centres (including ranking based on the hierarchy) while the “distances” object has the pre-calculated road distances between the centroids of each LSOA and the boundary of the retail centres.

A summary output of the data can be obtained with the ‘str’ and ‘summary’ functions. The former function outputs the name and type of each variable as well as the number of rows, while the latter function produces summaries depending on the type of the variable (e.g. quartiles, minimum and maximum values for continuous variables).

str(attr_score)
summary(attr_score)
str(distances)
summary(distances)

3.3.1.2 Join tables

The next step involves joining the two tables we have created: distances and attr_score. In R there are two commonly used functions that perform a data join: merge and inner_join. In this practical we will use the latter. The inner_join function uses a common field to merge the tables; in our case it is the destinations_name field. Use the code below (name the output huff_input):

huff_input <- inner_join(distances, attr_score)
# Note: If the column names differ use the 'merge' function  and the 'by' argument to specify  the names
str(huff_input)

So the output is a long table with 328,430 rows and 6 columns.
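
If the key column has the same name in both tables (destinations_name here), the equivalent base R call would be something along these lines:

# Equivalent join using base R's merge(), naming the key column explicitly
huff_input <- merge(distances, attr_score, by = "destinations_name")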

3.3.1.3 Assign beta values

The next step is to assign different values to beta parameter based on the attractiveness rank.

This is easily done with an ifelse function. This function returns an object which is filled with elements according to whether or not a condition is met. It is also possible to nest two or more ‘ifelse’ statements to evaluate more complex conditions.

In this practical we’ll use the retail centre hierarchy to develop the beta parameter (distance decay exponent) of the Huff model, assuming that retail centres at the top of the hierarchy should have a lower distance decay parameter (i.e. their attractiveness is reduced with distance but at a slower rate than in the case of the small centres).

For the major retail centres, which serve extensive catchments (Rank 1) use beta = 1.4, for the secondary retail centres, in our case the district centres (Rank 2) beta = 1.6 and for the smallest centres (Rank 3) beta = 1.8.

huff_input$beta <- with(huff_input, ifelse(Rank == 1, 1.4,
                                           ifelse(Rank == 2, 1.6,1.8)))

Check whether the beta values have been allocated as required

# Display the first six values for Rank and beta fields
head(huff_input[,c("Rank", "beta")])
# Display the last six values for Rank and beta fields
tail(huff_input[,c("Rank", "beta")])
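
A quick cross-tabulation of the two fields is another way to confirm that each Rank has received the intended beta value:

# Cross-tabulate Rank against beta
table(huff_input$Rank, huff_input$beta)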

It should be noted that the beta values can be altered, depending on the requirements, estimation technique etc. The values used above were found to produce the most appropriate catchment areas at the national level. These may, of course, vary at the city level for a number of reasons: a) the number of competitors is smaller within a city, b) the assumption of boundary-free modelling is violated, and c) the retail hierarchy within a single city may differ from the national one.

3.3.1.4 Run Huff model

We will use the huff_basic function to estimate Huff patronage probabilities. The huff_basic function takes six arguments, namely:

  1. A list of unique names for the destination locations.
  2. A list of attractiveness score for the destination locations.
  3. A list of unique names for the origin locations.
  4. A list of pairwise distances between origins and destinations.
  5. A list or scalar for the beta exponent of distance.
  6. A list or scalar for the alpha exponent of the attractiveness score.

The first four arguments are required, while the last two are optional, and if not provided, a default value will be used (i.e. alpha = 1, beta = 2). If you examine the huff_input data frame that we created earlier, you’ll realise that we have 5 of the required variables (names for destinations and origins, attractiveness score, distance and beta values), so we only need an alpha value. In this practical we’ll use the default value of 1.

3.3.1.5 Calculate Huff probabilities

huff_probs <- huff_basic (huff_input$destinations_name,
                          huff_input$AttrScore,
                          huff_input$origins_name,
                          huff_input$distance,
                          huff_input$beta,
                          alpha = 1
                        )
# display data summary
str(huff_probs)

The output of the huff_basic function is a table in long format (all pairwise probabilities between origins and destinations).

3.3.1.6 Extract the highest Huff probabilities for each LSOA

To extract the highest probabilities for each LSOA we can use the select_by_probs function. The function will essentially group the data by LSOA name, sort each of the group entries by Huff probabilities (from higher to lower) and then extract the top number of entries, where the number is specified as second argument in the function. So to extract the highest entry (number = 1) for each LSOA, we can run the following R snippet:

# Extract probabilities
sele_probs <- select_by_probs(huff_probs, 1)

3.3.1.7 Output the result as shapefile

In order to display the extracted Huff probabilities, we will create a shapefile by merging our sele_probs data frame with a shapefile of LSOAs provided for our study area. First we import the shapefile into R with the readOGR function. The function accepts as first argument the directory where the vector dataset is located and as second argument the name of the dataset - note, there is no need to add a file extension.

# use YOUR working directory for the `readOGR` function
origins <- readOGR("M:/Documents/Liverpool/Practicals2017/retail", "Liv_lsoas")

Then merge the spatial data object with our data frame.

# Merge with spatial object of LSOAs
origins@data <- data.frame(origins@data, sele_probs[match(origins@data$lsoa_cd, sele_probs$origins_name), ])

#delete column 1 as it is a duplicate of the `origin locations`
origins@data[,1] <- NULL

The final step of the modelling is to save the data as a shapefile in your working directory. This can be done with the writeOGR function. First create a new folder with the ‘dir.create’ function, then save the results.

dir.create("Huff_results") # Create a new folder named Huff_results
# Save the origins shapefile in the "Huff_results" folder
writeOGR(origins, "Huff_results", "huff_probs", driver = "ESRI Shapefile")

# If you get a warning message, you can ignore it
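
Since rgdal and rgeos have been retired from CRAN, a modern alternative is the sf package. A minimal sketch of the import-merge-export steps (assuming the same folder and column names as above) might look like this:

# Alternative using sf instead of rgdal (sketch only - adjust the path to your own working directory)
library(sf)
origins_sf <- st_read("M:/Documents/Liverpool/Practicals2017/retail", layer = "Liv_lsoas")
# Join the highest Huff probabilities onto the LSOA polygons
origins_sf <- merge(origins_sf, sele_probs, by.x = "lsoa_cd", by.y = "origins_name")
# Write the result out as a shapefile
dir.create("Huff_results", showWarnings = FALSE)
st_write(origins_sf, "Huff_results/huff_probs.shp")

Note that the rest of the huff-tools workflow above still relies on rgdal/rgeos, so this is only a sketch of how the final export step could be reproduced without them.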

3.4 Mapping Huff catchments

If you managed to calculate the Huff probabilities - well done, as this was really advanced spatial analysis in action! Otherwise, you can use the Huff probabilities from the Practical 4 data pack. You will now, first, display the Huff probabilities using various thresholds and, second, create catchments by allocating the LSOAs to each retail centre based on these probabilities.

  • Open the huff_probs shapefile in QGIS
  • Explore Attribute table of the layer
  • Navigate to Properties > Symbology window and display the hff_prb column (Graduated) as two different classes using the following thresholds: below 0.50 (secondary catchments) and above 0.50 (primary catchments)
  • In order to see the extent of a particular centre run a simple query in the Attribute table
  • Click on Select features using an expression and create an expression by specifying the destination name (“dstntn_” = ‘name of a town centre’)

The picture below shows the catchment for Smithdown Rd, which is highlighted in yellow SmithdownRd Catchment

3.4.0.1 (Optional)

  • Re-run the model and do not disaggregate the beta value by the hierarchy of town centres (assign one beta value = 2 to all the ranks, save the results in “Huff_results1” folder)
  • Compare the extent of catchments for both models; is there any difference in spatial extent of the catchments?
  • Which results seem to be more realistic and why?

3.4.1 Creating Huff polygons

  • Next, create polygons showing the extent of catchments for individual retail centres in Liverpool
  • This can be easily done by using the Dissolve function from Geoprocessing tools
  • Use huff_probs as your Input vector layer
  • Click … next to Dissolve field(s) and select dstntn_ and click OK
  • Save it to your working directory and name your output: Huff_catchments and click Save and then Run
  • Inspect the newly added layer; render the image (adjust transparency, change line colour and thickness)
  • Overlay the Huff_catchments onto the buffer rings and compare the extents
  • Create a map showing these differences
    Buffer_Huff

Answer the following questions:

  • What are the main differences in the catchment extents compared to the previous method (buffer rings)?
  • How do the differences affect total patronage levels?
  • What are the strengths and limitations of each of these techniques?

4 GIS analysis (Week 9)

4.1 Spatial joins

Spatial joins are widely used for solving real-world problems in GIS. This functionality is available through the Join Attributes by Location tool from Vector > Data Management Tools. We are going to use this function to identify the retail centres most exposed to consumers who predominantly shop online. More specifically, we are going to apply a spatial join to the retail catchments (both Buffers and Huff) and the IUC targets (the Online Shopping prevalence layer). Follow the steps below:

  • Make sure you have saved Huff_catchments, Buffer_Hierarchy and Online Shopping prevalence layers to your working directory

  • Open a new blank map in QGIS

  • Load the Huff_catchments, BufferHierarchy and Online Shopping prevalence layers

  • Open the Attribute table of the Online Shopping prevalence layer and examine all the attributes. Since we need to identify those retail centres that are most exposed to consumers with a high prevalence of online shopping, we can use the binary variable that was created earlier - Online_Shop. This can be joined with the catchment layers

  • Go to Processing Toolbox (by clicking Toolbox or right-click anywhere near the top of QGIS in the grey area and enabling Processing Toolbox). Then type Join Attributes by Location and double click on Join attributes by location (summary). This will allow you to generate various statistics for the joined layer.

  • In the dialog window, select BufferHierarchy as your input layer - Join to features in and Online Shopping prevalence as your By comparing to

  • Use Intersect as your Where the features (geometric predicate) option

  • Click the … button next to Fields to summarize and select Online_Shop. Click OK.

  • Save your Output shapefile (name it SpatialJoin_buffer) to your working directory and click Run and Close

  • A new layer should be added automatically to your Layers Panel

  • Open the Attribute table of the new layer and see what columns have been added

  • All possible statistics are calculated, but the stats of interest to us are the Sum and Count, so you can delete any other newly created columns.

Note: If your column names have been saved as, e.g. Online_shp1, then the columns you will need to keep are Online_Shop and Online_shp5. These are the count and sum columns, respectively.

  • To do so press the Delete column button DeleteCol and select the required field from the Delete Attributes dialog
  • Next you need to decide how the new columns can be used to identify the retail centres most exposed to consumers who predominantly shop online…

One way is to display the Online_Shop_sum column, which shows the total number of LSOAs characterised by higher online shopping prevalence within each retail catchment. Using the buffer rings as retail catchments, we can see that Allerton Rd (with a sum of 31) is the most exposed retail centre in Liverpool, followed by Old Swan, Gateacre and Liverpool.

  • Create a map showing each catchment’s exposure to consumers who predominantly shop online
  • Use four classes of exposure: Low, Medium, High and Very High
  • Remember to include labels, North arrow, scale bar and legend
  • Compare your map against the one below:
    Exposure Buffers

Nevertheless, the results shown above only account for the number of LSOAs with high exposure. A more comprehensive way of identifying the centres most exposed to consumers who predominantly shop online is to look at the number of LSOAs with high exposure relative to the total number of LSOAs within each catchment. This can also easily be calculated within QGIS.

  • All you need to do is:

  • Go to Properties > Symbology of the SpatialJoin_buffer layer and change to Graduated

  • Click Expression next to the Value drop-down menu

  • In the expression window, type Online_Shop_sum / Online_Sho_count (using the exact names of the sum and count fields as they appear in your Attribute table)

  • Click OK and style the map

  • By dividing the sum column by the count column we display the proportion of LSOAs with higher exposure within each catchment, instead of a static column value

  • Create another map and compare it to the previous one
    Risk buffer

  • Finally, repeat all the above steps to perform a spatial join for the Huff_catchments

  • Create the two equivalent maps for Huff catchments

  • Compare the results against those obtained for Buffer catchments

  • Comment on the differences and similarities
    Risk_Huff

5 Siting a discount supermarket (Week 9-10)

Since the global economic crisis of 2008-09, the market share of hard discounters in the UK has been growing, often at the expense of the major grocery retailers known as the Big Four (Tesco, Sainsbury’s, Asda and Morrisons). This trend has been reinforced by the recent cost of living crisis, with Aldi becoming the UK’s fourth biggest supermarket, overtaking Morrisons in market share. The aim of this exercise is to find the most suitable location(s) (there could be more than one) for a discount supermarket in Liverpool using GIS analysis. Typically, an analyst would use sophisticated location analysis tools drawing on traffic pattern information, demographics, lifestyle data and footfall, and would carry out an analysis of the competitors. A location analysis often requires looking at footfall generators (in particular when siting convenience stores), as retailers in the neighbourhood may draw customers from nearby employment sites. These could include industrial or office parks, schools, colleges and hospital complexes.
In the case of a discount supermarket, it is important to choose the right neighbourhood as typically, the most affluent catchments tend to shop more in the upmarket stores (not heavy discounters). First, we will explore the potential competition and different catchment estimation techniques.

5.1 Store catchments

  • Add the Liverpool_Foodstores shapefile (located in the retail folder) to the Layers Panel
  • Check the Attribute table
  • Create buffer catchments based on the Store_Type variable (e.g. c-stores 500m, supermarkets 1500m)
  • In order to create the buffers, you’ll need to create a CASE conditional statement as in the previous exercise (section 3.1.1); an example is sketched after this list
  • Do the buffer catchments provide a clear picture?
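
An example data defined override expression for the buffer distance is sketched below. The category label used here is an assumption - open the Attribute table first and use the exact values stored in the Store_Type field:

CASE
WHEN "Store_Type" = 'Supermarket' THEN 1500
ELSE 500
END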

As we want to find the most suitable location for a discount supermarket, we can make an assumption that there will be more competition from the supermarkets than convenience stores. As such, for our analysis we can remove the convenience stores from our database.

  • Display only the supermarkets and their catchments (remove/mask the convenience stores from your data by creating a spatial query for the Liverpool_Foodstores layer: Properties > Source > Query Builder > "Store_Type" = 'supermarket'). Save the shapefile to your working directory and name it Supermarkets.

However, there are 73 supermarkets in Liverpool, so creating that many buffers is not very useful as the map would be very cluttered. Therefore, in our analysis we will focus only on the direct competitors - the heavy discounters, Aldi and Lidl.
  • Create a new layer for Aldi and Lidl supermarkets by using the Select features using an expression tool in the Attribute Table ("fascia" = 'Aldi' OR "fascia" = 'Lidl') and name it Discount supermarkets. How many discounters are there in Liverpool?
  • Then create 1500m buffers for the Discount supermarkets layer and name it 1500buffer_Discounters
    Buffer_Discounters

You could also create drive distance catchments for all Liverpool discount supermarkets by following the steps from section 3.2:
- Go to ArcGIS Pro, load in the Discount supermarkets shapefile, then use Network Analyst to create service areas for the discount supermarkets in Liverpool; use a 1.5km distance and name them DiscountSup_DriveDist
- Compare the extent of the buffer catchments vs. the drive distances
Buffer_DriveDist

So, it can be concluded that although there is an overlap, the drive distance catchments are more accurate than straight line buffers. In addition, there are some areas that do not fall within the delineated catchments, which means that these areas aren’t effectively served by the analysed discount supermarkets. From a competition perspective, these areas could be potential locations for new discount supermarkets. However, this would depend on demand-related factors like total population, affluence etc.

Locating a new discounter in the vicinity of a larger supermarket doesn’t necessarily have to be viewed as a disadvantage, provided, of course, that the level of market saturation is not very high and you know the catchment’s demographics. One way of obtaining new insights to decide where to locate a store is to examine analogue stores. As such, we will examine the catchments of the other discount supermarkets in Liverpool.

5.1.1 Catchment affluence

Understanding the potential consumers that discount supermarkets are likely to target, and their spatial distribution, is vital here. Research shows that the predominant locations are either city/town centres or other urban areas, especially those with less affluent catchments. So let’s explore that - we will examine the relationship between Income deprivation in Liverpool and the discount supermarket locations. This exercise will use the 1500m buffer catchments to obtain the statistics.

  • Create a choropleth map of the Income deprivation variable from the IMD 2015 layer
  • Using the Select by Location tool (Vector > Research Tools), select all the LSOAs that intersect with the 1500buffer_Discounters
  • Go to the Basic statistics for Fields tool and check what is the mean/median Income deprivation value for the selected LSOAs (is it 0.259 mean/0.27 median with StdDev 0.135?)
  • Now check the mean/median value for Income deprivation within the boundary of Liverpool City. It should be 0.256/0.255 respectively; to put that into a broader context, the mean value for Great Britain is 0.15. Also, the StdDev of 0.135 suggests that there is a relatively large degree of variation within those catchments. IMD_Discount

So based only on this very simple analysis we can make an assumption that the new discount supermarket should be located in a neighbourhood that is slightly less affluent than the city/regional average and has mixed demographics (population characteristics).

5.1.2 Catchment total population

  • Next, we will examine total population for our catchments (this could be extended to checking other population-related factors such as density or difference in total population between various catchments, as well as the demographic/employment composition of each catchment)

  • As in the previous section (5.1.1), select a catchment area and check its total population (this can be referred to as the potential patronage of that store). Use the Select by Location tool to select all the LSOAs in your residential/night-time population layer that intersect with the appropriate catchment area - the 1500buffer_Discounters layer

  • This will vary from around 20,000 to above 50,000 people per catchment
    Nevertheless, it is important to bear in mind that many of these catchments overlap, so in some cases we count the same residential population twice - a phenomenon also called cannibalisation. For this reason it would be useful to calculate the average population count per catchment.

  • Select all LSOAs that intersect with the catchments and divide the total population count by 14 (the number of discount stores)

  • Is the average population per catchment around 28,000? (if your average varies dramatically please ask for help)

On the other hand, the daytime population within the city centre is much higher than the residential population (ONS estimates), so combining the above results with the workplace zone population may be useful. In particular, this is important for the location of corporate convenience stores, as larger employment sites play an important role in creating demand for convenience goods retail.

5.1.2.1 Drive distance catchments (Optional)

For the purpose of this practical we have only used the buffer catchments; however, ideally for this practical and your assignment you could use drive distance catchments - it will give you more points in the assignment.
- Start by creating service areas for the Discount supermarkets layer in ArcGIS Pro and name them 1500DriveDist_Discounters
- Repeat the above analyses using the 1500DriveDist_Discounters catchments

5.1.3 Location of the new discount store

Having done some analyses, we can now decide on the most suitable location for our new discount store. It is pretty obvious from the map that there are no discount stores in north-east and south Liverpool. The map of Income deprivation suggests that both areas could be considered, although a large part of south Liverpool is quite affluent. The population distribution map suggests that a potential location for a new discounter could be around the Croxteth/Gillmoss area (north-east Liverpool), but there is also scope to locate a discount supermarket in the Garston or Speke area (south Liverpool). So we will examine the potential catchments for both locations.

  • Taking into consideration the existing road network, create 2 new points indicating the potential locations of the new discounters, one on Speke Boulevard (south Liverpool) and one on East Lancashire Rd (north-east Liverpool)
  • Call them: Discounter 1 and Discounter 2
  • Create service areas (in ArcGIS Pro) for the newly allocated stores.
    Disc_DriveDist

Please note that the LSOA level is too coarse for real-world location analysis; OAs would produce much more accurate estimates. However, we use free data only, and most of our variables are available at the LSOA level only.

  • Using the above methodology, you can examine Income deprivation levels and total population counts for the catchments of both Discounter 1 and Discounter 2 stores

  • You could also establish how many other (non-discount) supermarkets there are within, say, 2,000m - if there is another supermarket in the vicinity, it may be of benefit, but if there is more than one, the level of market saturation may already be too high.

  • Compare the results and decide on the most suitable location for your discount store

  • You have completed the last practical so well done!!!

  • You should now be ready to proceed with your assignment