1 Introduction

The growth in online consumption and the shift in consumer culture towards convenience and value have affected the health of retail centres in complex ways, potentially leading to long-term structural change. The rise of online sales was particularly noticeable during the COVID-19 pandemic, as was the shift towards value for money during the current ‘cost of living crisis’. Adapting to these trends has often involved the rationalisation of store portfolios and the adoption of new business models. One such example in the UK is Asda, whose traditional portfolio comprises large supermarket and hypermarket formats. More recently, following a significant loss of market share, Asda has been rolling out a new store format: Asda Express.

Moreover, traditional town centre ‘brick-and-mortar’ retailers have primarily focused on supply-side factors such as competition, retail mix, and vacant spaces. However, there has been less emphasis on demand-side factors, such as catchment demographics, socio-economic characteristics, and their engagement with online shopping. Understanding the geography of consumer behaviour at a small area level is crucial for understanding the vitality and viability of both retail centres and the retailers themselves. Such consumer insights form the basis for site location analysis, in conjunction with an analysis of key competitors.

1.1 Learning Aims and Objectives

The Internet has revolutionised the way in which people consume products and services; however, a variety of factors influence these use and engagement behaviours. Understanding the geography of these influences is complex, although one way in which it has been made tractable is through various geodemographic classifications. However, these only tell you about the characteristics of the places in which people live and their environments, not where they shop.

The objectives of this practical are:
a) to examine the geography of retail supply and demand related factors;
b) to evaluate the extent to which retail centres are exposed to consumers with different income and online consumption behaviour;
c) to choose the most suitable location for a new discount supermarket in Liverpool.

In this case study you’ll take on the role of a property agent who is advising retail clients about those locations most suitable for their new premises.

In addition to reinforcing learning from earlier practicals, you will also be developing the following GIS skills and understanding:

  1. Use of a Geodemographic Classification and subdomains of the IMD
  2. How to calculate buffers and their uses
  3. Recoding variables in QGIS
  4. Creating spatial joins
  5. Basic understanding of retail catchment modelling techniques
  6. Exploring suitable locations for a retail store based on various selection criteria (socio-economic, consumer behaviour and competition)

1.2 Assignment

Taking on the role of a location analyst, in no more than 1,250 words, write a short summary report that identifies the most suitable location in Liverpool (preferably, but not exclusively, an existing shopping area such as a town centre/high street or retail park) for a new corporate convenience store (Asda Express) with a click & collect facility, using the results of your GIS analysis as evidence. In your analysis you should consider: catchment characteristics (total population, affluence, geodemographics), the Internet User Classification (IUC) and competition between existing stores. Use data from the practicals on ‘Consumption spaces’ and ‘People and Places’, establish clear selection criteria, and use various GIS tools to conduct a location analysis. Your outputs should include at least four maps, two tables and a flowchart showing the steps you have taken in your analysis.

1.3 Data

  • In your “M:” drive, create a new folder for this practical - this is your “working directory”
  • Go to Canvas and download the Practical 4 data files
  • Unzip the files into your working directory
  • For your assignment, optional data is available: go to https://data.cdrc.ac.uk/geodata-packs and download the Index of Multiple Deprivation (IMD) 2019: Liverpool and Internet User Classification 2018: Liverpool data packs. Also available from Canvas are the latest (2023) GB Foodstore locations dataset and the Z-scores for the 2018 IUC.

2 Geodemographics and consumer characteristics

2.1 Internet User Classification map of Liverpool

In the first part of this practical we will explore the Internet User Classification (IUC) data for Liverpool. The IUC is created from over seventy measures selected from survey and lifestyle data, alongside census and infrastructure performance statistics.

  • Locate the IUC data within your working directory (Practical 4 > IUC 2014 Liverpool > E08000012 > E08000012_Liverpool.shp). Load E08000012_Liverpool.shp onto the map interface by dragging and dropping the shapefile. Alternatively, click on the Add Vector Layer button and navigate to your working directory.

In order to display the Internet User Classification, we first need to familiarise ourselves with the IUC User Guide (included in the CDRC IUC Geodata Pack) and then with the structure of the associated database, so we can decide which field is best for displaying our data. Follow the steps below:

  • Right click the layer name and select Open Attribute Table
  • Match the names of columns against the IUC_User_Guide (it should be in your CDRC IUC Geodata Pack) and try to make sense of them
  • Decide which column would be the most sensible to display for an IUC geodemographic map of Liverpool.

In fact, we are going to create two maps: one showing aggregated classifications based on the IUC supergroup (main clusters), and the other showing disaggregated classifications based on the IUC group (nested subclusters).

  • Right click the layer and select Properties
  • Go to the Symbology tab and select Categorized from the dropdown menu
  • Then from the Value dropdown menu select supgrp_nm and press the Classify button
  • Press OK

This will create your first IUC map for Liverpool, well done! However, those default colours do not look great, do they? In this instance, we are going to use ColorBrewer palettes to display the different IUC groups, and then we’ll remove the black LSOA outlines to produce a clearer and more professional-looking map.

  • Go to Color ramp in the Symbology tab and select from the dropdown menu Create New Color Ramp
  • Select Catalog:ColorBrewer from the drop down menu, set number of Colours to 4 and use Accent as your Scheme name
  • Press OK twice
  • Go back to Symbology in the Properties window and click on the Symbol box, select Simple fill and set the Stroke Style to No line
  • Lastly, change the layer’s name to IUC supergroup

Your map of IUC supergroups should look similar to the one shown below:
IUC geodemographic map

  • Now, create a copy of the IUC supergroup layer (right click on IUC supergroup > Duplicate Layer) and create a map of IUC in Liverpool using the IUC group variable (grp_nm).

Note: You may need more colour schemes than those available by default in ColorBrewer. One option is to design your own symbol display, such as using a line pattern fill or point pattern fill for your symbology. Importantly, the colour schemes used in both maps should correspond closely (e.g. if your supergroup 1 is in green, then groups 1a, 1b and 1c should also use symbols in a shade of green). Can you think of why?

  • Name the layer: IUC group

  • List the groups in the right order (1a, 1b, 1c, 2a…) and label them with their respective group codes and names

  • In Symbology, double click the symbol of each individual classified group name and explore different Fill style options (within Simple fill) such as line pattern fill or point pattern fill

  • Here is an example of a map displaying the IUC group. Can you comment on it? Does the colour scheme match that of the IUC supergroup map? IUC group map


  • Comment on the spatial patterns of internet use in Liverpool.

2.2 Online shopping prevalence (theoretical background)

Nationally, the rate of online shopping stood at 53% when the research was conducted; however, there were differences between the IUC groups, as some consumers are more likely to shop online than others. For example, groups 4c (low density but high connectivity), 4b (constrained by infrastructure), 4a (e-fringe) and, to an extent, 2a (next generation users) are the most likely to engage in online shopping, whereas 3a (uncommitted and casual users), 1b (e-marginals: not a necessity) and 3b (young and mobile) have lower than average propensities, as shown in the plot below.
IUC ggplot

So, in this part of the practical, we will examine the prevalence of online shopping and display the areas in Liverpool with the highest prevalence. In order to do so, we’ll need to do some variable recoding; in other words, we will create a new variable where 1 indicates a higher prevalence of online shopping and 0 a lower one. Complete the following tasks:

  • Refer to the pen portraits of different IUC groups, available from the IUC User Guide

  • Open the Attribute Table of the IUC_group layer

  • Click on Open field calculator and make sure that the Create a new field box is ticked

  • In the Output field name type: Online_Shop

  • Choose from Functions window the Conditionals and then the CASE conditional (Check when and how we use CASE conditionals - there is some useful info provided in the window on the right hand side…)

  • Create a statement in the following format:
    “WHEN condition = ‘x’ THEN the assigned value is ‘1’, otherwise (ELSE) the assigned value is ‘0’”. In our case we assign the value 1 to groups 4a, 4b and 4c (high online shopping prevalence, as shown in the graph above), and all other IUC groups (lower prevalence) are assigned the value 0. Give it a go and then check your conditional statement against the picture below (a sketch of the expression is also provided after this list):
    Recoding conditional

  • Click OK to create the new variable

  • Check the Attribute table - a new column showing our binary variable (0 or 1) should have been added
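
For reference, here is a minimal sketch of one way to write this expression, assuming the grp_cd field stores the group codes (e.g. ‘4a’) as listed in the attribute table:

CASE
  WHEN "grp_cd" IN ('4a', '4b', '4c') THEN 1
  ELSE 0
END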

Now, we will display the new variable and then create a map that can be printed or published; more specifically a map showing the LSOAs with the highest propensity for online shopping. Follow the steps below:

  • Right click the IUC_group layer and select Duplicate Layer
  • A copy of the layer has been created; name it Online shopping prevalence
  • Then, following the steps used earlier to create the IUC maps, classify the newly created binary variable so that both ‘low prevalence’ and ‘high prevalence’ areas are clearly visible
  • Save your QGIS map

For your map to be publishable, you need to add a legend, scale bar, north arrow, and possibly a title. By utilising the skills from the previous practicals, you should be able to complete this on your own. Nevertheless, some pointers are given below:

  • Hopefully your map looks something like the one below. (You can add labels using the RetailCentres_Liverpool layer from the Practical 4_data folder, and you learnt in Practical 2 how to create the Liverpool boundary.)
    IU_prevalence

However, a binary variable masks some of the variance present in our data, so in this step we will create an ordinal variable whose values are ordered (online shopping prevalence is captured from high to low). This approach allows us to capture more of the variance in online shopping behaviour and, if desired, we could use this variable to run a regression model. Work through the steps below (a sketch of the expression is provided after the list):

  • Open the Attribute Table of the IUC_group layer
  • Click on Open field calculator and make sure that the Create a new field box is ticked
  • Name the Output field: IUC_ordinal (there is a 10 character limit with column names so it is okay if the name is cut off)
  • Choose from the Functions window Conditionals and then CASE
  • Assign the highest value (e.g.’4’ - high online shopping prevalence) to the following IUC groups: ‘4a, 4b and 4c’
  • Assign ‘3’ (above the average prevalence) to the groups ‘2a and 1c’
  • Assign ‘2’ (below the average prevalence) to groups ‘1b and 3a’
  • Assign ‘1’ (low prevalence) to the remaining IUC groups
  • You may find it useful to create another (text) column to record the names (high prevalence, above the average etc.) rather than just relying on the numeric values
  • Name the layer Online Shopping Categories
  • Comment on the spatial variation; how does it differ from the binary variable created previously? Is there a likelihood of spatial autocorrelation?
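
As before, a minimal sketch of the ordinal expression, again assuming the group codes are stored in the grp_cd field, might look like this:

CASE
  WHEN "grp_cd" IN ('4a', '4b', '4c') THEN 4
  WHEN "grp_cd" IN ('2a', '1c') THEN 3
  WHEN "grp_cd" IN ('1b', '3a') THEN 2
  ELSE 1
END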

2.3 IMD income domain map

  • Open the IMD folder in your working directory and load the shapefile for Liverpool (E08000012)

  • Name the layer IMD2015

  • Go to Properties, open the Symbology tab and select Graduated from the drop down menu.

  • In the Value field choose income, type 5 in the Classes field and choose Equal Count as your Mode

  • Press Classify and then OK buttons; this will close the dialog and show you the map.

  • Again, improve the rendering by using a ColorBrewer palette and removing the LSOA outlines

  • Can you specify the threshold defining the lowest quintile of the IMD Income domain for Liverpool?

  • To find this out, go to the Histogram tab within Symbology. Click Load Values and check the thresholds and distribution of the Income domain

By now, you should have created a map of Income deprivation domain quintiles for Liverpool which looks something like the one below: IMD2015_quintiles

  • What do the terms ‘quintile’ and ‘quantile breaks’ mean? (A short R sketch after this list shows how quintile thresholds can be computed.)

  • What are the mean and standard deviation values of the Income domain?

  • There are other classification methods (Modes) available in QGIS; try them and see what difference they make to the resulting map

  • Which method may be the most appropriate in this case?
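
To make the terminology concrete: a ‘quintile’ classification splits the ranked values into five classes containing (roughly) equal numbers of observations, and the ‘quantile breaks’ are the threshold values separating those classes. As a minimal R sketch, assuming you have exported the layer’s attribute table to a CSV with an income column (as we do later in this practical with Intersection.csv), you could compute the thresholds and summary statistics yourself:

# Read the exported attribute table (the file name is an example)
variables <- read.csv('Intersection.csv')

# Quintile breaks: the 0%, 20%, 40%, 60%, 80% and 100% quantiles of the Income domain
quantile(variables$income, probs = seq(0, 1, 0.2))

# Mean and standard deviation of the Income domain
mean(variables$income)
sd(variables$income)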

Your next task involves displaying the LSOAs within the lowest IMD quintile and then checking visually whether there is an overlap between the IMD income domain and the propensity for online shopping. Think about how you might proceed with this analysis… Give it a go; however, if you are stuck, follow the steps below:

  • Go to Attribute table of the IMD2015 layer and click on Select features using an expression
  • Choose Fields and Values from the functions tab
  • Specify all polygons that are within the lowest quintile (use a simple expression such as “income” <= 0.11)
  • Click Select Features and then Close buttons
    IMD2015_Expression

- How many polygons have you got selected?

Now, check whether there is any significant overlap between the LSOAs with the highest prevalence of online shopping and the least deprived IMD income quintile areas in Liverpool:

  • Open Attribute Table of the Online shopping prevalence layer and then select all the LSOAs with the value of 1. You can do this by creating another simple expression e.g. “Online_Shop” = 1
  • Run a Select by Location from the Vector > Research Tools menu with the following setup:
    1. In the Select features from - choose the Online shopping prevalence layer
    2. Choose Contain as your Spatial Query method (geometric predicate)
    3. In the By comparing to the features from choose your IMD2015 layer and again ensure that the Selected features only box is checked
    4. Choose Selecting within current selection from the Modify current selection by drop down menu
  • Click the Run and then the Close buttons. How many LSOAs have been selected this time?

If we wanted to quantify the relationship between the Online shopping prevalence and the lowest quintile of the IMD2015 income variable we could divide the number of overlapping LSOAs by the number of LSOAs with the highest online shopping prevalence, which in our case is 39/60 = 0.65.

So, we could conclude that in Liverpool there is a relatively strong correspondence between the two variables, as 65% of the LSOAs with the highest prevalence of online shopping overlap with the lowest quintile of the IMD income domain (i.e. the least deprived, highest-income areas).

2.4 IMD 2025

The IMD data we have used is from 2015, which is somewhat outdated. For your assignment, you may consider using more recent IMD data, such as the 2019 or even the 2025 release. The latter was only released last week, so it is as up to date as it gets and is definitely worth trying. Additional information is available here: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2025. Importantly, from a GIS user's point of view, it uses the 2021 Census LSOA boundaries. You can obtain these from Practical 2 (Liverpool_pop2021.gpkg) and join the CSV table with the IMD 2025 data, available from your Practical_4 data folder.

  • Add the Liverpool_pop2021.gpkg layer and the IMD_2025_All_Data.csv table
  • Explore the IMD 2025 data
  • Join the csv table to the 2021 LSOAs for Liverpool (Liverpool_pop2021.gpkg layer)
  • You will need to display the Income Score variable, but first make the variable numeric and permanent
  • Create a new numeric variable using the Field Calculator (name it Income 2025 and use Decimal number as your Output field type); a sketch of a suitable expression is given after this list
  • Classify the Income 2025 variable
  • Now you can use this data for your assignment, run a similar analysis to the one above, etc. IMD2025
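
As a pointer, a minimal Field Calculator sketch for this conversion could use the to_real() function. The field name below is an assumption (QGIS typically prefixes joined fields with the source table name), so check your attribute table for the exact name:

to_real("IMD_2025_All_Data_Income Score")

Set Income 2025 as the Output field name and Decimal number as the Output field type to make the numeric copy permanent.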

2.4.1 Basic R - statistical significance and relationships between variables (optional)

A more robust (statistical) analysis could also be carried out to explore the relationship between all the IMD domains (https://www.gov.uk/government/publications/english-indices-of-deprivation-2015-technical-report) and the online shopping propensity variables. First, we could create and then export a CSV table with the different variables. Second, we could perform a correlation test or even run a regression model to test their statistical significance. This can be done in any statistical package, but we will use the legendary “R”.
- Go to Vector > Geoprocessing Tools > Intersection
- Choose the layers of interest, such as Online Shopping Categories and IMD2015 (using the Population layer from Practical 2 would also be useful), as your inputs and save the output to your working directory
- Name the output Intersect
- Once Intersect appears in your Layers Panel in QGIS, export the Intersect layer to your working directory as a .csv file (Right click > Export > Save Features As… > Comma Separated Value (CSV) as your format). Name it Intersect.csv

Note: If you’re having trouble with the previous step, or your results differ from those used here, you can use the pre-made CSV file ‘Intersection.csv’ for the next steps.

  • Open RStudio and follow the steps below:
# Set your working directory
setwd("M:/Liverpool/Teaching/ENVS609/Practicals/Practical 4")

## Tip: this has to be the path to YOUR working directory, so simply copying and pasting the path above won't work - it is my working directory ##

# Read in the exported table
variables <- read.csv('Intersection.csv')

# Check the structure of your dataset (this lists all variables, their respective classes and example values)
str(variables)

Now you can run a simple correlation test between various variables. R performs correlation with the cor() function. Built into the base distribution are three methods: Pearson, Kendall and Spearman rank correlations. The default method is “pearson”, so you may omit the method argument if that is what you want. First, let’s check whether there is a correlation between the ‘income’ and ‘Online_Shop’ variables:

Note that R is case sensitive, so always check whether a variable name uses lower- or upper-case letters; otherwise you’ll get an error.

cor(variables$income, variables$Online_Shop)
# you can plot the relationship between the two variables by using the plot() function
plot(variables$income, variables$Online_Shop)

So there is a negative relationship between the two variables (Pearson coefficient = -0.53), but a correlation coefficient is generally only half the story: most likely, you will also want to know whether the relationship is significant. The cor() function in R can be extended to provide the required significance testing via the cor.test() function. We are going to test the IUC_ordinal variable this time (note that its column name is truncated to IUC_ordina by the 10-character limit mentioned earlier):

cor.test(variables$imd_score, variables$IUC_ordina)
plot(variables$imd_score, variables$IUC_ordina)

The above test indicates that the correlation is highly significant, as the p-value is smaller than 2.2e-16 (i.e. 2.2 × 10^-16), far below the 0.05 threshold required to reject the null hypothesis that there is no correlation between these two variables. If you want to check the correlation coefficients for a number of variables at once, create a correlation matrix by running the cor() function on the entire dataset: cor(my.variables, method = “pearson”). However, you have to ensure that your data consists of numeric variables only. There are various ways to remove the unwanted (non-numeric) variables (here, the initial six: lsoa_cd, lsoa_nm, supgrp_cd, supgrp_nm, grp_cd, grp_nm). First, we create a subset of the variables we need (columns 7 to 20), thereby excluding those six non-numeric variables.

my.variables <- variables[7:20]
str(my.variables)

Then remove the remaining factor/character (non-numeric) variable (LSOA11CD). In the dataset used to write this practical this is variable 3, but it may differ in your dataset (hopefully not). Run the code below to remove variable 3.

my.variables[, 3] <- NULL
str(my.variables)

Now all your variables should be numeric/integers, so this means that you can create a correlation matrix.

cor(my.variables, method = "pearson")

# Create a scatterplot matrix of all the variables
plot(my.variables)

At times, you may need to save a scatterplot you have created. There is an easy way of doing this in R:

# Save the plot to a PNG file in your working directory
png('correlation.png')
plot(my.variables)
dev.off()

Check your scatterplot against the one below:
correlation

Finally, it is possible to check the statistical significance of the effect that different (explanatory) variables have on the prevalence of online shopping (our dependent variable) by running a regression model. First, we’ll fit a simple regression model with only one explanatory variable, and then a multiple regression model with three different variables. This is done very easily in R using the lm() function:

# Run a simple regression model
IUC.lm = lm(variables$IUC_ordina ~ variables$imd_score)
summary(IUC.lm)

# Run a multiple regression model
IUC.lm1 = lm(variables$IUC_ordina ~ variables$income + variables$education + variables$living_env)
summary(IUC.lm1)

The summary() command is particularly useful here as it produces the coefficients, so we can see which factors are statistically significant. We can also see the overall R-squared value for our model (the ‘goodness of fit’ measure of our regression model). For instance, our simple regression model explains approximately 27% of the variation in the dependent variable (IUC_ordinal).

- How much variation in online shopping prevalence is explained by the multiple regression model?