The growth in online consumption and the shift in consumer culture towards convenience and value have impacted the health of retail centres in complex ways, potentially leading to long-term structural changes. The rise of online sales was particularly noticeable during the COVID-19 pandemic, as was the shift towards value for money during the current 'cost of living crisis'. Adapting to these trends has often involved the rationalisation of store portfolios and the adoption of new business models. One UK example is Asda, whose traditional portfolio comprises large supermarket and hypermarket formats. More recently, following a significant loss in market share, Asda has been rolling out a new store format: Asda Express.
Moreover, analyses of traditional town centre 'brick-and-mortar' retailing have primarily focused on supply-side factors such as competition, retail mix, and vacant space. Less emphasis has been placed on demand-side factors, such as catchment demographics, socio-economic characteristics, and consumers' engagement with online shopping. Understanding the geography of consumer behaviour at a small-area level is crucial for understanding the vitality and viability of both retail centres and the retailers themselves. Such consumer insights form the basis for site location analysis, in conjunction with an analysis of key competitors.
The Internet has revolutionised the way in which people consume products and services; however, a variety of factors influence these use and engagement behaviours. Understanding the geography of these influences is complex, although one way in which this has been made tractable is through geodemographic classifications. However, these only tell you about the characteristics of the places in which people live and their environments, not about where they shop.
The objectives of this practical are:

a) to examine the geography of retail supply- and demand-related factors;
b) to evaluate the extent to which retail centres are exposed to consumers with different incomes and online consumption behaviours;
c) to choose the most suitable location for a new discount supermarket in Liverpool.
In this case study you’ll take on the role of a property agent who is advising retail clients about those locations most suitable for their new premises.
In addition to reinforcing learning from earlier practicals, you will also be developing further GIS skills and understanding.

Taking on the role of a location analyst, write a short summary report (no more than 1,250 words) that identifies the most suitable location in Liverpool (preferably, but not exclusively, an existing shopping area such as a town centre/high street or retail park) for a new corporate convenience store, Asda Express, with a click & collect facility, using the results of your GIS analysis as evidence. In your analysis you should consider: catchment characteristics (total population, affluence, geodemographics), the Internet User Classification (IUC), and competition between existing stores. Use data from the practicals on 'Consumption spaces' and 'People and Places', establish clear selection criteria, and use various GIS tools to conduct a location analysis. Your outputs should include at least: four maps, two tables and a flowchart showing the steps you have taken in your analysis.
In the first part of this practical we will explore the Internet User Classification (IUC) data for Liverpool. The IUC is created from over seventy measures selected from survey and lifestyle data, alongside census and infrastructure performance statistics.
In order to display the Internet User Classification, we first need to familiarise ourselves with the IUC User Guide (included in the CDRC IUC Geodata Pack) and then with the structure of the associated database, so that we can decide which field is best to use to display our data. Follow the steps below:
We are going to create two maps: one showing the aggregated classification based on the IUC supergroups (main clusters), and the other showing the disaggregated classification based on the IUC groups (nested subclusters).
This will create your first IUC map for Liverpool, well done! However, those default colours do not look great, do they? In this instance, we are going to use ColorBrewer palettes to display the different IUC groups, and then we'll remove the black LSOA outlines to produce a clearer and more professional-looking map.
Name the layer: IUC supergroup. Your map, classified by IUC supergroup, should look similar to the one shown below:
Now duplicate the IUC supergroup layer (right click on IUC supergroup > Duplicate Layer) and create a map of IUC in Liverpool using the IUC group variable (grp_nm). Note: You may need more colour schemes than those available by default in ColorBrewer. One option is to design your own symbol display, such as using a line pattern fill or point pattern fill for your symbology. Importantly, the colour schemes used in both maps should correspond closely (e.g. if your supergroup 1 is in green, then groups 1a, 1b and 1c should also use symbols in a shade of green). Can you think of why?
Name the layer: IUC group
List the groups in the right order (1a, 1b, 1c, 2a…) and label each with its respective group code and name
In Symbology, double click the symbol of each individual classified group name and explore different Fill style options (within Simple fill) such as line pattern fill or point pattern fill
Here is an example of a map displaying the IUC groups. Can you comment on it? Does the colour scheme match that of the IUC supergroup map?
Note: The classification comprises supergroups (e.g. 1, 2, 3) and nested groups (e.g. 1a, 1b, 1c), so when you display them on a map, make sure that the colour scheme makes sense. For instance, if supergroup 1 is displayed in green, all corresponding nested groups (1a, 1b and 1c) should also be displayed as different shades of green.
Comment on the spatial patterns of internet use in Liverpool
Nationally, rates of online shopping equated to 53% when the research was done; however, there were differences between the IUC groups, as some customers are more likely to shop online than others. For example, groups 4c (low density but high connectivity), 4b (constrained by infrastructure), 4a (e-fringe) and, to an extent, 2a (next generation users) are most likely to engage in online shopping; whereas 3a (uncommitted and casual users), 1b (e-marginals: not a necessity) and 3b (young and mobile) have lower than average propensities, as shown on the plot below.
So, in this part of the practical, we will examine the prevalence of online shopping and display the areas in Liverpool with the highest prevalence. In order to do so, we'll need to do some variable recoding; in other words, create a new variable where 1 is associated with a higher prevalence of online shopping and 0 with a lower one. Do the following tasks:
Refer to the pen portraits of different IUC groups, available from the IUC User Guide
Open the Attribute Table of the IUC_group layer
Click on Open field calculator and make sure that the Create a new field box is ticked
In the Output field name type: Online_Shop
Choose from Functions window the Conditionals and then the CASE conditional (Check when and how we use CASE conditionals - there is some useful info provided in the window on the right hand side…)
Create a statement in the following format: "WHEN condition = 'x' THEN the assigned value is '1', otherwise (ELSE) the assigned value is '0'". In our case we assign the value of 1 to groups 4a, 4b and 4c (high online shopping prevalence, as shown in the graph above), and all other IUC groups (lower prevalence) will be assigned the value of 0. Give it a go and then check your conditional statement against the example below:
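If you get stuck, here is a minimal sketch of the expression, assuming the IUC group codes are stored in a field named grp_cd (check your attribute table, as the field name may differ); it returns 1 for the high-prevalence groups and 0 for everything else:

```
CASE
  WHEN "grp_cd" IN ('4a', '4b', '4c') THEN 1
  ELSE 0
END
```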
Click Ok to create a new variable
Check the Attribute table - a new column showing our binary variable (0 or 1) should have been added
Now, we will display the new variable and then create a map that can be printed or published; more specifically a map showing the LSOAs with the highest propensity for online shopping. Follow the steps below:
Right click the IUC_group layer and click on Duplicate Layer; name the new layer Online shopping prevalence. For your map to be publishable, you need to add a legend, scale bar, north arrow, and possibly a title. By utilising the skills from the previous practicals, you should be able to complete it on your own. Nevertheless, the key steps are listed below:
(You may also want to add the RetailCentres_Liverpool layer from the Practical 4_data folder; you learnt in Practical 2 how to create the Liverpool boundary.) However, creating only a binary variable masks some of the variance present in our data, so in this step we will create an ordinal variable whose values are ordered (online shopping prevalence is captured from high to low). This approach allows us to capture more variance in online shopping behaviour and, if desired, we could use this variable to run a regression model. Duplicate the IUC_group layer, name the new layer Online Shopping Categories, and use the Field Calculator, as before, to create the ordinal variable; a sketch is shown below.
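Here is a hedged sketch of what the ordinal recode could look like in the Field Calculator (output field, e.g., IUC_ordinal), again assuming the group codes sit in a grp_cd field. The three-level scoring below (3 = high, 2 = medium, 1 = low prevalence) is an assumption based on the plot above, so adapt it to the pen portraits as you see fit:

```
CASE
  WHEN "grp_cd" IN ('4a', '4b', '4c') THEN 3
  WHEN "grp_cd" = '2a' THEN 2
  ELSE 1
END
```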
Open the IMD folder in your working directory and load the shapefile for Liverpool (E08000012)
Name the layer IMD2015
Go to Properties, open the Symbology tab and select Graduated from the drop down menu.
In the Value field choose income, type 5 in the Classes field and choose Equal Count as your Mode
Press Classify and then OK buttons; this will close the dialog and show you the map.
Again, improve the rendering by using a ColorBrewer palette and removing the LSOA outlines
Can you specify the threshold defining the lowest quintile of the IMD Income domain for Liverpool? To find this out, go to the Histogram tab within Symbology. Click Load Values and check the thresholds and distribution of the Income domain
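If you would like to verify the class breaks programmatically, you can also do so in R once the attribute data have been exported as a CSV (we create Intersection.csv later in this practical); a minimal sketch, assuming the income score column is named income:

```r
# A sketch: verify the quintile breaks of the IMD income score in R,
# assuming the attribute table has been exported as Intersection.csv
# (we create this file later in the practical)
variables <- read.csv("Intersection.csv")
quantile(variables$income, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
```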
By now, you should have created a map of Income deprivation domain quintiles for Liverpool which looks something like the one below:
What do the terms 'quintile' and 'quantile breaks' mean?
What is the mean value of the Income domain, and what is its standard deviation?
There are other classification methods (Modes) available in QGIS; try them and see what difference they make to the outcome map
Which method may be the most appropriate in this case?
Your next task involves displaying the LSOAs within the lowest IMD quintile and then checking visually whether there is an overlap between the IMD income domain and the propensity for online shopping. Think about how you might proceed with this analysis… Give it a go; however, if you are stuck, follow the steps below:
Right click the IMD2015 layer and click on Select features using an expression; write an expression that selects the lowest quintile of the income variable (see the sketch below). How many polygons have you got selected?
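A sketch of a possible selection expression, assuming the IMD income score is held in a field named income; the 0.08 threshold is purely hypothetical, so substitute the lowest-quintile break you read from the Histogram tab:

```
"income" <= 0.08
```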
Now, check whether there is any significant overlap between the LSOAs with the highest prevalence of online shopping and the least deprived IMD income quintile areas in Liverpool.
Go to the Online shopping prevalence layer and then select all the LSOAs with the value of 1. You can do this by creating another simple expression, e.g. "Online_Shop" = 1
Open the Select by Location tool (Vector > Research Tools); in Select features from, choose the Online shopping prevalence layer
In By comparing to the features from, choose your IMD2015 layer and again ensure that the Selected features only box is checked
Choose an appropriate option from the Modify current selection by drop down menu

If we wanted to quantify the relationship between the online shopping prevalence and the lowest quintile of the IMD2015 income variable, we could divide the number of overlapping LSOAs by the number of LSOAs with the highest online shopping prevalence, which in our case is 39/60 = 0.65.
So, we could conclude that in Liverpool there is a relatively strong correspondence between the two variables, as 65% of the LSOAs with the highest prevalence of online shopping overlap with the lowest quintile of the IMD income domain (i.e. the least income-deprived, highest-income areas).
The IMD data we have used are from 2015, which is somewhat outdated. For your assignment, you may consider using more recent IMD data, such as from 2019 or even 2025. The latter was only released last week, so it is as up to date as it gets and definitely worth trying. Get additional information here: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2025. Also important from a GIS user's point of view, it uses the 2021 Census LSOA boundaries. You can obtain these from Practical 2 (Liverpool_pop2021.gpkg) and join the CSV table with the IMD 2025 data available from your Practical_4 data folder.
Load the Liverpool_pop2021.gpkg layer and the IMD_2025_All_Data.csv table
Join the CSV table to the Liverpool_pop2021.gpkg layer (by LSOA code)
Display the Income Score variable, but first make the variable numeric and permanent (in the Field Calculator, name the new field Income 2025 and use Decimal number as your Output field type)
Create a map of the Income 2025 variable

A more robust (statistical) analysis could also be done to explore the relationship between all IMD domains (https://www.gov.uk/government/publications/english-indices-of-deprivation-2015-technical-report) and online shopping propensity variables. First, we could create and then export a CSV table with the different variables. Second, we could perform a correlation test or even run a regression model to test their statistical significance. This can be done in any statistical package, but we will use the legendary "R".
- Go to Vector > Geoprocessing Tools > Intersection
- Choose the layers of interest, such as Online Shopping Categories and IMD2015 (also using the Population layer from Practical 2 would be useful), as your inputs and save the output to your working directory
- Name the output Intersect
- Once Intersect appears in your Layers Panel in QGIS, export the Intersect layer to your working directory as a .csv file (Right click > Export > Save Features As… > choose Comma Separated Value (CSV) as your format). Name it Intersect.csv
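If you prefer scripting, the same overlay can be reproduced in R with the sf package. This is only a sketch: the file names below are hypothetical, so substitute the paths to your own layers, and both layers are assumed to share the same CRS:

```r
library(sf)

# Hypothetical file names - substitute your own layers
online <- st_read("Online_Shopping_Categories.shp")
imd    <- st_read("IMD2015.shp")

# Intersect the two layers (assumes both use the same CRS)
intersect_sf <- st_intersection(online, imd)

# Export the attribute table (geometry dropped) as a CSV
write.csv(st_drop_geometry(intersect_sf), "Intersect.csv", row.names = FALSE)
```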
Note: If you’re having trouble with the previous step or your results differ from those used in here, you can use the pre-made .CSV file ‘Intersection.csv’ table for the next steps.
# Set your working directory
setwd("M:/Liverpool/Teaching/ENVS609/Practicals/Practical 4")
## Tip: this has to be the path to YOUR working directory, so a simple copy and paste of the above path won't work for you, as this is my working directory ##
variables <- read.csv('Intersection.csv')
# Check the structure of your dataset (this shows all variables, their respective class and examples in your dataset)
str(variables)
Now you can run a simple correlation test between various variables. R can perform correlation with the cor() function. Built into the base distribution of the program are three routines: for Pearson, Kendall and Spearman rank correlations. The default method is "pearson", so you may omit this if that is what you want. First, let's check whether there is a correlation between the 'income' and 'Online_Shop' variables.
Note that R is case sensitive, so always check whether a variable name uses lower- or upper-case letters; otherwise you'll get an error.
cor(variables$income, variables$Online_Shop)
# you can plot the relationship between the two variables by using the plot() function
plot(variables$income, variables$Online_Shop)
So there is a negative relationship between the two variables (Pearson coefficient = -0.53), but getting a correlation coefficient is generally only half the story. Most certainly, you would like to know if the relationship is significant. The cor() function in R can be extended to provide the significance testing required; the function is cor.test(). We are going to test the IUC_ordinal variable this time (note the name is truncated to IUC_ordina in the dataset).
cor.test(variables$imd_score, variables$IUC_ordina)
plot(variables$imd_score, variables$IUC_ordina)
The above test indicates that the correlation is highly significant: the p value is smaller than 2.2 × 10^-16, far below the 0.05 threshold required to reject the null hypothesis that there is no correlation between these two variables. If you want to check the correlation coefficients for a number of variables at once, create a correlation matrix by running the cor() function on the entire dataset: cor(my.variables, method = "pearson"). However, you have to ensure that your data consist of numerical variables only. There are various ways to remove the unwanted (non-numerical) variables (here, the initial six variables: lsoa_cd, lsoa_nm, supgrp_cd, supgrp_nm, grp_cd, grp_nm). First, we create a subset of the variables we need by keeping columns 7 to 20, which excludes those six non-numerical variables.
my.variables <- variables[7:20]
str(my.variables)
Then remove the remaining factor/character (non-numeric) variable (LSOA11CD); in the dataset used to write this practical, this is variable 3, but this may differ in your dataset (hopefully not). Run the code below to remove variable 3.
my.variables[,3] <- NULL
str(my.variables)
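As an aside, a more general approach (a small sketch) is to keep only the numeric columns programmatically, rather than removing them by position:

```r
# Keep only the numeric columns, whatever their position
my.variables <- my.variables[sapply(my.variables, is.numeric)]
str(my.variables)
```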
Now all your variables should be numeric/integers, so this means that you can create a correlation matrix.
# Create the correlation matrix
cor(my.variables, method = "pearson")
# Plot a scatterplot matrix of all variable pairs
plot(my.variables)
At times, you may need to save a scatter plot you have created. There is an easy way of doing this in R.
# Save the plot to a PNG file
png('correlation.png')
plot(my.variables)
dev.off()
Check your scatterplot against the one below:
Finally, it is possible to check the statistical significance of the effect different explanatory variables have on the prevalence of online shopping (our dependent variable) by running a regression model. First, we'll try a simple regression model with only one explanatory variable, and then a multiple regression model with three different variables. This is done very easily in R by implementing the lm() function:
# Run a simple regression model
IUC.lm = lm(variables$IUC_ordina ~ variables$imd_score)
summary(IUC.lm)
# Run a multiple regression model
IUC.lm1 = lm(variables$IUC_ordina ~ variables$income + variables$education + variables$living_env)
summary(IUC.lm1)
The summary command is particularly useful in this case as it produces the coefficients, so we can see which factors are statistically significant. We can also see the overall R-squared value for our model (the 'goodness of fit' measure of our regression model). For instance, our simple regression model explains approximately 27% of the variation in the dependent variable (IUC_ordina).
- How much variation in online shopping prevalence is explained by the multiple regression model?
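As a hint, the R-squared values can also be extracted from the model summaries directly, rather than read off the printed output; a small sketch, using the models fitted above:

```r
# Proportion of variation explained by each model
summary(IUC.lm)$r.squared        # simple model
summary(IUC.lm1)$r.squared       # multiple model
summary(IUC.lm1)$adj.r.squared   # adjusted for the number of predictors
```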