Geodemographics and Social Marketing

1 Introduction

In this exercise you take on the role of an owner of a direct marketing consultancy that specialises in helping small businesses to target potential service users within their local area. Your most recent client is a group of doctors who are organising a diabetes awareness campaign that involves delivery of an information pamphlet to local residents and holding a testing clinic at one of their surgeries.

The doctors have two problems:

  1. They don't know which of the General Practitioner (GP) surgeries would be the most appropriate location for the clinic?
  2. They want to promote the clinic with a local leaflet campaign, however…they only have enough money to produce 5,000 leaflets and want best return on this investment

Project Tasks

You have been asked to write a short executive report (max 500 words) that will answer the following questions:

  1. Identify the top five GP surgeries that would make the most appropriate location for the clinic, embedding the reasons for these choices in the evidence generated by your data analysis.
  2. Identify a ranked list of Output Areas that you would target, detailing the allocation of the 5000 leaflets, again, embedding the reasons for these choices in the evidence generated by your data analysis.

In addition to answering these two questions in your report, you have been asked to present as a minimum the following information:

  1. An OAC SuperGroup map of Leeds that also details the locations of the GP surgeries
  2. An estimated GP surgery catchment map overlaying OAC SuperGroups
  3. A population at risk map that identifies those areas containing people who many benefit most from the clinic
  4. A map showing the estimated relative and absolute population at risk populations within the GP surgeries in Leeds
  5. A map showing the GP surgeries selected as most suitable for the clinics

Overall, your report should not have more than 10 maps or charts. All maps must be generated within R; and all maps should adopt different (but appropriate) colour schemes to those illustrated in this practical.

Getting good marks!

  1. Get the basics right - add legends, titles, scale bars, north arrows etc.
  2. Follow the standard GIS marking criteria!
  3. Ensure that your report clearly identifies your answers to Q1 and Q2
  4. Make sure your answers are clear, and supported by evidence generated by your analysis
  5. Think about the data you are putting on the maps - e.g. if you are plotting points, what is the most appropriate background map etc…
  6. Choose a single style for your maps, i.e. so they look like they are part of a set
  7. Do not print out the R code in your report
  8. Make sure you include all the analysis required (see above)

Learning Objectives

The learning objectives for this activity are as follows:

  1. Obtain and prepare population estimate, geodemographic and public sector administrative data
  2. Create model catchment areas (buffer analysis)
  3. Use geodemographic index scores to model predicted at risk populations
  4. Interpret findings from geodemographic analysis

2. Diabetes at Risk Population

Around 4% of the UK population has diabetes. In this exercise, the at risk population is defined as people who are aged over 40. When an age, gender and ethnic group standardised index score are calculated with the Office for National Statistics Output Area Classification (OAC), it can be shown that these patterns are not evenly distributed across different types of neighbourhood (see Figure 1 and Table 1).

Figure 1: Percentage different from the national average for diabetes related hospital admissions

plot of chunk unnamed-chunk-1

Table 1: Propensity for diabetes related hospital admissions

OAC Group Index Score
1a: Terraced Blue Collar 126.00
1b: Younger Blue Collar 96.00
1c: Older Blue Collar 109.00
2a: Transient Communities 81.00
2b: Settled in the City 87.00
3a: Village Life 82.00
3b: Agricultural 66.00
3c: Accessible Countryside 64.00
4a: Prospering Younger Families 103.00
4b: Prospering Older Families 71.00
4c: Prospering Semis 85.00
4d: Thriving Suburbs 81.00
5a: Senior Communities 92.00
5b: Older Workers 120.00
5c: Public Housing 110.00
6a: Settled Households 120.00
6b: Least Divergent 90.00
6c: Young Families in Terraced Homes 103.00
6d: Aspiring Households 104.00
7a: Asian Communities 106.00
7b: Afro-Caribbean Communities 78.00

Interpreting these index scores is a relatively straight forward task. A score of 100 indicates an OAC Group where admissions to hospital with conditions related to Diabetes are represented in the same proportion as the national average. A score of 200 would indicate admissions at a rate twice the national average, and a score of 50 half the national average. Index scores formatted in this way are commonly used in geodemographic analysis.

3 Obtaining the Data

For this exercise you will use three sets of data: population estimates, the Output Area Classification and the locations of GP surgeries.

You first need to setup your working directory, and then load a number of packages that you will need later in the practical. When you restart the practical on different machines, you will need to run these lines of code again.

library(maptools)
# Set working directory
setwd("M:/R work/social_marketing/")

# Choose the CRAN mirror where the packages are going to be downloaded
# from
options(repos = c(CRAN = "http://www.stats.bris.ac.uk/R/"))

# Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())

# Download / install and then load maptools
install.packages("maptools", depend = TRUE, lib = getwd())
library(maptools, lib.loc = getwd())

# Download / install and then load stringr
install.packages("stringr", depend = TRUE, lib = getwd())
library("stringr", lib.loc = getwd())

# Download / install and then load rgeos
install.packages("rgeos", depend = TRUE, lib = getwd())
library("rgeos", lib.loc = getwd())

# Download / install and then load classInt
install.packages("classInt", depend = TRUE, lib = getwd())
library("classInt", lib.loc = getwd())

# Download / install and then load RColorBrewer
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())

Shapefiles for use in this exercise have already been prepared for you. You can download the Output Area (OA) shapefile from here [link] ; and the CAS Wards here [link]

Once you have downloaded these two Shapefiles, unzip them into the working directory you are using for this practical. We then will take a quick look at both the geometry and the attributes of the newly created object OA.


# Get 2011 OA for Leeds
OA <- readShapeSpatial("OA", proj4string = CRS("+init=epsg:27700"))

# Get 2011 CAS for Leeds
CAS <- readShapeSpatial("CAS", proj4string = CRS("+init=epsg:27700"))

# Plot of Leeds
plot(OA)

plot of chunk unnamed-chunk-6

In the data slot, you will see that there are a variety of attributes about each Output Area in Leeds including an area id (Zone_Code), the co-ordinates of the OA centroid, the geodemographic group (OACGroup), a name for the geodemographic group (label) and the total population within the area (Tpop).

# First six rows of data
head(OA@data)
   ZONE_CODE      x      y OACGroup                            Label Tpop
0 00DAFD0037 429281 430671       6c Young Families in Terraced Homes  378
1 00DAFU0005 424041 428395       6a               Settled Households  605
2 00DAFX0010 428917 441090       4d                 Thriving Suburbs  372
3 00DAGH0063 440027 449531       4a      Prospering Younger Families  300
4 00DAGK0037 426390 432257       1c                Older Blue Collar  260
5 00DAGE0014 435052 436617       5b                    Older Workers  291
  Tpop40
0     97
1    326
2    193
3    164
4    139
5    134

The next dataset you will download is the locations of all GP surgeries from the Office for National Statistics website: http://www.neighbourhood.statistics.gov.uk/dissemination/

Near the base of the page click on the link “Topics”, then on the following page enter “General Practices” into the search box and click the “Go” button. Expand the returned list by clicking the + button on Data set name matches ==>> Exact matches, then select “General Practices / Surgeries, 2006” from the returned data sets, by clicking the radio button on the right hand side of the page. Then click the “Next” button on the bottom right of the page. Download the CSV file on the next page, placing this in your working directory.

This can now be read into R as a new object called GP, and trimmed to only include the surgery name and the Easting and Northing coordinates.

GP <- read.csv("General Practices 2006.csv", header = TRUE, skip = 3)
GP <- subset(GP, select = c("Practice.Doctor.s.Name", "Easting", "Northing"))
colnames(GP) <- c("Surgery", "Easting", "Northing")

4. Creating a spatial points data frame for surgeries residing within Leeds

The coverage of the GP data can be explored using a basic plot of the x,y coordinates.

plot(GP$Easting, GP$Northing)

plot of chunk unnamed-chunk-9

You will see that the GP locations have a coverage of the whole of England; and as such, it is necessary to extract only those GPs that reside within Leeds. We can do this using a point in polygon operation - i.e. identify those doctors (points) which lie within the Leeds area (polygon). However, before we can do this, we must first convert the GP data frame into a SpatialPointsDataFrame.

The function SpatialPointsDataFrame has a number of parameters - coords which accepts a list of the x and y coordinates, data that accepts the data frame containing attributes for each of the coordinates, and finally, a CRS string that shows the coordinate reference system for the given set of points - there are codes for lots of different referencing systems; however here we are using the one which corresponds to British National Grid.

# Remove those GP without Easting or Northing
GP <- subset(GP, Easting != "" | Northing != "")

# Create a unique ID for each GP
GP$GP_ID <- 1:nrow(GP)

# Create the SpatialPointsDataFrame
GP_SP <- SpatialPointsDataFrame(coords = c(GP[2], GP[3]), data = data.frame(GP$Surgery, 
    GP$GP_ID), proj4string = CRS("+init=epsg:27700"))

# Show the results
plot(GP_SP)

plot of chunk unnamed-chunk-10

Point in polygon - Create a spatially derived subset of the GP surgeries

The next stage of analysis is to perform a point in polygon operation using the over() function - this examines two spatial objects and returns a different set of attributes depending whether the objects are points, lines or polygons. In the case of a point and a polygon, the returned object is a data frame of the attributes of the polygon in which the point resides. The returned data frame therefore has the same number of rows as the input spatial point data frame.

# point in polygon - returns a dataframe of the attributes of the polygons
# that the point is within.
o <- over(GP_SP, CAS)
# Many of these will be NA values - because most GPs are not in Leeds!
head(o)
  gid ons_label name label
1  NA      <NA> <NA>  <NA>
2  NA      <NA> <NA>  <NA>
3  NA      <NA> <NA>  <NA>
4  NA      <NA> <NA>  <NA>
5  NA      <NA> <NA>  <NA>
6  NA      <NA> <NA>  <NA>
# Add the attributes back onto the GP_SP SpatialPointsDataFrame
GP_SP@data <- cbind(GP_SP@data, o)
# Use the NA values to remove those points now within Leeds
GP_SP_Leeds <- GP_SP[!is.na(GP_SP@data$label), ]
# Map your results
plot(GP_SP_Leeds)

plot of chunk unnamed-chunk-11

# View the data slot of the results
head(GP_SP_Leeds@data)
                     GP.Surgery GP.GP_ID  gid ons_label         name
3591 Dr A M Marshall & Partners     3591 1664    00DAFA  Aireborough
3592    Dr Adams R J & Partners     3592 1662    00DAFU Morley North
3593    Dr Adams R J & Partners     3593 1662    00DAFU Morley North
3594    Dr Adams R J & Partners     3594 1662    00DAFU Morley North
3595                 Dr Ali S A     3595 3470    00DAFQ      Hunslet
3596   Dr Allman I G & Partners     3596 3430    00DAGG     Weetwood
      label
3591 08DAFA
3592 08DAFU
3593 08DAFU
3594 08DAFU
3595 08DAFQ
3596 08DAGG

Estimating the population at risk

An estimated population at risk within each OA can be calculated using the index scores that were given in Table 1. The aim is to adjust the national rate of diabetes up or down from 4% based on the OAC Group characteristics of the area, and then calculate an estimated population count for the frequency of people with or at risk from Diabetes.

You now need to create a new data frame object called OAC_diabetes_rates that will contain these rates. You can create a data frame using the data.frame() function, which will accept a series of lists as their input. For example…

# Character strings use ''
forename <- c("Alex", "John", "Peter", "Eric")
surname <- c("Smith", "Peters", "Jones", "Mount")
# Numbers do not use ''''
age <- c(10, 15, 67, 50)
# Create a data frame
people <- data.frame(forename, surname, age)
# This should look like this if printed to the terminal
people
  forename surname age
1     Alex   Smith  10
2     John  Peters  15
3    Peter   Jones  67
4     Eric   Mount  50

You now need to create a lookup data frame for the index scores that contains:

If you have done this correctly, this will look like

   ID oac_names diabetes
1   1        1a      126
2   2        1b       96
3   3        1c      109
4   4        2a       81
5   5        2b       87
6   6        3a       82
7   7        3b       66
8   8        3c       64
9   9        4a      103
10 10        4b       71
11 11        4c       85
12 12        4d       81
13 13        5a       92
14 14        5b      120
15 15        5c      110
16 16        6a      120
17 17        6b       90
18 18        6c      103
19 19        6d      104
20 20        7a      106
21 21        7b       78

The next stage is to join this data onto the data slot of the OA object. This code uses the match() function to examine which rows in the OAC_diabetes_rates match with those in the OA@data. The matches are done using the OAC group name columns in both objects. If you print the first few rows of the data slot on the OA object you will see that this now contains the data you calculated above.


# Join higher geog onto OA
OA@data = data.frame(OA@data, OAC_diabetes_rates[match(OA@data[, "OACGroup"], 
    OAC_diabetes_rates[, "oac_names"]), ])

head(OA@data)
   ZONE_CODE      x      y OACGroup                            Label Tpop
0 00DAFD0037 429281 430671       6c Young Families in Terraced Homes  378
1 00DAFU0005 424041 428395       6a               Settled Households  605
2 00DAFX0010 428917 441090       4d                 Thriving Suburbs  372
3 00DAGH0063 440027 449531       4a      Prospering Younger Families  300
4 00DAGK0037 426390 432257       1c                Older Blue Collar  260
5 00DAGE0014 435052 436617       5b                    Older Workers  291
  Tpop40 ID oac_names diabetes
0     97 18        6c      103
1    326 16        6a      120
2    193 12        4d       81
3    164  9        4a      103
4    139  3        1c      109
5    134 14        5b      120

You now need to calculate a new column in the OA object that transforms the index scores into a predicted rate of people with diabetes in these areas. As noted earlier, the national average is around 4% of the population which we will adjust depending on the OAC group that the area is assigned.

To do this, we can use the following equation which transforms the index score into a predicted rate.

\[ \frac{\text{OAC Group Index Score}} {100} * 4 \]

If you remember from your previous work, you can use the $ to refer to columns within your data, and also use them to create a new column. Thus…

OA@data$d_rate <- OA@data$diabetes/100 * 4

You now need create another column in your OA@data called “at_risk” which is the number of people who are estimated to have diabetes. For this calculation, you should use the number of people over 40 as your target group (i.e. Tpop40). If this has been done correctly, your OA@data should look like…

head(OA@data)
   ZONE_CODE      x      y OACGroup                            Label Tpop
0 00DAFD0037 429281 430671       6c Young Families in Terraced Homes  378
1 00DAFU0005 424041 428395       6a               Settled Households  605
2 00DAFX0010 428917 441090       4d                 Thriving Suburbs  372
3 00DAGH0063 440027 449531       4a      Prospering Younger Families  300
4 00DAGK0037 426390 432257       1c                Older Blue Collar  260
5 00DAGE0014 435052 436617       5b                    Older Workers  291
  Tpop40 ID oac_names diabetes d_rate at_risk
0     97 18        6c      103   4.12   3.996
1    326 16        6a      120   4.80  15.648
2    193 12        4d       81   3.24   6.253
3    164  9        4a      103   4.12   6.757
4    139  3        1c      109   4.36   6.060
5    134 14        5b      120   4.80   6.432

5. Mapping OAC in Leeds

To illustrate the geography of OAC in Leeds we will first map this categorical variable. Earlier you created a variable called ID on 'OA@data'. We can use these 1-21 scores to assign different colours from a list to each of the OA.

# Define a set of colours, one for each of the OAC groups
my_colour <- c("#00688B", "#33A1C9", "#87CEEB", "#CDBE70", "#FFEC8B", "#6E8B3D", 
    "#A2CD5A", "#CAFF70", "#8B4C39", "#CD7054", "#FA8072", "#BC8F8F", "#848484", 
    "#B7B7B7", "#DCDCDC", "#5D478B", "#9F79EE", "#CD69C9", "#CD96CD", "#8B7E66", 
    "#CDBA96")

# Create a basic OAC choropleth map
plot(OA, col = my_colour[OA@data$ID], axes = FALSE, border = NA)
# Plot the wards on top of the OA
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend (the oac_names object was created earlier)
legend(x = 412971, y = 439516, legend = oac_names, fill = my_colour, bty = "n", 
    cex = 0.5, ncol = 2)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(444282, 423158), scale = 2000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(436020, 423158), scale = 5000, 
    fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(436020, 424158, "0km", cex = 0.6)
text(436020 + 2500, 424158, "2.5km", cex = 0.6)
text(436020 + 5000, 424158, "5km", cex = 0.6)
# Add a title
title("OAC Group Map of Leeds")

plot of chunk unnamed-chunk-18

This map is not very attractive or easy to interpret as it has far too many categories. As such, you now need to recreate it, however, with the more aggregate OAC Supergroups rather than Groups. We will first create a new column called ID2, which corresponds to the number at the start of the the OAC Group code. For example, Groups 6c and 6a would be in SuperGroup 6. We will use the substr() function which can be used to extract part of a character vector. We also use the function as.numeric() to turn the number (which was extracted from a character string) into a numeric value.

# The numbers 1,1 refer to the start and end point in the string to
# extract
OA@data$ID2 <- as.numeric(substr(OA@data$OACGroup, 1, 1))

For your report you will need to include an OAC SuperGroup map of Leeds with the location of the GP surgeries on it - remember, you can use add=TRUE to overplot new layers. For example, you could add points onto the previous map as follows. The pch= parameter specifies the symbol type - there are 25 different symbols which can be specified by changing the number 19 to 1 through 25.

plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#00688B")

plot of chunk unnamed-chunk-20

You won't just be able to replicate the OAC Group code entirely (shown above); you will for example need to specify a list object containing a set of just seven colours to use for the choropleth, and also a new set of names for the SuperGroups which are:

  1. Blue collar communities
  2. City Living
  3. Countryside
  4. Prospering suburbs
  5. Constrained by circumstances
  6. Typical traits
  7. Multicultural

You might aim to create a map which looks a little like the following…

plot of chunk unnamed-chunk-21

6. Mapping the estimated at risk population in Leeds

Earlier you calculated the estimated frequency of people who would be considered at risk of diabetes within each Output Area. You can now map both the percentage rates of the target group 40 plus (d_rate) and those estimated at risk (at_risk).

Ok, we will first map the percentage rates.

# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(OA@data$d_rate, n = 6, style = "fisher")
# classIntervals returns and object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks
# Select six colours from the pallet Blues - remember you can see all the
# pallets by running display.brewer.all()
my_colour <- brewer.pal(6, "Blues")
# Create a basic choropleth map
plot(OA, col = my_colour[findInterval(OA@data$d_rate, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the OA
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add the legend
legend(x = 412144, y = 429519, legend = leglabs(breaks, between = " to "), fill = my_colour, 
    bty = "n", cex = 0.7)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(444282, 423158), scale = 2000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(436020, 423158), scale = 5000, 
    fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(436020, 424158, "0km", cex = 0.6)
text(436020 + 2500, 424158, "2.5km", cex = 0.6)
text(436020 + 5000, 424158, "5km", cex = 0.6)
# Add a title
title("Estimate rate of diabetes in Leeds")

plot of chunk unnamed-chunk-22

You now need to create another map for the at_risk column; which should look something like…

plot of chunk unnamed-chunk-23

7. Estimating GP Catchment Areas

The are many different ways in which GP surgery catchment areas could be estimated. In this example you will use a simple buffer method that assumes patients of a GP surgery predominantly reside within a euclidean distance approximating a ten minute walk. You will need to calculate this based on an estimated distance that the average person would walk in ten minutes. At an average of around 3mph this is 1608m.

The rgeos package has a function called gBuffer() that can be used to create buffers around points, lines or polygon objects. In the following example we create a new SpatialPolygons object called GP_SP_Leeds_Buffers. This then needs to be converted into a SpatialPolygonsDataFrame object by joining the @data from GP_SP_Leeds onto GP_SP_Leeds_Buffers.

# buffers
GP_SP_Leeds_Buffers <- gBuffer(GP_SP_Leeds, width = 1608, byid = TRUE)

# Convert GP_SP_Leeds_Buffers into a SpatialPolygonsDataFrame (rather than
# SpatialPolygons) by joining the data of the GP_SP_Leeds
# SpatialPolygonsDataFrame
GP_SP_Leeds_Buffers <- SpatialPolygonsDataFrame(GP_SP_Leeds_Buffers, GP_SP_Leeds@data)

We can then plot the the GPs and catchment areas on a map. You will see an addition of a par() parameter in this plot which we can use to change lots of attributes about the plot window - in this case the background colour.

# Change the plot window background colour
par(bg = "#696969")
# cas wards
plot(CAS, axes = FALSE, col = "#6E7B8B", border = "#CAE1FF")
# GP locations
plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#5CACEE", add = TRUE)
# catchment buffers
plot(GP_SP_Leeds_Buffers, axes = FALSE, col = NA, border = "#F0F0F0", add = TRUE)

plot of chunk unnamed-chunk-25

null device 1

Estimating GP Catchment Areas Characteristics

To examine the characteristics of the estimated catchment areas you will now complete a further point in polygon operation that will examine the characteristics of the Output Areas that lie under the buffer zonal geography. However, before we can do this, we first need to convert the Output Area data into a set of points. Fortunately the attribute data slot of the OA object contains population weighted centroids for each OA. As such we can use these to build a new SpatialPointsDataFrame with a selection of the attribute data. An alternate way of doing this more generally is to use the coordinates() function.

# Build a new SpatialPointsDataFrame called OA_Points
OA_Points <- SpatialPointsDataFrame(coords = c(OA@data[2], OA@data[3]), data = data.frame(OA@data$at_risk, 
    OA@data$Tpop, OA@data$Tpop40), proj4string = CRS("+init=epsg:27700"))
# Plot a map
plot(OA_Points)

plot of chunk unnamed-chunk-27

We now need to implement another point in polygon operation, however, because we are interested in calculating some values for each catchment area, we first need to add an extra parameter to the over() function - returnList=TRUE. This returns a list of data frames, where each element of the list is a separate data frame and refers to a catchment area (i.e. GP), and the values those OA which are within this zone. This is a little different from the lists you have created previously which were just lists of character strings.

# Create point in polygon list
o <- over(GP_SP_Leeds_Buffers, OA_Points, returnList = TRUE)
# View length of the list
length(o)
[1] 149
# If we examine the object o, we will see also see that this comprises a
# list of dataframes. The summary function tells you about an object -
# head, is used to wrap around the function so only the first six elements
# are shown
head(summary(o))
     Length Class        Mode  
[1,] "3"    "data.frame" "list"
[2,] "3"    "data.frame" "list"
[3,] "3"    "data.frame" "list"
[4,] "3"    "data.frame" "list"
[5,] "3"    "data.frame" "list"
[6,] "3"    "data.frame" "list"
# View an item from the list (in this case sixth)
o[[6]]
     OA.data.at_risk OA.data.Tpop OA.data.Tpop40
10            4.7628          448            147
26            4.6000          230            125
28            3.9528          221            122
29            2.9928          237             86
33            4.5720          239            127
57            0.2916          395              9
77            3.4344          468            106
85            0.8700          392             25
129           3.1668          260             91
149           6.3336          328            182
201           3.8280          323            110
203           4.3264          232            104
208           2.0904          197             67
253           0.4860          182             15
285           4.6640          372            110
338           1.5552          175             48
342           2.4360          280             70
368           0.5508          299             17
373           2.6796          285             77
382           0.8100          533             25
384           5.3628          287            123
423           5.7240          316            135
455           5.1156          351            147
475           1.2636          304             39
489           2.3328          118             72
527           2.6448          275             76
550           1.0092          213             29
557           0.9396          510             29
585           2.6796          237             77
590           1.4580          232             45
606           0.8004          361             23
615           5.9160          507            170
635           2.8836          563             89
686           5.5440          215            154
708           0.7128          559             22
714           1.0368          387             32
730           6.0960          321            127
743           2.7232          141             74
752           3.9204          368            121
809           3.0528          309             72
815           0.4524          163             13
831           2.5404          197             73
841           3.0456          759             94
856           6.0480          393            126
872           1.1340          521             35
901           0.7656          222             22
947           3.5192          321             83
960           0.4212          112             13
969           1.4580          402             45
980           4.1760          187             87
989           1.0092          524             29
1000          1.4268          511             41
1018          0.1620          702              5
1030          5.5328          270            133
1034          2.1120          322             44
1036          1.8232          528             43
1041          1.3932          328             43
1052          1.6524          650             51
1076          5.3940          285            155
1143          5.7988          304            133
1148          1.4832          314             36
1176          1.1484          335             33
1224          6.1568          363            148
1229          0.1296          582              4
1242          2.2048          497             52
1257          4.8888          226             97
1275          3.9520          334             95
1306          1.3572          248             39
1311          6.4024          334            151
1321          2.8536          340             82
1343         11.1780          925            345
1345          1.3920          308             40
1392          1.5688          142             37
1400          2.5752          349             74
1431          0.6156          432             19
1494          0.8004          487             23
1548          4.9764          330            143
1564          5.5680          370            160
1607          1.9836          223             57
1615          3.3372          180            103
1629          1.3608           74             42
1653          0.8748          284             27
1654          0.9048          314             26
1672          8.1600          318            170
1707          0.2916          299              9
1728          3.7492          421             91
1733          8.7776          426            211
1736          2.1576          295             62
1774          0.2916           92              9
1788          8.0160          377            167
1803          5.2832          365            127
1868          0.7128          485             22
1878          1.5312          351             44
1892          5.6160          291            156
1898          3.0488          251             74
1903          3.0624          284             88
1919          0.5184          412             16
1920          4.4096          362            106
1953          8.6112          354            207
1967          6.0180          350            177
1970          2.8536          284             82
1974          3.9008          238             92
1975          2.6400          236             55
1976          3.5520          166             74
1984          1.6356          340             47
2003          3.3408          242             96
2036          8.0640          389            168
2042          2.8016          403             68
2064          1.8144          310             56
2124          1.6200          251             50
2166          2.2620          414             65
2188          5.1012          185            117
2220          2.4012          262             69
2302          1.2960          818             40
2310          3.7920          134             79
2314          0.8352          488             24
2320          5.4400          304            160
2321          2.6448          327             76
2341          3.9856          208             94
2343          1.5876          658             49
2367          1.3284          134             41
2429          3.0780          280             95
2438          3.4104          342             98

The above results are the values from those OA which are within the catchment area for the sixth GP in the GP_SP_Leeds_Buffers object; i.e.

GP_SP_Leeds_Buffers@data[6, ]
                   GP.Surgery GP.GP_ID  gid ons_label     name  label
3596 Dr Allman I G & Partners     3596 3430    00DAGG Weetwood 08DAGG

We can illustrate this by plotting the results for a single GP.

# Plot the buffer for the GP
plot(GP_SP_Leeds_Buffers[6, ], col = "#D4D4D4")
# get a list of those OA centroids that are within this buffer using the
# rownames - make them numeric
row_o <- as.numeric(rownames(o[[6]]))
# Plot all the OA centroids
plot(OA_Points, pch = 19, cex = 0.5, col = "#7F7F7F", add = TRUE)
# Plot those OA centroids which are inside the buffer - i.e. we use the
# row_o to select the rows
plot(OA_Points[row_o, ], pch = 19, cex = 0.5, col = "#FF4500", add = TRUE)

plot of chunk unnamed-chunk-30

The next stage is to look at the results of the point in polygon analysis and calculate the characteristics for each GP catchment. The first stage is to use the lapply() function to apply a function across the list. In this case, for each column in the list elements (data frames), we calculate the sum (i.e. colSums). We create a new object called GP_Risk that is a list containing the estimated at risk population and the total and 40+ year olds.

GP_Risk <- lapply(o, colSums)
# View the first six items from the list
GP_Risk[1:6]
[[1]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
          409.1         20270.0          9978.0 

[[2]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
          167.1          7905.0          4152.0 

[[3]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
          358.4         19370.0          8621.0 

[[4]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
          436.5         23768.0         10581.0 

[[5]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
            302           21298            7177 

[[6]]
OA.data.at_risk    OA.data.Tpop  OA.data.Tpop40 
          376.8         41508.0          9951.0 

In order to join this data back onto your GP records (GP_SP_Leeds) we need to collapse the list back into a single data frame which we can do by using the following code.

# Collapse list
GP_Risk_Data <- data.frame(matrix(unlist(GP_Risk), nrow = length(GP_Risk), byrow = T))
# Change the column names to something sensible
colnames(GP_Risk_Data) <- c("At_Risk", "Total_Pop", "Pop40plus")
# This should look like
head(GP_Risk_Data)
  At_Risk Total_Pop Pop40plus
1   409.1     20270      9978
2   167.1      7905      4152
3   358.4     19370      8621
4   436.5     23768     10581
5   302.0     21298      7177
6   376.8     41508      9951

We can now join these records back onto the GP_SP_Leeds spatial data frame object and calculate a relative rate.

# Join - the ordering has not been changed so this is valid
GP_SP_Leeds@data <- cbind(GP_SP_Leeds@data, GP_Risk_Data)
# Relative rate
GP_SP_Leeds@data$Rel_At_Risk <- GP_SP_Leeds@data$At_Risk/GP_SP_Leeds@data$Pop40plus * 
    100

And create a map…this is the absolute at risk.

# cas wards
plot(CAS, axes = FALSE, col = "#6E7B8B", border = "#AAAAAA")
# Generate a set of breaks and colours for the absolute population at risk
breaks <- classIntervals(GP_SP_Leeds@data$At_Risk, n = 5, style = "fisher")
breaks <- breaks$brks
my_colour <- brewer.pal(5, "OrRd")
# Plot the GPs, colouring them by the breaks
plot(GP_SP_Leeds, pch = 15, cex = 0.8, col = my_colour[findInterval(GP_SP_Leeds@data$At_Risk, 
    breaks, all.inside = TRUE)], add = TRUE)
# Add legend
legend(x = 412144, y = 429519, legend = leglabs(breaks, between = " to "), fill = my_colour, 
    bty = "n", cex = 0.7)

plot of chunk unnamed-chunk-34

In your report, you will need to think whether relative or absolute rates are most appropriate for use in your selection of target clinic location. A relative at risk rate map might look something like…

plot of chunk unnamed-chunk-35

The various analysis that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, however, examples of any code that you need to complete your report are contained in this document.

As a final helper, if you want to display an ordered list you can do so using the order() function.

# An example sorting a data frame by the easting column
head(GP[order(GP$Easting, decreasing = TRUE), ])
                               Surgery Easting Northing GP_ID
8145 Seehra, Lockyer, Drummond & Davis  655043   294264  8145
8127                Beach Road Surgery  654968   292961  8127
8154           Dr Henderson & partners  654921   293553  8154
8135                    Cox & Partners  654703   292549  8135
8129          Dr Kelly, Gunn & De Ligt  654144   291453  8129
8140           Dr Henderson & partners  653448   295082  8140
# An example sorting a spatial points data frame by the GP.Surgery
head(GP[order(GP_SP_Leeds@data$GP.Surgery, decreasing = TRUE), ])
                            Surgery Easting Northing GP_ID
134    RICHMOND ROAD MEDICAL CENTRE  518183   170063   134
133        NEW MALDEN HEALTH CENTRE  521223   172008   133
71  Annie Prendergast Health Centre  547921   188403    71
132        NEW MALDEN HEALTH CENTRE  521490   168272   132
147               CLAREMONT SURGERY  517881   167121   147
131             MANOR DRIVE SURGERY  522034   166120   131

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.