In this exercise you take on the role of an owner of a direct marketing consultancy that specialises in helping small businesses to target potential service users within their local area. Your most recent client is a group of doctors who are organising a diabetes awareness campaign that involves delivery of an information pamphlet to local residents and holding a testing clinic at one of their surgeries.
The doctors have two problems:
You have been asked to write a short executive report (max 500 words) that will answer the following questions:
In addition to answering these two questions in your report, you have been asked to present as a minimum the following information:
Overall, your report should not have more than 10 maps or charts. All maps must be generated within R; and all maps should adopt different (but appropriate) colour schemes to those illustrated in this practical.
The learning objectives for this activity are as follows:
Interpret findings from geodemographic analysis
Around 4% of the UK population has diabetes. In this exercise, the at risk population is defined as people who are aged over 40. When an age, gender and ethnic group standardised index score are calculated with the Office for National Statistics Output Area Classification (OAC), it can be shown that these patterns are not evenly distributed across different types of neighbourhood (see Figure 1 and Table 1).
Figure 1: Percentage different from the national average for diabetes related hospital admissions
Table 1: Propensity for diabetes related hospital admissions
OAC Group | Index Score |
---|---|
1a: Terraced Blue Collar | 126.00 |
1b: Younger Blue Collar | 96.00 |
1c: Older Blue Collar | 109.00 |
2a: Transient Communities | 81.00 |
2b: Settled in the City | 87.00 |
3a: Village Life | 82.00 |
3b: Agricultural | 66.00 |
3c: Accessible Countryside | 64.00 |
4a: Prospering Younger Families | 103.00 |
4b: Prospering Older Families | 71.00 |
4c: Prospering Semis | 85.00 |
4d: Thriving Suburbs | 81.00 |
5a: Senior Communities | 92.00 |
5b: Older Workers | 120.00 |
5c: Public Housing | 110.00 |
6a: Settled Households | 120.00 |
6b: Least Divergent | 90.00 |
6c: Young Families in Terraced Homes | 103.00 |
6d: Aspiring Households | 104.00 |
7a: Asian Communities | 106.00 |
7b: Afro-Caribbean Communities | 78.00 |
Interpreting these index scores is a relatively straight forward task. A score of 100 indicates an OAC Group where admissions to hospital with conditions related to Diabetes are represented in the same proportion as the national average. A score of 200 would indicate admissions at a rate twice the national average, and a score of 50 half the national average. Index scores formatted in this way are commonly used in geodemographic analysis.
For this exercise you will use three sets of data: population estimates, the Output Area Classification and the locations of GP surgeries.
You first need to setup your working directory, and then load a number of packages that you will need later in the practical. When you restart the practical on different machines, you will need to run these lines of code again.
library(maptools)
# Set working directory
setwd("M:/R work/social_marketing/")
# Choose the CRAN mirror where the packages are going to be downloaded
# from
options(repos = c(CRAN = "http://www.stats.bris.ac.uk/R/"))
# Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())
# Download / install and then load maptools
install.packages("maptools", depend = TRUE, lib = getwd())
library(maptools, lib.loc = getwd())
# Download / install and then load stringr
install.packages("stringr", depend = TRUE, lib = getwd())
library("stringr", lib.loc = getwd())
# Download / install and then load rgeos
install.packages("rgeos", depend = TRUE, lib = getwd())
library("rgeos", lib.loc = getwd())
# Download / install and then load classInt
install.packages("classInt", depend = TRUE, lib = getwd())
library("classInt", lib.loc = getwd())
# Download / install and then load RColorBrewer
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())
Shapefiles for use in this exercise have already been prepared for you. You can download the Output Area (OA) shapefile from here [link] ; and the CAS Wards here [link]
Once you have downloaded these two Shapefiles, unzip them into the working directory you are using for this practical. We then will take a quick look at both the geometry and the attributes of the newly created object OA.
# Get 2011 OA for Leeds
OA <- readShapeSpatial("OA", proj4string = CRS("+init=epsg:27700"))
# Get 2011 CAS for Leeds
CAS <- readShapeSpatial("CAS", proj4string = CRS("+init=epsg:27700"))
# Plot of Leeds
plot(OA)
In the data slot, you will see that there are a variety of attributes about each Output Area in Leeds including an area id (Zone_Code), the co-ordinates of the OA centroid, the geodemographic group (OACGroup), a name for the geodemographic group (label) and the total population within the area (Tpop).
# First six rows of data
head(OA@data)
ZONE_CODE x y OACGroup Label Tpop
0 00DAFD0037 429281 430671 6c Young Families in Terraced Homes 378
1 00DAFU0005 424041 428395 6a Settled Households 605
2 00DAFX0010 428917 441090 4d Thriving Suburbs 372
3 00DAGH0063 440027 449531 4a Prospering Younger Families 300
4 00DAGK0037 426390 432257 1c Older Blue Collar 260
5 00DAGE0014 435052 436617 5b Older Workers 291
Tpop40
0 97
1 326
2 193
3 164
4 139
5 134
The next dataset you will download is the locations of all GP surgeries from the Office for National Statistics website: http://www.neighbourhood.statistics.gov.uk/dissemination/
Near the base of the page click on the link “Topics”, then on the following page enter “General Practices” into the search box and click the “Go” button. Expand the returned list by clicking the + button on Data set name matches ==>> Exact matches, then select “General Practices / Surgeries, 2006” from the returned data sets, by clicking the radio button on the right hand side of the page. Then click the “Next” button on the bottom right of the page. Download the CSV file on the next page, placing this in your working directory.
This can now be read into R as a new object called GP, and trimmed to only include the surgery name and the Easting and Northing coordinates.
GP <- read.csv("General Practices 2006.csv", header = TRUE, skip = 3)
GP <- subset(GP, select = c("Practice.Doctor.s.Name", "Easting", "Northing"))
colnames(GP) <- c("Surgery", "Easting", "Northing")
The coverage of the GP data can be explored using a basic plot of the x,y coordinates.
plot(GP$Easting, GP$Northing)
You will see that the GP locations have a coverage of the whole of England; and as such, it is necessary to extract only those GPs that reside within Leeds. We can do this using a point in polygon operation - i.e. identify those doctors (points) which lie within the Leeds area (polygon). However, before we can do this, we must first convert the GP data frame into a SpatialPointsDataFrame.
The function SpatialPointsDataFrame has a number of parameters - coords which accepts a list of the x and y coordinates, data that accepts the data frame containing attributes for each of the coordinates, and finally, a CRS string that shows the coordinate reference system for the given set of points - there are codes for lots of different referencing systems; however here we are using the one which corresponds to British National Grid.
# Remove those GP without Easting or Northing
GP <- subset(GP, Easting != "" | Northing != "")
# Create a unique ID for each GP
GP$GP_ID <- 1:nrow(GP)
# Create the SpatialPointsDataFrame
GP_SP <- SpatialPointsDataFrame(coords = c(GP[2], GP[3]), data = data.frame(GP$Surgery,
GP$GP_ID), proj4string = CRS("+init=epsg:27700"))
# Show the results
plot(GP_SP)
The next stage of analysis is to perform a point in polygon operation using the over() function - this examines two spatial objects and returns a different set of attributes depending whether the objects are points, lines or polygons. In the case of a point and a polygon, the returned object is a data frame of the attributes of the polygon in which the point resides. The returned data frame therefore has the same number of rows as the input spatial point data frame.
# point in polygon - returns a dataframe of the attributes of the polygons
# that the point is within.
o <- over(GP_SP, CAS)
# Many of these will be NA values - because most GPs are not in Leeds!
head(o)
gid ons_label name label
1 NA <NA> <NA> <NA>
2 NA <NA> <NA> <NA>
3 NA <NA> <NA> <NA>
4 NA <NA> <NA> <NA>
5 NA <NA> <NA> <NA>
6 NA <NA> <NA> <NA>
# Add the attributes back onto the GP_SP SpatialPointsDataFrame
GP_SP@data <- cbind(GP_SP@data, o)
# Use the NA values to remove those points now within Leeds
GP_SP_Leeds <- GP_SP[!is.na(GP_SP@data$label), ]
# Map your results
plot(GP_SP_Leeds)
# View the data slot of the results
head(GP_SP_Leeds@data)
GP.Surgery GP.GP_ID gid ons_label name
3591 Dr A M Marshall & Partners 3591 1664 00DAFA Aireborough
3592 Dr Adams R J & Partners 3592 1662 00DAFU Morley North
3593 Dr Adams R J & Partners 3593 1662 00DAFU Morley North
3594 Dr Adams R J & Partners 3594 1662 00DAFU Morley North
3595 Dr Ali S A 3595 3470 00DAFQ Hunslet
3596 Dr Allman I G & Partners 3596 3430 00DAGG Weetwood
label
3591 08DAFA
3592 08DAFU
3593 08DAFU
3594 08DAFU
3595 08DAFQ
3596 08DAGG
An estimated population at risk within each OA can be calculated using the index scores that were given in Table 1. The aim is to adjust the national rate of diabetes up or down from 4% based on the OAC Group characteristics of the area, and then calculate an estimated population count for the frequency of people with or at risk from Diabetes.
You now need to create a new data frame object called OAC_diabetes_rates that will contain these rates. You can create a data frame using the data.frame() function, which will accept a series of lists as their input. For example…
# Character strings use ''
forename <- c("Alex", "John", "Peter", "Eric")
surname <- c("Smith", "Peters", "Jones", "Mount")
# Numbers do not use ''''
age <- c(10, 15, 67, 50)
# Create a data frame
people <- data.frame(forename, surname, age)
# This should look like this if printed to the terminal
people
forename surname age
1 Alex Smith 10
2 John Peters 15
3 Peter Jones 67
4 Eric Mount 50
You now need to create a lookup data frame for the index scores that contains:
If you have done this correctly, this will look like
ID oac_names diabetes
1 1 1a 126
2 2 1b 96
3 3 1c 109
4 4 2a 81
5 5 2b 87
6 6 3a 82
7 7 3b 66
8 8 3c 64
9 9 4a 103
10 10 4b 71
11 11 4c 85
12 12 4d 81
13 13 5a 92
14 14 5b 120
15 15 5c 110
16 16 6a 120
17 17 6b 90
18 18 6c 103
19 19 6d 104
20 20 7a 106
21 21 7b 78
The next stage is to join this data onto the data slot of the OA object. This code uses the match() function to examine which rows in the OAC_diabetes_rates match with those in the OA@data. The matches are done using the OAC group name columns in both objects. If you print the first few rows of the data slot on the OA object you will see that this now contains the data you calculated above.
# Join higher geog onto OA
OA@data = data.frame(OA@data, OAC_diabetes_rates[match(OA@data[, "OACGroup"],
OAC_diabetes_rates[, "oac_names"]), ])
head(OA@data)
ZONE_CODE x y OACGroup Label Tpop
0 00DAFD0037 429281 430671 6c Young Families in Terraced Homes 378
1 00DAFU0005 424041 428395 6a Settled Households 605
2 00DAFX0010 428917 441090 4d Thriving Suburbs 372
3 00DAGH0063 440027 449531 4a Prospering Younger Families 300
4 00DAGK0037 426390 432257 1c Older Blue Collar 260
5 00DAGE0014 435052 436617 5b Older Workers 291
Tpop40 ID oac_names diabetes
0 97 18 6c 103
1 326 16 6a 120
2 193 12 4d 81
3 164 9 4a 103
4 139 3 1c 109
5 134 14 5b 120
You now need to calculate a new column in the OA object that transforms the index scores into a predicted rate of people with diabetes in these areas. As noted earlier, the national average is around 4% of the population which we will adjust depending on the OAC group that the area is assigned.
To do this, we can use the following equation which transforms the index score into a predicted rate.
\[ \frac{\text{OAC Group Index Score}} {100} * 4 \]
If you remember from your previous work, you can use the $ to refer to columns within your data, and also use them to create a new column. Thus…
OA@data$d_rate <- OA@data$diabetes/100 * 4
You now need create another column in your OA@data called “at_risk” which is the number of people who are estimated to have diabetes. For this calculation, you should use the number of people over 40 as your target group (i.e. Tpop40). If this has been done correctly, your OA@data should look like…
head(OA@data)
ZONE_CODE x y OACGroup Label Tpop
0 00DAFD0037 429281 430671 6c Young Families in Terraced Homes 378
1 00DAFU0005 424041 428395 6a Settled Households 605
2 00DAFX0010 428917 441090 4d Thriving Suburbs 372
3 00DAGH0063 440027 449531 4a Prospering Younger Families 300
4 00DAGK0037 426390 432257 1c Older Blue Collar 260
5 00DAGE0014 435052 436617 5b Older Workers 291
Tpop40 ID oac_names diabetes d_rate at_risk
0 97 18 6c 103 4.12 3.996
1 326 16 6a 120 4.80 15.648
2 193 12 4d 81 3.24 6.253
3 164 9 4a 103 4.12 6.757
4 139 3 1c 109 4.36 6.060
5 134 14 5b 120 4.80 6.432
To illustrate the geography of OAC in Leeds we will first map this categorical variable. Earlier you created a variable called ID on 'OA@data'. We can use these 1-21 scores to assign different colours from a list to each of the OA.
# Define a set of colours, one for each of the OAC groups
my_colour <- c("#00688B", "#33A1C9", "#87CEEB", "#CDBE70", "#FFEC8B", "#6E8B3D",
"#A2CD5A", "#CAFF70", "#8B4C39", "#CD7054", "#FA8072", "#BC8F8F", "#848484",
"#B7B7B7", "#DCDCDC", "#5D478B", "#9F79EE", "#CD69C9", "#CD96CD", "#8B7E66",
"#CDBA96")
# Create a basic OAC choropleth map
plot(OA, col = my_colour[OA@data$ID], axes = FALSE, border = NA)
# Plot the wards on top of the OA
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend (the oac_names object was created earlier)
legend(x = 412971, y = 439516, legend = oac_names, fill = my_colour, bty = "n",
cex = 0.5, ncol = 2)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(444282, 423158), scale = 2000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(436020, 423158), scale = 5000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(436020, 424158, "0km", cex = 0.6)
text(436020 + 2500, 424158, "2.5km", cex = 0.6)
text(436020 + 5000, 424158, "5km", cex = 0.6)
# Add a title
title("OAC Group Map of Leeds")
This map is not very attractive or easy to interpret as it has far too many categories. As such, you now need to recreate it, however, with the more aggregate OAC Supergroups rather than Groups. We will first create a new column called ID2, which corresponds to the number at the start of the the OAC Group code. For example, Groups 6c and 6a would be in SuperGroup 6. We will use the substr() function which can be used to extract part of a character vector. We also use the function as.numeric() to turn the number (which was extracted from a character string) into a numeric value.
# The numbers 1,1 refer to the start and end point in the string to
# extract
OA@data$ID2 <- as.numeric(substr(OA@data$OACGroup, 1, 1))
For your report you will need to include an OAC SuperGroup map of Leeds with the location of the GP surgeries on it - remember, you can use add=TRUE to overplot new layers. For example, you could add points onto the previous map as follows. The pch= parameter specifies the symbol type - there are 25 different symbols which can be specified by changing the number 19 to 1 through 25.
plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#00688B")
You won't just be able to replicate the OAC Group code entirely (shown above); you will for example need to specify a list object containing a set of just seven colours to use for the choropleth, and also a new set of names for the SuperGroups which are:
You might aim to create a map which looks a little like the following…
Earlier you calculated the estimated frequency of people who would be considered at risk of diabetes within each Output Area. You can now map both the percentage rates of the target group 40 plus (d_rate) and those estimated at risk (at_risk).
Ok, we will first map the percentage rates.
# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(OA@data$d_rate, n = 6, style = "fisher")
# classIntervals returns and object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks
# Select six colours from the pallet Blues - remember you can see all the
# pallets by running display.brewer.all()
my_colour <- brewer.pal(6, "Blues")
# Create a basic choropleth map
plot(OA, col = my_colour[findInterval(OA@data$d_rate, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the OA
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add the legend
legend(x = 412144, y = 429519, legend = leglabs(breaks, between = " to "), fill = my_colour,
bty = "n", cex = 0.7)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(444282, 423158), scale = 2000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(436020, 423158), scale = 5000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(436020, 424158, "0km", cex = 0.6)
text(436020 + 2500, 424158, "2.5km", cex = 0.6)
text(436020 + 5000, 424158, "5km", cex = 0.6)
# Add a title
title("Estimate rate of diabetes in Leeds")
You now need to create another map for the at_risk column; which should look something like…
The are many different ways in which GP surgery catchment areas could be estimated. In this example you will use a simple buffer method that assumes patients of a GP surgery predominantly reside within a euclidean distance approximating a ten minute walk. You will need to calculate this based on an estimated distance that the average person would walk in ten minutes. At an average of around 3mph this is 1608m.
The rgeos package has a function called gBuffer() that can be used to create buffers around points, lines or polygon objects. In the following example we create a new SpatialPolygons object called GP_SP_Leeds_Buffers. This then needs to be converted into a SpatialPolygonsDataFrame object by joining the @data from GP_SP_Leeds onto GP_SP_Leeds_Buffers.
# buffers
GP_SP_Leeds_Buffers <- gBuffer(GP_SP_Leeds, width = 1608, byid = TRUE)
# Convert GP_SP_Leeds_Buffers into a SpatialPolygonsDataFrame (rather than
# SpatialPolygons) by joining the data of the GP_SP_Leeds
# SpatialPolygonsDataFrame
GP_SP_Leeds_Buffers <- SpatialPolygonsDataFrame(GP_SP_Leeds_Buffers, GP_SP_Leeds@data)
We can then plot the the GPs and catchment areas on a map. You will see an addition of a par() parameter in this plot which we can use to change lots of attributes about the plot window - in this case the background colour.
# Change the plot window background colour
par(bg = "#696969")
# cas wards
plot(CAS, axes = FALSE, col = "#6E7B8B", border = "#CAE1FF")
# GP locations
plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#5CACEE", add = TRUE)
# catchment buffers
plot(GP_SP_Leeds_Buffers, axes = FALSE, col = NA, border = "#F0F0F0", add = TRUE)
null device 1
To examine the characteristics of the estimated catchment areas you will now complete a further point in polygon operation that will examine the characteristics of the Output Areas that lie under the buffer zonal geography. However, before we can do this, we first need to convert the Output Area data into a set of points. Fortunately the attribute data slot of the OA object contains population weighted centroids for each OA. As such we can use these to build a new SpatialPointsDataFrame with a selection of the attribute data. An alternate way of doing this more generally is to use the coordinates() function.
# Build a new SpatialPointsDataFrame called OA_Points
OA_Points <- SpatialPointsDataFrame(coords = c(OA@data[2], OA@data[3]), data = data.frame(OA@data$at_risk,
OA@data$Tpop, OA@data$Tpop40), proj4string = CRS("+init=epsg:27700"))
# Plot a map
plot(OA_Points)
We now need to implement another point in polygon operation, however, because we are interested in calculating some values for each catchment area, we first need to add an extra parameter to the over() function - returnList=TRUE. This returns a list of data frames, where each element of the list is a separate data frame and refers to a catchment area (i.e. GP), and the values those OA which are within this zone. This is a little different from the lists you have created previously which were just lists of character strings.
# Create point in polygon list
o <- over(GP_SP_Leeds_Buffers, OA_Points, returnList = TRUE)
# View length of the list
length(o)
[1] 149
# If we examine the object o, we will see also see that this comprises a
# list of dataframes. The summary function tells you about an object -
# head, is used to wrap around the function so only the first six elements
# are shown
head(summary(o))
Length Class Mode
[1,] "3" "data.frame" "list"
[2,] "3" "data.frame" "list"
[3,] "3" "data.frame" "list"
[4,] "3" "data.frame" "list"
[5,] "3" "data.frame" "list"
[6,] "3" "data.frame" "list"
# View an item from the list (in this case sixth)
o[[6]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
10 4.7628 448 147
26 4.6000 230 125
28 3.9528 221 122
29 2.9928 237 86
33 4.5720 239 127
57 0.2916 395 9
77 3.4344 468 106
85 0.8700 392 25
129 3.1668 260 91
149 6.3336 328 182
201 3.8280 323 110
203 4.3264 232 104
208 2.0904 197 67
253 0.4860 182 15
285 4.6640 372 110
338 1.5552 175 48
342 2.4360 280 70
368 0.5508 299 17
373 2.6796 285 77
382 0.8100 533 25
384 5.3628 287 123
423 5.7240 316 135
455 5.1156 351 147
475 1.2636 304 39
489 2.3328 118 72
527 2.6448 275 76
550 1.0092 213 29
557 0.9396 510 29
585 2.6796 237 77
590 1.4580 232 45
606 0.8004 361 23
615 5.9160 507 170
635 2.8836 563 89
686 5.5440 215 154
708 0.7128 559 22
714 1.0368 387 32
730 6.0960 321 127
743 2.7232 141 74
752 3.9204 368 121
809 3.0528 309 72
815 0.4524 163 13
831 2.5404 197 73
841 3.0456 759 94
856 6.0480 393 126
872 1.1340 521 35
901 0.7656 222 22
947 3.5192 321 83
960 0.4212 112 13
969 1.4580 402 45
980 4.1760 187 87
989 1.0092 524 29
1000 1.4268 511 41
1018 0.1620 702 5
1030 5.5328 270 133
1034 2.1120 322 44
1036 1.8232 528 43
1041 1.3932 328 43
1052 1.6524 650 51
1076 5.3940 285 155
1143 5.7988 304 133
1148 1.4832 314 36
1176 1.1484 335 33
1224 6.1568 363 148
1229 0.1296 582 4
1242 2.2048 497 52
1257 4.8888 226 97
1275 3.9520 334 95
1306 1.3572 248 39
1311 6.4024 334 151
1321 2.8536 340 82
1343 11.1780 925 345
1345 1.3920 308 40
1392 1.5688 142 37
1400 2.5752 349 74
1431 0.6156 432 19
1494 0.8004 487 23
1548 4.9764 330 143
1564 5.5680 370 160
1607 1.9836 223 57
1615 3.3372 180 103
1629 1.3608 74 42
1653 0.8748 284 27
1654 0.9048 314 26
1672 8.1600 318 170
1707 0.2916 299 9
1728 3.7492 421 91
1733 8.7776 426 211
1736 2.1576 295 62
1774 0.2916 92 9
1788 8.0160 377 167
1803 5.2832 365 127
1868 0.7128 485 22
1878 1.5312 351 44
1892 5.6160 291 156
1898 3.0488 251 74
1903 3.0624 284 88
1919 0.5184 412 16
1920 4.4096 362 106
1953 8.6112 354 207
1967 6.0180 350 177
1970 2.8536 284 82
1974 3.9008 238 92
1975 2.6400 236 55
1976 3.5520 166 74
1984 1.6356 340 47
2003 3.3408 242 96
2036 8.0640 389 168
2042 2.8016 403 68
2064 1.8144 310 56
2124 1.6200 251 50
2166 2.2620 414 65
2188 5.1012 185 117
2220 2.4012 262 69
2302 1.2960 818 40
2310 3.7920 134 79
2314 0.8352 488 24
2320 5.4400 304 160
2321 2.6448 327 76
2341 3.9856 208 94
2343 1.5876 658 49
2367 1.3284 134 41
2429 3.0780 280 95
2438 3.4104 342 98
The above results are the values from those OA which are within the catchment area for the sixth GP in the GP_SP_Leeds_Buffers object; i.e.
GP_SP_Leeds_Buffers@data[6, ]
GP.Surgery GP.GP_ID gid ons_label name label
3596 Dr Allman I G & Partners 3596 3430 00DAGG Weetwood 08DAGG
We can illustrate this by plotting the results for a single GP.
# Plot the buffer for the GP
plot(GP_SP_Leeds_Buffers[6, ], col = "#D4D4D4")
# get a list of those OA centroids that are within this buffer using the
# rownames - make them numeric
row_o <- as.numeric(rownames(o[[6]]))
# Plot all the OA centroids
plot(OA_Points, pch = 19, cex = 0.5, col = "#7F7F7F", add = TRUE)
# Plot those OA centroids which are inside the buffer - i.e. we use the
# row_o to select the rows
plot(OA_Points[row_o, ], pch = 19, cex = 0.5, col = "#FF4500", add = TRUE)
The next stage is to look at the results of the point in polygon analysis and calculate the characteristics for each GP catchment. The first stage is to use the lapply() function to apply a function across the list. In this case, for each column in the list elements (data frames), we calculate the sum (i.e. colSums). We create a new object called GP_Risk that is a list containing the estimated at risk population and the total and 40+ year olds.
GP_Risk <- lapply(o, colSums)
# View the first six items from the list
GP_Risk[1:6]
[[1]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
409.1 20270.0 9978.0
[[2]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
167.1 7905.0 4152.0
[[3]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
358.4 19370.0 8621.0
[[4]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
436.5 23768.0 10581.0
[[5]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
302 21298 7177
[[6]]
OA.data.at_risk OA.data.Tpop OA.data.Tpop40
376.8 41508.0 9951.0
In order to join this data back onto your GP records (GP_SP_Leeds) we need to collapse the list back into a single data frame which we can do by using the following code.
# Collapse list
GP_Risk_Data <- data.frame(matrix(unlist(GP_Risk), nrow = length(GP_Risk), byrow = T))
# Change the column names to something sensible
colnames(GP_Risk_Data) <- c("At_Risk", "Total_Pop", "Pop40plus")
# This should look like
head(GP_Risk_Data)
At_Risk Total_Pop Pop40plus
1 409.1 20270 9978
2 167.1 7905 4152
3 358.4 19370 8621
4 436.5 23768 10581
5 302.0 21298 7177
6 376.8 41508 9951
We can now join these records back onto the GP_SP_Leeds spatial data frame object and calculate a relative rate.
# Join - the ordering has not been changed so this is valid
GP_SP_Leeds@data <- cbind(GP_SP_Leeds@data, GP_Risk_Data)
# Relative rate
GP_SP_Leeds@data$Rel_At_Risk <- GP_SP_Leeds@data$At_Risk/GP_SP_Leeds@data$Pop40plus *
100
And create a map…this is the absolute at risk.
# cas wards
plot(CAS, axes = FALSE, col = "#6E7B8B", border = "#AAAAAA")
# Generate a set of breaks and colours for the absolute population at risk
breaks <- classIntervals(GP_SP_Leeds@data$At_Risk, n = 5, style = "fisher")
breaks <- breaks$brks
my_colour <- brewer.pal(5, "OrRd")
# Plot the GPs, colouring them by the breaks
plot(GP_SP_Leeds, pch = 15, cex = 0.8, col = my_colour[findInterval(GP_SP_Leeds@data$At_Risk,
breaks, all.inside = TRUE)], add = TRUE)
# Add legend
legend(x = 412144, y = 429519, legend = leglabs(breaks, between = " to "), fill = my_colour,
bty = "n", cex = 0.7)
In your report, you will need to think whether relative or absolute rates are most appropriate for use in your selection of target clinic location. A relative at risk rate map might look something like…
The various analysis that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, however, examples of any code that you need to complete your report are contained in this document.
As a final helper, if you want to display an ordered list you can do so using the order() function.
# An example sorting a data frame by the easting column
head(GP[order(GP$Easting, decreasing = TRUE), ])
Surgery Easting Northing GP_ID
8145 Seehra, Lockyer, Drummond & Davis 655043 294264 8145
8127 Beach Road Surgery 654968 292961 8127
8154 Dr Henderson & partners 654921 293553 8154
8135 Cox & Partners 654703 292549 8135
8129 Dr Kelly, Gunn & De Ligt 654144 291453 8129
8140 Dr Henderson & partners 653448 295082 8140
# An example sorting a spatial points data frame by the GP.Surgery
head(GP[order(GP_SP_Leeds@data$GP.Surgery, decreasing = TRUE), ])
Surgery Easting Northing GP_ID
134 RICHMOND ROAD MEDICAL CENTRE 518183 170063 134
133 NEW MALDEN HEALTH CENTRE 521223 172008 133
71 Annie Prendergast Health Centre 547921 188403 71
132 NEW MALDEN HEALTH CENTRE 521490 168272 132
147 CLAREMONT SURGERY 517881 167121 147
131 MANOR DRIVE SURGERY 522034 166120 131
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.