In this project you will compile a number of measures that can be used to assess the extent to which small areas have changed in their composition over a 10 year period. You will explore data about changes in both built structures and demographics, and, through the representations you create, formulate plausible hypotheses about why the patterns that you uncover may have occurred.
Through this work you will also learn a series of fundamental skills for the input, manipulation, analysis and display of spatial data using the statistical programming language R.
At this stage it might be worth looking at the end of this tutorial for the assignment questions as you can keep these in mind as you run through the practical.
Before we begin the practical, there are a couple of things to learn…
Firstly, what is a CSV (comma-separated values) file, and how can we get one into R? These files usually have the file extension “.csv” and are typically opened in a spreadsheet package.
Suppose we want to create a data set that contains a list of people who have come along to an R training workshop. It might look something like this:
| Name | Age | Place |
|---|---|---|
| John | 20 | Liverpool |
| Helen | 10 | London |
| Mia | 16 | Liverpool |
| Carl | 21 | Manchester |
Open Notepad and create a new text file, entering the following details as separate lines, with each column in the table separated by a comma, and each row on a new line (please note that the first row contains our desired headings):
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester
However, before you save the file, create yourself a project directory in your storage area; this will be your working directory for the rest of the practical.
Save the text file as “test.csv” in this new folder. We then need to tell R that this is where you will be storing all your files by running the setwd() function.
Paste the location of your working directory into the setwd() function between the “”. Please note that on Windows you need to make sure the slashes between the directories are formatted as either / or \\ (a double backslash).
# For example...
setwd("M:/R work/Neighbourhood_Change/")
Next we will read in the CSV you created above.
# First create a new variable with the location of your csv file - you
# only need the file name as by default R looks in the working directory
file_location <- "test.csv"
# Next we will read this into a new data frame called test.
test <- read.csv(file_location, header = TRUE)
# Now print test
test
Name Age Place
1 John 20 Liverpool
2 Helen 10 London
3 Mia 16 Liverpool
4 Carl 21 Manchester
Next, duplicate test.csv in your working directory, renaming this to test2.csv and opening it in Notepad. Add the following to the top of the file: “Here is some example text which we will use to demonstrate how to skip lines” and on another line: “Here is another line of text” so it looks as follows:
Here is some example text which we will use to demonstrate how to skip lines
Here is another line of text
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester
Save the file and close it.
Next we will import this new .csv into R
# Update the file_location object with the location of the new file
file_location <- "test2.csv"
We have to add an extra parameter to the read.csv() function which specifies that the first two lines of the file should be skipped, so that the headings and data are read from line 3 onwards.
test2 <- read.csv(file_location, header = TRUE, skip = 2)
If you print the results of the object test2 (i.e. type it in the terminal!) you will see this is now the same as test, because you skipped the two junk lines.
Next we will illustrate how we can create a subset of a data frame, add a column and calculate a new value. To create a subset of the test data frame, we can use the subset() function - this can select certain rows, columns or a combination of the two.
For example…
# Select the name and age columns
subset(test, select = c("Name", "Age"))
Name Age
1 John 20
2 Helen 10
3 Mia 16
4 Carl 21
# Select rows where the place is Liverpool
subset(test, Place == "Liverpool")
Name Age Place
1 John 20 Liverpool
3 Mia 16 Liverpool
# Select the name and age columns, and where Place is Liverpool
subset(test, Place == "Liverpool", select = c("Name", "Age"))
Name Age
1 John 20
3 Mia 16
We will now repeat the last step, but this time store the results as a new object. You will see that nothing prints to the terminal.
# Select the name and age columns, however, only where Place is equal to
# Liverpool (in R we need to use a double equals sign)
test3 <- subset(test, Place == "Liverpool", select = c("Name", "Age"))
We can add a new column called “diff100” to the data frame object test3. We do this using the $ symbol. At first, we will make the values of the column NA (i.e. no value)
test3$diff100 <- NA
If you print test3, you will now see the new column which will contain the values NA. However, what we actually want to do is find out how different each of the ages are to 100 (i.e. 100 - age). In order to create this calculation we need to refer to the age column in the data frame. We can use the $ again to achieve this. For example…
test3$diff100 <- 100 - test3$Age
# print test3
test3
Name Age diff100
1 John 20 80
3 Mia 16 84
OK, using what you have learnt in this section, you are now ready to play with some real data!
A couple of other important things! If you want to see the objects that you have created in the R workspace, you can use the ls() function
# List objects in the workspace
ls()
[1] "file_location" "test" "test2" "test3"
If you want to save your workspace so that you can open it up later on (like saving your work!), you can use the save.image() function.
# This will save all objects currently loaded in the workspace
save.image(file = "myfile.RData")
After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.
# Remember that you will need to set your working directory before trying
# to load - otherwise R will not know where to find your RData file
load("myfile.RData")
This section of the practical will examine how multiple data sets can be integrated to generate statistics that enable you to detect, and hypothesise about, potential changes in neighbourhood built structure occurring at small geographic scales. These changes may have occurred for a number of different reasons including: new building work, gentrification, clearances or changes of use (e.g. residential to retail).
Houses of different sizes are placed into bands that determine how much council tax the occupants or owners of a property have to pay. The changing composition of property bands within an area over time can therefore indicate structural changes. You will now download and prepare a series of data sets.
Open a web browser and visit the website: http://www.neighbourhood.statistics.gov.uk and select the “Topics” link on the bottom left hand side of the page. On the next screen search for “council tax”. Click on “Data set name matches”, then where it says “Dwelling Stock by Council Tax Band”, click the radio button for this entry on the far right hand side of the page. Then press the “Next” button on the bottom right hand side of the page.
For the row “Dwelling Stock by Council Tax Band, 2001”, click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the North West Government Office Region. Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [.csv]” link. This will be zipped, and once you have extracted the directory you will find it contains CSV files for a variety of different geographies. You are interested in the Lower Super Output Area level file, which will be named “G050301_939_GeoPolicy_NW_LSOA.CSV”. Copy this to your working directory which you defined earlier. If you have forgotten where this is you can find out with the following R command.
getwd()
Repeat the above step for the 2011 Lower Super Output Area Level council tax band data. Once you have downloaded and extracted the zip file, this will be called “G050311_2417_GeoPolicy_NW_LSOA.CSV”. Note: you may see reference to 2001 Administrative Geography when you are downloading this file - don't worry about this; it relates to the fact the data are organised into the 2001 boundaries. Also, when you look inside the zip file, you might need to browse through a number of nested directories in order to find the CSV.
The next step is to read these data into R. If you open either of the CSVs in Notepad, you will see that the first 5 lines are unwanted, with the 6th containing the column headings. Let's begin with the 2001 data.
# Read in the 2001 data into a new object CT_2001, skipping the first 5
# lines and keeping the headers
CT_2001 <- read.csv("G050301_939_GeoPolicy_NW_LSOA.CSV", header = TRUE, skip = 5)
It is not a good idea to view all of the data frame… there is a lot of it! However we can see the first 6 rows using the head() function. Because there are too many columns to fit into a single span of the terminal, R wraps these over multiple lines; however, you will see each row identified by a number 1-6.
head(CT_2001)
GOR_CODE GOR_NAME CTY_CODE CTY_NAME LA_CODE LA_NAME MSOA_CODE
1 B North West NA 00BL Bolton E02000988
2 B North West NA 00BL Bolton E02000988
3 B North West NA 00BL Bolton E02000984
4 B North West NA 00BL Bolton E02000986
5 B North West NA 00BL Bolton E02000986
6 B North West NA 00BL Bolton E02000986
MSOA_NAME LSOA_CODE LSOA_NAME AREA_METADATA X DATA_VALUE
1 Bolton 005 E01004766 Bolton 005A NA NA 858
2 Bolton 005 E01004767 Bolton 005B NA NA 685
3 Bolton 001 E01004768 Bolton 001A NA NA 602
4 Bolton 003 E01004769 Bolton 003A NA NA 645
5 Bolton 003 E01004770 Bolton 003B NA NA 629
6 Bolton 003 E01004771 Bolton 003C NA NA 631
DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3 DATA_VALUE.4 DATA_VALUE.5
1 605 70.51 107 12.47 121
2 292 42.63 127 18.54 199
3 2 0.33 8 1.33 162
4 92 14.26 176 27.29 206
5 30 4.77 99 15.74 367
6 9 1.43 21 3.33 298
DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8 DATA_VALUE.9 DATA_VALUE.10
1 14.10 11 1.28 4 0.47
2 29.05 58 8.47 4 0.58
3 26.91 163 27.08 185 30.73
4 31.94 96 14.88 38 5.89
5 58.35 107 17.01 13 2.07
6 47.23 207 32.81 66 10.46
DATA_VALUE.11 DATA_VALUE.12 DATA_VALUE.13 DATA_VALUE.14 DATA_VALUE.15
1 8 0.93 2 0.23 0
2 2 0.29 3 0.44 0
3 47 7.81 35 5.81 0
4 32 4.96 5 0.78 0
5 12 1.91 1 0.16 0
6 19 3.01 11 1.74 0
DATA_VALUE.16 DATA_VALUE.17 DATA_VALUE.18
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
Each row of the data pertains to a single Lower Super Output Area (LSOA). In addition to the count and percentage of houses in each council tax band (the DATA_VALUE columns), the data also contain lookups to the larger areas within which the LSOA nest. However, we do not need these for this project. Therefore, the first thing we will do is create a list of the columns we want to keep. You can view a list of all the columns in a data frame using the colnames() function.
colnames(CT_2001)
[1] "GOR_CODE" "GOR_NAME" "CTY_CODE" "CTY_NAME"
[5] "LA_CODE" "LA_NAME" "MSOA_CODE" "MSOA_NAME"
[9] "LSOA_CODE" "LSOA_NAME" "AREA_METADATA" "X"
[13] "DATA_VALUE" "DATA_VALUE.1" "DATA_VALUE.2" "DATA_VALUE.3"
[17] "DATA_VALUE.4" "DATA_VALUE.5" "DATA_VALUE.6" "DATA_VALUE.7"
[21] "DATA_VALUE.8" "DATA_VALUE.9" "DATA_VALUE.10" "DATA_VALUE.11"
[25] "DATA_VALUE.12" "DATA_VALUE.13" "DATA_VALUE.14" "DATA_VALUE.15"
[29] "DATA_VALUE.16" "DATA_VALUE.17" "DATA_VALUE.18"
This printed a list of all the column names to the terminal. However, we can also assign these to a new object which we will call keep_list.
keep_list <- colnames(CT_2001)
If you print the keep_list to the terminal you will see that the contents of the list are numbered; we can use these positions to select items from the list.
# We can select a single item from the list with a square bracket (e.g.
# item 1)
keep_list[1]
[1] "GOR_CODE"
# We can select multiple items from the list using a colon (e.g items 3-5
# in the list)
keep_list[3:5]
[1] "CTY_CODE" "CTY_NAME" "LA_CODE"
# We can combine a single select and a multiple select by using c(,) -
# i.e. concatenate
keep_list[c(1, 3:5)]
[1] "GOR_CODE" "CTY_CODE" "CTY_NAME" "LA_CODE"
# Finally, we will use this to overwrite keep_list with just the columns we want
keep_list <- keep_list[c(9, 13:31)]
If you print keep_list to the terminal, this should contain the names of the columns we want to keep from CT_2001. We will now use this with the subset function to create a new object where unwanted columns from the CT_2001 data frame have been removed. We also only want those LSOA which are in Liverpool. Fortunately, there is another column called “LA_NAME” which contains the local authority where the LSOA is located. So, if we specify this in a subset query, we can create a new object containing only the data we need. You will see that we have used the keep_list to select the columns.
CT_2001_Liverpool <- subset(CT_2001, LA_NAME == "Liverpool", keep_list)
# Have a quick look at the new data frame
head(CT_2001_Liverpool)
LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511 540 232 42.96 58
1747 E01006512 627 524 83.57 42
1748 E01006513 579 264 45.60 227
1749 E01006514 972 657 67.59 197
1750 E01006515 826 761 92.13 64
1751 E01006516 1212 1092 90.10 75
DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746 10.74 22 4.07 61 11.30
1747 6.70 11 1.75 49 7.81
1748 39.21 31 5.35 12 2.07
1749 20.27 86 8.85 19 1.95
1750 7.75 1 0.12 0 0.00
1751 6.19 42 3.47 2 0.17
DATA_VALUE.9 DATA_VALUE.10 DATA_VALUE.11 DATA_VALUE.12 DATA_VALUE.13
1746 125 23.15 16 2.96 24
1747 1 0.16 0 0.00 0
1748 39 6.74 4 0.69 1
1749 10 1.03 1 0.10 2
1750 0 0.00 0 0.00 0
1751 0 0.00 1 0.08 0
DATA_VALUE.14 DATA_VALUE.15 DATA_VALUE.16 DATA_VALUE.17 DATA_VALUE.18
1746 4.44 0 0.00 2 0.37
1747 0.00 0 0.00 0 0.00
1748 0.17 1 0.17 0 0.00
1749 0.21 0 0.00 0 0.00
1750 0.00 0 0.00 0 0.00
1751 0.00 0 0.00 0 0.00
You should now have a data frame formatted as shown above. However, you will notice that the column names are not really that useful - and indeed, you are probably wondering what all those numbers are. We actually discarded this information when we skipped the rows of the original CSV on input. However, if you open the original source CSV file in a spreadsheet programme you will see that they are a mixture of counts and percentages. You will now rename the columns with labels that are easy to remember. We can do this in R using the colnames() function that you have just used to read the column names - however, this time we will use it to set them.
The first thing we need to do is create a list of the column names we want. We will concatenate a list of character strings together with the column names we want - again using c()
new_column_names <- c("LSOA_CODE", "CTAX01_Total_Count", "CTAX01_BandA_Count",
"CTAX01_BandA_Pct", "CTAX01_BandB_Count", "CTAX01_BandB_Pct", "CTAX01_BandC_Count",
"CTAX01_BandC_Pct", "CTAX01_BandD_Count", "CTAX01_BandD_Pct", "CTAX01_BandE_Count",
"CTAX01_BandE_Pct", "CTAX01_BandF_Count", "CTAX01_BandF_Pct", "CTAX01_BandG_Count",
"CTAX01_BandG_Pct", "CTAX01_BandH_Count", "CTAX01_BandH_Pct", "CTAX01_BandX_Count",
"CTAX01_BandX_Pct")
You can check the length of the new_column_names object using the length function…
length(new_column_names)
[1] 20
and that this matches the number of columns in your CT_2001_Liverpool data frame.
length(CT_2001_Liverpool)
[1] 20
Both should be 20. We can now apply these new column names to the data frame.
# Add the new column names
colnames(CT_2001_Liverpool) <- new_column_names
You now need to import the 2011 data into R using the same procedures that you have used for 2001. However, be careful: you can't just repeat the commands given previously as there are differences between the 2001 and 2011 data. Please copy the previous commands related to 2001 into a text file (e.g. Notepad), change them, and then paste them into the R terminal.
Some things that you will need to change from the 2001 code detailed above when importing the 2011 data (a sketch of one shortcut follows below):
- the name of the CSV file (“G050311_2417_GeoPolicy_NW_LSOA.CSV”) and the names of the objects you create (e.g. CT_2011 and CT_2011_Liverpool);
- the columns you keep from the raw CSV, as the 2011 file contains more DATA_VALUE columns;
- the new column names, which should use the prefix CTAX11 rather than CTAX01 and include two extra entries for Band I (CTAX11_BandI_Count and CTAX11_BandI_Pct, placed before the BandX columns), giving 22 names in total.
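As a sketch of one way to handle the renaming step, the 22 new column names for 2011 can be built programmatically with paste() rather than typed out by hand. This assumes your CT_2011_Liverpool data frame has its columns in the order shown in the output below (LSOA code, total count, then count and percentage pairs for Bands A to H, Band I and Band X).
# Build the 22 column names for the 2011 data (a sketch - check that the
# column order of your own CT_2011_Liverpool data frame matches)
bands <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "X")
new_column_names <- c("LSOA_CODE", "CTAX11_Total_Count",
    paste("CTAX11_Band", rep(bands, each = 2), c("_Count", "_Pct"), sep = ""))
colnames(CT_2011_Liverpool) <- new_column_names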
If you have done the above correctly, the head of the new data frame should look like:
head(CT_2011_Liverpool)
LSOA_CODE CTAX11_Total_Count CTAX11_BandA_Count CTAX11_BandA_Pct
3654 E01006511 2671 469 17.56
3655 E01006512 792 665 83.96
3656 E01006513 1178 344 29.20
3657 E01006514 1014 624 61.54
3658 E01006515 673 578 85.88
3659 E01006516 1237 1020 82.46
CTAX11_BandB_Count CTAX11_BandB_Pct CTAX11_BandC_Count
3654 760 28.45 1020
3655 76 9.60 1
3656 429 36.42 320
3657 212 20.91 133
3658 83 12.33 12
3659 127 10.27 83
CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
3654 38.19 233 8.72
3655 0.13 49 6.19
3656 27.16 36 3.06
3657 13.12 28 2.76
3658 1.78 0 0.00
3659 6.71 4 0.32
CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count
3654 143 5.35 21
3655 1 0.13 0
3656 46 3.90 1
3657 14 1.38 0
3658 0 0.00 0
3659 3 0.24 0
CTAX11_BandF_Pct CTAX11_BandG_Count CTAX11_BandG_Pct
3654 0.79 25 0.94
3655 0.00 0 0.00
3656 0.08 1 0.08
3657 0.00 2 0.20
3658 0.00 0 0.00
3659 0.00 0 0.00
CTAX11_BandH_Count CTAX11_BandH_Pct CTAX11_BandI_Count
3654 0 0.00 0
3655 0 0.00 0
3656 1 0.08 0
3657 1 0.10 0
3658 0 0.00 0
3659 0 0.00 0
CTAX11_BandI_Pct CTAX11_BandX_Count CTAX11_BandX_Pct
3654 0 0 0
3655 0 0 0
3656 0 0 0
3657 0 0 0
3658 0 0 0
3659 0 0 0
We will now remove the two unwanted columns CTAX11_BandI_Count and CTAX11_BandI_Pct. You could use the subset procedure we implemented earlier; however, this is another way (assuming that you have named the columns as described above)…
CT_2011_Liverpool$CTAX11_BandI_Count <- NULL
CT_2011_Liverpool$CTAX11_BandI_Pct <- NULL
We will also remove the BandX columns from both the 2001 and 2011 data - BandX relates to property with an unknown band.
CT_2001_Liverpool$CTAX01_BandX_Count <- NULL
CT_2001_Liverpool$CTAX01_BandX_Pct <- NULL
CT_2011_Liverpool$CTAX11_BandX_Count <- NULL
CT_2011_Liverpool$CTAX11_BandX_Pct <- NULL
We now need to create a new data frame that combines both the 2001 and 2011 data. One way of achieving this is to use the merge() function. However before we do this, we will first explore exactly what the merge function does.
We will use the data frame you created right at the start of the tutorial. If you have forgotten what this looked like, you can always print the object in the terminal.
test
Name Age Place
1 John 20 Liverpool
2 Helen 10 London
3 Mia 16 Liverpool
4 Carl 21 Manchester
You now need to create another data frame which we will use as an example. You could create another CSV file and import this; however, we will illustrate another way of achieving this by combining a series of vectors.
# Create a person vector
Person <- c("Paul", "Mike", "John", "Helen", "Mia", "Leo", "Carl")
# Create a vector of favourite functions
Function <- c("merge()", "read.csv()", "colnames()", "ncol()", "length()", "getwd()",
"save.image()")
# We can now join these two vectors into a new data frame of favourite
# functions
fav_fun <- data.frame(Person, Function)
# View the fav_fun
fav_fun
Person Function
1 Paul merge()
2 Mike read.csv()
3 John colnames()
4 Helen ncol()
5 Mia length()
6 Leo getwd()
7 Carl save.image()
We now have two tables: the first lists the people who attended the current R training workshop, and the second contains these people plus others who have been to previous workshops, along with their favourite R functions!
What we want to do is append the favourite functions onto our list of current course attendees. We will refer to the two data frames as x and y: the x data frame is test, and the y data frame is fav_fun. In x, the column containing the list of people is called “Name”, and in y it is called “Person”. The merge() function first accepts the two table names, and then the lookup columns as by.x and by.y. You should also include all.x=TRUE as a final parameter. This tells the function to keep all the records in x, but only those in y which match. As an experiment you can change this to all.y=TRUE - you will find that all the y records are now in the data frame; however, those people with no match in x have NA values for the x columns.
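If you want to try the all.y experiment described above, the call is identical apart from the final parameter; the people from fav_fun who did not attend the current workshop will appear with NA in the Age and Place columns.
# Keep all records from y (fav_fun) instead - people with no match in test
# will have NA for Age and Place
merge(test, fav_fun, by.x = "Name", by.y = "Person", all.y = TRUE)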
We will now create a new data frame called People_And_Functions and print this to the terminal.
People_And_Functions <- merge(test, fav_fun, by.x = "Name", by.y = "Person",
all.x = TRUE)
People_And_Functions
Name Age Place Function
1 Carl 21 Manchester save.image()
2 Helen 10 London ncol()
3 John 20 Liverpool colnames()
4 Mia 16 Liverpool length()
If the by columns had the same name in both x and y, we could specify this simply with by=“column name” rather than by.x and by.y. Finally, a critical issue when making any join is ensuring that the “by” columns are in the same format.
The CT_2001_Liverpool data frame should now be joined to CT_2011_Liverpool. You will need to specify a merge that:
- uses the LSOA_CODE column of each data frame as the lookup (the column names match in both);
- keeps all of the records in the 2001 data;
- assigns the result to a new object called CT_2001_2011_Liverpool.
One possible form of the merge is sketched below.
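A minimal sketch, assuming you have named the LSOA code column of the 2011 data LSOA_CODE as described above:
# Join the 2001 and 2011 council tax data on the LSOA code
CT_2001_2011_Liverpool <- merge(CT_2001_Liverpool, CT_2011_Liverpool,
    by = "LSOA_CODE", all.x = TRUE)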
If you have done the join correctly, this should look like:
head(CT_2001_2011_Liverpool)
LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511 540 232 42.96
2 E01006512 627 524 83.57
3 E01006513 579 264 45.60
4 E01006514 972 657 67.59
5 E01006515 826 761 92.13
6 E01006516 1212 1092 90.10
CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1 58 10.74 22 4.07
2 42 6.70 11 1.75
3 227 39.21 31 5.35
4 197 20.27 86 8.85
5 64 7.75 1 0.12
6 75 6.19 42 3.47
CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1 61 11.30 125 23.15
2 49 7.81 1 0.16
3 12 2.07 39 6.74
4 19 1.95 10 1.03
5 0 0.00 0 0.00
6 2 0.17 0 0.00
CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1 16 2.96 24 4.44
2 0 0.00 0 0.00
3 4 0.69 1 0.17
4 1 0.10 2 0.21
5 0 0.00 0 0.00
6 1 0.08 0 0.00
CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1 0 0.00 2671
2 0 0.00 792
3 1 0.17 1178
4 0 0.00 1014
5 0 0.00 673
6 0 0.00 1237
CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1 469 17.56 760 28.45
2 665 83.96 76 9.60
3 344 29.20 429 36.42
4 624 61.54 212 20.91
5 578 85.88 83 12.33
6 1020 82.46 127 10.27
CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1 1020 38.19 233 8.72
2 1 0.13 49 6.19
3 320 27.16 36 3.06
4 133 13.12 28 2.76
5 12 1.78 0 0.00
6 83 6.71 4 0.32
CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1 143 5.35 21 0.79
2 1 0.13 0 0.00
3 46 3.90 1 0.08
4 14 1.38 0 0.00
5 0 0.00 0 0.00
6 3 0.24 0 0.00
CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1 25 0.94 0 0.00
2 0 0.00 0 0.00
3 1 0.08 1 0.08
4 2 0.20 1 0.10
5 0 0.00 0 0.00
6 0 0.00 0 0.00
You have now created a single council tax file that contains both counts and percentages for each council tax band in 2001 and 2011. You will now calculate a series of new columns that show the absolute changes across the bands between the two time periods.
The following example will create two new columns, one for the total change, and one for Band A changes. We use the $ to both create a new column, and also to refer to values in existing columns.
# Create Total changes
CT_2001_2011_Liverpool$Ch_Total <- CT_2001_2011_Liverpool$CTAX11_Total_Count -
CT_2001_2011_Liverpool$CTAX01_Total_Count
# Create Band A Changes
CT_2001_2011_Liverpool$Ch_BandA <- CT_2001_2011_Liverpool$CTAX11_BandA_Count -
CT_2001_2011_Liverpool$CTAX01_BandA_Count
You now need to repeat this for the remaining council tax bands - call these new columns Ch_BandB through to Ch_BandH
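For example, the Band B changes could be calculated as follows; the remaining bands follow the same pattern.
# Create Band B Changes (repeat this pattern through to Band H)
CT_2001_2011_Liverpool$Ch_BandB <- CT_2001_2011_Liverpool$CTAX11_BandB_Count -
    CT_2001_2011_Liverpool$CTAX01_BandB_Count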
Your council tax data frame should then look like:
head(CT_2001_2011_Liverpool)
LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511 540 232 42.96
2 E01006512 627 524 83.57
3 E01006513 579 264 45.60
4 E01006514 972 657 67.59
5 E01006515 826 761 92.13
6 E01006516 1212 1092 90.10
CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1 58 10.74 22 4.07
2 42 6.70 11 1.75
3 227 39.21 31 5.35
4 197 20.27 86 8.85
5 64 7.75 1 0.12
6 75 6.19 42 3.47
CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1 61 11.30 125 23.15
2 49 7.81 1 0.16
3 12 2.07 39 6.74
4 19 1.95 10 1.03
5 0 0.00 0 0.00
6 2 0.17 0 0.00
CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1 16 2.96 24 4.44
2 0 0.00 0 0.00
3 4 0.69 1 0.17
4 1 0.10 2 0.21
5 0 0.00 0 0.00
6 1 0.08 0 0.00
CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1 0 0.00 2671
2 0 0.00 792
3 1 0.17 1178
4 0 0.00 1014
5 0 0.00 673
6 0 0.00 1237
CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1 469 17.56 760 28.45
2 665 83.96 76 9.60
3 344 29.20 429 36.42
4 624 61.54 212 20.91
5 578 85.88 83 12.33
6 1020 82.46 127 10.27
CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1 1020 38.19 233 8.72
2 1 0.13 49 6.19
3 320 27.16 36 3.06
4 133 13.12 28 2.76
5 12 1.78 0 0.00
6 83 6.71 4 0.32
CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1 143 5.35 21 0.79
2 1 0.13 0 0.00
3 46 3.90 1 0.08
4 14 1.38 0 0.00
5 0 0.00 0 0.00
6 3 0.24 0 0.00
CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1 25 0.94 0 0.00
2 0 0.00 0 0.00
3 1 0.08 1 0.08
4 2 0.20 1 0.10
5 0 0.00 0 0.00
6 0 0.00 0 0.00
Ch_Total Ch_BandA Ch_BandB Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG
1 2131 237 702 998 172 18 5 1
2 165 141 34 -10 0 0 0 0
3 599 80 202 289 24 7 -3 0
4 42 -33 15 47 9 4 -1 0
5 -153 -183 19 11 0 0 0 0
6 25 -72 52 41 2 3 -1 0
Ch_BandH
1 0
2 0
3 0
4 1
5 0
6 0
In the previous section you have been introduced to the council tax data set which has been used to create measures that may indicate changes to the built structure of LSOA. In this section we will examine indicators of social change by comparing the 2001 Census data to population estimates derived by the Office for National Statistics.
The first stage is to download 2001 Census data. Although there are other sources of these data, in this example we will use the http://www.neighbourhood.statistics.gov.uk website again.
Navigate to the site and select the “Topics” link on the bottom left hand side of the page. On the next screen, click “2001 Census: Census Area Statistics”. Then click the radio button that corresponds to “Age (UV04)” and press the “Next” button at the bottom right hand side of the page.
Then click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the North West Government Office Region. Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [*.csv]”. This will be zipped, and once you have extracted the directory you will find it contains CSV files for a variety of different geographies. You are interested in the Lower Super Output Area level file which will be named “UV040301_92_GeoPolicy_NW_LSOA.CSV”. Copy this to the working directory you defined earlier.
If you open this file up in Notepad you will again see that there are a series of lines at the top which do not contain the data or the column headings. As such, you need to:
- read the CSV into a new data frame, skipping the unwanted lines at the top (count them in Notepad first);
- keep only the LSOA code column and the DATA_VALUE columns;
- subset the rows to those LSOA which are in Liverpool, calling the resulting data frame POP_2001.
A sketch of these steps is shown below.
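This sketch assumes the age file is structured like the council tax CSV (5 lines to skip, and LSOA_CODE and LA_NAME columns); check these assumptions against your own file before running it. POP_2001_raw is simply an intermediate name used here for the full North West data.
# Read in the 2001 age data and subset to Liverpool (a sketch - the skip value
# and the LSOA_CODE / LA_NAME column names are assumptions; check them first)
POP_2001_raw <- read.csv("UV040301_92_GeoPolicy_NW_LSOA.CSV", header = TRUE, skip = 5)
# Keep the LSOA code plus all of the DATA_VALUE columns
keep_list <- c("LSOA_CODE", grep("DATA_VALUE", colnames(POP_2001_raw), value = TRUE))
POP_2001 <- subset(POP_2001_raw, LA_NAME == "Liverpool", keep_list)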
This should then look like (note, only the first 6 rows and 10 columns are shown here):
# Using the square brackets we can view specific parts of a data frame-
# data[rows,columns]
POP_2001[1:6, 1:10]
LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511 1026 7 10 8
1747 E01006512 1454 8 11 10
1748 E01006513 2077 4 5 6
1749 E01006514 1432 8 7 10
1750 E01006515 1156 6 15 12
1751 E01006516 1412 12 10 12
DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746 4 9 5 4 7
1747 9 10 8 13 7
1748 3 3 6 0 5
1749 7 8 6 6 5
1750 12 11 14 10 8
1751 8 7 7 4 4
You now need to rename the column headings of the data frame to something which is easier to interpret. There are obviously a lot of columns, and it would take a very long time to type out the new names by hand… However, there are some shortcuts. If you look at the source CSV file, you will see that the data columns represent: the total population, those under 1, then a series of columns for those aged 1 through to 74; and finally, those aged 75-79, 80-84, 85-89, 90-94, 95-99 and 100 or greater.
The main block of columns holds the individual ages, whose names we can create using the paste() function - this concatenates vectors (separated by commas) together into character strings. We also add sep="" to indicate that we do not want any spaces placed between the concatenated elements.
years <- paste("YR_2001_", 1:74, sep = "")
We now need to add the remaining elements to the list which we will do in two stages. One will be added before the elements of the “years” list (pre), and the other afterwards (post)
pre <- c("LSOA", "YR_2001_TotPop", "YR_2001_Less1")
post <- c("YR_2001_75_79", "YR_2001_80_84", "YR_2001_85_89", "YR_2001_90_94",
"YR_2001_95_99", "YR_2001_100plus")
# Print the lists to see what they look like
pre
[1] "LSOA" "YR_2001_TotPop" "YR_2001_Less1"
years
[1] "YR_2001_1" "YR_2001_2" "YR_2001_3" "YR_2001_4" "YR_2001_5"
[6] "YR_2001_6" "YR_2001_7" "YR_2001_8" "YR_2001_9" "YR_2001_10"
[11] "YR_2001_11" "YR_2001_12" "YR_2001_13" "YR_2001_14" "YR_2001_15"
[16] "YR_2001_16" "YR_2001_17" "YR_2001_18" "YR_2001_19" "YR_2001_20"
[21] "YR_2001_21" "YR_2001_22" "YR_2001_23" "YR_2001_24" "YR_2001_25"
[26] "YR_2001_26" "YR_2001_27" "YR_2001_28" "YR_2001_29" "YR_2001_30"
[31] "YR_2001_31" "YR_2001_32" "YR_2001_33" "YR_2001_34" "YR_2001_35"
[36] "YR_2001_36" "YR_2001_37" "YR_2001_38" "YR_2001_39" "YR_2001_40"
[41] "YR_2001_41" "YR_2001_42" "YR_2001_43" "YR_2001_44" "YR_2001_45"
[46] "YR_2001_46" "YR_2001_47" "YR_2001_48" "YR_2001_49" "YR_2001_50"
[51] "YR_2001_51" "YR_2001_52" "YR_2001_53" "YR_2001_54" "YR_2001_55"
[56] "YR_2001_56" "YR_2001_57" "YR_2001_58" "YR_2001_59" "YR_2001_60"
[61] "YR_2001_61" "YR_2001_62" "YR_2001_63" "YR_2001_64" "YR_2001_65"
[66] "YR_2001_66" "YR_2001_67" "YR_2001_68" "YR_2001_69" "YR_2001_70"
[71] "YR_2001_71" "YR_2001_72" "YR_2001_73" "YR_2001_74"
post
[1] "YR_2001_75_79" "YR_2001_80_84" "YR_2001_85_89" "YR_2001_90_94"
[5] "YR_2001_95_99" "YR_2001_100plus"
The list objects that you have just created now need to be concatenated which can be achieved using the c() function.
# Concatenate the new column names
new_column_names <- c(pre, years, post)
# print the names
new_column_names
[1] "LSOA" "YR_2001_TotPop" "YR_2001_Less1"
[4] "YR_2001_1" "YR_2001_2" "YR_2001_3"
[7] "YR_2001_4" "YR_2001_5" "YR_2001_6"
[10] "YR_2001_7" "YR_2001_8" "YR_2001_9"
[13] "YR_2001_10" "YR_2001_11" "YR_2001_12"
[16] "YR_2001_13" "YR_2001_14" "YR_2001_15"
[19] "YR_2001_16" "YR_2001_17" "YR_2001_18"
[22] "YR_2001_19" "YR_2001_20" "YR_2001_21"
[25] "YR_2001_22" "YR_2001_23" "YR_2001_24"
[28] "YR_2001_25" "YR_2001_26" "YR_2001_27"
[31] "YR_2001_28" "YR_2001_29" "YR_2001_30"
[34] "YR_2001_31" "YR_2001_32" "YR_2001_33"
[37] "YR_2001_34" "YR_2001_35" "YR_2001_36"
[40] "YR_2001_37" "YR_2001_38" "YR_2001_39"
[43] "YR_2001_40" "YR_2001_41" "YR_2001_42"
[46] "YR_2001_43" "YR_2001_44" "YR_2001_45"
[49] "YR_2001_46" "YR_2001_47" "YR_2001_48"
[52] "YR_2001_49" "YR_2001_50" "YR_2001_51"
[55] "YR_2001_52" "YR_2001_53" "YR_2001_54"
[58] "YR_2001_55" "YR_2001_56" "YR_2001_57"
[61] "YR_2001_58" "YR_2001_59" "YR_2001_60"
[64] "YR_2001_61" "YR_2001_62" "YR_2001_63"
[67] "YR_2001_64" "YR_2001_65" "YR_2001_66"
[70] "YR_2001_67" "YR_2001_68" "YR_2001_69"
[73] "YR_2001_70" "YR_2001_71" "YR_2001_72"
[76] "YR_2001_73" "YR_2001_74" "YR_2001_75_79"
[79] "YR_2001_80_84" "YR_2001_85_89" "YR_2001_90_94"
[82] "YR_2001_95_99" "YR_2001_100plus"
You can now apply these to the columns of POP_2001
# Add the new column names
colnames(POP_2001) <- new_column_names
This will have created a data frame containing the 2001 population data that will look like…
# Show first 6 rows and 10 columns
POP_2001[1:6, 1:10]
LSOA YR_2001_TotPop YR_2001_Less1 YR_2001_1 YR_2001_2 YR_2001_3
1746 E01006511 1026 7 10 8 4
1747 E01006512 1454 8 11 10 9
1748 E01006513 2077 4 5 6 3
1749 E01006514 1432 8 7 10 7
1750 E01006515 1156 6 15 12 12
1751 E01006516 1412 12 10 12 8
YR_2001_4 YR_2001_5 YR_2001_6 YR_2001_7
1746 9 5 4 7
1747 10 8 13 7
1748 3 6 0 5
1749 8 6 6 5
1750 11 14 10 8
1751 7 7 4 4
The ONS produce annual population estimates from survey and administrative data. For more information on the methodology used in their calculation, and to download the data, visit the website: http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-230902. Download and unzip the LSOA files for 2010 into your working directory.
Although it is also possible to read an Excel file into R directly, this method is a little tricky to implement under Windows (although it is very easy in Apple OS X or Linux). As such, we will open the file in Excel, manipulate it, save it as a CSV, and then import it using the method used previously.
Highlight all the numeric values on the spreadsheet, right click and from the menu select “Format Cells…”. On the dialog box that opens, click on the “Number” tab, then specify “Category” as “Number”. Use the “Decimal places:” control to reduce the number to 0. Also ensure that the box which states “Use 1000 separator (,)” is NOT ticked. Then click “OK”. This changes the formatting of the numbers and specifically removes the commas which will cause problems when you later import the exported file into R.
Save the spreadsheet as a CSV file by pressing the round “Office” button on the top left hand side of Excel and click “Save As” from the menu. In the dialog box navigate to your working directory. Click the “Save as type:” drop down box and choose “CSV (MS-DOS) (*.csv)”. Specify the file name as “pop10.csv” and click the “Save” button. You can now read this into R.
POP_2010 <- read.csv("pop10.csv", header = TRUE)
Later on we are going to merge the 2001 and 2010 data together into a new data frame. However, before doing this, we first need to create a new version of the 2001 data with the same age bands as the 2010 data. You need to create new columns in the POP_2001 data for: YR_2001_0_15, YR_2001_16_29, YR_2001_30_44, YR_2001_45_64 and YR_2001_65plus - the numbers refer to the age ranges and are the same ranges featured in the 2010 data. There are a number of ways to achieve this.
# Add manually
POP_2001$YR_2001_0_15 <- POP_2001$YR_2001_Less1 + POP_2001$YR_2001_1 + POP_2001$YR_2001_2 +
POP_2001$YR_2001_3 + POP_2001$YR_2001_4 + POP_2001$YR_2001_5 + POP_2001$YR_2001_6 +
POP_2001$YR_2001_7 + POP_2001$YR_2001_8 + POP_2001$YR_2001_9 + POP_2001$YR_2001_10 +
POP_2001$YR_2001_11 + POP_2001$YR_2001_12 + POP_2001$YR_2001_13 + POP_2001$YR_2001_14 +
POP_2001$YR_2001_15
# Add as a range
POP_2001$YR_2001_16_29 <- rowSums(POP_2001[, 19:32])
The above code has created the aggregations for YR_2001_0_15 and YR_2001_16_29. You now need to add the following new columns to the POP_2001 data frame, with the appropriate age group aggregations (a sketch is shown below):
- YR_2001_30_44
- YR_2001_45_64
- YR_2001_65plus
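A minimal sketch of the remaining aggregations, assuming your columns are in the order produced by the renaming above (so that, for example, YR_2001_30 is the 33rd column); check the positions with colnames() before running it.
# Add the remaining age group aggregations as ranges (a sketch - check that
# the column positions match your own POP_2001 data frame)
POP_2001$YR_2001_30_44 <- rowSums(POP_2001[, 33:47])   # ages 30 to 44
POP_2001$YR_2001_45_64 <- rowSums(POP_2001[, 48:67])   # ages 45 to 64
POP_2001$YR_2001_65plus <- rowSums(POP_2001[, 68:83])  # ages 65 to 100 plus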
Next, create a subset of the 2001 data called POP_2001_V2 that contains the LSOA code, the total population and the newly created variables (YR_2001_0_15, YR_2001_16_29, YR_2001_30_44, YR_2001_45_64 and YR_2001_65plus).
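One way of creating this subset (a sketch using the column names set earlier):
# Keep the LSOA code, total population and the five new age group columns
POP_2001_V2 <- subset(POP_2001, select = c("LSOA", "YR_2001_TotPop", "YR_2001_0_15",
    "YR_2001_16_29", "YR_2001_30_44", "YR_2001_45_64", "YR_2001_65plus"))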
The top 6 rows of your new data frame should look like:
head(POP_2001_V2)
LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1746 E01006511 1026 120 269 245
1747 E01006512 1454 124 930 244
1748 E01006513 2077 87 1685 149
1749 E01006514 1432 105 678 358
1750 E01006515 1156 212 253 228
1751 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus
1746 246 146
1747 116 40
1748 95 61
1749 188 103
1750 252 211
1751 216 110
You will now have two population data frames; the first for 2001 which contains all LSOA in Liverpool (291 rows), and a second which contains all LSOA in England (34,378 rows). You might want to check if your data frame objects have the same number of rows at this stage.
nrow(POP_2010)
[1] 34378
nrow(POP_2001_V2)
[1] 291
You now need to merge (see your earlier code!!!) the 2010 data with the 2001 data so that the new data frame object that you create contains just the records for Liverpool (i.e. 291 rows). Call this new data frame POP_2001_2010_Liverpool.
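A sketch of the merge; the name of the LSOA code column in POP_2010 is an assumption here, so check colnames(POP_2010) and substitute the real name.
# Join the 2010 estimates onto the 2001 data, keeping only the Liverpool records
# ("LSOA_CODE" is a placeholder - use the actual code column name in POP_2010)
POP_2001_2010_Liverpool <- merge(POP_2001_V2, POP_2010,
    by.x = "LSOA", by.y = "LSOA_CODE", all.x = TRUE)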
If you have done this correctly, your new data frame should have 291 rows and the top six rows should look like…
# Number of rows
nrow(POP_2001_2010_Liverpool)
[1] 291
# Show top 6 records
head(POP_2001_2010_Liverpool)
LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511 1026 120 269 245
2 E01006512 1454 124 930 244
3 E01006513 2077 87 1685 149
4 E01006514 1432 105 678 358
5 E01006515 1156 212 253 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1 246 146 2332 187 1072
2 116 40 2358 191 1607
3 95 61 2630 71 2187
4 188 103 1269 123 467
5 252 211 1058 142 307
6 216 110 1385 85 580
YR_2010_30_44 YR_2010_45_64 YR_2010_65plus
1 579 332 162
2 333 193 34
3 202 97 73
4 280 278 121
5 232 224 153
6 342 282 96
You will now calculate both absolute and percentage change statistics between 2001 and 2010 for the different age bands. The following example illustrates this for the total population; however, you will need to complete it for each age band. Create new variables in the format shown:
# Total Population (TC - Total Change ; PC - Percent Change)
POP_2001_2010_Liverpool$TotPop_TC <- POP_2001_2010_Liverpool$YR_2010_TotPop -
POP_2001_2010_Liverpool$YR_2001_TotPop
POP_2001_2010_Liverpool$TotPop_PC <- (POP_2001_2010_Liverpool$YR_2010_TotPop -
POP_2001_2010_Liverpool$YR_2001_TotPop)/POP_2001_2010_Liverpool$YR_2001_TotPop *
100
# 0-15 Years (TC - Total Change ; PC - Percent Change)
POP_2001_2010_Liverpool$YR_0_15_TC <- POP_2001_2010_Liverpool$YR_2010_0_15 -
POP_2001_2010_Liverpool$YR_2001_0_15
POP_2001_2010_Liverpool$YR_0_15_PC <- (POP_2001_2010_Liverpool$YR_2010_0_15 -
POP_2001_2010_Liverpool$YR_2001_0_15)/POP_2001_2010_Liverpool$YR_2001_0_15 *
100
You now need to complete this for the other age bands, leaving you with a final data frame that looks like:
head(POP_2001_2010_Liverpool)
LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511 1026 120 269 245
2 E01006512 1454 124 930 244
3 E01006513 2077 87 1685 149
4 E01006514 1432 105 678 358
5 E01006515 1156 212 253 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1 246 146 2332 187 1072
2 116 40 2358 191 1607
3 95 61 2630 71 2187
4 188 103 1269 123 467
5 252 211 1058 142 307
6 216 110 1385 85 580
YR_2010_30_44 YR_2010_45_64 YR_2010_65plus TotPop_TC TotPop_PC
1 579 332 162 1306 127.290
2 333 193 34 904 62.173
3 202 97 73 553 26.625
4 280 278 121 -163 -11.383
5 232 224 153 -98 -8.478
6 342 282 96 -27 -1.912
YR_0_15_TC YR_0_15_PC YR_16_29_TC YR_16_29_PC YR_30_44_TC YR_30_44_PC
1 67 55.83 803 298.513 334 136.327
2 67 54.03 677 72.796 89 36.475
3 -16 -18.39 502 29.792 53 35.570
4 18 17.14 -211 -31.121 -78 -21.788
5 -70 -33.02 54 21.344 4 1.754
6 -18 -17.48 -17 -2.848 -44 -11.399
YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1 86 34.959 16 10.96
2 77 66.379 -6 -15.00
3 2 2.105 12 19.67
4 90 47.872 18 17.48
5 -28 -11.111 -58 -27.49
6 66 30.556 -14 -12.73
The CT_2001_2011_Liverpool and POP_2001_2010_Liverpool data frames will now be visualised on a choropleth map so you can explore areas of change. In order to do this, you will first need to read some polygon files into R.
When we talk about Shapefiles, we are actually referring to a number of files - each of these files has a different purpose, e.g. holding the geometry, storing the data or specifying the projection.
Up until this point, all the functions you have been using are contained in the base R installation - i.e. they are present by default when you load R. However, one of the really great things about R is that these functions can be expanded by loading new packages that are created by the open source community. These packages add lots of exciting new functions to R.
We are now going to load "maptools" and "rgeos", which provide functions for reading and manipulating GIS data. If you remember from the last practical, the additional parameters to the standard install.packages() and library() calls relate to the packages being installed on a managed system where you do not have write access privileges to the main R library stored on the hard disk.
# Choose the CRAN mirror where the packages are going to be downloaded
# from
options(repos = c(CRAN = "http://www.stats.bris.ac.uk/R/"))
# Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())
# Download / install and then load maptools
install.packages("maptools", depend = TRUE, lib = getwd())
library(maptools, lib.loc = getwd())
# Download / install and then load rgeos
install.packages("rgeos", depend = TRUE, lib = getwd())
install.packages("stringr", depend = TRUE, lib = getwd())
library("stringr", lib.loc = getwd())
library("rgeos", lib.loc = getwd())
We will now read the LSOA data into a new type of object called a SpatialPolygonsDataFrame by supplying the location of the downloaded shapefiles.
LSOA <- readShapePoly("LSOA/england_low_soa_2001.shp")
We can see what a SpatialPolygonsDataFrame looks like using a simple plot() function. You might notice that the polygons nearest the River Mersey are quite large. The reason for this is that the official boundaries extend into the water - however, aesthetically, these are not that appealing. We can however tidy these up before we start mapping.
# A basic plot of the OA SpatialPolygonsDataFrame
plot(LSOA)
As such:
1. Visit the UK Borders website again and download the shapefiles for the "English Outline 2011", which can be found in the Easy Download section. Choose the "Clipped and Generalized" shapefile version.
2. Read this into R as a new SpatialPolygonsDataFrame called "outline".
# Read in outline as a SpatialPolygonsDataFrame
outline <- readShapePoly("Outline/England_ol_2011_gen_clipped.shp")
# This looks like
plot(outline)
By default, when you call the plot() function, this overwrites the previous plot. However, there is a way that you can specify plots to be cumulative - i.e. added as a new layer. We do this in the following example to add the newly downloaded coastal outline on top of the LSOA by including add=TRUE. We have also made the border a red colour (border=“red”), and the fill colour (col=“#2C7FB820”) a shade of blue. These represent two ways of specifying colours: the second contains eight alphanumeric characters, of which the first six are a HEX colour code and the final two set the transparency of the fill. Various lists of named colours and HEX codes that can be used in R are available online.
# Plot the LSOA Map
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")
You will also note on the above plot that the full extent of the English coastal outline is not shown. This is because the plot window is initiated with the extent of the first plot (i.e. Liverpool). You could reverse the above commands (remembering to remove add=TRUE if outline is the first plot) to illustrate this effect.
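A sketch of that reversed order:
# Plot the full English outline first, then add the Liverpool LSOA on top
plot(outline)
plot(LSOA, add = TRUE)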
SpatialPolygonsDataFrame objects are a little more complex than the standard data frame, and comprise a number of different elements referred to as "slots". We can view the slots of an object using the slotNames() function, which will print a list of all the available slots.
slotNames(LSOA)
[1] "data" "polygons" "plotOrder" "bbox" "proj4string"
The only slot we really need to worry about for this exercise is "data", which contains all the attribute data for each of the zones (i.e. the attributes contained in the DBF of the shapefile). We access this using the @ symbol, and it can then be treated in a very similar way to a standard data frame. For example, we can look at the first 6 rows in the LSOA data slot.
head(LSOA@data)
gid geonorth popnorth geoeast name label popeast
0 458 381722 383625 342348 Liverpool 058B 04BYE01006739 342928
1 4350 384390 385002 338967 Liverpool 056D 04BYE01006687 339051
2 4351 383094 383388 343364 Liverpool 058C 04BYE01006741 343305
3 4352 384125 384509 341524 Liverpool 058D 04BYE01006743 340956
4 4370 385573 386586 337323 Liverpool 052C 04BYE01006528 337495
5 4371 385004 385692 338248 Liverpool 056B 04BYE01006684 338794
zonecode
0 E01006739
1 E01006687
2 E01006741
3 E01006743
4 E01006528
5 E01006684
We will now use the gIntersection() function from the rgeos package to remove the elements of the LSOA polygons that overlap the River Mersey - this will take quite a bit of time…
# First we create a new character vector of the area identifiers - these
# are assigned to a new object called my_area_id
my_area_id <- as.character(LSOA@data$zonecode)
# Create the clipped polygon with each area linked to an area identifier
# from the list you just created. This removes the data slot from the
# SpatialPolygonsDataFrame, hence why we need to specify the area IDs
# using id=. LSOA becomes a SpatialPolygons object - however, it contains
# an ID slot with the values of my_area_id
LSOA <- gIntersection(LSOA, outline, byid = TRUE, id = my_area_id)
# Get the ID values from the LSOA spatial polygon
data_IDs <- getSpPPolygonsIDSlots(LSOA)
# Convert into a dataframe
data_IDs <- as.data.frame(data_IDs)
# We also need to set the row names to be the same as the area ids - this
# is required to create a SpatialPolygonsDataFrame
rownames(data_IDs) <- data_IDs$data_IDs
# Join the ID values back onto the spatial polygon
LSOA <- SpatialPolygonsDataFrame(LSOA, data_IDs)
# Rename LSOA codes
colnames(LSOA@data) <- "zonecode"
Next we will join the council tax data onto the LSOA spatial polygons data frame. This has to be handled in a slightly different way to how you used the merge statement earlier.
Because we know that both the council tax data frame, and the spatial data frame have the same number of zones (i.e. were both limited to just Liverpool) we can sort them alphabetically by their area ID, thus enabling us to just line the data up side by side.
# Order the LSOA spatial polygon data frame
LSOA <- LSOA[order(LSOA$zonecode), ]
# Order the CT_2001_2011_Liverpool data frame
CT_2001_2011_Liverpool <- CT_2001_2011_Liverpool[order(CT_2001_2011_Liverpool$LSOA_CODE),
]
If you then look at the first few rows of the above two objects, you will see by looking at their area identifiers that they are now in the same order.
head(LSOA@data)
zonecode
E01006511 E01006511
E01006512 E01006512
E01006513 E01006513
E01006514 E01006514
E01006515 E01006515
E01006516 E01006516
head(CT_2001_2011_Liverpool)
LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511 540 232 42.96
2 E01006512 627 524 83.57
3 E01006513 579 264 45.60
4 E01006514 972 657 67.59
5 E01006515 826 761 92.13
6 E01006516 1212 1092 90.10
CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1 58 10.74 22 4.07
2 42 6.70 11 1.75
3 227 39.21 31 5.35
4 197 20.27 86 8.85
5 64 7.75 1 0.12
6 75 6.19 42 3.47
CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1 61 11.30 125 23.15
2 49 7.81 1 0.16
3 12 2.07 39 6.74
4 19 1.95 10 1.03
5 0 0.00 0 0.00
6 2 0.17 0 0.00
CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1 16 2.96 24 4.44
2 0 0.00 0 0.00
3 4 0.69 1 0.17
4 1 0.10 2 0.21
5 0 0.00 0 0.00
6 1 0.08 0 0.00
CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1 0 0.00 2671
2 0 0.00 792
3 1 0.17 1178
4 0 0.00 1014
5 0 0.00 673
6 0 0.00 1237
CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1 469 17.56 760 28.45
2 665 83.96 76 9.60
3 344 29.20 429 36.42
4 624 61.54 212 20.91
5 578 85.88 83 12.33
6 1020 82.46 127 10.27
CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1 1020 38.19 233 8.72
2 1 0.13 49 6.19
3 320 27.16 36 3.06
4 133 13.12 28 2.76
5 12 1.78 0 0.00
6 83 6.71 4 0.32
CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1 143 5.35 21 0.79
2 1 0.13 0 0.00
3 46 3.90 1 0.08
4 14 1.38 0 0.00
5 0 0.00 0 0.00
6 3 0.24 0 0.00
CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1 25 0.94 0 0.00
2 0 0.00 0 0.00
3 1 0.08 1 0.08
4 2 0.20 1 0.10
5 0 0.00 0 0.00
6 0 0.00 0 0.00
Ch_Total Ch_BandA Ch_BandB Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG
1 2131 237 702 998 172 18 5 1
2 165 141 34 -10 0 0 0 0
3 599 80 202 289 24 7 -3 0
4 42 -33 15 47 9 4 -1 0
5 -153 -183 19 11 0 0 0 0
6 25 -72 52 41 2 3 -1 0
Ch_BandH
1 0
2 0
3 0
4 1
5 0
6 0
You now need to re-order the POP_2001_2010_Liverpool data frame in the same way; a sketch is given below. If this has been done correctly, it should look like…
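A minimal sketch of the re-ordering, assuming the LSOA codes are stored in the LSOA column:
# Order the population data frame by its LSOA code column
POP_2001_2010_Liverpool <- POP_2001_2010_Liverpool[order(POP_2001_2010_Liverpool$LSOA), ]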
head(POP_2001_2010_Liverpool)
LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511 1026 120 269 245
2 E01006512 1454 124 930 244
3 E01006513 2077 87 1685 149
4 E01006514 1432 105 678 358
5 E01006515 1156 212 253 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1 246 146 2332 187 1072
2 116 40 2358 191 1607
3 95 61 2630 71 2187
4 188 103 1269 123 467
5 252 211 1058 142 307
6 216 110 1385 85 580
YR_2010_30_44 YR_2010_45_64 YR_2010_65plus TotPop_TC TotPop_PC
1 579 332 162 1306 127.290
2 333 193 34 904 62.173
3 202 97 73 553 26.625
4 280 278 121 -163 -11.383
5 232 224 153 -98 -8.478
6 342 282 96 -27 -1.912
YR_0_15_TC YR_0_15_PC YR_16_29_TC YR_16_29_PC YR_30_44_TC YR_30_44_PC
1 67 55.83 803 298.513 334 136.327
2 67 54.03 677 72.796 89 36.475
3 -16 -18.39 502 29.792 53 35.570
4 18 17.14 -211 -31.121 -78 -21.788
5 -70 -33.02 54 21.344 4 1.754
6 -18 -17.48 -17 -2.848 -44 -11.399
YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1 86 34.959 16 10.96
2 77 66.379 -6 -15.00
3 2 2.105 12 19.67
4 90 47.872 18 17.48
5 -28 -11.111 -58 -27.49
6 66 30.556 -14 -12.73
We will now “column bind” the council tax and population data onto the data slot of the LSOA SpatialPolygonsDataFrame. You can do this using the cbind() function, with the data objects separated by commas. The result is assigned to (overwrites) the data slot of the SpatialPolygonsDataFrame. If you look at LSOA@data after running this code, you will see that the area codes are now lined up, and that the council tax and population data are present.
LSOA@data <- cbind(LSOA@data, CT_2001_2011_Liverpool, POP_2001_2010_Liverpool)
If this has been completed correctly, the LSOA@data should look like…
head(LSOA@data)
zonecode LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count
E01006511 E01006511 E01006511 540 232
E01006512 E01006512 E01006512 627 524
E01006513 E01006513 E01006513 579 264
E01006514 E01006514 E01006514 972 657
E01006515 E01006515 E01006515 826 761
E01006516 E01006516 E01006516 1212 1092
CTAX01_BandA_Pct CTAX01_BandB_Count CTAX01_BandB_Pct
E01006511 42.96 58 10.74
E01006512 83.57 42 6.70
E01006513 45.60 227 39.21
E01006514 67.59 197 20.27
E01006515 92.13 64 7.75
E01006516 90.10 75 6.19
CTAX01_BandC_Count CTAX01_BandC_Pct CTAX01_BandD_Count
E01006511 22 4.07 61
E01006512 11 1.75 49
E01006513 31 5.35 12
E01006514 86 8.85 19
E01006515 1 0.12 0
E01006516 42 3.47 2
CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
E01006511 11.30 125 23.15
E01006512 7.81 1 0.16
E01006513 2.07 39 6.74
E01006514 1.95 10 1.03
E01006515 0.00 0 0.00
E01006516 0.17 0 0.00
CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count
E01006511 16 2.96 24
E01006512 0 0.00 0
E01006513 4 0.69 1
E01006514 1 0.10 2
E01006515 0 0.00 0
E01006516 1 0.08 0
CTAX01_BandG_Pct CTAX01_BandH_Count CTAX01_BandH_Pct
E01006511 4.44 0 0.00
E01006512 0.00 0 0.00
E01006513 0.17 1 0.17
E01006514 0.21 0 0.00
E01006515 0.00 0 0.00
E01006516 0.00 0 0.00
CTAX11_Total_Count CTAX11_BandA_Count CTAX11_BandA_Pct
E01006511 2671 469 17.56
E01006512 792 665 83.96
E01006513 1178 344 29.20
E01006514 1014 624 61.54
E01006515 673 578 85.88
E01006516 1237 1020 82.46
CTAX11_BandB_Count CTAX11_BandB_Pct CTAX11_BandC_Count
E01006511 760 28.45 1020
E01006512 76 9.60 1
E01006513 429 36.42 320
E01006514 212 20.91 133
E01006515 83 12.33 12
E01006516 127 10.27 83
CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
E01006511 38.19 233 8.72
E01006512 0.13 49 6.19
E01006513 27.16 36 3.06
E01006514 13.12 28 2.76
E01006515 1.78 0 0.00
E01006516 6.71 4 0.32
CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count
E01006511 143 5.35 21
E01006512 1 0.13 0
E01006513 46 3.90 1
E01006514 14 1.38 0
E01006515 0 0.00 0
E01006516 3 0.24 0
CTAX11_BandF_Pct CTAX11_BandG_Count CTAX11_BandG_Pct
E01006511 0.79 25 0.94
E01006512 0.00 0 0.00
E01006513 0.08 1 0.08
E01006514 0.00 2 0.20
E01006515 0.00 0 0.00
E01006516 0.00 0 0.00
CTAX11_BandH_Count CTAX11_BandH_Pct Ch_Total Ch_BandA Ch_BandB
E01006511 0 0.00 2131 237 702
E01006512 0 0.00 165 141 34
E01006513 1 0.08 599 80 202
E01006514 1 0.10 42 -33 15
E01006515 0 0.00 -153 -183 19
E01006516 0 0.00 25 -72 52
Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG Ch_BandH LSOA
E01006511 998 172 18 5 1 0 E01006511
E01006512 -10 0 0 0 0 0 E01006512
E01006513 289 24 7 -3 0 0 E01006513
E01006514 47 9 4 -1 0 1 E01006514
E01006515 11 0 0 0 0 0 E01006515
E01006516 41 2 3 -1 0 0 E01006516
YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
E01006511 1026 120 269 245
E01006512 1454 124 930 244
E01006513 2077 87 1685 149
E01006514 1432 105 678 358
E01006515 1156 212 253 228
E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15
E01006511 246 146 2332 187
E01006512 116 40 2358 191
E01006513 95 61 2630 71
E01006514 188 103 1269 123
E01006515 252 211 1058 142
E01006516 216 110 1385 85
YR_2010_16_29 YR_2010_30_44 YR_2010_45_64 YR_2010_65plus
E01006511 1072 579 332 162
E01006512 1607 333 193 34
E01006513 2187 202 97 73
E01006514 467 280 278 121
E01006515 307 232 224 153
E01006516 580 342 282 96
TotPop_TC TotPop_PC YR_0_15_TC YR_0_15_PC YR_16_29_TC
E01006511 1306 127.290 67 55.83 803
E01006512 904 62.173 67 54.03 677
E01006513 553 26.625 -16 -18.39 502
E01006514 -163 -11.383 18 17.14 -211
E01006515 -98 -8.478 -70 -33.02 54
E01006516 -27 -1.912 -18 -17.48 -17
YR_16_29_PC YR_30_44_TC YR_30_44_PC YR_45_64_TC YR_45_64_PC
E01006511 298.513 334 136.327 86 34.959
E01006512 72.796 89 36.475 77 66.379
E01006513 29.792 53 35.570 2 2.105
E01006514 -31.121 -78 -21.788 90 47.872
E01006515 21.344 4 1.754 -28 -11.111
E01006516 -2.848 -44 -11.399 66 30.556
YR_65plus_TC YR_65plus_PC
E01006511 16 10.96
E01006512 -6 -15.00
E01006513 12 19.67
E01006514 18 17.48
E01006515 -58 -27.49
E01006516 -14 -12.73
OK, the data are now ready, so we are able to make some maps! There are many different ways of doing this; however, we will use one of the simplest, and outline each step as we go along. In this example, we will make a map of the change in total population.
We need to install or load some more packages containing functions that will help us make some maps.
# Download / install and then load classInt
install.packages("classInt", depend = TRUE, lib = getwd())
library("classInt", lib.loc = getwd())
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())
The first step is to find suitable breakpoints for the data contained in the TotPop_PC column. The continuous data need to be assigned to categories so that different colours can be applied on a choropleth map. There are numerous ways of doing this, such as Jenks breaks, standard deviations or equal intervals. In this example we use the classIntervals() function from the classInt package to find the ranges needed to divide TotPop_PC into six categories.
# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(LSOA@data$TotPop_PC, n = 6, style = "fisher")
# classIntervals returns an object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks
We will now use another package called RColorBrewer which provides a series of colour palettes that are suitable for mapping. You can have a look at the colour palettes online: http://colorbrewer2.org. Each of these palettes is named, and you can see all the available palettes as follows…
display.brewer.all()
We will now select a set of six colours from one of these palettes into a new list object called my_colours. If you print this object, you will see it contains six HEX codes for colours within the selected palette.
# Select six colours from the palette YlOrRd
my_colours <- brewer.pal(6, "YlOrRd")
# Print the colour codes to the terminal
my_colours
[1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"
Another function called findInterval() can be used with the “breaks” object created earlier to divide the TotPop_PC column into aggregate groups. To view the intervals, just print the breaks object…
breaks
[1] -79.007 -44.874 -12.964 4.688 30.600 73.195 127.698
Before we use these breaks to create a choropleth map, let us first have a look at what the findInterval() function actually does.
The following code returns a list of numbers. Each number corresponds to a row in the LSOA@data TotPop_PC column - i.e. there will be the same number of values as there are rows in LSOA@data.
The function compares the values of the TotPop_PC column against the break points that you have requested (stored in the “breaks” object), and returns a series of values (1-6) - i.e. a value falling within the first interval gets the number 1, and so on. all.inside = TRUE is an optional parameter that ensures values on the interval boundaries are still assigned to a valid interval.
# print the intervals
findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)
[1] 6 5 4 3 3 3 5 4 3 3 3 3 3 4 3 3 3 4 3 3 3 3 3 3 3 3 5 3 3 3 3 2 3 3 3
[36] 3 3 3 4 4 4 4 4 3 3 4 3 2 1 3 1 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 4 3 3
[71] 3 3 4 3 3 4 4 3 3 3 3 3 3 3 3 3 3 2 2 3 2 1 3 3 3 3 4 3 3 3 3 3 3 3 3
[106] 3 3 3 3 3 3 2 3 3 3 3 3 4 3 3 3 3 3 5 3 3 4 2 3 4 5 2 3 4 3 4 3 4 6 6
[141] 4 3 3 4 4 3 3 3 4 3 3 4 2 4 4 3 3 4 4 3 3 4 4 4 3 4 2 4 4 3 3 3 3 5 3
[176] 3 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 3 3 2 3 3 3 4 3 3 3 3 3 3 3 4 3 3 3 5
[211] 4 3 3 2 3 4 2 3 4 3 3 3 4 3 3 3 3 3 3 2 3 3 5 3 4 3 6 3 1 2 6 2 3 3 3
[246] 4 3 3 2 4 3 4 3 3 3 3 4 3 3 2 3 3 3 3 3 4 2 3 3 3 4 3 3 3 4 4 3 4 3 3
[281] 3 3 4 3 3 3 3 3 4 2 3
We can use this list of 1-6 numbers to select specific colours from the my_colours object you created earlier.
By using square brackets we can use the output of the findInterval() function to select items from the my_colours object. E.g. a 1 selects the first colour, a 2 the second colour, and so on.
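To make this indexing clearer, here is a toy illustration with made-up values and break points (only the my_colours object is reused from above):
# Three made-up values classified against three illustrative intervals
toy_values <- c(-50, 10, 120)
toy_breaks <- c(-100, 0, 100, 200)
findInterval(toy_values, toy_breaks, all.inside = TRUE)  # returns 1 2 3
# Using those index numbers to pick the 1st, 2nd and 3rd colours
my_colours[findInterval(toy_values, toy_breaks, all.inside = TRUE)]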
# Returns a list of colours
my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)]
[1] "#BD0026" "#F03B20" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[8] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[15] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[22] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
[29] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[36] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FD8D3C"
[43] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FFFFB2"
[50] "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[57] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[64] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[71] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C"
[78] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[85] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FED976" "#FEB24C" "#FED976"
[92] "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[99] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[106] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976"
[113] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[120] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C" "#FEB24C"
[127] "#FD8D3C" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#FED976" "#FEB24C"
[134] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#BD0026" "#BD0026"
[141] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[148] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[155] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[162] "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[169] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
[176] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[183] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[190] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C"
[197] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[204] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[211] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FD8D3C" "#FED976"
[218] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[225] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C"
[232] "#FEB24C" "#F03B20" "#FEB24C" "#FD8D3C" "#FEB24C" "#BD0026" "#FEB24C"
[239] "#FFFFB2" "#FED976" "#BD0026" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[246] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FD8D3C" "#FEB24C" "#FD8D3C"
[253] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[260] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[267] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[274] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[281] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[288] "#FEB24C" "#FD8D3C" "#FED976" "#FEB24C"
We can then pull all this together to create a basic choropleth map using the plot() function. The first argument is the SpatialPolygonsDataFrame, followed by col=, which specifies how we want to colour the polygons. If we specified a single colour, e.g. “#FEB24C”, all the polygons would be the same colour.
However, because we use the findInterval() function to generate a list of colours, each polygon is coloured according to its area's TotPop_PC value.
The other arguments in the plot() function are axes = FALSE, which removes the axes from the plot, and border = NA, which removes the borders from around the polygons.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
You can see what the axes and border arguments do if you run the following code, setting axes to TRUE and the border to a grey colour (#6B6B6B).
# Create a basic choropleth map with borders and axes
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = TRUE, border = "#6B6B6B")
You will see that your previous plot has been replaced by a new plot with the LSOA borders coloured in grey and the axes now visible - the scale on the axes is in metres and relates to the projection of the input shapefile (OSGB - British National Grid).
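If you want to confirm the coordinate extent of the data (and that the units are indeed metres), you can print the bounding box of the object; a quick optional check using the bbox() function from the sp package:
# Print the minimum and maximum x and y coordinates of the LSOA layer
bbox(LSOA)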
As noted earlier, we can also plot extra data on top of the original choropleth. This is useful for adding additional context to maps.
Visit the UK Borders website again and use the boundary data selector to download the English Census Area Statistic Wards for Liverpool. If you put these into a new folder in your working directory called CASWARD, we will quickly run through the steps to prepare these data by clipping them to the coastline (i.e. so they match the LSOA).
# Read in the CASWARD Data
CASWARD <- readShapePoly("CASWARD/england_caswa_2001.shp")
# Get a set of ids for each of the polygons
my_area_id <- as.character(CASWARD@data$name)
# Clip the CASWARD to the coast outline
CASWARD <- gIntersection(CASWARD, outline, byid = TRUE, id = my_area_id)
# Get the ID values from the CASWARD spatial polygon
data_IDs <- getSpPPolygonsIDSlots(CASWARD)
# Convert into a dataframe
data_IDs <- as.data.frame(data_IDs)
# Set the row names to be the same as the area ids
rownames(data_IDs) <- data_IDs$data_IDs
# Join the ID values back onto the spatial polygon
CASWARD <- SpatialPolygonsDataFrame(CASWARD, data_IDs)
# Rename the column containing the ward names
colnames(CASWARD@data) <- "wards"
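Before moving on, it is worth a quick check that the clip has worked; a minimal sketch, assuming the outline object used in the gIntersection() step is still loaded:
# The clipped wards should sit within the coastline outline
plot(outline)
plot(CASWARD, border = "red", add = TRUE)
# Inspect the attribute table - there should be one row per ward
head(CASWARD@data)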
We can now plot the basic choropleth again; however, this time we will add a second layer showing the ward outlines. In the second plot() call, we simply specify a grey border colour and add another parameter called add = TRUE. This tells R to keep the previous plot and draw the new layer on top.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
We now have a pretty convincing map. However, there are a few things missing. The first of these is labels for the wards. We can use the coordinates() function to extract the central location of each ward polygon, and use this alongside the names of the wards, which are stored in the data slot.
# Print the top six results of using coordinates()
head(coordinates(CASWARD))
[,1] [,2]
St. Mary's 342161 383312
Dingle 336113 387301
Grassendale 339428 385775
Allerton 341695 385904
Broadgreen 340738 391460
Gillmoss 340764 395445
We can use the text() function to add this text onto the map. The first parameter is the x coordinates, the second the y coordinates, and the third the text we want to use as the labels. The final cex parameter scales the text down.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards,
cex = 0.7)
The final items we need to add onto the map are a scale bar, a north arrow and a legend. A legend is added using the legend() function. The parameters used include the x and y location; legend=, which gives the text labels to use on the legend (leglabs() and round() are used to make these display correctly); fill=, which is the object containing the fill colours you used to colour the LSOA choropleth; and bty = "n", which turns off the border around the legend.
You might find that you need to adjust the location or size of the legend to get it to fit onto your plot correctly. For scaling, the cex= parameter will reduce or increase the size of the item. To find a new set of location coordinates, type locator() into the terminal and press enter. When you then hover over the plot, the cursor will turn into a cross; if you click, and then press escape, the coordinates of the click are printed to the terminal - you can use these to re-position items in the plot.
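For example, one way to capture a new legend position interactively is sketched below; locator(1) waits for a single click on the current plot and returns its coordinates (this assumes a map is already drawn in the plot window):
# Click once on the plot where the top-left corner of the legend should go
legend_xy <- locator(1)
# Re-draw the legend at the clicked position (other arguments unchanged)
legend(x = legend_xy$x, y = legend_xy$y, legend = leglabs(round(breaks), between = " to "),
    fill = my_colours, bty = "n")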
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards,
cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n")
The sp package, which you loaded earlier, contains a number of other functions that enable you to add useful items to the map, including a north arrow and a scale bar. There are also base R functions for adding text and titles.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards,
cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n")
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(335379, 381606, "0km", cex = 0.8)
text(335379 + 1500, 381606, "1.5km", cex = 0.8)
text(335379 + 3000, 381606, "3km", cex = 0.8)
# Add a title
title("Total Population Change (%)")
You have been creating all your plots within R; however, when preparing your reports you will need to export these maps. To do this, we wrap the map-drawing code between png("rplot.png") and dev.off(). This creates an image called rplot.png in your working directory. The png() call tells R to write the results to a PNG file rather than the plot window - thus, you won't see the output until you open the PNG. Another thing to remember is that the positioning of items in the PNG may be a little different to the plot window - so you may have to adjust these and re-plot the image. For example…
# Write out to a png...
png("rplot.png", width = 800, height = 800, units = "px")
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards,
cex = 0.8)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n", cex = 0.9)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(335379, 381606, "0km", cex = 0.7)
text(335379 + 1500, 381606, "1.5km", cex = 0.7)
text(335379 + 3000, 381606, "3km", cex = 0.7)
# Add a title
title("Total Population Change (%)")
# Close the PNG device, returning output to the plot window
dev.off()
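If you would prefer a vector format that can be rescaled without losing quality (for example, for a printed report), the pdf() device works in exactly the same way; a minimal sketch, with the map-drawing code from above re-run between the two calls:
# Write the map out as a PDF instead of a PNG (width and height are in inches)
pdf("rplot.pdf", width = 8, height = 8)
# ... re-run the full map-drawing code from above here ...
dev.off()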
The submitted work for this practical is a short report accompanied by a series of maps generated in R. The total word count for the report is 500 words, and it is expected that you will include at least 5 maps that you have generated in R. You should refer to the wider literature in your report and cite appropriately, including a formatted reference list (the list is not included in the word count). Data tables included in the report also do not contribute to the word count. Please do not include any appendices.
You are free to lay out your report as you wish; however, all maps or other graphics must be appropriately referenced in the text - e.g. "see Figure 1".
If you have followed the tutorials in this practical, you will now have created a SpatialPolygonsDataFrame for the Liverpool local authority that contains the following variables:
With reference to the evidence you create from the above data, further data sources that you integrate independently into R, and wider reading, write a report that addresses the following questions:
1. With reference to your population and council tax band maps, identify the area of Liverpool with the highest population increase, and discuss the potential processes that may have influenced this change.
2. With reference to your population and council tax band maps, identify the area of Liverpool with the highest population decrease, and discuss the potential processes that may have influenced this change.
Through this report you should build a narrative, using maps (or any other data visualization you deem appropriate) to hypothesize about the possible causes and impacts of the change that you observe to have occurred in your study area since the 2001 Census. Your marks will be linked to both the quality and clarity of your answers to the above questions and the visualizations you create.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.