Detecting Neighbourhood Change

1 Aims and Objectives

In this project you will compile a number of measures that can be used to assess the extent that small areas have changed in their composition over a 10 year period. You will explore data about changes in both built structures and demographics, and through the representations you create, formulate plausible hypotheses about why the patterns that you uncover may have occurred.

Through this work you will also learn a series of fundamental skills for the input, manipulation, analysis and display of spatial data using the statistical programming language R.

At this stage it might be worth looking at the end of this tutorial for the assignment questions as you can keep these in mind as you run through the practical.

2 Learning Outcomes

3 Some preliminaries on reading and manipulating data

Before we begin the practical, there are a couple of things to learn…

Firstly, what is a CSV file, and how can we get them into R? These files will usually have the file extension “.csv” and are typically opened in a spreadsheet software package.

Supposing we want to create a data set that contains a list of people who have come along to an R training workshop. This might look something like this:

Name Age Place
John 20 Liverpool
Helen 10 London
Mia 16 Liverpool
Carl 21 Manchester

Open Notepad and create a new text file, entering the following details as separate lines, with each column in the table separated by a comma, and each row on a new line (please note that the first row contains our desired headings):

Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester

However, before you do, create yourself a project directory in your storage area. Then use the set working directory function to tell R where you are saving your files.

Save the text file as “test.csv” in a new folder in your storage area. This will be your working directory for the rest of the practical. We need to tell R that this is where you will be storing all your files by running the setwd() function.

You need to paste the location of your working directory into the setwd() function between the “”. Please note, that on windows you need to make sure the slashes between the directories are formatted as either / or .

# For example...
setwd("M:/R work/Neighbourhood_Change/")

Next we will read in the CSV you created above.

# First create a new variable with the location of your csv file - you
# only need the file name as by default R looks in the working directory
file_location <- "test.csv"
# Next we will read this into a new data frame called test.
test <- read.csv(file_location, header = TRUE)
# Now print test
test
   Name Age      Place
1  John  20  Liverpool
2 Helen  10     London
3   Mia  16  Liverpool
4  Carl  21 Manchester

Next, duplicate test.csv in your working directory, renaming this to test2.csv and opening it in Notepad. Add the following to the top of the file: “Here is some example text which we will use to demonstrate how to skip lines” and on another line: “Here is another line of text” so it looks as follows:

Here is some example text which we will use to demonstrate how to skip lines
Here is another line of text
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester

Save the file and close it.

Next we will import this new .csv into R

# Update the file_location object with the location of the new file
file_location <- "test2.csv"

We have to add an extra parameter to the read.csv function which specifies that the data we are interested in are located after line 2.

test2 <- read.csv(file_location, header = TRUE, skip = 2)

If you print the results of the object test2 (i.e. type it in the terminal!) you will see this is now the same as test, because you skipped the two junk lines.

Next we will illustrate how we can create a subset of a data frame, add a column and calculate a new value. To create a subset of the test2 data frame, we can use the subset function - this can select certain rows, columns or a combination of the two.

For example…

# Select the name and age columns
subset(test, select = c("Name", "Age"))
   Name Age
1  John  20
2 Helen  10
3   Mia  16
4  Carl  21
# Select rows where the place is Liverpool
subset(test, Place == "Liverpool")
  Name Age     Place
1 John  20 Liverpool
3  Mia  16 Liverpool
# Select the name and age columns, and where Place is Liverpool
subset(test, Place == "Liverpool", select = c("Name", "Age"))
  Name Age
1 John  20
3  Mia  16

We will now repeat the last step, however store the results as a new object. You will see that nothing prints to the terminal.

# Select the name and age columns, however, only where Place is equal to
# Liverpool (in R we need to use a double equals sign)
test3 <- subset(test, Place == "Liverpool", select = c("Name", "Age"))

We can add a new column called “diff100” to the data frame object test3. We do this using the $ symbol. At first, we will make the values of the column NA (i.e. no value)

test3$diff100 <- NA

If you print test3, you will now see the new column which will contain the values NA. However, what we actually want to do is find out how different each of the ages are to 100 (i.e. 100 - age). In order to create this calculation we need to refer to the age column in the data frame. We can use the $ again to achieve this. For example…

test3$diff100 <- 100 - test3$Age
# print test3
test3
  Name Age diff100
1 John  20      80
3  Mia  16      84

OK, using what you have learnt in this past section, you are now ready to play with some real data!

A couple of other important things! If you want to see the objects that you have created in the R workspace, you can use the ls() function

# List objects in the workspace
ls()
[1] "file_location" "test"          "test2"         "test3"        

If you want to save your workspace - i.e. so you can open it up later on (like saving your work!); you can use the save.image function.

# This will save all objects currently loaded in the workspace
save.image(file = "myfile.RData")

After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.

# Remember that you will need to set your working directory before trying
# to load - otherwise R will not know where to find you RData file
load("myfile.RData")

4 Detecting Changes

Structural Changes

This section of the practical will examine the use of multiple data that can be integrated to generate statistics that enable you to detect and hypothesise about potential changes in neighbourhood built structure occurring at small geographic scales. These changes may have occurred for a number of different reasons including: new building work, gentrification, clearances or changes of use (e.g. residential to retail).

Changing Council Tax Bands

Houses of different size are placed into a band that determines how much council tax occupants or owners of the property have to pay. The changing composition of property bands within an area over time can indicate structural changes. You will now download and prepare a series of data sets.

Downloading, input and preparation of the council tax data

Open a web browser and visit the website: http://www.neighbourhood.statistics.gov.uk and select the “Topics” link on the bottom left hand side of the page. On the next screen search for “council tax”. Click on “Data set name matches”, then where it says “Dwelling Stock by Council Tax Band”, click the radio button for this entry on the far right hand side of the page. Then press the “Next” button on the bottom right hand side of the page.

For the row “Dwelling Stock by Council Tax Band, 2001”, click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the North West Government Office Region. Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [.csv]”. This will be zipped, and once you have extracted the directory you find it contains CSV files for a variety of different geographies. You are interested in the **Lower Super Output Area level* file which will be named “G050301_939_GeoPolicy_NW_LSOA.CSV”. Copy this to your working directory which you defined earlier. If you have forgotten where this is you can find out with the following R command.

getwd()

Repeat the above step for the 2011 Lower Super Output Area Level council tax band data. Once you have downloaded and extracted the zip file, this will be called “G050311_2417_GeoPolicy_NW_LSOA.CSV”. Note: you may see reference to 2001 Administrative Geography when you are downloading this file - don't worry about this; it relates to the fact the data are organised into the 2001 boundaries. Also, when you look inside the zip file, you might need to browse through a number of nested directories in order to find the CSV.

The next step is to read these data into R. If you open either of the CSV up in Notepad, you will see that the first 5 lines are unwanted; with the 6th containing the column headings. Lets begin with the 2001 data.

# Read in the 2001 data into a new object CT_2001, skipping the first 5
# lines and keeping the headers
CT_2001 <- read.csv("G050301_939_GeoPolicy_NW_LSOA.CSV", header = TRUE, skip = 5)

It is not a good idea to view all of the data frame… there is a lot of it! However we can see the first 6 rows using the head() function. Because there are too many columns to fit into a single span of the terminal, R wraps these over multiple lines; however, you will see each row identified by a number 1-6.

head(CT_2001)
  GOR_CODE   GOR_NAME CTY_CODE CTY_NAME LA_CODE LA_NAME MSOA_CODE
1        B North West       NA             00BL  Bolton E02000988
2        B North West       NA             00BL  Bolton E02000988
3        B North West       NA             00BL  Bolton E02000984
4        B North West       NA             00BL  Bolton E02000986
5        B North West       NA             00BL  Bolton E02000986
6        B North West       NA             00BL  Bolton E02000986
   MSOA_NAME LSOA_CODE   LSOA_NAME AREA_METADATA  X DATA_VALUE
1 Bolton 005 E01004766 Bolton 005A            NA NA        858
2 Bolton 005 E01004767 Bolton 005B            NA NA        685
3 Bolton 001 E01004768 Bolton 001A            NA NA        602
4 Bolton 003 E01004769 Bolton 003A            NA NA        645
5 Bolton 003 E01004770 Bolton 003B            NA NA        629
6 Bolton 003 E01004771 Bolton 003C            NA NA        631
  DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3 DATA_VALUE.4 DATA_VALUE.5
1          605        70.51          107        12.47          121
2          292        42.63          127        18.54          199
3            2         0.33            8         1.33          162
4           92        14.26          176        27.29          206
5           30         4.77           99        15.74          367
6            9         1.43           21         3.33          298
  DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8 DATA_VALUE.9 DATA_VALUE.10
1        14.10           11         1.28            4          0.47
2        29.05           58         8.47            4          0.58
3        26.91          163        27.08          185         30.73
4        31.94           96        14.88           38          5.89
5        58.35          107        17.01           13          2.07
6        47.23          207        32.81           66         10.46
  DATA_VALUE.11 DATA_VALUE.12 DATA_VALUE.13 DATA_VALUE.14 DATA_VALUE.15
1             8          0.93             2          0.23             0
2             2          0.29             3          0.44             0
3            47          7.81            35          5.81             0
4            32          4.96             5          0.78             0
5            12          1.91             1          0.16             0
6            19          3.01            11          1.74             0
  DATA_VALUE.16 DATA_VALUE.17 DATA_VALUE.18
1             0             0             0
2             0             0             0
3             0             0             0
4             0             0             0
5             0             0             0
6             0             0             0

Each row of the data pertains to a single Output Area (OA). In addition to the count and percentage of houses in each council tax band (DATA_VALUE columns), the data also contains lookups to the larger areas in which the OA nest. However, we do not need these for this project. Therefore, the first thing we will do is create a list of the columns we want to keep. You view a list of all the columns in a data frame using the colnames() function.

colnames(CT_2001)
 [1] "GOR_CODE"      "GOR_NAME"      "CTY_CODE"      "CTY_NAME"     
 [5] "LA_CODE"       "LA_NAME"       "MSOA_CODE"     "MSOA_NAME"    
 [9] "LSOA_CODE"     "LSOA_NAME"     "AREA_METADATA" "X"            
[13] "DATA_VALUE"    "DATA_VALUE.1"  "DATA_VALUE.2"  "DATA_VALUE.3" 
[17] "DATA_VALUE.4"  "DATA_VALUE.5"  "DATA_VALUE.6"  "DATA_VALUE.7" 
[21] "DATA_VALUE.8"  "DATA_VALUE.9"  "DATA_VALUE.10" "DATA_VALUE.11"
[25] "DATA_VALUE.12" "DATA_VALUE.13" "DATA_VALUE.14" "DATA_VALUE.15"
[29] "DATA_VALUE.16" "DATA_VALUE.17" "DATA_VALUE.18"

This printed a list of all the columns to the terminal. However, we can also append these to a new object which we will call keep_list.

keep_list <- colnames(CT_2001)

If you print the keep_list to the terminal you will see that the contents of the list are numbered; we can use these positions to select items from the list.

# We can select a single item from the list with a square bracket (e.g.
# item 1)
keep_list[1]
[1] "GOR_CODE"
# We can select multiple items from the list using a colon (e.g items 3-5
# in the list)
keep_list[3:5]
[1] "CTY_CODE" "CTY_NAME" "LA_CODE" 
# We can combine a single select and a multiple select by using c(,) -
# i.e. concatenate
keep_list[c(1, 3:5)]
[1] "GOR_CODE" "CTY_CODE" "CTY_NAME" "LA_CODE" 
# Finally, we will use this to create a new object called keep_list
keep_list <- keep_list[c(9, 13:31)]

If you print keep_list to the terminal, this should contain the names of the columns we want to keep from CT_2001. We will now use this with the subset function to create a new object where unwanted columns from the CT_2001 data frame have been removed. We also only want those LSOA which are in Liverpool. Fortunately, there is another column called “LA_NAME” which contains the local authority where the LSOA is located. So, if we specify this in a subset query, we can create a new object containing only the data we need. You will see that we have used the keep_list to select the columns.

CT_2001_Liverpool <- subset(CT_2001, LA_NAME == "Liverpool", keep_list)
# Have a quick look at the new data frame
head(CT_2001_Liverpool)
     LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511        540          232        42.96           58
1747 E01006512        627          524        83.57           42
1748 E01006513        579          264        45.60          227
1749 E01006514        972          657        67.59          197
1750 E01006515        826          761        92.13           64
1751 E01006516       1212         1092        90.10           75
     DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746        10.74           22         4.07           61        11.30
1747         6.70           11         1.75           49         7.81
1748        39.21           31         5.35           12         2.07
1749        20.27           86         8.85           19         1.95
1750         7.75            1         0.12            0         0.00
1751         6.19           42         3.47            2         0.17
     DATA_VALUE.9 DATA_VALUE.10 DATA_VALUE.11 DATA_VALUE.12 DATA_VALUE.13
1746          125         23.15            16          2.96            24
1747            1          0.16             0          0.00             0
1748           39          6.74             4          0.69             1
1749           10          1.03             1          0.10             2
1750            0          0.00             0          0.00             0
1751            0          0.00             1          0.08             0
     DATA_VALUE.14 DATA_VALUE.15 DATA_VALUE.16 DATA_VALUE.17 DATA_VALUE.18
1746          4.44             0          0.00             2          0.37
1747          0.00             0          0.00             0          0.00
1748          0.17             1          0.17             0          0.00
1749          0.21             0          0.00             0          0.00
1750          0.00             0          0.00             0          0.00
1751          0.00             0          0.00             0          0.00

You should now have a data frame formatted as shown above. However, you will notice that the column names are not really that useful - and indeed, you are probably wondering what all those numbers are? We actually discarded this information when we skipped the rows in the original CSV input. However, if you open the original source CSV file up in a spreadsheet programme you will see that they are a mixture of counts and percentages. You will now rename the columns with labels that are easy to remember. We can do this in R using the colnames() function that you have just used to read the column names - however, this time we will use it to set them.

The first thing we need to do is create a list of the column names we want. We will concatenate a list of character strings together with the column names we want - again using c()

new_column_names <- c("LSOA_CODE", "CTAX01_Total_Count", "CTAX01_BandA_Count", 
    "CTAX01_BandA_Pct", "CTAX01_BandB_Count", "CTAX01_BandB_Pct", "CTAX01_BandC_Count", 
    "CTAX01_BandC_Pct", "CTAX01_BandD_Count", "CTAX01_BandD_Pct", "CTAX01_BandE_Count", 
    "CTAX01_BandE_Pct", "CTAX01_BandF_Count", "CTAX01_BandF_Pct", "CTAX01_BandG_Count", 
    "CTAX01_BandG_Pct", "CTAX01_BandH_Count", "CTAX01_BandH_Pct", "CTAX01_BandX_Count", 
    "CTAX01_BandX_Pct")

You can check the length of the new_column_names object using the length function…

length(new_column_names)
[1] 20

and that this matches the number of columns in your CT_2001_Liverpool data frame.

length(CT_2001_Liverpool)
[1] 20

Both should be 20. We can now apply these new column names to the data frame.

# Add the new column names
colnames(CT_2001_Liverpool) <- new_column_names

You now need to import the 2011 data into R using the same procedures that you have used for 2001. However, be careful, you can't just repeat the commands given previously as there are differences between the 2001 and 2011 data. Please copy the previous commands related to 2001 into a text file (e.g. notepad), change them, and then paste into the R terminal.

Some things that you will need to change from the 2001 code detailed above when importing the 2011 data:

  1. The Lower Super Output Area codes are in column number 11 rather than 9
  2. There is now a new council tax band “I” meaning that there are two extra columns of data (i.e. 1 count and 1 percentage). This new band was introduced in 2005 for properties located in Wales only. Thus it is not applicable for this exercise, as your study area is in England. We will remove this later! It is in the source data CSV after band H and before band X - please have a look for this!!
  3. As such, the data are found in columns 15 through to 35
  4. You will need to create new column names for the 2011 data, you should make these different from the 2001 data. Thus, CTAX01_Band A_Count from the 2001 data should have the name CTAX11_Band A_Count in the 2011 data - note 01 has been changed to 11.
  5. Call the new 2011 data frame object CT_2011_Liverpool.

If you have done the above correctly, the head of the new data frame should look like:

head(CT_2011_Liverpool)
     LSOA_CODE CTAX11_Total_Count CTAX11_BandA_Count CTAX11_BandA_Pct
3654 E01006511               2671                469            17.56
3655 E01006512                792                665            83.96
3656 E01006513               1178                344            29.20
3657 E01006514               1014                624            61.54
3658 E01006515                673                578            85.88
3659 E01006516               1237               1020            82.46
     CTAX11_BandB_Count CTAX11_BandB_Pct CTAX11_BandC_Count
3654                760            28.45               1020
3655                 76             9.60                  1
3656                429            36.42                320
3657                212            20.91                133
3658                 83            12.33                 12
3659                127            10.27                 83
     CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
3654            38.19                233             8.72
3655             0.13                 49             6.19
3656            27.16                 36             3.06
3657            13.12                 28             2.76
3658             1.78                  0             0.00
3659             6.71                  4             0.32
     CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count
3654                143             5.35                 21
3655                  1             0.13                  0
3656                 46             3.90                  1
3657                 14             1.38                  0
3658                  0             0.00                  0
3659                  3             0.24                  0
     CTAX11_BandF_Pct CTAX11_BandG_Count CTAX11_BandG_Pct
3654             0.79                 25             0.94
3655             0.00                  0             0.00
3656             0.08                  1             0.08
3657             0.00                  2             0.20
3658             0.00                  0             0.00
3659             0.00                  0             0.00
     CTAX11_BandH_Count CTAX11_BandH_Pct CTAX11_BandI_Count
3654                  0             0.00                  0
3655                  0             0.00                  0
3656                  1             0.08                  0
3657                  1             0.10                  0
3658                  0             0.00                  0
3659                  0             0.00                  0
     CTAX11_BandI_Pct CTAX11_BandX_Count CTAX11_BandX_Pct
3654                0                  0                0
3655                0                  0                0
3656                0                  0                0
3657                0                  0                0
3658                0                  0                0
3659                0                  0                0

We will now remove the two unwanted columns CTAX11_Band I_Count and CTAX11_Band I_Pct. You could use the subset procedure we implemented earlier, however this is another way (assuming that you have named the columns as described above)…

CT_2011_Liverpool$CTAX11_BandI_Count <- NULL
CT_2011_Liverpool$CTAX11_BandI_Pct <- NULL

We will also remove the BandX columns from both the 2001 and 2011 data - BandX relates to property with an unknown band.

CT_2001_Liverpool$CTAX01_BandX_Count <- NULL
CT_2001_Liverpool$CTAX01_BandX_Pct <- NULL
CT_2011_Liverpool$CTAX11_BandX_Count <- NULL
CT_2011_Liverpool$CTAX11_BandX_Pct <- NULL

Merging the Council tax data

We now need to create a new data frame that combines both the 2001 and 2011 data. One way of achieving this is to use the merge() function. However before we do this, we will first explore exactly what the merge function does.

We will use the data frame you created right at the start of the tutorial. If you have forgotten what this looked like, you can always print the object in the terminal.

test
   Name Age      Place
1  John  20  Liverpool
2 Helen  10     London
3   Mia  16  Liverpool
4  Carl  21 Manchester

You now need to create another data frame which we will use as an example. You could create another csv file and import this; however, we will illustrate another way of achieving this through the amalgamation of a series of vector lists.

# Create a person vector
Person <- c("Paul", "Mike", "John", "Helen", "Mia", "Leo", "Carl")
# Create a favourite functions
Function <- c("merge()", "read.csv()", "colnames()", "ncol()", "length()", "getwd()", 
    "save.image()")
# We can now join these two vectors into a new data frame of favourite
# foods
fav_fun <- data.frame(Person, Function)
# View the fav_fun
fav_fun
  Person     Function
1   Paul      merge()
2   Mike   read.csv()
3   John   colnames()
4  Helen       ncol()
5    Mia     length()
6    Leo      getwd()
7   Carl save.image()

We now have two tables, the first listing a series of people who attended a current R training workshop, and the second containing these current people, and those additional people who have been to previous workshops; along with their favourite R functions!

What we want to do is append the favourite functions onto our list of current course attendees. We will refer to the two data frames as x and y. The x data frame is test; and the y is fav_fun. In x, the column containing the list of people is called “Name”, and in y, it is called “Person”. The parameters of the merge function first accept the two table names, and then the lookup columns as by.x or by.y. You should also include all.x=TRUE as a final parameter. This tells the function to keep all the records in x, but only those in y which match. As an experiment you can change this to all.y=TRUE - what you will find is that all the y records are now in the data frame, however, because there were no matches in x, there are NA values present.

We will now create a new data frame called People_And_Functions and print this to the terminal.

People_And_Functions <- merge(test, fav_fun, by.x = "Name", by.y = "Person", 
    all.x = TRUE)
People_And_Functions
   Name Age      Place     Function
1  Carl  21 Manchester save.image()
2 Helen  10     London       ncol()
3  John  20  Liverpool   colnames()
4   Mia  16  Liverpool     length()

If the by columns were the same in both x and y, we can specify this simply with by=“column name” rather than by.x and by.y; and finally, a critical issue when making any join is assuring that the “by” columns are in the same format.

The CT_2001_Liverpool data frame should now be joined to CT_2011_Liverpool. You will need to specify a merge that:

  1. Creates a new object called CT_2001_2011_Liverpool
  2. Merges CT_2001_Liverpool to CT_2011_Liverpool
  3. Using LSOA_CODE from both CT_2001_Liverpool and CT_2011_Liverpool

If you have done the join correctly, this should look like:

head(CT_2001_2011_Liverpool)
  LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511                540                232            42.96
2 E01006512                627                524            83.57
3 E01006513                579                264            45.60
4 E01006514                972                657            67.59
5 E01006515                826                761            92.13
6 E01006516               1212               1092            90.10
  CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1                 58            10.74                 22             4.07
2                 42             6.70                 11             1.75
3                227            39.21                 31             5.35
4                197            20.27                 86             8.85
5                 64             7.75                  1             0.12
6                 75             6.19                 42             3.47
  CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1                 61            11.30                125            23.15
2                 49             7.81                  1             0.16
3                 12             2.07                 39             6.74
4                 19             1.95                 10             1.03
5                  0             0.00                  0             0.00
6                  2             0.17                  0             0.00
  CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1                 16             2.96                 24             4.44
2                  0             0.00                  0             0.00
3                  4             0.69                  1             0.17
4                  1             0.10                  2             0.21
5                  0             0.00                  0             0.00
6                  1             0.08                  0             0.00
  CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1                  0             0.00               2671
2                  0             0.00                792
3                  1             0.17               1178
4                  0             0.00               1014
5                  0             0.00                673
6                  0             0.00               1237
  CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1                469            17.56                760            28.45
2                665            83.96                 76             9.60
3                344            29.20                429            36.42
4                624            61.54                212            20.91
5                578            85.88                 83            12.33
6               1020            82.46                127            10.27
  CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1               1020            38.19                233             8.72
2                  1             0.13                 49             6.19
3                320            27.16                 36             3.06
4                133            13.12                 28             2.76
5                 12             1.78                  0             0.00
6                 83             6.71                  4             0.32
  CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1                143             5.35                 21             0.79
2                  1             0.13                  0             0.00
3                 46             3.90                  1             0.08
4                 14             1.38                  0             0.00
5                  0             0.00                  0             0.00
6                  3             0.24                  0             0.00
  CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1                 25             0.94                  0             0.00
2                  0             0.00                  0             0.00
3                  1             0.08                  1             0.08
4                  2             0.20                  1             0.10
5                  0             0.00                  0             0.00
6                  0             0.00                  0             0.00

Creating Metrics of Change for the Council Tax Data

You have now created a single council tax file that contains both counts and percentages for each council tax band in 2001 and 2011. You will now calculate a series of new columns that show the absolute changes across the bands between the two time periods.

The following example will create two new columns, one for the total change, and one for Band A changes. We use the $ to both create a new column, and also to refer to values in existing columns.

# Create Total changes
CT_2001_2011_Liverpool$Ch_Total <- CT_2001_2011_Liverpool$CTAX11_Total_Count - 
    CT_2001_2011_Liverpool$CTAX01_Total_Count
# Create Band A Changes
CT_2001_2011_Liverpool$Ch_BandA <- CT_2001_2011_Liverpool$CTAX11_BandA_Count - 
    CT_2001_2011_Liverpool$CTAX01_BandA_Count

You now need to repeat this for the remaining council tax bands - call these new columns Ch_BandB through to Ch_BandH

Your council tax data frame should then look like:

head(CT_2001_2011_Liverpool)
  LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511                540                232            42.96
2 E01006512                627                524            83.57
3 E01006513                579                264            45.60
4 E01006514                972                657            67.59
5 E01006515                826                761            92.13
6 E01006516               1212               1092            90.10
  CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1                 58            10.74                 22             4.07
2                 42             6.70                 11             1.75
3                227            39.21                 31             5.35
4                197            20.27                 86             8.85
5                 64             7.75                  1             0.12
6                 75             6.19                 42             3.47
  CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1                 61            11.30                125            23.15
2                 49             7.81                  1             0.16
3                 12             2.07                 39             6.74
4                 19             1.95                 10             1.03
5                  0             0.00                  0             0.00
6                  2             0.17                  0             0.00
  CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1                 16             2.96                 24             4.44
2                  0             0.00                  0             0.00
3                  4             0.69                  1             0.17
4                  1             0.10                  2             0.21
5                  0             0.00                  0             0.00
6                  1             0.08                  0             0.00
  CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1                  0             0.00               2671
2                  0             0.00                792
3                  1             0.17               1178
4                  0             0.00               1014
5                  0             0.00                673
6                  0             0.00               1237
  CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1                469            17.56                760            28.45
2                665            83.96                 76             9.60
3                344            29.20                429            36.42
4                624            61.54                212            20.91
5                578            85.88                 83            12.33
6               1020            82.46                127            10.27
  CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1               1020            38.19                233             8.72
2                  1             0.13                 49             6.19
3                320            27.16                 36             3.06
4                133            13.12                 28             2.76
5                 12             1.78                  0             0.00
6                 83             6.71                  4             0.32
  CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1                143             5.35                 21             0.79
2                  1             0.13                  0             0.00
3                 46             3.90                  1             0.08
4                 14             1.38                  0             0.00
5                  0             0.00                  0             0.00
6                  3             0.24                  0             0.00
  CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1                 25             0.94                  0             0.00
2                  0             0.00                  0             0.00
3                  1             0.08                  1             0.08
4                  2             0.20                  1             0.10
5                  0             0.00                  0             0.00
6                  0             0.00                  0             0.00
  Ch_Total Ch_BandA Ch_BandB Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG
1     2131      237      702      998      172       18        5        1
2      165      141       34      -10        0        0        0        0
3      599       80      202      289       24        7       -3        0
4       42      -33       15       47        9        4       -1        0
5     -153     -183       19       11        0        0        0        0
6       25      -72       52       41        2        3       -1        0
  Ch_BandH
1        0
2        0
3        0
4        1
5        0
6        0

Demographic Change

In the previous section you have been introduced to the council tax data set which has been used to create measures that may indicate changes to the built structure of LSOA. In this section we will examine indicators of social change by comparing the 2001 Census data to population estimates derived by the Office for National Statistics.

2001 Census Data

The first stage is to download 2001 Census data. Although there are other sources of these data, in this example we will use the http://www.neighbourhood.statistics.gov.uk website again.

Navigate to the site and select the “Topics” link on the bottom left hand side of the page. On the next screen, click “2001 Census: Census Area Statistics”. Then click the radio button that corresponds to “Age (UV04)” and press the “Next” button at the bottom right hand side of the page.

Then click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the North West Government Office Region. Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [*.csv]”. This will be zipped, and once you have extracted the directory you will find it contains CSV files for a variety of different geographies. You are interested in the Lower Super Output Area level file which will be named “UV040301_92_GeoPolicy_NW_LSOA.CSV”. Copy this to the working directory you defined earlier.

If you open this file up in Notepad you will again see that there are a series of lines at the top which do not contain the data or headings for the columns. As such, you need to:

  1. Read this CSV into R, skipping the unwanted lines and setting the column headings as the row containing “LSOA_CODE”. Call the new data frame POP_2001
  2. Subset the data so only those LSOA in Liverpool are present.
  3. Remove all columns apart from LSOA_CODE and those which contain numbers (i.e. DATA_VALUE columns)

This should then look like (note, only the first 6 rows and 10 columns are shown here):

# Using the square brackets we can view specific parts of a data frame-
# data[rows,columns]
POP_2001[1:6, 1:10]
     LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511       1026            7           10            8
1747 E01006512       1454            8           11           10
1748 E01006513       2077            4            5            6
1749 E01006514       1432            8            7           10
1750 E01006515       1156            6           15           12
1751 E01006516       1412           12           10           12
     DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746            4            9            5            4            7
1747            9           10            8           13            7
1748            3            3            6            0            5
1749            7            8            6            6            5
1750           12           11           14           10            8
1751            8            7            7            4            4

You now need to rename the column headings of the data frame into something which are easier to interpret. There are obviously a lot of columns, and it would take a very long time to type out a list of vectors to use as the new names… However, there are some shortcuts. If you look at the source CSV file, you will see that the data columns represent: the total population, those under 1, then a series of columns for those aged 1 through to 74; and finally, those aged 75-79, 80-84, 85-89, 90-94, 95-99 and 100 or greater.

The main block of columns are the individual age bands; which we can create using the paste() function - this concatenates vectors (separated by a comma) together into characters. We also add sep=“” to indicate that we do not want any spaces putting between the new concatenated elements of the list.

years <- paste("YR_2001_", 1:74, sep = "")

We now need to add the remaining elements to the list which we will do in two stages. One will be added before the elements of the “years” list (pre), and the other afterwards (post)

pre <- c("LSOA", "YR_2001_TotPop", "YR_2001_Less1")
post <- c("YR_2001_75_79", "YR_2001_80_84", "YR_2001_85_89", "YR_2001_90_94", 
    "YR_2001_95_99", "YR_2001_100plus")

# Print the lists to see what they look like
pre
[1] "LSOA"           "YR_2001_TotPop" "YR_2001_Less1" 
years
 [1] "YR_2001_1"  "YR_2001_2"  "YR_2001_3"  "YR_2001_4"  "YR_2001_5" 
 [6] "YR_2001_6"  "YR_2001_7"  "YR_2001_8"  "YR_2001_9"  "YR_2001_10"
[11] "YR_2001_11" "YR_2001_12" "YR_2001_13" "YR_2001_14" "YR_2001_15"
[16] "YR_2001_16" "YR_2001_17" "YR_2001_18" "YR_2001_19" "YR_2001_20"
[21] "YR_2001_21" "YR_2001_22" "YR_2001_23" "YR_2001_24" "YR_2001_25"
[26] "YR_2001_26" "YR_2001_27" "YR_2001_28" "YR_2001_29" "YR_2001_30"
[31] "YR_2001_31" "YR_2001_32" "YR_2001_33" "YR_2001_34" "YR_2001_35"
[36] "YR_2001_36" "YR_2001_37" "YR_2001_38" "YR_2001_39" "YR_2001_40"
[41] "YR_2001_41" "YR_2001_42" "YR_2001_43" "YR_2001_44" "YR_2001_45"
[46] "YR_2001_46" "YR_2001_47" "YR_2001_48" "YR_2001_49" "YR_2001_50"
[51] "YR_2001_51" "YR_2001_52" "YR_2001_53" "YR_2001_54" "YR_2001_55"
[56] "YR_2001_56" "YR_2001_57" "YR_2001_58" "YR_2001_59" "YR_2001_60"
[61] "YR_2001_61" "YR_2001_62" "YR_2001_63" "YR_2001_64" "YR_2001_65"
[66] "YR_2001_66" "YR_2001_67" "YR_2001_68" "YR_2001_69" "YR_2001_70"
[71] "YR_2001_71" "YR_2001_72" "YR_2001_73" "YR_2001_74"
post
[1] "YR_2001_75_79"   "YR_2001_80_84"   "YR_2001_85_89"   "YR_2001_90_94"  
[5] "YR_2001_95_99"   "YR_2001_100plus"

The list objects that you have just created now need to be concatenated which can be achieved using the c() function.

# Concatenate the new column names
new_column_names <- c(pre, years, post)
# print the names
new_column_names
 [1] "LSOA"            "YR_2001_TotPop"  "YR_2001_Less1"  
 [4] "YR_2001_1"       "YR_2001_2"       "YR_2001_3"      
 [7] "YR_2001_4"       "YR_2001_5"       "YR_2001_6"      
[10] "YR_2001_7"       "YR_2001_8"       "YR_2001_9"      
[13] "YR_2001_10"      "YR_2001_11"      "YR_2001_12"     
[16] "YR_2001_13"      "YR_2001_14"      "YR_2001_15"     
[19] "YR_2001_16"      "YR_2001_17"      "YR_2001_18"     
[22] "YR_2001_19"      "YR_2001_20"      "YR_2001_21"     
[25] "YR_2001_22"      "YR_2001_23"      "YR_2001_24"     
[28] "YR_2001_25"      "YR_2001_26"      "YR_2001_27"     
[31] "YR_2001_28"      "YR_2001_29"      "YR_2001_30"     
[34] "YR_2001_31"      "YR_2001_32"      "YR_2001_33"     
[37] "YR_2001_34"      "YR_2001_35"      "YR_2001_36"     
[40] "YR_2001_37"      "YR_2001_38"      "YR_2001_39"     
[43] "YR_2001_40"      "YR_2001_41"      "YR_2001_42"     
[46] "YR_2001_43"      "YR_2001_44"      "YR_2001_45"     
[49] "YR_2001_46"      "YR_2001_47"      "YR_2001_48"     
[52] "YR_2001_49"      "YR_2001_50"      "YR_2001_51"     
[55] "YR_2001_52"      "YR_2001_53"      "YR_2001_54"     
[58] "YR_2001_55"      "YR_2001_56"      "YR_2001_57"     
[61] "YR_2001_58"      "YR_2001_59"      "YR_2001_60"     
[64] "YR_2001_61"      "YR_2001_62"      "YR_2001_63"     
[67] "YR_2001_64"      "YR_2001_65"      "YR_2001_66"     
[70] "YR_2001_67"      "YR_2001_68"      "YR_2001_69"     
[73] "YR_2001_70"      "YR_2001_71"      "YR_2001_72"     
[76] "YR_2001_73"      "YR_2001_74"      "YR_2001_75_79"  
[79] "YR_2001_80_84"   "YR_2001_85_89"   "YR_2001_90_94"  
[82] "YR_2001_95_99"   "YR_2001_100plus"

You can now apply these to the columns of POP_2001

# Add the new column names
colnames(POP_2001) <- new_column_names

This will have created a data frame containing the 2001 population data that will look like…

# Show first 6 rows and 10 columns
POP_2001[1:6, 1:10]
          LSOA YR_2001_TotPop YR_2001_Less1 YR_2001_1 YR_2001_2 YR_2001_3
1746 E01006511           1026             7        10         8         4
1747 E01006512           1454             8        11        10         9
1748 E01006513           2077             4         5         6         3
1749 E01006514           1432             8         7        10         7
1750 E01006515           1156             6        15        12        12
1751 E01006516           1412            12        10        12         8
     YR_2001_4 YR_2001_5 YR_2001_6 YR_2001_7
1746         9         5         4         7
1747        10         8        13         7
1748         3         6         0         5
1749         8         6         6         5
1750        11        14        10         8
1751         7         7         4         4

2010 Population Estimates

The ONS produce annual population estimates from survey and administrate data. For more information on the methodology used in their calculation and to download the data visit the website: http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-230902. Download and unzip the LSOA files for 2010 into you working directory.

Although it is also possible to read an Excel file into R directly, this method is a little tricky to implement under windows (however very easy in Apple OSX or Linux). As such, we will open the file in Excel, manipulate it, save as a CSV, and then imported using the previously used method.

  1. Go to the sheet “Mid-2010 Persons” and delete column B (i.e. LSOA Name) and rows 1,2 and 3
  2. Rename the columns:
  3. Highlight all the numeric values on the spreadsheet, right click and from the menu select “Format Cells…”. On the dialog box that opens, click on the “Number” tab, then specify “Category” as “Number”. Use the “Decimal places:” control to reduce the number to 0. Also ensure that the box which states “Use 1000 separator (,)” is NOT ticked. Then click “OK”. This changes the formatting of the numbers and specifically removes the commas which will cause problems when you later import the exported file into R.

  4. Save the spreadsheet as a CSV file by pressing the round “Office” button on the top left hand side of Excel and click “Save As” from the menu. In the dialog box navigate to your working directory. Click the “Save as type:” drop down box and choose “CSV (MS-DOS) (*.csv)”. Specify the file name as “pop10.csv” and click the “Save” button. You can now read this into R.

POP_2010 <- read.csv("pop10.csv", header = TRUE)

Some more manipulation…

Later on we are going to merge the 2001 and 2011 data together into a new data frame. However, before doing this, we first need to create a new version of the 2001 data with the same age bands as the 2011 data. You need to create new columns in the POP_2001 data for: YR_2001_0_15 YR_2001_16_29 YR_2001_30_44 YR_2001_45_64 YR_2001_65plus - the numbers refer to the years and are the same ranges featured in the 2010 data. There are a number of ways to achieve this.

# Add manually
POP_2001$YR_2001_0_15 <- POP_2001$YR_2001_Less1 + POP_2001$YR_2001_1 + POP_2001$YR_2001_2 + 
    POP_2001$YR_2001_3 + POP_2001$YR_2001_4 + POP_2001$YR_2001_5 + POP_2001$YR_2001_6 + 
    POP_2001$YR_2001_7 + POP_2001$YR_2001_8 + POP_2001$YR_2001_9 + POP_2001$YR_2001_10 + 
    POP_2001$YR_2001_11 + POP_2001$YR_2001_12 + POP_2001$YR_2001_13 + POP_2001$YR_2001_14 + 
    POP_2001$YR_2001_15
# Add as a range
POP_2001$YR_2001_16_29 <- rowSums(POP_2001[, 19:32])

The above code has created the aggregations for YR_2001_0_15 and YR_2001_16_29. You now need to add the following new columns to POP_2001 data frame and with the appropriate age group aggregations:

Next, create a subset of the 2001 data called POP_2001_V2 that contains the LSOA code, the total population and the newly created variables (YR_2001_0_15 YR_2001_16_29 YR_2001_30_44 YR_2001_45_64 and YR_2001_65plus)

The top 6 rows of your new data frame should look like:

head(POP_2001_V2)
          LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1746 E01006511           1026          120           269           245
1747 E01006512           1454          124           930           244
1748 E01006513           2077           87          1685           149
1749 E01006514           1432          105           678           358
1750 E01006515           1156          212           253           228
1751 E01006516           1412          103           597           386
     YR_2001_45_64 YR_2001_65plus
1746           246            146
1747           116             40
1748            95             61
1749           188            103
1750           252            211
1751           216            110

Joining the 2001 and 2010 data and calculating rates of change.

You will now have two population data frames; the first for 2001 which contains all LSOA in Liverpool (291 rows), and a second which contains all LSOA in England (34,378 rows). You might want to check if your data frame objects have the same number of rows at this stage.

nrow(POP_2010)
[1] 34378
nrow(POP_2001_V2)
[1] 291

You now need to merge (see your earlier code!!!) the 2010 data with the 2001 data so that the new data frame object that you create will just contain the records for Liverpool (ie. 291). Call this new data frame POP_2001_2010_Liverpool.

If you have done this correctly, your new data frame should have 291 rows and the top six rows should look like…

# Number of rows
nrow(POP_2001_2010_Liverpool)
[1] 291
# Show top 6 records
head(POP_2001_2010_Liverpool)
       LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511           1026          120           269           245
2 E01006512           1454          124           930           244
3 E01006513           2077           87          1685           149
4 E01006514           1432          105           678           358
5 E01006515           1156          212           253           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1           246            146           2332          187          1072
2           116             40           2358          191          1607
3            95             61           2630           71          2187
4           188            103           1269          123           467
5           252            211           1058          142           307
6           216            110           1385           85           580
  YR_2010_30_44 YR_2010_45_64 YR_2010_65plus
1           579           332            162
2           333           193             34
3           202            97             73
4           280           278            121
5           232           224            153
6           342           282             96

You will now calculate both absolute and percentage change statistics between 2001 and 2010 for the different age bands. The following example illustrates this for the total population, however, you will need to complete this for each age band. Create new variables in the format shown;

# Total Population (TC - Total Change ; PC - Percent Change)
POP_2001_2010_Liverpool$TotPop_TC <- POP_2001_2010_Liverpool$YR_2010_TotPop - 
    POP_2001_2010_Liverpool$YR_2001_TotPop
POP_2001_2010_Liverpool$TotPop_PC <- (POP_2001_2010_Liverpool$YR_2010_TotPop - 
    POP_2001_2010_Liverpool$YR_2001_TotPop)/POP_2001_2010_Liverpool$YR_2001_TotPop * 
    100

# 0-15 Years (TC - Total Change ; PC - Percent Change)
POP_2001_2010_Liverpool$YR_0_15_TC <- POP_2001_2010_Liverpool$YR_2010_0_15 - 
    POP_2001_2010_Liverpool$YR_2001_0_15
POP_2001_2010_Liverpool$YR_0_15_PC <- (POP_2001_2010_Liverpool$YR_2010_0_15 - 
    POP_2001_2010_Liverpool$YR_2001_0_15)/POP_2001_2010_Liverpool$YR_2001_0_15 * 
    100

You now need to complete this for the other age bands, leaving you with a final data frame that looks like:

head(POP_2001_2010_Liverpool)
       LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511           1026          120           269           245
2 E01006512           1454          124           930           244
3 E01006513           2077           87          1685           149
4 E01006514           1432          105           678           358
5 E01006515           1156          212           253           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1           246            146           2332          187          1072
2           116             40           2358          191          1607
3            95             61           2630           71          2187
4           188            103           1269          123           467
5           252            211           1058          142           307
6           216            110           1385           85           580
  YR_2010_30_44 YR_2010_45_64 YR_2010_65plus TotPop_TC TotPop_PC
1           579           332            162      1306   127.290
2           333           193             34       904    62.173
3           202            97             73       553    26.625
4           280           278            121      -163   -11.383
5           232           224            153       -98    -8.478
6           342           282             96       -27    -1.912
  YR_0_15_TC YR_0_15_PC YR_16_29_TC YR_16_29_PC YR_30_44_TC YR_30_44_PC
1         67      55.83         803     298.513         334     136.327
2         67      54.03         677      72.796          89      36.475
3        -16     -18.39         502      29.792          53      35.570
4         18      17.14        -211     -31.121         -78     -21.788
5        -70     -33.02          54      21.344           4       1.754
6        -18     -17.48         -17      -2.848         -44     -11.399
  YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1          86      34.959           16        10.96
2          77      66.379           -6       -15.00
3           2       2.105           12        19.67
4          90      47.872           18        17.48
5         -28     -11.111          -58       -27.49
6          66      30.556          -14       -12.73

5 Preparation for Making Maps in R

The CT_2001_2011_Liverpool and POP_2001_2010_Liverpool data frames will now be visualized on a choropleth map so you can explore areas of change. In order to do this, you will first need to read some polygon files into R.

Downloading appropriate shapefiles

  1. Visit the website http://edina.ac.uk/ukborders/ where you will need to enter you University of Liverpool username and password. Click the “Boundary Data Selector” link. Select “England”, “Census boundaries” and “post 1999” from the menus and then click the find button.
  2. From the generated list click “English Super Output Areas (Lower Layer),2001 [within Counties]”; then on the sub list select “Merseyside” and click the expand button. On the new list select “Liverpool[Merseyside]” and click the “Extract Boundary Data” button.
  3. On the following page, download the zip file, unzip this, and place the contents in a new directory inside your working directory called “LSOA”

When we talk about Shapefiles, we are actually referring to a number of files - each of these files has a different purpose, e.g. holding the geometry, storing the data or specifying the projection.

Getting the Shapefiles into R

Up until this point, all the functions you have been using are contained in the base R installation - i.e. they are present by default when you load R. However, one of the really great things about R is that these functions can be expanded by loading new packages that are created by the open source community. These packages add lots of exiting new functions to R.

We are going to now load “maptools” and “rgeos” which provides functions for reading and manipulating GIS data. If you remember from the last practical, the additional parameters to the standard install.packages() and library() relate to the packages being installed on a managed system where you do not have write access privileges to the main R library which is stored on the hard disk.

# Choose the CRAN mirror where the packages are going to be downloaded
# from
options(repos = c(CRAN = "http://www.stats.bris.ac.uk/R/"))
# Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())
# Download / install and then load maptools
install.packages("maptools", depend = TRUE, lib = getwd())
library(maptools, lib.loc = getwd())
# Download / install and then load rgeos
install.packages("rgeos", depend = TRUE, lib = getwd())
install.packages("stringr", depend = TRUE, lib = getwd())
library("stringr", lib.loc = getwd())
library("rgeos", lib.loc = getwd())

We will now read in the LSOA data into a new type of object called a SpatialPolygonsDataFrame by supplying it with the location of the downloaded shapefiles.

LSOA <- readShapePoly("LSOA/england_low_soa_2001.shp")

We can see what a SpatialPolygonsDataFrame looks like using a simple plot() function. You might notice that the polygons nearest the River Mersey are quite large. The reason for this is that the official boundaries extend into the water - however, aesthetically, these are not that appealing. We can however tidy these up before we start mapping.

# A basic plot of the OA SpatialPolygonsDataFrame
plot(LSOA)

plot of chunk unnamed-chunk-56

As such: 1. Visit the UK Borders website again and download the shapefiles for the “English Outline 2011” which can be found in the Easy Download section. Choose the “Clipped and Generalized” shapefile version. 2. Read this into R as a new SpatialPolygonsDataFrame called “outline”

# Read in outline as a SpatialPolygonsDataFrame
outline <- readShapePoly("Outline/England_ol_2011_gen_clipped.shp")
# This looks like
plot(outline)

plot of chunk unnamed-chunk-57

By default, when you call the plot() function, this overwrites the previous plot. However, there is a way that you can specify plots to be cumulative - i.e. add as a new layer. We do this in the following example to add the newly downloaded coastal outline on top of the LSOA by including add=TRUE. We have also made the border a red colour (border=“red”), and the fill colour (col=“#2C7FB820”) a shade of blue. These represent two ways of specifying colours. The second contains eight alphanumerics, the first six relate to a HEX colour code. To view various colours which can be used in R, have a look here. The final two characters are the level of transparency (in this case 20%).

# Plot the LSOA Map
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")

plot of chunk unnamed-chunk-58

You will also note on the above plot that the full extent of the English coastal outline is not shown. This is because the plot window is initiated with the extent of the first plot (i.e. Liverpool). You could reverse the above commands (remembering to remove add=TRUE if outline is the first plot) to illustrate this effect.

SpatialPolygonsDataFrame objects are a little more complex than the standard data frame, and comprise of a number of different elements referred to as “slots”. We can view the slots of an object using the slotNames() function which which will print a list of all the available slots.

slotNames(LSOA)
[1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"

The only slot we really need to worry about for this exercise is “data” which contains all the attribute data for each of the zones (i.e. the attributes contained in the DBF of the shapefile). We access this using an @ symbol, and this can then be treated in very similar way to a standard data frame. For example, we can look at the first 6 rows in the LSOA data slot.

head(LSOA@data)
   gid geonorth popnorth geoeast           name         label popeast
0  458   381722   383625  342348 Liverpool 058B 04BYE01006739  342928
1 4350   384390   385002  338967 Liverpool 056D 04BYE01006687  339051
2 4351   383094   383388  343364 Liverpool 058C 04BYE01006741  343305
3 4352   384125   384509  341524 Liverpool 058D 04BYE01006743  340956
4 4370   385573   386586  337323 Liverpool 052C 04BYE01006528  337495
5 4371   385004   385692  338248 Liverpool 056B 04BYE01006684  338794
   zonecode
0 E01006739
1 E01006687
2 E01006741
3 E01006743
4 E01006528
5 E01006684

We will now use the gIntersection() function from the rgeos package to remove the elements of the LSOA polygons that overlap the River Mersey - this will take quite a bit of time…

# First we create a new character vector of the area identifiers - these
# are assigned to a new object called my_area_id
my_area_id <- as.character(LSOA@data$zonecode)
# Create the clipped polygon with each area linked to an area identifier
# from the list you just created. This removes the data slot from the
# SpatialPolygonsDataFrame, hence why we need to specify the area IDs
# using id=. LSOA becomes a SpatialPolygons object - however, it contains
# an ID slot with the values of my_area_id
LSOA <- gIntersection(LSOA, outline, byid = TRUE, id = my_area_id)
# Get the ID values from the LSOA spatial polygon
data_IDs <- getSpPPolygonsIDSlots(LSOA)
# Convert into a dataframe
data_IDs <- as.data.frame(data_IDs)
# We also need to set the row names to be the same as the area ids - this
# is required to create a SpatialPolygonsDataFrame
rownames(data_IDs) <- data_IDs$data_IDs
# Join the ID values back onto the spatial polygon
LSOA <- SpatialPolygonsDataFrame(LSOA, data_IDs)
# Rename LSOA codes
colnames(LSOA@data) <- "zonecode"

Next we will join the council tax data onto the the LSOA spatial polygons data frame. This has to be handled in a slightly different way to how you used the merge statement earlier.

Because we know that both the council tax data frame, and the spatial data frame have the same number of zones (i.e. were both limited to just Liverpool) we can sort them alphabetically by their area ID, thus enabling us to just line the data up side by side.

# Order the LSOA spatial polygon data frame
LSOA <- LSOA[order(LSOA$zonecode), ]
# Order the CT_2001_2011_Liverpool data frame
CT_2001_2011_Liverpool <- CT_2001_2011_Liverpool[order(CT_2001_2011_Liverpool$LSOA_CODE), 
    ]

If you then look at the first few rows of the above two objects, you will see by looking at their area identifiers that they are now in the same order.

head(LSOA@data)
           zonecode
E01006511 E01006511
E01006512 E01006512
E01006513 E01006513
E01006514 E01006514
E01006515 E01006515
E01006516 E01006516
head(CT_2001_2011_Liverpool)
  LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count CTAX01_BandA_Pct
1 E01006511                540                232            42.96
2 E01006512                627                524            83.57
3 E01006513                579                264            45.60
4 E01006514                972                657            67.59
5 E01006515                826                761            92.13
6 E01006516               1212               1092            90.10
  CTAX01_BandB_Count CTAX01_BandB_Pct CTAX01_BandC_Count CTAX01_BandC_Pct
1                 58            10.74                 22             4.07
2                 42             6.70                 11             1.75
3                227            39.21                 31             5.35
4                197            20.27                 86             8.85
5                 64             7.75                  1             0.12
6                 75             6.19                 42             3.47
  CTAX01_BandD_Count CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
1                 61            11.30                125            23.15
2                 49             7.81                  1             0.16
3                 12             2.07                 39             6.74
4                 19             1.95                 10             1.03
5                  0             0.00                  0             0.00
6                  2             0.17                  0             0.00
  CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count CTAX01_BandG_Pct
1                 16             2.96                 24             4.44
2                  0             0.00                  0             0.00
3                  4             0.69                  1             0.17
4                  1             0.10                  2             0.21
5                  0             0.00                  0             0.00
6                  1             0.08                  0             0.00
  CTAX01_BandH_Count CTAX01_BandH_Pct CTAX11_Total_Count
1                  0             0.00               2671
2                  0             0.00                792
3                  1             0.17               1178
4                  0             0.00               1014
5                  0             0.00                673
6                  0             0.00               1237
  CTAX11_BandA_Count CTAX11_BandA_Pct CTAX11_BandB_Count CTAX11_BandB_Pct
1                469            17.56                760            28.45
2                665            83.96                 76             9.60
3                344            29.20                429            36.42
4                624            61.54                212            20.91
5                578            85.88                 83            12.33
6               1020            82.46                127            10.27
  CTAX11_BandC_Count CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
1               1020            38.19                233             8.72
2                  1             0.13                 49             6.19
3                320            27.16                 36             3.06
4                133            13.12                 28             2.76
5                 12             1.78                  0             0.00
6                 83             6.71                  4             0.32
  CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count CTAX11_BandF_Pct
1                143             5.35                 21             0.79
2                  1             0.13                  0             0.00
3                 46             3.90                  1             0.08
4                 14             1.38                  0             0.00
5                  0             0.00                  0             0.00
6                  3             0.24                  0             0.00
  CTAX11_BandG_Count CTAX11_BandG_Pct CTAX11_BandH_Count CTAX11_BandH_Pct
1                 25             0.94                  0             0.00
2                  0             0.00                  0             0.00
3                  1             0.08                  1             0.08
4                  2             0.20                  1             0.10
5                  0             0.00                  0             0.00
6                  0             0.00                  0             0.00
  Ch_Total Ch_BandA Ch_BandB Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG
1     2131      237      702      998      172       18        5        1
2      165      141       34      -10        0        0        0        0
3      599       80      202      289       24        7       -3        0
4       42      -33       15       47        9        4       -1        0
5     -153     -183       19       11        0        0        0        0
6       25      -72       52       41        2        3       -1        0
  Ch_BandH
1        0
2        0
3        0
4        1
5        0
6        0

You now need to re-order the POP_2001_2010_Liverpool objects. If this has been done correctly, it should look like…

head(POP_2001_2010_Liverpool)
       LSOA YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
1 E01006511           1026          120           269           245
2 E01006512           1454          124           930           244
3 E01006513           2077           87          1685           149
4 E01006514           1432          105           678           358
5 E01006515           1156          212           253           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15 YR_2010_16_29
1           246            146           2332          187          1072
2           116             40           2358          191          1607
3            95             61           2630           71          2187
4           188            103           1269          123           467
5           252            211           1058          142           307
6           216            110           1385           85           580
  YR_2010_30_44 YR_2010_45_64 YR_2010_65plus TotPop_TC TotPop_PC
1           579           332            162      1306   127.290
2           333           193             34       904    62.173
3           202            97             73       553    26.625
4           280           278            121      -163   -11.383
5           232           224            153       -98    -8.478
6           342           282             96       -27    -1.912
  YR_0_15_TC YR_0_15_PC YR_16_29_TC YR_16_29_PC YR_30_44_TC YR_30_44_PC
1         67      55.83         803     298.513         334     136.327
2         67      54.03         677      72.796          89      36.475
3        -16     -18.39         502      29.792          53      35.570
4         18      17.14        -211     -31.121         -78     -21.788
5        -70     -33.02          54      21.344           4       1.754
6        -18     -17.48         -17      -2.848         -44     -11.399
  YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1          86      34.959           16        10.96
2          77      66.379           -6       -15.00
3           2       2.105           12        19.67
4          90      47.872           18        17.48
5         -28     -11.111          -58       -27.49
6          66      30.556          -14       -12.73

We will now “column bind” the council tax and population data onto the data slot of the LSOA SpatialPolygonsDataFrame. You can do this using the cbind() function. The data objects are separated by commas. These are assigned (over written) onto the data slot of the SpatialPolygonsDataFrame. If you look at LSOA@data after running this code, you will see that the area codes are now lined up, and that the council tax and population data are present.

LSOA@data <- cbind(LSOA@data, CT_2001_2011_Liverpool, POP_2001_2010_Liverpool)

If this has been completed correctly, the LSOA@data should look like…

head(LSOA@data)
           zonecode LSOA_CODE CTAX01_Total_Count CTAX01_BandA_Count
E01006511 E01006511 E01006511                540                232
E01006512 E01006512 E01006512                627                524
E01006513 E01006513 E01006513                579                264
E01006514 E01006514 E01006514                972                657
E01006515 E01006515 E01006515                826                761
E01006516 E01006516 E01006516               1212               1092
          CTAX01_BandA_Pct CTAX01_BandB_Count CTAX01_BandB_Pct
E01006511            42.96                 58            10.74
E01006512            83.57                 42             6.70
E01006513            45.60                227            39.21
E01006514            67.59                197            20.27
E01006515            92.13                 64             7.75
E01006516            90.10                 75             6.19
          CTAX01_BandC_Count CTAX01_BandC_Pct CTAX01_BandD_Count
E01006511                 22             4.07                 61
E01006512                 11             1.75                 49
E01006513                 31             5.35                 12
E01006514                 86             8.85                 19
E01006515                  1             0.12                  0
E01006516                 42             3.47                  2
          CTAX01_BandD_Pct CTAX01_BandE_Count CTAX01_BandE_Pct
E01006511            11.30                125            23.15
E01006512             7.81                  1             0.16
E01006513             2.07                 39             6.74
E01006514             1.95                 10             1.03
E01006515             0.00                  0             0.00
E01006516             0.17                  0             0.00
          CTAX01_BandF_Count CTAX01_BandF_Pct CTAX01_BandG_Count
E01006511                 16             2.96                 24
E01006512                  0             0.00                  0
E01006513                  4             0.69                  1
E01006514                  1             0.10                  2
E01006515                  0             0.00                  0
E01006516                  1             0.08                  0
          CTAX01_BandG_Pct CTAX01_BandH_Count CTAX01_BandH_Pct
E01006511             4.44                  0             0.00
E01006512             0.00                  0             0.00
E01006513             0.17                  1             0.17
E01006514             0.21                  0             0.00
E01006515             0.00                  0             0.00
E01006516             0.00                  0             0.00
          CTAX11_Total_Count CTAX11_BandA_Count CTAX11_BandA_Pct
E01006511               2671                469            17.56
E01006512                792                665            83.96
E01006513               1178                344            29.20
E01006514               1014                624            61.54
E01006515                673                578            85.88
E01006516               1237               1020            82.46
          CTAX11_BandB_Count CTAX11_BandB_Pct CTAX11_BandC_Count
E01006511                760            28.45               1020
E01006512                 76             9.60                  1
E01006513                429            36.42                320
E01006514                212            20.91                133
E01006515                 83            12.33                 12
E01006516                127            10.27                 83
          CTAX11_BandC_Pct CTAX11_BandD_Count CTAX11_BandD_Pct
E01006511            38.19                233             8.72
E01006512             0.13                 49             6.19
E01006513            27.16                 36             3.06
E01006514            13.12                 28             2.76
E01006515             1.78                  0             0.00
E01006516             6.71                  4             0.32
          CTAX11_BandE_Count CTAX11_BandE_Pct CTAX11_BandF_Count
E01006511                143             5.35                 21
E01006512                  1             0.13                  0
E01006513                 46             3.90                  1
E01006514                 14             1.38                  0
E01006515                  0             0.00                  0
E01006516                  3             0.24                  0
          CTAX11_BandF_Pct CTAX11_BandG_Count CTAX11_BandG_Pct
E01006511             0.79                 25             0.94
E01006512             0.00                  0             0.00
E01006513             0.08                  1             0.08
E01006514             0.00                  2             0.20
E01006515             0.00                  0             0.00
E01006516             0.00                  0             0.00
          CTAX11_BandH_Count CTAX11_BandH_Pct Ch_Total Ch_BandA Ch_BandB
E01006511                  0             0.00     2131      237      702
E01006512                  0             0.00      165      141       34
E01006513                  1             0.08      599       80      202
E01006514                  1             0.10       42      -33       15
E01006515                  0             0.00     -153     -183       19
E01006516                  0             0.00       25      -72       52
          Ch_BandC Ch_BandD Ch_BandE Ch_BandF Ch_BandG Ch_BandH      LSOA
E01006511      998      172       18        5        1        0 E01006511
E01006512      -10        0        0        0        0        0 E01006512
E01006513      289       24        7       -3        0        0 E01006513
E01006514       47        9        4       -1        0        1 E01006514
E01006515       11        0        0        0        0        0 E01006515
E01006516       41        2        3       -1        0        0 E01006516
          YR_2001_TotPop YR_2001_0_15 YR_2001_16_29 YR_2001_30_44
E01006511           1026          120           269           245
E01006512           1454          124           930           244
E01006513           2077           87          1685           149
E01006514           1432          105           678           358
E01006515           1156          212           253           228
E01006516           1412          103           597           386
          YR_2001_45_64 YR_2001_65plus YR_2010_TotPop YR_2010_0_15
E01006511           246            146           2332          187
E01006512           116             40           2358          191
E01006513            95             61           2630           71
E01006514           188            103           1269          123
E01006515           252            211           1058          142
E01006516           216            110           1385           85
          YR_2010_16_29 YR_2010_30_44 YR_2010_45_64 YR_2010_65plus
E01006511          1072           579           332            162
E01006512          1607           333           193             34
E01006513          2187           202            97             73
E01006514           467           280           278            121
E01006515           307           232           224            153
E01006516           580           342           282             96
          TotPop_TC TotPop_PC YR_0_15_TC YR_0_15_PC YR_16_29_TC
E01006511      1306   127.290         67      55.83         803
E01006512       904    62.173         67      54.03         677
E01006513       553    26.625        -16     -18.39         502
E01006514      -163   -11.383         18      17.14        -211
E01006515       -98    -8.478        -70     -33.02          54
E01006516       -27    -1.912        -18     -17.48         -17
          YR_16_29_PC YR_30_44_TC YR_30_44_PC YR_45_64_TC YR_45_64_PC
E01006511     298.513         334     136.327          86      34.959
E01006512      72.796          89      36.475          77      66.379
E01006513      29.792          53      35.570           2       2.105
E01006514     -31.121         -78     -21.788          90      47.872
E01006515      21.344           4       1.754         -28     -11.111
E01006516      -2.848         -44     -11.399          66      30.556
          YR_65plus_TC YR_65plus_PC
E01006511           16        10.96
E01006512           -6       -15.00
E01006513           12        19.67
E01006514           18        17.48
E01006515          -58       -27.49
E01006516          -14       -12.73

6 Making Maps in R

OK, the data is now ready so we are now able to make some maps! There are many different ways of doing this, however, we will use one of the simplest, and outline each step as we go along. In this example, we will make a map of the change in total population.

We need to install or load some more packages containing functions that will help us make some maps.

# Download / install and then load classInt
install.packages("classInt", depend = TRUE, lib = getwd())
library("classInt", lib.loc = getwd())
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())

The first step is to find suitable breakpoints for the data contained in the TotPop_PC column. The contiuous data needs to be assigned into categories so different colours can be applied on a choropleth map. There are numerous ways of doing this such as jenks, standard deviations or equal intervals. In this example we use function classIntervals from the classInt package to find the ranges needed to divide the the TotPop_PC into six categories.

# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(LSOA@data$TotPop_PC, n = 6, style = "fisher")

# classIntervals returns an object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks

We will now use another package called RColorBrewer which provides a series of colour pallets that are suitable for mapping. You can have a look at the colour pallets online: http://colorbrewer2.org. Each of these pallets are named; and you can see all the available pallets as follows…

display.brewer.all()

plot of chunk unnamed-chunk-70

We will now select a set of six colours from one of these pallets into a new list object called my_colours. If you print this object, you will see it contains six HEX codes for colours within the selected pallet.

# Select six colours from the pallet YlOrRd
my_colours <- brewer.pal(6, "YlOrRd")
# Print the colour codes to the terminal
my_colours
[1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"

Another function called findInterval() can be used with the “breaks” object created earlier to divide the TotPop_PC column into aggregate groups. To view the intervals, just print the breaks object…

breaks
[1] -79.007 -44.874 -12.964   4.688  30.600  73.195 127.698

Before we use these breaks to create a chropleth map, let us first have a look to see what the findInterval() function actually does.

The following code returns a list of numbers. Each number represents a row in the LSOS@data TotPop_PC column - i.e. there will be the same number of values as there are rows in LSOA@data.

The function has examined the values of the TotPop_PC column against the break points that you have requested (stored in the “breaks” object); and returned these as a series of values (1-6) - i.e. break point 1 gets the number 1 etc. The all.inside=TRUE is an optional parameter to ensure that all values are assessed correctly.

# print the intervals
findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)
  [1] 6 5 4 3 3 3 5 4 3 3 3 3 3 4 3 3 3 4 3 3 3 3 3 3 3 3 5 3 3 3 3 2 3 3 3
 [36] 3 3 3 4 4 4 4 4 3 3 4 3 2 1 3 1 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 4 3 3
 [71] 3 3 4 3 3 4 4 3 3 3 3 3 3 3 3 3 3 2 2 3 2 1 3 3 3 3 4 3 3 3 3 3 3 3 3
[106] 3 3 3 3 3 3 2 3 3 3 3 3 4 3 3 3 3 3 5 3 3 4 2 3 4 5 2 3 4 3 4 3 4 6 6
[141] 4 3 3 4 4 3 3 3 4 3 3 4 2 4 4 3 3 4 4 3 3 4 4 4 3 4 2 4 4 3 3 3 3 5 3
[176] 3 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 3 3 2 3 3 3 4 3 3 3 3 3 3 3 4 3 3 3 5
[211] 4 3 3 2 3 4 2 3 4 3 3 3 4 3 3 3 3 3 3 2 3 3 5 3 4 3 6 3 1 2 6 2 3 3 3
[246] 4 3 3 2 4 3 4 3 3 3 3 4 3 3 2 3 3 3 3 3 4 2 3 3 3 4 3 3 3 4 4 3 4 3 3
[281] 3 3 4 3 3 3 3 3 4 2 3

We can combine this list of 1-6 numbers to select specific colours from the the my_colours object you created earlier.

By using the square brackets we can use the output of the findInterval() function to select items from the my_colours list. E.g., a number 1 would select the first colour, a 2 the second colour etc.

# Returns a list of colours
my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)]
  [1] "#BD0026" "#F03B20" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
  [8] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
 [15] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
 [22] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
 [29] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
 [36] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FD8D3C"
 [43] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FFFFB2"
 [50] "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [57] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [64] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
 [71] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C"
 [78] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [85] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FED976" "#FEB24C" "#FED976"
 [92] "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
 [99] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[106] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976"
[113] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[120] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C" "#FEB24C"
[127] "#FD8D3C" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#FED976" "#FEB24C"
[134] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#BD0026" "#BD0026"
[141] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[148] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[155] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[162] "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[169] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
[176] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[183] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[190] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C"
[197] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[204] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[211] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FD8D3C" "#FED976"
[218] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[225] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C"
[232] "#FEB24C" "#F03B20" "#FEB24C" "#FD8D3C" "#FEB24C" "#BD0026" "#FEB24C"
[239] "#FFFFB2" "#FED976" "#BD0026" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[246] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FD8D3C" "#FEB24C" "#FD8D3C"
[253] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[260] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[267] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[274] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[281] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[288] "#FEB24C" "#FD8D3C" "#FED976" "#FEB24C"

We can then pull all this together to create a basic chropleth map using the plot() function. The first attribute is the SpatialPolygonsDataFrame, followed by col= which is how we want to colour the polygons. If we specified a single colour, e.g. “#FEB24C”, all the polygons would be the same colour.

However, because we use the findInterval() function to generate a list of colours, each polygon is coloured by an appropriate colour associated with area's TotPop_PC value.

The other attributes in the plot() function are axes=FALSE to remove axis from the plot, and border = NA to remove borders from around the polygons.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)

plot of chunk unnamed-chunk-75

You can see what the axes and border function do if you run the following code, setting the axes to TRUE, and the border to a grey colour (#6B6B6B).

# Create a basic choropleth map with borders and axis
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = TRUE, border = "#6B6B6B")

plot of chunk unnamed-chunk-76

You see that your previous plot has been replaced by a new plot with the LSOA borders coloured in grey, and the axis now visible - the scale on the axis are meters and relate to the projection of the input shapefile (OSGB - British National Grid).

As noted earlier, we can also plot extra data on top of the original chropleth. This is useful for adding additional context to maps.

Visit the UK Borders website again and use the boundary data selector to download the English Census Area Statistic Wards for Liverpool. If you put these into a new folder in your working directory called CASWARD, we will quickly run through the steps to prepare these data by clipping them to the coastline (i.e. so they match the LSOA).

# Read in the CASWARD Data
CASWARD <- readShapePoly("CASWARD/england_caswa_2001.shp")
# Get a set of ids for each of the polygons
my_area_id <- as.character(CASWARD@data$name)
# Clip the CASWARD to the coast outline
CASWARD <- gIntersection(CASWARD, outline, byid = TRUE, id = my_area_id)
# Get the ID values from the CASWARD spatial polygon
data_IDs <- getSpPPolygonsIDSlots(CASWARD)
# Convert into a dataframe
data_IDs <- as.data.frame(data_IDs)
# Set the row names to be the same as the area ids
rownames(data_IDs) <- data_IDs$data_IDs
# Join the ID values back onto the spatial polygon
CASWARD <- SpatialPolygonsDataFrame(CASWARD, data_IDs)
# Rename LSOA codes
colnames(CASWARD@data) <- "wards"

We can now plot the basic choropleth again, however, this time we will add a second layer showing the ward outlines. In the second plot, we simply specify a grey border colour, and add another parameter called add=TRUE. This tells the plot window to keep the previous result, and add the new plot on top.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)

plot of chunk unnamed-chunk-78

We now have a pretty convincing map. However, there are a few things missing. The first of these are labels for the ward zones. We can use the coordinates() function to extract the central location of each ward polygon, and use this with the names of the ward which are stored in the data slot.

# Print the top six results of using coordinates()
head(coordinates(CASWARD))
              [,1]   [,2]
St. Mary's  342161 383312
Dingle      336113 387301
Grassendale 339428 385775
Allerton    341695 385904
Broadgreen  340738 391460
Gillmoss    340764 395445

We can use the text() function to add these text onto the map. The first parameter are the x coordinates, the second the y, and the third the text we want to use as the labels. The final cex parameter scales the text smaller.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards, 
    cex = 0.7)

plot of chunk unnamed-chunk-80

The final items we need to add onto the map are a scale bar, a north arrow and a legend. A legend is added using the legend() function - the parameters used include the xy location, legend= - which are the text labels to use on the legend (leglabs and round are used to make these display correctly), fill= which is the object containing the fill colours you used to colour the LSOA choropleth, bty=“n” which turns off the border around the legend.

You might find you will need to adjust the location or size of the legend to get this to fit onto your plot correctly. For scaling, the cex= parameter will reduce or increase the size of the item. To find a new set of location coordinates, type locator() into the terminal and press enter. After doing this, when you hover over the plot, the mouse will turn into a cross. If you click, and then press escape, the location of the click is printed to the terminal - you can use these to re-position items in the plot.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards, 
    cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n")

plot of chunk unnamed-chunk-81

The sp package which you loaded earlier contains a number of other functions that enable you to add useful items to the map, including a north arrow and scale bar. There are also base R functions for adding text and titles.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards, 
    cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n")
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000, 
    fill = c("white", "black"), plot.grid = F)

# Add text to scale bar
text(335379, 381606, "0km", cex = 0.8)
text(335379 + 1500, 381606, "1.5km", cex = 0.8)
text(335379 + 3000, 381606, "3km", cex = 0.8)
# Add a title
title("Total Population Change (%)")

plot of chunk unnamed-chunk-82

7 Exporting Maps

You have been creating all your plots within R, however, when preparing your reports you will need to export these maps. In order to do this, we wrap the code to create a map with png('rplot.jpg') and then dev.off(). This creates an image called rplot.png in your working directory. This code tells R is to write the results to a jpg rather than the plot window - thus, you won't see the output until you view the jpg. Another thing to remember is that the positioning of the items in the jpg may be a little different to the plot window - so you may have to adjust these and re-plot the image. For example…

# Write out to a png...
png("rplot.png", width = 800, height = 800, units = "px")

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CASWARD, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CASWARD)[, 1], coordinates(CASWARD)[, 2], labels = CASWARD@data$wards, 
    cex = 0.8)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n", cex = 0.9)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000, 
    fill = c("white", "black"), plot.grid = F)

# Add text to scale bar
text(335379, 381606, "0km", cex = 0.7)
text(335379 + 1500, 381606, "1.5km", cex = 0.7)
text(335379 + 3000, 381606, "3km", cex = 0.7)
# Add a title
title("Total Population Change (%)")

# Change output back to the plot window
dev.off()

8 Your Assignment

Introduction

The submitted work for this practical is a short report alongside a series of maps that you will generate in R. The total word count for the report is 500 words, and it is expected that you will include at least 5 maps that you have generated in R. You should make reference to the wider literature in your report and cite appropriately - including a formatted reference list (the list is not included in the word count). If you include data tables in your report, these also do not contribute to the total word count. Please do not include any appendix.

You are free to layout your report as you wish, however all maps or other graphics must be appropriately referenced in the text – e.g. see Figure 1 etc.

Data

If you have followed the tutorials in this practical you will have now created a SpatialPolygonsDataFrame for the Liverpool local authority that contains the following variables:

  1. Estimated percentage and total population change between 2001 and 2010:
  2. Total Population
  3. 0-15 Years Old
  4. 16-29 Years Old
  5. 30-44 Years Old
  6. 45-64 Years Old
  7. 65+ Years Old
  8. Council Tax Band Changes between 2001 and 2011

Report Details and Structure

With reference to the evidence you create from the above data, further data sources that you integrate independently into R, and wider reading, write a report that will address the following questions:

  1. With reference to your population and council tax band maps, identify the area of Liverpool with the highest population increase; discussing those potential processes that may have influenced these changes.

  2. With reference to your population and council tax band maps, identify the area of Liverpool with the highest population decrease; discussing those potential processes that may have influenced these changes.

Through this report you should build a narrative using maps (or any other data visualization you deem appropriate) to hypothesis about the possible causes and impacts of the change that you observe to have occurred in your study area since the 2001 Census. Your marks will be linked to both the quality and clarity of your answers to the above questions and the visualizations you create.

Tips…(how to get a better mark!)

  1. Look at the GIS mark scheme to get the basics right – e.g. north arrow, legend, scale bar etc
  2. Do not print out the R code in your report
  3. Ensure that your written text is clear and that the English is grammatically correct
  4. Include other visualization - R is very flexible and the web is full of reproducible examples
  5. Think about what changes in council tax bands actually indicate

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.