Detecting Neighbourhood Change

1 Aims and Objectives

In this project you will compile a number of measures that can be used to assess the extent that small areas have changed in their composition over a 10 year period. You will explore data about changes in both built structures and demographics, and through the representations you create, formulate plausible hypotheses about why the patterns that you uncover may have occurred.

Through this work you will also learn a series of fundamental skills for the input, manipulation, analysis and display of spatial data using the statistical programming language R.

At this stage it might be worth looking at the end of this practical for the assignment questions as you can keep these in mind as you run through the practical.

2 Learning Outcomes

Experience turning spatial data into information
Extracting raw spatial data from: UKBorders and Neighbourhood Statistics
Develop R skills to: import; join; perform calculations and aggregate data.
Derive metrics that are useful for detecting neighbourhood changes to population
Develop GIS skills to: import; create attribute joins; interrogate data; build effective cartography.
Develop critical appraisal skills for the use of indicators and representations derived from estimate data

3 Some preliminaries on reading and manipulating data

Before we begin the practical, there are a couple of things to learn…

Firstly, what is a CSV file, and how can we get them into R? These files will usually have the file extension “.csv” and are typically opened in a spreadsheet software package.

Supposing we want to create a data set that contains a list of people who have come along to an R training workshop. This might look something like this:

Name	Age	Place
John	20	Liverpool
Helen	10	London
Mia	16	Liverpool
Carl	21	Manchester

Create yourself a project directory in your storage area, and then Open R. We need to tell R that this is where you will be storing all your files by running the setwd() function.

You need to paste the location of your working directory into the setwd() function between the “”. Please note, that on Windows you need to make sure the slashes between the directories are formatted as either / or .

# For example...
setwd("M:/R work/Neighbourhood_Change/")

Open Notepad and create a new text file, entering the following details as separate lines, with each column in the table separated by a comma, and each row on a new line (please note that the first row contains our desired headings):

Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester

Remember to add an extra blank line at the end of the file. Save the text file as “test.csv” in the folder you created earlier in your storage area.

Next we will read in the CSV you created above.

# First create a new variable with the location of your csv file - you only
# need the file name as by default R looks in the working directory
file_location <- "test.csv"
# Next we will read this into a new data frame called test.
test <- read.csv(file_location, header = TRUE)
# Now print test
test

   Name Age      Place
1  John  20  Liverpool
2 Helen  10     London
3   Mia  16  Liverpool
4  Carl  21 Manchester

Next, duplicate test.csv in your working directory, renaming this to test2.csv and opening it in Notepad. Add the following to the top of the file: “Here is some example text which we will use to demonstrate how to skip lines” and on another line: “Here is another line of text” so it looks as follows:

Here is some example text which we will use to demonstrate how to skip lines
Here is another line of text
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester

Save the file and close it.

Next we will import this new .csv into R

# Update the file_location object with the location of the new file
file_location <- "test2.csv"

We have to add an extra parameter to the read.csv function which specifies that the data we are interested in are located after line 2.

test2 <- read.csv(file_location, header = TRUE, skip = 2)

If you print the results of the object test2 (i.e. type test2 in to the terminal!) you will see this is now the same as test, because you skipped the two junk lines.

Next we will illustrate how we can create a subset of a data frame, add a column and calculate a new value. To create a subset of the test2 data frame, we can use the subset function - this can select certain rows, columns or a combination of the two.

For example…

# Select the name and age columns
subset(test, select = c("Name", "Age"))

   Name Age
1  John  20
2 Helen  10
3   Mia  16
4  Carl  21

# Select rows where the place is Liverpool
subset(test, Place == "Liverpool")

  Name Age     Place
1 John  20 Liverpool
3  Mia  16 Liverpool

# Select the name and age columns, and where Place is Liverpool
subset(test, Place == "Liverpool", select = c("Name", "Age"))

  Name Age
1 John  20
3  Mia  16

We will now repeat the last step, however store the results as a new object. You will see that nothing prints to the terminal.

# Select the name and age columns, however, only where Place is equal to
# Liverpool (in R we need to use a double equals sign)
test3 <- subset(test, Place == "Liverpool", select = c("Name", "Age"))

We can add a new column called “diff100” to the data frame object test3. We do this using the $ symbol. At first, we will make the values of the column NA (i.e. no value)

test3$diff100 <- NA

If you print test3, you will now see the new column which will contain the values NA. However, what we actually want to do is find out how different each of the ages are to 100 (i.e. 100 - age). In order to create this calculation we need to refer to the age column in the data frame. We can use the $ again to achieve this. For example…

test3$diff100 <- 100 - test3$Age
# print test3
test3

  Name Age diff100
1 John  20      80
3  Mia  16      84

When printing variables, you can use square brackets to specify either rows, columns or both. For example:

test3[1, 1]

[1] John
Levels: Carl Helen John Mia

test3[, 1]

[1] John Mia 
Levels: Carl Helen John Mia

Additionally, you can use colnames to give you a list of columns within a data frame:

colnames(test3)

[1] "Name"    "Age"     "diff100"

and you can use:

test3$diff100 <- NULL

to delete specific columns.

OK, using what you have learnt in this past section, you are now ready to play with some real data!

A couple of other important things! If you want to see the objects that you have created in the R workspace, you can use the ls() function

# List objects in the workspace
ls()

[1] "file_location" "test"          "test2"         "test3"

If you want to save your workspace - i.e. so you can open it up later on (like saving your work!); you can use the save.image function. This performs the same as clicking on File > Save Workspace.

# This will save all objects currently loaded in the workspace
save.image(file = "myfile.RData")

After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.

# Remember that you will need to set your working directory before trying to
# load - otherwise R will not know where to find you RData file
load("myfile.RData")

You can also load and save the workspace using the File menu at the top of the window.

One other command that may be useful to know is how to access the help files. If you want help on a specific function, type ?, then the function name:

?load

This will provide more information on the function. If you don't know what a function is called, type ??, then whatever it is that the function does:

??open

This will search the help files for the term you specified (open in this case).

4 Detecting Changes

Demographic Change

In this section we will examine the use of census data to generate statistics that enable you to detect and hypothesise about potential changes in neighbourhood structure occurring at small geographic scales. We will examine indicators of social change by comparing the 2001 Census data to population estimates derived by the Office for National Statistics.

2001 Census Data

The first stage is to download 2001 Census data. Although there are other sources of these data, in this example we will use the http://www.neighbourhood.statistics.gov.uk website.

Navigate to the site and select the “Topics” link on the bottom left hand side of the page. On the next screen, click on the “Census” link, and then click “2001 Census: Census Area Statistics”. Then click the radio button that corresponds to “Age, 2001 (UV04)” and press the “Next” button at the bottom right hand side of the page.

Then click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the 2001 Statistical Geography Hierarchy (North West). Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [*.csv]”. This will be zipped, and once you have extracted the directory you will find it contains CSV files for a variety of different geographies. You are interested in the Lower Super Output Area level file which will be named “UV040301_92_GeoPolicy_NW_LSOA.CSV”. Copy this to the working directory you defined earlier.

If you open this file up in Excel you will again see that there are a series of lines at the top which do not contain the data or headings for the columns. As such, you need to:

Read this CSV into R, skipping the unwanted lines and setting the column headings as the row containing “LSOA_CODE”. Call the new data frame pop_2001.
Subset the data so only those LSOA in Liverpool LA are present.
Remove all columns apart from LSOA_CODE and those which contain numbers (i.e. DATA_VALUE columns)

This should then look like (note, only the first 6 rows and 10 columns are shown here):

# Using the square brackets we can view specific parts of a data frame-
# data[rows,columns]
pop_2001[1:6, 1:10]

     LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511       1026            7           10            8
1747 E01006512       1454            8           11           10
1748 E01006513       2077            4            5            6
1749 E01006514       1432            8            7           10
1750 E01006515       1156            6           15           12
1751 E01006516       1412           12           10           12
     DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746            4            9            5            4            7
1747            9           10            8           13            7
1748            3            3            6            0            5
1749            7            8            6            6            5
1750           12           11           14           10            8
1751            8            7            7            4            4

You now need to rename the column headings of the data frame into something which are easier to interpret. There are obviously a lot of columns, and it would take a very long time to type out a list of vectors to use as the new names… However, there are some shortcuts. If you look at the source CSV file, you will see that the data columns represent: the total population, those under 1, then a series of columns for those aged 1 through to 74; and finally, those aged 75-79, 80-84, 85-89, 90-94, 95-99 and 100 or greater.

The main block of columns are the individual age bands; which we can create using the paste() function - this concatenates (or joins together) vectors (separated by a comma) together into characters. We also add sep="" to indicate that we do not want any spaces putting between the new concatenated elements of the list.

years <- paste("YR_2001_", 1:74, sep = "")

We now need to add the remaining elements to the list which we will do in two stages. One will be added before the elements of the “years” list (pre), and the other afterwards (post)

pre <- c("LSOA", "YR_2001_TotPop", "YR_2001_Less1")
post <- c("YR_2001_75_79", "YR_2001_80_84", "YR_2001_85_89", "YR_2001_90_94", 
    "YR_2001_95_99", "YR_2001_100plus")

# Print the lists to see what they look like
pre

[1] "LSOA"           "YR_2001_TotPop" "YR_2001_Less1"

years

 [1] "YR_2001_1"  "YR_2001_2"  "YR_2001_3"  "YR_2001_4"  "YR_2001_5" 
 [6] "YR_2001_6"  "YR_2001_7"  "YR_2001_8"  "YR_2001_9"  "YR_2001_10"
[11] "YR_2001_11" "YR_2001_12" "YR_2001_13" "YR_2001_14" "YR_2001_15"
[16] "YR_2001_16" "YR_2001_17" "YR_2001_18" "YR_2001_19" "YR_2001_20"
[21] "YR_2001_21" "YR_2001_22" "YR_2001_23" "YR_2001_24" "YR_2001_25"
[26] "YR_2001_26" "YR_2001_27" "YR_2001_28" "YR_2001_29" "YR_2001_30"
[31] "YR_2001_31" "YR_2001_32" "YR_2001_33" "YR_2001_34" "YR_2001_35"
[36] "YR_2001_36" "YR_2001_37" "YR_2001_38" "YR_2001_39" "YR_2001_40"
[41] "YR_2001_41" "YR_2001_42" "YR_2001_43" "YR_2001_44" "YR_2001_45"
[46] "YR_2001_46" "YR_2001_47" "YR_2001_48" "YR_2001_49" "YR_2001_50"
[51] "YR_2001_51" "YR_2001_52" "YR_2001_53" "YR_2001_54" "YR_2001_55"
[56] "YR_2001_56" "YR_2001_57" "YR_2001_58" "YR_2001_59" "YR_2001_60"
[61] "YR_2001_61" "YR_2001_62" "YR_2001_63" "YR_2001_64" "YR_2001_65"
[66] "YR_2001_66" "YR_2001_67" "YR_2001_68" "YR_2001_69" "YR_2001_70"
[71] "YR_2001_71" "YR_2001_72" "YR_2001_73" "YR_2001_74"

post

[1] "YR_2001_75_79"   "YR_2001_80_84"   "YR_2001_85_89"   "YR_2001_90_94"  
[5] "YR_2001_95_99"   "YR_2001_100plus"

The list objects that you have just created now need to be concatenated which can be achieved using the c() function.

# Concatenate the new column names
new_column_names <- c(pre, years, post)
# print the names
new_column_names

 [1] "LSOA"            "YR_2001_TotPop"  "YR_2001_Less1"  
 [4] "YR_2001_1"       "YR_2001_2"       "YR_2001_3"      
 [7] "YR_2001_4"       "YR_2001_5"       "YR_2001_6"      
[10] "YR_2001_7"       "YR_2001_8"       "YR_2001_9"      
[13] "YR_2001_10"      "YR_2001_11"      "YR_2001_12"     
[16] "YR_2001_13"      "YR_2001_14"      "YR_2001_15"     
[19] "YR_2001_16"      "YR_2001_17"      "YR_2001_18"     
[22] "YR_2001_19"      "YR_2001_20"      "YR_2001_21"     
[25] "YR_2001_22"      "YR_2001_23"      "YR_2001_24"     
[28] "YR_2001_25"      "YR_2001_26"      "YR_2001_27"     
[31] "YR_2001_28"      "YR_2001_29"      "YR_2001_30"     
[34] "YR_2001_31"      "YR_2001_32"      "YR_2001_33"     
[37] "YR_2001_34"      "YR_2001_35"      "YR_2001_36"     
[40] "YR_2001_37"      "YR_2001_38"      "YR_2001_39"     
[43] "YR_2001_40"      "YR_2001_41"      "YR_2001_42"     
[46] "YR_2001_43"      "YR_2001_44"      "YR_2001_45"     
[49] "YR_2001_46"      "YR_2001_47"      "YR_2001_48"     
[52] "YR_2001_49"      "YR_2001_50"      "YR_2001_51"     
[55] "YR_2001_52"      "YR_2001_53"      "YR_2001_54"     
[58] "YR_2001_55"      "YR_2001_56"      "YR_2001_57"     
[61] "YR_2001_58"      "YR_2001_59"      "YR_2001_60"     
[64] "YR_2001_61"      "YR_2001_62"      "YR_2001_63"     
[67] "YR_2001_64"      "YR_2001_65"      "YR_2001_66"     
[70] "YR_2001_67"      "YR_2001_68"      "YR_2001_69"     
[73] "YR_2001_70"      "YR_2001_71"      "YR_2001_72"     
[76] "YR_2001_73"      "YR_2001_74"      "YR_2001_75_79"  
[79] "YR_2001_80_84"   "YR_2001_85_89"   "YR_2001_90_94"  
[82] "YR_2001_95_99"   "YR_2001_100plus"

You can now apply these to the columns of pop_2001

# Add the new column names
colnames(pop_2001) <- new_column_names

This will have created a data frame containing the 2001 population data that will look like…

# Show first 6 rows and 10 columns
pop_2001[1:6, 1:10]

          LSOA YR_2001_TotPop YR_2001_Less1 YR_2001_1 YR_2001_2 YR_2001_3
1746 E01006511           1026             7        10         8         4
1747 E01006512           1454             8        11        10         9
1748 E01006513           2077             4         5         6         3
1749 E01006514           1432             8         7        10         7
1750 E01006515           1156             6        15        12        12
1751 E01006516           1412            12        10        12         8
     YR_2001_4 YR_2001_5 YR_2001_6 YR_2001_7
1746         9         5         4         7
1747        10         8        13         7
1748         3         6         0         5
1749         8         6         6         5
1750        11        14        10         8
1751         7         7         4         4

Some more manipulation…

Later on we are going to merge the 2001 and 2011 data together into a new data frame. However, before doing this, we first need to create a new version of the 2001 data with the same age bands as the 2011 data. You need to create new columns in the pop_2001 data for:

YR_2001_0_15
YR_2001_16_29
YR_2001_30_44
YR_2001_45_64
YR_2001_65plus

The numbers refer to the years. There are a number of ways to achieve this.

# Add manually
pop_2001$YR_2001_0_14 <- pop_2001$YR_2001_Less1 + pop_2001$YR_2001_1 + pop_2001$YR_2001_2 + 
    pop_2001$YR_2001_3 + pop_2001$YR_2001_4 + pop_2001$YR_2001_5 + pop_2001$YR_2001_6 + 
    pop_2001$YR_2001_7 + pop_2001$YR_2001_8 + pop_2001$YR_2001_9 + pop_2001$YR_2001_10 + 
    pop_2001$YR_2001_11 + pop_2001$YR_2001_12 + pop_2001$YR_2001_13 + pop_2001$YR_2001_14
# Add as a range
pop_2001$YR_2001_15_29 <- rowSums(pop_2001[, 18:32])

The above code has created the aggregations for YR_2001_0_14 and YR_2001_15_29. You now need to add the following new columns to pop_2001 data frame and with the appropriate age group aggregations:

YR_2001_30_44
YR_2001_45_64
YR_2001_65plus

Hint: The easiest way to do it is to use the 'Add as a range' option from above, and to alter the field names and column numbers as required.

Next, create a subset of the 2001 data called pop_2001_V2 that contains the LSOA code, the total population and the newly created variables (YR_2001_0_14 YR_2001_15_29 YR_2001_30_44 YR_2001_45_64 and YR_2001_65plus)

The top 6 rows of your new data frame should look like:

head(pop_2001_V2)

          LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1746 E01006511           1026          115           274           245
1747 E01006512           1454          119           935           244
1748 E01006513           2077           81          1691           149
1749 E01006514           1432          102           681           358
1750 E01006515           1156          187           278           228
1751 E01006516           1412          103           597           386
     YR_2001_45_64 YR_2001_65plus
1746           246            146
1747           116             40
1748            95             61
1749           188            103
1750           252            211
1751           216            110

2011 Census Population Estimates

We are going to compare the population from the 2001 census with population data from the 2011 census. Unfortunately, 21 of the 291 LSOAs in Liverpool changed between the 2001 and 2011 census. Fortunately, ONS provide a set of estimated population data for 2011 split into the 2001 LSOAs. It won't be as accurate as the 2011 census, but the geographies will match so we can see how the population has changed between 2001 and 2011.

Download the “Supporting Information for Lower Super Output Area Population Estimates - Mid-2011 (Rolled Forward Estimates)” entry from http://www.ons.gov.uk/ons/datasets-and-tables/index.html?pageSize=25&sortBy=none&sortDirection=none&newquery=Supporting+Information+for+Lower+Super+Output+Area+Population+Estimates+-+Mid-2011+%28Rolled+Forward+Estimates%29 - it is the second in the list.

As before this is a zipped file. Follow the same procedure as before to extract it. Unfortunately, this isn't a csv file, it is stored in Microsoft Excel. You will need to use Excel to extract the data in CSV format before you read it into R. So, open it in Excel and go to the tab marked mid11_rolled_forward. Save this file under the new name pop2011.csv - select csv as the format to save the file in your working directory. You can now read this into R.

If you do head(pop_2011) you can see that there are many different age catagories, in addition to the LSOA Code and the MSOA Code fields.

head(pop_2011)

##   LSOA.Code MSOA.Code M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14
## 1 E01000001 E02000001  7  4  4  4  3  3  1  1  6  6   4   2   2   4   5
## 2 E01000002 E02000001  4  8  2  5  5  4 11  3  2  2   3   3  10   8   2
## 3 E01000003 E02000001  9  6  3  6 12  1  2  5  4  6   2  11   5   5   4
## 4 E01000004 E02000001 10  7  5  6  5 10  4  6 16 27   7  13  14   9   3
## 5 E01000005 E02000001  9  6  3  7  2  9  8 11 11  7   7  12   9   9   5
## 6 E01000027 E02000002 12 16 13  8 11  9 13 16 12 14   9   8  12  11  17
##   M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32
## 1   4   4   4   2   2   6   2  12  11  12  18  22  35  30  29  23  19  40
## 2   2   5   3   6   1   7   4  14   6   2  11   4  20  18  11  18  26  19
## 3   4   3  21  26  24  18  25  25  41  28  29  17  32  34  34  30  43  27
## 4   0   0   0   0   0   7  15  32  47  86  99  87 106 102 105  93  81  87
## 5  13  11   8   7  17  11  11  10  12  24  23  40   4  38  41  46  16  27
## 6  15  11   7  14   9  17   9  11  12   7  12   7  12   6  12   9   9   7
##   M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50
## 1  34  38  50  40  34  24  17  19  14  23  13  17  17  19  18  17  19  22
## 2  33  17  28  22  21  20  30  17  14  19  36  17  24  11  15  10  17  23
## 3  38  53  40  48  39  25  34  28  16  21  20  18  21  19  11  11  16  10
## 4  66  65  77  58  41  43  33  43  35  24  37  26  40  44  36  25  20  15
## 5  36  20  25  20  20   8  12  14  11  15  20  23  12   6  11   9  17   3
## 6  10  11   5  15   5  15   9   6   8   6  11   8   9   8   8   6   6   9
##   M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 M67 M68
## 1  19  19  24  16  14  14  10  19  18  42  19  12  32  31  15  18  14   7
## 2  23  21  34  29  19  20  19  14  18  18  13  12  23  16  18  22  26  19
## 3   4  12  17   9  11  16   5  11   8  21   7  13   8  14  14   4   8  12
## 4  18  16  27  23  22  12  15  18  14  28   6   8   5   5   8  12   4   6
## 5   4   2   5   3   8   6   8  10   5   8   6  10   7  14   3   6   4   4
## 6   7   9   9   6   5   8   4   6   8   3   3   2   2   4   6   6   8   1
##   M69 M70 M71 M72 M73 M74 M75 M76 M77 M78 M79 M80 M81 M82 M83 M84 M85 M86
## 1   4   8   6   7   7   2   9  10   6   3  11  12   5   4  10   7   3   2
## 2  13   9  10   7  16   5   7   6   4   1   3   5   3   1   4   2   1   7
## 3  13   1   4   7   6   1   4   2   5   1   8   3   1   1   7   1   5   1
## 4   8   3   5   1   5   3   1   1   1   2   3   1   0   1   1   1   2   1
## 5   4   7   3  11   1   1   6   1   4   3   2   1   0   0   0   0   0   0
## 6   4   3   5   4   6   2   5   1   4   3   7   3   5   4   3   2   8   4
##   M87 M88 M89 M90plus F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
## 1   1   0   1       8  5  5  2  4  5  4  8  4  5  2   4   9   3   2   3
## 2   1   0   1       4 10 15  7  4  7  4  9 16  6  8   6   5   2   4   4
## 3   1   0   1       9  6  5  9  8  6  3  3  3  3  4   6   4   2   3   1
## 4   1   0   1       6  5  4  3  0  0  3  2  2  6  3   3   1   5   1   1
## 5   0   0   0       4  6  6  2  5  6  1  4  6  2  6   6  10   7   6  13
## 6   1   1   2       2  9 11 13 16 12 18 14 11 11  7  11  14  22  19   9
##   F15 F16 F17 F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32
## 1   7   7   7   7  11   4   5  22  11  32  24  41  39  40  39  28  32  32
## 2   5   3   2   3   4   4   6   5  14   9  14  28  26  19  17  22  21  13
## 3   1   7  16   7  27  35  30  35  30  37  25  28  33  35  37  54  20  43
## 4   1   2   2   4   7   4   4  10  39  29  61  53  57  54  65  63  60  66
## 5   6   6   5   9  16   8  19  12  38  34  33  26  30  16  42  36  20  19
## 6  14  10  11  13  12   6  12   8   8   9  14  21  14  10   9  11   8  11
##   F33 F34 F35 F36 F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50
## 1  19  37  21  18  23  23  16  33  22  29  12  13  17  19  19  10  15  15
## 2  13  22  32  30  21  26  40  22  23  31  24  24  17  20  16  13   9  13
## 3  31  35  20  19  16  17  19  14   9  18  15  18   6  14  13  12  10  15
## 4  39  34  22  17  19  18  28  16  19  17  12  15  17  14  14   8   7   3
## 5  12  21  10   7   5  12   6  16  20  19  13   8   7   8  11   7   9  10
## 6  12  13  12   8  12   9  16  15   9  12  12  14  19  18   9   9  20   9
##   F51 F52 F53 F54 F55 F56 F57 F58 F59 F60 F61 F62 F63 F64 F65 F66 F67 F68
## 1  13   9  17  12  22  13  11  12  18  12  18  10  15  11  19  13  15  19
## 2   7   5  18  21  14  12  14   9  13  10  15  16  12  12   9  18  11  12
## 3   6   9  14   5  13  14   9   6  10   5  16  10   9   6   8   5   4   5
## 4   7  12  13   7   6  10   7   9   7   6  13  16  12  10   5   5   4   2
## 5   7   5  12   4  12   7   8   5   4   7   3   6  16   4   0   7   2   1
## 6  13  15   6  10   3   6   6   3  11   4   7   3   4  12   4   8   4   3
##   F69 F70 F71 F72 F73 F74 F75 F76 F77 F78 F79 F80 F81 F82 F83 F84 F85 F86
## 1   9  10   4   8   5   5  16   8  11  15  12   7   4   7   9   2   0   1
## 2   8   9  12   8   8   7   7   6   6   1   7   3   4   5   1   1   1   3
## 3   5   9   9   6   3   2   7   7   7   8   6   5  10   8   8   6   8  11
## 4   4   6   4   2   6   1   1   5   1   2   1   0   0   0   0   0   1   2
## 5   2   4   5   3   1   1   5   3   0   2   1   0   3   0   1   1   2   1
## 6   4   3   3   3   8   9   4   5   7   7   6  12   8   7   6   7   1   7
##   F87 F88 F89 F90plus
## 1   0   0   0       5
## 2   6   1   2       8
## 3   4   4   1      23
## 4   1   1   1       4
## 5   1   0   1       5
## 6   5   3   1       8

The MSOA Code field is for a different type of geography, which we don't need so we can remove it:

pop_2011$MSOA.Code <- NULL

We need to merge the data into the five age categories we want, as well as creating a total category, like we did with the 2011 data. However this time, the data are split up into male and female, so the combinations are a little more complex. To get you started:

# Add as a range
pop_2011$YR_2011_TotPop <- rowSums(pop_2011[, 2:183])
pop_2011$YR_2011_0_14 <- rowSums(pop_2011[, c(2:16, 93:107)])
pop_2011$YR_2011_15_29 <- rowSums(pop_2011[, c(17:31, 108:122)])

Note the slightly different format of the column numbers - we need to use c(17:31,108:122) because we are specifying multiple ranges - i.e. columns 17 to 31 (for males) and 108 to 122 (for females) in this case. If you need to check which column numbers you need, you can use the colnames function to work out which columns you need:

colnames(pop_2011)

##   [1] "LSOA.Code"      "M0"             "M1"             "M2"            
##   [5] "M3"             "M4"             "M5"             "M6"            
##   [9] "M7"             "M8"             "M9"             "M10"           
##  [13] "M11"            "M12"            "M13"            "M14"           
##  [17] "M15"            "M16"            "M17"            "M18"           
##  [21] "M19"            "M20"            "M21"            "M22"           
##  [25] "M23"            "M24"            "M25"            "M26"           
##  [29] "M27"            "M28"            "M29"            "M30"           
##  [33] "M31"            "M32"            "M33"            "M34"           
##  [37] "M35"            "M36"            "M37"            "M38"           
##  [41] "M39"            "M40"            "M41"            "M42"           
##  [45] "M43"            "M44"            "M45"            "M46"           
##  [49] "M47"            "M48"            "M49"            "M50"           
##  [53] "M51"            "M52"            "M53"            "M54"           
##  [57] "M55"            "M56"            "M57"            "M58"           
##  [61] "M59"            "M60"            "M61"            "M62"           
##  [65] "M63"            "M64"            "M65"            "M66"           
##  [69] "M67"            "M68"            "M69"            "M70"           
##  [73] "M71"            "M72"            "M73"            "M74"           
##  [77] "M75"            "M76"            "M77"            "M78"           
##  [81] "M79"            "M80"            "M81"            "M82"           
##  [85] "M83"            "M84"            "M85"            "M86"           
##  [89] "M87"            "M88"            "M89"            "M90plus"       
##  [93] "F0"             "F1"             "F2"             "F3"            
##  [97] "F4"             "F5"             "F6"             "F7"            
## [101] "F8"             "F9"             "F10"            "F11"           
## [105] "F12"            "F13"            "F14"            "F15"           
## [109] "F16"            "F17"            "F18"            "F19"           
## [113] "F20"            "F21"            "F22"            "F23"           
## [117] "F24"            "F25"            "F26"            "F27"           
## [121] "F28"            "F29"            "F30"            "F31"           
## [125] "F32"            "F33"            "F34"            "F35"           
## [129] "F36"            "F37"            "F38"            "F39"           
## [133] "F40"            "F41"            "F42"            "F43"           
## [137] "F44"            "F45"            "F46"            "F47"           
## [141] "F48"            "F49"            "F50"            "F51"           
## [145] "F52"            "F53"            "F54"            "F55"           
## [149] "F56"            "F57"            "F58"            "F59"           
## [153] "F60"            "F61"            "F62"            "F63"           
## [157] "F64"            "F65"            "F66"            "F67"           
## [161] "F68"            "F69"            "F70"            "F71"           
## [165] "F72"            "F73"            "F74"            "F75"           
## [169] "F76"            "F77"            "F78"            "F79"           
## [173] "F80"            "F81"            "F82"            "F83"           
## [177] "F84"            "F85"            "F86"            "F87"           
## [181] "F88"            "F89"            "F90plus"        "YR_2011_TotPop"
## [185] "YR_2011_0_14"   "YR_2011_15_29"

Therefore you can see above, that column 32 ("M30") contains the number of males ages 30 (4 along from the row starting with [29]).

The above code has created the aggregations for YR_2011_0_14 and YR_2011_15_29. You now need to add the following new columns to pop_2011 data frame and with the appropriate age group aggregations:

YR_2011_30_44
YR_2011_45_64
YR_2011_65plus

The first row of your data should look like this:

pop_2011[1, ]

##   LSOA.Code M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17
## 1 E01000001  7  4  4  4  3  3  1  1  6  6   4   2   2   4   5   4   4   4
##   M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35
## 1   2   2   6   2  12  11  12  18  22  35  30  29  23  19  40  34  38  50
##   M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53
## 1  40  34  24  17  19  14  23  13  17  17  19  18  17  19  22  19  19  24
##   M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 M67 M68 M69 M70 M71
## 1  16  14  14  10  19  18  42  19  12  32  31  15  18  14   7   4   8   6
##   M72 M73 M74 M75 M76 M77 M78 M79 M80 M81 M82 M83 M84 M85 M86 M87 M88 M89
## 1   7   7   2   9  10   6   3  11  12   5   4  10   7   3   2   1   0   1
##   M90plus F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17
## 1       8  5  5  2  4  5  4  8  4  5  2   4   9   3   2   3   7   7   7
##   F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32 F33 F34 F35
## 1   7  11   4   5  22  11  32  24  41  39  40  39  28  32  32  19  37  21
##   F36 F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50 F51 F52 F53
## 1  18  23  23  16  33  22  29  12  13  17  19  19  10  15  15  13   9  17
##   F54 F55 F56 F57 F58 F59 F60 F61 F62 F63 F64 F65 F66 F67 F68 F69 F70 F71
## 1  12  22  13  11  12  18  12  18  10  15  11  19  13  15  19   9  10   4
##   F72 F73 F74 F75 F76 F77 F78 F79 F80 F81 F82 F83 F84 F85 F86 F87 F88 F89
## 1   8   5   5  16   8  11  15  12   7   4   7   9   2   0   1   0   0   0
##   F90plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29 YR_2011_30_44
## 1       5           2446          121           489           763
##   YR_2011_45_64 YR_2011_65plus
## 1           689            384

We also need to rename the LSOA column to correctly reflect its contents:

# Rename column 'LSOA.Code' to 'LSOA'
names(pop_2011)[names(pop_2011) == "LSOA.Code"] <- "LSOA"

As before, create a subset of the 2011 data called pop_2011_V2 that contains the LSOA code, the total population and the newly created variables (YR_2011_0_14 YR_2011_15_29 YR_2011_30_44 YR_2011_45_64 and YR_2011_65plus)

The head of your pop_2011_V2 data frame should look like this:

head(pop_2011_V2)

##        LSOA YR_2011_TotPop YR_2011_0_14 YR_2011_15_29 YR_2011_30_44
## 1 E01000001           2446          121           489           763
## 2 E01000002           2157          179           273           701
## 3 E01000003           2464          147           744           828
## 4 E01000004           3245          181          1078          1254
## 5 E01000005           1731          201           570           537
## 6 E01000027           1570          378           332           308
##   YR_2011_45_64 YR_2011_65plus
## 1           689            384
## 2           645            359
## 3           446            299
## 4           595            137
## 5           306            117
## 6           309            243

Merging the 2001 and 2011 data and calculating rates of change

You will now have two population data frames; the first for 2001 which contains all LSOA in Liverpool (291 rows), and a second which contains all LSOA in England (34,378 rows). You might want to check if your data frame objects have the same number of rows at this stage.

nrow(pop_2001_V2)

[1] 291

nrow(pop_2011_V2)

[1] 34378

We now need to create a new data frame that combines both the 2001 and 2011 data. One way of achieving this is to use the merge() function. However before we do this, we will first explore exactly what the merge function does.

We will use the data frame you created right at the start of the tutorial. If you have forgotten what this looked like, you can always print the object in the terminal.

test

   Name Age      Place
1  John  20  Liverpool
2 Helen  10     London
3   Mia  16  Liverpool
4  Carl  21 Manchester

You now need to create another data frame which we will use as an example. You could create another CSV file and import this; however, we will illustrate another way of achieving this through the amalgamation of a series of vector lists.

# Create a person vector
Person <- c("Paul", "Mike", "John", "Helen", "Mia", "Leo", "Carl")
# Create a favourite colours vector
Colour <- c("blue", "green", "red", "yellow", "purple", "orange", "black")
# We can now join these two vectors into a new data frame of favourite
# colours
fav_colour <- data.frame(Person, Colour)
# View the fav_colour
fav_colour

  Person Colour
1   Paul   blue
2   Mike  green
3   John    red
4  Helen yellow
5    Mia purple
6    Leo orange
7   Carl  black

We now have two tables, the first listing a series of people who attended a current R training workshop, and the second containing these current people, and those additional people who have been to previous workshops; along with their favourite colours!

What we want to do is append the favourite colours onto our list of current course attendees. We will refer to the two data frames as x and y. The x data frame is test; and the y is fav_colour. In x, the column containing the list of people is called “Name”, and in y, it is called “Person”. The parameters of the merge function first accept the two table names, and then the lookup columns as by.x or by.y. You should also include all.x=TRUE as a final parameter. This tells the function to keep all the records in x, but only those in y which match. As an experiment you can change this to all.y=TRUE - what you will find is that all the y records are now in the data frame, however, because there were no matches in x, there are NA values present.

We will now create a new data frame called People_And_Colours and print this to the terminal.

People_And_Colours <- merge(test, fav_colour, by.x = "Name", by.y = "Person", 
    all.x = TRUE)
People_And_Colours

   Name Age      Place Colour
1  Carl  21 Manchester  black
2 Helen  10     London yellow
3  John  20  Liverpool    red
4   Mia  16  Liverpool purple

If the by columns were the same in both x and y, we can specify this simply with by="column name" rather than by.x and by.y; and finally, a critical issue when making any join is assuring that the “by” columns are in the same format.

The pop_2001_V2 data frame should now be joined to pop_2011_V2. You will need to specify a merge that:

Creates a new object called pop_2001_2011
Merges pop_2001_V2 to pop_2011_V2
Using LSOA from both pop_2001_V2 and pop_2011_V2
Only keep the entries that are in the pop_2001 data frame (i.e. the ones for Liverpool, all.x=TRUE)

If you have done the join correctly, your new data frame should have 291 rows and the top six rows should look like…

# Number of rows
nrow(pop_2001_2011)

[1] 291

# Show top 6 records
head(pop_2001_2011)

       LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511           1026          115           274           245
2 E01006512           1454          119           935           244
3 E01006513           2077           81          1691           149
4 E01006514           1432          102           681           358
5 E01006515           1156          187           278           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1           246            146           2697          198          1256
2           116             40           2463          180          1713
3            95             61           2891           82          2445
4           188            103           1275          105           489
5           252            211           1066          133           331
6           216            110           1400           81           603
  YR_2011_30_44 YR_2011_45_64 YR_2011_65plus
1           713           369            161
2           329           200             41
3           190           100             74
4           275           286            120
5           229           223            150
6           332           283            101

You will now calculate both absolute and percentage change statistics between 2001 and 2011 for the different age bands. The following example illustrates this for the total population, however, you will need to complete this for each age band. Create new variables in the format shown:

# Total Population (TC - Total Change ; PC - Percent Change)
pop_2001_2011$TotPop_TC <- pop_2001_2011$YR_2011_TotPop - pop_2001_2011$YR_2001_TotPop
pop_2001_2011$TotPop_PC <- (pop_2001_2011$YR_2011_TotPop - pop_2001_2011$YR_2001_TotPop)/pop_2001_2011$YR_2001_TotPop * 
    100

# 0-14 Years (TC - Total Change ; PC - Percent Change)
pop_2001_2011$YR_0_14_TC <- pop_2001_2011$YR_2011_0_14 - pop_2001_2011$YR_2001_0_14
pop_2001_2011$YR_0_14_PC <- (pop_2001_2011$YR_2011_0_14 - pop_2001_2011$YR_2001_0_14)/pop_2001_2011$YR_2001_0_14 * 
    100

You now need to complete this for the other age bands, leaving you with a final data frame that looks like:

head(pop_2001_2011)

       LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511           1026          115           274           245
2 E01006512           1454          119           935           244
3 E01006513           2077           81          1691           149
4 E01006514           1432          102           681           358
5 E01006515           1156          187           278           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1           246            146           2697          198          1256
2           116             40           2463          180          1713
3            95             61           2891           82          2445
4           188            103           1275          105           489
5           252            211           1066          133           331
6           216            110           1400           81           603
  YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC TotPop_PC
1           713           369            161      1671  162.8655
2           329           200             41      1009   69.3948
3           190           100             74       814   39.1911
4           275           286            120      -157  -10.9637
5           229           223            150       -90   -7.7855
6           332           283            101       -12   -0.8499
  YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC YR_30_44_PC
1         83     72.174         982     358.394         468    191.0204
2         61     51.261         778      83.209          85     34.8361
3          1      1.235         754      44.589          41     27.5168
4          3      2.941        -192     -28.194         -83    -23.1844
5        -54    -28.877          53      19.065           1      0.4386
6        -22    -21.359           6       1.005         -54    -13.9896
  YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1         123      50.000           15       10.274
2          84      72.414            1        2.500
3           5       5.263           13       21.311
4          98      52.128           17       16.505
5         -29     -11.508          -61      -28.910
6          67      31.019           -9       -8.182

5 Preparation for Making Maps in R

The pop_2001_2010 data frame will now be visualized on a choropleth map so you can explore areas of change. In order to do this, you will first need to read some polygon files into R.

Downloading appropriate shapefiles

Visit the website http://edina.ac.uk/ukborders/ where you will need to enter you University of Liverpool username and password. Click the “Boundary Data Selector” link. Select “England”, “Census boundaries” and “2001 to 2010” from the menus and then click the find button.
From the generated list click “English Super Output Areas (Lower Layer),2001 [within Counties]”; then on the sub list select “Merseyside” and click the expand button. On the new list select “Liverpool[Merseyside]” and click the “Extract Boundary Data” button.
This is a different data set to the one used in the last practical. If you get a message about special conditions, follow the instructions in the previous practical.
On the following page, download the zip file, unzip this, and place the contents in a new directory inside your working directory called “LSOA”.

When we talk about Shapefiles, we are actually referring to a number of files - each of these files has a different purpose, e.g. holding the geometry, storing the data or specifying the projection.

Data Access Problems

If you can't download the data from Edina, it is avaliable on Vital called england_low_soa_2001.shp.

Getting the Shapefiles into R

Up until this point, all the functions you have been using are contained in the base R installation - i.e. they are present by default when you load R. However, one of the really great things about R is that these functions can be expanded by loading new packages that are created by the open source community. These packages add lots of exiting new functions to R.

We are going to now load “maptools” and “rgeos” which provides functions for reading and manipulating GIS data, along with some other libraries that they require.

# load sp
library(sp)
# load maptools
library(maptools)
# load rgeos
library(stringr)
library(rgeos)

We will now read in the LSOA data into a new type of object called a SpatialPolygonsDataFrame by supplying it with the location of the downloaded shapefiles. If the command generates an error message, check that the shapefile is in the correct location (in a folder called LSOA in your working directory).

LSOA <- readShapePoly("LSOA/england_low_soa_2001.shp")

We can see what a SpatialPolygonsDataFrame looks like using a simple plot() function. You might notice that the polygons nearest the River Mersey are quite large. The reason for this is that the official boundaries extend into the water - however, aesthetically, these are not that appealing. We can however tidy these up before we start mapping.

# A basic plot of the OA SpatialPolygonsDataFrame
plot(LSOA)

plot of chunk unnamed-chunk-48

As such:

Visit the UK Borders website again and download the shapefiles for the “English Outline 2011” which can be found in the Easy Download section. Choose the “Clipped and Generalized” shapefile version.
Unzip these files and save them in a folder called “Outline” within your working directory.
Read this into R as a new SpatialPolygonsDataFrame called “outline”

# Read in outline as a SpatialPolygonsDataFrame
outline <- readShapePoly("Outline/England_ol_2011_gen_clipped.shp")
# This looks like
plot(outline)

plot of chunk unnamed-chunk-49

By default, when you call the plot() function, this overwrites the previous plot. However, there is a way that you can specify plots to be cumulative - i.e. add as a new layer. We do this in the following example to add the newly downloaded coastal outline on top of the LSOA by including add=TRUE. We have also made the border a red colour (border="red"), and the fill colour (col="#2C7FB820") a shade of blue. These represent two ways of specifying colours. The second contains eight alphanumerics, the first six relate to a HEX colour code. To view various colours which can be used in R, have a look here. The final two characters are the level of transparency (in this case 20%). Sometimes when running R in Windows, the transparency option will not work - it will just ﬁll it with a solid colour. In this case, just remove the col = "#2C7FB820" section from the plot command to just generate a red outline.

# Plot the LSOA Map
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")

plot of chunk unnamed-chunk-50

You will also note on the above plot that the full extent of the English coastal outline is not shown. This is because the plot window is initiated with the extent of the first plot (i.e. Liverpool). You could reverse the above commands (remembering to remove add=TRUE if outline is the first plot) to illustrate this effect.

SpatialPolygonsDataFrame objects are a little more complex than the standard data frame, and comprise of a number of different elements referred to as “slots”. We can view the slots of an object using the slotNames() function which which will print a list of all the available slots.

slotNames(LSOA)

[1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"

The only slot we really need to worry about for this exercise is “data” which contains all the attribute data for each of the zones (i.e. the attributes contained in the DBF of the shapefile). We access this using an @ symbol, and this can then be treated in very similar way to a standard data frame. For example, we can look at the first 6 rows in the LSOA data slot.

head(LSOA@data)

   gid geonorth popnorth geoeast           name         label popeast
0  458   381722   383625  342348 Liverpool 058B 04BYE01006739  342928
1 4350   384390   385002  338967 Liverpool 056D 04BYE01006687  339051
2 4351   383094   383388  343364 Liverpool 058C 04BYE01006741  343305
3 4352   384125   384509  341524 Liverpool 058D 04BYE01006743  340956
4 4370   385573   386586  337323 Liverpool 052C 04BYE01006528  337495
5 4371   385004   385692  338248 Liverpool 056B 04BYE01006684  338794
   zonecode
0 E01006739
1 E01006687
2 E01006741
3 E01006743
4 E01006528
5 E01006684

We want to use the gIntersection() function to remove the elements of the LSOA polygons that overlap the River Mersey. However this takes quite a long time to run, so we have pre-processed this for you (if you want more details on the gIntersection() function, have a look at http://www.alex-singleton.com/R-Tutorial-Materials/6-basic-spatial-analysis.pdf. Run the code below to load in the already clippled version:

# Download file and unzip
download.file("http://www.nickbearman.me.uk/academic/201314_liverpool_envs257/LSOAclip.zip", 
    "LSOAclip.zip")
unzip("LSOAclip.zip")
# read in to R
LSOA <- readShapePoly("LSOAclip/england_low_soa_2001_clip.shp")

Then re-run the plot command:

# Replot the map with outline
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")

plot of chunk unnamed-chunk-54

Next we will join the population data onto the the LSOA spatial polygons data frame. This has to be handled in a slightly different way to how you used the merge statement earlier. Because we know that both the population data frame, and the spatial data frame have the same number of zones (i.e. were both limited to just Liverpool) we can sort them alphabetically by their area ID, thus enabling us to just line the data up side by side.

# Order the LSOA spatial polygon data frame
LSOA <- LSOA[order(LSOA$zonecode), ]
# Order the pop_2001_2011 data frame
pop_2001_2011 <- pop_2001_2011[order(pop_2001_2011$LSOA), ]

If you then look at the first few rows of the above two objects, you will see by looking at their area identifiers that they are now in the same order. If the area identifiers (zonecode and LSOA) are not the same then something has gone wrong - go back and check what you have done.

head(LSOA@data)

    gid geonorth popnorth geoeast           name         label popeast
8  4374   389683   389710  334377 Liverpool 033A 04BYE01006511  334623
67 8973   389734   389715  336154 Liverpool 031A 04BYE01006512  336203
68 8974   390061   389758  335536 Liverpool 033B 04BYE01006513  335346
63 8969   389484   389482  335525 Liverpool 037A 04BYE01006514  335582
61 8967   389196   389220  335117 Liverpool 037B 04BYE01006515  335214
64 8970   389504   389479  335913 Liverpool 037C 04BYE01006516  335897
    zonecode
8  E01006511
67 E01006512
68 E01006513
63 E01006514
61 E01006515
64 E01006516

head(pop_2001_2011)

       LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511           1026          115           274           245
2 E01006512           1454          119           935           244
3 E01006513           2077           81          1691           149
4 E01006514           1432          102           681           358
5 E01006515           1156          187           278           228
6 E01006516           1412          103           597           386
  YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1           246            146           2697          198          1256
2           116             40           2463          180          1713
3            95             61           2891           82          2445
4           188            103           1275          105           489
5           252            211           1066          133           331
6           216            110           1400           81           603
  YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC TotPop_PC
1           713           369            161      1671  162.8655
2           329           200             41      1009   69.3948
3           190           100             74       814   39.1911
4           275           286            120      -157  -10.9637
5           229           223            150       -90   -7.7855
6           332           283            101       -12   -0.8499
  YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC YR_30_44_PC
1         83     72.174         982     358.394         468    191.0204
2         61     51.261         778      83.209          85     34.8361
3          1      1.235         754      44.589          41     27.5168
4          3      2.941        -192     -28.194         -83    -23.1844
5        -54    -28.877          53      19.065           1      0.4386
6        -22    -21.359           6       1.005         -54    -13.9896
  YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1         123      50.000           15       10.274
2          84      72.414            1        2.500
3           5       5.263           13       21.311
4          98      52.128           17       16.505
5         -29     -11.508          -61      -28.910
6          67      31.019           -9       -8.182

We will now “column bind” the population data onto the data slot of the LSOA SpatialPolygonsDataFrame. You can do this using the cbind() function. The data objects are separated by commas. These are assigned (over written) onto the data slot of the SpatialPolygonsDataFrame. If you look at LSOA@data after running this code, you will see that the area codes are now lined up, and that the population data are present.

LSOA@data <- cbind(LSOA@data, pop_2001_2011)

If this has been completed correctly, the LSOA@data should look like…

head(LSOA@data)

    gid geonorth popnorth geoeast           name         label popeast
8  4374   389683   389710  334377 Liverpool 033A 04BYE01006511  334623
67 8973   389734   389715  336154 Liverpool 031A 04BYE01006512  336203
68 8974   390061   389758  335536 Liverpool 033B 04BYE01006513  335346
63 8969   389484   389482  335525 Liverpool 037A 04BYE01006514  335582
61 8967   389196   389220  335117 Liverpool 037B 04BYE01006515  335214
64 8970   389504   389479  335913 Liverpool 037C 04BYE01006516  335897
    zonecode      LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29
8  E01006511 E01006511           1026          115           274
67 E01006512 E01006512           1454          119           935
68 E01006513 E01006513           2077           81          1691
63 E01006514 E01006514           1432          102           681
61 E01006515 E01006515           1156          187           278
64 E01006516 E01006516           1412          103           597
   YR_2001_30_44 YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14
8            245           246            146           2697          198
67           244           116             40           2463          180
68           149            95             61           2891           82
63           358           188            103           1275          105
61           228           252            211           1066          133
64           386           216            110           1400           81
   YR_2011_15_29 YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC
8           1256           713           369            161      1671
67          1713           329           200             41      1009
68          2445           190           100             74       814
63           489           275           286            120      -157
61           331           229           223            150       -90
64           603           332           283            101       -12
   TotPop_PC YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC
8   162.8655         83     72.174         982     358.394         468
67   69.3948         61     51.261         778      83.209          85
68   39.1911          1      1.235         754      44.589          41
63  -10.9637          3      2.941        -192     -28.194         -83
61   -7.7855        -54    -28.877          53      19.065           1
64   -0.8499        -22    -21.359           6       1.005         -54
   YR_30_44_PC YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
8     191.0204         123      50.000           15       10.274
67     34.8361          84      72.414            1        2.500
68     27.5168           5       5.263           13       21.311
63    -23.1844          98      52.128           17       16.505
61      0.4386         -29     -11.508          -61      -28.910
64    -13.9896          67      31.019           -9       -8.182

6 Making Maps in R

OK, the data is now ready so we are now able to make some maps! There are many different ways of doing this, however, we will use one of the simplest, and outline each step as we go along. In this example, we will make a map of the change in total population.

We need to load some more packages containing functions that will help us make some maps.

# load classInt
library("classInt")
library("RColorBrewer")

The first step is to find suitable breakpoints for the data contained in the TotPop_PC column. The contiuous data needs to be assigned into categories so different colours can be applied on a choropleth map. There are numerous ways of doing this such as jenks, standard deviations or equal intervals. In this example we use function classIntervals from the classInt package to find the ranges needed to divide the the TotPop_PC into six categories.

# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(LSOA@data$TotPop_PC, n = 6, style = "fisher")

# classIntervals returns an object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks

We will now use another package called RColorBrewer which provides a series of colour pallets that are suitable for mapping. You can have a look at the colour pallets online: http://colorbrewer2.org. Each of these pallets are named; and you can see all the available pallets as follows…

display.brewer.all()

plot of chunk unnamed-chunk-61

We will now select a set of six colours from one of these pallets into a new list object called my_colours. If you print this object, you will see it contains six HEX codes for colours within the selected pallet.

# Select six colours from the pallet YlOrRd
my_colours <- brewer.pal(6, "YlOrRd")
# Print the colour codes to the terminal
my_colours

[1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"

Another function called findInterval() can be used with the “breaks” object created earlier to divide the TotPop_PC column into aggregate groups. To view the intervals, just print the breaks object…

breaks

[1] -80.426 -40.414 -12.341   5.591  29.781  82.104 162.865

Before we use these breaks to create a chropleth map, let us first have a look to see what the findInterval() function actually does.

The following code returns a list of numbers. Each number represents a row in the LSOA@data TotPop_PC column - i.e. there will be the same number of values as there are rows in LSOA@data.

The function has examined the values of the TotPop_PC column against the break points that you have requested (stored in the “breaks” object); and returned these as a series of values (1-6) - i.e. break point 1 gets the number 1 etc. The all.inside=TRUE is an optional parameter to ensure that all values are assessed correctly.

# print the intervals
findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)

  [1] 6 5 5 3 3 3 5 4 3 4 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 4 5 3 3 3 3 2 3 3 3
 [36] 3 3 3 3 4 4 4 3 3 3 4 3 2 1 3 1 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 3 3 3
 [71] 3 3 4 3 3 4 4 3 3 3 3 3 3 3 3 3 3 2 2 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[106] 3 3 3 3 3 3 2 3 3 3 3 3 4 3 3 3 3 3 5 3 3 4 2 3 4 5 2 3 4 3 4 3 4 6 6
[141] 4 3 3 4 4 3 3 3 3 3 3 4 2 4 4 4 3 4 4 3 3 4 4 4 3 4 2 4 4 3 3 3 3 5 3
[176] 3 4 3 3 3 3 3 3 4 5 4 3 3 3 3 3 3 3 1 3 3 3 4 3 2 3 3 3 3 3 4 3 3 3 5
[211] 4 4 3 2 3 4 2 3 4 3 3 3 4 3 3 3 3 3 3 2 3 3 5 3 4 3 6 3 1 2 6 2 3 3 3
[246] 4 3 3 2 4 3 4 3 3 3 3 4 3 3 2 3 3 3 3 3 4 2 3 3 3 3 3 3 3 4 4 3 4 3 3
[281] 3 3 4 3 3 3 3 3 4 2 3

We can combine this list of 1-6 numbers to select specific colours from the my_colours object you created earlier.

By using the square brackets we can use the output of the findInterval() function to select items from the my_colours list. E.g., a number 1 would select the first colour, a 2 the second colour, etc.

# Returns a list of colours
my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)]

  [1] "#BD0026" "#F03B20" "#F03B20" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
  [8] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
 [15] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [22] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#F03B20" "#FEB24C"
 [29] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
 [36] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C"
 [43] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FFFFB2"
 [50] "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [57] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [64] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
 [71] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C"
 [78] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [85] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FED976" "#FEB24C" "#FED976"
 [92] "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
 [99] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[106] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976"
[113] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[120] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C" "#FEB24C"
[127] "#FD8D3C" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#FED976" "#FEB24C"
[134] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#BD0026" "#BD0026"
[141] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[148] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[155] "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[162] "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[169] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
[176] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[183] "#FEB24C" "#FD8D3C" "#F03B20" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[190] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C"
[197] "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[204] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[211] "#FD8D3C" "#FD8D3C" "#FEB24C" "#FED976" "#FEB24C" "#FD8D3C" "#FED976"
[218] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[225] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C"
[232] "#FEB24C" "#F03B20" "#FEB24C" "#FD8D3C" "#FEB24C" "#BD0026" "#FEB24C"
[239] "#FFFFB2" "#FED976" "#BD0026" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[246] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FD8D3C" "#FEB24C" "#FD8D3C"
[253] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[260] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[267] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[274] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[281] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[288] "#FEB24C" "#FD8D3C" "#FED976" "#FEB24C"

We can then pull all this together to create a basic chropleth map using the plot() function. The first attribute is the SpatialPolygonsDataFrame, followed by col= which is how we want to colour the polygons. If we specified a single colour, e.g. "#FEB24C", all the polygons would be the same colour.

However, because we use the findInterval() function to generate a list of colours, each polygon is coloured by an appropriate colour associated with area's TotPop_PC value.

The other attributes in the plot() function are axes=FALSE to remove axis from the plot, and border = NA to remove borders from around the polygons.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)

plot of chunk unnamed-chunk-66

You can see what the axes and border function do if you run the following code, setting the axes to TRUE, and the border to a grey colour (#6B6B6B).

# Create a basic choropleth map with borders and axis
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = TRUE, border = "#6B6B6B")

plot of chunk unnamed-chunk-67

You see that your previous plot has been replaced by a new plot with the LSOA borders coloured in grey, and the axis now visible - the scale on the axis are meters and relate to the projection of the input shapefile (OSGB - British National Grid).

As noted earlier, we can also plot extra data on top of the original chropleth. This is useful for adding additional context to maps. Run the code below to download and import the English Census Area Statistic Wards for Liverpool, which have already been clipped to match the River Mersey.

# Download file and unzip
download.file("http://www.nickbearman.me.uk/academic/201314_liverpool_envs257/CASclip.zip", 
    "CASclip.zip")
unzip("CASclip.zip")
# read in to R
CAS <- readShapePoly("CASclip/england_caswa_2001_clip.shp")

We can now plot the basic choropleth again, however, this time we will add a second layer showing the ward outlines. In the second plot, we simply specify a grey border colour, and add another parameter called add=TRUE. This tells the plot window to keep the previous result, and add the new plot on top.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)

plot of chunk unnamed-chunk-69

We now have a pretty convincing map. However, there are a few things missing. The first of these are labels for the ward zones. We can use the coordinates() function to extract the central location of each ward polygon, and use this with the names of the ward which are stored in the data slot.

# Print the top six results of using coordinates()
head(coordinates(CAS))

    [,1]   [,2]
0 342161 383312
1 336112 387301
2 339430 385776
3 341695 385904
4 340738 391460
5 340764 395445

We can use the text() function to add these text onto the map. The first parameter are the x coordinates, the second the y, and the third the text we want to use as the labels. The final cex parameter scales the text smaller.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)

plot of chunk unnamed-chunk-71

The final items we need to add onto the map are a scale bar, a north arrow and a legend. A legend is added using the legend() function - the parameters used include the xy location, legend= - which are the text labels to use on the legend (leglabs and round are used to make these display correctly), fill= which is the object containing the fill colours you used to colour the LSOA choropleth, bty="n" which turns off the border around the legend.

You might find you will need to adjust the location or size of the legend to get this to fit onto your plot correctly. For scaling, the cex= parameter will reduce or increase the size of the item. To find a new set of location coordinates, type locator() into the terminal and press enter. Move your mouse to the map window and left-click where you want to position the legend. Then right-click and choose Stop. R will then print the coordinates of the location you clicked. You can use these to re-position items in the plot.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n")

plot of chunk unnamed-chunk-72

The sp package which you loaded earlier contains a number of other functions that enable you to add useful items to the map, including a north arrow and scale bar. There are also base R functions for adding text and titles.

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n")
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000, 
    fill = c("white", "black"), plot.grid = F)

# Add text to scale bar
text(335379, 381606, "0km", cex = 0.8)
text(335379 + 1500, 381606, "1.5km", cex = 0.8)
text(335379 + 3000, 381606, "3km", cex = 0.8)
# Add a title
title("Total Population Change (%)")

plot of chunk unnamed-chunk-73

The legend, north arrow, scale bar, etc. are all important parts of the map and will be assessed in your assignment. Make sure you understand what is going on here, in order to add the various elements to the map. Remember, you can always use the help command (?layout.scale.bar) for help with specific functions.

7 Exporting Maps

You have been creating all your plots within R, however, when preparing your reports you will need to export these maps. In order to do this, we wrap the code to create a map with png('rplot.png') and then dev.off(). This creates an image called rplot.png in your working directory. This code tells R is to write the results to a png rather than the plot window - thus, you won't see the output until you view the png. Another thing to remember is that the positioning of the items in the png may be a little different to the plot window - so you may have to adjust these and re-plot the image. For example…

# Write out to a png...
png("rplot.png", width = 800, height = 800, units = "px")

# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)], 
    axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.8)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "), 
    fill = my_colours, bty = "n", cex = 0.9)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000, 
    plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000, 
    fill = c("white", "black"), plot.grid = F)

# Add text to scale bar
text(335379, 381606, "0km", cex = 0.7)
text(335379 + 1500, 381606, "1.5km", cex = 0.7)
text(335379 + 3000, 381606, "3km", cex = 0.7)
# Add a title
title("Total Population Change (%)")

# Change output back to the plot window
dev.off()

When running code that you are likley to need to tweak and run again, it is a good idea to copy it into Notepad, edit it, and then copy and paste it into R.

8 Your Assignment

Introduction

The submitted work for this practical is a short report alongside a series of maps that you will generate in R. The total word count for the report is 500 words, and it is expected that you will include at least 5 maps that you have generated in R. You should make reference to the wider literature in your report and cite appropriately - including a formatted reference list (the list is not included in the word count). If you include data tables in your report, these also do not contribute to the total word count. Please do not include any appendix.

You are free to layout your report as you wish, however all maps or other graphics must be appropriately referenced in the text – e.g. see Figure 1 etc.

Data

If you have followed the tutorials in this practical you will have now created a SpatialPolygonsDataFrame for the Liverpool local authority that contains the following variables:

Estimated percentage and total population change between 2001 and 2011:

Total Population
0-14 Years Old
15-29 Years Old
30-44 Years Old
45-64 Years Old
65+ Years Old

Report Details and Structure

With reference to the evidence you create from the above data, further data sources that you integrate independently into R, and wider reading, write a report that will address the following questions:

With reference to your population maps, identify the area of Liverpool with the highest population increase; discussing those potential processes that may have influenced these changes.
With reference to your population maps, identify the area of Liverpool with the highest population decrease; discussing those potential processes that may have influenced these changes.

Through this report you should build a narrative using maps (or any other data visualization you deem appropriate) to hypothesis about the possible causes and impacts of the change that you observe to have occurred in your study area since the 2001 Census. Your marks will be linked to both the quality and clarity of your answers to the above questions and the visualizations you create.

Tips…(how to get a better mark!)

Look at the GIS mark scheme to get the basics right – e.g. north arrow, legend, scale bar etc
Do not print out the R code in your report
Ensure that your written text is clear and that the English is grammatically correct
Include other visualization - R is very flexible and the web is full of reproducible examples

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en. Based upon Detecting Neighbourhood Change by Alex Singleton.