In this project you will compile a number of measures that can be used to assess the extent that small areas have changed in their composition over a 10 year period. You will explore data about changes in both built structures and demographics, and through the representations you create, formulate plausible hypotheses about why the patterns that you uncover may have occurred.
Through this work you will also learn a series of fundamental skills for the input, manipulation, analysis and display of spatial data using the statistical programming language R.
At this stage it might be worth looking at the end of this practical for the assignment questions as you can keep these in mind as you run through the practical.
Before we begin the practical, there are a couple of things to learn…
Firstly, what is a CSV file, and how can we get them into R? These files will usually have the file extension “.csv” and are typically opened in a spreadsheet software package.
Supposing we want to create a data set that contains a list of people who have come along to an R training workshop. This might look something like this:
| Name | Age | Place |
|---|---|---|
| John | 20 | Liverpool |
| Helen | 10 | London |
| Mia | 16 | Liverpool |
| Carl | 21 | Manchester |
Create yourself a project directory in your storage area, and then Open R. We need to tell R that this is where you will be storing all your files by running the setwd() function.
You need to paste the location of your working directory into the setwd() function between the “”. Please note, that on Windows you need to make sure the slashes between the directories are formatted as either / or .
# For example...
setwd("M:/R work/Neighbourhood_Change/")
Open Notepad and create a new text file, entering the following details as separate lines, with each column in the table separated by a comma, and each row on a new line (please note that the first row contains our desired headings):
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester
Remember to add an extra blank line at the end of the file. Save the text file as “test.csv” in the folder you created earlier in your storage area.
Next we will read in the CSV you created above.
# First create a new variable with the location of your csv file - you only
# need the file name as by default R looks in the working directory
file_location <- "test.csv"
# Next we will read this into a new data frame called test.
test <- read.csv(file_location, header = TRUE)
# Now print test
test
Name Age Place
1 John 20 Liverpool
2 Helen 10 London
3 Mia 16 Liverpool
4 Carl 21 Manchester
Next, duplicate test.csv in your working directory, renaming this to test2.csv and opening it in Notepad. Add the following to the top of the file: “Here is some example text which we will use to demonstrate how to skip lines” and on another line: “Here is another line of text” so it looks as follows:
Here is some example text which we will use to demonstrate how to skip lines
Here is another line of text
Name,Age,Place
John,20,Liverpool
Helen,10,London
Mia,16,Liverpool
Carl,21,Manchester
Save the file and close it.
Next we will import this new .csv into R
# Update the file_location object with the location of the new file
file_location <- "test2.csv"
We have to add an extra parameter to the read.csv function which specifies that the data we are interested in are located after line 2.
test2 <- read.csv(file_location, header = TRUE, skip = 2)
If you print the results of the object test2 (i.e. type test2 in to the terminal!) you will see this is now the same as test, because you skipped the two junk lines.
Next we will illustrate how we can create a subset of a data frame, add a column and calculate a new value. To create a subset of the test2 data frame, we can use the subset function - this can select certain rows, columns or a combination of the two.
For example…
# Select the name and age columns
subset(test, select = c("Name", "Age"))
Name Age
1 John 20
2 Helen 10
3 Mia 16
4 Carl 21
# Select rows where the place is Liverpool
subset(test, Place == "Liverpool")
Name Age Place
1 John 20 Liverpool
3 Mia 16 Liverpool
# Select the name and age columns, and where Place is Liverpool
subset(test, Place == "Liverpool", select = c("Name", "Age"))
Name Age
1 John 20
3 Mia 16
We will now repeat the last step, however store the results as a new object. You will see that nothing prints to the terminal.
# Select the name and age columns, however, only where Place is equal to
# Liverpool (in R we need to use a double equals sign)
test3 <- subset(test, Place == "Liverpool", select = c("Name", "Age"))
We can add a new column called “diff100” to the data frame object test3. We do this using the $ symbol. At first, we will make the values of the column NA (i.e. no value)
test3$diff100 <- NA
If you print test3, you will now see the new column which will contain the values NA. However, what we actually want to do is find out how different each of the ages are to 100 (i.e. 100 - age). In order to create this calculation we need to refer to the age column in the data frame. We can use the $ again to achieve this. For example…
test3$diff100 <- 100 - test3$Age
# print test3
test3
Name Age diff100
1 John 20 80
3 Mia 16 84
When printing variables, you can use square brackets to specify either rows, columns or both. For example:
test3[1, 1]
[1] John
Levels: Carl Helen John Mia
test3[, 1]
[1] John Mia
Levels: Carl Helen John Mia
Additionally, you can use colnames to give you a list of columns within a data frame:
colnames(test3)
[1] "Name" "Age" "diff100"
and you can use:
test3$diff100 <- NULL
to delete specific columns.
OK, using what you have learnt in this past section, you are now ready to play with some real data!
A couple of other important things! If you want to see the objects that you have created in the R workspace, you can use the ls() function
# List objects in the workspace
ls()
[1] "file_location" "test" "test2" "test3"
If you want to save your workspace - i.e. so you can open it up later on (like saving your work!); you can use the save.image function. This performs the same as clicking on File > Save Workspace.
# This will save all objects currently loaded in the workspace
save.image(file = "myfile.RData")
After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.
# Remember that you will need to set your working directory before trying to
# load - otherwise R will not know where to find you RData file
load("myfile.RData")
You can also load and save the workspace using the File menu at the top of the window.
One other command that may be useful to know is how to access the help files. If you want help on a specific function, type ?, then the function name:
?load
This will provide more information on the function. If you don't know what a function is called, type ??, then whatever it is that the function does:
??open
This will search the help files for the term you specified (open in this case).
In this section we will examine the use of census data to generate statistics that enable you to detect and hypothesise about potential changes in neighbourhood structure occurring at small geographic scales. We will examine indicators of social change by comparing the 2001 Census data to population estimates derived by the Office for National Statistics.
The first stage is to download 2001 Census data. Although there are other sources of these data, in this example we will use the http://www.neighbourhood.statistics.gov.uk website.
Navigate to the site and select the “Topics” link on the bottom left hand side of the page. On the next screen, click on the “Census” link, and then click “2001 Census: Census Area Statistics”. Then click the radio button that corresponds to “Age, 2001 (UV04)” and press the “Next” button at the bottom right hand side of the page.
Then click the related radio button for “Download” and press the “Next” button on the bottom right hand side of the page. Select the radio button for the 2001 Statistical Geography Hierarchy (North West). Press the “Next” button on the bottom right hand side of the page, and on the following page, download the CSV file by clicking on the “Comma separated values [*.csv]”. This will be zipped, and once you have extracted the directory you will find it contains CSV files for a variety of different geographies. You are interested in the Lower Super Output Area level file which will be named “UV040301_92_GeoPolicy_NW_LSOA.CSV”. Copy this to the working directory you defined earlier.
If you open this file up in Excel you will again see that there are a series of lines at the top which do not contain the data or headings for the columns. As such, you need to:
pop_2001.This should then look like (note, only the first 6 rows and 10 columns are shown here):
# Using the square brackets we can view specific parts of a data frame-
# data[rows,columns]
pop_2001[1:6, 1:10]
LSOA_CODE DATA_VALUE DATA_VALUE.1 DATA_VALUE.2 DATA_VALUE.3
1746 E01006511 1026 7 10 8
1747 E01006512 1454 8 11 10
1748 E01006513 2077 4 5 6
1749 E01006514 1432 8 7 10
1750 E01006515 1156 6 15 12
1751 E01006516 1412 12 10 12
DATA_VALUE.4 DATA_VALUE.5 DATA_VALUE.6 DATA_VALUE.7 DATA_VALUE.8
1746 4 9 5 4 7
1747 9 10 8 13 7
1748 3 3 6 0 5
1749 7 8 6 6 5
1750 12 11 14 10 8
1751 8 7 7 4 4
You now need to rename the column headings of the data frame into something which are easier to interpret. There are obviously a lot of columns, and it would take a very long time to type out a list of vectors to use as the new names… However, there are some shortcuts. If you look at the source CSV file, you will see that the data columns represent: the total population, those under 1, then a series of columns for those aged 1 through to 74; and finally, those aged 75-79, 80-84, 85-89, 90-94, 95-99 and 100 or greater.
The main block of columns are the individual age bands; which we can create using the paste() function - this concatenates (or joins together) vectors (separated by a comma) together into characters. We also add sep="" to indicate that we do not want any spaces putting between the new concatenated elements of the list.
years <- paste("YR_2001_", 1:74, sep = "")
We now need to add the remaining elements to the list which we will do in two stages. One will be added before the elements of the “years” list (pre), and the other afterwards (post)
pre <- c("LSOA", "YR_2001_TotPop", "YR_2001_Less1")
post <- c("YR_2001_75_79", "YR_2001_80_84", "YR_2001_85_89", "YR_2001_90_94",
"YR_2001_95_99", "YR_2001_100plus")
# Print the lists to see what they look like
pre
[1] "LSOA" "YR_2001_TotPop" "YR_2001_Less1"
years
[1] "YR_2001_1" "YR_2001_2" "YR_2001_3" "YR_2001_4" "YR_2001_5"
[6] "YR_2001_6" "YR_2001_7" "YR_2001_8" "YR_2001_9" "YR_2001_10"
[11] "YR_2001_11" "YR_2001_12" "YR_2001_13" "YR_2001_14" "YR_2001_15"
[16] "YR_2001_16" "YR_2001_17" "YR_2001_18" "YR_2001_19" "YR_2001_20"
[21] "YR_2001_21" "YR_2001_22" "YR_2001_23" "YR_2001_24" "YR_2001_25"
[26] "YR_2001_26" "YR_2001_27" "YR_2001_28" "YR_2001_29" "YR_2001_30"
[31] "YR_2001_31" "YR_2001_32" "YR_2001_33" "YR_2001_34" "YR_2001_35"
[36] "YR_2001_36" "YR_2001_37" "YR_2001_38" "YR_2001_39" "YR_2001_40"
[41] "YR_2001_41" "YR_2001_42" "YR_2001_43" "YR_2001_44" "YR_2001_45"
[46] "YR_2001_46" "YR_2001_47" "YR_2001_48" "YR_2001_49" "YR_2001_50"
[51] "YR_2001_51" "YR_2001_52" "YR_2001_53" "YR_2001_54" "YR_2001_55"
[56] "YR_2001_56" "YR_2001_57" "YR_2001_58" "YR_2001_59" "YR_2001_60"
[61] "YR_2001_61" "YR_2001_62" "YR_2001_63" "YR_2001_64" "YR_2001_65"
[66] "YR_2001_66" "YR_2001_67" "YR_2001_68" "YR_2001_69" "YR_2001_70"
[71] "YR_2001_71" "YR_2001_72" "YR_2001_73" "YR_2001_74"
post
[1] "YR_2001_75_79" "YR_2001_80_84" "YR_2001_85_89" "YR_2001_90_94"
[5] "YR_2001_95_99" "YR_2001_100plus"
The list objects that you have just created now need to be concatenated which can be achieved using the c() function.
# Concatenate the new column names
new_column_names <- c(pre, years, post)
# print the names
new_column_names
[1] "LSOA" "YR_2001_TotPop" "YR_2001_Less1"
[4] "YR_2001_1" "YR_2001_2" "YR_2001_3"
[7] "YR_2001_4" "YR_2001_5" "YR_2001_6"
[10] "YR_2001_7" "YR_2001_8" "YR_2001_9"
[13] "YR_2001_10" "YR_2001_11" "YR_2001_12"
[16] "YR_2001_13" "YR_2001_14" "YR_2001_15"
[19] "YR_2001_16" "YR_2001_17" "YR_2001_18"
[22] "YR_2001_19" "YR_2001_20" "YR_2001_21"
[25] "YR_2001_22" "YR_2001_23" "YR_2001_24"
[28] "YR_2001_25" "YR_2001_26" "YR_2001_27"
[31] "YR_2001_28" "YR_2001_29" "YR_2001_30"
[34] "YR_2001_31" "YR_2001_32" "YR_2001_33"
[37] "YR_2001_34" "YR_2001_35" "YR_2001_36"
[40] "YR_2001_37" "YR_2001_38" "YR_2001_39"
[43] "YR_2001_40" "YR_2001_41" "YR_2001_42"
[46] "YR_2001_43" "YR_2001_44" "YR_2001_45"
[49] "YR_2001_46" "YR_2001_47" "YR_2001_48"
[52] "YR_2001_49" "YR_2001_50" "YR_2001_51"
[55] "YR_2001_52" "YR_2001_53" "YR_2001_54"
[58] "YR_2001_55" "YR_2001_56" "YR_2001_57"
[61] "YR_2001_58" "YR_2001_59" "YR_2001_60"
[64] "YR_2001_61" "YR_2001_62" "YR_2001_63"
[67] "YR_2001_64" "YR_2001_65" "YR_2001_66"
[70] "YR_2001_67" "YR_2001_68" "YR_2001_69"
[73] "YR_2001_70" "YR_2001_71" "YR_2001_72"
[76] "YR_2001_73" "YR_2001_74" "YR_2001_75_79"
[79] "YR_2001_80_84" "YR_2001_85_89" "YR_2001_90_94"
[82] "YR_2001_95_99" "YR_2001_100plus"
You can now apply these to the columns of pop_2001
# Add the new column names
colnames(pop_2001) <- new_column_names
This will have created a data frame containing the 2001 population data that will look like…
# Show first 6 rows and 10 columns
pop_2001[1:6, 1:10]
LSOA YR_2001_TotPop YR_2001_Less1 YR_2001_1 YR_2001_2 YR_2001_3
1746 E01006511 1026 7 10 8 4
1747 E01006512 1454 8 11 10 9
1748 E01006513 2077 4 5 6 3
1749 E01006514 1432 8 7 10 7
1750 E01006515 1156 6 15 12 12
1751 E01006516 1412 12 10 12 8
YR_2001_4 YR_2001_5 YR_2001_6 YR_2001_7
1746 9 5 4 7
1747 10 8 13 7
1748 3 6 0 5
1749 8 6 6 5
1750 11 14 10 8
1751 7 7 4 4
Later on we are going to merge the 2001 and 2011 data together into a new data frame. However, before doing this, we first need to create a new version of the 2001 data with the same age bands as the 2011 data. You need to create new columns in the pop_2001 data for:
The numbers refer to the years. There are a number of ways to achieve this.
# Add manually
pop_2001$YR_2001_0_14 <- pop_2001$YR_2001_Less1 + pop_2001$YR_2001_1 + pop_2001$YR_2001_2 +
pop_2001$YR_2001_3 + pop_2001$YR_2001_4 + pop_2001$YR_2001_5 + pop_2001$YR_2001_6 +
pop_2001$YR_2001_7 + pop_2001$YR_2001_8 + pop_2001$YR_2001_9 + pop_2001$YR_2001_10 +
pop_2001$YR_2001_11 + pop_2001$YR_2001_12 + pop_2001$YR_2001_13 + pop_2001$YR_2001_14
# Add as a range
pop_2001$YR_2001_15_29 <- rowSums(pop_2001[, 18:32])
The above code has created the aggregations for YR_2001_0_14 and YR_2001_15_29. You now need to add the following new columns to pop_2001 data frame and with the appropriate age group aggregations:
Hint: The easiest way to do it is to use the 'Add as a range' option from above, and to alter the field names and column numbers as required.
Next, create a subset of the 2001 data called pop_2001_V2 that contains the LSOA code, the total population and the newly created variables (YR_2001_0_14 YR_2001_15_29 YR_2001_30_44 YR_2001_45_64 and YR_2001_65plus)
The top 6 rows of your new data frame should look like:
head(pop_2001_V2)
LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1746 E01006511 1026 115 274 245
1747 E01006512 1454 119 935 244
1748 E01006513 2077 81 1691 149
1749 E01006514 1432 102 681 358
1750 E01006515 1156 187 278 228
1751 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus
1746 246 146
1747 116 40
1748 95 61
1749 188 103
1750 252 211
1751 216 110
We are going to compare the population from the 2001 census with population data from the 2011 census. Unfortunately, 21 of the 291 LSOAs in Liverpool changed between the 2001 and 2011 census. Fortunately, ONS provide a set of estimated population data for 2011 split into the 2001 LSOAs. It won't be as accurate as the 2011 census, but the geographies will match so we can see how the population has changed between 2001 and 2011.
Download the “Supporting Information for Lower Super Output Area Population Estimates - Mid-2011 (Rolled Forward Estimates)” entry from http://www.ons.gov.uk/ons/datasets-and-tables/index.html?pageSize=25&sortBy=none&sortDirection=none&newquery=Supporting+Information+for+Lower+Super+Output+Area+Population+Estimates+-+Mid-2011+%28Rolled+Forward+Estimates%29 - it is the second in the list.
As before this is a zipped file. Follow the same procedure as before to extract it. Unfortunately, this isn't a csv file, it is stored in Microsoft Excel. You will need to use Excel to extract the data in CSV format before you read it into R. So, open it in Excel and go to the tab marked mid11_rolled_forward. Save this file under the new name pop2011.csv - select csv as the format to save the file in your working directory. You can now read this into R.
If you do head(pop_2011) you can see that there are many different age catagories, in addition to the LSOA Code and the MSOA Code fields.
head(pop_2011)
## LSOA.Code MSOA.Code M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14
## 1 E01000001 E02000001 7 4 4 4 3 3 1 1 6 6 4 2 2 4 5
## 2 E01000002 E02000001 4 8 2 5 5 4 11 3 2 2 3 3 10 8 2
## 3 E01000003 E02000001 9 6 3 6 12 1 2 5 4 6 2 11 5 5 4
## 4 E01000004 E02000001 10 7 5 6 5 10 4 6 16 27 7 13 14 9 3
## 5 E01000005 E02000001 9 6 3 7 2 9 8 11 11 7 7 12 9 9 5
## 6 E01000027 E02000002 12 16 13 8 11 9 13 16 12 14 9 8 12 11 17
## M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32
## 1 4 4 4 2 2 6 2 12 11 12 18 22 35 30 29 23 19 40
## 2 2 5 3 6 1 7 4 14 6 2 11 4 20 18 11 18 26 19
## 3 4 3 21 26 24 18 25 25 41 28 29 17 32 34 34 30 43 27
## 4 0 0 0 0 0 7 15 32 47 86 99 87 106 102 105 93 81 87
## 5 13 11 8 7 17 11 11 10 12 24 23 40 4 38 41 46 16 27
## 6 15 11 7 14 9 17 9 11 12 7 12 7 12 6 12 9 9 7
## M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50
## 1 34 38 50 40 34 24 17 19 14 23 13 17 17 19 18 17 19 22
## 2 33 17 28 22 21 20 30 17 14 19 36 17 24 11 15 10 17 23
## 3 38 53 40 48 39 25 34 28 16 21 20 18 21 19 11 11 16 10
## 4 66 65 77 58 41 43 33 43 35 24 37 26 40 44 36 25 20 15
## 5 36 20 25 20 20 8 12 14 11 15 20 23 12 6 11 9 17 3
## 6 10 11 5 15 5 15 9 6 8 6 11 8 9 8 8 6 6 9
## M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 M67 M68
## 1 19 19 24 16 14 14 10 19 18 42 19 12 32 31 15 18 14 7
## 2 23 21 34 29 19 20 19 14 18 18 13 12 23 16 18 22 26 19
## 3 4 12 17 9 11 16 5 11 8 21 7 13 8 14 14 4 8 12
## 4 18 16 27 23 22 12 15 18 14 28 6 8 5 5 8 12 4 6
## 5 4 2 5 3 8 6 8 10 5 8 6 10 7 14 3 6 4 4
## 6 7 9 9 6 5 8 4 6 8 3 3 2 2 4 6 6 8 1
## M69 M70 M71 M72 M73 M74 M75 M76 M77 M78 M79 M80 M81 M82 M83 M84 M85 M86
## 1 4 8 6 7 7 2 9 10 6 3 11 12 5 4 10 7 3 2
## 2 13 9 10 7 16 5 7 6 4 1 3 5 3 1 4 2 1 7
## 3 13 1 4 7 6 1 4 2 5 1 8 3 1 1 7 1 5 1
## 4 8 3 5 1 5 3 1 1 1 2 3 1 0 1 1 1 2 1
## 5 4 7 3 11 1 1 6 1 4 3 2 1 0 0 0 0 0 0
## 6 4 3 5 4 6 2 5 1 4 3 7 3 5 4 3 2 8 4
## M87 M88 M89 M90plus F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
## 1 1 0 1 8 5 5 2 4 5 4 8 4 5 2 4 9 3 2 3
## 2 1 0 1 4 10 15 7 4 7 4 9 16 6 8 6 5 2 4 4
## 3 1 0 1 9 6 5 9 8 6 3 3 3 3 4 6 4 2 3 1
## 4 1 0 1 6 5 4 3 0 0 3 2 2 6 3 3 1 5 1 1
## 5 0 0 0 4 6 6 2 5 6 1 4 6 2 6 6 10 7 6 13
## 6 1 1 2 2 9 11 13 16 12 18 14 11 11 7 11 14 22 19 9
## F15 F16 F17 F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32
## 1 7 7 7 7 11 4 5 22 11 32 24 41 39 40 39 28 32 32
## 2 5 3 2 3 4 4 6 5 14 9 14 28 26 19 17 22 21 13
## 3 1 7 16 7 27 35 30 35 30 37 25 28 33 35 37 54 20 43
## 4 1 2 2 4 7 4 4 10 39 29 61 53 57 54 65 63 60 66
## 5 6 6 5 9 16 8 19 12 38 34 33 26 30 16 42 36 20 19
## 6 14 10 11 13 12 6 12 8 8 9 14 21 14 10 9 11 8 11
## F33 F34 F35 F36 F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50
## 1 19 37 21 18 23 23 16 33 22 29 12 13 17 19 19 10 15 15
## 2 13 22 32 30 21 26 40 22 23 31 24 24 17 20 16 13 9 13
## 3 31 35 20 19 16 17 19 14 9 18 15 18 6 14 13 12 10 15
## 4 39 34 22 17 19 18 28 16 19 17 12 15 17 14 14 8 7 3
## 5 12 21 10 7 5 12 6 16 20 19 13 8 7 8 11 7 9 10
## 6 12 13 12 8 12 9 16 15 9 12 12 14 19 18 9 9 20 9
## F51 F52 F53 F54 F55 F56 F57 F58 F59 F60 F61 F62 F63 F64 F65 F66 F67 F68
## 1 13 9 17 12 22 13 11 12 18 12 18 10 15 11 19 13 15 19
## 2 7 5 18 21 14 12 14 9 13 10 15 16 12 12 9 18 11 12
## 3 6 9 14 5 13 14 9 6 10 5 16 10 9 6 8 5 4 5
## 4 7 12 13 7 6 10 7 9 7 6 13 16 12 10 5 5 4 2
## 5 7 5 12 4 12 7 8 5 4 7 3 6 16 4 0 7 2 1
## 6 13 15 6 10 3 6 6 3 11 4 7 3 4 12 4 8 4 3
## F69 F70 F71 F72 F73 F74 F75 F76 F77 F78 F79 F80 F81 F82 F83 F84 F85 F86
## 1 9 10 4 8 5 5 16 8 11 15 12 7 4 7 9 2 0 1
## 2 8 9 12 8 8 7 7 6 6 1 7 3 4 5 1 1 1 3
## 3 5 9 9 6 3 2 7 7 7 8 6 5 10 8 8 6 8 11
## 4 4 6 4 2 6 1 1 5 1 2 1 0 0 0 0 0 1 2
## 5 2 4 5 3 1 1 5 3 0 2 1 0 3 0 1 1 2 1
## 6 4 3 3 3 8 9 4 5 7 7 6 12 8 7 6 7 1 7
## F87 F88 F89 F90plus
## 1 0 0 0 5
## 2 6 1 2 8
## 3 4 4 1 23
## 4 1 1 1 4
## 5 1 0 1 5
## 6 5 3 1 8
The MSOA Code field is for a different type of geography, which we don't need so we can remove it:
pop_2011$MSOA.Code <- NULL
We need to merge the data into the five age categories we want, as well as creating a total category, like we did with the 2011 data. However this time, the data are split up into male and female, so the combinations are a little more complex. To get you started:
# Add as a range
pop_2011$YR_2011_TotPop <- rowSums(pop_2011[, 2:183])
pop_2011$YR_2011_0_14 <- rowSums(pop_2011[, c(2:16, 93:107)])
pop_2011$YR_2011_15_29 <- rowSums(pop_2011[, c(17:31, 108:122)])
Note the slightly different format of the column numbers - we need to use c(17:31,108:122) because we are specifying multiple ranges - i.e. columns 17 to 31 (for males) and 108 to 122 (for females) in this case. If you need to check which column numbers you need, you can use the colnames function to work out which columns you need:
colnames(pop_2011)
## [1] "LSOA.Code" "M0" "M1" "M2"
## [5] "M3" "M4" "M5" "M6"
## [9] "M7" "M8" "M9" "M10"
## [13] "M11" "M12" "M13" "M14"
## [17] "M15" "M16" "M17" "M18"
## [21] "M19" "M20" "M21" "M22"
## [25] "M23" "M24" "M25" "M26"
## [29] "M27" "M28" "M29" "M30"
## [33] "M31" "M32" "M33" "M34"
## [37] "M35" "M36" "M37" "M38"
## [41] "M39" "M40" "M41" "M42"
## [45] "M43" "M44" "M45" "M46"
## [49] "M47" "M48" "M49" "M50"
## [53] "M51" "M52" "M53" "M54"
## [57] "M55" "M56" "M57" "M58"
## [61] "M59" "M60" "M61" "M62"
## [65] "M63" "M64" "M65" "M66"
## [69] "M67" "M68" "M69" "M70"
## [73] "M71" "M72" "M73" "M74"
## [77] "M75" "M76" "M77" "M78"
## [81] "M79" "M80" "M81" "M82"
## [85] "M83" "M84" "M85" "M86"
## [89] "M87" "M88" "M89" "M90plus"
## [93] "F0" "F1" "F2" "F3"
## [97] "F4" "F5" "F6" "F7"
## [101] "F8" "F9" "F10" "F11"
## [105] "F12" "F13" "F14" "F15"
## [109] "F16" "F17" "F18" "F19"
## [113] "F20" "F21" "F22" "F23"
## [117] "F24" "F25" "F26" "F27"
## [121] "F28" "F29" "F30" "F31"
## [125] "F32" "F33" "F34" "F35"
## [129] "F36" "F37" "F38" "F39"
## [133] "F40" "F41" "F42" "F43"
## [137] "F44" "F45" "F46" "F47"
## [141] "F48" "F49" "F50" "F51"
## [145] "F52" "F53" "F54" "F55"
## [149] "F56" "F57" "F58" "F59"
## [153] "F60" "F61" "F62" "F63"
## [157] "F64" "F65" "F66" "F67"
## [161] "F68" "F69" "F70" "F71"
## [165] "F72" "F73" "F74" "F75"
## [169] "F76" "F77" "F78" "F79"
## [173] "F80" "F81" "F82" "F83"
## [177] "F84" "F85" "F86" "F87"
## [181] "F88" "F89" "F90plus" "YR_2011_TotPop"
## [185] "YR_2011_0_14" "YR_2011_15_29"
Therefore you can see above, that column 32 ("M30") contains the number of males ages 30 (4 along from the row starting with [29]).
The above code has created the aggregations for YR_2011_0_14 and YR_2011_15_29. You now need to add the following new columns to pop_2011 data frame and with the appropriate age group aggregations:
The first row of your data should look like this:
pop_2011[1, ]
## LSOA.Code M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17
## 1 E01000001 7 4 4 4 3 3 1 1 6 6 4 2 2 4 5 4 4 4
## M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35
## 1 2 2 6 2 12 11 12 18 22 35 30 29 23 19 40 34 38 50
## M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53
## 1 40 34 24 17 19 14 23 13 17 17 19 18 17 19 22 19 19 24
## M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 M67 M68 M69 M70 M71
## 1 16 14 14 10 19 18 42 19 12 32 31 15 18 14 7 4 8 6
## M72 M73 M74 M75 M76 M77 M78 M79 M80 M81 M82 M83 M84 M85 M86 M87 M88 M89
## 1 7 7 2 9 10 6 3 11 12 5 4 10 7 3 2 1 0 1
## M90plus F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17
## 1 8 5 5 2 4 5 4 8 4 5 2 4 9 3 2 3 7 7 7
## F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32 F33 F34 F35
## 1 7 11 4 5 22 11 32 24 41 39 40 39 28 32 32 19 37 21
## F36 F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50 F51 F52 F53
## 1 18 23 23 16 33 22 29 12 13 17 19 19 10 15 15 13 9 17
## F54 F55 F56 F57 F58 F59 F60 F61 F62 F63 F64 F65 F66 F67 F68 F69 F70 F71
## 1 12 22 13 11 12 18 12 18 10 15 11 19 13 15 19 9 10 4
## F72 F73 F74 F75 F76 F77 F78 F79 F80 F81 F82 F83 F84 F85 F86 F87 F88 F89
## 1 8 5 5 16 8 11 15 12 7 4 7 9 2 0 1 0 0 0
## F90plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29 YR_2011_30_44
## 1 5 2446 121 489 763
## YR_2011_45_64 YR_2011_65plus
## 1 689 384
We also need to rename the LSOA column to correctly reflect its contents:
# Rename column 'LSOA.Code' to 'LSOA'
names(pop_2011)[names(pop_2011) == "LSOA.Code"] <- "LSOA"
As before, create a subset of the 2011 data called pop_2011_V2 that contains the LSOA code, the total population and the newly created variables (YR_2011_0_14 YR_2011_15_29 YR_2011_30_44 YR_2011_45_64 and YR_2011_65plus)
The head of your pop_2011_V2 data frame should look like this:
head(pop_2011_V2)
## LSOA YR_2011_TotPop YR_2011_0_14 YR_2011_15_29 YR_2011_30_44
## 1 E01000001 2446 121 489 763
## 2 E01000002 2157 179 273 701
## 3 E01000003 2464 147 744 828
## 4 E01000004 3245 181 1078 1254
## 5 E01000005 1731 201 570 537
## 6 E01000027 1570 378 332 308
## YR_2011_45_64 YR_2011_65plus
## 1 689 384
## 2 645 359
## 3 446 299
## 4 595 137
## 5 306 117
## 6 309 243
You will now have two population data frames; the first for 2001 which contains all LSOA in Liverpool (291 rows), and a second which contains all LSOA in England (34,378 rows). You might want to check if your data frame objects have the same number of rows at this stage.
nrow(pop_2001_V2)
[1] 291
nrow(pop_2011_V2)
[1] 34378
We now need to create a new data frame that combines both the 2001 and 2011 data. One way of achieving this is to use the merge() function. However before we do this, we will first explore exactly what the merge function does.
We will use the data frame you created right at the start of the tutorial. If you have forgotten what this looked like, you can always print the object in the terminal.
test
Name Age Place
1 John 20 Liverpool
2 Helen 10 London
3 Mia 16 Liverpool
4 Carl 21 Manchester
You now need to create another data frame which we will use as an example. You could create another CSV file and import this; however, we will illustrate another way of achieving this through the amalgamation of a series of vector lists.
# Create a person vector
Person <- c("Paul", "Mike", "John", "Helen", "Mia", "Leo", "Carl")
# Create a favourite colours vector
Colour <- c("blue", "green", "red", "yellow", "purple", "orange", "black")
# We can now join these two vectors into a new data frame of favourite
# colours
fav_colour <- data.frame(Person, Colour)
# View the fav_colour
fav_colour
Person Colour
1 Paul blue
2 Mike green
3 John red
4 Helen yellow
5 Mia purple
6 Leo orange
7 Carl black
We now have two tables, the first listing a series of people who attended a current R training workshop, and the second containing these current people, and those additional people who have been to previous workshops; along with their favourite colours!
What we want to do is append the favourite colours onto our list of current course attendees. We will refer to the two data frames as x and y. The x data frame is test; and the y is fav_colour. In x, the column containing the list of people is called “Name”, and in y, it is called “Person”. The parameters of the merge function first accept the two table names, and then the lookup columns as by.x or by.y. You should also include all.x=TRUE as a final parameter. This tells the function to keep all the records in x, but only those in y which match. As an experiment you can change this to all.y=TRUE - what you will find is that all the y records are now in the data frame, however, because there were no matches in x, there are NA values present.
We will now create a new data frame called People_And_Colours and print this to the terminal.
People_And_Colours <- merge(test, fav_colour, by.x = "Name", by.y = "Person",
all.x = TRUE)
People_And_Colours
Name Age Place Colour
1 Carl 21 Manchester black
2 Helen 10 London yellow
3 John 20 Liverpool red
4 Mia 16 Liverpool purple
If the by columns were the same in both x and y, we can specify this simply with by="column name" rather than by.x and by.y; and finally, a critical issue when making any join is assuring that the “by” columns are in the same format.
The pop_2001_V2 data frame should now be joined to pop_2011_V2. You will need to specify a merge that:
pop_2001_2011pop_2001_V2 to pop_2011_V2LSOA from both pop_2001_V2 and pop_2011_V2pop_2001 data frame (i.e. the ones for Liverpool, all.x=TRUE)If you have done the join correctly, your new data frame should have 291 rows and the top six rows should look like…
# Number of rows
nrow(pop_2001_2011)
[1] 291
# Show top 6 records
head(pop_2001_2011)
LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511 1026 115 274 245
2 E01006512 1454 119 935 244
3 E01006513 2077 81 1691 149
4 E01006514 1432 102 681 358
5 E01006515 1156 187 278 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1 246 146 2697 198 1256
2 116 40 2463 180 1713
3 95 61 2891 82 2445
4 188 103 1275 105 489
5 252 211 1066 133 331
6 216 110 1400 81 603
YR_2011_30_44 YR_2011_45_64 YR_2011_65plus
1 713 369 161
2 329 200 41
3 190 100 74
4 275 286 120
5 229 223 150
6 332 283 101
You will now calculate both absolute and percentage change statistics between 2001 and 2011 for the different age bands. The following example illustrates this for the total population, however, you will need to complete this for each age band. Create new variables in the format shown:
# Total Population (TC - Total Change ; PC - Percent Change)
pop_2001_2011$TotPop_TC <- pop_2001_2011$YR_2011_TotPop - pop_2001_2011$YR_2001_TotPop
pop_2001_2011$TotPop_PC <- (pop_2001_2011$YR_2011_TotPop - pop_2001_2011$YR_2001_TotPop)/pop_2001_2011$YR_2001_TotPop *
100
# 0-14 Years (TC - Total Change ; PC - Percent Change)
pop_2001_2011$YR_0_14_TC <- pop_2001_2011$YR_2011_0_14 - pop_2001_2011$YR_2001_0_14
pop_2001_2011$YR_0_14_PC <- (pop_2001_2011$YR_2011_0_14 - pop_2001_2011$YR_2001_0_14)/pop_2001_2011$YR_2001_0_14 *
100
You now need to complete this for the other age bands, leaving you with a final data frame that looks like:
head(pop_2001_2011)
LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511 1026 115 274 245
2 E01006512 1454 119 935 244
3 E01006513 2077 81 1691 149
4 E01006514 1432 102 681 358
5 E01006515 1156 187 278 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1 246 146 2697 198 1256
2 116 40 2463 180 1713
3 95 61 2891 82 2445
4 188 103 1275 105 489
5 252 211 1066 133 331
6 216 110 1400 81 603
YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC TotPop_PC
1 713 369 161 1671 162.8655
2 329 200 41 1009 69.3948
3 190 100 74 814 39.1911
4 275 286 120 -157 -10.9637
5 229 223 150 -90 -7.7855
6 332 283 101 -12 -0.8499
YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC YR_30_44_PC
1 83 72.174 982 358.394 468 191.0204
2 61 51.261 778 83.209 85 34.8361
3 1 1.235 754 44.589 41 27.5168
4 3 2.941 -192 -28.194 -83 -23.1844
5 -54 -28.877 53 19.065 1 0.4386
6 -22 -21.359 6 1.005 -54 -13.9896
YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1 123 50.000 15 10.274
2 84 72.414 1 2.500
3 5 5.263 13 21.311
4 98 52.128 17 16.505
5 -29 -11.508 -61 -28.910
6 67 31.019 -9 -8.182
The pop_2001_2010 data frame will now be visualized on a choropleth map so you can explore areas of change. In order to do this, you will first need to read some polygon files into R.
When we talk about Shapefiles, we are actually referring to a number of files - each of these files has a different purpose, e.g. holding the geometry, storing the data or specifying the projection.
If you can't download the data from Edina, it is avaliable on Vital called england_low_soa_2001.shp.
Up until this point, all the functions you have been using are contained in the base R installation - i.e. they are present by default when you load R. However, one of the really great things about R is that these functions can be expanded by loading new packages that are created by the open source community. These packages add lots of exiting new functions to R.
We are going to now load “maptools” and “rgeos” which provides functions for reading and manipulating GIS data, along with some other libraries that they require.
# load sp
library(sp)
# load maptools
library(maptools)
# load rgeos
library(stringr)
library(rgeos)
We will now read in the LSOA data into a new type of object called a SpatialPolygonsDataFrame by supplying it with the location of the downloaded shapefiles. If the command generates an error message, check that the shapefile is in the correct location (in a folder called LSOA in your working directory).
LSOA <- readShapePoly("LSOA/england_low_soa_2001.shp")
We can see what a SpatialPolygonsDataFrame looks like using a simple plot() function. You might notice that the polygons nearest the River Mersey are quite large. The reason for this is that the official boundaries extend into the water - however, aesthetically, these are not that appealing. We can however tidy these up before we start mapping.
# A basic plot of the OA SpatialPolygonsDataFrame
plot(LSOA)
As such:
# Read in outline as a SpatialPolygonsDataFrame
outline <- readShapePoly("Outline/England_ol_2011_gen_clipped.shp")
# This looks like
plot(outline)
By default, when you call the plot() function, this overwrites the previous plot. However, there is a way that you can specify plots to be cumulative - i.e. add as a new layer. We do this in the following example to add the newly downloaded coastal outline on top of the LSOA by including add=TRUE. We have also made the border a red colour (border="red"), and the fill colour (col="#2C7FB820") a shade of blue. These represent two ways of specifying colours. The second contains eight alphanumerics, the first six relate to a HEX colour code. To view various colours which can be used in R, have a look here. The final two characters are the level of transparency (in this case 20%). Sometimes when running R in Windows, the transparency option will not work - it will just fill it with a solid colour. In this case, just remove the col = "#2C7FB820" section from the plot command to just generate a red outline.
# Plot the LSOA Map
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")
You will also note on the above plot that the full extent of the English coastal outline is not shown. This is because the plot window is initiated with the extent of the first plot (i.e. Liverpool). You could reverse the above commands (remembering to remove add=TRUE if outline is the first plot) to illustrate this effect.
SpatialPolygonsDataFrame objects are a little more complex than the standard data frame, and comprise of a number of different elements referred to as “slots”. We can view the slots of an object using the slotNames() function which which will print a list of all the available slots.
slotNames(LSOA)
[1] "data" "polygons" "plotOrder" "bbox" "proj4string"
The only slot we really need to worry about for this exercise is “data” which contains all the attribute data for each of the zones (i.e. the attributes contained in the DBF of the shapefile). We access this using an @ symbol, and this can then be treated in very similar way to a standard data frame. For example, we can look at the first 6 rows in the LSOA data slot.
head(LSOA@data)
gid geonorth popnorth geoeast name label popeast
0 458 381722 383625 342348 Liverpool 058B 04BYE01006739 342928
1 4350 384390 385002 338967 Liverpool 056D 04BYE01006687 339051
2 4351 383094 383388 343364 Liverpool 058C 04BYE01006741 343305
3 4352 384125 384509 341524 Liverpool 058D 04BYE01006743 340956
4 4370 385573 386586 337323 Liverpool 052C 04BYE01006528 337495
5 4371 385004 385692 338248 Liverpool 056B 04BYE01006684 338794
zonecode
0 E01006739
1 E01006687
2 E01006741
3 E01006743
4 E01006528
5 E01006684
We want to use the gIntersection() function to remove the elements of the LSOA polygons that overlap the River Mersey. However this takes quite a long time to run, so we have pre-processed this for you (if you want more details on the gIntersection() function, have a look at http://www.alex-singleton.com/R-Tutorial-Materials/6-basic-spatial-analysis.pdf. Run the code below to load in the already clippled version:
# Download file and unzip
download.file("http://www.nickbearman.me.uk/academic/201314_liverpool_envs257/LSOAclip.zip",
"LSOAclip.zip")
unzip("LSOAclip.zip")
# read in to R
LSOA <- readShapePoly("LSOAclip/england_low_soa_2001_clip.shp")
Then re-run the plot command:
# Replot the map with outline
plot(LSOA)
# Overplot the outline map
plot(outline, add = TRUE, border = "red", col = "#2C7FB820")
Next we will join the population data onto the the LSOA spatial polygons data frame. This has to be handled in a slightly different way to how you used the merge statement earlier. Because we know that both the population data frame, and the spatial data frame have the same number of zones (i.e. were both limited to just Liverpool) we can sort them alphabetically by their area ID, thus enabling us to just line the data up side by side.
# Order the LSOA spatial polygon data frame
LSOA <- LSOA[order(LSOA$zonecode), ]
# Order the pop_2001_2011 data frame
pop_2001_2011 <- pop_2001_2011[order(pop_2001_2011$LSOA), ]
If you then look at the first few rows of the above two objects, you will see by looking at their area identifiers that they are now in the same order. If the area identifiers (zonecode and LSOA) are not the same then something has gone wrong - go back and check what you have done.
head(LSOA@data)
gid geonorth popnorth geoeast name label popeast
8 4374 389683 389710 334377 Liverpool 033A 04BYE01006511 334623
67 8973 389734 389715 336154 Liverpool 031A 04BYE01006512 336203
68 8974 390061 389758 335536 Liverpool 033B 04BYE01006513 335346
63 8969 389484 389482 335525 Liverpool 037A 04BYE01006514 335582
61 8967 389196 389220 335117 Liverpool 037B 04BYE01006515 335214
64 8970 389504 389479 335913 Liverpool 037C 04BYE01006516 335897
zonecode
8 E01006511
67 E01006512
68 E01006513
63 E01006514
61 E01006515
64 E01006516
head(pop_2001_2011)
LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29 YR_2001_30_44
1 E01006511 1026 115 274 245
2 E01006512 1454 119 935 244
3 E01006513 2077 81 1691 149
4 E01006514 1432 102 681 358
5 E01006515 1156 187 278 228
6 E01006516 1412 103 597 386
YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14 YR_2011_15_29
1 246 146 2697 198 1256
2 116 40 2463 180 1713
3 95 61 2891 82 2445
4 188 103 1275 105 489
5 252 211 1066 133 331
6 216 110 1400 81 603
YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC TotPop_PC
1 713 369 161 1671 162.8655
2 329 200 41 1009 69.3948
3 190 100 74 814 39.1911
4 275 286 120 -157 -10.9637
5 229 223 150 -90 -7.7855
6 332 283 101 -12 -0.8499
YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC YR_30_44_PC
1 83 72.174 982 358.394 468 191.0204
2 61 51.261 778 83.209 85 34.8361
3 1 1.235 754 44.589 41 27.5168
4 3 2.941 -192 -28.194 -83 -23.1844
5 -54 -28.877 53 19.065 1 0.4386
6 -22 -21.359 6 1.005 -54 -13.9896
YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
1 123 50.000 15 10.274
2 84 72.414 1 2.500
3 5 5.263 13 21.311
4 98 52.128 17 16.505
5 -29 -11.508 -61 -28.910
6 67 31.019 -9 -8.182
We will now “column bind” the population data onto the data slot of the LSOA SpatialPolygonsDataFrame. You can do this using the cbind() function. The data objects are separated by commas. These are assigned (over written) onto the data slot of the SpatialPolygonsDataFrame. If you look at LSOA@data after running this code, you will see that the area codes are now lined up, and that the population data are present.
LSOA@data <- cbind(LSOA@data, pop_2001_2011)
If this has been completed correctly, the LSOA@data should look like…
head(LSOA@data)
gid geonorth popnorth geoeast name label popeast
8 4374 389683 389710 334377 Liverpool 033A 04BYE01006511 334623
67 8973 389734 389715 336154 Liverpool 031A 04BYE01006512 336203
68 8974 390061 389758 335536 Liverpool 033B 04BYE01006513 335346
63 8969 389484 389482 335525 Liverpool 037A 04BYE01006514 335582
61 8967 389196 389220 335117 Liverpool 037B 04BYE01006515 335214
64 8970 389504 389479 335913 Liverpool 037C 04BYE01006516 335897
zonecode LSOA YR_2001_TotPop YR_2001_0_14 YR_2001_15_29
8 E01006511 E01006511 1026 115 274
67 E01006512 E01006512 1454 119 935
68 E01006513 E01006513 2077 81 1691
63 E01006514 E01006514 1432 102 681
61 E01006515 E01006515 1156 187 278
64 E01006516 E01006516 1412 103 597
YR_2001_30_44 YR_2001_45_64 YR_2001_65plus YR_2011_TotPop YR_2011_0_14
8 245 246 146 2697 198
67 244 116 40 2463 180
68 149 95 61 2891 82
63 358 188 103 1275 105
61 228 252 211 1066 133
64 386 216 110 1400 81
YR_2011_15_29 YR_2011_30_44 YR_2011_45_64 YR_2011_65plus TotPop_TC
8 1256 713 369 161 1671
67 1713 329 200 41 1009
68 2445 190 100 74 814
63 489 275 286 120 -157
61 331 229 223 150 -90
64 603 332 283 101 -12
TotPop_PC YR_0_14_TC YR_0_14_PC YR_15_29_TC YR_15_29_PC YR_30_44_TC
8 162.8655 83 72.174 982 358.394 468
67 69.3948 61 51.261 778 83.209 85
68 39.1911 1 1.235 754 44.589 41
63 -10.9637 3 2.941 -192 -28.194 -83
61 -7.7855 -54 -28.877 53 19.065 1
64 -0.8499 -22 -21.359 6 1.005 -54
YR_30_44_PC YR_45_64_TC YR_45_64_PC YR_65plus_TC YR_65plus_PC
8 191.0204 123 50.000 15 10.274
67 34.8361 84 72.414 1 2.500
68 27.5168 5 5.263 13 21.311
63 -23.1844 98 52.128 17 16.505
61 0.4386 -29 -11.508 -61 -28.910
64 -13.9896 67 31.019 -9 -8.182
OK, the data is now ready so we are now able to make some maps! There are many different ways of doing this, however, we will use one of the simplest, and outline each step as we go along. In this example, we will make a map of the change in total population.
We need to load some more packages containing functions that will help us make some maps.
# load classInt
library("classInt")
library("RColorBrewer")
The first step is to find suitable breakpoints for the data contained in the TotPop_PC column. The contiuous data needs to be assigned into categories so different colours can be applied on a choropleth map. There are numerous ways of doing this such as jenks, standard deviations or equal intervals. In this example we use function classIntervals from the classInt package to find the ranges needed to divide the the TotPop_PC into six categories.
# set breaks for 6 categories and then add the values to a single list -
# style = fisher is jenks
breaks <- classIntervals(LSOA@data$TotPop_PC, n = 6, style = "fisher")
# classIntervals returns an object with a number of different slots -
# however, we only need the break values; thus, we overwrite the object
# breaks as follows
breaks <- breaks$brks
We will now use another package called RColorBrewer which provides a series of colour pallets that are suitable for mapping. You can have a look at the colour pallets online: http://colorbrewer2.org. Each of these pallets are named; and you can see all the available pallets as follows…
display.brewer.all()
We will now select a set of six colours from one of these pallets into a new list object called my_colours. If you print this object, you will see it contains six HEX codes for colours within the selected pallet.
# Select six colours from the pallet YlOrRd
my_colours <- brewer.pal(6, "YlOrRd")
# Print the colour codes to the terminal
my_colours
[1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"
Another function called findInterval() can be used with the “breaks” object created earlier to divide the TotPop_PC column into aggregate groups. To view the intervals, just print the breaks object…
breaks
[1] -80.426 -40.414 -12.341 5.591 29.781 82.104 162.865
Before we use these breaks to create a chropleth map, let us first have a look to see what the findInterval() function actually does.
The following code returns a list of numbers. Each number represents a row in the LSOA@data TotPop_PC column - i.e. there will be the same number of values as there are rows in LSOA@data.
The function has examined the values of the TotPop_PC column against the break points that you have requested (stored in the “breaks” object); and returned these as a series of values (1-6) - i.e. break point 1 gets the number 1 etc. The all.inside=TRUE is an optional parameter to ensure that all values are assessed correctly.
# print the intervals
findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)
[1] 6 5 5 3 3 3 5 4 3 4 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 4 5 3 3 3 3 2 3 3 3
[36] 3 3 3 3 4 4 4 3 3 3 4 3 2 1 3 1 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 3 3 3
[71] 3 3 4 3 3 4 4 3 3 3 3 3 3 3 3 3 3 2 2 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[106] 3 3 3 3 3 3 2 3 3 3 3 3 4 3 3 3 3 3 5 3 3 4 2 3 4 5 2 3 4 3 4 3 4 6 6
[141] 4 3 3 4 4 3 3 3 3 3 3 4 2 4 4 4 3 4 4 3 3 4 4 4 3 4 2 4 4 3 3 3 3 5 3
[176] 3 4 3 3 3 3 3 3 4 5 4 3 3 3 3 3 3 3 1 3 3 3 4 3 2 3 3 3 3 3 4 3 3 3 5
[211] 4 4 3 2 3 4 2 3 4 3 3 3 4 3 3 3 3 3 3 2 3 3 5 3 4 3 6 3 1 2 6 2 3 3 3
[246] 4 3 3 2 4 3 4 3 3 3 3 4 3 3 2 3 3 3 3 3 4 2 3 3 3 3 3 3 3 4 4 3 4 3 3
[281] 3 3 4 3 3 3 3 3 4 2 3
We can combine this list of 1-6 numbers to select specific colours from the my_colours object you created earlier.
By using the square brackets we can use the output of the findInterval() function to select items from the my_colours list. E.g., a number 1 would select the first colour, a 2 the second colour, etc.
# Returns a list of colours
my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)]
[1] "#BD0026" "#F03B20" "#F03B20" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[8] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[15] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[22] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#F03B20" "#FEB24C"
[29] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[36] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FD8D3C"
[43] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FFFFB2"
[50] "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[57] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[64] "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[71] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C"
[78] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[85] "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FED976" "#FEB24C" "#FED976"
[92] "#FFFFB2" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[99] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[106] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976"
[113] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[120] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C" "#FEB24C"
[127] "#FD8D3C" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#FED976" "#FEB24C"
[134] "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#BD0026" "#BD0026"
[141] "#FD8D3C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[148] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[155] "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FEB24C"
[162] "#FD8D3C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FED976" "#FD8D3C"
[169] "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20" "#FEB24C"
[176] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[183] "#FEB24C" "#FD8D3C" "#F03B20" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C"
[190] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FFFFB2" "#FEB24C" "#FEB24C"
[197] "#FEB24C" "#FD8D3C" "#FEB24C" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[204] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#F03B20"
[211] "#FD8D3C" "#FD8D3C" "#FEB24C" "#FED976" "#FEB24C" "#FD8D3C" "#FED976"
[218] "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C"
[225] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FED976" "#FEB24C"
[232] "#FEB24C" "#F03B20" "#FEB24C" "#FD8D3C" "#FEB24C" "#BD0026" "#FEB24C"
[239] "#FFFFB2" "#FED976" "#BD0026" "#FED976" "#FEB24C" "#FEB24C" "#FEB24C"
[246] "#FD8D3C" "#FEB24C" "#FEB24C" "#FED976" "#FD8D3C" "#FEB24C" "#FD8D3C"
[253] "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[260] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FD8D3C"
[267] "#FED976" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[274] "#FEB24C" "#FD8D3C" "#FD8D3C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C"
[281] "#FEB24C" "#FEB24C" "#FD8D3C" "#FEB24C" "#FEB24C" "#FEB24C" "#FEB24C"
[288] "#FEB24C" "#FD8D3C" "#FED976" "#FEB24C"
We can then pull all this together to create a basic chropleth map using the plot() function. The first attribute is the SpatialPolygonsDataFrame, followed by col= which is how we want to colour the polygons. If we specified a single colour, e.g. "#FEB24C", all the polygons would be the same colour.
However, because we use the findInterval() function to generate a list of colours, each polygon is coloured by an appropriate colour associated with area's TotPop_PC value.
The other attributes in the plot() function are axes=FALSE to remove axis from the plot, and border = NA to remove borders from around the polygons.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
You can see what the axes and border function do if you run the following code, setting the axes to TRUE, and the border to a grey colour (#6B6B6B).
# Create a basic choropleth map with borders and axis
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = TRUE, border = "#6B6B6B")
You see that your previous plot has been replaced by a new plot with the LSOA borders coloured in grey, and the axis now visible - the scale on the axis are meters and relate to the projection of the input shapefile (OSGB - British National Grid).
As noted earlier, we can also plot extra data on top of the original chropleth. This is useful for adding additional context to maps. Run the code below to download and import the English Census Area Statistic Wards for Liverpool, which have already been clipped to match the River Mersey.
# Download file and unzip
download.file("http://www.nickbearman.me.uk/academic/201314_liverpool_envs257/CASclip.zip",
"CASclip.zip")
unzip("CASclip.zip")
# read in to R
CAS <- readShapePoly("CASclip/england_caswa_2001_clip.shp")
We can now plot the basic choropleth again, however, this time we will add a second layer showing the ward outlines. In the second plot, we simply specify a grey border colour, and add another parameter called add=TRUE. This tells the plot window to keep the previous result, and add the new plot on top.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
We now have a pretty convincing map. However, there are a few things missing. The first of these are labels for the ward zones. We can use the coordinates() function to extract the central location of each ward polygon, and use this with the names of the ward which are stored in the data slot.
# Print the top six results of using coordinates()
head(coordinates(CAS))
[,1] [,2]
0 342161 383312
1 336112 387301
2 339430 385776
3 341695 385904
4 340738 391460
5 340764 395445
We can use the text() function to add these text onto the map. The first parameter are the x coordinates, the second the y, and the third the text we want to use as the labels. The final cex parameter scales the text smaller.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
The final items we need to add onto the map are a scale bar, a north arrow and a legend. A legend is added using the legend() function - the parameters used include the xy location, legend= - which are the text labels to use on the legend (leglabs and round are used to make these display correctly), fill= which is the object containing the fill colours you used to colour the LSOA choropleth, bty="n" which turns off the border around the legend.
You might find you will need to adjust the location or size of the legend to get this to fit onto your plot correctly. For scaling, the cex= parameter will reduce or increase the size of the item. To find a new set of location coordinates, type locator() into the terminal and press enter. Move your mouse to the map window and left-click where you want to position the legend. Then right-click and choose Stop. R will then print the coordinates of the location you clicked. You can use these to re-position items in the plot.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n")
The sp package which you loaded earlier contains a number of other functions that enable you to add useful items to the map, including a north arrow and scale bar. There are also base R functions for adding text and titles.
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.7)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n")
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(335379, 381606, "0km", cex = 0.8)
text(335379 + 1500, 381606, "1.5km", cex = 0.8)
text(335379 + 3000, 381606, "3km", cex = 0.8)
# Add a title
title("Total Population Change (%)")
The legend, north arrow, scale bar, etc. are all important parts of the map and will be assessed in your assignment. Make sure you understand what is going on here, in order to add the various elements to the map. Remember, you can always use the help command (?layout.scale.bar) for help with specific functions.
You have been creating all your plots within R, however, when preparing your reports you will need to export these maps. In order to do this, we wrap the code to create a map with png('rplot.png') and then dev.off(). This creates an image called rplot.png in your working directory. This code tells R is to write the results to a png rather than the plot window - thus, you won't see the output until you view the png. Another thing to remember is that the positioning of the items in the png may be a little different to the plot window - so you may have to adjust these and re-plot the image. For example…
# Write out to a png...
png("rplot.png", width = 800, height = 800, units = "px")
# Create a basic choropleth map
plot(LSOA, col = my_colours[findInterval(LSOA@data$TotPop_PC, breaks, all.inside = TRUE)],
axes = FALSE, border = NA)
# Plot the wards on top of the LSOA choropleth
plot(CAS, axes = FALSE, border = "#6B6B6B", add = TRUE)
# Add on text labels for the wards
text(coordinates(CAS)[, 1], coordinates(CAS)[, 2], labels = CAS@data$name, cex = 0.8)
# Add the legend
legend(x = 333603, y = 386096, legend = leglabs(round(breaks), between = " to "),
fill = my_colours, bty = "n", cex = 0.9)
# Add North Arrow
SpatialPolygonsRescale(layout.north.arrow(2), offset = c(333770, 381184), scale = 1000,
plot.grid = F)
# Add Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(335379, 381184), scale = 3000,
fill = c("white", "black"), plot.grid = F)
# Add text to scale bar
text(335379, 381606, "0km", cex = 0.7)
text(335379 + 1500, 381606, "1.5km", cex = 0.7)
text(335379 + 3000, 381606, "3km", cex = 0.7)
# Add a title
title("Total Population Change (%)")
# Change output back to the plot window
dev.off()
When running code that you are likley to need to tweak and run again, it is a good idea to copy it into Notepad, edit it, and then copy and paste it into R.
The submitted work for this practical is a short report alongside a series of maps that you will generate in R. The total word count for the report is 500 words, and it is expected that you will include at least 5 maps that you have generated in R. You should make reference to the wider literature in your report and cite appropriately - including a formatted reference list (the list is not included in the word count). If you include data tables in your report, these also do not contribute to the total word count. Please do not include any appendix.
You are free to layout your report as you wish, however all maps or other graphics must be appropriately referenced in the text – e.g. see Figure 1 etc.
If you have followed the tutorials in this practical you will have now created a SpatialPolygonsDataFrame for the Liverpool local authority that contains the following variables:
Estimated percentage and total population change between 2001 and 2011:
With reference to the evidence you create from the above data, further data sources that you integrate independently into R, and wider reading, write a report that will address the following questions:
With reference to your population maps, identify the area of Liverpool with the highest population increase; discussing those potential processes that may have influenced these changes.
With reference to your population maps, identify the area of Liverpool with the highest population decrease; discussing those potential processes that may have influenced these changes.
Through this report you should build a narrative using maps (or any other data visualization you deem appropriate) to hypothesis about the possible causes and impacts of the change that you observe to have occurred in your study area since the 2001 Census. Your marks will be linked to both the quality and clarity of your answers to the above questions and the visualizations you create.

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en. Based upon Detecting Neighbourhood Change by Alex Singleton.