Mapping Data from Eurostat using R

Tutorial Aims:

This tutorial is indebted to the various R spatial materials published on the web by Chris Brunsdon and Alex Singleton of the University of Liverpool.

R Basics

This tutorial is designed as a walk-through, so you will not need any real prior knowledge of R. If you follow the guide step-by-step, you will end up with a choropleth map of unemployment rates across European regions.

However, if you are new to R and wish to learn some more about this powerful language and statistical computing environment, there are a number of useful sites on the web to get you started:

http://www.statmethods.net/index.html - Quick R: a very good basic guide
http://www.cookbook-r.com/ - very comprehensive tutorial
http://www.r-tutor.com/ - various tutorials

There are almost limitless resources for R on the web, so if you encounter any problems a solution is usually only a short search away.

Getting started

First create yourself a working directory somewhere on your hard drive.
Once you have your new directory you need to tell R that this is your working directory. For example, if I called my working directory EurostatAtlas and put that in the root of my C: drive, I could set this using the setwd() function. The directory must already exist, otherwise setwd() stops with the error "cannot change working directory":

setwd("C:/EurostatAtlas")
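
If you would rather create the folder from within R before switching to it, the following is a minimal sketch of one way to do so. The path is a throwaway temporary location chosen purely for illustration (an assumption, not part of the original tutorial), so substitute your own:

```r
# Hypothetical location for illustration; replace with e.g. "C:/EurostatAtlas"
work_dir <- file.path(tempdir(), "EurostatAtlas")

# Create the folder only if it is not already there
if (!dir.exists(work_dir)) {
  dir.create(work_dir, recursive = TRUE)
}

old_dir <- setwd(work_dir)  # setwd() invisibly returns the previous directory
getwd()                     # confirm where R is now working
```

Keeping the previous directory in old_dir means you can return to it later with setwd(old_dir).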

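The next thing we will do is install a series of packages which contain the various functions we will be using. If R has no default CRAN mirror set, install.packages() stops with the error "trying to use CRAN without setting a mirror", so we name a mirror explicitly with the repos argument:

install.packages(c("rgdal", "RColorBrewer", "sp", "GISTools", "classInt", "maptools", 
    "SmarterPoland"), repos = "https://cloud.r-project.org")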
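
If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The sketch below shows one way to check; missing_packages() is a hypothetical helper written for this illustration, and the mirror URL in the commented line is just one possible choice:

```r
# Packages this tutorial asks for
wanted <- c("rgdal", "RColorBrewer", "sp", "GISTools", "classInt",
            "maptools", "SmarterPoland")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)
to_install

# Uncomment to install; any CRAN mirror can be used here
# install.packages(to_install, repos = "https://cloud.r-project.org")
```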

Now, to use the packages, we need to load them with the library() function:

library("rgdal")
library("RColorBrewer")
library("sp")
library("GISTools")
library("classInt")
library("maptools")
library("SmarterPoland")

We now have everything which is required to get mapping.

Downloading and manipulating boundary data from Eurostat

Downloading data

The following block of code will download a set of Nomenclature of Units for Territorial Statistics (NUTS) boundaries from the Eurostat website and store them in the default temporary location on your computer:

# create a new empty object called 'temp' in which to store a zip file
# containing boundary data
temp <- tempfile(fileext = ".zip")
# now download the zip file from its location on the Eurostat website and
# put it into the temp object
download.file("http://epp.eurostat.ec.europa.eu/cache/GISCO/geodatafiles/NUTS_2010_60M_SH.zip", 
    temp)
# now unzip the boundary data
unzip(temp)
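
Downloads can fail (for instance if a URL has moved), so it can help to wrap the steps above in a small function that reports failure instead of stopping your script. This is only a sketch: fetch_zip() is a hypothetical helper, and mode = "wb" is added on the assumption that you may be on Windows, where binary files need it:

```r
# Hypothetical helper: download a zip archive and unpack it into `exdir`.
# Returns the extracted file paths, or NULL if the download fails.
fetch_zip <- function(url, exdir) {
  zip_path <- tempfile(fileext = ".zip")
  status <- tryCatch(
    suppressWarnings(download.file(url, zip_path, mode = "wb", quiet = TRUE)),
    error = function(e) NULL
  )
  # download.file() returns 0 on success
  if (is.null(status) || status != 0) {
    return(NULL)
  }
  unzip(zip_path, exdir = exdir)
}

# Example call (not run here because it needs network access):
# files <- fetch_zip("http://epp.eurostat.ec.europa.eu/cache/GISCO/geodatafiles/NUTS_2010_60M_SH.zip", ".")
```

A NULL result tells you to check the URL before moving on to reading the shapefile.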

These are not the only boundary data available - the full range of shapefiles and geodatabases can be explored here: http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_Geographical_information_maps/popups/references/administrative_units_statistical_units_1

Changing the spatial reference system

Your boundary data can now be converted into a spatial polygons data frame. This is a data format that R is able to work with. As these are boundaries for the NUTS hierarchy, we'll give the dataframe a suitable name:

EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")
## OGR data source with driver: ESRI Shapefile 
## Source: "./NUTS_2010_60M_SH/data", layer: "NUTS_RG_60M_2010"
## with 1920 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions

We can now plot our boundary data to see what it looks like:

plot(EU_NUTS)


You'll notice that the EU looks a little squished. This is because the boundaries are stored as unprojected longitude and latitude coordinates, and plotting these directly distorts the map compared to what we commonly see. To get the projection information for the map, we can extract the PROJ.4 string:

proj4string(EU_NUTS)
## [1] "+proj=longlat +ellps=GRS80 +no_defs"

To draw our map in a more familiar projection (such as the Google Maps projection many of us are used to seeing) we can reproject the boundaries using a new PROJ.4 string. This can be carried out using the spTransform() function:

EU_NUTS <- spTransform(EU_NUTS, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"))
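
The PROJ.4 string above describes a Mercator projection on a sphere of radius 6378137 m with output in metres. To give a feel for what the reprojection does to each coordinate, here is a rough base R sketch of the forward spherical Mercator formulas. lonlat_to_mercator() is a hypothetical helper for illustration only, not a substitute for spTransform(), and the city coordinates are approximate:

```r
# Hypothetical helper: forward spherical Mercator, corresponding to
# +proj=merc with +a=6378137 +b=6378137 (a sphere) and +lon_0=0
lonlat_to_mercator <- function(lon, lat, radius = 6378137) {
  lambda <- lon * pi / 180  # longitude in radians
  phi <- lat * pi / 180     # latitude in radians
  data.frame(
    x = radius * lambda,
    y = radius * log(tan(pi / 4 + phi / 2))
  )
}

# Approximate coordinates of Vienna and Lisbon (illustrative values)
lonlat_to_mercator(lon = c(16.37, -9.14), lat = c(48.21, 38.72))
```

Because the projected units are metres, coordinates on the reprojected map are large numbers rather than degrees.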

We can now re-plot our boundary data to see the changes that the new PROJ.4 string makes:

plot(EU_NUTS)


To obtain the PROJ.4 string for a whole variety of different projections and datums, the spatialreference.org website is a very useful resource: http://www.spatialreference.org

Reading in Data from Eurostat

If you wish to browse the Eurostat database before starting this section, go to: http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/search_database

Here you can look at the various tables contained in the different sub-folders and play with the alternative ways of chopping up the 'hypercube' data structures to create two-dimensional tabulations.

Data can be downloaded from Eurostat in conventional CSV or Excel tabulations, which are fine for day-to-day usage and can themselves be read into R. It is also possible, however, to download data tables directly into R. To do this we will make use of the SmarterPoland package, which is able to read data directly from the Eurostat database and, perhaps even more usefully, clean and format the data ready for analysis.

First we will create a table of contents containing the table definitions and table codes:

EurostatTOC <- getEurostatTOC()

If you are using RStudio, you will be able to examine the contents of the new EurostatTOC dataframe by double clicking on it in your workspace. If you are not using RStudio, then you should be! But if for some weird reason you're still resistant, then you can look at the top few rows of the data frame using the head() function:

head(EurostatTOC)
##                                                              title
## 1                                               Database by themes
## 2                                  General and regional statistics
## 3         European and national indicators for short-term analysis
## 4                 Business and consumer surveys (source: DG ECFIN)
## 5                              Consumer surveys (source: DG ECFIN)
## 6                                         Consumers - monthly data
##        code    type last.update.of.data last.table.structure.change
## 1      data  folder                                                
## 2   general  folder                                                
## 3   euroind  folder                                                
## 4    ei_bcs  folder                                                
## 5 ei_bcs_cs  folder                                                
## 6 ei_bsco_m dataset          30.08.2013                  30.08.2013
##   data.start data.end values
## 1                         NA
## 2                         NA
## 3                         NA
## 4                         NA
## 5                         NA
## 6    1985M01  2013M08     NA

You should be able to see a number of columns including the title of each table in the database and the code for that table.

At this point we are now going to download some unemployment data from the European Labour Force Survey. Yes, I know this is a practical about Census data, so by all means find a Census table that interests you and apply everything which follows below to that table. Unemployment data are quite interesting though, so this is where we will focus the rest of this practical.

The table we are interested in is “Unemployment rates by sex, age and NUTS 2 regions (%)” - table code lfst_r_lfu3rt
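
Rather than scrolling through the whole table of contents, you can search the titles for a keyword. The sketch below uses a two-row toy data frame with the same title and code columns as EurostatTOC (the toy object toc is an assumption for illustration; with the real data you would search EurostatTOC instead):

```r
# Toy table of contents with the same `title` and `code` columns
toc <- data.frame(
  title = c("Consumers - monthly data",
            "Unemployment rates by sex, age and NUTS 2 regions (%)"),
  code = c("ei_bsco_m", "lfst_r_lfu3rt")
)

# Case-insensitive search of the titles for a keyword
hits <- toc[grepl("unemployment", toc$title, ignore.case = TRUE), ]
as.character(hits$code)
```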

We will download the data from this table in its 'molten' form. The molten data format gives us the most flexibility for reformatting our data into a usable table. For more information on the molten data format, see: Wickham, H. (2007) Reshaping Data with the reshape Package, Journal of Statistical Software, 21(12) - http://www.jstatsoft.org/v21/i12/paper

To download the lfst_r_lfu3rt table in molten form, we will use the getEurostatRCV() function in the SmarterPoland package and store it in a data frame called 'data':

data <- getEurostatRCV(kod = "lfst_r_lfu3rt")

If we now take a look at this data frame, either by double clicking data in your RStudio workspace or by using head(data) if you are still an RStudio refusenik, you will see we have five columns of data relating to age, sex, geography, time and the unemployment value in %.

Re-formatting your data

In order to reformat our data, we need to be aware of the different values each variable in the dataset can take. To check this we can use the unique() function to look at the different values of age, sex and time:

unique(data$age)
## [1] Y15-24 Y20-64 Y_GE15 Y_GE25
## Levels: Y_GE15 Y_GE25 Y15-24 Y20-64
unique(data$sex)
## [1] F M T
## Levels: F M T
unique(data$time)
##  [1] 2012  2011  2010  2009  2008  2007  2006  2005  2004  2003  2002 
## [12] 2001  2000  1999 
## 14 Levels: 2012  2011  2010  2009  2008  2007  2006  2005  2004  ... 1999

Now that we are aware of the values available, we should select a combination of age, sex and time to map.

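For example, we might want to look at the distribution of unemployment across Europe for all people aged 20-64 in 2012. One way to select this data from the main data table would be to create a subset. Note the trailing space in "2012 ": the time values are stored with trailing whitespace, so the comparison has to include it: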

sub_data <- subset(data, (age == "Y20-64") & (sex == "T") & (time == "2012 "))
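
The time values printed by unique() above carry a trailing space, which is easy to miss when filtering. The toy example below (a three-row stand-in for the downloaded table, built here for illustration) shows the effect and one alternative, stripping the whitespace with trimws():

```r
# Toy stand-in for the downloaded table; note the trailing space in `time`
toy <- data.frame(
  age = c("Y20-64", "Y20-64", "Y15-24"),
  sex = c("T", "T", "F"),
  time = c("2012 ", "2011 ", "2012 "),
  value = c(4.1, 3.9, 8.7)
)

nrow(subset(toy, time == "2012"))   # no rows: the stored value is "2012 "
nrow(subset(toy, time == "2012 "))  # two rows

# Alternative: strip the whitespace once, then compare against "2012"
toy$time <- trimws(toy$time)
subset(toy, age == "Y20-64" & sex == "T" & time == "2012")
```

If you trim the real data in this way, the column names created later in the tutorial will differ from those shown, so the remaining steps keep the values as downloaded.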

This is fine, but every time we want to look at another combination of variables (unemployed Females aged 15-24 in 2011, for example) we will need to create a new subset. An alternative option is to rearrange our data using the reshape package. Fortunately this package has already been installed as part of the SmarterPoland package.

For details on the conceptual framework underpinning the reshape package, see Wickham (2007) mentioned above.

We will now rearrange the data using the cast() function so that the data frame contains a column for each of the variable combinations in our data set - for example: 15-24, Male, 2012; 15-24, Female, 2012; 15-24, Total, 2012, etc.

mapdata <- cast(data, geo ~ time + age + sex)

As Wickham (2007) outlines, the casting formula has two sides separated by ~. In cast(), the variables on the left-hand side identify the rows of the output and the variables on the right-hand side are spread across its columns, giving the basic form: row_var_1 + row_var_2 ~ col_var_1 + col_var_2

Because we want to combine all of the values contained in time (14 values), age (4 values), and sex (3 values) so that there is a unique combination of each (14*4*3 = 168 new variables), these three variables go on the right-hand side of the formula. We want to keep the individual values of geo, one per row, so this goes on the left-hand side of the formula. Try experimenting with placing different variables on the left and right hand sides of the formula and see how the molten data are merged differently.
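
To see the idea on a small scale without the reshape package, the sketch below does a comparable long-to-wide rearrangement in base R with tapply(). This swaps in a different technique from cast(), purely for illustration, and uses a toy molten table with two regions, two years and two sexes:

```r
# Toy molten data: one row per geo/time/sex combination
molten <- data.frame(
  geo = rep(c("AT", "AT1"), each = 4),
  time = rep(c("2011", "2012"), times = 4),
  sex = rep(rep(c("F", "M"), each = 2), times = 2),
  value = c(4.3, 4.3, 4.0, 4.3, 5.5, 5.5, 5.6, 6.3)
)

# Build one column key per time/sex combination (the right-hand side)
molten$key <- paste(molten$time, molten$sex, sep = "_")

# One row per geo (the left-hand side), one column per key; each cell
# holds a single value, so mean() simply returns it
wide <- tapply(molten$value, list(molten$geo, molten$key), FUN = mean)
wide
dim(wide)  # 2 rows and 2 * 2 = 4 columns
```

The number of columns is the product of the number of values of each right-hand-side variable, which is where the 14*4*3 figure above comes from.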

As before, view a summary of your new data frame in either the RStudio viewer or using the head() function:

head(mapdata)
##    geo 2012 _Y_GE15_F 2012 _Y_GE15_M 2012 _Y_GE15_T 2012 _Y_GE25_F
## 1   AT            4.3            4.3            4.3            3.6
## 2  AT1            5.5            6.3            5.9            4.8
## 3 AT11             NA            4.2            4.4             NA
## 4 AT12            4.1            4.1            4.1            3.8
## 5 AT13            7.0            8.7            7.9            5.8
## 6  AT2            4.0            3.4            3.7            3.4
##   2012 _Y_GE25_M 2012 _Y_GE25_T 2012 _Y15-24_F 2012 _Y15-24_M
## 1            3.6            3.6            8.7            8.8
## 2            5.3            5.0           11.4           13.6
## 3             NA            3.6             NA             NA
## 4            3.3            3.5            6.6            9.3
## 5            7.4            6.6           16.6           18.7
## 6            3.0            3.2            7.8            6.2
##   2012 _Y15-24_T 2012 _Y20-64_F 2012 _Y20-64_M 2012 _Y20-64_T
## 1            8.7            4.1            4.2            4.1
## 2           12.5            5.2            6.0            5.6
## 3             NA             NA             NA            4.3
## 4            8.1            3.9            3.8            3.9
## 5           17.7            6.5            8.4            7.5
## 6            6.9            3.8            3.4            3.6
##   2011 _Y_GE15_F 2011 _Y_GE15_M 2011 _Y_GE15_T 2011 _Y_GE25_F
## 1            4.3            4.0            4.1            3.6
## 2            5.5            5.6            5.5            4.6
## 3             NA             NA            3.6             NA
## 4            4.3            4.2            4.2            3.7
## 5            6.8            7.4            7.1            5.5
## 6            3.4            3.3            3.3            3.0
##   2011 _Y_GE25_M 2011 _Y_GE25_T 2011 _Y15-24_F 2011 _Y15-24_M
## 1            3.4            3.5            8.8            7.9
## 2            4.8            4.7           12.0           11.8
## 3             NA            3.2             NA             NA
## 4            3.4            3.5            8.2            9.6
## 5            6.4            6.0           16.4           15.1
## 6            3.0            3.0            6.1            5.2
##   2011 _Y15-24_T 2011 _Y20-64_F 2011 _Y20-64_M 2011 _Y20-64_T
## 1            8.3            4.0            3.8            3.9
## 2           11.9            5.0            5.4            5.2
## 3             NA             NA             NA            3.4
## 4            8.9            4.0            4.0            4.0
## 5           15.7            6.1            7.0            6.6
## 6            5.6            3.2            3.2            3.2
##   2010 _Y_GE15_F 2010 _Y_GE15_M 2010 _Y_GE15_T 2010 _Y_GE25_F
## 1            4.2            4.6            4.4            3.5
## 2            4.9            5.9            5.4            4.1
## 3             NA             NA            3.9             NA
## 4            3.4            3.8            3.6            2.8
## 5            6.4            8.2            7.3            5.3
## 6            4.2            4.0            4.1            3.4
##   2010 _Y_GE25_M 2010 _Y_GE25_T 2010 _Y15-24_F 2010 _Y15-24_M
## 1            3.9            3.7            8.8            8.9
## 2            5.0            4.6           11.2           12.0
## 3             NA            3.1             NA             NA
## 4            3.4            3.1            7.7            6.7
## 5            6.9            6.2           14.6           18.0
## 6            3.4            3.4            8.7            7.8
##   2010 _Y15-24_T 2010 _Y20-64_F 2010 _Y20-64_M 2010 _Y20-64_T
## 1            8.8            3.9            4.4            4.2
## 2           11.6            4.6            5.8            5.2
## 3             NA             NA             NA            3.4
## 4            7.2            3.1            3.8            3.5
## 5           16.4            6.1            8.0            7.1
## 6            8.2            3.9            3.8            3.9
##   2009 _Y_GE15_F 2009 _Y_GE15_M 2009 _Y_GE15_T 2009 _Y_GE25_F
## 1            4.6            5.0            4.8            3.7
## 2            5.2            6.4            5.9            4.3
## 3            4.8            4.4            4.6             NA
## 4            4.0            4.5            4.3            3.0
## 5            6.4            8.6            7.5            5.4
## 6            4.2            4.7            4.5            3.3
##   2009 _Y_GE25_M 2009 _Y_GE25_T 2009 _Y15-24_F 2009 _Y15-24_M
## 1            4.1            3.9            9.4           10.5
## 2            5.4            4.9           11.4           13.7
## 3             NA            4.0             NA             NA
## 4            3.4            3.2           10.2           12.1
## 5            7.6            6.5           12.9           16.3
## 6            3.8            3.6            9.4           10.2
##   2009 _Y15-24_T 2009 _Y20-64_F 2009 _Y20-64_M 2009 _Y20-64_T
## 1           10.0             NA             NA             NA
## 2           12.6             NA             NA             NA
## 3             NA             NA             NA             NA
## 4           11.2             NA             NA             NA
## 5           14.6             NA             NA             NA
## 6            9.8             NA             NA             NA
##   2008 _Y_GE15_F 2008 _Y_GE15_M 2008 _Y_GE15_T 2008 _Y_GE25_F
## 1            4.1            3.6            3.8            3.4
## 2            5.1            4.8            5.0            4.3
## 3             NA             NA            3.6             NA
## 4            4.0            2.9            3.4            3.1
## 5            6.5            6.9            6.7            5.7
## 6            3.6            3.3            3.4            3.0
##   2008 _Y_GE25_M 2008 _Y_GE25_T 2008 _Y15-24_F 2008 _Y15-24_M
## 1            2.9            3.1            8.2            7.9
## 2            3.9            4.1           10.3           11.4
## 3             NA            2.8             NA             NA
## 4            2.2            2.6            9.4            7.5
## 5            5.7            5.7           11.9           16.0
## 6            2.6            2.8            6.7            7.1
##   2008 _Y15-24_T 2008 _Y20-64_F 2008 _Y20-64_M 2008 _Y20-64_T
## 1            8.0             NA             NA             NA
## 2           10.9             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            8.4             NA             NA             NA
## 5           14.0             NA             NA             NA
## 6            6.9             NA             NA             NA
##   2007 _Y_GE15_F 2007 _Y_GE15_M 2007 _Y_GE15_T 2007 _Y_GE25_F
## 1            5.0            3.9            4.4            4.3
## 2            6.0            5.6            5.8            5.4
## 3             NA             NA            3.7             NA
## 4            4.1            3.1            3.6            3.5
## 5            8.1            8.5            8.3            7.5
## 6            4.4            3.3            3.8            3.7
##   2007 _Y_GE25_M 2007 _Y_GE25_T 2007 _Y15-24_F 2007 _Y15-24_M
## 1            3.2            3.7            9.1            8.3
## 2            4.7            5.0           10.2           12.0
## 3             NA            3.0             NA             NA
## 4            2.4            2.9            8.2            7.1
## 5            7.2            7.3           12.2           18.2
## 6            2.5            3.0            8.3            8.0
##   2007 _Y15-24_T 2007 _Y20-64_F 2007 _Y20-64_M 2007 _Y20-64_T
## 1            8.7             NA             NA             NA
## 2           11.1             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            7.6             NA             NA             NA
## 5           15.3             NA             NA             NA
## 6            8.2             NA             NA             NA
##   2006 _Y_GE15_F 2006 _Y_GE15_M 2006 _Y_GE15_T 2006 _Y_GE25_F
## 1            5.2            4.3            4.7            4.5
## 2            6.3            6.4            6.3            5.5
## 3            6.1            4.2            5.0             NA
## 4            4.5            3.5            4.0            3.9
## 5            8.0            9.5            8.8            7.0
## 6            5.1            3.2            4.1            4.4
##   2006 _Y_GE25_M 2006 _Y_GE25_T 2006 _Y15-24_F 2006 _Y15-24_M
## 1            3.6            4.0            9.3            8.9
## 2            5.3            5.4           11.7           13.5
## 3             NA            4.1             NA             NA
## 4            2.8            3.3            8.2            7.8
## 5            8.0            7.5           15.0           20.2
## 6            2.7            3.5            9.2            6.3
##   2006 _Y15-24_T 2006 _Y20-64_F 2006 _Y20-64_M 2006 _Y20-64_T
## 1            9.1             NA             NA             NA
## 2           12.7             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            8.0             NA             NA             NA
## 5           17.7             NA             NA             NA
## 6            7.7             NA             NA             NA
##   2005 _Y_GE15_F 2005 _Y_GE15_M 2005 _Y_GE15_T 2005 _Y_GE25_F
## 1            5.5            4.9            5.2            4.7
## 2            6.5            6.8            6.7            5.6
## 3            7.4            4.9            6.0            6.4
## 4            4.8            3.8            4.3            4.2
## 5            7.9           10.2            9.1            6.8
## 6            5.1            3.7            4.3            4.2
##   2005 _Y_GE25_M 2005 _Y_GE25_T 2005 _Y15-24_F 2005 _Y15-24_M
## 1            3.9            4.3            9.9           10.7
## 2            5.5            5.6           12.0           15.8
## 3             NA            4.9             NA             NA
## 4            2.9            3.5            8.2            9.5
## 5            8.4            7.7           15.6           23.3
## 6            3.0            3.5            9.7            8.2
##   2005 _Y15-24_T 2005 _Y20-64_F 2005 _Y20-64_M 2005 _Y20-64_T
## 1           10.3             NA             NA             NA
## 2           14.0             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            8.9             NA             NA             NA
## 5           19.7             NA             NA             NA
## 6            8.9             NA             NA             NA
##   2004 _Y_GE15_F 2004 _Y_GE15_M 2004 _Y_GE15_T 2004 _Y_GE25_F
## 1            5.3            5.3            5.3            4.4
## 2            6.6            7.1            6.9            5.3
## 3            5.6            5.2            5.4             NA
## 4            4.6            3.7            4.1            3.3
## 5            8.6           10.7            9.7            7.3
## 6            4.7            4.6            4.7            4.3
##   2004 _Y_GE25_M 2004 _Y_GE25_T 2004 _Y15-24_F 2004 _Y15-24_M
## 1            4.3            4.4           10.7           11.3
## 2            6.1            5.7           15.4           14.7
## 3             NA            4.2             NA             NA
## 4            3.1            3.2           12.9            7.9
## 5            9.3            8.4           19.3           21.5
## 6            3.9            4.1            7.0            8.8
##   2004 _Y15-24_T 2004 _Y20-64_F 2004 _Y20-64_M 2004 _Y20-64_T
## 1           11.0             NA             NA             NA
## 2           15.0             NA             NA             NA
## 3             NA             NA             NA             NA
## 4           10.3             NA             NA             NA
## 5           20.6             NA             NA             NA
## 6            8.0             NA             NA             NA
##   2003 _Y_GE15_F 2003 _Y_GE15_M 2003 _Y_GE15_T 2003 _Y_GE25_F
## 1            4.3            5.1            4.8            4.0
## 2            5.6            6.6            6.2            5.2
## 3            6.2            5.4            5.8            5.7
## 4            3.8            4.2            4.0            3.3
## 5            7.3            9.3            8.3            6.8
## 6            3.6            4.6            4.1            3.4
##   2003 _Y_GE25_M 2003 _Y_GE25_T 2003 _Y15-24_F 2003 _Y15-24_M
## 1            4.7            4.4            6.8            8.0
## 2            6.1            5.7            9.1           10.8
## 3            5.0            5.3             NA             NA
## 4            4.0            3.7            7.2            5.1
## 5            8.2            7.6           11.2           18.0
## 6            4.2            3.8            5.2            7.4
##   2003 _Y15-24_T 2003 _Y20-64_F 2003 _Y20-64_M 2003 _Y20-64_T
## 1            7.5             NA             NA             NA
## 2           10.1             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            6.0             NA             NA             NA
## 5           15.0             NA             NA             NA
## 6            6.4             NA             NA             NA
##   2002 _Y_GE15_F 2002 _Y_GE15_M 2002 _Y_GE15_T 2002 _Y_GE25_F
## 1            4.5            5.1            4.8            4.2
## 2            5.3            6.8            6.1            5.1
## 3            4.7            5.0            4.9            4.4
## 4            4.9            4.5            4.7            4.9
## 5            5.8            9.3            7.7            5.5
## 6            5.2            5.0            5.1            4.8
##   2002 _Y_GE25_M 2002 _Y_GE25_T 2002 _Y15-24_F 2002 _Y15-24_M
## 1            4.7            4.5            6.6            7.7
## 2            6.2            5.7            6.8           10.9
## 3            4.6            4.5             NA             NA
## 4            4.0            4.4            5.4            8.0
## 5            8.7            7.2            8.3           14.5
## 6            4.6            4.7            7.8            8.0
##   2002 _Y15-24_T 2002 _Y20-64_F 2002 _Y20-64_M 2002 _Y20-64_T
## 1            7.2             NA             NA             NA
## 2            9.0             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            6.8             NA             NA             NA
## 5           11.7             NA             NA             NA
## 6            7.9             NA             NA             NA
##   2001 _Y_GE15_F 2001 _Y_GE15_M 2001 _Y_GE15_T 2001 _Y_GE25_F
## 1            4.1            3.9            4.0            3.9
## 2            4.3            5.1            4.7            4.0
## 3            4.3            5.5            5.0             NA
## 4            3.3            3.2            3.2            3.1
## 5            5.1            6.7            6.0            4.6
## 6            5.4            3.9            4.5            4.8
##   2001 _Y_GE25_M 2001 _Y_GE25_T 2001 _Y15-24_F 2001 _Y15-24_M
## 1            3.6            3.7            5.8            6.2
## 2            4.9            4.5            6.7            6.6
## 3            5.5            4.9             NA             NA
## 4            2.8            2.9             NA            5.8
## 5            6.6            5.7            9.6            7.6
## 6            3.1            3.9            9.0            8.5
##   2001 _Y15-24_T 2001 _Y20-64_F 2001 _Y20-64_M 2001 _Y20-64_T
## 1            6.0             NA             NA             NA
## 2            6.6             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            5.1             NA             NA             NA
## 5            8.5             NA             NA             NA
## 6            8.7             NA             NA             NA
##   2000 _Y_GE15_F 2000 _Y_GE15_M 2000 _Y_GE15_T 2000 _Y_GE25_F
## 1            4.6            4.8            4.7            4.4
## 2            5.4            6.0            5.8            5.4
## 3            4.8            4.9            4.8            5.0
## 4            4.5            3.7            4.1            4.3
## 5            6.4            8.4            7.5            6.4
## 6            4.5            4.6            4.6            3.9
##   2000 _Y_GE25_M 2000 _Y_GE25_T 2000 _Y15-24_F 2000 _Y15-24_M
## 1            4.4            4.4            5.6            6.9
## 2            5.9            5.7            5.5            7.5
## 3            4.7            4.8             NA             NA
## 4            3.7            4.0            5.3             NA
## 5            8.0            7.3            6.1           12.2
## 6            4.1            4.0            8.2            7.8
##   2000 _Y15-24_T 2000 _Y20-64_F 2000 _Y20-64_M 2000 _Y20-64_T
## 1            6.3             NA             NA             NA
## 2            6.6             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            4.4             NA             NA             NA
## 5            9.5             NA             NA             NA
## 6            8.0             NA             NA             NA
##   1999 _Y_GE15_F 1999 _Y_GE15_M 1999 _Y_GE15_T 1999 _Y_GE25_F
## 1            4.8            4.7            4.7            4.5
## 2            5.3            5.7            5.5            5.1
## 3            5.3            4.7            4.9            4.8
## 4            4.7            4.0            4.3            4.4
## 5            5.7            7.5            6.7            5.7
## 6            4.0            4.3            4.2            3.8
##   1999 _Y_GE25_M 1999 _Y_GE25_T 1999 _Y15-24_F 1999 _Y15-24_M
## 1            4.5            4.5            6.4            5.5
## 2            5.7            5.4            6.3            5.9
## 3            4.5            4.6             NA             NA
## 4            4.2            4.3            6.4             NA
## 5            7.2            6.5            6.0            9.5
## 6            4.1            4.0            5.2            5.2
##   1999 _Y15-24_T 1999 _Y20-64_F 1999 _Y20-64_M 1999 _Y20-64_T
## 1            5.9             NA             NA             NA
## 2            6.1             NA             NA             NA
## 3             NA             NA             NA             NA
## 4            4.4             NA             NA             NA
## 5            7.9             NA             NA             NA
## 6            5.2             NA             NA             NA

You will notice that the variable names are now just concatenations of the original variable values.

Combining your attribute data with boundary data

The EU_NUTS spatial polygons data frame you created earlier has a data object associated with it. View the first few rows of data already attached to the data object as follows:

head(EU_NUTS@data)
##   NUTS_ID STAT_LEVL_ SHAPE_AREA SHAPE_LEN
## 0   AT111          3    0.08091     1.089
## 1   AT112          3    0.20926     2.257
## 2   AT113          3    0.17728     2.002
## 3   AT121          3    0.40147     3.158
## 4   AT122          3    0.42676     2.957
## 5   AT123          3    0.14146     2.010

You will see four columns containing data: The NUTS_ID which contains the NUTS code for the particular boundary polygon; STAT_LEVL_ which indicates whether the boundary is a NUTS0, NUTS1, NUTS2 or NUTS3 boundary (for details of the differences, visit http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/introduction); SHAPE_AREA which gives an indication of the size of the polygon; and SHAPE_LEN its length.

We can use the common codes in the NUTS_ID field of our spatial polygons data frame and the geo field in mapdata to combine the two dataframes. This can be done using the match() function:

EU_NUTS@data = data.frame(EU_NUTS@data, mapdata[match(EU_NUTS@data[, "NUTS_ID"], 
    mapdata[, "geo"]), ])

The square brackets here index the data frames: match() returns, for each NUTS_ID in the spatial polygons data frame, the position of the row in mapdata with the same geo code, and those rows of mapdata are then bound on to EU_NUTS@data. If you now look at the data object of EU_NUTS using head(EU_NUTS@data), you'll see all of the columns from mapdata appended. You may notice that these first few rows contain NA values, but this is to be expected as the data are at NUTS2 level and the first few polygons in the spatial polygons data frame are at NUTS3 level.
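
The behaviour of match() is easier to see on a toy example. The two small data frames below are made up for illustration (polygons stands in for EU_NUTS@data and rates for mapdata):

```r
# Toy polygon attribute table and toy statistics table
polygons <- data.frame(NUTS_ID = c("AT111", "AT11", "AT12"))
rates <- data.frame(geo = c("AT12", "AT11"), rate = c(3.9, 4.3))

# For each polygon code, find the row of `rates` with the same code;
# codes with no partner (here the NUTS3 code AT111) give NA
idx <- match(polygons$NUTS_ID, rates$geo)
idx

# Reorder `rates` to line up with `polygons`, then bind the columns
joined <- data.frame(polygons, rates[idx, ])
joined
```

The row order of the polygon table is preserved, which matters because the polygons themselves are stored in that order.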

OK, now we're ready to get mapping!!

Mapping your data

Choosing your colour scheme

At the start of this practical we loaded the RColorBrewer package. This package allows you to choose from a variety of different, pleasing to the eye, colour palettes. To examine the range, visit http://colorbrewer2.org/. In this example we will be using the RdPu palette, but by all means choose your own and change the code below accordingly.

We will opt for a 5 colour scheme and store this in a character vector called my_colours:

my_colours <- brewer.pal(5, "RdPu")

Next we need to select the variable we are going to map and calculate the breaks in the data which will define the colour ranges on the map. In this example we will map the unemployment rates of all people aged 20-64 in 2012 (the X2012._Y20.64_T variable in our data frame).
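
The variable name differs from the column heading shown in head(mapdata) because data.frame() converts column names into syntactically valid R names when the tables are combined. A short sketch of that conversion, using make.names() directly:

```r
# Column name as produced by cast(): time, age and sex joined by "_"
cast_name <- "2012 _Y20-64_T"

# Names cannot start with a digit or contain spaces or hyphens, so an "X"
# is prepended and the offending characters become "."
valid_name <- make.names(cast_name)
valid_name
```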

Defining the breaks in this variable can be achieved with the classIntervals() function, extracting the vector of breaks (brks) from its result:

breaks <- classIntervals(EU_NUTS@data$X2012._Y20.64_T, n = 5, style = "fisher", 
    unique = TRUE)$brks
## Warning: var has missing values, omitted in finding classes

To save us attempting to define the best partitions in our data, here we make use of the Fisher-Jenks natural breaks algorithm using the “fisher” style option.

Generating your map

Everything is now in place to plot your map. This can be done with a simple call to the plot() function:

plot(EU_NUTS, col = my_colours[findInterval(EU_NUTS@data$X2012._Y20.64_T, 
    breaks, all.inside = TRUE)], axes = FALSE, border = NA)
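
The col argument does the real work here: findInterval() turns each unemployment rate into a class number, which then picks a colour from the palette. The sketch below walks through that step with made-up break points, five placeholder hex colours and a handful of invented rates (all assumptions for illustration, not values from the real data):

```r
# Placeholder inputs: six break points define five classes
toy_breaks <- c(2, 4, 6, 10, 15, 25)
five_colours <- c("#FEEBE2", "#FBB4B9", "#F768A1", "#C51B8A", "#7A0177")
rates <- c(2.1, 5.0, 9.9, 25.0, NA)

# Class index for each value; all.inside = TRUE keeps the maximum (25.0)
# in the top class instead of assigning it to a sixth class with no colour
class_index <- findInterval(rates, toy_breaks, all.inside = TRUE)
class_index

# Index the palette; a missing rate gives NA, i.e. no colour
five_colours[class_index]
```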


If you wish, you can also add in the borders of the countries by first creating a new spatial polygons data frame containing just the country boundaries (NUTS0, i.e. STAT_LEVL_ equal to 0):

CountryBorder <- EU_NUTS[EU_NUTS@data$STAT_LEVL_ == 0, ]

and then add these to your plot:

plot(CountryBorder, border = "#707070", add = TRUE)
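
As a cross-check on the STAT_LEVL_ column, the NUTS level can usually be read off the code itself, assuming the standard NUTS convention of a two-letter country code followed by one character per level. A small sketch with made-up rows:

```r
# Toy attribute table using codes that appear earlier in this tutorial
toy_nuts <- data.frame(NUTS_ID = c("AT", "AT1", "AT11", "AT111"))

# Level = number of characters after the two-letter country code
toy_nuts$level <- nchar(as.character(toy_nuts$NUTS_ID)) - 2
toy_nuts

# Keep only the country-level rows, as done for CountryBorder above
toy_nuts[toy_nuts$level == 0, ]
```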

Adding a legend and other map furniture

In order to add a legend, you will need to know the coordinates for the upper-left corner of the box that contains your legend. To find these, you can use the locator() function while your map plot is still open:

locator()

This will allow you to click anywhere on your open map plot before pressing escape to generate the coordinates of where you clicked. In this example, clicking somewhere in the North Atlantic will generate the following x and y coordinates which can then be included in the legend() function:

legend(x = -6080915, y = 8730220, legend = leglabs(round(breaks, digits = 2), 
    between = " to <"), fill = my_colours, bty = "n", cex = 0.7, title = "Unemployment Rate")

To generate a PDF of your map all of the above code can be included between a pdf() and a dev.off() function:

pdf("map.pdf", width = 10, height = 10, title = "Unemployment rates by sex, age and NUTS 2 regions (%)", 
    paper = "a4")
plot(EU_NUTS, col = my_colours[findInterval(EU_NUTS@data$X2012._Y20.64_T, 
    breaks, all.inside = TRUE)], axes = FALSE, border = NA)
plot(CountryBorder, border = "#707070", add = TRUE)
legend(x = -6080915, y = 8730220, legend = leglabs(round(breaks, digits = 2), 
    between = " to <"), fill = my_colours, bty = "n", cex = 0.7, title = "Unemployment Rate")
title("Total Unemployment rates 20-64 year olds, \nEU NUTS 2 regions (% of total workforce), 2012")
dev.off()
## pdf 
##   2
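
The same open, draw, close pattern applies to any plot. As a quick way to try it without the map data, the sketch below writes a placeholder plot to a temporary file (the file name and the plot itself are assumptions for illustration) and then checks that the file was created:

```r
# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "example_map.pdf")

pdf(out_file, width = 7, height = 7)          # open the PDF device
plot(1:10, main = "Placeholder for the map")  # all drawing calls go here
dev.off()                                     # close the device to finish the file

file.exists(out_file)
```

If the PDF looks empty or will not open, the usual cause is a missing dev.off() call.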