Eurostat R tools

This R package provides tools to access Eurostat open data as part of the rOpenGov project.

For contact information and source code, see the GitHub page.

Installation

Release version for general use:

install.packages("eurostat")
library(eurostat)

Development version (potentially unstable):

install.packages("devtools")
library(devtools)
install_github("ropengov/eurostat")
library(eurostat)

Finding the data

The function getEurostatTOC downloads the table of contents of Eurostat datasets. The values in the ‘code’ column are the identifiers to use when downloading a selected dataset.

library(eurostat)

# Get Eurostat data listing
toc <- getEurostatTOC()
toc[200:210,]
##                                                                                            title
## 200                                           Gross value added at basic pricesby NUTS 3 regions
## 201                                               Employment (in 1000 persons) by NUTS 3 regions
## 202                                Gross fixed capital formation by NUTS 2 regions (NACE Rev. 2)
## 203                                    Compensation of employees by NUTS 2 regions (NACE Rev. 2)
## 204                            Employment (in 1000 hours worked) by NUTS 2 regions (NACE Rev. 2)
## 205                                Employment (in 1000 persons) by NUTS 3 regions (NACE Rev. 2) 
## 206                                                                   Household accounts - ESA95
## 207                         Allocation of primary income account of households by NUTS 2 regions
## 208                     Secondary distribution of income account of households by NUTS 2 regions
## 209                                                       Income of households by NUTS 2 regions
## 210                                                                Regional education statistics
##                 code    type last.update.of.data
## 200  nama_r_e3vabp95 dataset          18.06.2013
## 201  nama_r_e3empl95 dataset          11.06.2012
## 202  nama_r_e2gfcfr2 dataset          22.08.2014
## 203   nama_r_e2remr2 dataset          21.08.2014
## 204 nama_r_e2em95hr2 dataset          21.08.2014
## 205  nama_r_e3em95r2 dataset          21.08.2014
## 206        reg_ecohh  folder                    
## 207     nama_r_ehh2p dataset          27.03.2014
## 208     nama_r_ehh2s dataset          27.03.2014
## 209   nama_r_ehh2inc dataset          28.03.2014
## 210         reg_educ  folder                    
##     last.table.structure.change data.start data.end values
## 200                  26.06.2013       1995     2009     NA
## 201                  04.02.2014       1995     2009     NA
## 202                  08.08.2014       2000     2011     NA
## 203                  30.07.2014       2000     2011     NA
## 204                  14.08.2014       1995     2011     NA
## 205                  14.08.2014       2000     2012     NA
## 206                                                     NA
## 207                  25.03.2014       2000     2011     NA
## 208                  25.03.2014       2000     2011     NA
## 209                  24.03.2014       2000     2011     NA
## 210                                                     NA

With grepEurostatTOC you can search the table of contents for particular patterns, e.g. all datasets related to passenger transport.

# info about passengers
head(grepEurostatTOC("passenger transport", type = "dataset"))
##                                                                                                                                         title
## 4945                                                                                            Volume of passenger transport relative to GDP
## 4946                                                                                                       Modal split of passenger transport
## 4985                                                          Railway transport - Total annual passenger transport (1 000 pass., million pkm)
## 4989                 International railway passenger transport from the reporting country to the country of disembarkation (1 000 passengers)
## 4990                    International railway passenger transport from the country of embarkation to the reporting country (1 000 passengers)
## 5341                                                                                             Air passenger transport by reporting country
##                 code    type last.update.of.data
## 4945   tran_hv_pstra dataset          25.06.2014
## 4946   tran_hv_psmod dataset          25.06.2014
## 4985   rail_pa_total dataset          14.08.2014
## 4989 rail_pa_intgong dataset          14.08.2014
## 4990 rail_pa_intcmng dataset          14.08.2014
## 5341       avia_paoc dataset          18.08.2014
##      last.table.structure.change data.start data.end values
## 4945                  25.06.2014       1995     2012     NA
## 4946                  25.06.2014       1990     2012     NA
## 4985                  10.07.2014       2004     2013     NA
## 4989                  10.07.2014       2002     2013     NA
## 4990                  10.07.2014       2002     2013     NA
## 5341                  13.08.2014       1993   2014Q2     NA
head(grepEurostatTOC("passenger transport", type = "table"))
##                                                              title
## 7105                 Volume of passenger transport relative to GDP
## 7106                            Modal split of passenger transport
## 7613                            Modal split of passenger transport
## 7737                            Modal split of passenger transport
## 7740                 Volume of passenger transport relative to GDP
##          code  type last.update.of.data last.table.structure.change
## 7105 tsdtr240 table          25.06.2014                  25.06.2014
## 7106 tsdtr210 table          25.06.2014                  25.06.2014
## 7613 tsdtr210 table          25.06.2014                  25.06.2014
## 7737 tsdtr210 table          25.06.2014                  25.06.2014
## 7740 tsdtr240 table          25.06.2014                  25.06.2014
##      data.start data.end values
## 7105       1995     2012     NA
## 7106       1990     2012     NA
## 7613       1990     2012     NA
## 7737       1990     2012     NA
## 7740       1995     2012     NA
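Conceptually, such a search is a pattern match on the title column combined with a filter on the type column. The following is a minimal sketch on a hand-built toy table, not the package's actual implementation; the helper search_toc and the data frame toc_toy are hypothetical names introduced only for illustration. It also shows why the same code can appear on several rows and how unique() reduces the result to distinct identifiers.

```r
# Toy table of contents; the rows are copied from the listings above,
# but the data frame itself is built by hand for illustration.
toc_toy <- data.frame(
  title = c("Modal split of passenger transport",
            "Modal split of passenger transport",
            "Volume of passenger transport relative to GDP",
            "Air passenger transport by reporting country",
            "Regional education statistics"),
  code  = c("tsdtr210", "tsdtr210", "tsdtr240", "avia_paoc", "reg_educ"),
  type  = c("table", "table", "table", "dataset", "folder"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the eurostat package): keep rows whose
# title matches `pattern` and whose type equals `type`.
search_toc <- function(toc, pattern, type = "dataset") {
  hit <- grepl(pattern, toc$title, ignore.case = TRUE) & toc$type == type
  toc[hit, , drop = FALSE]
}

tables <- search_toc(toc_toy, "passenger transport", type = "table")
nrow(tables)          # number of matching rows, duplicates included
unique(tables$code)   # distinct codes that can be used for downloading
```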

Downloading the data

The package has two functions for downloading data. get_eurostat_raw returns the data in a wide tabular format, whereas get_eurostat returns the dataset in the molten, or row-column-value (RCV), format. This document focuses on one indicator, Modal split of passenger transport.

This indicator is defined as the percentage share of each mode of transport in total inland transport, expressed in passenger-kilometres (pkm). It is based on transport by passenger cars, buses and coaches, and trains. All data should be based on movements on national territory, regardless of the nationality of the vehicle. However, the data collection methodology is not harmonised at the EU level.
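As a worked illustration of this definition, the share of each mode is its passenger-kilometres divided by the total over all three modes, times 100. The figures below are invented for the example and do not come from Eurostat.

```r
# Made-up passenger-kilometre figures (billion pkm) for one country and year.
pkm <- c(car = 600, bus = 90, train = 60)

# Share of each mode in total inland passenger transport, in percent.
modal_split <- 100 * pkm / sum(pkm)
modal_split
```

By construction the three shares add up to 100, which is what makes the indicator suitable for the triangle plot later in this document.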

# Pick ID for the table
id <- unique(grepEurostatTOC("Modal split of passenger transport", 
                         type = "table")$code)
# Get table with the given ID
dat_raw <- get_eurostat_raw(id)
# Use the kable function from knitr for nicer table output
library(knitr)
kable(head(dat_raw))
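| vehicle.geo.time | X1990 | X1991 | X1992 | X1993 | X1994 | X1995 | X1996 | X1997 | X1998 | X1999 | X2000 | X2001 | X2002 | X2003 | X2004 | X2005 | X2006 | X2007 | X2008 | X2009 | X2010 | X2011 | X2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BUS_TOT,AT | NA | 10.6 | 10.5 | 10.7 | 10.6 | 10.9 | 10.7 | 10.9 | 10.9 | 10.7 | 11 | 10.9 | 10.9 | 10.9 | 11 | 10.5 | 10.4 | 10.8 | 10.2 | 9.6 | 10.3 | 10.1 | 10 |
| BUS_TOT,BE | NA | 10.1 e | 10.3 e | 10.3 e | 10.4 e | 11.2 | 11.2 e | 11.1 | 11.1 | 10.7 e | 10.5 | 10.7 | 11.4 | 12.5 | 12.7 | 13 | 13.2 | 13.4 | 12.5 | 12.5 | 12.2 | 12.3 | 12.4 |
| BUS_TOT,BG | NA | NA | NA | NA | NA | 28.0 e | 26.3 e | 28.5 e | 30.3 e | 33.5 e | 31.4 b | 32 | 33.4 | 28.1 | 25 | 24.3 | 22.7 | 21.8 | 20.8 | 16.8 | 16.4 | 15.9 | 16.9 |
| BUS_TOT,CH | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5.2 | 5.2 | 5.1 | 5.2 | 5.2 | 5.3 | 5.6 | 5.5 | 5.1 | 5.1 | 5.1 | 5.1 | 5.1 |
| BUS_TOT,CY | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 22.3 e | 22.5 e | 22.6 e | 23.6 e | 21.2 e | 20.8 e | 20.4 e | 19.7 e | 18.8 e | 17.6 e | 18.1 e | 18.3 e | 18.7 e |
| BUS_TOT,CZ | NA | NA | NA | 19.1 e | 17.0 e | 15.8 e | 20.1 e | 19.0 e | 18.5 e | 18.2 e | 18.6 | 19.9 | 18.7 | 17.2 | 16 | 17.2 | 17.3 | 17 | 16.9 | 16 | 18.9 | 17 | 16.8 |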
dat <- get_eurostat(id)
kable(head(dat))
| vehicle | geo | time | value |
|---|---|---|---|
| BUS_TOT | AT | 1990 | NA |
| BUS_TOT | BE | 1990 | NA |
| BUS_TOT | BG | 1990 | NA |
| BUS_TOT | CH | 1990 | NA |
| BUS_TOT | CY | 1990 | NA |
| BUS_TOT | CZ | 1990 | NA |
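To make the difference between the two formats concrete, the sketch below converts a small wide table into the molten form using base R only. It is an illustration under stated assumptions, not the code used inside get_eurostat: the toy table raw_toy is built by hand from the first two rows above, and the sketch simply discards trailing flags such as "e" (estimated) before converting the values to numbers.

```r
# Toy wide table in the shape of the get_eurostat_raw() output above;
# values are copied from the first two rows, restricted to three years.
raw_toy <- data.frame(
  vehicle.geo.time = c("BUS_TOT,AT", "BUS_TOT,BE"),
  X1990 = c(NA, NA),
  X1991 = c("10.6", "10.1 e"),
  X1992 = c("10.5", "10.3 e"),
  stringsAsFactors = FALSE
)

# Split the combined key column into its two parts (vehicle and geo).
keys <- do.call(rbind, strsplit(raw_toy$vehicle.geo.time, ","))

# Stack the year columns: one row per (vehicle, geo, time) combination.
year_cols <- setdiff(names(raw_toy), "vehicle.geo.time")
long_toy <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    vehicle = keys[, 1],
    geo     = keys[, 2],
    time    = as.integer(sub("^X", "", col)),
    # Assumption of this sketch: drop trailing flags before conversion.
    value   = as.numeric(sub("[[:space:]]+[[:alpha:]]+$", "", raw_toy[[col]])),
    stringsAsFactors = FALSE
  )
}))
long_toy
```

The molten form has one observation per row, which is the shape expected by the cast and plotting steps later in this document.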

Labelling the data

The function label_eurostat replaces the Eurostat codes with their definitions from the Eurostat dictionaries. It works on data frames created with get_eurostat.

datl <- label_eurostat(dat)

kable(head(datl))
| vehicle | geo | time | value |
|---|---|---|---|
| Motor coaches, buses and trolley buses | Austria | 1990 | NA |
| Motor coaches, buses and trolley buses | Belgium | 1990 | NA |
| Motor coaches, buses and trolley buses | Bulgaria | 1990 | NA |
| Motor coaches, buses and trolley buses | Switzerland | 1990 | NA |
| Motor coaches, buses and trolley buses | Cyprus | 1990 | NA |
| Motor coaches, buses and trolley buses | Czech Republic | 1990 | NA |
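The idea behind labelling can be sketched as a dictionary lookup: each code is replaced by the label stored under that code. The example below is a simplified illustration with hand-made dictionaries, not the implementation of label_eurostat; the helper label_codes and the objects dic_geo, dic_vehicle and dat_toy are hypothetical.

```r
# Hand-made dictionaries; the code/label pairs are taken from the tables above.
dic_geo     <- c(AT = "Austria", BE = "Belgium", BG = "Bulgaria")
dic_vehicle <- c(BUS_TOT = "Motor coaches, buses and trolley buses")

dat_toy <- data.frame(
  vehicle = c("BUS_TOT", "BUS_TOT", "BUS_TOT"),
  geo     = c("AT", "BE", "BG"),
  time    = 1990,
  value   = NA_real_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace codes by labels, keeping unknown codes as-is.
label_codes <- function(x, dic) {
  lab <- unname(dic[x])
  ifelse(is.na(lab), x, lab)
}

datl_toy <- dat_toy
datl_toy$vehicle <- label_codes(dat_toy$vehicle, dic_vehicle)
datl_toy$geo     <- label_codes(dat_toy$geo, dic_geo)
datl_toy
```

Keeping unknown codes unchanged is a design choice of this sketch; it makes missing dictionary entries easy to spot in the output.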

Triangle plot for split of passenger transport

library(reshape)
tmp <- get_eurostat("tsdtr210")
bus   <- cast(tmp, geo ~ time, mean, subset = vehicle == "BUS_TOT")
car   <- cast(tmp, geo ~ time, mean, subset = vehicle == "CAR")
train <- cast(tmp, geo ~ time, mean, subset = vehicle == "TRN")

# select 2010 data
allTransports <- data.frame(bus = bus[,"2010"], 
                            car = car[,"2010"],
                            train = train[,"2010"])
# Use the country codes as row names
rownames(allTransports) <- levels(bus[,1])
allTransports <- na.omit(allTransports)

# triangle plot
library("plotrix")
triax.plot(allTransports, show.grid=TRUE, 
           label.points=TRUE, point.labels=rownames(allTransports), 
           pch=19)

[Figure: triangle plot of the 2010 modal split of passenger transport (bus, car, train) by country]
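A triangle (ternary) plot assumes that the three components of each point add up to a constant. Before plotting, it can be worth checking the row totals and, if needed, rescaling each row. The sketch below uses a hand-made table: the bus share for AT is the 2010 value from the raw table above, while all other figures, and the country code "XX", are invented.

```r
# Toy shares for two countries (percent).
shares <- data.frame(bus   = c(10.3, 12.0),
                     car   = c(78.7, 80.0),
                     train = c(11.0, 7.0),
                     row.names = c("AT", "XX"))

# Each row should add up to 100 percent, up to rounding.
row_total <- rowSums(shares)
row_total

# Rescale every row so that its three shares sum to exactly 1.
props <- shares / row_total
rowSums(props)
```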

Working with country codes

Eurostat uses the ISO2 format for country names, the OECD uses ISO3 in its studies, and Statistics Finland uses full country names. There are (at least) two ways to reconcile these. The first is to apply the label_eurostat function to your dataset.

tmp <- get_eurostat("tsdtr210")
tmpl <- label_eurostat(tmp)

kable(head(tmpl))
| vehicle | geo | time | value |
|---|---|---|---|
| Motor coaches, buses and trolley buses | Austria | 1990 | NA |
| Motor coaches, buses and trolley buses | Belgium | 1990 | NA |
| Motor coaches, buses and trolley buses | Bulgaria | 1990 | NA |
| Motor coaches, buses and trolley buses | Switzerland | 1990 | NA |
| Motor coaches, buses and trolley buses | Cyprus | 1990 | NA |
| Motor coaches, buses and trolley buses | Czech Republic | 1990 | NA |

The second option is to use the countrycode package to convert between these formats.

library("countrycode")

# Use the country codes from previous examples
countries <- rownames(allTransports)
head(countries)
## [1] "AT" "BE" "BG" "CH" "CZ" "DE"
# From ISO2 (used by Eurostat) into ISO3 (used by OECD)
head(countrycode(countries, "iso2c", "iso3c"))
## [1] "AUT" "BEL" "BGR" "CHE" "CZE" "DEU"
# From ISO2 (used by Eurostat) into short country names
head(countrycode(rownames(allTransports), "iso2c", "country.name"))
## [1] "Austria"        "Belgium"        "Bulgaria"       "Switzerland"   
## [5] "Czech Republic" "Germany"
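One detail to watch when converting: Eurostat writes Greece as EL and the United Kingdom as UK, whereas ISO 3166-1 alpha-2 uses GR and GB. The sketch below recodes these two cases and then joins a Eurostat-style table to an ISO3-keyed table. It relies on base R only, with a small hand-made lookup standing in for countrycode; all object names and all numeric values are invented for illustration.

```r
# Country codes as they might appear in a Eurostat extract.
eurostat_geo <- c("AT", "EL", "UK")

# Recode the two deviations to their ISO 3166-1 alpha-2 equivalents.
iso2 <- eurostat_geo
iso2[iso2 == "EL"] <- "GR"
iso2[iso2 == "UK"] <- "GB"

# Small hand-made ISO2 -> ISO3 lookup, standing in for countrycode().
iso2_to_iso3 <- c(AT = "AUT", GR = "GRC", GB = "GBR")
iso3 <- unname(iso2_to_iso3[iso2])

# Join a Eurostat-style table to an ISO3-keyed table (values are invented).
eu   <- data.frame(geo = eurostat_geo, iso3 = iso3,
                   bus_share = c(10.3, 17.5, 5.0),
                   stringsAsFactors = FALSE)
oecd <- data.frame(iso3 = c("AUT", "GBR", "GRC"),
                   indicator = c(1, 2, 3),
                   stringsAsFactors = FALSE)
merged <- merge(eu, oecd, by = "iso3")
merged
```

Converting both sources to one common key before merging keeps the original Eurostat code in the result, so the join can be traced back to the source data.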

Citing the package

This R package is based on the earlier CRAN packages statfi and smarterpoland. The datamart package contains related tools for Eurostat, but at the time of writing this tutorial it appears to be at an experimental stage.

Citing the data: Kindly cite Eurostat.

Citing the R tools: This work can be freely used, modified and distributed under the BSD-2-clause (modified FreeBSD) license. Kindly cite the R package as ‘Leo Lahti, Przemyslaw Biecek, Janne Huovari and Markus Kainu (C) 2014. eurostat R package. URL: http://ropengov.github.io/eurostat’.

Session info

This tutorial was created with

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=fi_FI.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fi_FI.UTF-8        LC_COLLATE=fi_FI.UTF-8    
##  [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=fi_FI.UTF-8   
##  [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] countrycode_0.17 plotrix_3.5-7    reshape_0.8.5    knitr_1.6       
## [5] eurostat_0.9.35  tidyr_0.1        plyr_1.8.1      
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.4     evaluate_0.5.5   formatR_1.0      htmltools_0.2.4 
##  [5] Rcpp_0.11.2      reshape2_1.4     rmarkdown_0.2.64 stringr_0.6.2   
##  [9] tools_3.1.1      yaml_2.1.13