This document explains how to calculate the unemployment rate by region, education level, and age group using public microdata from INE.

The microdata files can be downloaded from INE.

This example uses data from the first quarter of 2014; my Shiny app has data since 2008. It is easy to unzip the EPA files and concatenate the resulting ASCII files. On a Unix-like system, you can type the following if the decompressed files are all in the same directory.

$ cd /path/to/files
$ files=*
$ cat $files > bigfile 
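If you are not on a Unix-like system, the same concatenation can be sketched in base R. This is only an illustration: the directory, the file names and their contents below are made up for the demo, and the real EPA files would take their place.

```r
# Sketch: concatenate every file in a directory into one big file (base R only).
# The directory and the two small files are hypothetical stand-ins for the
# decompressed EPA files.
src_dir <- file.path(tempdir(), "epa_parts")
dir.create(src_dir, showWarnings = FALSE)
writeLines(c("line 1 of part a", "line 2 of part a"), file.path(src_dir, "part_a"))
writeLines("line 1 of part b", file.path(src_dir, "part_b"))

# Write the result outside src_dir so it is never picked up as an input.
bigfile <- file.path(tempdir(), "bigfile")
file.create(bigfile)  # creates the file, or empties it if it already exists

# list.files() returns the paths in alphabetical order.
parts <- list.files(src_dir, full.names = TRUE)
for (p in parts) {
  file.append(bigfile, p)  # appends the content of p to bigfile
}

readLines(bigfile)
```

Keeping the output file in a different directory avoids feeding `bigfile` back into itself if the step is run twice.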

Read EPA microdata using the MicroDatosEs package

The MicroDatosEs package is very useful for this. I keep the microdata in my public Dropbox folder

suppressPackageStartupMessages(library(MicroDatosEs))
library(plyr)

tmp <- tempfile()

download.file("https://dl.dropboxusercontent.com/u/2712908/EPA/EPA_1t14", 
              destfile = tmp, method="wget")
epa <- epa2005(tmp)

Subset data

dat <- subset(epa, select=c(ciclo,ccaa,edad,nforma, aoi,factorel))
rm(epa)

The class of dat is data.set, from the memisc package

class(dat)
## [1] "data.set"
## attr(,"package")
## [1] "memisc"

Use the codebook function to tabulate the data

codebook(dat$edad)
## ===========================================================================
## 
##    dat$edad 'Edad, grupos quinquenales de años cumplidos'
## 
## ---------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: nominal
## 
##         Values and labels     N     Percent  
##                                              
##     0   'de 0 A 4 años'    7498     4.4   4.4
##     5   'de 5 A 9 años'    8947     5.2   5.2
##    10   'de 10 A 15 años' 10458     6.1   6.1
##    16   'de 16 A 19 años'  6894     4.0   4.0
##    20   'de 20 A 24 años'  8946     5.2   5.2
##    25   'de 25 A 29 años'  8260     4.8   4.8
##    30   'de 30 A 34 años'  9641     5.6   5.6
##    35   'de 35 A 39 años' 12580     7.4   7.4
##    40   'de 40 A 44 años' 13254     7.8   7.8
##    45   'de 45 A 49 años' 13661     8.0   8.0
##    50   'de 50 A 54 años' 13172     7.7   7.7
##    55   'de 55 A 59 años' 11576     6.8   6.8
##    60   'de 60 A 64 años' 10243     6.0   6.0
##    65   '65 o más años'   35883    21.0  21.0

Recode

We’ll use the recode function from memisc

Create a function that specifies the recodes

recodificacion <- function (dat) {
  dat$aoi <- memisc::recode(dat$aoi, "o" = 1 <- 3:4, "p" = 2 <- 5:6, "i" = 3 <- 7:9)
  
  dat$nforma3 <- memisc::recode(dat$nforma,                        
                                "Primary school or les"  = 1 <- c("AN","P1","P2"),
                                "Secondary" = 2 <- c("S1","SG","SP"),
                                "University"  = 3 <- c("SU")
  )
  dat$gedad <- memisc::recode(dat$edad,
                              "15 years old or less" = 1 <- c(0,5,10),
                              "16-34 years old" = 2 <- c(16,20,25,30),
                              "35-54 years old" = 3 <- c(35,40,45,50),
                              "55 years or older" = 4 <- c(55,60,65)
  )
 
  dat
}
dat <- recodificacion(dat)
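To see what the age recode does without loading memisc, here is a rough base R sketch of the same grouping using `cut`. It is not the method used above; `edad_codes` is a made-up vector of five-year age codes like those shown in the codebook.

```r
# Hypothetical vector of five-year age codes like those in dat$edad.
edad_codes <- c(0, 10, 16, 30, 35, 50, 55, 65)

# Breaks chosen so that each interval [lower, upper) contains exactly the
# codes that the recode above assigns to the same group.
gedad <- cut(edad_codes,
             breaks = c(0, 16, 35, 55, Inf),
             labels = c("15 years old or less", "16-34 years old",
                        "35-54 years old", "55 years or older"),
             right = FALSE)

# Show each code next to the group it falls in.
data.frame(edad_codes, gedad)
```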

Convert to classic data.frame

dat <- as.data.frame(dat)

Unemployment rate calculation

The unemployment rate is defined only for the active population (employed and unemployed people) aged 16 or older

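# keep only people aged 16 or older: edad is now a factor, so as.numeric()
# returns the level index, and the first three levels cover ages 0 to 15
dat <- dat[ as.numeric(dat$edad) > 3, ]

# drop inactive people
dat <- dat[ dat$aoi != "i", ]

# drop unused levels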
dat$gedad <- droplevels( dat$gedad)
levels(dat$gedad)
## [1] "16-34 years old"   "35-54 years old"   "55 years or older"
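The age filter relies on how R stores factors, which is easy to miss. Here is a small sketch with a made-up factor whose levels follow the codebook order; the labels are shortened for the example, and I am assuming the conversion to data.frame keeps the levels in that order.

```r
# Made-up factor: levels in the same order as the codebook (shortened labels).
labs <- c("0-4", "5-9", "10-15", "16-19", "20-24")
edad <- factor(c("10-15", "16-19", "0-4", "20-24"), levels = labs)

# as.numeric() on a factor returns the position of each value's level,
# not the age itself.
idx <- as.numeric(edad)

# Levels 1 to 3 are the groups below 16, so index > 3 means 16 or older.
keep <- idx > 3

# After filtering, the unused levels are still attached until we drop them.
adults <- droplevels(edad[keep])
levels(adults)
```

If the level order ever changed, the threshold 3 would select the wrong rows, so it is worth checking `levels(dat$edad)` before filtering.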

Unemployment rate by region, age group and education level

EPA is a weighted survey, so we should use the factorel variable (the survey weight) to get correct figures.

The ddply function from the plyr package helps us here

tasa.paro <- ddply(dat,.(ciclo,ccaa,gedad,nforma3),summarise,
                   rate=weighted.mean(aoi=="p",factorel))

head(tasa.paro)
##   ciclo      ccaa           gedad               nforma3   rate
## 1   166 Andalucía 16-34 years old Primary school or les 0.5304
## 2   166 Andalucía 16-34 years old             Secondary 0.4676
## 3   166 Andalucía 16-34 years old            University 0.3090
## 4   166 Andalucía 35-54 years old Primary school or les 0.4868
## 5   166 Andalucía 35-54 years old             Secondary 0.3487
## 6   166 Andalucía 35-54 years old            University 0.1806
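Why does `weighted.mean(aoi=="p",factorel)` give the unemployment rate? The logical vector is treated as 0/1, so the weighted mean is the sum of the weights of unemployed people divided by the sum of all weights. A tiny sketch with invented values (the numbers below are not EPA data):

```r
# Toy data: labour status ("o" employed, "p" unemployed) and survey weights.
aoi      <- c("o", "p", "o", "p", "o")
factorel <- c(100, 300, 200, 200, 200)

# Weighted share of unemployed, as in the ddply call above.
rate_wm <- weighted.mean(aoi == "p", factorel)

# The same quantity written out: weighted unemployed / weighted active.
rate_sum <- sum(factorel[aoi == "p"]) / sum(factorel)

# Ignoring the weights gives a different answer.
rate_unweighted <- mean(aoi == "p")

c(weighted = rate_wm, by_hand = rate_sum, unweighted = rate_unweighted)
```

In this toy case the weighted rate is 0.5 while the unweighted one is 0.4, which is why the weights matter.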

We can also compute the unemployment rate by age group and education level for Spain as a whole

tasa.paro.global <- ddply(dat,.(ciclo,gedad,nforma3),summarise,
                                  rate=weighted.mean(aoi=="p",factorel))

head(tasa.paro.global)
##   ciclo           gedad               nforma3   rate
## 1   166 16-34 years old Primary school or les 0.5198
## 2   166 16-34 years old             Secondary 0.4067
## 3   166 16-34 years old            University 0.2324
## 4   166 35-54 years old Primary school or les 0.4238
## 5   166 35-54 years old             Secondary 0.2637
## 6   166 35-54 years old            University 0.1304
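If you prefer not to depend on plyr, the same split-apply-combine idea can be sketched in base R. This is an alternative illustration, not what produced the tables above; the `toy` data frame and its values are invented.

```r
# Invented data with one grouping variable, labour status and weights.
toy <- data.frame(
  gedad    = c("16-34", "16-34", "16-34", "35-54", "35-54"),
  aoi      = c("o", "p", "p", "o", "p"),
  factorel = c(100, 100, 200, 300, 100),
  stringsAsFactors = FALSE
)

# Split the rows by group, then compute the weighted rate within each piece.
groups <- split(toy, toy$gedad)
rate <- sapply(groups, function(g) weighted.mean(g$aoi == "p", g$factorel))

rate
```

For several grouping variables, `split` also accepts a list of factors, although ddply returns a tidier data frame in that case.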

Figure

library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:memisc':
## 
##     percent
p <- ggplot(tasa.paro, aes(x=gedad, y=rate, group=ccaa))
p + geom_line() + facet_wrap(ccaa ~ nforma3, nrow=14) +
    scale_y_continuous(labels=percent) +
    theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5, size=rel(1), face="bold"),
          strip.text = element_text(size=rel(0.6), face="bold")) +
    ggtitle("Unemployment rate\nby region, age group and education level")

[Figure: unemployment rate by region, age group and education level]

Within each region, older age groups and higher education levels are associated with lower unemployment rates.

Note: In this Shiny app I have put together data since 2008