This document explains how to calculate the unemployment rate by region, education level and age group using public microdata from INE (Instituto Nacional de Estadística, Spain's national statistics office).
You can find the microdata files on the INE website.
In this example we use data from the first quarter of 2014; in my Shiny app I have data since 2008. It's easy to unzip the EPA (Encuesta de Población Activa, the Spanish Labour Force Survey) files and concatenate the ASCII files. On a Unix-like system, if the decompressed files are all in the same directory, you can type:
$ cd /path/to/files
$ cat * > ../bigfile   # output goes one level up so * never matches it on a re-run
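If you are not on a Unix-like system, the same concatenation can be done from R. The following is a minimal base-R sketch and is not part of the original workflow. The function name concat_files and its arguments are hypothetical. It lists every regular file in a directory, skips the output file if it lives there too, and appends the inputs in alphabetical order with file.append:

```r
# Hypothetical helper: concatenate every file in `in_dir` into `out_file`,
# a portable alternative to `cat * > bigfile` (also works on Windows).
concat_files <- function(in_dir, out_file) {
  files <- list.files(in_dir, full.names = TRUE)   # returned in alphabetical order
  files <- files[!dir.exists(files)]               # skip subdirectories
  # skip the output file itself, in case it lives in the same directory
  files <- files[normalizePath(files) != normalizePath(out_file, mustWork = FALSE)]
  file.create(out_file)                            # create or truncate the output
  ok <- file.append(out_file, files)               # append the inputs in order
  invisible(all(ok))
}
```

For example, concat_files("/path/to/files", "bigfile") would produce the same bigfile as the shell command.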
The MicroDatosEs package is very useful here: its epa2005 function reads the EPA microdata files into R. I keep the microdata in my public Dropbox folder:
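suppressPackageStartupMessages(library(MicroDatosEs))
library(plyr)
# download the 1st quarter 2014 microdata to a temporary file
# (method = "wget" requires the wget program to be installed)
tmp <- tempfile()
download.file("https://dl.dropboxusercontent.com/u/2712908/EPA/EPA_1t14",
              destfile = tmp, method = "wget")
# read the microdata file with MicroDatosEs
epa <- epa2005(tmp)
# keep cycle, region, age band, education level, activity status and weight
dat <- subset(epa, select = c(ciclo, ccaa, edad, nforma, aoi, factorel))
rm(epa)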
dat is an object of class data.set, from the memisc package:
class(dat)
## [1] "data.set"
## attr(,"package")
## [1] "memisc"
Use the codebook function (also from memisc) to tabulate a variable:
codebook(dat$edad)
## ===========================================================================
##
## dat$edad 'Edad, grupos quinquenales de años cumplidos'
##
## ---------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: nominal
##
## Values and labels N Percent
##
## 0 'de 0 A 4 años' 7498 4.4 4.4
## 5 'de 5 A 9 años' 8947 5.2 5.2
## 10 'de 10 A 15 años' 10458 6.1 6.1
## 16 'de 16 A 19 años' 6894 4.0 4.0
## 20 'de 20 A 24 años' 8946 5.2 5.2
## 25 'de 25 A 29 años' 8260 4.8 4.8
## 30 'de 30 A 34 años' 9641 5.6 5.6
## 35 'de 35 A 39 años' 12580 7.4 7.4
## 40 'de 40 A 44 años' 13254 7.8 7.8
## 45 'de 45 A 49 años' 13661 8.0 8.0
## 50 'de 50 A 54 años' 13172 7.7 7.7
## 55 'de 55 A 59 años' 11576 6.8 6.8
## 60 'de 60 A 64 años' 10243 6.0 6.0
## 65 '65 o más años' 35883 21.0 21.0
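The N column above counts survey records, not people in the population: the counts add up to 171,013 records. To estimate population sizes, each record has to count with its weight. In EPA, the weight is factorel.

Below is a minimal base-R sketch of a weighted tabulation. The helper weighted_table is hypothetical, and the data are made up, not EPA values. It sums the weights within each category:

```r
# Hypothetical helper: weighted count per category = sum of the weights
# of the records in that category.
weighted_table <- function(x, w) {
  tapply(w, x, sum)
}

edad     <- c(16, 16, 20, 20, 20)       # made-up age-band codes
factorel <- c(100, 250, 50, 75, 125)    # made-up survey weights
weighted_table(edad, factorel)          # 350 for band 16, 250 for band 20
```

On the real data, converted to a data.frame, the same call on edad and factorel would estimate the population in each age band.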
We'll use the recode function from memisc. First, create a function that specifies the recodes:
recodificacion <- function (dat) {
  # activity status: employed (o), unemployed (p), inactive (i)
  dat$aoi <- memisc::recode(dat$aoi, "o" = 1 <- 3:4, "p" = 2 <- 5:6, "i" = 3 <- 7:9)
  # education level grouped into three categories
  dat$nforma3 <- memisc::recode(dat$nforma,
    "Primary school or les" = 1 <- c("AN","P1","P2"),
    "Secondary" = 2 <- c("S1","SG","SP"),
    "University" = 3 <- c("SU")
  )
  # five-year age bands grouped into four broad age groups
  dat$gedad <- memisc::recode(dat$edad,
    "15 years old or less" = 1 <- c(0,5,10),
    "16-34 years old" = 2 <- c(16,20,25,30),
    "35-54 years old" = 3 <- c(35,40,45,50),
    "55 years or older" = 4 <- c(55,60,65)
  )
  dat
}
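If memisc is not available, the activity and age recodes can be reproduced in base R. This is a sketch, not the author's method. The function names recode_aoi and recode_age are hypothetical, and they assume the raw numeric codes shown above: aoi in 3:9, and edad in five-year steps 0, 5, ..., 65.

```r
# Hypothetical base-R equivalent of the aoi recode above.
recode_aoi <- function(aoi) {
  out <- rep(NA_character_, length(aoi))
  out[aoi %in% 3:4] <- "o"   # employed
  out[aoi %in% 5:6] <- "p"   # unemployed
  out[aoi %in% 7:9] <- "i"   # inactive
  factor(out, levels = c("o", "p", "i"))
}

# Hypothetical base-R equivalent of the gedad recode: cut() with right-closed
# intervals (-Inf,15], (15,34], (34,54], (54,Inf] maps band codes to groups.
recode_age <- function(edad) {
  cut(edad,
      breaks = c(-Inf, 15, 34, 54, Inf),
      labels = c("15 years old or less", "16-34 years old",
                 "35-54 years old", "55 years or older"))
}
```

Using cut() relies on the band codes being the lower bound of each five-year band. An explicit lookup like the one in recode_aoi is safer if the codes are not ordered that way.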
dat <- recodificacion(dat)
Convert it to a standard data.frame:
dat <- as.data.frame(dat)
The unemployment rate is defined only for people aged 16 or older who are active, that is, employed or unemployed:
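# keep only people aged 16 or older: after as.data.frame(), edad is a factor
# whose first three levels are the 0-4, 5-9 and 10-15 age bands
dat <- dat[ as.numeric(dat$edad) > 3, ]
# drop inactive people
dat <- dat[ dat$aoi != "i", ]
# drop unused factor levels
dat$gedad <- droplevels( dat$gedad)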
levels(dat$gedad)
## [1] "16-34 years old" "35-54 years old" "55 years or older"
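Written out, the rate for a group g is the weighted share of unemployed people among the active population. A group g could be, for example, one region, age group and education level. In the formula:

- A_g is the set of active respondents aged 16 or older in g.
- w_i is the survey weight of respondent i (factorel in EPA).
- E_g and U_g are the weighted numbers of employed and unemployed people.

```latex
\text{rate}_g
  = \frac{U_g}{E_g + U_g}
  = \frac{\sum_{i \in A_g} w_i \, \mathbf{1}[\text{aoi}_i = \text{p}]}{\sum_{i \in A_g} w_i}
```

The right-hand side is exactly a weighted mean of the indicator aoi == "p" computed within g.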
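EPA is a weighted survey, so we should use the weighting variable factorel to get correct estimates. The ddply function from the plyr package helps us here. For each combination of cycle, region, age group and education level, the weighted mean of the indicator aoi == "p" is the weighted share of unemployed people among active people, which is the unemployment rate: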
tasa.paro <- ddply(dat,.(ciclo,ccaa,gedad,nforma3),summarise,
rate=weighted.mean(aoi=="p",factorel))
head(tasa.paro)
## ciclo ccaa gedad nforma3 rate
## 1 166 Andalucía 16-34 years old Primary school or les 0.5304
## 2 166 Andalucía 16-34 years old Secondary 0.4676
## 3 166 Andalucía 16-34 years old University 0.3090
## 4 166 Andalucía 35-54 years old Primary school or les 0.4868
## 5 166 Andalucía 35-54 years old Secondary 0.3487
## 6 166 Andalucía 35-54 years old University 0.1806
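The same grouped weighted rate can be computed without plyr. The following base-R sketch uses a hypothetical helper, grouped_rate, and a made-up toy data frame whose column names only mimic the EPA variables. It sums, per group, the weights of unemployed people and the weights of all active people, then divides:

```r
# Hypothetical helper: weighted unemployment rate per group, base R only.
grouped_rate <- function(d, by) {
  d$w_unemp <- d$factorel * (d$aoi == "p")   # weight counts only if unemployed
  agg <- aggregate(d[c("w_unemp", "factorel")], by = d[by], FUN = sum)
  agg$rate <- agg$w_unemp / agg$factorel     # weighted share of unemployed
  agg[c(by, "rate")]
}

toy <- data.frame(                           # made-up active respondents
  ccaa     = c("A", "A", "A", "B", "B"),
  aoi      = c("o", "p", "p", "o", "p"),
  factorel = c(100, 50, 50, 300, 100)
)
grouped_rate(toy, "ccaa")                    # rate 0.50 for A, 0.25 for B
```

Computing both sums in one aggregate() call keeps the numerator and denominator aligned group by group.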
We can also compute the unemployment rate by age group and education level for Spain as a whole:
tasa.paro.global <- ddply(dat,.(ciclo,gedad,nforma3),summarise,
rate=weighted.mean(aoi=="p",factorel))
head(tasa.paro.global)
## ciclo gedad nforma3 rate
## 1 166 16-34 years old Primary school or les 0.5198
## 2 166 16-34 years old Secondary 0.4067
## 3 166 16-34 years old University 0.2324
## 4 166 35-54 years old Primary school or les 0.4238
## 5 166 35-54 years old Secondary 0.2637
## 6 166 35-54 years old University 0.1304
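The national rate for an age group and education level is not the simple average of the regional rates. It equals the regional rates averaged with each region's weighted active population (the sum of factorel over its active respondents) as the weight. A minimal sketch with made-up numbers for two hypothetical regions:

```r
# Made-up regional figures: weighted active population and unemployment rate.
regional <- data.frame(
  ccaa   = c("A", "B"),
  active = c(200, 400),
  rate   = c(0.50, 0.25)
)
# pooled (national) rate = active-weighted average of the regional rates
national_rate <- with(regional, sum(active * rate) / sum(active))   # 1/3
simple_mean   <- mean(regional$rate)                                # 0.375, not the pooled rate
```

This is why larger regions pull the Spain-wide figures toward their own rates.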
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:memisc':
##
## percent
# one panel per region and education level, age group on the x axis;
# percent comes from scales, which masks memisc::percent
p <- ggplot(tasa.paro, aes(x=gedad, y=rate, group=ccaa))
p + geom_line() + facet_wrap(ccaa ~ nforma3, nrow=14) +
  scale_y_continuous(labels=percent) +
  theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5, size=rel(1), face="bold"),
        strip.text = element_text(size=rel(0.6), face="bold")) +
  ggtitle("Unemployment rate\nby region, age group and education level")
Older age and higher education levels are associated with lower unemployment rates in each region.
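The education part of this pattern can also be checked numerically instead of visually. The sketch below uses a hypothetical helper, decreasing_by_education, and made-up rates. It assumes a data frame shaped like tasa.paro, with nforma3 as a factor whose levels run from lowest to highest education. For each region and age group, it reports whether the rate never increases as education rises:

```r
# Hypothetical check: TRUE where the rate does not rise with education level.
decreasing_by_education <- function(d) {
  d <- d[order(d$ccaa, d$gedad, d$nforma3), ]   # factor order = education order
  tapply(d$rate, list(d$ccaa, d$gedad), function(r) all(diff(r) <= 0))
}

toy <- data.frame(                               # made-up rates for two regions
  ccaa    = rep(c("A", "B"), each = 3),
  gedad   = "16-34 years old",
  nforma3 = factor(rep(c("Primary", "Secondary", "University"), 2),
                   levels = c("Primary", "Secondary", "University")),
  rate    = c(0.53, 0.47, 0.31, 0.40, 0.45, 0.20)
)
decreasing_by_education(toy)                     # A: TRUE, B: FALSE
```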
Note: in my Shiny app I combine data since 2008.