Impact evaluation of Slovak active labour market policy programmes

Prepared by Soňa Dulíková, Lukáš Lafférs and Miroslav Štefánik (miroslav.stefanik@savba.sk)
Institute of Economic Research, Slovak Academy of Sciences
This work was supported by the Slovak Research and Development Agency under the contract no. APVV-17-0329.

The [Component] Projects and programmes - Projects - Guidance

This is an automated report providing evidence on [Component] Projects and programmes - Projects - Guidance, one of the active labour market policy (ALMP) programmes, implemented in Slovakia between during 2017.
The [Component] Projects and programmes - Projects - Guidance is provided based on the § 54 of the Act on Employment Services Col. 2004/5.

if(qualitative_data_condition) {
  knitr::knit_exit()
}

1. The description of the programme

Based on the Labour Market Policy Database (LMP) administrated by DG Employment of the European Commission, the [Component] Projects and programmes - Projects - Guidance is classified as an “Labour market services” type of ALMP programme, with the programme specific code 11_SK39_3.

[Component] Projects and programmes - Projects - Guidance aims: Improvement the situation of job-seekers on labour market, Approval of new active labour market policies, Pilot projects or pilot programs oriented on support regional or local employment progress

Beneficiaries are citizens, registered job-seekers, job-seekers, employers.

The Eligible are NA

Implementation: Projects and programs financed from ESF founds, financed or co-financed from national budget or from other resources. Active labour market policy measures shall also include a) national projects approved by the Ministry and implemented by the Central Office or an office, b) projects to improve the status of jobseekers in the labour market approved by the Ministry and implemented by the Central Office, c) projects to improve the status of jobseekers in the labour market approved by the Ministry and implemented by an office, d) projects to improve the status of jobseekers in the labour market approved by the Ministry or the Central Office and implemented by an office, a legal entity or a natural person, e) pilot projects to test new active labour market policy measures approved by the Ministry and implemented by the Central Office, f) pilot projects or pilot programmes to support regional or local employment approved by the Central Office and implemented by an office. Specifications of national projects (NP) that are part of intervention measures: Through Apprenticeship to Employment Employers may receive: - a financial contribution for mentoring, to partially cover mentors labour costs, - a financial contribution to partially cover total labour costs of employees, - a financial contribution to partially cover the unavoidable costs. The aim of this project is to give NEETs the opportunity to gain or improve their knowledge, skills and practical experience through mentoring. Graduate employment practice starts employment - Contribution to a graduate to start a graduate employment practice. - Contribution to employers to partially cover the advance payments for permanent health insurance contributions, social insurance contributions and mandatory contributions to old-age pension savings paid by the employer. The aim of this measure is to increase the employment rate and employability of NEETs, as well as the inclusion of young people who are not in education, employment or training, with subsequent opportunity to create a new position for a young person who finished a graduate employment practice. Success in the Labour Market - Contribution to employers to partially cover total labour costs of employees. - Contribution to jobseekers to support job creation through self-employment. The contribution partially covers costs related to self-employment activities. The aim is to empower the NEETs and improve their situation in the labour market, as well as to increase their employability. Breaking the Vicious Circle of Unemployment - Contribution to employers to partially cover total labour costs of employees; to support training and tutoring activities, to partially cover total labour costs of tutors. The aim is to promote the employment and reduce long-term unemployment through education and training. Chance of employment - Contribution to employers to partially cover total labour costs of employees. - A single financial contribution to partially cover the unavoidable costs related to community services. The aim is to improve the situation of disadvantaged unemployed in the labour market, increase the employment rate and their employability, to reduce long-term unemployment and support local and regional employment. Active on the Labour Market - Contribution to employers to partially cover the advance payments for permanent health insurance contributions, social insurance contributions and mandatory contributions to old-age pension savings paid by the employer. The aim is to improve the situation of disadvantaged unemployed, especially people over 50 years of age; to increase the employment rate and their employability through the provision of contributions to support the job creation in less developed regions. Expenditure on the RE-PAS Project (retraining allowance) represents tuition fees, which are refunded by the Labour Office after the completion of a training course to the provider. The aim of this measure is to give jobseekers the opportunity to attend a training course to gain new skills and knowledge.

1.1 Participants and expenditures

#Preparing of participants and expenditures tables 
partSK <- subset(part, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & age == 'TOTAL' & sex == 'T' & stk_flow == 'ENT' )
names(programesSK)[1] <- 'lmp_type'
partSK <- merge(partSK, programesSK, by = 'lmp_type', all = T)
partSK <- subset(partSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
partSK_datapie <- subset(partSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
partSK_datapie$value <- ifelse(is.na(partSK_datapie$value), sum(partSK[substr(partSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = T), partSK_datapie$value)

expSK <- subset(exp, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & exptype == 'XTOT' & unit == 'MIO_EUR')
expSK <- subset(expSK, Classification == '2' | Classification == '4' | Classification == '5' | Classification == '6' | Classification == '7')
expSK <- subset(expSK, !is.na(expSK$value))
expSK <- expSK %>% select(Classification, value) %>% group_by(Classification) %>% dplyr::summarise(value = sum(value))

#Share of expenditures and participants at programm
partSK_per <- partSK_datapie %>% select(lmp_type, value) 
partSK_per$value <- ifelse(partSK_per$lmp_type == substr(lmp_code,1,1),  
                           partSK_per[partSK_per$lmp_type == substr(lmp_code,1,1),'value'] - partSK_per[partSK_per$lmp_type == lmp_code,'value'],
                           partSK_per$value)
partSK_per <- partSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))


expSK_per <- expSK %>% select(Classification, value) %>% 
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

program <-  data.frame(lmp_type  = c('total', subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]),
                       participants = c(sum(partSK_per$value), subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1]),
                       per = c('100 %', paste(round(subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1] / sum(partSK_per$value) * 100,2),'%'))
                      )

type_share_exp <- expSK_per$per[expSK_per$Classification == subset(qualitative, qualitative$almp == params$measure)$Classification[1]]
type_share_par <- partSK_per$per[partSK_per$lmp_type == subset(qualitative, qualitative$almp == params$measure)$Classification[1]] 
prog_share_par <- program$per[program$lmp_type == subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]]
prog_numb_par <- program$participants[program$lmp_type == subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]]

The left pie chart Participants displays the share of participants in ALMP programmes grouped by ALMP types of the LMP classification. Shares are based on LMP Database stock figures reported for the calendar year 2017.

The right pie chart Expenditure display the share of expenditure using the same LMP typology as the Participants pie chart. LMP Database only reports expenditures at the level of program types, so we are not able to trace expenditures to the level of a particular program.

Graph 1: Resources flowing to [Component] Projects and programmes - Projects - Guidance during 2017

#Preparing data for Pie Chart
#PARTICIPANTS
piechart_P <- select(partSK_datapie, lmp_type, value)

piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == '2' | piechart_P$lmp_type == '4' | piechart_P$lmp_type == '5' | piechart_P$lmp_type == '6' | piechart_P$lmp_type == '7',       
                              qualitative$class[match(piechart_P$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_P <- piechart_P[order(piechart_P$lmp_type),]
piechart_P$focus <- ifelse(
  piechart_P$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)

#EXPENDITURES 
piechart_E <- select(expSK, Classification, value)
piechart_E <- piechart_E[order(piechart_E$Classification),]
piechart_E$Classification <- ifelse(piechart_E$Classification == '2' | piechart_E$Classification == '4' | piechart_E$Classification == '5' | piechart_E$Classification == '6' | piechart_E$Classification == '7', qualitative$class[match(piechart_E$Classification, qualitative$Classification)], qualitative$Labour.market.services[match(piechart_E$Classification, piechart_E$Classification)]) 
piechart_E <- piechart_E[order(piechart_E$Classification),]
piechart_E$focus <- ifelse(
  piechart_E$Classification == subset(qualitative, qualitative$almp == params$measure)$class[1],
  0.05, 0)

## PIE CHART
#PARTICIPANTS
piechart_P$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_P$lmp_type)
ev_measure <- gsub("(.{25,}?)\\s", "\\1\n",
                   paste(subset(qualitative, qualitative$almp == params$measure)$class[1],
                         subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
                         sep =': '))

ev_type <- gsub("(.{25,}?)\\s", "\\1\n",
                   subset(qualitative, qualitative$almp == params$measure)$class[1])

piechart_P$value <- ifelse(piechart_P$lmp_type == ev_type, piechart_P$value - piechart_P[piechart_P$lmp_type == ev_measure,"value"], piechart_P$value)

piechart_P <- piechart_P %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_P <- ggplot(piechart_P) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_P, stat = 'pie',color='white')+
  ggtitle('Participants') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_P$value != 0 &  piechart_P$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_P$lmp_type == ev_type, 
                                           "steelblue1", 
                                            gray.colors(6, start = 0.8)))))

#EXPENDITURES
piechart_E$Classification <- gsub("(.{25,}?)\\s", "\\1\n", piechart_E$Classification)

piechart_E <- piechart_E %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))


rpie <- 1
rlabel <- (0.6 * rpie) 

plot_piechart_E <- ggplot(piechart_E) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = Classification, explode  = focus),
               data = piechart_E, stat = 'pie',color='white')+
  ggtitle('Expenditure') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(Classification == ev_type ,per,"")),size = 4)+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm"))+
  guides(fill=guide_legend(title="Classification of LMP")) +
  scale_fill_manual(values=c(ifelse(piechart_E$value != 0 &
                                      piechart_E$Classification == ev_type, 
                                    "steelblue1", 
                                    gray.colors(6, start = 0.8))))

Based on the LMP database, NA individuals participated in the [Component] Projects and programmes - Projects - Guidance during 2017. This accounted for the NA % percent of the total number of participants in all Slovak ALMP programmes (LMP types 2-7). The total roofing LMP type Labour market services presents percent of the total Slovak ALMP (type 2-7) expenditure and percent of the total ALMP participants.

1.2 The [Component] Projects and programmes - Projects - Guidance in the context of ALMPs in Slovakia

Using administrative data, we first provide a picture of the importance of the [Component] Projects and programmes - Projects - Guidance in the context of Slovak ALMP. The following flowchart displays flows of job seekers registered into database of unemployment during . Flows are based on the movement of these individuals during the two years following their registration, this period is divided into 6 months sub-periods (0/6/12/18/24). For each of these sub-periods we observe the flows of registered jobseekers to employment, or their de-registration for an another reason.
Job seekers can also move to one of the ALMP programmes. The lines highlighted in blue represent the flow of jobseekers participating in measure P54P, [Component] Projects and programmes - Projects - Guidance.

Graph 2: The [Component] Projects and programmes - Projects - Guidance in the structure of the flows of jobseekers registered during 2017

The next table shows full names of programmes shown in graph above.

Table 1: Explanatory table for Graph 2

Sankey_description <- select(qualitative, almp, Labour.market.services, Labour.market.services_SK)
Sankey_description[32,1] <- 'P054'
Sankey_description<- rbind(Sankey_description, Sankey_description[32,])
Sankey_description[nrow(Sankey_description),1] <- 'P54O'
Sankey_description[28,1] <- 'P54D'

Sankey_tabel1 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
Sankey_tabel1$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel1$Measures, Sankey_description$almp)]

Sankey_tabel1  %>% kbl(format = 'html', booktabs = T , align = 'c', row.names = F)%>%
  kable_paper('hover', full_width = F)%>%
  column_spec(1,  border_right = T)

Measures	Name of programme
P54K	[Component] Projects and programmes - Projects - Education and training
P054	Projects and programmes
P060	Contribution for reimbursement of operating costs of a sheltered workshop or a sheltered workplace and for transport expenses for employees
P053	Contribution for commuting to work
P54O	Projects and programmes
P052	Contribution for activation activity in the form of minor communal services performed for a municipality or minor services for a self-governing region
P051	Contribution for the graduate practice
P52A	Contribution for activation activity in the form of voluntary works
P050	Contribution to support employing a disadvantaged job seeker
P50J	Contribution supporting the development of local and regional employment
P54P	[Component] Projects and programmes - Projects - Guidance

## Preparing of df
# we observe only  young people under XX (age) which inflow in XXXX (entry year) 
df_r <- subset(df, age <= age_group_max)
df_r <- subset(df_r, format(as.Date(df_r$entry),"%Y")== format(as.Date(params$ep_start),"%Y"))
#df_r <- df_r %>% mutate_all(na_if,"") #if cells are empty -> change it to NAs
df_r <- subset(df_r, healthy < 3)

# DOVOD VYRADENIA 
dovod_vyradenia = c('V01','V02','V03','V1','V12','V15') #zamestnali sa 

# Opatrenie P032 vyhodiť (ak sa budú meniť aj iné opatrenia ako napr. 54R a 54Rp tak tu sa to opraví (%in% c()))
Salmps <- subset(almps, !nastroj  %in% c('P032','P54P','P54D'))

## Spojenie DF a ALMPS 
df_almps <- merge(Salmps, df_r, by = 'klient_id')

## Vyfiltruj iba tie klient_id. pri ktorých nástroj je params$measure
partic_measure <- df_almps[df_almps$nastroj == params$measure, 'klient_id']
df_almps <- filter(df_almps, klient_id %in% partic_measure) #tu su tí, ktorí boli na aj na params$measure ale aj na iných opatreniach 

# Podmienka prieniku času pri databáze nezamestnaných a v zúčastnili sa evaluated measure
df_almps_measure <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj == params$measure) # tu su tí, ktorí išli v skúmanom čase do opatrenia params$measure 
##### toto je moja základňa môj prvý stĺpec 

##### Podmienka prieniku času pri databáze nezamestnaných a iných programoch ako evaluated measue ale zároveň sú to tí, ktorí už niekedy na evaluated measure už boli 
df_almps_other <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj != params$measure)

# upravíme si dáta ktoré budeme používať pri grafe
df_almps_measure <- select(df_almps_measure, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)
df_almps_other <- select(df_almps_other, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)


#smojím iba tých ktorí boli aj v evaluated measure aj v iných opatreniach, all.x = T lebo chceme aj tých ktorý boli iba na evaluated measure (nemuseli sa zúčastniť aj iných programov)
flow <- merge(df_almps_measure, df_almps_other, by = 'klient_id', all.x = T) 

#dni od začatia opatrenia entrya.x po začatie iného opatrenia entrya.y alebo po zamestnanie/odchod z iného dôvodu exit.x
flow$days <- ifelse(is.na(flow$nastroj.y),
                    as.numeric(difftime(flow$exit.x, flow$entrya.x, units = 'days')),
                    as.numeric(difftime(flow$entrya.y, flow$entrya.x, units = 'days')))

#ak je dovod vyradenia NA ale zúčastnili sa na opatrení 
flow$dovod_vyradenia_kod.y <- ifelse(is.na(flow$dovod_vyradenia_kod.y) & !is.na(flow$nastroj.y), 'V01', flow$dovod_vyradenia_kod.y)

#vyfiltruj tých ktorý boli aj na evaluated measure aj na inom opatrení alebo sa zamestnali a days nie je záporné 
#podmienka -> entrya do ďalšieho projektu musí byť väčšie ako entrya do evaluated measure
#flow$days nemôže byť záporné 
flow <- filter(flow, days >= 0)

#nastroj -> ak dovod vyradenia sa rovna nejakému prvku z vektoru dovodov vyradenia tak -> employed inak another reason
flow  <- flow %>% 
  mutate(nastroj.y = case_when(is.na(flow$nastroj.y) & flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'employed',
                               is.na(flow$nastroj.y) & !flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'another reason',
                               !is.na(flow$nastroj.y) ~ flow$nastroj.y)
  )


#dataframe, ktorý budem používať pri tvorbe grafu
Sankey_measure <- flow %>% select(nastroj.x, nastroj.y, days) %>% 
           mutate(month = ceiling(days/30.417))

Sankey_measure <- Sankey_measure %>%mutate(
  time = case_when(
    Sankey_measure$month %in% seq(0,6,1)  ~ 6,
    Sankey_measure$month %in% seq(7,12,1)  ~ 12, 
    Sankey_measure$month %in% seq(13,18,1)  ~ 18,
    Sankey_measure$month %in% seq(19,100,1)  ~ 24,
  ) 
)

Sankey_measure$sources <- ifelse(Sankey_measure$time == 6 | Sankey_measure$time == 12 |
                                   Sankey_measure$time == 18 | Sankey_measure$time == 24, 
                                 Sankey_measure$time - 6, 
                                 Sankey_measure$time)


#zosumarizuj, koľký mladí išli do ktorého opatrenia, zamestnali sa alebo odišli z registra z iných dôvodov
San_measure <- Sankey_measure %>% select(nastroj.y, time, sources) %>%
  group_by(nastroj.y, time, sources) %>% summarise(num = n(), .groups = 'drop') %>%
  rename(nastroj = nastroj.y)


# rozdeľ opatrenia, na tie ostatné almps - OTHER ALMPS 
nastroj_kod <- c('another reason','employed')
NastrojKod <- San_measure[!San_measure$nastroj %in% nastroj_kod,]  %>%  group_by(nastroj) %>% summarise(num = sum(num), .groups = 'drop') %>%
  mutate(perc = num*100 / sum(num),
         cut = case_when(perc >= 5 ~ 1,
                         perc < 5 ~ 0))
nastroj_kod <- c(nastroj_kod, NastrojKod$nastroj[NastrojKod$cut == '1'])

San_aplmps <- subset(San_measure, nastroj %in% nastroj_kod | nastroj == params$measure)
San_other_aplmps <- subset(San_measure, !nastroj %in% nastroj_kod & !nastroj == params$measure)

San_other_aplmps <- San_other_aplmps %>%  group_by(sources , time) %>% summarise(num = sum(num), .groups = 'drop') 
San_other_aplmps$nastroj <- 'OTHER ALMPS'
San_other_aplmps <- relocate(San_other_aplmps, c(nastroj, time), .before = sources,)

San_measure <- rbind(San_aplmps, San_other_aplmps)
remove(San_aplmps, San_other_aplmps)


# uzly grafu (jedinečné), musia tu byť všetky opatrenia
node_m <- data.frame(
  name=c(as.character(San_measure$nastroj), as.character(San_measure$sources))%>% unique()
)

# definovanie koľko registrovaných bude medzi tými rokmi  
velky_df <- data.frame()
for (i in seq(6,24,6)){
  pocet <- San_measure %>%  group_by('sources' = sources >= i) %>% summarise(num = sum(num), .groups = 'drop') 
  pocet <- subset(pocet, sources == TRUE)
  pocet$sources <- i
  velky_df <- rbind(velky_df, pocet)
}

# musím si velky_df prisposobiť tak, aby roky boli ako nodes aby som to mohla spojiť s dataframe San s ktorým potom budem ďalej robiť graf
# preto sources budu ako nastroj -> aby som spravila nodes, years su sources ale sources su years -1 v skutočnosti (v san grafe)
colnames(velky_df) <- c('nastroj', 'num')
velky_df$time <- velky_df$nastroj
velky_df$sources <- San_measure$sources[match(velky_df$time, San_measure$time)] 
velky_df <- relocate(velky_df, num, .after = sources)

San_measure <- rbind(San_measure, velky_df)

#urobím IDsources a ID target podľa uzlov aby garf vedel ten flow medzi jednotlivími uzlami 
San_measure$IDsource <- match(San_measure$sources, node_m$name)-1 
San_measure$IDtarget <- match(San_measure$nastroj, node_m$name)-1

#Color 
time <- seq(0,24,6)
NOALMP <- c('another reason', 'employed')
node_m <- node_m %>% mutate(group = case_when(node_m$name %in% NOALMP ~ 'A',
                                              !node_m$name %in% NOALMP & !node_m$name %in% time ~ 'B',
                                              node_m$name %in% time ~ 'C'
)
)

San_measure$group <- 'type_a'

my_color <- 'd3.scaleOrdinal() .domain(["type_a", "A","B", "C"]) .range(["lightgray", "darkseagreen", "thistle", "rosybrown", "red"])'

San_measure <- as.data.frame(San_measure)

Sankey_ev.measure <- sankeyNetwork(Links = San_measure, Nodes = node_m,
                                   Source = "IDsource", Target = "IDtarget",
                                   Value = "num", NodeID = "name", 
                                   sinksRight=F, fontSize = 14,
                                   fontFamily = "sans-serif",
                                   width = 900,
                                   colourScale=my_color, LinkGroup="group", NodeGroup="group",
                                   nodePadding=10)

condition_for_Sankeyplot <- length(unique(San_measure$IDsource))>=1 && length(unique(San_measure$IDtarget))>=1 && !any(is.na(San_measure$IDsource)) && !any(is.na(San_measure$IDtarget))

if(condition_for_Sankeyplot){
  Sankey_ev.measure
}

cond_for_table <- F
if(condition_for_Sankeyplot){
  Sankey_tabel2 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
  Sankey_tabel2$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel2$Measures, Sankey_description$almp)]
  
  cond_for_table <- nrow(Sankey_tabel2) >=1
}

if (cond_for_table){
  Sankey_tabel2  %>% kbl(format = 'html', booktabs = T , align = 'c', row.names = F)%>%
    kable_paper('hover', full_width = F)%>%
    column_spec(1,  border_right = T)
}

2. Data

This evaluation report is based on administrative data on registered unemployed jobseekers in Slovakia, inter-linkable with database of participants in ALMP measures. The export was provided by the Central Office of Labour, Social Affairs and Family of the Slovak Republic (COLSAF) in the beginning of 2021 and covers the period between January 2014 until December 2020. The raw data were processed using a data-preparation script, published here (ADD URL).
The data frame “df” covers all unemployment spells of unemployed jobseekers, with attributes collected at the moment of their registration as unemployed jobseekers (application form here - ADD URL).

### DEFINE THE EVALUATION PERIOD #
ep_start <- as.Date(params$ep_start)
ep_end <- as.Date(params$ep_end)
un_spell <- spell
measure <- params$measure

    ########################################x
    ## SELECTING THE EVALUATION SAMPLE #
    ########################################x,
    
treated<-filter(almps, nastroj==toString(params$measure))

#Sub-groups to be dropped: 
# - those with ALMP participation 2 years before the EP
IDalmps_before<-unique(almps$klient_id[almps$entrya<ep_start & almps$entrya>=ep_start-730]) 
# - those with ALMP participation in other ALMP during the EP
IDalmps_during_ep<-unique(almps$klient_id[(almps$entrya<=ep_end & almps$entrya>=ep_start) & almps$nastroj!=toString(params$measure)])

###DEFINE THE ELIGIBILITY CRITERIA 
#the EC are measure specific, in the case of looping over multiple measures EC need to be elaborated t a form of table or a list and added to the parameters
#SUBSETTING THE BASE EVALUATION DATASET OF ELIGIBLE 
cond0<-as.logical(df$entry<=ep_end & df$exit>=ep_start) # Being on the register of unemployed during the evaluation period
cond1<-as.logical(df$age < age_group_max) 
cond2<-as.logical((df$exit-df$entry)>=un_spell) #LENGTH OF PREVIOUS UNEMPLOYMENT SPELL
cond3<-as.logical(df$entry>=ep_start-730) # Dropping old unemployment spells (cases inflowing more than 730 days before the start of the evaluation period)

dfe<-df[cond0 & cond1 & cond2 & cond3,]
n1 <- dim(dfe)[1]
sampleIDs<-unique(df$klient_id[cond0 & cond1 & cond2 & cond3])
n2 <- length(sampleIDs)

###
#### ONLY KEEP THE SPELLS OF PARTICIPANTS DURING WHICH THEY PARTICIPATED 
#### Creating dataframe of participants in the evaluated programme during the evaluation period.
dfa<-filter(treated, entrya<=ep_end & entrya>=ep_start)
npart0<-length(unique(dfa$klient_id))
npart1<-dim(dfa)[1]

#Drop other ALMP participations from the group of participants as well as the eligible non-participants
n3 <- nrow(filter(dfa, klient_id %in% IDalmps_before))
n4 <- nrow(filter(dfe, klient_id %in% IDalmps_before))
dfa<-filter(dfa, !klient_id %in% IDalmps_before)
dfe<-filter(dfe, !klient_id %in% IDalmps_before)

n5 <- nrow(filter(dfa, klient_id %in% IDalmps_during_ep))
n6 <- nrow(filter(dfe, klient_id %in% IDalmps_during_ep))
dfa<-filter(dfa, !klient_id %in% IDalmps_during_ep)
dfe<-filter(dfe, !klient_id %in% IDalmps_during_ep)



#### Only participants with one participation during the evaluation period are sampled. 
#### JS with multiple participations are droped from the sample
dfa<-dfa %>%
  group_by(klient_id) %>% 
  mutate(rep=n()) # rep is the number of participations of one JS repeating during 2014

n7 <- nrow(filter(dfa, rep!=1))
dfa<-filter(dfa, rep==1)
npart2<-dim(dfa)[1] # Number of participants after cleaning with multiple ALMP participations

###Participants who also participated in other ALMP measures (§54) are dropped
progOUT<- setdiff(c("P050", "P50A", "P50C","P50J", "P50K" ,"P51A", "P054", "P54D", "P54E", "P54O", "P54P", "P54U"),params$measure)
outIDs<-unique(almps$klient_id[(almps$entrya>=as.Date(params$ep_start) & almps$entrya<=as.Date(params$ep_end)+730) & as.logical(almps$nastroj %in% progOUT)])

n8 <- nrow(filter(dfa, klient_id %in% outIDs))
n9 <- nrow(filter(dfe, klient_id %in% outIDs))

dfa<-(filter(dfa, !klient_id %in% outIDs))
dfe<-(filter(dfe, !klient_id %in% outIDs))

npart3<-length(unique(dfa$klient_id)) # The number of participants after we drop participations in supported employment during the outcome observation period 

#MERGING PARTICIPATIONS AND UNEMPLOYMENT SPELLS
#First we add the date of the entry and exit from the registration into the table of participations in measure (evaluated measure params$measure). We only import entry dates for the individuals in the evaluation sample. 
dfa<-merge(dfa, select(dfe, klient_id, entry, exit), by="klient_id", all.x = TRUE)
nrowdfa <- nrow(dfa)

#Second we filter only the registrations of members of the evaluation sample during which the programme participation took place. 
dfa<-filter(dfa, dfa$exit+30>=dfa$entrya & dfa$entrya<=ep_end & entrya >= entry) # Keeping only the participations happening during an unemployment spell
n10 <- nrowdfa-dim(dfa)[1]
npart4<-dim(dfa)[1] # Number of participants after cleaning participations outside an unemployment spell (data quality issue)

## Participants #
particIDs<-unique(dfa$klient_id)
## Eligible #
# nonpartIDs<-sampleIDs[!(sampleIDs %in% particIDs)]
nonpart<-filter(dfe, !(klient_id %in% particIDs))
###Out of the participants only one-time participations happening during an unemployment spell are used 
partic<-merge(dfe, dfa, by = c("klient_id", "entry"), all.x = FALSE)


### Cleaning and renaming #
partic$exit.y<-NULL
partic<-partic %>% rename(exit=exit.x)

nonpart$entrya<-NA
nonpart$exita<-NA
nonpart$nastroj<-NA
nonpart$naklady<-NA
nonpart$projekt<-NA
partic$rep<-NULL

#Filter extreme values (1%) of the waiting time until participation in the evaluated measure 
wte<-quantile(as.numeric(partic$entrya)-as.numeric(partic$entry),na.rm=TRUE, probs=0.99)
n11 <- nrow(filter(filter(partic, as.numeric(entrya)-as.numeric(entry)>wte)))
partic<-filter(partic, as.numeric(entrya)-as.numeric(entry)<=wte)

esample<-rbind(nonpart, partic)
esample$treated<-!is.na(esample$entrya)
#Filter extreme values of the waiting time until participation in the evaluated measure 

###Unemployment spells ending with LM placements
#esample<-filter(esample, dovod_vyradenia_kod == 'V01' | dovod_vyradenia_kod == 'V02' | dovod_vyradenia_kod == 'V03' | dovod_vyradenia_kod == 'V1' | dovod_vyradenia_kod == 'V12' | dovod_vyradenia_kod == 'V15')

    ########################################x
    ## GENERATING EXPLANATORY VARIABLES #
    ########################################x,
    esample$ent <- as.numeric(as.Date(ep_start))-as.numeric(as.Date(esample$entry))

    ########AGEG
    esample$ageg <- cut_interval(esample$age, 5, labels=FALSE)
    esample$ageg <- as.factor(esample$ageg)
    
    ####Extra columns for dummy variables go into the esample_est for further testing
    esample <- dummy_cols(esample, select_columns = c("ageg"), remove_first_dummy = TRUE)
    ageg_dummies<-colnames(esample)[grepl("ageg_", colnames(esample))]

  #######Regional Unemployment rate during the implementation period
    esample[, "UR_region"] <- esample[, paste0("UR_region_",year(ep_start), "")]
    esample[,grepl("UR_region_", colnames(esample))]<-NULL
    esample$UR_region <- as.numeric(gsub(",", ".", gsub("\\.", "", esample$UR_region)))

  

#Cleaning
#nonpart<-NULL
#partic<-NULL




# Share of repeated unemployment after participation in ALMP
# Merge esample, history table
h_esample <- merge(select(esample, klient_id, entry, exit, dovod_vyradenia_kod, entrya, exita, nastroj, treated), dfh, by = 'klient_id', all.x = T, all.y = F)

# as.Date
entry <- paste('entry', seq(1:15), sep = '')
exit <- paste('exit', seq(1:15), sep = '')

for (e in entry){
  h_esample[ ,e] <- as.Date(h_esample[ ,e], origin = '1970-01-01')
}
for (x in exit){
  h_esample[ ,x] <- as.Date(h_esample[ ,x], origin = '1970-01-01')
}
h_esample$exita <- as.Date(h_esample$exita, origin = '1970-01-01')

#Dropping cases with over than 15 unemployment spells 
h_esample<-filter(h_esample, is.na(entry16)) #subset (klient_id) jobseekers who became unemployment only 15 times 
esample <- subset(esample, klient_id %in% h_esample$klient_id) 
#remove entry16+ and exit16+ columns
h_esample<-select(h_esample, -entry16, -entry17, -entry18, -entry19, -entry20, -exit16, -exit17, -exit18, -exit19, -exit20)

esample_condition <- (mean(esample$treated) < 0.00001)

if(esample_condition) {
  knitr::knit_exit()
}

2.1 Description of participants and eligible

Our evaluation sample consists of 414 546 eligible individuals, registered as jobseekers during the evaluation period starting from 2017-01-01 until 2017-12-31, These jobseekers have generated in total 483 955 unemployment registrations during the period 2014-2020. Out of them, in 7 813 registrations jobseekers participated in the evaluated measure [Component] Projects and programmes - Projects - Guidance during the evaluation period. 4 965 participants and 204 157 eligible JS were dropped from the sample, because of multiple participations in the evaluated program (or other relevant ALMPs) during and after the evaluation period. After cleaning we have sampled 2 848 JS with one-time participation during the evaluation period. At the same time, there was 279 798 eligible non-participants present in the register of unemployed jobseekers during 2017.

The group of participants and eligible show differences on a number of observed characteristics. Table 1 displays the presents an overview of these differences on selected characteristics.

Table 2: Descriptive statistics of participants and eligible (selected characteristics)

####
## number of participants and eligible 
####

#separate table dfe with participants in ALMP  and eligible 
elig <- distinct_at(nonpart,vars(klient_id),.keep_all = TRUE)
part <- distinct_at(partic,vars(klient_id),.keep_all = TRUE)

#BASIC DESCRIPTIVE TABLE

# The number of participants and eligible in the sample'
#tab1<-cbind(sum(!is.na(dfe$entrya)),sum(is.na(dfe$entrya)))
tab1 <- cbind(format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
colnames(tab1) <- c('Participants', 'Eligible')
tab1 <- data.frame(cbind(Description = 'Number of observations', tab1))

####
## Age distribution
####

age_par <- part %>% select(age) %>% group_by(age) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

age_elig <- elig %>% select(age) %>% group_by(age) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

age <- merge(age_par, age_elig, by='age', all = T)
age$Participants_percent <- ifelse(is.na(age$Participants_percent), paste(0,'%'), age$Participants_percent)
age$Participants_total <- ifelse(is.na(age$Participants_total), 0, age$Participants_total)

mean_age_par <- round(mean(part$age),1)
mean_age_elig <- round(mean(elig$age),1)

##### MEAN 
mean_age<-data.frame(mean_age_par,mean_age_elig)
mean_age <- cbind(Description = 'Age (years)', mean_age)
mean_age <- rename(mean_age, Participants = mean_age_par, Eligible=mean_age_elig)
####

age_elig$desc <- 'Eligible'
age_par$desc <- 'Participants'
age_elig <- age_elig %>% rename(total = Eligible_total, percent = Eligible_percent)
age_par <- age_par %>% rename(total = Participants_total, percent = Participants_percent)
age_r <- rbind(age_par,age_elig)

age_plot <- ggplot(age_r, aes(x = age, y = total, group= desc)) +
  geom_point(aes(color = desc), size = 1.5)+
  geom_line(aes(color = desc), size = 1) + 
  ylim(0,23000) +
  theme_light() +
  geom_text(aes(label = paste(percent, '\n\n' )), col ='black', size = 3, fontface ='italic')+
  labs(
    title = "Compare of age distribution (%)",
    x = "Age",
    y = "Total Count"
  )

####
## Gender distribution
####

gender_par <- part %>% select(male) %>% group_by(male) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

gender_elig <- elig %>% select(male) %>% group_by(male) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

gender <- merge(gender_par, gender_elig, by='male', all = T)

####
male <- data.frame(gender_par[2,3], gender_elig[2,3])
male <- cbind(Description = 'Male', male)
male <- rename(male, Participants = Participants_percent, Eligible=Eligible_percent)


####
##  Education distribution
####

education_par <- part %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Participants_total = n(), .groups = 'drop')  %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2))
education_par <- education_par[!(is.na(education_par$noedu)),]
education_par <- melt(education_par, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_par <- education_par[education_par$value == 1,] 
education_par <- select(education_par, -value)

education_elig <- elig %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Eligible_total = n(), .groups = 'drop')  %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2))
education_elig <- education_elig[!(is.na(education_elig$noedu)),]
education_elig <- melt(education_elig, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_elig <- education_elig[education_elig$value == 1,] 
education_elig <- select(education_elig, -value)

education <- merge(education_par, education_elig, by='variable', all = T)
education <- select(education, variable, Participants, Eligible)
education <- rename(education, Description = variable)

education <- education%>%mutate(
  Description = case_when(
    education$Description ==  'noedu' ~ 'No education',    
    education$Description ==  'primary' ~ 'Primary', 
    education$Description ==  'lsec' ~ 'Lower secondary',
    education$Description ==  'usec' ~ 'Upper secondary',
    education$Description ==   'tertiary' ~ 'Tertiary', 
    TRUE~as.character(education$Description)
  ) 
)

education <- education %>% group_by(Description) %>%
  dplyr::summarise(Participants = paste0(sum(Participants),  "%"), Eligible = paste0(sum(Eligible),  "%")) 

x <- c('No education','Primary','Lower secondary', 'Upper secondary', 'Tertiary')
education <- education %>% slice(match(x, Description))

####
##  skills
####

l_skills_par <- part %>%  select(flang) %>%  
  mutate(flang = case_when(part$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

l_skills_elig <- elig %>%  select(flang) %>%  
  mutate(flang = case_when(elig$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

PC_skills_par <- part %>%  select(pc) %>%  
  mutate(pc = case_when(part$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

PC_skills_elig <- elig %>%  select(pc) %>%  
  mutate(pc = case_when(elig$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

d_skills_par <- part %>%  select(drive) %>%  
  mutate(drive = case_when(part$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)

d_skills_elig <- elig %>%  select(drive) %>%  
  mutate(drive = case_when(elig$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)


l_skills <- merge(l_skills_par, l_skills_elig, by='Description', all = T)
PC_skills <- merge(PC_skills_par, PC_skills_elig, by='Description', all = T)
d_skills <- merge(d_skills_par, d_skills_elig, by='Description', all = T)

skills <- rbind(l_skills, PC_skills, d_skills)
skills <- select(skills, Description, Participants_percent, Eligible_percent)
skills <- rename(skills, Participants = Participants_percent, Eligible=Eligible_percent)

####
##  region
####

part <- part %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

elig <- elig %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

okres_par <- part %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

okres_elig <- elig %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

okres <- merge(okres_par, okres_elig, by='okres', all = T)
okres <- select(okres, okres, Participants, Eligible)
okres <- rename(okres, Description = okres)
okres <- okres[okres$Description != 'N/A',]

####
##  Previous employment
####

prev_emp_part <- part  %>% select(empl) %>% group_by(empl) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = empl, Participants = Participants_percent)  

prev_emp_elig <- elig  %>% select(empl) %>% group_by(empl) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = empl, Eligible = Eligible_percent)

prev_emp <- merge(prev_emp_part, prev_emp_elig, by='Description', all = T)
prev_emp <- select(prev_emp, Description, Participants, Eligible)
prev_emp <- prev_emp[prev_emp$Description == 1,] 
prev_emp$Description[prev_emp$Description == 1 ] <- 'Previous employment'

####
##  Nationality
####

nat_part <- part  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Participants_total = n(), .groups = 'drop') %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%')) 
nat_part$othern <- ifelse(nat_part$othern == 'TRUE', 1,0)
nat_part <- melt(nat_part, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_part <- nat_part[nat_part$value == 1,] 
nat_part <- select(nat_part, -value)

nat_elig <- elig  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Eligible_total = n(), .groups = 'drop') %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2),'%')) 
nat_elig$othern <- ifelse(nat_elig$othern == 'TRUE', 1,0)
nat_elig <- melt(nat_elig, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_elig <- nat_elig[nat_elig$value == 1,] 
nat_elig <- select(nat_elig, -value)

nat <-  merge(nat_part, nat_elig, by='variable', all = T)
nat <- select(nat, variable, Participants, Eligible)
nat <- rename(nat, Description = variable)

nat <- nat %>%mutate(
  Description = case_when(
    nat$Description == 'slovak' ~ 'Slovak', 
    nat$Description == 'hungarian' ~ 'Hungarian', 
    nat$Description == 'czech'~ 'Czech', 
    nat$Description == 'roma' ~ 'Roma', 
    nat$Description == 'othern'~ 'Other', 
  ) 
)


x <- c('Slovak','Hungarian', 'Czech','Roma','Other')
nat <- nat %>% slice(match(x, Description))

####
##  Length of the unemployment spell
####

part$un_spell <- as.integer(part$exit - part$entry)
elig$un_spell <- as.integer(elig$exit - elig$entry)

un_spell <- cbind(round(mean(part$un_spell),2), round(mean(elig$un_spell),2))
un_spell <- data.frame(cbind(Description = 'Length of the unemployment spell', un_spell))
un_spell <- rename(un_spell, Participants = V2, Eligible=V3)


####
##  Length of spell between unemployment and participation
####

#Rozdiel medzi evidenciou nezamestnanosti a nastúpenia do AOTP 
part$spell_b <- as.integer(part$entrya - part$entry)

spell_bup <- part %>% select(spell_b) %>% 
  mutate(month = ceiling(spell_b/30.417)) #roundup -> ceiling

spell_bup_p <- ggplot(data=spell_bup, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none")+
  labs(
    title = "Inflow into the programme in months\nsince the start of the unemployment",
    x = "Months",
    y = "Total Count"
  )  + 
  scale_x_continuous()

####
##  Length of AOTP
####

part$spell_aotp <- as.integer(part$exita - part$entrya)

spell_aotp <- part %>% select(spell_aotp) %>% 
  mutate(month = ceiling(spell_aotp/30.417)) #roundup -> ceiling

spell_aotp_p <- ggplot(data=spell_aotp, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none") +
  labs(
    title = "Length of participation\n (in months)",
    x = "Months",
    y = "Total Count"
  ) + 
  scale_x_continuous(breaks = scales::breaks_extended(length(unique(spell_aotp$month))))


####
##  Compare of length spell
####

spell_p <- ggarrange(spell_bup_p, spell_aotp_p)


####
##  In flow
####

a <- format(seq(as.Date(ep_start),length=3,by="1 month"),"%Y-%m")
b <- format(seq((ymd(as.Date(ep_start)) %m+% months(3)),length=3,by="1 month"),"%Y-%m")
c <- format(seq((ymd(as.Date(ep_start)) %m+% months(6)),length=3,by="1 month"),"%Y-%m")
d <- format(seq((ymd(as.Date(ep_start)) %m+% months(9)),length=3,by="1 month"),"%Y-%m")

#vstúpili do programu
in_part <- part %>% select(entrya) %>% group_by(format(as.Date(entrya),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(in_part)[1] <- 'Description'
in_part <- filter(in_part, str_detect(in_part$Description, (format(as.Date(ep_start),"%Y"))))

in_part <- in_part%>%mutate(
  Description = case_when(
    in_part$Description %in% a ~  paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% b ~  paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% c ~  paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% d ~  paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_part <- in_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#sa stali nezamestnaný 
in_elig <- elig %>% select(entry) %>% group_by(format(as.Date(entry),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(in_elig)[1] <- 'Description'
in_elig <- filter(in_elig, str_detect(in_elig$Description, (format(as.Date(ep_start),"%Y"))))

in_elig <- in_elig%>%mutate(
  Description = case_when(
    in_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_elig <- in_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

inflow <- merge(in_part, in_elig, by='Description', all = F)
inflow <- select(inflow, Description, Participants, Eligible)


####
##  Outflow
####

#vystúpili z programu

out_part <- part %>% select(exita) %>% group_by(format(as.Date(exita),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(out_part)[1] <- 'Description'
out_part <- filter(out_part, str_detect(out_part$Description, (format(as.Date(ep_start),"%Y"))))

out_part <- out_part%>%mutate(
  Description = case_when(
    out_part$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_part <- out_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#vystúpili z evidencie -> zamestnali sa 
out_elig <- elig %>% select(exit) %>% group_by(format(as.Date(exit),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(out_elig)[1] <- 'Description'
out_elig <- filter(out_elig, str_detect(out_elig$Description, (format(as.Date(ep_start),"%Y"))))

out_elig <- out_elig%>%mutate(
  Description = case_when(
    out_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_elig <- out_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

outflow <- merge(out_part, out_elig, by='Description', all = F)
outflow <- select(outflow, Description, Participants, Eligible)


####
##  Children in the household
####

child_part <- part  %>% select(kids) %>% group_by(kids) %>% summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = kids)  

child_elig <- elig  %>% select(kids) %>% group_by(kids) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = kids)

child <- merge(child_part, child_elig, by='Description', all = T)
child <- select(child, Description, Participants, Eligible)
child <- child[child$Description == 1,] 
child$Description[child$Description == 1] <- 'Children in the household'


####
##  Fields of study
####

study_part <- part  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2)) 

study_elig <- elig  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2)) 

study <- merge(study_part, study_elig, by='odbor', all = T)
study <- select(study, odbor, Participants, Eligible)
study$Participants <- ifelse(is.na(study$Participants), 0, study$Participants)

a <- as.character(c(seq(11,19,1)))
b <- as.character(c(seq(21,39,1)))
c <- as.character(c(seq(41,49,1)))
d <- as.character(c(seq(51,59,1)))
e <- as.character(c(seq(61,79,1)))
f <- as.character(c(seq(81,89,1)))
g <- as.character(c(seq(91,98,1)))

study <- study%>%mutate(
  odbor = case_when(
    study$odbor %in% a ~ 'Natural Science', 
    study$odbor %in% b ~ 'Technical sciences', 
    study$odbor %in% c ~ 'Agricultural, forestry and veterinary sciences', 
    study$odbor %in% d ~ 'Medical and pharmaceutical sciences', 
    study$odbor %in% e ~ 'Social sciences and services', 
    study$odbor %in% f ~ 'Sciences of culture and art', 
    study$odbor %in% g ~ 'Military and security sciences',
    study$odbor == 99 || study$odbor == 0 || study$odbor == 10 ~ 'General sciences and services ',
    TRUE~as.character(study$odbor)
  ) 
)

study <- study %>% select(odbor, Participants, Eligible) %>%
  group_by(odbor)  %>% 
  summarise(Participants = sum(Participants), Eligible = sum(Eligible))  %>%
  mutate(Participants = paste0(Participants, "%")) %>%
  mutate(Eligible = paste0(Eligible, "%")) 
study <- rename(study, Description = odbor)


####
##  SUMMARIZE ####
####

colnames(tab1)<-c("Description", "Participants", "Eligible")
colnames(mean_age)<-c("Description", "Participants", "Eligible")
colnames(male)<-c("Description", "Participants", "Eligible")
colnames(prev_emp)<-c("Description", "Participants", "Eligible")
colnames(un_spell)<-c("Description", "Participants", "Eligible")
colnames(child)<-c("Description", "Participants", "Eligible")

tables <- c('tab1', 'mean_age', 'male', 'prev_emp', 'un_spell', 'child')
basics <- data.frame()

for (name in tables){
  table <- get(name)
  table <- mutate(table, across(everything(), as.factor))
  basics <- bind_rows(basics, table)
}

tables <- c('basics', 'education', 'study', 'skills', 'okres', 'nat', 'inflow', 'outflow')

for (name in tables){
  table <- get(name)
  table <- add_column(table, Variable = name, .after = "Eligible")
  colnames(table) <- c("Description", "Participants", "Eligible", "Variable")
  assign(name, table)
}


sum_table <- rbind(basics, education, study, skills, okres, nat, inflow, outflow)
sum_table <- sum_table %>% relocate(Variable, .before = Description) %>%mutate(
  Variable = case_when(
    sum_table$Variable == 'basics' ~ 'Basics',
    sum_table$Variable == 'education' ~ 'Education level',
    sum_table$Variable == 'study' ~ 'Field of study',
    sum_table$Variable == 'skills' ~ 'Skills',
    sum_table$Variable == 'okres' ~ 'Region',
    sum_table$Variable == 'nat' ~ 'Nationality',
    sum_table$Variable == 'inflow' ~ 'Inflow',
    sum_table$Variable == 'outflow' ~ 'Outflow',
    TRUE ~ as.character(sum_table$Variable)
  )
) 

sum_table$Participants <- ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                 ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                        ifelse(str_detect(as.character(sum_table$Eligible), regex("%")), '0%',0), 
                                        as.character(sum_table$Participants)),
                                 as.character(sum_table$Participants))

sum_table[,2:4]  %>% kbl(format = 'html', booktabs = T , align = 'c', row.names = F) %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F) %>%
  pack_rows(index = table(fct_inorder(sum_table$Variable)))

Description	Participants	Eligible
Basics
Number of observations	2 848	279 798
Age (years)	25	36.4
Male	47.96%	51.55%
Previous employment	4.67%	8.65%
Length of the unemployment spell	400.76	266.91
Children in the household	10.22%	11.66%
Education level
No education	0.77%	0.75%
Primary	17.38%	14.33%
Lower secondary	16.64%	28.13%
Upper secondary	45.93%	35.45%
Tertiary	19.28%	21.35%
Field of study
Agricultural, forestry and veterinary sciences	2.14%	4.23%
General sciences and services	55.51%	37.2%
Medical and pharmaceutical sciences	0.99%	1.47%
Military and security sciences	0.15%	0.35%
Natural Science	0.48%	0.68%
Sciences of culture and art	0.84%	1.05%
Social sciences and services	24.29%	24.93%
Technical sciences	15.69%	30.08%
Skills
Foreign language	74.86%	61.98%
PC skill	70.79%	55.53%
Driving license	50.46%	53.76%
Region
Banskobystrický	21.42%	12.27%
Bratislavský	0.04%	11.22%
Košický	19.77%	14.63%
Nitriansky	10.92%	12.67%
Prešovský	15.55%	17.02%
Trenčiansky	9.09%	10.22%
Trnavský	8.08%	9.56%
Žilinský	15.13%	12.41%
Nationality
Slovak	87.64%	90.43%
Hungarian	11.97%	8.33%
Czech	0.11%	0.44%
Roma	0.11%	0.17%
Other	0.18%	0.64%
Inflow
1Q.2017	0.11%	26.98%
3Q.2017	23.91%	25.95%
4Q.2017	75.98%	21.72%
Outflow
3Q.2017	4.85%	23.62%
4Q.2017	95.15%	20.04%

Graph 3: Timing and length of participation in the evaluated program

require(gridExtra)
if (nrow(spell_bup) > 5){
  mx <- max(spell_bup$month)
  mn <- min(spell_bup$month)
  plot1 <- ggplot(data=spell_bup, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Inflow into the program in months\nsince the start of the unemployment") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}

if (nrow(spell_aotp) > 5){
  
  mx <- max(spell_aotp$month)
  mn <- min(spell_aotp$month)
  plot2 <- ggplot(data=spell_aotp, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Length of participation\n(in months)") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}
grid.arrange(plot1, plot2, ncol=2)

3. The evaluation sample and estimation

The ultimate objective of ALMP programs is improving employment chances of its beneficiaries. Constrained by the data provided, we construct four outcome indicators. All four of them are based on the presence/absence from the register of JSs administrated by COLSAF. They are, therefore, proxies of the employment status.

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = T)
Mdata<-c()

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

    
list_vars <- c('ent', 'male', 'married','kids',
             'slovak', 'noedu','primary', 'lsec', 'usec',
             'flang', 'drive', 'pc',
             'unpast', 'min_urad', 'min_BA',
             'UR_region', 'roma_share', 'population', 'age')

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==T,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) # S x P x Q2
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),length(Ssamples))) # S x P x Q2
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  resultsArray_se  <- array(NA, dim=c(length(O_vars),length(Ssamples))) # S x P x Q2
  dimnames(resultsArray_se)[[1]] <- c(O_vars)
  results <- array(NA, dim=c(length(O_vars),2))

### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = T)

partic<-Sesample[Sesample$treated==T,]
nonpart<-Sesample[Sesample$treated==F,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==T,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:15), sep=""), 
                                   paste0("exit", seq(1:15), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==F,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==T & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==F]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == F] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == F] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:15) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
        for (col in colnames(result$D)){
          D <-  D[!is.na(D[,col]),]
        }    
            
        # ESTIMATION:
        m.1 <- matchit(as.formula(paste(result$spec[2], '~', result$spec[3] , sep = ' ')), 
                       data = D,
                       method = "nearest", 
                       exact = c("infQ"),
                       distance = "glm")
        #plot(summary(m.1))
        m.data1 <- match.data(m.1)
        assign(paste("Mdata",s, sep=""), m.data1)
        Mdata <- bind_rows(Mdata, m.data1)
        
        assign(paste0('balancegraph',s, by=''),summary(m.1, subclass = TRUE))
        
        #distance in m.data1 is Propensity score
        #trim = 0.005
        #m.data1 <- m.data1[!(m.data1$distance <= trim | m.data1$distance >= (1-trim)),]
  
        
            for (iQ in O_vars){
              fit <- lm(as.formula(paste(iQ , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                        data = m.data1, 
                        weights = weights)
              res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
              att <- res[2,1]
              se <- res[2,2]
              
              resultsArray_ATT[iQ,s] <- att
              resultsArray_se[iQ,s] <- se
              
        }
    }
  } 


for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum((resultsArray_ATT[iQ,]*(N_sp/N)))
      results[iQ,2] <- sum((resultsArray_se[iQ,]*(N_sp/N)))
}

results<-cbind(O_vars, results)   

#Mdata<-rbind(Mdata1, Mdata2, Mdata3, Mdata4)

resultsPSM <- matrix(NA, nrow=length(O_vars), ncol = 2)

for (iQ in 1:length(O_vars)){
            fit <- lm(as.formula(paste(O_vars[iQ] , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                      data = Mdata, 
                      weights = weights)
            res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
            att <- res[2,1]
            se <- res[2,2]
            
            resultsPSM[iQ,1] <- att
            resultsPSM[iQ,2] <- se
            
        }

resultsPSM<-cbind(O_vars, resultsPSM)

Graph 4: Balance of the groups of participants and eligible before and after weighting

match.vars <- c('distance',list_vars,paste0("empl.", seq(1,4,1), sep=""),'infQ')


balance_g  <- array(NA, dim=c(length(match.vars),2,length(Ssamples))) 
dimnames(balance_g)[[1]] <-match.vars
dimnames(balance_g)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data')

for(i in 1:length(Ssamples)){
  
  if(exists(paste0('balancegraph', i, by = ''))){
    x <- get(paste0('balancegraph', i, by = ''))
    balance_g[,1,i] <- abs(x$sum.all[,3][match(rownames(balance_g), names(x$sum.all[,3]))])
    balance_g[,2,i] <- abs(x$sum.matched[,3][match(rownames(balance_g), names(x$sum.matched[,3]))])
  }                  
}

balance_f <- array(NA, dim=c(length(match.vars),2)) 
dimnames(balance_f)[[1]] <-match.vars
dimnames(balance_f)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data') 

for (iQ in 1:length(match.vars)){
      balance_f[iQ,1] <- sum(sum((as.numeric(balance_g[iQ,1,])*(N_sp/N)), na.rm = T), na.rm = T)
      balance_f[iQ,2] <-sum(sum((as.numeric(balance_g[iQ,2,])*(N_sp/N)), na.rm = T), na.rm = T)
}

balance_fg <- data.frame(balance = c(balance_f[,1], balance_f[,2]),
                 Balance = c(rep("before matching",length(match.vars)),rep("after matching",length(match.vars))),
                 names = c(match.vars,match.vars) ) 

p <- balance_fg %>% 
    mutate(names = fct_reorder(names, balance)) %>%
    ggplot(aes(x=balance, y=names,col=Balance)) + 
    geom_vline(xintercept = 0.05, linetype="dotted", color = 'darkgrey') + 
    geom_vline(xintercept = 0.0, color = 'darkgrey')+
    geom_vline(xintercept = 0.1, color = 'darkgrey')+
    geom_point()+
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Zdroj: ÚPSVaR")
  
p

After inspecting the achieved balance of the “comparison group” and participants, we can proceed to reporting the estimated effects of participating in the [Component] Projects and programmes - Projects - Guidance during 2017.

3.1 Estimation of the average treatment effects on the treated (ATTs)

First, we explore the average treatment effects on the treated (ATTs).

Graph 5: The average treatment effects on the treated (ATTs) estimated for the participation in the [Component] Projects and programmes - Projects - Guidance during 2017

graphATT

3.2 Labour market outcomes of participants and eligible

Second we construct a proxy of the employment rate. It is a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarters 0).

Graph 6: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(Mdata[Mdata$treated == T,o,drop = F], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(Mdata[Mdata$treated == F,o,drop = F], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  labs( 
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Third we look at the number of quarters until the first exit from the register of unemployed JSs. This is an indication of the how the participation in the ALMP program contributed to shortening of the unemployment spell. The graph displays the whole distribution of participants and eligible based on the values of this outcome indicator.

Graph 7: Quarters until the first exit

#### THE NUMBER OF MONTHS UNTIL THE FIRST EXIT
########## Months until the first exit
# phow many months after the entrya, he got a job

firstempl_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

firstempl_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

firstempl <- rbind(firstempl_P, firstempl_E)

ggplot(firstempl, aes(fill=Group, y=percent, x=quarter)) + 
          geom_bar(position="dodge", stat="identity") +
          facet_wrap(~Group) +
          theme_minimal() + 
          scale_fill_manual(values=c('grey', 'steelblue3')) +
          scale_x_continuous(breaks=seq(0,16,1)) +
          theme(legend.position="none") +
          xlab("Quarters after the start of the participation (0)") + 
          ylab("") +
          labs(caption="Source: COLSAF")+
          scale_y_continuous(labels = percent)

In the last one graph in this section, we calculate the cumulative number of months out of the register during the observation period following the end of the participation in the evaluated program.

Graph 8: Number of quarters out of the unemployment register after the end of the participation

## Plotting the number of months in cumulative employment
###### Participants

empl36m_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

empl36m_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

empl36m <- rbind(empl36m_P, empl36m_E)

ggplot(empl36m, aes(fill=Group, y=percent, x=quarter)) + 
  geom_bar(position="dodge", stat="identity") +
  facet_wrap(~Group) +
  theme_minimal() + 
  scale_fill_manual(values=c('grey', 'steelblue3')) +
  scale_x_continuous(breaks=seq(0,16,1)) +
  theme(legend.position="none") +
  xlab("Quarters after the start of the participation (0)") + 
  ylab("") +
  labs(caption="Source: COLSAF")+
  scale_y_continuous(labels = percent)

3.3 Statistical significance and heterogeneity of ATT effects

Further, we report the ATTs in numbers for outcomes variables and also for the indicators of:

Number of months until first exit from the unemployment register JSs (firtempl)
Cummulative number of months outside the unemployment register of unemployed JSs (cumempl)

Table 3: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = T , align = 'c') %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F) %>%
  kableExtra::footnote(number = paste('Signif. codes:',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

O_vars	effect	results_sd	pval	Significance
empl.4	-0.0001528	0.0066455	0.9816548
empl.3	-0.0071771	0.0065034	0.2697705
empl.2	-0.0049960	0.0061647	0.4176973
empl.1	-0.0086348	0.0061528	0.1604973
empl0	-0.0042857	0.0070219	0.5416376
empl1	-0.1168831	0.0084228	0.0000000	***
empl2	-0.3025408	0.0095427	0.0000000	***
empl3	-0.1025278	0.0119990	0.0000000	***
empl4	-0.0620198	0.0122320	0.0000004	***
empl5	-0.0493788	0.0108192	0.0000050	***
empl6	-0.0452262	0.0096148	0.0000026	***
empl7	-0.0315675	0.0091384	0.0005516	***
empl8	-0.0393094	0.0087965	0.0000079	***
empl9	-0.0269101	0.0087143	0.0020149	**
empl10	-0.0353038	0.0084544	0.0000297	***
empl11	-0.0313051	0.0086506	0.0002960	***
empl12	-0.0425565	0.0095582	0.0000085	***
firstempl	0.2900487	0.0747295	0.0001039	***
cumempl	-0.8898144	0.0653036	0.0000000	***
¹ Signif. codes: 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

Additionally, we explore the heterogeneity in the effects programme participation has on various subgroups of participants.

We explore the ATTs for subgroups based on:

Gender - differences in effects for women and men
Education level - differences in effects for JSs with no education, primary education, lower and secondary education.
Share of Roma in the city of residence - differences in effects for JSs who lives in the city with share of Roma bellow of 10% and above 10%
The size of the city of residence - differences in effects for JSs who lives in the city with population less then 4 000 and more then 4 000

Table 4: The average treatment effect on the treated: subgroup based on gender

gender

			Whole Sample					Gender
			Sample				Women				Men
	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance
empl.4	-0.0001528	0.0066455	0.9816548		0.0047789	0.0092413	0.6050685		-0.0059840	0.0101855	0.5568668
empl.3	-0.0071771	0.0065034	0.2697705		-0.0192602	0.0089129	0.0307009		0.0003591	0.0094790	0.9697835
empl.2	-0.0049960	0.0061647	0.4176973		-0.0048013	0.0090722	0.5966423		-0.0063588	0.0098908	0.5202890
empl.1	-0.0086348	0.0061528	0.1604973		-0.0153996	0.0089290	0.0845851	.	-0.0024831	0.0099921	0.8037454
empl0	-0.0042857	0.0070219	0.5416376		-0.0091343	0.0105023	0.3844418		0.0012357	0.0110207	0.9107212
empl1	-0.1168831	0.0084228	0.0000000	***	-0.1106256	0.0134794	0.0000000	***	-0.1219978	0.0143164	0.0000000	***
empl2	-0.3025408	0.0095427	0.0000000	***	-0.3122924	0.0132000	0.0000000	***	-0.2876199	0.0146412	0.0000000	***
empl3	-0.1025278	0.0119990	0.0000000	***	-0.1181603	0.0163784	0.0000000	***	-0.0850388	0.0182714	0.0000033	***
empl4	-0.0620198	0.0122320	0.0000004	***	-0.0725201	0.0168746	0.0000173	***	-0.0457626	0.0179635	0.0108487
empl5	-0.0493788	0.0108192	0.0000050	***	-0.0588643	0.0154432	0.0001380	***	-0.0332334	0.0151114	0.0278623
empl6	-0.0452262	0.0096148	0.0000026	***	-0.0630743	0.0133759	0.0000024	***	-0.0224468	0.0137894	0.1035607
empl7	-0.0315675	0.0091384	0.0005516	***	-0.0476226	0.0129052	0.0002241	***	-0.0115875	0.0126885	0.3611248
empl8	-0.0393094	0.0087965	0.0000079	***	-0.0526617	0.0133307	0.0000780	***	-0.0212041	0.0114769	0.0646688	.
empl9	-0.0269101	0.0087143	0.0020149	**	-0.0289596	0.0130425	0.0263912		-0.0196184	0.0111596	0.0787494	.
empl10	-0.0353038	0.0084544	0.0000297	***	-0.0400096	0.0126931	0.0016212	**	-0.0275162	0.0113318	0.0151733
empl11	-0.0313051	0.0086506	0.0002960	***	-0.0392422	0.0129629	0.0024677	**	-0.0230177	0.0112239	0.0402893
empl12	-0.0425565	0.0095582	0.0000085	***	-0.0531251	0.0134838	0.0000815	***	-0.0324271	0.0129065	0.0119891
firstempl	0.2900487	0.0747295	0.0001039	***	0.3927003	0.1173578	0.0008193	***	0.1135504	0.0876135	0.1949624
cumempl	-0.8898144	0.0653036	0.0000000	***	-1.0062921	0.1001283	0.0000000	***	-0.7302344	0.0844675	0.0000000	***
¹ Signif. codes: 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1

Table 5: The average treatment effect on the treated: subgroup based on education level

education

			Whole Sample									Education
			Sample				No education				Primary				Lower secondary				Upper secondary
	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance
empl.4	-0.0001528	0.0066455	0.9816548		0.2204317	0.1004315	0.0281746		-0.0230424	0.0202619	0.2554420		0.0015559	0.0195730	0.9366419		0.0079545	0.0091263	0.3834236
empl.3	-0.0071771	0.0065034	0.2697705		0.0193480	0.1104170	0.8609018		-0.0071852	0.0174006	0.6796584		-0.0164589	0.0190170	0.3867737		-0.0005291	0.0089249	0.9527308
empl.2	-0.0049960	0.0061647	0.4176973		-0.2550939	0.1030151	0.0132760		-0.0242469	0.0188561	0.1984805		0.0253695	0.0182005	0.1633507		0.0026898	0.0098516	0.7848274
empl.1	-0.0086348	0.0061528	0.1604973		0.2431276	0.0990234	0.0140785		-0.0036822	0.0194893	0.8501438		-0.0154443	0.0178700	0.3874454		-0.0197904	0.0102256	0.0529431	.
empl0	-0.0042857	0.0070219	0.5416376		-0.2477897	0.1849794	0.1803907		-0.0123714	0.0203586	0.5434031		-0.0579669	0.0195427	0.0030153	**	0.0126936	0.0117844	0.2814093
empl1	-0.1168831	0.0084228	0.0000000	***	0.0595226	0.1587960	0.7077815		-0.0833597	0.0261122	0.0014112	**	-0.1501391	0.0261423	0.0000000	***	-0.1035339	0.0149501	0.0000000	***
empl2	-0.3025408	0.0095427	0.0000000	***	-0.2851112	0.2091379	0.1727978		-0.2787220	0.0248219	0.0000000	***	-0.3105487	0.0253445	0.0000000	***	-0.3316073	0.0147442	0.0000000	***
empl3	-0.1025278	0.0119990	0.0000000	***	-0.2249194	0.1743499	0.1970347		-0.2097712	0.0297849	0.0000000	***	-0.0959071	0.0294002	0.0011058	**	-0.0711872	0.0178923	0.0000693	***
empl4	-0.0620198	0.0122320	0.0000004	***	-0.3800671	0.1934786	0.0494850		-0.1301316	0.0306756	0.0000221	***	-0.1201214	0.0306559	0.0000891	***	-0.0234809	0.0180225	0.1926214
empl5	-0.0493788	0.0108192	0.0000050	***	-0.4069734	0.2325781	0.0801466	.	-0.0799293	0.0313760	0.0108508		-0.0863951	0.0273921	0.0016104	**	-0.0198572	0.0152280	0.1922365
empl6	-0.0452262	0.0096148	0.0000026	***	-0.4419484	0.2214896	0.0460052		-0.0907477	0.0296728	0.0022261	**	-0.0721514	0.0245372	0.0032769	**	-0.0167482	0.0131226	0.2018523
empl7	-0.0315675	0.0091384	0.0005516	***	-0.3325405	0.1856119	0.0731986	.	-0.0909494	0.0295060	0.0020534	**	-0.0444409	0.0255899	0.0824476	.	-0.0001034	0.0121256	0.9931981
empl8	-0.0393094	0.0087965	0.0000079	***	-0.1234178	0.2223342	0.5788260		-0.0721470	0.0286289	0.0117330		-0.0645944	0.0236670	0.0063467	**	-0.0194231	0.0113719	0.0876364	.
empl9	-0.0269101	0.0087143	0.0020149	**	0.1813692	0.2401928	0.4501900		-0.0512728	0.0269533	0.0571342	.	-0.0657283	0.0227833	0.0039149	**	-0.0115951	0.0117674	0.3244495
empl10	-0.0353038	0.0084544	0.0000297	***	0.0246285	0.1748544	0.8879874		-0.0725303	0.0277486	0.0089531	**	-0.0736040	0.0208733	0.0004215	***	-0.0112597	0.0118417	0.3416799
empl11	-0.0313051	0.0086506	0.0002960	***	-0.2844172	0.1897217	0.1338403		-0.0724534	0.0277608	0.0090565	**	-0.0542426	0.0226482	0.0166200		-0.0132082	0.0111310	0.2353814
empl12	-0.0425565	0.0095582	0.0000085	***	-0.1537542	0.2082563	0.4603363		-0.0280233	0.0290068	0.3339978		-0.0749123	0.0243569	0.0021008	**	-0.0455444	0.0128030	0.0003747	***
firstempl	0.2900487	0.0747295	0.0001039	***	2.0673824	1.7393993	0.2346124		0.8431542	0.2378172	0.0003920	***	0.6900627	0.1911010	0.0003050	***	-0.0244083	0.0867161	0.7783473
cumempl	-0.8898144	0.0653036	0.0000000	***	-2.6154187	1.6712264	0.1175900		-1.2724091	0.2147692	0.0000000	***	-1.2707521	0.1743953	0.0000000	***	-0.6548548	0.0882222	0.0000000	***
¹ Signif. codes: 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1

Table 6: The average treatment effect on the treated: subgroup based on share of Roma in the city of residence

romas

			Whole Sample					Roma share
			Sample				0-10%				10-100%
	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance
empl.4	-0.0001528	0.0066455	0.9816548		-0.0001337	0.0078768	0.9864574		0.0033009	0.0144244	0.8189896
empl.3	-0.0071771	0.0065034	0.2697705		-0.0075146	0.0077688	0.3334045		-0.0088828	0.0128519	0.4894603
empl.2	-0.0049960	0.0061647	0.4176973		0.0042980	0.0076404	0.5737472		-0.0237566	0.0127561	0.0625512	.
empl.1	-0.0086348	0.0061528	0.1604973		-0.0083991	0.0074527	0.2597452		-0.0125355	0.0129274	0.3322051
empl0	-0.0042857	0.0070219	0.5416376		0.0059797	0.0086645	0.4901048		-0.0238240	0.0146485	0.1038697
empl1	-0.1168831	0.0084228	0.0000000	***	-0.1093453	0.0107812	0.0000000	***	-0.1266253	0.0210319	0.0000000	***
empl2	-0.3025408	0.0095427	0.0000000	***	-0.3036768	0.0120374	0.0000000	***	-0.3094156	0.0177494	0.0000000	***
empl3	-0.1025278	0.0119990	0.0000000	***	-0.0654963	0.0144498	0.0000058	***	-0.2027111	0.0230514	0.0000000	***
empl4	-0.0620198	0.0122320	0.0000004	***	-0.0370297	0.0145794	0.0110892		-0.1319960	0.0246057	0.0000001	***
empl5	-0.0493788	0.0108192	0.0000050	***	-0.0371528	0.0121799	0.0022858	**	-0.0865447	0.0242816	0.0003650	***
empl6	-0.0452262	0.0096148	0.0000026	***	-0.0298126	0.0105943	0.0048927	**	-0.0831633	0.0225267	0.0002227	***
empl7	-0.0315675	0.0091384	0.0005516	***	-0.0288719	0.0096336	0.0027266	**	-0.0350076	0.0221118	0.1133736
empl8	-0.0393094	0.0087965	0.0000079	***	-0.0294590	0.0092737	0.0014901	**	-0.0719097	0.0212915	0.0007318	***
empl9	-0.0269101	0.0087143	0.0020149	**	-0.0222731	0.0092026	0.0155071		-0.0388418	0.0210559	0.0650813	.
empl10	-0.0353038	0.0084544	0.0000297	***	-0.0134280	0.0088541	0.1293718		-0.0812107	0.0203191	0.0000642	***
empl11	-0.0313051	0.0086506	0.0002960	***	-0.0204256	0.0091262	0.0252124		-0.0538642	0.0199309	0.0068810	**
empl12	-0.0425565	0.0095582	0.0000085	***	-0.0405650	0.0100081	0.0000505	***	-0.0518020	0.0213167	0.0150940
firstempl	0.2900487	0.0747295	0.0001039	***	0.0098105	0.0707651	0.8897390		0.9730510	0.1894766	0.0000003	***
cumempl	-0.8898144	0.0653036	0.0000000	***	-0.7315563	0.0687727	0.0000000	***	-1.2969159	0.1648149	0.0000000	***
¹ Signif. codes: 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1

Table 7: The average treatment effect on the treated: subgroup based on the size of the city of residence

City

			Whole Sample					Type of city
			Sample				Village				City
	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance	effect	results_sd	pval	Significance
empl.4	-0.0001528	0.0066455	0.9816548		0.0086703	0.0100944	0.3903844		-0.0061125	0.0094603	0.5182004
empl.3	-0.0071771	0.0065034	0.2697705		-0.0098196	0.0093742	0.2948660		-0.0028482	0.0092188	0.7573553
empl.2	-0.0049960	0.0061647	0.4176973		-0.0060690	0.0092708	0.5127022		-0.0054922	0.0090677	0.5447214
empl.1	-0.0086348	0.0061528	0.1604973		0.0066251	0.0095015	0.4856362		-0.0244982	0.0090159	0.0065832	**
empl0	-0.0042857	0.0070219	0.5416376		-0.0149872	0.0110227	0.1739353		0.0049449	0.0113485	0.6630326
empl1	-0.1168831	0.0084228	0.0000000	***	-0.1271783	0.0136967	0.0000000	***	-0.1063381	0.0143715	0.0000000	***
empl2	-0.3025408	0.0095427	0.0000000	***	-0.2864666	0.0136751	0.0000000	***	-0.3183524	0.0143342	0.0000000	***
empl3	-0.1025278	0.0119990	0.0000000	***	-0.1056675	0.0176077	0.0000000	***	-0.0979206	0.0177206	0.0000000	***
empl4	-0.0620198	0.0122320	0.0000004	***	-0.0512845	0.0174700	0.0033293	**	-0.0726942	0.0176414	0.0000378	***
empl5	-0.0493788	0.0108192	0.0000050	***	-0.0463641	0.0152976	0.0024390	**	-0.0527709	0.0157239	0.0007905	***
empl6	-0.0452262	0.0096148	0.0000026	***	-0.0447670	0.0138028	0.0011814	**	-0.0448072	0.0136797	0.0010549	**
empl7	-0.0315675	0.0091384	0.0005516	***	-0.0154915	0.0135391	0.2525391		-0.0473796	0.0127188	0.0001952	***
empl8	-0.0393094	0.0087965	0.0000079	***	-0.0233445	0.0134166	0.0818649	.	-0.0549356	0.0119884	0.0000046	***
empl9	-0.0269101	0.0087143	0.0020149	**	-0.0031329	0.0131559	0.8117742		-0.0511514	0.0113655	0.0000068	***
empl10	-0.0353038	0.0084544	0.0000297	***	-0.0249155	0.0130726	0.0566582	.	-0.0435667	0.0116178	0.0001768	***
empl11	-0.0313051	0.0086506	0.0002960	***	-0.0217755	0.0128450	0.0900276	.	-0.0378112	0.0118980	0.0014832	**
empl12	-0.0425565	0.0095582	0.0000085	***	-0.0300263	0.0143442	0.0363242		-0.0549966	0.0131943	0.0000307	***
firstempl	0.2900487	0.0747295	0.0001039	***	0.2218309	0.1173290	0.0586678	.	0.3275007	0.0977759	0.0008096	***
cumempl	-0.8898144	0.0653036	0.0000000	***	-0.7954014	0.1019701	0.0000000	***	-0.9777797	0.0917594	0.0000000	***
¹ Signif. codes: 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1

4. Technical appendix

4.1 Sample Selection

info_table <- data.frame(matrix(ncol=2,nrow=0, dimnames=list(NULL, c("Desciption", "Value"))))
info_table[1,] <- c('Start of the evaluation period', paste(ep_start))
info_table[2,] <- c('End of the evaluation period', paste(ep_end))

info_table <-
  info_table[,-3] %>% kbl(format = 'html', booktabs = T , align = 'c', caption = 'Evaluation period', row.names = F) %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F)


sample_selection <- data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("Desciption", "Dropped","Total"))))

sample_selection[1,] <- c('Total registrations ', 0,format(n1, big.mark=" ", scientific=FALSE))
sample_selection[2,] <- c('Total eligible jobseekers', 0,format(n2, big.mark=" ", scientific=FALSE))
sample_selection[3,] <- c('Total participations in the evaluated measure', 0,format(npart1, big.mark=" ", scientific=FALSE))
sample_selection[4,] <- c('Total participants in the evaluated measure', 0 ,format(npart0, big.mark=" ", scientific=FALSE))

x<-n1-n4
y<-npart1-n3
sample_selection[5,] <- c('Dropped eligible JSs', format(n4, big.mark=" ", scientific=FALSE), format(x, big.mark=" ", scientific=FALSE))
sample_selection[6,] <- c('Dropped participants', format(n3, big.mark=" ", scientific=FALSE), format(y, big.mark=" ", scientific=FALSE))

x<-x-n6
y<-y-n5
sample_selection[7,] <- c('Dropped eligible JSs', format(n6, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[8,] <- c('Dropped participants', format(n5, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n7
sample_selection[9,] <- c('Dropped participants', format(n7, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

x<-x-n9
y<-y-n8
sample_selection[10,] <- c('Dropped eligible JSs', format(n9, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[11,] <- c('Dropped participants', format(n8, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-nrowdfa-n10
sample_selection[12,] <- c('Inflating participants by merging dataframes', 0,format(nrowdfa, big.mark=" ", scientific=FALSE))
sample_selection[13,] <- c('Dropped participants', format(n10, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n11
sample_selection[14,] <- c('Dropped participants', format(n11, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

sample_selection[15,] <- c('Eligibles', format(x-(length(unique(esample$klient_id[esample$treated==FALSE]))), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[16,] <- c('Participants', format(y-length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection[17,] <- c('Eligibles', 0, format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[18,] <- c('Participants', 0, format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection$Variable <- c('All registrations (before cleaning)', 'All registrations (before cleaning)', 'All registrations (before cleaning)','All registrations (before cleaning)',
                             
                             'Dropped JSs with ALMP participation 2 years before the EP', 'Dropped JSs with ALMP participation 2 years before the EP',
                             
                             'Dropped JSs with ALMP participation in other ALMP during the EP', 'Dropped JSs with ALMP participation in other ALMP during the EP',
                             
                             'Dropped participants with multiple ALMP participations in evaluated programme',
                             
                             'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',  'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',
                             
                             'Dropped participations which not happening during an unemployment spell','Dropped participations which not happening during an unemployment spell', 
                             
                             'Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure ',
                             
                             'Dropped JSs with multiple registrations', 'Dropped JSs with multiple registrations',
                             
                             'Total registrations after cleaning', 'Total registrations after cleaning'
)

sample_selection <-sample_selection[,-4] %>% kbl(format = 'html', booktabs = T , align = 'c', caption = 'Documentation of sample reduction', row.names = F) %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F)%>%
  pack_rows(index = table(fct_inorder(sample_selection$Variable)))


info_table

Evaluation period
Desciption	Value
Start of the evaluation period	2017-01-01
End of the evaluation period	2017-12-31

sample_selection

Documentation of sample reduction
Desciption	Dropped	Total
All registrations (before cleaning)
Total registrations	0	483 955
Total eligible jobseekers	0	414 546
Total participations in the evaluated measure	0	7 813
Total participants in the evaluated measure	0	7 813
Dropped JSs with ALMP participation 2 years before the EP
Dropped eligible JSs	81 808	402 147
Dropped participants	1 606	6 207
Dropped JSs with ALMP participation in other ALMP during the EP
Dropped eligible JSs	61 675	340 472
Dropped participants	1 244	4 963
Dropped participants with multiple ALMP participations in evaluated programme
Dropped participants	0	4 963
Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP
Dropped eligible JSs	21 747	318 725
Dropped participants	1 237	3 726
Dropped participations which not happening during an unemployment spell
Inflating participants by merging dataframes	0	4 141
Dropped participants	1 264	2 877
Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure
Dropped participants	29	2 848
Dropped JSs with multiple registrations
Eligibles	38 927	279 798
Participants	0	2 848
Total registrations after cleaning
Eligibles	0	279 798
Participants	0	2 848

4.2 Description of explanatory variables

Description of X-variables, which we used for estimation of ATTs. We also show their balance before and after matching in graph 4 and balance before and after weighting in graph 10.

#TREBA DOPLNIŤ DO POSTUP_EXPORT
Q[["df"]][["flang"]] <- "knowledge of a foreign language (1: Yes, 0: No)"
Q[["df"]][["drive"]] <- "driving license holder (1: Yes, 0: No)"
Q[["df"]][["pc"]] <- "computer skills (1: Yes, 0: No)"
Q[["df"]][["unpast"]] <- "registered citizen in the past (1: Yes, 0: No), unemployed in past"
Q[["df"]][["min_urad"]] <- "traveling time to the nearest Labour office (in minutes)"
Q[["df"]][["min_BA"]] <- "traveling time to the Capital city - Bratislava (in minutes)"
Q[["df"]][["population"]] <- "number of inhabitants of the place of residence"



list_vars_table <- data.frame(Variables = list_vars)

desc <- c()
for (i in list_vars_table$Variables){
  Q[["df"]][[i]][1]
  desc <- append(desc, ifelse(is.null(Q[["df"]][[i]][1]), NA, Q[["df"]][[i]][1]))
}

list_vars_table$Description <- desc
list_vars_table[list_vars_table$Variables == "ent","Description"] <- "difference between entry into unemployment register and started of evalueted period"
list_vars_table[list_vars_table$Variables == "UR_region","Description"] <- "unemployment rate in region during valuated period"
list_vars_table[list_vars_table$Variables == "roma_share","Description"] <- "share of Roma in the place of residence"

list_vars_table[,-3] %>% kbl(format = 'html', booktabs = T , align = 'c', row.names = F) %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F)

Variables	Description
ent	difference between entry into unemployment register and started of evalueted period
male	gender (1: man, 0: woman)
married	marital status: married
kids	kids under 10 years
slovak	nationality: slovak
noedu	level of education: no education
primary	level of education: primary
lsec	level of education: ower secondary
usec	level of education: upper secondary
flang	knowledge of a foreign language (1: Yes, 0: No)
drive	driving license holder (1: Yes, 0: No)
pc	computer skills (1: Yes, 0: No)
unpast	registered citizen in the past (1: Yes, 0: No), unemployed in past
min_urad	traveling time to the nearest Labour office (in minutes)
min_BA	traveling time to the Capital city - Bratislava (in minutes)
UR_region	unemployment rate in region during valuated period
roma_share	share of Roma in the place of residence
population	number of inhabitants of the place of residence
age	age of citizen

Another Evaluation Approach

In this section, we explore the sensitivity of our results to various model specifications.

Under M0 we report the main results, a propensity score matching model, with 1 nearest neighbour, no caliper and no weighting but with exact matching in yearly quarter of the start of the unemployment since the start of the evaluation period.
Under M1 we report an inverse probability weighting

4.3 M1 Treatweight

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = T)

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

list_vars <- c('ent', 'male', 'married','kids',
           'slovak', 'noedu','primary', 'lsec', 'usec',
           'flang', 'drive', 'pc',
           'unpast', 'min_urad', 'min_BA',
           'UR_region', 'roma_share', 'population', 'age')

#     # All potentially useful explanatory variables (Xs)
#     list_vars <- c('ent', 'male', 'single', 'married','kids',
#                    'slovak', 'hungarian', 'roma', 
#                    'noedu','primary', 'lsec', 'usec', 'tertiary'
#                    , 'zaujem_vzdel',
#                    'flang', 'drive', 'pc',
#                    'healthy', 'barrier', 'graduate', 'ziad_undn_sp', 'cvyhl_poisteu', 
#                    'empl', 'unpast', 'employee', 'selfempl', 'zaujem_szco',
#                    'look_ptime', 'commute', 'relocate', 'zaujem_zam_zahr',
#                    'min_kraj', 'min_urad', 'min_BA', 
#                    'UR_region', 'roma_share', 'population', 
#                    ageg_dummies,
# #                    paste0("urad_",seq(from=1, to=46), sep=""), 
#                    paste0("isco1_",seq(from=1, to=3), sep=""),
#                    paste0("odbor1_",seq(from=1, to=5), sep=""))
# #                   paste0(colnames(df)[grepl("nace1_",colnames(df))], sep=""))

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==T,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) 
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),7,length(Ssamples))) 
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  dimnames(resultsArray_ATT)[[2]] <- c('ATT', 'se', 'pval', 'Y1', 'Y0', 'SampleSize', 'Sign.')
  results <- array(NA, dim=c(length(O_vars),4))

  balance_matrix_w <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_w)[[1]] <- c(list_vars)
  
  results_bv_W <- array(NA, dim=c(1,length(list_vars)))
  
  balance_matrix_un <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_un)[[1]] <- c(list_vars)
  
  results_bv_un <- array(NA, dim=c(1,length(list_vars)))
  
### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = T)

partic<-Sesample[Sesample$treated==T,]
nonpart<-Sesample[Sesample$treated==F,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

DataSample <- c()

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==T,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:15), sep=""), 
                                   paste0("exit", seq(1:15), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==F,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==T & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==F]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == F] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == F] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:15) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
            for (col in colnames(result$D)){
               D <-  D[!is.na(D[,col]),]
             }  
        # ESTIMATION:
            
            d = D$treated*1
            x = as.matrix(D[,c(list_vars, paste0("empl.", seq(1,4,1), sep=""))])
            y_mat <- D[,c(paste0("empl.", seq(1,4,1), sep=""), 
                                    paste0("empl", seq(0,12,1), sep=""), 
                                    'firstempl', 'cumempl')]
            
            att <- treatweight_pmp(y = y_mat, d, x, s = NULL, z = NULL, selpop = FALSE, trim = 0.05, ATET = TRUE, logit = TRUE, boot = 10)
            
            resultsArray_ATT[,1,s] <- round(att$effect,3)
            resultsArray_ATT[,2,s] <- round(att$se,3)
            resultsArray_ATT[,3,s] <- round(att$pval,3)
            resultsArray_ATT[,4,s] <- round(att$y1,3)
            resultsArray_ATT[,5,s] <- round(att$y0,3)
            resultsArray_ATT[,6,s] <- format(length(d)-att$ntrimmed, big.mark=" ", scientific=FALSE)
            resultsArray_ATT[,7,s] <- stars.pval(att$pval)
            
            
            DataSample <- bind_rows(DataSample, D)
            
            #Balance
            #Generating the propensity score variable
            PSmodel<-glm(result$spec, family=binomial(link = "logit"), data=D)
            #print(summary(PSmodel))
            D$PSvar<-as.numeric(PSmodel$fitted.values)
            
            w_ATE <- D$treated/D$PSvar + (1-D$treated)/(1-D$PSvar)
            
                          #Balance_vars <- colnames(result$D)[colnames(result$D) %in% list_vars]
                          for (bv in Balance_vars){
                          
                              if (apply(D[,bv,drop = F] ,2,function(x) { all(x %in% c(0:1)) }) ) {
                                #unweighted discrete
                                p_treat <- apply(D[D$treated==1,bv,drop = F],2,mean)
                                p_contr <- apply(D[D$treated==0,bv,drop = F],2,mean)
                                balance_matrix_un[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )

                                #weighted discrete
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = T)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = T)
                                
                                balance_matrix_w[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )
                            
                              } else {
                                #unweighted continuous
                                balance_matrix_un[bv,s] <- abs( 100*(apply(D[D$treated==1,bv,drop = F],2,mean) - apply(D[D$treated==0,bv,drop = F],2,mean))/
                                                  sqrt( (apply(D[D$treated==1,bv,drop = F],2,sd)^2 + apply(D[D$treated==0,bv,drop = F],2,sd)^2)/2 ) )

                                #weighted continuous
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = T)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = T)
                                p_treat_var <- ( sum(w_ATE[D$treated==1]) / (sum(w_ATE[D$treated==1])^2 - sum(w_ATE[D$treated==1]^2)) )* 
                                  t(w_ATE[D$treated==1]) %*% ((D[D$treated==1,bv] - c(p_treat))^2)
                                p_contr_var <- ( sum(w_ATE[D$treated==0]) / (sum(w_ATE[D$treated==0])^2 - sum(w_ATE[D$treated==0]^2)) ) * 
                                  t(w_ATE[D$treated==0]) %*% ((D[D$treated==0,bv] - c(p_contr))^2)
                                
                                balance_matrix_w[bv,s] <- abs(100* (p_treat - p_contr ) / 
                                            sqrt( (p_treat_var + p_contr_var )/2 ) )
                              }

                          }
    }
} 
  

for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum(sum((as.numeric(resultsArray_ATT[iQ,'ATT',])*(N_sp/N)), na.rm = T), na.rm = T)
      results[iQ,2] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'se',])*(N_sp/N)), na.rm = T), na.rm = T)
      results[iQ,3] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y1',])*(N_sp/N)), na.rm = T), na.rm = T)
      results[iQ,4] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y0',])*(N_sp/N)), na.rm = T), na.rm = T)
}

results<-cbind(O_vars, results) 
colnames(results) <- c('O_vars', 'ATT', 'se','Y1', 'Y0')

results_bv <- array(NA, dim=c(length(list_vars),2)) 
dimnames(results_bv)[[1]] <- c(list_vars)
dimnames(results_bv)[[2]] <- c("unweighted", "weighted")


for (bVar in 1:length(list_vars)){
      results_bv[bVar,1] <- abs(sum(sum((balance_matrix_w[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
      results_bv[bVar,2] <- abs(sum(sum((balance_matrix_un[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
}

We can explore the sensitivity of our results by displaying the average treatment effects on the treated (ATTs) computing by different approach by inverse probability weighting.

Graph 9: The average treatment effects on the treated (ATTs) estimated for the participation in the [Component] Projects and programmes - Projects - Guidance during 2017

graphATT

Graph 10: Balance of the groups of participants and eligible before and after weighting x

BVDF <- data.frame(balance_vars = c(rownames(results_bv),rownames(results_bv)),
                   balance = c(results_bv[,1], results_bv[,2]),
                   Balance = c(rep("before weighting",nrow(results_bv)),rep("after weighting",nrow(results_bv))))
                          
BVDF %>% subset(!is.na(balance)) %>% 
    mutate(balance_vars = fct_reorder(balance_vars, balance)) %>%
    ggplot(aes(x=balance, y=balance_vars,col=Balance)) + 
    geom_point() +
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Zdroj: ÚPSVaR")+
    geom_vline(xintercept = 0.0, color = 'darkgrey')

Also, we can show a proxy of the employment rate to see a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarters 0).

Graph 11: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(DataSample[DataSample$treated == T,o,drop = F], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(DataSample[DataSample$treated == F,o,drop = F], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  labs(
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

The last one we report the ATTs in table computing computing by different approach by inverse probability weighting.

Table 8: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = T , align = 'c') %>%
  column_spec(1,  border_right = T) %>%
  kable_paper('hover', full_width = F) %>%
  kableExtra::footnote(number = paste('Signif. codes:',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

O_vars	effect	results_sd	pval	Significance
empl.4	-0.0002486	0.0000000	0.0000000	***
empl.3	0.0002496	0.0000000	0.0000000	***
empl.2	-0.0002486	0.0004975	0.6173236
empl.1	-0.0002486	0.0000000	0.0000000	***
empl0	0.0038785	0.0094789	0.6824136
empl1	-0.1969066	0.0079905	0.0000000	***
empl2	-0.4661499	0.0092662	0.0000000	***
empl3	-0.0839133	0.0122433	0.0000000	***
empl4	-0.0105653	0.0172451	0.5401036
empl5	-0.0345569	0.0177388	0.0514030	.
empl6	-0.0461187	0.0144944	0.0014635	**
empl7	-0.0428522	0.0112426	0.0001381	***
empl8	-0.0413950	0.0117440	0.0004238	***
empl9	-0.0358683	0.0124874	0.0040740	**
empl10	-0.0423483	0.0124846	0.0006937	***
empl11	-0.0368185	0.0089782	0.0000412	***
empl12	-0.0429350	0.0117324	0.0002527	***
firstempl	0.3450846	0.0854396	0.0000537	***
cumempl	-1.0773069	0.0786759	0.0000000	***
¹ Signif. codes: 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

References

https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html

LMP Qualitative data in MS Excel, downloaded from (accessed at 14th of April 2021) : https://ec.europa.eu/social/main.jsp?catId=1143&intPageId=3227&langId=en