Relationship between Perceived Country Education and Economical State with Employment Status

1 Introduction

1.1 Sample Description

The data set explored in this analysis is the European Social Survey (ESS), a biennial cross-national survey of attitudes, beliefs and behaviour in 30 countries in Europe. The survey started in 2001, and this analysis uses its 10th round, which was conducted in 2020. The samples are representative of all persons aged 15 and over who are resident in private households in each country.

This analysis focuses only on people who are available for work, so we exclude individuals who are still mainly in education, unemployed but not looking for a job, retired, permanently sick or disabled, in community or military service, or doing housework. The subjects of this analysis are therefore individuals who did paid work in the last 7 days and those who did not but were looking for a job.

1.2 Research Question

How is individuals’ perception of the effectiveness of the education system in their country, along with their satisfaction with the present state of the economy, related to their employment status?

1.3 Hypothesis

Employed people are more likely to perceive the education system as more effective and be satisfied with the present state of the economy in their country compared to those who are not employed.

1.4 Model Description

The model we’ll be using is a logistic regression model.

2 Exploratory Data Analysis

2.1 Main Variables

Here are some of the important variables we’ll be using.

1. mainact : Main activity last 7 days
2. stfeco : How satisfied the respondent is with present state of economy in country
3. stfedu : How satisfied the respondent is with present state of education in country
4. agea : Age of the respondent
5. gndr : Gender of the respondent

The variable mainact is used to form the target variable, which indicates whether the individual is employed or not; agea and gndr are used for the discriminatory performance analysis; and stfeco and stfedu are the variables for the main analysis.

The data set contains many other variables, but for the purpose of this analysis we will only use these five.

For the target variable, we determine the value using the mainact information in the data set. Those who reported working in a paid job in the last 7 days are labeled as employed (1), while those who reported not working but actively seeking work are labeled as unemployed (0). This value is then stored in a target variable called empl_status. Rows where this information (employed or unemployed) was not available were dropped.
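The labeling rule above can be sketched as follows. This is an illustrative Python sketch, not the code used for the original analysis; the category labels ("paid work", "unemployed, looking for job") and the example rows are hypothetical, and the actual ESS coding of mainact may differ.

```python
from typing import Optional

# Hypothetical labels for the two mainact categories kept in the sample;
# the real ESS coding of mainact may differ.
EMPLOYED = "paid work"
UNEMPLOYED_LOOKING = "unemployed, looking for job"


def empl_status(mainact: Optional[str]) -> Optional[int]:
    """Return 1 for employed, 0 for unemployed and looking, None otherwise."""
    if mainact == EMPLOYED:
        return 1
    if mainact == UNEMPLOYED_LOOKING:
        return 0
    return None  # excluded group or missing answer


# Made-up example rows, for illustration only.
rows = ["paid work", "retired", "unemployed, looking for job", None]
labels = [empl_status(r) for r in rows]

# Rows without an employed/unemployed label are dropped.
kept = [s for s in labels if s is not None]
print(labels, kept)
```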

2.2 Univariate Analysis

Before getting into building the model, we can explore the data to find observable patterns or issues.

2.2.1 Statistical Summary

First, we look at the summary statistics of each variable.

## [1] "Sample Size : 1556"
##      stfeco           stfedu            agea           gndr     empl_status
##  Min.   : 0.000   Min.   : 0.000   Min.   :15.00   female:869   0:  67     
##  1st Qu.: 4.000   1st Qu.: 5.000   1st Qu.:33.00   male  :687   1:1489     
##  Median : 6.000   Median : 7.000   Median :44.00                           
##  Mean   : 5.629   Mean   : 6.264   Mean   :43.58                           
##  3rd Qu.: 7.000   3rd Qu.: 8.000   3rd Qu.:54.00                           
##  Max.   :10.000   Max.   :10.000   Max.   :83.00

As we can see, the mean and median of each numerical variable are close, which suggests roughly symmetric distributions, although this alone doesn’t establish normality. For the categorical variables, gender is fairly balanced (869 female, 687 male), but employment status is heavily imbalanced towards the positive class: 1489 of the 1556 respondents (about 96%) are employed.
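The degree of class imbalance can be read directly from the counts in the summary. A minimal Python sketch (illustrative only, not the original analysis code) using the empl_status counts printed above:

```python
# Class counts taken from the empl_status column of the summary output.
counts = {"unemployed (0)": 67, "employed (1)": 1489}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

With so few unemployed respondents, estimates for the minority class rest on a small number of observations, which is worth keeping in mind when reading the model results.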

2.2.2 Distribution Plot

Next, we plot the distributions to see whether there are interesting patterns or issues in the data.

As we can see, the distributions are roughly centred around the middle of their ranges and look approximately normal, except for the perceived state of education in the country. There is a noticeable difference in frequency between the genders, but it is not large enough to be a concern.

Unlike the statistical summary, where the skewness is not apparent, the distribution plot, and specifically the density curve, shows that perceived state of education isn’t normally distributed. The distribution is slightly skewed to the left (its longer tail is on the low-score side, consistent with the mean of 6.264 lying below the median of 7), which suggests the presence of outliers.

We can check this potential issue further by looking at the boxplot.

And as expected, the skewness of the education state variable is caused by an outlier, in this case the value 0. The data collection process already flags missing values and measurement errors, so this is most likely a genuine extreme value rather than a sampling error. Instead of removing these observations, we will transform the variable to fit the data better.
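Why 0 shows up as an outlier for education but not for the economy can be checked from the quartiles in the summary. The Python sketch below is illustrative only and assumes the boxplot uses the conventional 1.5 × IQR whisker rule, which the report doesn’t state; the function name tukey_fences is hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences of the k * IQR rule (k = 1.5 is assumed here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# First and third quartiles taken from the statistical summary.
quartiles = {"stfeco": (4.0, 7.0), "stfedu": (5.0, 8.0)}
observed_min = 0.0  # both variables have a minimum of 0

fences = {name: tukey_fences(q1, q3) for name, (q1, q3) in quartiles.items()}
for name, (low, high) in fences.items():
    flagged = observed_min < low
    print(f"{name}: fences = ({low}, {high}), score 0 flagged = {flagged}")
```

Under this rule the lower fence for stfedu lies above 0, so a score of 0 falls outside the whiskers, while for stfeco the lower fence is below 0 and no score is flagged.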

2.3 Multivariate Analysis

In this step, we start comparing two or more variables.

2.3.1 Education State vs Employment Status

Based on the results, we can see that there is no observable pattern between the perceived education state score and employment status. The employment rate rises and falls every few score levels with no clear trend.

2.3.2 Economical State vs Employment Status

In contrast to the education system score, the higher the perceived economical state score, the lower the unemployment rate and the higher the employment rate. This suggests that employed people tend to have a better perception of the country’s economical state.

2.3.3 Discriminatory Performance Analysis

Discriminatory performance analysis is a method used to observe how the explanatory variables differentiate the target distribution.

In this analysis, we’ll check how employment status is distributed across age and gender.

By observing the chart above, we can say that within the data set there is no apparent evidence that employment status is associated with gender. The share of employed and unemployed individuals is essentially the same among males and females.

As with gender, there is no obvious pattern in the target variable distribution when compared to age, as the employment rate fluctuates with no clear trend. This means that an individual could be employed or unemployed regardless of age, as neither status is concentrated in a certain age range.

2.4 Data Wrangling

Next, we will transform some of the variables into a better format for our analysis.

To reduce dimensionality, the scores for perceived education and economical state are grouped into ordered bins of adjacent values. For example, those who answered 0-4 to the question regarding their satisfaction with the country’s economic situation are grouped under the label dissatisfied.

This also addresses the outlier problem in the perceived education state variable, since the value 0 will be grouped with other, non-outlying values.

As we can see, after binning the variable, the number of dissatisfied individuals doesn’t differ much from the other groups.
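The binning can be sketched as below. This is an illustrative Python sketch, not the original analysis code. Only the 0-4 range for the dissatisfied group comes from the text; the cut-offs for the neutral (5-6) and satisfied (7-10) groups and the function name satisfaction_group are assumptions made for the example.

```python
def satisfaction_group(score: int) -> str:
    """Map a 0-10 satisfaction score to an ordered group label."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 4:
        return "Dissatisfied"  # 0-4, as described in the text
    if score <= 6:
        return "Neutral"       # assumed cut-off, not stated in the report
    return "Satisfied"         # assumed cut-off, not stated in the report


# Every score from 0 to 10 and the group it falls into.
groups = {score: satisfaction_group(score) for score in range(11)}
print(groups)
```

Because 0 lands in the same bin as 1-4, it no longer stands apart from the rest of the data once the grouped variable is used.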

3 Binary Logistic Regression Model

3.1 Model Information

We’ll be using logistic regression models in 3 different scenarios:

  1. empl_status as the target variable and stfedu_grp, stfeco_grp, agea, and gndr as the predictors

The purpose is to check the relationship of perceived education state, perceived economical state, age and gender with employment status.

  2. empl_status as the target variable and stfedu_grp:stfeco_grp, agea:gndr as the predictors

This uses the same variables as (1), but as interactions between education state and economical state, and between age and gender. The purpose is to see whether particular combinations of values reveal patterns that the separate variables don’t.

  3. the same formula as (1), but using forward stepwise regression

The purpose is to let an automated selection procedure check whether a different subset of predictors fits the data better.

3.2 Binary Model Outcomes

3.2.1 Coefficients

Model 1

##                              Coefficient Estimate P Value
## Intercept                                    1.73    0.00
## Education State : Neutral                    0.39    0.19
## Education State : Satisfied                  0.70    0.08
## Economical State : Neutral                   0.80    0.01
## Economical State : Satisfied                 1.10    0.01
## Age                                          0.01    0.18
## Gender : Male                               -0.13    0.60

Based on the model summary, two of the coefficients are statistically significant, with p-values below 0.05: economical state : neutral and economical state : satisfied, both with positive estimates. This means that the perception of the economical state of the country is related to employment status: individuals who are neutral about or satisfied with the economy have higher odds of being employed than those who are dissatisfied.

On the other hand, the other variables’ coefficients are not statistically significant, so we don’t have enough evidence of a relationship between perceived education state, age, or gender and employment status.

The coefficient estimate is the change in the log-odds of the target variable for a one-unit increase in the predictor, holding the other predictors constant. Positive values mean a positive association and negative values mean a negative (inverse) association.

For example, being satisfied with the economical state of the country has a coefficient of 1.09779 (shown rounded as 1.10 in the table). This variable is 0 for the baseline group, those who are dissatisfied, and 1 for those who are satisfied, so a one-unit increase corresponds to being satisfied rather than dissatisfied. This means that if someone is satisfied with the country’s economical state, the log-odds of them being employed increase by 1.09779.
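To make the log-odds scale more concrete, the Python sketch below (illustrative only, not the original analysis code) converts log-odds into probabilities for one example profile. It uses the rounded Model 1 coefficients from the table, so the results are approximate; the profile (a 44-year-old woman who is dissatisfied with the education system) is an assumption chosen for illustration.

```python
import math


def logistic(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Rounded Model 1 estimates from the coefficient table.
INTERCEPT = 1.73
COEF_AGE = 0.01
COEF_ECO_SATISFIED = 1.10

# Assumed example profile: age 44 (the sample median), female and
# dissatisfied with education, so those terms contribute nothing.
age = 44

log_odds_dissatisfied = INTERCEPT + COEF_AGE * age
log_odds_satisfied = log_odds_dissatisfied + COEF_ECO_SATISFIED

p_dissatisfied = logistic(log_odds_dissatisfied)
p_satisfied = logistic(log_odds_satisfied)
print(f"dissatisfied with economy: {p_dissatisfied:.3f}")
print(f"satisfied with economy:    {p_satisfied:.3f}")
```

The coefficient adds a fixed amount on the log-odds scale, but the resulting change in probability depends on the starting point, which is why the effect is usually reported as a change in log-odds or as an odds ratio.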

Model 2

##                                                                    Coefficient Estimate P Value
## Intercept                                                                          3.40    0.00
## Education State : Dissatisfied and Economical State : Dissatisfied                -1.73    0.00
## Education State : Neutral and Economical State : Dissatisfied                     -1.42    0.01
## Education State : Satisfied and Economical State : Dissatisfied                    0.15    0.89
## Education State : Dissatisfied and Economical State : Neutral                     -0.98    0.17
## Education State : Neutral and Economical State : Neutral                          -0.35    0.55
## Education State : Satisfied and Economical State : Neutral                        -0.47    0.46
## Education State : Dissatisfied and Economical State : Satisfied                   13.66    0.99
## Education State : Neutral and Economical State : Satisfied                        -0.31    0.66
## Age + Gender : Female                                                              0.01    0.15
## Age + Gender : Male                                                                0.01    0.36

Compared to the first model, this model evaluates each education state in combination with each economical state, and age in combination with each gender, instead of each variable separately.
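The combinations that the interaction term produces can be enumerated directly. The Python sketch below is illustrative only and not the original analysis code: it lists every pairing of the three education groups with the three economy groups and compares them with the rows of the Model 2 table.

```python
from itertools import product

levels = ["Dissatisfied", "Neutral", "Satisfied"]

# Every (education state, economical state) pairing of the grouped variables.
all_combos = set(product(levels, levels))

# Pairings that appear as rows in the Model 2 coefficient table.
listed = {
    ("Dissatisfied", "Dissatisfied"),
    ("Neutral", "Dissatisfied"),
    ("Satisfied", "Dissatisfied"),
    ("Dissatisfied", "Neutral"),
    ("Neutral", "Neutral"),
    ("Satisfied", "Neutral"),
    ("Dissatisfied", "Satisfied"),
    ("Neutral", "Satisfied"),
}

# The pairing without its own coefficient is absorbed into the intercept.
missing = all_combos - listed
print(missing)
```

The pairing that has no row of its own acts as the baseline, so the other coefficients are read relative to it.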

For example, individuals who are dissatisfied with both the education and the economical state are less likely to be employed than the baseline group (the one combination not listed in the table, satisfied with both), as indicated by the coefficient of -1.73; a negative coefficient means that the variable has an inverse relationship with the target variable.

In contrast to the first model, neither the economical state : neutral nor the economical state : satisfied combinations are statistically significant, but two of the combinations involving economical state : dissatisfied are, including the one with education state : dissatisfied. This suggests that variable interactions can surface patterns that are not visible when each variable is used on its own. The very large estimate of 13.66 with a p-value of 0.99 should not be interpreted; estimates like this typically arise when a combination contains very few or no unemployed respondents.

Model 3

##                              Coefficient Estimate P Value
## Intercept                                    1.73    0.00
## Education State : Neutral                    0.39    0.19
## Education State : Satisfied                  0.70    0.08
## Economical State : Neutral                   0.80    0.01
## Economical State : Satisfied                 1.10    0.01
## Age                                          0.01    0.18
## Gender : Male                               -0.13    0.60

This model is identical to Model 1, which means that the stepwise regression couldn’t find a better combination of variables than the initial model.

3.2.2 Odds Ratio

Another measure we can use to assess the relationship between the predictors and the target variable is the odds ratio.

Looking at the odds ratios reinforces the points above. The odds ratio is the exponential of the coefficient and represents the multiplicative effect of a one-unit increase in the corresponding predictor on the odds of the outcome.

For example, the odds ratio estimate for people who are satisfied with the economical state of the country is 2.99, indicating that the odds of being employed for someone who is satisfied with the economical state are 2.99 times the odds for someone who is dissatisfied, holding all other variables constant.

The odds ratio chart shows a statistically significant relationship between being employed and being satisfied with the country’s economical state.
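The link between coefficients and odds ratios is a single exponentiation. The Python sketch below is illustrative only and not the original analysis code; it uses the rounded Model 1 estimates from the coefficient table, plus the unrounded value 1.09779 quoted in the text, so the results are approximate.

```python
import math

# Rounded Model 1 estimates (log-odds) from the coefficient table.
coefficients = {
    "Education State : Neutral": 0.39,
    "Education State : Satisfied": 0.70,
    "Economical State : Neutral": 0.80,
    "Economical State : Satisfied": 1.10,
    "Age": 0.01,
    "Gender : Male": -0.13,
}

# Odds ratio = exp(coefficient); values above 1 mean higher odds of employment.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")

# Using the unrounded estimate quoted in the text for economical state : satisfied.
or_eco_satisfied = math.exp(1.09779)
print(f"Economical State : Satisfied (unrounded): {or_eco_satisfied:.3f}")
```

Exponentiating 1.09779 gives a value just under 3, in line with the reported odds ratio of 2.99 up to rounding. A negative coefficient, such as the one for gender, yields an odds ratio below 1.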

4 Conclusion

In conclusion, the results support the hypotheses

Employed people are more likely to be satisfied with the present state of the economy in their country

Unemployed people are more likely to be dissatisfied with the present state of both the education and the economy in their country

but we did not find enough evidence for the hypothesis

Employed people are more likely to be satisfied with the present state of the education in their country

This is shown by the p-values of the coefficients: Economical State : Satisfied (Model 1) and Education State : Dissatisfied and Economical State : Dissatisfied (Model 2) both have p-values below 0.05.

So while there is a tendency for employed people to be more satisfied with the country’s economical state, the same can’t be said for the education system. On the other hand, there is a tendency for unemployed people to be dissatisfied with the country’s education and economical state.

Overall, this analysis reveals a statistically significant relationship between the perception of the country’s economical state and employment status, where employed people tend to have a good perception and unemployed people tend to have a bad perception of the economical state of the country.