1) Abstract:

This paper uses an ordered choices logit model to explain the career satisfaction of programmers in Eastern European countries. The data come from the online survey conducted by StackOverflow.com1, one of the most popular online communities for programmers in the world. We explain the dependent variable, career satisfaction, with independent variables that were selected using the ordered choices logit model and verified by statistical tests. To build the econometric model, we used R packages such as MASS, lmtest and oglmx. The end result is a final model containing the significant variables that best explain the career satisfaction of programmers in Eastern Europe.

Keywords: career satisfaction, ordered choices logit model, ordered logit model, polr, ologit.reg, lmtest, anova

2) Introduction:

“The only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle.”

— Steve Jobs

Today, it is very important for organizations to stay competitive in the market. To sustain a competitive advantage, organizations need to attract and keep highly talented people, and many companies try to be the number one choice for employees. In general, career satisfaction is closely connected to life satisfaction (Joo and Park 2009)2.

Growing demand for IT professionals has increased salaries. Companies study the psychological aspects of employment in order to design strategies for keeping highly talented people and ensuring their career satisfaction. Career satisfaction leads employees to high job performance. At the same time, companies are trying to achieve lower levels of turnover (Irvine and Evans, 1995)3.

In our study we used data collected by Stack Overflow in its 2018 annual survey of programmers. Stack Overflow is a community forum for programmers, where users can ask questions about their programs and other users can reply, sharing their knowledge.

We will verify the following hypotheses using the ordered choices logit model:

Hypothesis 1: Higher salary increases career satisfaction.
We will examine the relationship between salary and career satisfaction. Over a career, a person moves from one company to another when a better offer arrives, and the largest part of an offer is the salary level. A person who is satisfied with the salary stays in the company longer and reports a higher level of career satisfaction.

Hypothesis 2: Having a hobby increases career satisfaction.
We will describe the relationship between having a hobby and career satisfaction. We expect that people with hobbies are more satisfied with their careers and find time to express themselves through their hobbies, whereas people without hobbies are more concerned with first reaching career satisfaction and only then taking time for hobbies.

3) Literature review:

Many studies have examined career satisfaction and the variables that influence it. The factors that affect career satisfaction vary and include education (Andres and Henry 1963)4, age (Lawler and Porter 1966)5, and pay and gender (Mason 1997)6.

At a personal level, career satisfaction is defined by whether a person has meaningful accomplishments in their career (Judge et al., 1995)7. Goal orientation is also one of the influential factors in performance and career satisfaction (Kozlowski et al., 2001)8. Career satisfaction is related to job satisfaction: a person may hold many jobs during a career, and satisfaction with the current job is closely related to general career satisfaction. Individuals with a higher level of job satisfaction perform better, are less likely to leave the company and are more committed to it than those with a low level of job satisfaction (Fried and Ferris 1987)9. On the other hand, employee turnover remains a persistent problem and requires well-established HR strategies (Joo and McLean 2006)10.

Interesting conclusions are derived from the study conducted by Nan Hu and colleagues11, an empirical investigation of how different factors affect job satisfaction among information technology professionals. IT professionals are more likely to be satisfied with their job when the job opportunity is aligned with their career expectations (Hu et al., 2004)12. In our study we will also examine the relationship between career satisfaction and programmers’ current job satisfaction. IT professionals are also more likely to be happy with their job when they have decision power and are given chances to influence decisions with their recommendations. However, we will not include these factors in our model because we lack data on them. It is worth mentioning that some job duties demotivate IT professionals (Hu et al., 2004)13.

LeDuc gives a broad overview of the “motivation of programmers” in his paper. He divides motivation into two groups, internal and external. Internal motivation has to do with a person’s needs and satisfiers, while external motivation is mostly related to the work environment (LeDuc 1980)14. Couger and Zawacki reviewed programmers’ motivation along different dimensions. In particular, they discussed the importance of skill variety for job satisfaction, as well as the role of task significance and autonomy and how they relate to programmers’ satisfaction. Their study incorporates many psychological aspects and analyzes Maslow’s hierarchy of needs in connection with the motivation and satisfaction of programmers at their jobs (Couger and Zawacki 1978)15.

4) Data description:

Stack Overflow is one of the best-known community forums for programmers in the world, which makes it a good place to run a survey about programmers. Stack Overflow runs a voluntary survey every year; the data used in this paper come from the survey conducted in January 2018, which is available on Kaggle. The dataset has almost 99 thousand rows (respondents) and 129 columns (questions).

[1] “The dataset survey initially has 98855 rows and 129 columns”

4.1) Data preparation:

During the process of cleaning the data several strategies were applied.

First of all, we selected several columns, based on our intuition, trying to answer the question: “Which questions of the survey best explain the career satisfaction of programmers?”

Eight survey questions were selected, together with the dependent variable of our model, CareerSatisfaction. The questions, linked to the columns of our data, are listed below:

  • CareerSatisfaction – Overall, how satisfied are you with your career thus far?
  • Country - In which country do you currently reside?
  • FormalEducation - Which of the following best describes the highest level of formal education that you have completed?
  • CompanySize - Approximately how many people are employed by the company or organization you work for?
  • YearsCoding - For how many years have you coded professionally (as a part of your work)?
  • JobSatisfaction – How satisfied are you with your current job? If you work more than one job, please answer regarding the one you spend the most hours on.
  • ConvertedSalary – What is your current gross salary (before taxes and deductions)? Please enter a whole number in the box below, without any punctuation. If you are paid hourly, please estimate an equivalent weekly, monthly, or yearly salary. If you prefer not to answer, please leave the box empty. *NOTE: Respondents could select the currency. The salary was then converted to an annual USD salary using the exchange rate on 2018-01-18, assuming 12 working months and 50 working weeks.
  • Hobby - Do you code as a hobby?
  • Age - What is your age? If you prefer not to answer, you may leave this question blank.

Respondents had the option to leave some questions blank. We therefore excluded observations with missing values and nulls; as a consequence, we assume that programmers are people who are employed and receive a salary. This assumption was motivated by the empty values in the columns ConvertedSalary and CompanySize. In addition, we considered only programmers who reported their Age.

We check whether any NA or zero values remain:

[1] FALSE

[1] FALSE

Both results are FALSE, hence we can continue further.
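The filtering and the two checks described above might look roughly like the following sketch. The data frame `survey_toy` and its values are invented for illustration, and treating a zero salary like a missing one is an assumption based on the "NA or zero values" check; the paper's actual code is not shown in the text.

```r
# Toy data frame standing in for the survey; column names follow the paper,
# the values are invented for illustration.
survey_toy <- data.frame(
  CareerSatisfaction = c("Slightly satisfied", NA,
                         "Extremely satisfied", "Moderately dissatisfied"),
  ConvertedSalary    = c(30000, 45000, NA, 0),
  CompanySize        = c("20 to 99 employees", "10 to 19 employees",
                         NA, "100 to 499 employees"),
  Age                = c("25 - 34 years old", "18 - 24 years old",
                         "35 - 44 years old", "25 - 34 years old"),
  stringsAsFactors   = FALSE
)

# Keep only rows with no missing value in any selected column
clean <- survey_toy[complete.cases(survey_toy), ]

# Drop zero salaries as well (assumption: zero is treated like a missing salary)
clean <- clean[clean$ConvertedSalary != 0, ]

# The two checks from the text: any NA left? any zero salary left?
any(is.na(clean))
any(clean$ConvertedSalary == 0)
```

Both checks return FALSE on the toy data once the incomplete and zero-salary rows are dropped.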

The next step was to select ten Eastern European countries to study: Poland, Czech Republic, Hungary, Slovakia, Romania, Bulgaria, Turkey, Moldova, Belarus and Ukraine.

Subsequently, each selected variable was analyzed separately, in order to find outliers in the numeric variables and to encode the ordinal features:

  • CareerSatisfaction: This characteristic feature is the dependent variable. It has seven levels that can be ordered in a logical way, from “Extremely dissatisfied” to “Extremely satisfied”. For the purpose of this analysis it will be reduced to three levels, because it is difficult to define the difference between extremely, moderately and slightly dissatisfied, and likewise for satisfied.

Below more details about the encoding can be found:

  • Dissatisfied: “Extremely dissatisfied”, “Moderately dissatisfied” and “Slightly dissatisfied”, encoded to 1
  • Neutral: “Neither satisfied nor dissatisfied”, encoded to 2
  • Satisfied: “Slightly satisfied”, “Moderately satisfied” and “Extremely satisfied”, encoded to 3

[1] “Slightly satisfied” “Slightly dissatisfied”
[3] “Moderately dissatisfied” “Extremely satisfied”
[5] “Moderately satisfied” “Neither satisfied nor dissatisfied”
[7] “Extremely dissatisfied”
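The grouping of the seven answers into three ordered levels can be sketched with a lookup vector. The vector `answers` is a hypothetical stand-in for the CareerSatisfaction column, and the helper names are illustrative only.

```r
# The seven original answers, ordered from least to most satisfied
levels7 <- c("Extremely dissatisfied", "Moderately dissatisfied",
             "Slightly dissatisfied", "Neither satisfied nor dissatisfied",
             "Slightly satisfied", "Moderately satisfied", "Extremely satisfied")

# Lookup from each original answer to its group code (1, 2 or 3)
group_code <- c(1, 1, 1, 2, 3, 3, 3)
names(group_code) <- levels7

# Hypothetical answers standing in for the CareerSatisfaction column
answers <- c("Slightly satisfied", "Extremely dissatisfied",
             "Neither satisfied nor dissatisfied", "Moderately satisfied")

# Recode and store as an ordered factor so that 1 < 2 < 3 is explicit
career_satisfaction <- factor(unname(group_code[answers]),
                              levels = 1:3, ordered = TRUE)
career_satisfaction
```

The same pattern would apply to the other grouped features (education, company size, years coding, job satisfaction, age), each with its own lookup vector.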

  • Country: This characteristic feature was previously described. However, after removing missing values one country (Moldova) drops out, hence we will have nine countries. It cannot be ordered in a logical way, hence it will not be encoded, but it will be transformed to a factor.

[1] “Poland” “Romania” “Turkey” “Slovakia”
[5] “Bulgaria” “Belarus” “Czech Republic” “Ukraine”
[9] “Hungary”

  • FormalEducation - This characteristic feature has nine levels that can be ordered in a logical way, from “I never completed any formal education”, to “Other doctoral degree (Ph.D, Ed.D., etc.)”.

Because this feature can be ordered in a logical way, it will be grouped in three categories:

  • Low level of education: “I never completed any formal education”, “Primary/elementary school” and “Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)”, encoded to 1
  • Medium level of education: “Professional degree (JD, MD, etc.)”, “Some college/university study without earning a degree” and “Associate degree”, encoded to 2
  • High level of education: “Bachelor’s degree (BA, BS, B.Eng., etc.)”, “Master’s degree (MA, MS, M.Eng., MBA, etc.)” and “Other doctoral degree (Ph.D, Ed.D., etc.)”, encoded to 3

Below more details about the encoding can be found:

[1] “Some college/university study without earning a degree”
[2] “Master’s degree (MA, MS, M.Eng., MBA, etc.)”
[3] “Bachelor’s degree (BA, BS, B.Eng., etc.)”
[4] “Professional degree (JD, MD, etc.)”
[5] “Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)”
[6] “Other doctoral degree (Ph.D, Ed.D., etc.)”
[7] “Primary/elementary school”
[8] “Associate degree”
[9] “I never completed any formal education”

  • CompanySize - This characteristic feature has eight levels that can be ordered in a logical way, from “Fewer than 10 employees”, to “10,000 or more employees”.

Because this feature can be ordered in a logical way, it will be grouped in three categories:

  • Small company size: “Fewer than 10 employees”, “10 to 19 employees” and “20 to 99 employees”, encoded to 1
  • Medium company size: “100 to 499 employees” and “500 to 999 employees”, encoded to 2
  • Big company size: “1,000 to 4,999 employees”, “5,000 to 9,999 employees” and “10,000 or more employees”, encoded to 3

Below more details about the encoding can be found:

[1] “20 to 99 employees” “10,000 or more employees”
[3] “1,000 to 4,999 employees” “5,000 to 9,999 employees”
[5] “100 to 499 employees” “500 to 999 employees”
[7] “Fewer than 10 employees” “10 to 19 employees”

  • YearsCoding: This characteristic feature has eleven levels that can be ordered in a logical way, from “0-2 years”, to “30 or more years”.
    Because this feature can be ordered in a logical way, it will be grouped in three categories:

  • Short experience coding: “0-2 years” and “3-5 years”, encoded to 1
  • Medium experience coding: “6-8 years”, “9-11 years”, “12-14 years” and “15-17 years”, encoded to 2
  • Long experience coding: “18-20 years”, “21-23 years”, “24-26 years”, “27-29 years” and “30 or more years”, encoded to 3

Below more details about the encoding can be found:

[1] “3-5 years” “6-8 years” “0-2 years” “12-14 years”
[5] “9-11 years” “24-26 years” “18-20 years” “15-17 years”
[9] “30 or more years” “21-23 years” “27-29 years”

  • JobSatisfaction: This characteristic feature has seven levels which can be ordered in a logical way, from “Extremely dissatisfied”, to “Extremely satisfied”, the same levels as the dependent variable.

Because this feature can be ordered in a logical way, it will be grouped in three categories:

  • Dissatisfied: “Extremely dissatisfied”, “Moderately dissatisfied” and “Slightly dissatisfied”, encoded to 1
  • Neutral: “Neither satisfied nor dissatisfied”, encoded to 2
  • Satisfied: “Slightly satisfied”, “Moderately satisfied” and “Extremely satisfied”, encoded to 3

Below more details about the encoding can be found:

[1] “Slightly satisfied” “Moderately satisfied”
[3] “Slightly dissatisfied” “Neither satisfied nor dissatisfied”
[5] “Extremely satisfied” “Extremely dissatisfied”
[7] “Moderately dissatisfied”

  • ConvertedSalary: This is a continuous numerical variable. It was plotted in order to find outliers. Twenty-two observations, all greater than the selected cutoff point of 250,000 USD, were deleted. Finally, all values were divided by 1,000.

The scatter plot of this feature is shown below, with the cutoff point in red:

The number of observations whose ConvertedSalary is greater than the cutoff point:

[1] 22

As we can see, we will remove 22 outliers:

Finally, as mentioned before, all salaries are divided by 1,000:
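The outlier step and the rescaling can be illustrated on a handful of invented salaries. Only the cutoff of 250,000 USD and the division by 1,000 come from the text; the vector `salary_usd` is hypothetical.

```r
# Hypothetical annual salaries in USD (invented values)
salary_usd <- c(12000, 31000, 48000, 260000, 1000000, 75000)

cutoff <- 250000   # cutoff point named in the text

# Count the observations above the cutoff
n_outliers <- sum(salary_usd > cutoff)

# Drop them and rescale to thousands of USD
salary_k <- salary_usd[salary_usd <= cutoff] / 1000

n_outliers
salary_k
```

Rescaling to thousands of USD does not change the fit, but it makes the salary coefficient readable as the effect of an additional 1K USD.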

  • Hobby - This binary characteristic feature, “Yes” or “No”, was transformed to “1” and “0” respectively.

[1] “No” “Yes”

  • Age - This characteristic feature has six levels that can be ordered in a logical way, from “Under 18 years old”, to “55 - 64 years old”.
    Because this feature can be ordered in a logical way, it will be grouped in three categories:

  • Early Adulthood and Adolescence: “Under 18 years old” and “18 - 24 years old”, encoded to 1
  • Midlife: “25 - 34 years old” and “35 - 44 years old”, encoded to 2
  • Mature Adulthood: “45 - 54 years old” and “55 - 64 years old”, encoded to 3

Below can be found more details about the encoding:

[1] “25 - 34 years old” “45 - 54 years old” “35 - 44 years old”
[4] “18 - 24 years old” “Under 18 years old” “55 - 64 years old”

As shown above, the characteristic features were transformed to factors.

After cleaning the data, the dataset contains 2,248 observations and 9 features.

[1] “The dataset survey after the preparation has 2248 rows and 9 columns”

Finally, we rename the columns in order to remove the capital letters:

4.2) Data representation:

Below there is a summary of the dataset:

                     vars    n  mean    sd median trimmed   mad  min    max  range  skew kurtosis   se
career.satisfaction*    1 2248  2.54  0.80   3.00    2.67  0.00 1.00   3.00   2.00 -1.28    -0.21 0.02
country*                2 2248  5.46  2.29   5.00    5.51  2.97 1.00   9.00   8.00  0.00    -0.86 0.05
education*              3 2248  2.67  0.62   3.00    2.82  0.00 1.00   3.00   2.00 -1.70     1.59 0.01
company.size*           4 2248  1.81  0.82   2.00    1.76  1.48 1.00   3.00   2.00  0.37    -1.42 0.02
years.coding*           5 2248  1.81  0.61   2.00    1.76  0.00 1.00   3.00   2.00  0.12    -0.47 0.01
job.satisfaction*       6 2248  2.45  0.85   3.00    2.56  0.00 1.00   3.00   2.00 -0.99    -0.88 0.02
salary                  7 2248 31.27 20.92  28.24   29.03 17.68 0.04 228.89 228.85  2.48    14.21 0.44
hobby*                  8 2248  1.81  0.39   2.00    1.88  0.00 1.00   2.00   1.00 -1.56     0.44 0.01
age*                    9 2248  1.79  0.44   2.00    1.85  0.00 1.00   3.00   2.00 -0.91     0.19 0.01
*NOTE: The features with an asterisk are factors.

In the next graph we can find the distribution by levels of the dependent variable.

As we can observe, the data are imbalanced. There are many more people at level 3 (satisfied) than at the other levels, 1 (dissatisfied) and 2 (neutral).

Below we can find a representation of the covariates:

Analyzing the table and graph, one can extract several characteristics of the respondents:

  • Most of them are from Poland.
  • Their level of education is high.
  • The majority work in small or medium-sized companies.
  • Most of them do not have long coding experience.
  • Mostly, they are satisfied with their jobs.
  • They are young.
  • Coding is a hobby for most of them.
  • The average salary is around 31,000 USD per year.

5) Application of Econometric models:


In our study we will use the ordered choices logit model to test the hypotheses proposed in the introduction.

The model for ordered choices was first described in regression form by McKelvey and Zavoina (1975)16. It was designed to model individuals’ ordered responses and choices and to provide a method for estimation. Examples of this kind of data are bond ratings, health status information, customer preference surveys, satisfaction surveys, etc. The main idea is that the outcomes can be ordered (Greene and Hensher 2009)17.

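Our model is specified as follows. In its general latent-variable form (Greene and Hensher 2009), with three ordered outcomes for career satisfaction:

$$ y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = j \ \text{ if } \ \mu_{j-1} < y_i^* \le \mu_j, \quad j = 1, 2, 3, $$

where $\mu_0 = -\infty$, $\mu_3 = +\infty$, and $\varepsilon_i$ follows the standard logistic distribution with distribution function $\Lambda$, so that $P(y_i \le j \mid x_i) = \Lambda(\mu_j - x_i'\beta)$.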

5.1) Ordered Choice Models:

Let’s estimate an ordered logit for career.satisfaction. The first way to do so is to use the polr function. We use career.satisfaction as the dependent variable and the covariates salary, hobby and years.coding as independent variables.

## Call:
## polr(formula = career.satisfaction ~ salary + hobby + years.coding, 
##     data = survey)
## 
## Coefficients:
##                  Value Std. Error t value
## salary         0.01596   0.003086  5.1713
## hobby1         0.35748   0.115252  3.1018
## years.coding2 -0.09336   0.113670 -0.8213
## years.coding3 -0.52333   0.182876 -2.8617
## 
## Intercepts:
##     Value   Std. Error t value
## 1|2 -0.7863  0.1355    -5.8020
## 2|3 -0.3665  0.1342    -2.7299
## 
## Residual Deviance: 3259.966 
## AIC: 3271.966
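A minimal sketch of how a fit with this structure might be produced with `MASS::polr`. The data are simulated, the coefficients used to generate them are invented, and years.coding is left out for brevity, so the numbers will not match the output above.

```r
library(MASS)  # provides polr()

set.seed(1)    # simulated data, not the survey
n <- 1000
sim <- data.frame(
  salary = runif(n, 5, 100),                           # thousands of USD
  hobby  = factor(rbinom(n, 1, 0.8), levels = c(0, 1))
)

# Latent satisfaction: linear index plus logistic noise (invented coefficients)
latent <- 0.02 * sim$salary + 0.4 * (sim$hobby == "1") + rlogis(n)

# Cut the latent variable at two thresholds to get three ordered categories
sim$career.satisfaction <- cut(latent, breaks = c(-Inf, -0.8, -0.4, Inf),
                               labels = c("1", "2", "3"), ordered_result = TRUE)

# Hess = TRUE stores the Hessian so that summary() can report standard errors
fit <- polr(career.satisfaction ~ salary + hobby, data = sim,
            method = "logistic", Hess = TRUE)
summary(fit)
```

The response has to be a factor; the two "Intercepts" in the summary are the estimated cut points between adjacent categories.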

We obtained t-values, each defined as the ratio of the estimate (Value) to its standard error.

Since the summary does not report p-values, we request the coefficient tests separately.

## t test of coefficients:
## 
##                 Estimate Std. Error t value     Pr(>|t|)    
## salary         0.0159607  0.0030864  5.1713 0.0000002531 ***
## hobby1         0.3574838  0.1152516  3.1018     0.001948 ** 
## years.coding2 -0.0933610  0.1136700 -0.8213     0.411544    
## years.coding3 -0.5233335  0.1828759 -2.8617     0.004253 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
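The t-values and p-values in the coefficient test can be reproduced by hand from the estimates and standard errors. The sketch below assumes the residual degrees of freedom are the 2,248 observations minus the 6 estimated parameters; that choice is an assumption, not something stated in the text.

```r
# Values copied from the coefficient test output
estimate  <- c(salary = 0.0159607, hobby1 = 0.3574838,
               years.coding2 = -0.0933610, years.coding3 = -0.5233335)
std_error <- c(salary = 0.0030864, hobby1 = 0.1152516,
               years.coding2 = 0.1136700, years.coding3 = 0.1828759)

# t-value = estimate / standard error
t_value <- estimate / std_error

# Assumption: residual df = 2248 observations - 6 parameters
# (4 slopes + 2 intercepts)
df_resid <- 2248 - 6

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(cbind(t_value, p_value), 4)
```

With this many observations the t distribution is very close to the standard normal, so using `pnorm` instead of `pt` would give nearly the same p-values.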


As a result, salary is the most significant variable, followed by hobby and years.coding3. years.coding2 is the least significant variable in our model: its p-value (0.41) is above the 5% threshold.

Visualizing the p-values of the variables:

Interpreting the parameters

We can interpret the signs of the estimates; their values cannot be interpreted quantitatively, only qualitatively.

salary: The marginal effect for the very first alternative, in our case “dissatisfied”, has the opposite sign to the parameter. So if salary increases, the probability of a person being “dissatisfied” with their career decreases.

hobby: The same applies to the variable hobby. A person with a hobby is less likely to be “dissatisfied” with their career.

years.coding2: The opposite holds for years.coding2. A person with medium experience is more likely to be “dissatisfied” with their career, although this coefficient is not statistically significant.

years.coding3: A person with long experience (senior) is more likely to be “dissatisfied” with their career.

From the signs of the parameters alone we can interpret only the first and the last level of career.satisfaction, not the intermediate one.
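The sign interpretation can be checked numerically by turning the estimates into category probabilities. This sketch uses the rounded salary, hobby and intercept estimates from the polr output, holds years.coding at its base level, and relies on polr's parametrization P(y <= j) = plogis(cut_j - x'beta); the helper `category_probs` is illustrative only.

```r
# Estimates copied from the polr output (years.coding at its base level)
b_salary <- 0.01596
b_hobby  <- 0.35748
cut_12   <- -0.7863   # intercept 1|2
cut_23   <- -0.3665   # intercept 2|3

# Category probabilities for a given salary (thousand USD) and hobby indicator
category_probs <- function(salary, hobby) {
  eta    <- b_salary * salary + b_hobby * hobby
  p_le_1 <- plogis(cut_12 - eta)
  p_le_2 <- plogis(cut_23 - eta)
  c(dissatisfied = p_le_1, neutral = p_le_2 - p_le_1, satisfied = 1 - p_le_2)
}

low  <- category_probs(salary = 30, hobby = 1)
high <- category_probs(salary = 60, hobby = 1)
round(rbind(low, high), 3)
```

Raising salary lowers the probability of the first category and raises that of the last one, in line with the positive coefficient; the direction for the middle category depends on where the linear index sits relative to the cut points.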

5.2) Joint significance test:

In the model we used the covariates salary, hobby and years.coding. Let’s check whether they are jointly significant. There is no joint significance test in the above output, so we will use the likelihood ratio test. First, we estimate a model with constants only and then run lrtest to test the hypothesis.

## Likelihood ratio test
## 
## Model 1: career.satisfaction ~ salary + hobby + years.coding
## Model 2: career.satisfaction ~ 1
##   #Df  LogLik Df  Chisq     Pr(>Chisq)    
## 1   6 -1630.0                             
## 2   2 -1652.5 -4 44.981 0.000000004013 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In our case the null hypothesis is that the parameters of salary, hobby and years.coding are all equal to zero; in other words, that the variables are jointly insignificant. For this hypothesis we obtained a test statistic of 44.981 and a p-value of almost 0, far below the 5% threshold. Therefore, we reject the null hypothesis that the variables are jointly insignificant: they are jointly significant.
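The likelihood ratio statistic is twice the difference between the two log-likelihoods, compared with a chi-squared distribution whose degrees of freedom equal the number of restrictions. A short worked check using the values printed in the output (the log-likelihoods are rounded there, so the hand-computed statistic is only approximately 44.981):

```r
# Log-likelihoods as printed (rounded) in the lrtest output
loglik_full <- -1630.0   # salary + hobby + years.coding, 6 parameters
loglik_null <- -1652.5   # constants only, 2 parameters

# LR statistic: twice the gain in log-likelihood
lr_rounded <- 2 * (loglik_full - loglik_null)

# p-value for the reported statistic with 6 - 2 = 4 degrees of freedom
p_value <- pchisq(44.981, df = 4, lower.tail = FALSE)

lr_rounded
p_value
```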

5.3) Goodness-of-fit test:

To check the goodness of fit, we can use several tests:

  • Lipsitz test
  • Hosmer and Lemeshow test
  • Pulkstenis-Robinson chi-squared test
  • Brant test
## 
##  Lipsitz goodness of fit test for ordinal response models
## 
## data:  formula:  career.satisfaction ~ salary + hobby + years.coding
## LR statistic = 13.497, df = 9, p-value = 0.1414

The Lipsitz goodness-of-fit test says that we cannot reject the null hypothesis, since the p-value (0.1414) is greater than the 5% threshold. There is therefore no evidence of lack of fit; in other words, the model is well specified.

## 
##  Hosmer and Lemeshow test (ordinal model)
## 
## data:  survey$career.satisfaction, fitted(ologit)
## X-squared = 4.2616, df = 7, p-value = 0.7492

The Hosmer and Lemeshow test for the ordinal logit model leads to the same conclusion as the Lipsitz test: we cannot reject the null hypothesis because the p-value (0.7492) is above the 5% threshold, so the model is correctly specified. A p-value below 5% would have indicated an incorrect specification. If we change the number of groups, e.g. g = 10, the conclusion is the same, with a p-value above 5%.

## Warning in pulkrob.chisq(ologit, c("salary")): At least one cell in the expected
## frequencies table is < 1. Chi-square approximation may be incorrect.
## 
##  Pulkstenis-Robinson chi-squared test
## 
## data:  formula:  career.satisfaction ~ salary + hobby + years.coding
## X-squared = 2216.5, df = 2120, p-value = 0.07072

The Pulkstenis-Robinson chi-squared test points in the same direction as the other tests. The p-value (0.07072) is above 5%, so we cannot reject the null hypothesis that our model is correctly specified.

## -------------------------------------------- 
## Test for X2  df  probability 
## -------------------------------------------- 
## Omnibus      6.81    4   0.15
## salary       1.52    1   0.22
## hobby1       0.32    1   0.57
## years.coding2    0.84    1   0.36
## years.coding3    0.83    1   0.36
## -------------------------------------------- 
## 
## H0: Parallel Regression Assumption holds
##                      X2 df probability
## Omnibus       6.8094695  4   0.1463060
## salary        1.5181923  1   0.2178933
## hobby1        0.3187850  1   0.5723387
## years.coding2 0.8380564  1   0.3599532
## years.coding3 0.8285916  1   0.3626801

The Brant test checks the parallel regression assumption, the main assumption of the ordered logit model, also known as the proportional odds assumption: the coefficients (and hence the odds ratios) should be the same at every cut point of the outcome variable. In the output, the p-value of the omnibus test (0.15) is above the 5% threshold, so we cannot reject the null hypothesis; the proportional odds assumption holds and we can use the ordered logit model.
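The idea behind the parallel regression assumption can be illustrated by fitting one binary logit per cut point and comparing the slopes. The sketch below uses simulated data that satisfy the assumption by construction; it is not the Brant test itself, only the comparison that the test formalizes.

```r
set.seed(42)   # simulated data, not the survey
n <- 2000
x <- rnorm(n)

# Generate an outcome that satisfies proportional odds (true slope 1)
latent <- 1 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 0.5, Inf), labels = 1:3)

# Two cumulative binary logits: y > 1 and y > 2
fit_gt1 <- glm(I(as.integer(y) > 1) ~ x, family = binomial)
fit_gt2 <- glm(I(as.integer(y) > 2) ~ x, family = binomial)

# Under the parallel regression assumption the two slopes should be similar
slopes <- c(slope_gt1 = unname(coef(fit_gt1)["x"]),
            slope_gt2 = unname(coef(fit_gt2)["x"]))
slopes
```

If the slopes differed markedly between the two binary fits, a single set of coefficients shared across cut points, as the ordered logit imposes, would be questionable.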

Ordered logit model with the ologit.reg function

Let’s estimate the model with a different function, ologit.reg.

## Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
## response will be ignored
## Ordered Logit Regression 
## Log-Likelihood: -1634.179 
## No. Iterations: 5 
## McFadden's R2: 0.01107102 
## AIC: 3276.358 
##              Estimate Std. error t value       Pr(>|t|)    
## (Intercept) 0.7605456  0.1271131  5.9832 0.000000002188 ***
## salary      0.0129232  0.0027131  4.7633 0.000001904938 ***
## hobby1      0.3613091  0.1150414  3.1407       0.001686 ** 
## ----- Threshold Parameters -----
##                  Estimate Std. error t value              Pr(>|t|)    
## Threshold (2->3) 0.418400   0.031759  13.174 < 0.00000000000000022 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ologit.reg function allows us to obtain p-values as well.

5.4) Marginal effects:

## Marginal Effects on Pr(Outcome==1)
##          Marg. Eff  Std. error t value    Pr(>|t|)    
## hobby1 -0.05923816  0.02006173 -2.9528    0.003149 ** 
## salary -0.00198087  0.00041124 -4.8169 0.000001458 ***
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==2)
##          Marg. Eff  Std. error t value    Pr(>|t|)    
## hobby1 -0.01407499  0.00447761 -3.1434     0.00167 ** 
## salary -0.00051492  0.00011615 -4.4331 0.000009288 ***
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==3)
##         Marg. Eff Std. error t value    Pr(>|t|)    
## hobby1 0.07331315 0.02436551  3.0089    0.002622 ** 
## salary 0.00249580 0.00051849  4.8136 0.000001482 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We will interpret the marginal effects for each level of the dependent variable, from 1 to 3, that is, from “dissatisfied” to “satisfied”.

The first outcome (1) is “dissatisfied” with career. People who have a hobby are 5.9 percentage points less likely to be dissatisfied with their career than people who do not. If salary increases by 1K USD, the probability of this outcome decreases by about 0.2 percentage points.

The second outcome (2) is “neutral”. People who have a hobby are 1.4 percentage points less likely to have this outcome than people who do not. If salary increases by 1K USD, the probability of this outcome decreases by about 0.05 percentage points.

The third outcome (3) is “satisfied”. People who have a hobby are 7.3 percentage points more likely to have this outcome than people who do not. If salary increases by 1K USD, the probability of this outcome increases by about 0.25 percentage points.
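Because the three outcome probabilities always sum to one, the marginal effects of a variable across the outcomes should sum to zero. A quick check on the reported values, which also converts them to percentage points:

```r
# Marginal effects copied from the output, one column per outcome
me <- rbind(
  hobby1 = c(-0.05923816, -0.01407499, 0.07331315),
  salary = c(-0.00198087, -0.00051492, 0.00249580)
)
colnames(me) <- c("dissatisfied", "neutral", "satisfied")

# Probabilities sum to one, so the effects across outcomes sum to about zero
rowSums(me)

# The same effects expressed in percentage points
round(100 * me, 2)
```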

These marginal effects are evaluated at the average characteristics in the sample. To obtain marginal effects at user-given characteristics, we call the ome function from ome.R, written by Rafał Woźniak, PhD. To use it, we first need to estimate the model with the polr function.

## Call:
## polr(formula = career.satisfaction ~ salary + hobby, data = survey, 
##     method = "logistic")
## 
## Coefficients:
##          Value Std. Error t value
## salary 0.01292   0.002712   4.765
## hobby1 0.36133   0.115041   3.141
## 
## Intercepts:
##     Value   Std. Error t value
## 1|2 -0.7605  0.1271    -5.9831
## 2|3 -0.3421  0.1258    -2.7197
## 
## Residual Deviance: 3268.358 
## AIC: 3276.358
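The polr intercepts can be related to the ologit.reg estimates for the same salary + hobby model. The sketch assumes that ologit.reg fixes the first threshold at zero and estimates an intercept instead, which is what its output (one intercept, one threshold) suggests; polr has no intercept in the linear index, so the intercept is absorbed into the cut points.

```r
# ologit.reg estimates for the salary + hobby model
oglmx_intercept <- 0.7605456
oglmx_threshold <- 0.418400   # Threshold (2->3); first threshold assumed to be 0

# Shift both thresholds by the intercept to get polr-style cut points
polr_cut_12 <- 0 - oglmx_intercept
polr_cut_23 <- oglmx_threshold - oglmx_intercept

round(c(`1|2` = polr_cut_12, `2|3` = polr_cut_23), 4)
```

The two values agree with the polr intercepts 1|2 and 2|3 to the printed precision, so the two functions report the same fitted model in different parametrizations.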

Let’s define user-given characteristics, for example a salary of 30K USD and a person with a hobby.

##        alternative1 alternative2 alternative3 at X=
## salary -0.001916047 -0.004348517   0.00243247    30
## hobby1 -0.053565036 -0.121567200   0.06800216     1

Calling the ome function returns a matrix with the marginal effects of salary and hobby, evaluated at the given levels, for each alternative from 1 to 3. For example, people with a hobby have a probability of alternative1 (dissatisfied with career) that is about 5.4 percentage points lower.
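For reference, the textbook marginal effect of a regressor on P(y = j) in an ordered logit is the coefficient times the difference between the logistic density at the lower and at the upper cut point. The sketch below applies that formula to the rounded polr estimates at salary = 30 and hobby = 1. It is not the ome.R implementation, whose source is not shown here, and it treats hobby as continuous, which is an assumption.

```r
# Estimates from the polr fit of career.satisfaction ~ salary + hobby
beta <- c(salary = 0.01292, hobby1 = 0.36133)
cuts <- c(-0.7605, -0.3421)          # intercepts 1|2 and 2|3

x   <- c(salary = 30, hobby1 = 1)    # user-given characteristics
eta <- sum(beta * x)

# Logistic density evaluated at each cut point minus the linear index
dens <- dlogis(cuts - eta)

# dP(y = j)/dx_k = beta_k * (density at lower cut - density at upper cut),
# with the density taken as 0 at -Inf and +Inf
me <- outer(beta, c(0, dens) - c(dens, 0))
colnames(me) <- paste0("alternative", 1:3)
round(me, 6)
```

The first and last columns agree closely with the reported matrix. By construction each row sums to zero, and under this formula the middle column is much smaller in magnitude than the reported alternative2 values, so that column may be worth re-checking.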

5.5) General-to-specific approach

Let’s estimate ordered logit model using function ologit.reg:

## Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
## response will be ignored
## Ordered Logit Regression 
## Log-Likelihood: -1399.869 
## No. Iterations: 6 
## McFadden's R2: 0.1528643 
## AIC: 2843.739 
##                         Estimate Std. error t value              Pr(>|t|)    
## salary                 0.0138320  0.0034175  4.0474             0.0000518 ***
## years.coding2         -0.0428740  0.1301585 -0.3294              0.741855    
## years.coding3         -0.4585163  0.2140670 -2.1419              0.032199 *  
## education2            -0.0741672  0.2318729 -0.3199              0.749073    
## education3            -0.0511507  0.2074870 -0.2465              0.805276    
## company.size2          0.1869913  0.1272171  1.4699              0.141600    
## company.size3          0.1828526  0.1333849  1.3709              0.170417    
## hobby1                 0.3726120  0.1307996  2.8487              0.004389 ** 
## age2                  -0.2413011  0.1486651 -1.6231              0.104564    
## age3                   0.1117945  0.4568720  0.2447              0.806692    
## job.satisfaction2      0.5213072  0.1663351  3.1341              0.001724 ** 
## job.satisfaction3      2.2277039  0.1156480 19.2628 < 0.00000000000000022 ***
## countryBulgaria        0.3902385  0.3255863  1.1986              0.230695    
## countryCzech Republic  0.4642617  0.2903308  1.5991              0.109803    
## countryHungary         0.1195156  0.2993864  0.3992              0.689745    
## countryPoland          0.2726137  0.2528809  1.0780              0.281019    
## countryRomania         0.2884461  0.2781501  1.0370              0.299728    
## countrySlovakia        0.2436242  0.3832979  0.6356              0.525037    
## countryTurkey         -0.0284535  0.2672253 -0.1065              0.915203    
## countryUkraine         0.1222733  0.2703725  0.4522              0.651096    
## ----- Threshold Parameters -----
##                  Estimate Std. error t value Pr(>|t|)   
## Threshold (1->2)  0.53239    0.32768  1.6247 0.104226   
## Threshold (2->3)  1.05484    0.32830  3.2131 0.001313 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see that there are some variables that are insignificant.

One school of econometrics (the London School of Economics) uses the general-to-specific approach to find the significant variables for a model. We start with a wide model: some researchers start with all variables, others select only the variables they believe may affect the dependent variable. We then reduce the model step by step. The suggested approach is to test joint hypotheses at each step, rather than a sequence of simple hypotheses, to avoid errors.

Step 1

In step 1, let’s estimate the model without the insignificant variables. First, we create dummy variables for years.coding, education, company.size, age, job.satisfaction and country.

Based on the model, the variables we carry forward are salary, hobby1, years.coding2 (dummy variable = years.coding.middle), job.satisfaction2 (dummy variable = job.satisfaction.neutral) and job.satisfaction3 (dummy variable = job.satisfaction.satisfied). Note that in the output above the years.coding level that is significant at the 5% level is years.coding3, not years.coding2.

Let’s estimate the model with the mentioned variables only, that is, without the insignificant variables.

## Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
## response will be ignored
## Ordered Logit Regression 
## Log-Likelihood: -1410.814 
## No. Iterations: 6 
## McFadden's R2: 0.1462409 
## AIC: 2835.629 
##                      Estimate Std. error t value              Pr(>|t|)    
## salary              0.0111059  0.0029402  3.7773             0.0001586 ***
## years.coding.middle 0.0439241  0.1080722  0.4064             0.6844248    
## hobby1              0.4262591  0.1275662  3.3415             0.0008333 ***
## job.neutral         0.5273316  0.1642122  3.2113             0.0013214 ** 
## job.satisfied       2.2509421  0.1143342 19.6874 < 0.00000000000000022 ***
## ----- Threshold Parameters -----
##                  Estimate Std. error t value         Pr(>|t|)    
## Threshold (1->2)  0.53800    0.16059  3.3501        0.0008079 ***
## Threshold (2->3)  1.05412    0.16195  6.5090 0.00000000007567 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s apply the likelihood ratio test (lrtest) to check whether the removed variables are jointly insignificant.

## Likelihood ratio test
## 
## Model 1: career.satisfaction ~ salary + years.coding + education + company.size + 
##     hobby + age + job.satisfaction + country
## Model 2: career.satisfaction ~ salary + years.coding.middle + hobby + 
##     job.neutral + job.satisfied
##   #Df  LogLik  Df Chisq Pr(>Chisq)
## 1  22 -1399.9                     
## 2   7 -1410.8 -15 21.89     0.1107
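The numbers in the table above can be checked by hand. The sketch below uses only base R and the values printed in the output; because the log-likelihood of Model 1 is printed with one decimal, the hand-computed statistic differs slightly from the 21.89 reported by lrtest.

```r
# Values copied from the output above.
loglik_full       <- -1399.9    # Model 1, rounded as printed by lrtest
loglik_restricted <- -1410.814  # Model 2, from its model summary
df_full           <- 22
df_restricted     <- 7

# LR statistic: twice the gap between the two log-likelihoods.
lr_stat_rounded <- 2 * (loglik_full - loglik_restricted)

# Degrees of freedom: number of restrictions imposed by Model 2.
df_diff <- df_full - df_restricted

# p-value based on the statistic as printed by lrtest.
lr_stat_reported <- 21.89
p_value <- pchisq(lr_stat_reported, df = df_diff, lower.tail = FALSE)

# 5% critical value of the chi-squared distribution for comparison.
crit_5pct <- qchisq(0.95, df = df_diff)

cat("LR statistic (rounded inputs):", lr_stat_rounded, "\n")
cat("Degrees of freedom:", df_diff, "\n")
cat("p-value:", p_value, "\n")
cat("5% critical value:", crit_5pct, "\n")
```

The reported statistic lies below the 5% critical value, which is the same conclusion as the p-value comparison.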

In our case the null hypothesis is that the coefficients on all variables removed from the general model (education, company.size, age, country, and the remaining years.coding levels) are jointly equal to zero. For this hypothesis we obtained a chi-squared statistic of 21.89 with 15 degrees of freedom and a p-value of 0.1107, which is above the 5% threshold. Therefore, we cannot reject the null hypothesis: the removed variables are jointly insignificant, and we can remove them all at once. We do not need further steps of the general-to-specific approach and stop at step 1. The reduced model estimated above is our final model.

In the final model, career.satisfaction is explained by the salary, hobby, years.coding.middle, job.neutral, and job.satisfied variables. Note that years.coding.middle is not individually significant in the reduced model (p-value 0.684).
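To make the final model more tangible, the sketch below turns the reported estimates into predicted probabilities for each career satisfaction category. It assumes the usual ordered logit parameterization, P(y <= j) = F(threshold_j - x'beta), and that category 1 is the lowest and category 3 the highest satisfaction level. The function name predict_probs and the example profile are hypothetical and are not taken from the original analysis.

```r
# Coefficients and thresholds copied from the reduced model output.
beta <- c(
  salary              = 0.0111059,
  years.coding.middle = 0.0439241,
  hobby               = 0.4262591,
  job.neutral         = 0.5273316,
  job.satisfied       = 2.2509421
)
thresholds <- c(0.53800, 1.05412)

# Hypothetical helper: category probabilities for one covariate profile.
predict_probs <- function(x, beta, thresholds) {
  xb  <- sum(beta[names(x)] * x)   # linear index x'beta
  cum <- plogis(thresholds - xb)   # P(y <= 1), P(y <= 2)
  probs <- diff(c(0, cum, 1))      # P(y = 1), P(y = 2), P(y = 3)
  names(probs) <- c("category 1", "category 2", "category 3")
  probs
}

# Assumed profile: salary of 30 thousand USD, has a hobby,
# satisfied with the current job, not in the "middle" experience group.
profile <- c(salary = 30, years.coding.middle = 0, hobby = 1,
             job.neutral = 0, job.satisfied = 1)
probs <- predict_probs(profile, beta, thresholds)
print(round(probs, 3))

# Baseline profile with all regressors set to zero, for comparison.
baseline <- predict_probs(profile * 0, beta, thresholds)
print(round(baseline, 3))
```

Comparing the two printed vectors shows how the probability mass shifts toward the highest category once job satisfaction, hobby, and salary enter the linear index.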

6) Results:

In the beginning, we stated that we would test two hypotheses:

Hypothesis 1: Higher salary increases career satisfaction

Hypothesis 2: Having hobby increases career satisfaction

We modeled our data with an ordered choices logit model. We estimated several specifications, both with and without the insignificant variables, applied goodness-of-fit and likelihood ratio tests, and used the general-to-specific approach to arrive at the final model shown in the section above.

Based on our final model, the first hypothesis is supported. The salary variable is measured in thousands of USD, and its coefficient is positive and significant: when salary increases by 1K USD, a programmer is more likely to report higher career satisfaction. This matches everyday experience. People seek financial stability, and they move between companies and up the career ladder to improve their financial situation. The more they earn, the more likely they are to stay with the company and to report being satisfied with their career.

The second hypothesis is also supported, which is an interesting result. Our analysis shows that people who have a hobby are more satisfied with their career than people who do not. In everyday terms, a hobby gives us time to do something we are passionate about outside the everyday work environment. For IT professionals a hobby could even be crucial, as their work can be very stressful.
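The size of both effects can be expressed as odds ratios by exponentiating the coefficients of the final model. The sketch below assumes the standard proportional-odds reading of an ordered logit, in which exp(beta) multiplies the odds of being in a higher rather than a lower satisfaction category; the 10K USD scenario is an added illustration.

```r
# Coefficients copied from the reduced model output.
beta_salary <- 0.0111059  # per 1 thousand USD
beta_hobby  <- 0.4262591  # hobby = 1 versus hobby = 0

# Odds ratios under the proportional-odds interpretation.
or_salary_1k  <- exp(beta_salary)       # +1K USD
or_salary_10k <- exp(10 * beta_salary)  # +10K USD (illustrative scenario)
or_hobby      <- exp(beta_hobby)

cat("Odds ratio, +1K USD salary: ", round(or_salary_1k, 4), "\n")
cat("Odds ratio, +10K USD salary:", round(or_salary_10k, 4), "\n")
cat("Odds ratio, having a hobby: ", round(or_hobby, 4), "\n")
```

Under these assumptions, an extra 1K USD raises the odds of higher career satisfaction by about 1%, while having a hobby raises them by roughly 53%, other variables held constant.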

The outcomes of the study may help HR strategists define strategies that aim to increase career satisfaction among IT employees. Making the criteria for salary increases clear could be important for programmers' career satisfaction and could lead them to be more committed to the company.

Nevertheless, the study could be developed further in several ways. Incorporating additional demographic and psychological variables could improve the efficiency of the model.

7) Findings:

The main finding of this study is that the main factors affecting career satisfaction are salary, hobby, and current job satisfaction. A person who is satisfied with her current job, has a hobby, and earns a high salary is more likely to be satisfied with her career.

On the other hand, we conclude that age, the number of years spent coding, and the country do not affect career satisfaction. Programmers in the different Central and Eastern European countries show similar results. It is also surprising that career satisfaction is not related to age.

There are some limitations to our study. Although our dataset contains more than 2,248 observations, a larger sample would allow more efficient estimation of the econometric model. It would also be interesting to include additional variables that may affect career satisfaction, such as demographics, family status, and job orientation. For example, people who have more decision power at work and more opportunity to influence outcomes with their recommendations are more likely to be satisfied with their career than those who have no decision power. Likewise, professionals who feel that their work is aligned with the company strategy may be more satisfied with their career.

Future studies could compare western and eastern countries. Introducing western countries into the model may lead to a different outcome, and we expect the country variable to become more significant. Job orientation, managerial status, and related variables may also influence the model. In the introduction we described how strong the relationship between life satisfaction and career satisfaction is. Incorporating psychological aspects into the study may therefore change the specification of the final model.

9) Bibliography:


  1. Online community for developers worldwide https://stackoverflow.com/questions

  2. Joo, B. and Park, S. (2009), “Career satisfaction, organizational commitment, and turnover intention”, Leadership & Organizational Development Journal, Vol. 21 No. 6, pp. 482-486.

  3. Irvine, D., & Evans, M. “Job satisfaction and turnover among nurses; integrating research findings across studies,” Nursing Research 44 (4), 1995, pp. 246-53.

  4. Andrews, I.R., and Henry, M.M. “Management Attitudes Toward Pay,” Industrial Relations, (3), 1963, pp. 29-39

  5. Lawler E.E., and Porter L.W., “Predicting Managers’ Pay and Their Satisfaction With Their Pay”, Personnel Psychology

  6. Mason, E.S. “A Case Study of Gender Differences in Job Satisfaction Subsequent To Implementation of An Insurance Firm in Canada,” British Journal of Management (8:2), 1997, pp. 163-173.

  7. Judge, T.A., Cable, D.M., Boudreau, J.W. and Bretz, R.D. (1995), “An empirical investigation of the predictors of executive career success”, Personnel Psychology, Vol. 48 No. 3, pp. 485-519.

  8. Kozlowski, S.W.J., Gully, S.M., Brown, K.G., Salas, E., Smith, E.M. and Nason, E.R. (2001), “Effects of training goals and goal orientation traits on multidimensional training outcomes and performance adaptability”, Organizational Behavior and Human Decision Processes, Vol. 85 No. 1, pp. 1-31.

  9. Fried, Y., and Ferris, G.R. The validity of the job characteristics model. Personnel Psychology, 40, 2 (1987), 287-322

  10. Joo, B. and McLean, G.N. (2006), “Best employer studies: a conceptual model from a literature review and a case study”, Human Resource Development Review, Vol. 5 No. 2, pp. 228-57.

  11. Hu, Nan; Poon, Simon; Zhong, Jiangfan; and Wan, Yun, “Job Satisfaction of Information Technology Professionals” (2004). AMCIS 2004 pp. 3616-3623. http://aisel.aisnet.org/amcis2004/456

  12. Hu, Nan; Poon, Simon; Zhong, Jiangfan; and Wan, Yun, “Job Satisfaction of Information Technology Professionals” (2004). AMCIS 2004 pp. 3616-3623. http://aisel.aisnet.org/amcis2004/456

  13. Hu, Nan; Poon, Simon; Zhong, Jiangfan; and Wan, Yun, “Job Satisfaction of Information Technology Professionals” (2004). AMCIS 2004 pp. 3616-3623. http://aisel.aisnet.org/amcis2004/456

  14. A. L. LeDuc Jr, 1980, Motivation of Programmers, ACM SIGMIS Database, 11 (4):4-12 https://dl.acm.org/doi/pdf/10.1145/1113469.1113470

  15. Couger, D.J., and Zawacki R.A., 1978, “What Motivates DP Professionals?” Datamation, 24, 9 (September 1978), pp. 116ff.

  16. McElvey, R. and W. Zavoina, 1975. “A Statistical Model for the Analysis of Ordered Level Dependent Variables,” Journal of Mathematical Sociology, 4, pp. 103-120.

  17. Greene, W.H., and Hensher, D.A., 2009, Modeling Ordered Choices, pp. 83-90