library(dplyr)
df = survey_results_public
# Categorical survey questions -> factors (column positions in survey_results_public)
cols = c(2:12, 17:23, 25)
df[,cols] %>% lapply(function(x) as.factor(x)) -> df[,cols]
# Recode the text answers of YearsCode and YearsCodePro;
# each column falls back to its own values when no recoding applies
df$YearsCode = ifelse(df$YearsCode == "Less than 1 year", 0, df$YearsCode)
df$YearsCode = ifelse(df$YearsCode == "More than 50 years", 51, df$YearsCode)
df$YearsCodePro = ifelse(df$YearsCodePro == "Less than 1 year", 0, df$YearsCodePro)
df$YearsCodePro = ifelse(df$YearsCodePro == "More than 50 years", 51, df$YearsCodePro)
df$BetterLife = as.factor(df$BetterLife)
# Numeric questions stored as text -> numbers
cols = c(14, 78, 15, 16)
df[,cols] %>% lapply(function(x) as.numeric(x)) -> df[,cols]
## summary(df$MainBranch)
## I am a developer by profession 65679
## I am a student who is learning to code 10189
## I am not primarily a developer, but I write code sometimes as part of my work 7539
## I code primarily as a hobby 3340
## I used to be a developer by profession, but no longer am 1584
## NA's 552
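The two text answers can also be recoded with a small helper that always falls back to the column's own value, which makes it impossible to mix up YearsCode and YearsCodePro. This is a minimal sketch, not part of the original analysis; `recode_years()` is a hypothetical name, and the two answer strings are the ones handled in the code above.

```r
# Hypothetical helper: turn YearsCode / YearsCodePro answers into numbers.
# "Less than 1 year" -> 0, "More than 50 years" -> 51, everything else is
# parsed as a number (other non-numeric text would become NA with a warning).
recode_years <- function(x) {
  x <- as.character(x)                     # also works if x is a factor
  x[x %in% "Less than 1 year"] <- "0"      # %in% treats NA as "no match"
  x[x %in% "More than 50 years"] <- "51"
  as.numeric(x)
}

# Usage on a made-up vector
recode_years(c("3", "Less than 1 year", NA, "More than 50 years", "12"))
```

On the survey data this would be used as `df$YearsCodePro = recode_years(df$YearsCodePro)`.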
0.1 Intro
“Will People Born Today Have a Better Life Than Their Parents?” [1] is a popular question for measuring optimism and faith in the future.
Programming is a popular profession: only about 3.8 % of respondents (3340 out of 88331) code primarily as a hobby, while the rest are employed or are going to be employed. In this homework I look at coding as a profession, so I try to link optimism with some of the working conditions. Using SEM, Xanthopoulou et al. [2] showed the major role of personal resources (e.g. optimism) in the work environment, and found that job satisfaction is connected with optimism. Burke [3] found a correlation between early career experiences and optimism, so we could expect to see more optimism among people who started working earlier. Cheng and Chan [4] use Hulin's theory of job adaptation [5], which describes a gradual process of integrating people into the institution of a career. So we could expect that less experience leads to lower optimism.
0.2 RQ
- RQ1: Gender has no influence on optimism
- RQ2: Job satisfaction has a positive influence on optimism
- RQ3: Starting to work earlier has a positive influence on optimism
- RQ4: A longer career has a positive influence on optimism
Let's take a look at our data, and at the data we do not have (missing values):
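library(mice)
library(VIM)
# Assumed definition of df1, reconstructed from the variable names in the output below
df1 = df %>% dplyr::select(Employment, Gender, CareerSat, JobSat, LastHireDate, YearsCode, YearsCodePro, Age1stCode, Age, BetterLife)
#md.pattern(df1)
mice_plot <- aggr(df1, col = c('navyblue', 'yellow'), numbers = TRUE, sortVars = TRUE,
                  labels = names(df1), cex.axis = .7, gap = 3,
                  ylab = c("Missing data", "Pattern"))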
##
## Variables sorted by number of missings:
## Variable Count
## JobSat 0.20133209
## CareerSat 0.18041695
## YearsCodePro 0.16504843
## Age 0.10882846
## LastHireDate 0.10158298
## Gender 0.03911884
## BetterLife 0.02940945
## Age1stCode 0.02019509
## Employment 0.01914877
## YearsCode 0.01063195
Most of the missing data is in job and career satisfaction, since some respondents may never have had a job. We could filter them out and impute the rest of the data.
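# na.omit() has no `cols` argument (it is silently ignored), so this line
# drops every row with a missing value in any column
df2 = na.omit(df1)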
imputed_Data <- mice(df2, m=5, maxit = 50, method = 'pmm', seed = 1)
#summary(imputed_Data)
completeData <- complete(imputed_Data,2)
We will save this imputed data for later, to compare it with our final model.
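If the intention is to drop only respondents with missing job or career satisfaction and let mice() impute the rest, the rows can be selected with complete.cases() on just those columns. A minimal sketch on a made-up data frame; `drop_missing_in()` is a hypothetical helper name.

```r
# Keep rows that are complete in the listed columns only; other columns may
# still contain NA and can be imputed afterwards (e.g. with mice()).
drop_missing_in <- function(data, cols) {
  data[complete.cases(data[, cols, drop = FALSE]), , drop = FALSE]
}

toy <- data.frame(
  CareerSat = c("Very satisfied", NA, "Slightly satisfied", "Very dissatisfied"),
  JobSat    = c("Slightly satisfied", "Very satisfied", NA, "Very dissatisfied"),
  Age       = c(28, 35, NA, NA)
)
kept <- drop_missing_in(toy, c("CareerSat", "JobSat"))
nrow(kept)            # 2 (rows 1 and 4)
sum(is.na(kept$Age))  # 1 value left for imputation
```

On the survey data this would be `df2 = drop_missing_in(df1, c("CareerSat", "JobSat"))`.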
##
## Descriptive statistics by group
## group: No
## vars n mean sd median trimmed mad min max range skew
## Employment* 1 22972 1.23 0.61 1 1.04 0.00 1 3 2 2.40
## Gender* 2 22972 NaN NA NA NaN NA Inf -Inf -Inf NA
## CareerSat* 3 22972 3.45 1.36 3 3.56 1.48 1 5 4 -0.22
## JobSat* 4 22972 3.25 1.36 3 3.31 1.48 1 5 4 -0.05
## LastHireDate* 5 22972 3.15 1.66 4 3.13 1.48 1 6 5 -0.11
## YearsCode 6 22972 13.68 9.42 11 12.42 7.41 0 51 51 1.13
## YearsCodePro 7 22972 13.68 9.42 11 12.42 7.41 0 51 51 1.13
## Age1stCode 8 22972 15.33 4.84 15 15.02 4.45 5 65 60 1.08
## Age 9 22972 32.60 8.83 30 31.54 7.41 1 99 98 1.22
## BetterLife* 10 22972 1.00 0.00 1 1.00 0.00 1 1 0 NaN
## kurtosis se
## Employment* 4.03 0.00
## Gender* NA NA
## CareerSat* -1.11 0.01
## JobSat* -1.14 0.01
## LastHireDate* -1.50 0.01
## YearsCode 0.81 0.06
## YearsCodePro 0.81 0.06
## Age1stCode 3.70 0.03
## Age 2.16 0.06
## BetterLife* NaN 0.00
## ------------------------------------------------------------
## group: Yes
## vars n mean sd median trimmed mad min max range skew
## Employment* 1 38807 1.23 0.60 1 1.04 0.00 1 3 2 2.42
## Gender* 2 38807 NaN NA NA NaN NA Inf -Inf -Inf NA
## CareerSat* 3 38807 3.68 1.33 3 3.82 2.97 1 5 4 -0.45
## JobSat* 4 38807 3.36 1.37 3 3.45 1.48 1 5 4 -0.15
## LastHireDate* 5 38807 3.09 1.64 4 3.07 1.48 1 6 5 -0.08
## YearsCode 6 38807 12.23 8.42 10 11.04 7.41 0 51 51 1.32
## YearsCodePro 7 38807 12.23 8.42 10 11.04 7.41 0 51 51 1.32
## Age1stCode 8 38807 15.34 4.50 15 15.14 4.45 5 79 74 0.91
## Age 9 38807 30.77 7.94 29 29.80 5.93 1 99 98 1.38
## BetterLife* 10 38807 2.00 0.00 2 2.00 0.00 2 2 0 NaN
## kurtosis se
## Employment* 4.12 0.00
## Gender* NA NA
## CareerSat* -0.99 0.01
## JobSat* -1.15 0.01
## LastHireDate* -1.49 0.01
## YearsCode 1.68 0.04
## YearsCodePro 1.68 0.04
## Age1stCode 3.86 0.02
## Age 2.92 0.04
## BetterLife* NaN 0.00
Employment | Gender | CareerSat | JobSat | LastHireDate | YearsCode | YearsCodePro | Age1stCode | Age | BetterLife |
---|---|---|---|---|---|---|---|---|---|
Employed full-time | Man | Slightly satisfied | Slightly satisfied | 1-2 years ago | 3 | 3 | 22 | 28 | Yes |
Employed full-time | Man | Very satisfied | Slightly satisfied | Less than a year ago | 3 | 3 | 16 | 22 | Yes |
Employed full-time | Man | Very dissatisfied | Slightly dissatisfied | Less than a year ago | 16 | 16 | 14 | 30 | Yes |
Employed full-time | Man | Very satisfied | Slightly satisfied | 1-2 years ago | 13 | 13 | 15 | 28 | No |
Independent contractor, freelancer, or self-employed | Man | Slightly satisfied | Neither satisfied nor dissatisfied | NA - I am an independent contractor or self employed | 6 | 6 | 17 | 42 | No |
Employed full-time | Man | Slightly satisfied | Slightly satisfied | Less than a year ago | 12 | 12 | 11 | 23 | No |
0.3 Assumptions
Let's check the cell counts, because we have to. With such a large dataset there is little chance of a cell having fewer than 40 observations, and the overall sample size is far above any minimum.
## CareerSat
## BetterLife Neither satisfied nor dissatisfied Slightly dissatisfied
## No 2425 2721
## Yes 3131 3563
## CareerSat
## BetterLife Slightly satisfied Very dissatisfied Very satisfied
## No 8372 1034 8420
## Yes 13123 1911 17079
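The cell check can be made explicit by computing the smallest observed and expected counts of the BetterLife × CareerSat table. A sketch with the counts copied from the output above; `check_cells()` is a hypothetical helper, and the default threshold of 40 follows the text (expected counts of at least 5 are a common alternative rule of thumb).

```r
# Smallest observed and expected cell counts of a contingency table
check_cells <- function(tab, min_count = 40) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  list(min_observed = min(tab),
       min_expected = min(expected),
       ok = all(tab >= min_count))
}

# Counts copied from the BetterLife x CareerSat table above
tab <- matrix(c(2425, 2721,  8372, 1034,  8420,
                3131, 3563, 13123, 1911, 17079),
              nrow = 2, byrow = TRUE,
              dimnames = list(BetterLife = c("No", "Yes"),
                              CareerSat = c("Neither satisfied nor dissatisfied",
                                            "Slightly dissatisfied",
                                            "Slightly satisfied",
                                            "Very dissatisfied",
                                            "Very satisfied")))
check_cells(tab)
```

With the data in memory, `check_cells(table(df2$BetterLife, df2$CareerSat))` runs the same check on the real table.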
Let's look at the outliers:
par(mfrow=c(2,2))
boxplot(df2$YearsCode)
boxplot(df2$YearsCodePro)
boxplot(df2$Age1stCode)
boxplot(df2$Age)
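# RO(): replace values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] with NA
RO <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA  # below the lower fence
  y[x > (qnt[2] + H)] <- NA  # above the upper fence
  y
}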
and remove them in a single pass: values flagged by RO() become NA, and those rows are then dropped.
df2$YearsCode = RO(df2$YearsCode)
df2$YearsCodePro = RO(df2$YearsCodePro)
df2$Age1stCode = RO(df2$Age1stCode)
df2$Age = RO(df2$Age)
df2 %>% na.omit() -> df2
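To see what a single pass of RO() does, here is a sketch on a made-up vector (the function is repeated so the block runs on its own). Because the quartiles change once values are removed, a second pass could flag new values, which is why it is applied only once.

```r
RO <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
# Q1 = 3, Q3 = 7, IQR = 4, so the fences are 3 - 6 = -3 and 7 + 6 = 13
RO(x)  # only 100 becomes NA
```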
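Correlation
YearsCodePro is highly correlated with YearsCode, so let's exclude YearsCode, since we are talking about careers. In the outputs of this report the two columns are in fact identical (see their matching descriptive statistics above), because the recoding step used YearsCode as the fallback value when recoding YearsCodePro.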
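A quick way to spot such pairs is to list all numeric variables whose absolute Pearson correlation exceeds a threshold. A sketch on made-up data; `high_cor_pairs()` and the 0.8 threshold are assumptions for illustration, not part of the original analysis.

```r
# List pairs of numeric columns whose |Pearson r| exceeds a threshold
high_cor_pairs <- function(data, threshold = 0.8) {
  num <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  r <- cor(num, use = "pairwise.complete.obs")
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r = r[idx])
}

toy <- data.frame(
  YearsCode    = 0:19,
  YearsCodePro = pmax(0, 0:19 - 3),          # professional coding starts later
  Age1stCode   = rep(c(12, 18, 15, 20), 5),
  Gender       = rep(c("Man", "Woman"), 10)  # non-numeric, ignored
)
high_cor_pairs(toy)
```

On the real data, `high_cor_pairs(df2)` would list the pairs worth dropping or combining.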
0.4 Model
Let's build our base model:
mlg1 <- glm(BetterLife ~ Employment + CareerSat + JobSat, data = df2, family = "binomial")
summary(mlg1)
##
## Call:
## glm(formula = BetterLife ~ Employment + CareerSat + JobSat, family = "binomial",
## data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5396 -1.3812 0.8822 0.9777 1.1348
##
## Coefficients:
## Estimate
## (Intercept) 0.290054
## EmploymentEmployed part-time 0.054970
## EmploymentIndependent contractor, freelancer, or self-employed 0.043073
## CareerSatSlightly dissatisfied 0.088125
## CareerSatSlightly satisfied 0.197477
## CareerSatVery dissatisfied 0.472137
## CareerSatVery satisfied 0.473048
## JobSatSlightly dissatisfied -0.048260
## JobSatSlightly satisfied 0.002233
## JobSatVery dissatisfied -0.189046
## JobSatVery satisfied -0.020303
## Std. Error
## (Intercept) 0.032924
## EmploymentEmployed part-time 0.044756
## EmploymentIndependent contractor, freelancer, or self-employed 0.030990
## CareerSatSlightly dissatisfied 0.039838
## CareerSatSlightly satisfied 0.032980
## CareerSatVery dissatisfied 0.053094
## CareerSatVery satisfied 0.035479
## JobSatSlightly dissatisfied 0.034275
## JobSatSlightly satisfied 0.030454
## JobSatVery dissatisfied 0.045441
## JobSatVery satisfied 0.033606
## z value Pr(>|z|)
## (Intercept) 8.810 < 2e-16
## EmploymentEmployed part-time 1.228 0.219
## EmploymentIndependent contractor, freelancer, or self-employed 1.390 0.165
## CareerSatSlightly dissatisfied 2.212 0.027
## CareerSatSlightly satisfied 5.988 2.13e-09
## CareerSatVery dissatisfied 8.893 < 2e-16
## CareerSatVery satisfied 13.333 < 2e-16
## JobSatSlightly dissatisfied -1.408 0.159
## JobSatSlightly satisfied 0.073 0.942
## JobSatVery dissatisfied -4.160 3.18e-05
## JobSatVery satisfied -0.604 0.546
##
## (Intercept) ***
## EmploymentEmployed part-time
## EmploymentIndependent contractor, freelancer, or self-employed
## CareerSatSlightly dissatisfied *
## CareerSatSlightly satisfied ***
## CareerSatVery dissatisfied ***
## CareerSatVery satisfied ***
## JobSatSlightly dissatisfied
## JobSatSlightly satisfied
## JobSatVery dissatisfied ***
## JobSatVery satisfied
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75357 on 57479 degrees of freedom
## Residual deviance: 74919 on 57469 degrees of freedom
## AIC: 74941
##
## Number of Fisher Scoring iterations: 4
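As we can see, I forgot to re-level the factors: by default R orders factor levels alphabetically, so "Neither satisfied nor dissatisfied" became the reference category. Let's fix that now: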
df2$CareerSat = factor(df2$CareerSat, levels = c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied"))
df2$JobSat = factor(df2$JobSat, levels = c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied"))
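Now the lowest satisfaction level ("Very dissatisfied") is the reference category for both job and career satisfaction, which makes the coefficients easier to interpret. The fit itself does not change (same deviance and AIC), only the parametrization.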
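The reference category is simply the first level of the factor. A small sketch on made-up values shows the default alphabetical order and the effect of setting the levels explicitly; base R's `relevel()` is an alternative when only the reference level needs to change.

```r
sat_levels <- c("Very dissatisfied", "Slightly dissatisfied",
                "Neither satisfied nor dissatisfied",
                "Slightly satisfied", "Very satisfied")
x <- c("Very satisfied", "Neither satisfied nor dissatisfied",
       "Slightly dissatisfied", "Very dissatisfied", "Slightly satisfied")

# Default: levels are sorted alphabetically, so "Neither ..." is the reference
levels(factor(x))[1]

# Explicit order: "Very dissatisfied" is the reference, so it gets no dummy column
x_ord <- factor(x, levels = sat_levels)
mm <- model.matrix(~ x_ord)
colnames(mm)

# Equivalent when only the reference level matters
levels(relevel(factor(x), ref = "Very dissatisfied"))[1]
```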
mlg2 <- glm(BetterLife ~ Employment + CareerSat + JobSat, data = df2, family = "binomial")
summary(mlg2)
##
## Call:
## glm(formula = BetterLife ~ Employment + CareerSat + JobSat, family = "binomial",
## data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5396 -1.3812 0.8822 0.9777 1.1348
##
## Coefficients:
## Estimate
## (Intercept) 0.5731454
## EmploymentEmployed part-time 0.0549699
## EmploymentIndependent contractor, freelancer, or self-employed 0.0430733
## CareerSatSlightly dissatisfied -0.3840118
## CareerSatNeither satisfied nor dissatisfied -0.4721371
## CareerSatSlightly satisfied -0.2746599
## CareerSatVery satisfied 0.0009108
## JobSatSlightly dissatisfied 0.1407860
## JobSatNeither satisfied nor dissatisfied 0.1890460
## JobSatSlightly satisfied 0.1912792
## JobSatVery satisfied 0.1687434
## Std. Error
## (Intercept) 0.0442484
## EmploymentEmployed part-time 0.0447565
## EmploymentIndependent contractor, freelancer, or self-employed 0.0309898
## CareerSatSlightly dissatisfied 0.0505812
## CareerSatNeither satisfied nor dissatisfied 0.0530936
## CareerSatSlightly satisfied 0.0482488
## CareerSatVery satisfied 0.0496199
## JobSatSlightly dissatisfied 0.0421903
## JobSatNeither satisfied nor dissatisfied 0.0454415
## JobSatSlightly satisfied 0.0418779
## JobSatVery satisfied 0.0439376
## z value Pr(>|z|)
## (Intercept) 12.953 < 2e-16
## EmploymentEmployed part-time 1.228 0.219372
## EmploymentIndependent contractor, freelancer, or self-employed 1.390 0.164553
## CareerSatSlightly dissatisfied -7.592 3.15e-14
## CareerSatNeither satisfied nor dissatisfied -8.893 < 2e-16
## CareerSatSlightly satisfied -5.693 1.25e-08
## CareerSatVery satisfied 0.018 0.985355
## JobSatSlightly dissatisfied 3.337 0.000847
## JobSatNeither satisfied nor dissatisfied 4.160 3.18e-05
## JobSatSlightly satisfied 4.568 4.93e-06
## JobSatVery satisfied 3.841 0.000123
##
## (Intercept) ***
## EmploymentEmployed part-time
## EmploymentIndependent contractor, freelancer, or self-employed
## CareerSatSlightly dissatisfied ***
## CareerSatNeither satisfied nor dissatisfied ***
## CareerSatSlightly satisfied ***
## CareerSatVery satisfied
## JobSatSlightly dissatisfied ***
## JobSatNeither satisfied nor dissatisfied ***
## JobSatSlightly satisfied ***
## JobSatVery satisfied ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75357 on 57479 degrees of freedom
## Residual deviance: 74919 on 57469 degrees of freedom
## AIC: 74941
##
## Number of Fisher Scoring iterations: 4
Employment is not significant, so let's replace it with years of professional coding and the age at which respondents first coded:
mlg3 <- glm(BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode, data = df2, family = "binomial")
summary(mlg3)
##
## Call:
## glm(formula = BetterLife ~ CareerSat + JobSat + YearsCodePro +
## Age1stCode, family = "binomial", data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6340 -1.3586 0.8610 0.9621 1.2865
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) 0.905860 0.063770 14.205
## CareerSatSlightly dissatisfied -0.387145 0.050673 -7.640
## CareerSatNeither satisfied nor dissatisfied -0.485309 0.053202 -9.122
## CareerSatSlightly satisfied -0.279607 0.048328 -5.786
## CareerSatVery satisfied -0.009344 0.049699 -0.188
## JobSatSlightly dissatisfied 0.137550 0.042275 3.254
## JobSatNeither satisfied nor dissatisfied 0.177702 0.045525 3.903
## JobSatSlightly satisfied 0.189553 0.041956 4.518
## JobSatVery satisfied 0.178589 0.044023 4.057
## YearsCodePro -0.019979 0.001351 -14.791
## Age1stCode -0.005690 0.002375 -2.395
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## CareerSatSlightly dissatisfied 2.17e-14 ***
## CareerSatNeither satisfied nor dissatisfied < 2e-16 ***
## CareerSatSlightly satisfied 7.23e-09 ***
## CareerSatVery satisfied 0.85087
## JobSatSlightly dissatisfied 0.00114 **
## JobSatNeither satisfied nor dissatisfied 9.49e-05 ***
## JobSatSlightly satisfied 6.25e-06 ***
## JobSatVery satisfied 4.98e-05 ***
## YearsCodePro < 2e-16 ***
## Age1stCode 0.01661 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75357 on 57479 degrees of freedom
## Residual deviance: 74687 on 57469 degrees of freedom
## AIC: 74709
##
## Number of Fisher Scoring iterations: 4
Next, let's add gender, since it is customary to include it in such models and it is needed for RQ1, together with age:
mlg4 <- glm(BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode + Gender + Age, data = df2, family = "binomial")
summary(mlg4)
##
## Call:
## glm(formula = BetterLife ~ CareerSat + JobSat + YearsCodePro +
## Age1stCode + Gender + Age, family = "binomial", data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0480 -1.3436 0.8413 0.9569 1.6490
##
## Coefficients:
## Estimate
## (Intercept) 1.483756
## CareerSatSlightly dissatisfied -0.375586
## CareerSatNeither satisfied nor dissatisfied -0.478794
## CareerSatSlightly satisfied -0.276521
## CareerSatVery satisfied -0.012703
## JobSatSlightly dissatisfied 0.130877
## JobSatNeither satisfied nor dissatisfied 0.159409
## JobSatSlightly satisfied 0.179867
## JobSatVery satisfied 0.180895
## YearsCodePro 0.003721
## Age1stCode 0.006921
## GenderMan;Non-binary, genderqueer, or gender non-conforming -0.466250
## GenderNon-binary, genderqueer, or gender non-conforming -0.683841
## GenderWoman -0.517619
## GenderWoman;Man 0.958924
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.436499
## GenderWoman;Non-binary, genderqueer, or gender non-conforming -1.142684
## Age -0.032988
## Std. Error
## (Intercept) 0.072939
## CareerSatSlightly dissatisfied 0.050927
## CareerSatNeither satisfied nor dissatisfied 0.053471
## CareerSatSlightly satisfied 0.048574
## CareerSatVery satisfied 0.049949
## JobSatSlightly dissatisfied 0.042476
## JobSatNeither satisfied nor dissatisfied 0.045745
## JobSatSlightly satisfied 0.042151
## JobSatVery satisfied 0.044232
## YearsCodePro 0.002131
## Age1stCode 0.002493
## GenderMan;Non-binary, genderqueer, or gender non-conforming 0.191272
## GenderNon-binary, genderqueer, or gender non-conforming 0.114235
## GenderWoman 0.033754
## GenderWoman;Man 0.413599
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.474418
## GenderWoman;Non-binary, genderqueer, or gender non-conforming 0.206510
## Age 0.002148
## z value
## (Intercept) 20.342
## CareerSatSlightly dissatisfied -7.375
## CareerSatNeither satisfied nor dissatisfied -8.954
## CareerSatSlightly satisfied -5.693
## CareerSatVery satisfied -0.254
## JobSatSlightly dissatisfied 3.081
## JobSatNeither satisfied nor dissatisfied 3.485
## JobSatSlightly satisfied 4.267
## JobSatVery satisfied 4.090
## YearsCodePro 1.746
## Age1stCode 2.776
## GenderMan;Non-binary, genderqueer, or gender non-conforming -2.438
## GenderNon-binary, genderqueer, or gender non-conforming -5.986
## GenderWoman -15.335
## GenderWoman;Man 2.318
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.920
## GenderWoman;Non-binary, genderqueer, or gender non-conforming -5.533
## Age -15.360
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## CareerSatSlightly dissatisfied 1.64e-13 ***
## CareerSatNeither satisfied nor dissatisfied < 2e-16 ***
## CareerSatSlightly satisfied 1.25e-08 ***
## CareerSatVery satisfied 0.799249
## JobSatSlightly dissatisfied 0.002062 **
## JobSatNeither satisfied nor dissatisfied 0.000493 ***
## JobSatSlightly satisfied 1.98e-05 ***
## JobSatVery satisfied 4.32e-05 ***
## YearsCodePro 0.080854 .
## Age1stCode 0.005503 **
## GenderMan;Non-binary, genderqueer, or gender non-conforming 0.014784 *
## GenderNon-binary, genderqueer, or gender non-conforming 2.15e-09 ***
## GenderWoman < 2e-16 ***
## GenderWoman;Man 0.020423 *
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.357535
## GenderWoman;Non-binary, genderqueer, or gender non-conforming 3.14e-08 ***
## Age < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75357 on 57479 degrees of freedom
## Residual deviance: 74136 on 57462 degrees of freedom
## AIC: 74172
##
## Number of Fisher Scoring iterations: 4
Let's check whether gender (together with age) improved the model:
## Analysis of Deviance Table
##
## Model 1: BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode
## Model 2: BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode +
## Gender + Age
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 57469 74687
## 2 57462 74136 7 550.92 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
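The test statistic is the drop in residual deviance, compared with a χ² distribution whose degrees of freedom equal the number of added parameters (6 gender dummies plus Age). A sketch using the rounded deviances from the summaries above; with both models in memory, `anova(mlg3, mlg4, test = "Chisq")` performs the same test.

```r
# Residual deviances and degrees of freedom copied from summary(mlg3) and summary(mlg4)
dev_mlg3 <- 74687; df_mlg3 <- 57469
dev_mlg4 <- 74136; df_mlg4 <- 57462

lr_stat <- dev_mlg3 - dev_mlg4  # 551; the table shows 550.92 before rounding
lr_df   <- df_mlg3 - df_mlg4    # 7 = 6 gender dummies + 1 for Age
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value < 2.2e-16               # TRUE, matching "< 2.2e-16" in the table
```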
Yes, it did, and so did age, as the sequential analysis of deviance shows:
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: BetterLife
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 57479 75357
## CareerSat 4 411.69 57475 74945 < 2.2e-16 ***
## JobSat 4 22.56 57471 74922 0.0001549 ***
## YearsCodePro 1 229.74 57470 74693 < 2.2e-16 ***
## Age1stCode 1 5.74 57469 74687 0.0166171 *
## Gender 6 316.03 57463 74371 < 2.2e-16 ***
## Age 1 234.88 57462 74136 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
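Every predictor contributes significantly when the terms are added sequentially. Note that these sequential tests depend on the order of the terms: in the Wald tests of mlg4, YearsCodePro is only marginally significant (p = 0.08) once Age is in the model.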
Let's write out the model equation (coefficients are rounded to two decimals, so YearsCodePro, 0.0037, appears as 0):
cc = mlg4$coefficients
(eqn <- paste("Y =", paste(round(cc[1],2), paste(round(cc[-1],2), names(cc[-1]), sep=" * ", collapse=" + "), sep=" + "), "+ e"))
## [1] "Y = 1.48 + -0.38 * CareerSatSlightly dissatisfied + -0.48 * CareerSatNeither satisfied nor dissatisfied + -0.28 * CareerSatSlightly satisfied + -0.01 * CareerSatVery satisfied + 0.13 * JobSatSlightly dissatisfied + 0.16 * JobSatNeither satisfied nor dissatisfied + 0.18 * JobSatSlightly satisfied + 0.18 * JobSatVery satisfied + 0 * YearsCodePro + 0.01 * Age1stCode + -0.47 * GenderMan;Non-binary, genderqueer, or gender non-conforming + -0.68 * GenderNon-binary, genderqueer, or gender non-conforming + -0.52 * GenderWoman + 0.96 * GenderWoman;Man + 0.44 * GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming + -1.14 * GenderWoman;Non-binary, genderqueer, or gender non-conforming + -0.03 * Age + e"
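Because mlg4 is a logistic regression, the "Y" in this string is the log-odds that a respondent answers "Yes" to BetterLife, and there is no additive error term on this scale. Writing p for that probability and η for the linear predictor printed above, probabilities follow from the inverse logit:

```latex
\log\frac{p}{1-p} = \eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-\eta}}
```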
## OR
## (Intercept) 4.4094747
## CareerSatSlightly dissatisfied 0.6868864
## CareerSatNeither satisfied nor dissatisfied 0.6195304
## CareerSatSlightly satisfied 0.7584177
## CareerSatVery satisfied 0.9873773
## JobSatSlightly dissatisfied 1.1398276
## JobSatNeither satisfied nor dissatisfied 1.1728174
## JobSatSlightly satisfied 1.1970578
## JobSatVery satisfied 1.1982894
## YearsCodePro 1.0037279
## Age1stCode 1.0069451
## GenderMan;Non-binary, genderqueer, or gender non-conforming 0.6273502
## GenderNon-binary, genderqueer, or gender non-conforming 0.5046746
## GenderWoman 0.5959381
## GenderWoman;Man 2.6088865
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 1.5472808
## GenderWoman;Non-binary, genderqueer, or gender non-conforming 0.3189619
## Age 0.9675503
## 2.5 %
## (Intercept) 3.8227457
## CareerSatSlightly dissatisfied 0.6214952
## CareerSatNeither satisfied nor dissatisfied 0.5577673
## CareerSatSlightly satisfied 0.6893486
## CareerSatVery satisfied 0.8950569
## JobSatSlightly dissatisfied 1.0487313
## JobSatNeither satisfied nor dissatisfied 1.0722096
## JobSatSlightly satisfied 1.1020595
## JobSatVery satisfied 1.0987073
## YearsCodePro 0.9995423
## Age1stCode 1.0020361
## GenderMan;Non-binary, genderqueer, or gender non-conforming 0.4314732
## GenderNon-binary, genderqueer, or gender non-conforming 0.4032243
## GenderWoman 0.5578043
## GenderWoman;Man 1.2346419
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.6445233
## GenderWoman;Non-binary, genderqueer, or gender non-conforming 0.2110720
## Age 0.9634859
## 97.5 %
## (Intercept) 5.0880546
## CareerSatSlightly dissatisfied 0.7588282
## CareerSatNeither satisfied nor dissatisfied 0.6878406
## CareerSatSlightly satisfied 0.8339485
## CareerSatVery satisfied 1.0886574
## JobSatSlightly dissatisfied 1.2387380
## JobSatNeither satisfied nor dissatisfied 1.2828051
## JobSatSlightly satisfied 1.3000733
## JobSatVery satisfied 1.3067331
## YearsCodePro 1.0079289
## Age1stCode 1.0118773
## GenderMan;Non-binary, genderqueer, or gender non-conforming 0.9151325
## GenderNon-binary, genderqueer, or gender non-conforming 0.6312645
## GenderWoman 0.6367214
## GenderWoman;Man 6.4064091
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 4.2885341
## GenderWoman;Non-binary, genderqueer, or gender non-conforming 0.4755201
## Age 0.9716319
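Each odds ratio in the table is exp() of the corresponding coefficient, and its interval is exp() of the coefficient's confidence interval; the table was presumably produced with something like `exp(cbind(OR = coef(mlg4), confint(mlg4)))` (an assumption, since the call is not shown). A sketch with three coefficients copied from summary(mlg4); the name `JobSatVerySatisfied` is only a label chosen here.

```r
# Coefficients copied from summary(mlg4)
b <- c(GenderWoman = -0.517619, Age = -0.032988, JobSatVerySatisfied = 0.180895)

or <- exp(b)  # about 0.596, 0.968 and 1.198, as in the OR column above
round(or, 3)

# Odds ratios multiply: 10 extra years of age give exp(10 * beta) = 0.968^10,
# about 0.72, i.e. roughly 28 % lower odds of optimism
or_age_10 <- exp(10 * b[["Age"]])
```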
So our baseline is a man who is very dissatisfied with both his job and his career, with mean age, mean age at first code and mean professional experience:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.0 25.0 29.0 30.1 34.0 50.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 12.00 15.00 15.09 18.00 27.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 6.0 10.0 11.6 15.0 33.0
## [1] 57.60989
which gives a predicted probability of about 58 % of being optimistic,
while a woman who is satisfied with her job and career, has twice as much coding experience and is 49 years old has
## [1] 89.77978
a probability of almost 90 % of being optimistic about the future.
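A predicted probability can be checked by hand: sum the linear predictor and apply the inverse logit, plogis(). A sketch for the reference profile (a man who is very dissatisfied with both job and career, so those dummies are 0), using the mlg4 coefficients and the sample means printed in the prediction table below; it reproduces the first row of that table, about 0.654.

```r
# Coefficients copied from summary(mlg4)
b0      <- 1.483756   # intercept: man, very dissatisfied with job and career
b_years <- 0.003721   # YearsCodePro
b_first <- 0.006921   # Age1stCode
b_age   <- -0.032988  # Age

# Sample means used in newdata1 below
eta <- b0 + b_years * 11.6039 + b_first * 15.0901 + b_age * 30.1006
p <- plogis(eta)      # same as 1 / (1 + exp(-eta))
round(p, 3)           # 0.654
```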
Let's draw some pictures to better understand the effect of career satisfaction:
newdata1 <- with(df2, data.frame(Age = mean(Age), Age1stCode = mean(Age1stCode), YearsCodePro = mean(YearsCodePro), Gender = "Man", JobSat = "Very dissatisfied", CareerSat = factor( c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied") )))
newdata1$rankP <- predict(mlg4, newdata = newdata1, type = "response")
newdata1
## Age Age1stCode YearsCodePro Gender JobSat
## 1 30.1006 15.0901 11.6039 Man Very dissatisfied
## 2 30.1006 15.0901 11.6039 Man Very dissatisfied
## 3 30.1006 15.0901 11.6039 Man Very dissatisfied
## 4 30.1006 15.0901 11.6039 Man Very dissatisfied
## 5 30.1006 15.0901 11.6039 Man Very dissatisfied
## CareerSat rankP
## 1 Very dissatisfied 0.6543962
## 2 Slightly dissatisfied 0.5653328
## 3 Neither satisfied nor dissatisfied 0.5398221
## 4 Slightly satisfied 0.5895001
## 5 Very satisfied 0.6515177
Being neither satisfied nor dissatisfied with one's career is associated with the lowest probability of optimism (0.54), while being very dissatisfied or very satisfied (both 0.65) raises it by about 11 percentage points.
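newdata2 = with(df2, data.frame(Age = mean(Age), Age1stCode = mean(Age1stCode),
  # 100 grid values of YearsCodePro, repeated for each of the 5 CareerSat levels;
  # after outlier removal YearsCodePro ends at 33, so values above that are extrapolations
  YearsCodePro = rep(seq(from = 0, to = 51, length.out = 100), times = 5),
  Gender = "Man", JobSat = "Very dissatisfied",
  CareerSat = factor(rep(c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied"), each = 100))))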
newdata3 <- cbind(newdata2, predict(mlg4, newdata = newdata2, type = "link",
se = TRUE))
newdata3 <- within(newdata3, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
library(ggplot2)
ggplot(newdata3, aes(x = YearsCodePro, y = PredictedProb)) +
geom_ribbon(aes(ymin = LL,
ymax = UL, fill = CareerSat), alpha = 0.2) +
geom_line(aes(colour = CareerSat),
size = 1)
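So at the beginning of the career, career satisfaction plays a crucial role in the probability of being optimistic.
In the late career there is almost no difference between the satisfaction levels, except for "Neither satisfied nor dissatisfied"; however, this could be a problem of my data, because there are not enough observations to make a valid prediction (after outlier removal YearsCodePro does not exceed 33 years, so the curves beyond that point are extrapolations). Note also that mlg4 has no CareerSat × YearsCodePro interaction, so the gap between the curves can change only through the non-linearity of the logistic link.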
## llh llhNull G2 McFadden r2ML
## -3.706799e+04 -3.767831e+04 1.220640e+03 1.619818e-02 2.101201e-02
## r2CU
## 2.876580e-02
However, the model has very little explanatory power: McFadden's pseudo-R² is only 0.016 (Nagelkerke's 0.029).
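The output above looks like that of `pscl::pR2()` (the call is not shown, so this is an assumption). For a binary outcome these values can be recomputed from the deviances in summary(mlg4), since the residual deviance equals -2 × log-likelihood. A sketch with n = 57480 observations (57479 null degrees of freedom + 1):

```r
# Deviances copied from summary(mlg4)
null_dev  <- 75357
resid_dev <- 74136
n <- 57480

mcfadden <- 1 - resid_dev / null_dev             # McFadden, about 0.016
r2_cs    <- 1 - exp((resid_dev - null_dev) / n)  # Cox & Snell (r2ML), about 0.021
r2_nk    <- r2_cs / (1 - exp(-null_dev / n))     # Nagelkerke (r2CU), about 0.029
round(c(McFadden = mcfadden, r2ML = r2_cs, r2CU = r2_nk), 3)
```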
##
## Hosmer and Lemeshow test (binary model)
##
## data: df2$BetterLife, fitted(mlg4)
## X-squared = 9.2123, df = 8, p-value = 0.3247
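The Hosmer and Lemeshow test is not significant (p = 0.32), so there is no evidence that the model is badly calibrated; the problem is its low explanatory power, which suggests that the predictors were chosen poorly. A good student (and researcher) would start the whole homework again from the beginning; I will move on.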
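For reference, the Hosmer and Lemeshow statistic groups observations by deciles of the fitted probability and compares observed with expected counts of "Yes" in each group; a large p-value means no evidence of miscalibration. A from-scratch sketch (`hosmer_lemeshow()` is a hypothetical helper; packages that implement the test may form the groups slightly differently, so results need not match the output above exactly):

```r
# y: 0/1 outcome, p: fitted probabilities, g: number of groups
hosmer_lemeshow <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  obs <- tapply(y, grp, sum)       # observed "Yes" per group
  expct <- tapply(p, grp, sum)     # expected "Yes" per group
  n <- tapply(y, grp, length)      # group sizes
  stat <- sum((obs - expct)^2 / (expct * (1 - expct / n)))
  dof <- length(n) - 2
  c(statistic = stat, df = dof,
    p.value = pchisq(stat, dof, lower.tail = FALSE))
}
```

With the model in memory it would be called as `hosmer_lemeshow(as.integer(df2$BetterLife == "Yes"), fitted(mlg4))`.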
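0.5 Diagnostics
In the standard diagnostic plots (plot(mlg4)) the residuals are not normal, some outliers seem to remain that may need to be removed, and in the Scale-Location plot the observations cross the line, which is a sign of poor model fit. For a logistic regression, however, residuals are not expected to be normally distributed, so the normality check is less informative than for a linear model.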
library(broom)
model.data <- augment(mlg4) %>%
mutate(index = 1:n())
ggplot(model.data, aes(index, .std.resid)) +
geom_point(aes(color = BetterLife), alpha = .5) +
theme_bw()
The standardized residuals are distributed pretty well, and there are no visible outliers here.
## # A tibble: 0 x 15
## # … with 15 variables: BetterLife <fct>, CareerSat <fct>, JobSat <fct>,
## # YearsCodePro <dbl>, Age1stCode <dbl>, Gender <chr>, Age <dbl>,
## # .fitted <dbl>, .se.fit <dbl>, .resid <dbl>, .hat <dbl>, .sigma <dbl>,
## # .cooksd <dbl>, .std.resid <dbl>, index <int>
Indeed, filtering for large standardized residuals returns an empty table, so there is nothing to remove.
Multicollinearity
## GVIF Df GVIF^(1/(2*Df))
## CareerSat 2.321464 4 1.111016
## JobSat 2.326986 4 1.111346
## YearsCodePro 3.018257 1 1.737313
## Age1stCode 1.342531 1 1.158676
## Gender 1.018154 6 1.001500
## Age 2.523435 1 1.588532
No value exceeds 5, so at least multicollinearity is not a problem.
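For terms with more than one degree of freedom (the factors), GVIF^(1/(2*Df)) is the value to compare across terms; it is on the scale of the square root of a VIF, so the usual VIF cut-off of 5 corresponds to sqrt(5), about 2.24. A sketch that recomputes the last column from the first two (values copied from the table above, which looks like the output of `car::vif()`; an assumption, since the call is not shown):

```r
# GVIF and Df copied from the table above
gvif <- c(CareerSat = 2.321464, JobSat = 2.326986, YearsCodePro = 3.018257,
          Age1stCode = 1.342531, Gender = 1.018154, Age = 2.523435)
dfs  <- c(CareerSat = 4, JobSat = 4, YearsCodePro = 1,
          Age1stCode = 1, Gender = 6, Age = 1)

adj <- gvif^(1 / (2 * dfs))  # GVIF^(1/(2*Df))
round(adj, 3)
all(adj < sqrt(5))           # TRUE: no term exceeds the sqrt(5) cut-off
```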
0.6 Conclusion
Taking everything into consideration, a proper logistic regression should start with good descriptive statistics, and this is where this research falls short. On the other hand, hypotheses RQ2-RQ4 were confirmed, while RQ1 needs more examination, since I am not used to this gender classification.
However, the overall poor model fit might be explained by the culture of programming, where common models of analysis may not apply. So further research should pay more attention to the specific case of Stack Overflow (https://stackoverflow.com/).
0.7 Resources
[1] Gallup, Inc. (2018, April 3). Americans More Optimistic About Future of Next Generation. Gallup.com. https://news.gallup.com/poll/232076/americans-optimistic-future-next-generation.aspx
[2] Xanthopoulou, D., Bakker, A. B., Demerouti, E., & Schaufeli, W. B. (2007). The role of personal resources in the job demands-resources model. International Journal of Stress Management, 14(2), 121–141. https://doi.org/10.1037/1072-5245.14.2.121
[3] Burke, R. J. (1991). Early Work and Career Experiences of Female and Male Managers and Professionals: Reasons for Optimism? Canadian Journal of Administrative Sciences / Revue Canadienne Des Sciences de l’Administration, 8(4), 224–230. https://doi.org/10.1111/j.1936-4490.1991.tb00565.x
[4] Cheng, G. H.-L., & Chan, D. K.-S. (2008). Who Suffers More from Job Insecurity? A Meta-Analytic Review. Applied Psychology, 57(2), 272–303. https://doi.org/10.1111/j.1464-0597.2007.00312.x
[5] APA Handbook of Industrial and Organizational Psychology. (n.d.). American Psychological Association. Retrieved February 14, 2020, from https://www.apa.org/pubs/books/4311502