hw1

0.1 Intro
0.2 RQ
0.3 assumptions
0.4 model
0.5 diagnostic
0.6 Conclusion
0.7 Resources

library(readr)
survey_results_public <- read_csv("survey_results_public.csv")

library(dplyr)
df = survey_results_public

# convert the categorical survey columns to factors
cols = c(2:12, 17:23, 25)
df[cols] <- lapply(df[cols], as.factor)



# recode the two text answers so the year columns can be converted to numbers
df$YearsCode = ifelse(df$YearsCode == "Less than 1 year", 0, df$YearsCode)
df$YearsCode = ifelse(df$YearsCode == "More than 50 years", 51, df$YearsCode)

# YearsCodePro must be recoded from its own values, not from YearsCode
df$YearsCodePro = ifelse(df$YearsCodePro == "Less than 1 year", 0, df$YearsCodePro)
df$YearsCodePro = ifelse(df$YearsCodePro == "More than 50 years", 51, df$YearsCodePro)

df$BetterLife =  as.factor(df$BetterLife)

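# convert the numeric columns; answers that are still text become NA (with a warning)
num_cols = c("YearsCode", "Age", "Age1stCode", "YearsCodePro")
df[num_cols] <- lapply(df[num_cols], as.numeric)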
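The two-step `ifelse()` recoding can also be wrapped in one small helper. The helper recodes a column from its own values and converts it to numeric in one step. This is a sketch: `to_years` is a hypothetical name, and it only handles the two text answers recoded above.

```r
# Hypothetical helper: map the two text answers used in the "years" columns
# to numbers, then convert the whole column to numeric
to_years <- function(x) {
  x <- as.character(x)
  x[which(x == "Less than 1 year")] <- "0"
  x[which(x == "More than 50 years")] <- "51"
  as.numeric(x)
}

answers <- c("Less than 1 year", "7", "More than 50 years", NA)
to_years(answers)
```

It could be applied to both columns with `df[c("YearsCode", "YearsCodePro")] <- lapply(df[c("YearsCode", "YearsCodePro")], to_years)`. Each column is recoded only from its own values, so one column cannot accidentally be filled from another.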

as.data.frame(summary(df$MainBranch))

##                                                                               summary(df$MainBranch)
## I am a developer by profession                                                                 65679
## I am a student who is learning to code                                                         10189
## I am not primarily a developer, but I write code sometimes as part of my work                   7539
## I code primarily as a hobby                                                                     3340
## I used to be a developer by profession, but no longer am                                        1584
## NA's                                                                                             552

0.1 Intro

“Will People Born Today Have a Better Life Than Their Parents?” [1] is a popular question for measuring optimism and faith in the future.

Programming is a popular profession. Only 3.8 % of the respondents who answered this question (3340 out of 88331) code primarily as a hobby; the rest code as part of their work, are students learning to code, or used to be developers. In this homework I look at coding as a profession, so I will try to link optimism with some of the working conditions. Using structural equation modeling (SEM), Xanthopoulou et al. [2] showed the major role of personal resources (e.g. optimism) in the work environment, and found that job satisfaction is connected with optimism. Also, [3] found a correlation between an early career start and optimism, so we could expect more optimism among people who started working earlier. Cheng [4] uses Hulin's theory of job adaptation [5], which describes the gradual process by which people integrate into the institution of their career. So we could expect less experience to lead to lower optimism.

0.2 RQ

RQ1: Gender has no influence on optimism
RQ2: Satisfaction with the job has a positive influence on optimism
RQ3: An earlier start of a career has a positive influence on optimism
RQ4: A longer career has a positive influence on optimism

Let's take a look at our data, and at the data we do not have.

df %>% dplyr::select(Employment, Gender, CareerSat, JobSat, LastHireDate, YearsCode, YearsCodePro, Age1stCode, Age, BetterLife) -> df1

library(mice)
library(VIM)

#md.pattern(df1)

# plot the share of missing values per variable and the missingness patterns
mice_plot <- aggr(df1, col = c('navyblue', 'yellow'), numbers = TRUE, sortVars = TRUE,
                  labels = names(df1), cex.axis = .7, gap = 3,
                  ylab = c("Missing data", "Pattern"))

## 
##  Variables sorted by number of missings: 
##      Variable      Count
##        JobSat 0.20133209
##     CareerSat 0.18041695
##  YearsCodePro 0.16504843
##           Age 0.10882846
##  LastHireDate 0.10158298
##        Gender 0.03911884
##    BetterLife 0.02940945
##    Age1stCode 0.02019509
##    Employment 0.01914877
##     YearsCode 0.01063195
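The proportions printed by `aggr()` can be cross-checked in base R. The share of missing values in each column is the column mean of `is.na()`. Below is a sketch on a small hypothetical data frame standing in for `df1`.

```r
# Hypothetical toy data with some gaps
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# is.na() on a data frame gives a logical matrix; its column means are the
# share of missing values per variable, sorted from most to least missing
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
missing_share
```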

Most of the missing values are in job and career satisfaction, probably because some people have never had a job. We can filter them out and impute the rest of the data.

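# keep rows that are complete in Gender, CareerSat and Age (columns 2, 3 and 9 of df1);
# na.omit() has no column argument, so it cannot make this selection
df2 = df1[complete.cases(df1[, c("Gender", "CareerSat", "Age")]), ]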
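On a data frame, `na.omit()` takes no argument for choosing columns. Anything extra passed to it is silently ignored, and every row with a missing value anywhere is dropped. To drop rows only when specific columns are missing, index with `complete.cases()` on those columns. Below is a small sketch on hypothetical toy data; the column names are made up.

```r
# Hypothetical toy data
toy <- data.frame(
  gender  = c("Man", NA, "Woman", "Man"),
  job_sat = c("Very satisfied", "Slightly satisfied", NA, NA),
  age     = c(30, 25, NA, 41)
)

# `cols` is silently ignored: every incomplete row is dropped
all_complete <- na.omit(toy, cols = c(1, 3))
nrow(all_complete)   # 1

# only rows missing gender or age are dropped; gaps in job_sat remain for imputation
keep <- complete.cases(toy[, c("gender", "age")])
some_complete <- toy[keep, ]
nrow(some_complete)  # 2
```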

imputed_Data <- mice(df2, m=5, maxit = 50, method = 'pmm', seed = 1)
#summary(imputed_Data)

completeData <- complete(imputed_Data,2)

We will keep this imputed data for later, to compare it with our final model.

library(psych)
library(kableExtra)
describeBy(df2, group = df2$BetterLife)

## 
##  Descriptive statistics by group 
## group: No
##               vars     n  mean   sd median trimmed  mad min  max range  skew
## Employment*      1 22972  1.23 0.61      1    1.04 0.00   1    3     2  2.40
## Gender*          2 22972   NaN   NA     NA     NaN   NA Inf -Inf  -Inf    NA
## CareerSat*       3 22972  3.45 1.36      3    3.56 1.48   1    5     4 -0.22
## JobSat*          4 22972  3.25 1.36      3    3.31 1.48   1    5     4 -0.05
## LastHireDate*    5 22972  3.15 1.66      4    3.13 1.48   1    6     5 -0.11
## YearsCode        6 22972 13.68 9.42     11   12.42 7.41   0   51    51  1.13
## YearsCodePro     7 22972 13.68 9.42     11   12.42 7.41   0   51    51  1.13
## Age1stCode       8 22972 15.33 4.84     15   15.02 4.45   5   65    60  1.08
## Age              9 22972 32.60 8.83     30   31.54 7.41   1   99    98  1.22
## BetterLife*     10 22972  1.00 0.00      1    1.00 0.00   1    1     0   NaN
##               kurtosis   se
## Employment*       4.03 0.00
## Gender*             NA   NA
## CareerSat*       -1.11 0.01
## JobSat*          -1.14 0.01
## LastHireDate*    -1.50 0.01
## YearsCode         0.81 0.06
## YearsCodePro      0.81 0.06
## Age1stCode        3.70 0.03
## Age               2.16 0.06
## BetterLife*        NaN 0.00
## ------------------------------------------------------------ 
## group: Yes
##               vars     n  mean   sd median trimmed  mad min  max range  skew
## Employment*      1 38807  1.23 0.60      1    1.04 0.00   1    3     2  2.42
## Gender*          2 38807   NaN   NA     NA     NaN   NA Inf -Inf  -Inf    NA
## CareerSat*       3 38807  3.68 1.33      3    3.82 2.97   1    5     4 -0.45
## JobSat*          4 38807  3.36 1.37      3    3.45 1.48   1    5     4 -0.15
## LastHireDate*    5 38807  3.09 1.64      4    3.07 1.48   1    6     5 -0.08
## YearsCode        6 38807 12.23 8.42     10   11.04 7.41   0   51    51  1.32
## YearsCodePro     7 38807 12.23 8.42     10   11.04 7.41   0   51    51  1.32
## Age1stCode       8 38807 15.34 4.50     15   15.14 4.45   5   79    74  0.91
## Age              9 38807 30.77 7.94     29   29.80 5.93   1   99    98  1.38
## BetterLife*     10 38807  2.00 0.00      2    2.00 0.00   2    2     0   NaN
##               kurtosis   se
## Employment*       4.12 0.00
## Gender*             NA   NA
## CareerSat*       -0.99 0.01
## JobSat*          -1.15 0.01
## LastHireDate*    -1.49 0.01
## YearsCode         1.68 0.04
## YearsCodePro      1.68 0.04
## Age1stCode        3.86 0.02
## Age               2.92 0.04
## BetterLife*        NaN 0.00

library(formattable)
formattable(head(df2))

Employment	Gender	CareerSat	JobSat	LastHireDate	YearsCode	YearsCodePro	Age1stCode	Age	BetterLife
Employed full-time	Man	Slightly satisfied	Slightly satisfied	1-2 years ago	3	3	22	28	Yes
Employed full-time	Man	Very satisfied	Slightly satisfied	Less than a year ago	3	3	16	22	Yes
Employed full-time	Man	Very dissatisfied	Slightly dissatisfied	Less than a year ago	16	16	14	30	Yes
Employed full-time	Man	Very satisfied	Slightly satisfied	1-2 years ago	13	13	15	28	No
Independent contractor, freelancer, or self-employed	Man	Slightly satisfied	Neither satisfied nor dissatisfied	NA - I am an independent contractor or self employed	6	6	17	42	No
Employed full-time	Man	Slightly satisfied	Slightly satisfied	Less than a year ago	12	12	11	23	No

Let's check the cell counts, because we have to. With a dataset this large, though, there is little chance of a cell having fewer than 40 observations, and the minimum sample size is clearly not a problem.

0.3 assumptions

xtabs(~ BetterLife + CareerSat, data = df2)

##           CareerSat
## BetterLife Neither satisfied nor dissatisfied Slightly dissatisfied
##        No                                2425                  2721
##        Yes                               3131                  3563
##           CareerSat
## BetterLife Slightly satisfied Very dissatisfied Very satisfied
##        No                8372              1034           8420
##        Yes              13123              1911          17079
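The same check can be run for every categorical predictor at once by taking the smallest cell of each outcome-by-predictor table. Below is a sketch on hypothetical toy data. `min_cell` is a hypothetical helper, and `threshold` is whatever minimum count we accept (the text uses 40).

```r
# Hypothetical toy data standing in for df2
toy <- data.frame(
  better = c("No", "Yes", "Yes", "No", "Yes", "Yes"),
  career = c("Low", "Low", "High", "High", "High", "Low"),
  job    = c("Low", "High", "High", "Low", "High", "High")
)

# Smallest cell of the outcome-by-predictor table for each predictor
min_cell <- function(data, outcome, predictors, threshold) {
  counts <- vapply(predictors,
                   function(p) min(table(data[[outcome]], data[[p]])),
                   integer(1))
  data.frame(predictor = predictors,
             min_count = unname(counts),
             ok        = unname(counts >= threshold))
}

min_cell(toy, "better", c("career", "job"), threshold = 1)
```

Here `job` has an empty cell: no "No" respondent has a "High" job rating. On the survey data the call would look like `min_cell(df2, "BetterLife", c("CareerSat", "JobSat"), threshold = 40)`.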

Let's look at the outliers.

par(mfrow=c(2,2))
boxplot(df2$YearsCode)
boxplot(df2$YearsCodePro)
boxplot(df2$Age1stCode)
boxplot(df2$Age)

# RO (remove outliers): set values outside the Tukey fences, Q1 - 1.5*IQR and Q3 + 1.5*IQR, to NA
RO <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

Then we remove them, in a single pass:

df2$YearsCode = RO(df2$YearsCode)
df2$YearsCodePro = RO(df2$YearsCodePro)
df2$Age1stCode = RO(df2$Age1stCode)
df2$Age = RO(df2$Age)

df2 %>% na.omit() -> df2
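Removing outliers only once matters because the fences depend on the data they are computed from. After the most extreme points are gone the quartiles shift, and a second pass can flag points the first pass kept. Below is a sketch on hypothetical numbers. `tukey_na` is a hypothetical restatement of the `RO()` logic, not a different method.

```r
# Hypothetical restatement of RO(): values outside the Tukey fences
# (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) are replaced with NA
tukey_na <- function(x) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  h <- 1.5 * IQR(x, na.rm = TRUE)
  x[which(x < qnt[1] - h | x > qnt[2] + h)] <- NA
  x
}

x <- c(1:10, 17, 40)
pass1 <- tukey_na(x)
which(is.na(pass1))   # 12: only 40 is beyond the upper fence (17.5)

pass2 <- tukey_na(pass1[!is.na(pass1)])
which(is.na(pass2))   # 11: without 40 the upper fence drops to 16, so 17 is flagged too
```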

Now the correlations between the numeric variables:

library(corrplot)
correlations <- cor(df2[,c(6:9)])
corrplot(correlations, method="circle")

YearsCodePro is highly correlated with YearsCode, so let's exclude YearsCode, since we are interested in the career.
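Highly correlated pairs can also be listed programmatically instead of being read off the plot. Below is a sketch on hypothetical toy data. The 0.8 cutoff is an assumption, not a rule from the text.

```r
# Hypothetical toy data: `b` is almost a linear copy of `a`, `c` is unrelated
toy <- data.frame(
  a = 1:10,
  b = 2 * (1:10) + c(0.3, -0.2, 0.1, 0, -0.3, 0.2, -0.1, 0.3, 0, -0.2),
  c = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)
)

r <- cor(toy)
cutoff <- 0.8  # assumed threshold for "highly correlated"

# keep the upper triangle only, so each pair is reported once
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(r)[idx[, "row"]],
                      var2 = colnames(r)[idx[, "col"]],
                      r    = r[idx])
flagged
```

On the survey data the same lines would run on `cor(df2[, c(6:9)])`.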

0.4 model

Let's build our base model:

mlg1 <- glm(BetterLife ~ Employment + CareerSat + JobSat, data = df2, family = "binomial")
summary(mlg1)

## 
## Call:
## glm(formula = BetterLife ~ Employment + CareerSat + JobSat, family = "binomial", 
##     data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5396  -1.3812   0.8822   0.9777   1.1348  
## 
## Coefficients:
##                                                                 Estimate
## (Intercept)                                                     0.290054
## EmploymentEmployed part-time                                    0.054970
## EmploymentIndependent contractor, freelancer, or self-employed  0.043073
## CareerSatSlightly dissatisfied                                  0.088125
## CareerSatSlightly satisfied                                     0.197477
## CareerSatVery dissatisfied                                      0.472137
## CareerSatVery satisfied                                         0.473048
## JobSatSlightly dissatisfied                                    -0.048260
## JobSatSlightly satisfied                                        0.002233
## JobSatVery dissatisfied                                        -0.189046
## JobSatVery satisfied                                           -0.020303
##                                                                Std. Error
## (Intercept)                                                      0.032924
## EmploymentEmployed part-time                                     0.044756
## EmploymentIndependent contractor, freelancer, or self-employed   0.030990
## CareerSatSlightly dissatisfied                                   0.039838
## CareerSatSlightly satisfied                                      0.032980
## CareerSatVery dissatisfied                                       0.053094
## CareerSatVery satisfied                                          0.035479
## JobSatSlightly dissatisfied                                      0.034275
## JobSatSlightly satisfied                                         0.030454
## JobSatVery dissatisfied                                          0.045441
## JobSatVery satisfied                                             0.033606
##                                                                z value Pr(>|z|)
## (Intercept)                                                      8.810  < 2e-16
## EmploymentEmployed part-time                                     1.228    0.219
## EmploymentIndependent contractor, freelancer, or self-employed   1.390    0.165
## CareerSatSlightly dissatisfied                                   2.212    0.027
## CareerSatSlightly satisfied                                      5.988 2.13e-09
## CareerSatVery dissatisfied                                       8.893  < 2e-16
## CareerSatVery satisfied                                         13.333  < 2e-16
## JobSatSlightly dissatisfied                                     -1.408    0.159
## JobSatSlightly satisfied                                         0.073    0.942
## JobSatVery dissatisfied                                         -4.160 3.18e-05
## JobSatVery satisfied                                            -0.604    0.546
##                                                                   
## (Intercept)                                                    ***
## EmploymentEmployed part-time                                      
## EmploymentIndependent contractor, freelancer, or self-employed    
## CareerSatSlightly dissatisfied                                 *  
## CareerSatSlightly satisfied                                    ***
## CareerSatVery dissatisfied                                     ***
## CareerSatVery satisfied                                        ***
## JobSatSlightly dissatisfied                                       
## JobSatSlightly satisfied                                          
## JobSatVery dissatisfied                                        ***
## JobSatVery satisfied                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75357  on 57479  degrees of freedom
## Residual deviance: 74919  on 57469  degrees of freedom
## AIC: 74941
## 
## Number of Fisher Scoring iterations: 4

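As we can see, I forgot to re-level the satisfaction factors. R sorted their levels alphabetically, so the base level is "Neither satisfied nor dissatisfied". Let's fix the order now: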

# order the levels from least to most satisfied; the first level becomes the baseline
sat_levels = c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied")
df2$CareerSat = factor(df2$CareerSat, levels = sat_levels)
df2$JobSat = factor(df2$JobSat, levels = sat_levels)

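Now the lowest job and career satisfaction ("Very dissatisfied") is our base level, which makes the coefficients much easier to read. The fit itself does not change: reordering the levels only changes the baseline that the other levels are compared with.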
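The effect of reordering levels can be seen on a tiny hypothetical example: the same data are fitted twice, once with alphabetical levels and once with a chosen order. The deviance and fitted probabilities are identical. Only the intercept (and the other coefficients) change, because they are expressed relative to a different baseline group.

```r
# Hypothetical toy data: a binary outcome and a three-level rating
toy <- data.frame(
  y   = c(0, 1, 1, 0,  1, 0, 1, 1,  0, 1, 0, 0),
  sat = rep(c("Low", "Mid", "High"), each = 4)
)
toy$sat_alpha <- factor(toy$sat)                                    # baseline "High" (alphabetical)
toy$sat_order <- factor(toy$sat, levels = c("Low", "Mid", "High"))  # baseline "Low"

fit_alpha <- glm(y ~ sat_alpha, data = toy, family = binomial)
fit_order <- glm(y ~ sat_order, data = toy, family = binomial)

# same fit: only the reference level moved
c(deviance(fit_alpha), deviance(fit_order))

# intercepts are the log-odds of each baseline group:
# High has 1 "yes" out of 4 -> log(1/3); Low has 2 out of 4 -> log(1) = 0
c(coef(fit_alpha)[[1]], coef(fit_order)[[1]])
```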

mlg2 <- glm(BetterLife ~ Employment + CareerSat + JobSat, data = df2, family = "binomial")
summary(mlg2)

## 
## Call:
## glm(formula = BetterLife ~ Employment + CareerSat + JobSat, family = "binomial", 
##     data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5396  -1.3812   0.8822   0.9777   1.1348  
## 
## Coefficients:
##                                                                  Estimate
## (Intercept)                                                     0.5731454
## EmploymentEmployed part-time                                    0.0549699
## EmploymentIndependent contractor, freelancer, or self-employed  0.0430733
## CareerSatSlightly dissatisfied                                 -0.3840118
## CareerSatNeither satisfied nor dissatisfied                    -0.4721371
## CareerSatSlightly satisfied                                    -0.2746599
## CareerSatVery satisfied                                         0.0009108
## JobSatSlightly dissatisfied                                     0.1407860
## JobSatNeither satisfied nor dissatisfied                        0.1890460
## JobSatSlightly satisfied                                        0.1912792
## JobSatVery satisfied                                            0.1687434
##                                                                Std. Error
## (Intercept)                                                     0.0442484
## EmploymentEmployed part-time                                    0.0447565
## EmploymentIndependent contractor, freelancer, or self-employed  0.0309898
## CareerSatSlightly dissatisfied                                  0.0505812
## CareerSatNeither satisfied nor dissatisfied                     0.0530936
## CareerSatSlightly satisfied                                     0.0482488
## CareerSatVery satisfied                                         0.0496199
## JobSatSlightly dissatisfied                                     0.0421903
## JobSatNeither satisfied nor dissatisfied                        0.0454415
## JobSatSlightly satisfied                                        0.0418779
## JobSatVery satisfied                                            0.0439376
##                                                                z value Pr(>|z|)
## (Intercept)                                                     12.953  < 2e-16
## EmploymentEmployed part-time                                     1.228 0.219372
## EmploymentIndependent contractor, freelancer, or self-employed   1.390 0.164553
## CareerSatSlightly dissatisfied                                  -7.592 3.15e-14
## CareerSatNeither satisfied nor dissatisfied                     -8.893  < 2e-16
## CareerSatSlightly satisfied                                     -5.693 1.25e-08
## CareerSatVery satisfied                                          0.018 0.985355
## JobSatSlightly dissatisfied                                      3.337 0.000847
## JobSatNeither satisfied nor dissatisfied                         4.160 3.18e-05
## JobSatSlightly satisfied                                         4.568 4.93e-06
## JobSatVery satisfied                                             3.841 0.000123
##                                                                   
## (Intercept)                                                    ***
## EmploymentEmployed part-time                                      
## EmploymentIndependent contractor, freelancer, or self-employed    
## CareerSatSlightly dissatisfied                                 ***
## CareerSatNeither satisfied nor dissatisfied                    ***
## CareerSatSlightly satisfied                                    ***
## CareerSatVery satisfied                                           
## JobSatSlightly dissatisfied                                    ***
## JobSatNeither satisfied nor dissatisfied                       ***
## JobSatSlightly satisfied                                       ***
## JobSatVery satisfied                                           ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75357  on 57479  degrees of freedom
## Residual deviance: 74919  on 57469  degrees of freedom
## AIC: 74941
## 
## Number of Fisher Scoring iterations: 4

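Employment is not significant, so let's replace it with the years of professional coding (YearsCodePro) and the age at which respondents started coding (Age1stCode).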

mlg3 <- glm(BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode, data = df2, family = "binomial")
summary(mlg3)

## 
## Call:
## glm(formula = BetterLife ~ CareerSat + JobSat + YearsCodePro + 
##     Age1stCode, family = "binomial", data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6340  -1.3586   0.8610   0.9621   1.2865  
## 
## Coefficients:
##                                              Estimate Std. Error z value
## (Intercept)                                  0.905860   0.063770  14.205
## CareerSatSlightly dissatisfied              -0.387145   0.050673  -7.640
## CareerSatNeither satisfied nor dissatisfied -0.485309   0.053202  -9.122
## CareerSatSlightly satisfied                 -0.279607   0.048328  -5.786
## CareerSatVery satisfied                     -0.009344   0.049699  -0.188
## JobSatSlightly dissatisfied                  0.137550   0.042275   3.254
## JobSatNeither satisfied nor dissatisfied     0.177702   0.045525   3.903
## JobSatSlightly satisfied                     0.189553   0.041956   4.518
## JobSatVery satisfied                         0.178589   0.044023   4.057
## YearsCodePro                                -0.019979   0.001351 -14.791
## Age1stCode                                  -0.005690   0.002375  -2.395
##                                             Pr(>|z|)    
## (Intercept)                                  < 2e-16 ***
## CareerSatSlightly dissatisfied              2.17e-14 ***
## CareerSatNeither satisfied nor dissatisfied  < 2e-16 ***
## CareerSatSlightly satisfied                 7.23e-09 ***
## CareerSatVery satisfied                      0.85087    
## JobSatSlightly dissatisfied                  0.00114 ** 
## JobSatNeither satisfied nor dissatisfied    9.49e-05 ***
## JobSatSlightly satisfied                    6.25e-06 ***
## JobSatVery satisfied                        4.98e-05 ***
## YearsCodePro                                 < 2e-16 ***
## Age1stCode                                   0.01661 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75357  on 57479  degrees of freedom
## Residual deviance: 74687  on 57469  degrees of freedom
## AIC: 74709
## 
## Number of Fisher Scoring iterations: 4

Next, let's add Gender, which RQ1 is about and which is traditionally included in such models, together with Age.

mlg4 <- glm(BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode + Gender + Age, data = df2, family = "binomial")
summary(mlg4)

## 
## Call:
## glm(formula = BetterLife ~ CareerSat + JobSat + YearsCodePro + 
##     Age1stCode + Gender + Age, family = "binomial", data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0480  -1.3436   0.8413   0.9569   1.6490  
## 
## Coefficients:
##                                                                    Estimate
## (Intercept)                                                        1.483756
## CareerSatSlightly dissatisfied                                    -0.375586
## CareerSatNeither satisfied nor dissatisfied                       -0.478794
## CareerSatSlightly satisfied                                       -0.276521
## CareerSatVery satisfied                                           -0.012703
## JobSatSlightly dissatisfied                                        0.130877
## JobSatNeither satisfied nor dissatisfied                           0.159409
## JobSatSlightly satisfied                                           0.179867
## JobSatVery satisfied                                               0.180895
## YearsCodePro                                                       0.003721
## Age1stCode                                                         0.006921
## GenderMan;Non-binary, genderqueer, or gender non-conforming       -0.466250
## GenderNon-binary, genderqueer, or gender non-conforming           -0.683841
## GenderWoman                                                       -0.517619
## GenderWoman;Man                                                    0.958924
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming  0.436499
## GenderWoman;Non-binary, genderqueer, or gender non-conforming     -1.142684
## Age                                                               -0.032988
##                                                                   Std. Error
## (Intercept)                                                         0.072939
## CareerSatSlightly dissatisfied                                      0.050927
## CareerSatNeither satisfied nor dissatisfied                         0.053471
## CareerSatSlightly satisfied                                         0.048574
## CareerSatVery satisfied                                             0.049949
## JobSatSlightly dissatisfied                                         0.042476
## JobSatNeither satisfied nor dissatisfied                            0.045745
## JobSatSlightly satisfied                                            0.042151
## JobSatVery satisfied                                                0.044232
## YearsCodePro                                                        0.002131
## Age1stCode                                                          0.002493
## GenderMan;Non-binary, genderqueer, or gender non-conforming         0.191272
## GenderNon-binary, genderqueer, or gender non-conforming             0.114235
## GenderWoman                                                         0.033754
## GenderWoman;Man                                                     0.413599
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming   0.474418
## GenderWoman;Non-binary, genderqueer, or gender non-conforming       0.206510
## Age                                                                 0.002148
##                                                                   z value
## (Intercept)                                                        20.342
## CareerSatSlightly dissatisfied                                     -7.375
## CareerSatNeither satisfied nor dissatisfied                        -8.954
## CareerSatSlightly satisfied                                        -5.693
## CareerSatVery satisfied                                            -0.254
## JobSatSlightly dissatisfied                                         3.081
## JobSatNeither satisfied nor dissatisfied                            3.485
## JobSatSlightly satisfied                                            4.267
## JobSatVery satisfied                                                4.090
## YearsCodePro                                                        1.746
## Age1stCode                                                          2.776
## GenderMan;Non-binary, genderqueer, or gender non-conforming        -2.438
## GenderNon-binary, genderqueer, or gender non-conforming            -5.986
## GenderWoman                                                       -15.335
## GenderWoman;Man                                                     2.318
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming   0.920
## GenderWoman;Non-binary, genderqueer, or gender non-conforming      -5.533
## Age                                                               -15.360
##                                                                   Pr(>|z|)    
## (Intercept)                                                        < 2e-16 ***
## CareerSatSlightly dissatisfied                                    1.64e-13 ***
## CareerSatNeither satisfied nor dissatisfied                        < 2e-16 ***
## CareerSatSlightly satisfied                                       1.25e-08 ***
## CareerSatVery satisfied                                           0.799249    
## JobSatSlightly dissatisfied                                       0.002062 ** 
## JobSatNeither satisfied nor dissatisfied                          0.000493 ***
## JobSatSlightly satisfied                                          1.98e-05 ***
## JobSatVery satisfied                                              4.32e-05 ***
## YearsCodePro                                                      0.080854 .  
## Age1stCode                                                        0.005503 ** 
## GenderMan;Non-binary, genderqueer, or gender non-conforming       0.014784 *  
## GenderNon-binary, genderqueer, or gender non-conforming           2.15e-09 ***
## GenderWoman                                                        < 2e-16 ***
## GenderWoman;Man                                                   0.020423 *  
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.357535    
## GenderWoman;Non-binary, genderqueer, or gender non-conforming     3.14e-08 ***
## Age                                                                < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75357  on 57479  degrees of freedom
## Residual deviance: 74136  on 57462  degrees of freedom
## AIC: 74172
## 
## Number of Fisher Scoring iterations: 4

Let's check whether Gender and Age improved our model:

anova(mlg3, mlg4, test="Chisq")

## Analysis of Deviance Table
## 
## Model 1: BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode
## Model 2: BetterLife ~ CareerSat + JobSat + YearsCodePro + Age1stCode + 
##     Gender + Age
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1     57469      74687                          
## 2     57462      74136  7   550.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, they did. The test above covers Gender and Age jointly, so let's look at each term added in sequence:

anova(mlg4, test="Chisq")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: BetterLife
## 
## Terms added sequentially (first to last)
## 
## 
##              Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                         57479      75357              
## CareerSat     4   411.69     57475      74945 < 2.2e-16 ***
## JobSat        4    22.56     57471      74922 0.0001549 ***
## YearsCodePro  1   229.74     57470      74693 < 2.2e-16 ***
## Age1stCode    1     5.74     57469      74687 0.0166171 *  
## Gender        6   316.03     57463      74371 < 2.2e-16 ***
## Age           1   234.88     57462      74136 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

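Every term is significant when added in this order. Nice. Note that these sequential tests depend on the order of the terms: in the full model, the Wald test for YearsCodePro is only marginal (p = 0.081).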
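Both analysis-of-deviance tables rest on the same likelihood-ratio test. The drop in deviance between nested models is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below reproduces two of the reported p-values from the printed deviances. Those deviances are rounded, so the last digits can differ slightly. `lrt_p` is a hypothetical helper name.

```r
# p-value of a likelihood-ratio test from a deviance drop and its degrees of freedom
lrt_p <- function(deviance_drop, df) {
  pchisq(deviance_drop, df = df, lower.tail = FALSE)
}

# values copied from the tables above
p_gender_age <- lrt_p(550.92, df = 7)  # mlg3 vs mlg4: Gender (6 df) + Age (1 df)
p_age1st     <- lrt_p(5.74, df = 1)    # Age1stCode, added after YearsCodePro
c(p_gender_age, p_age1st)
```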

Let's write out our model equation. Since this is a logistic regression, the left-hand side is the log-odds that BetterLife is "Yes", and there is no additive error term:

cc = mlg4$coefficients
(eqn <- paste("logit(p) =", paste(round(cc[1],2), paste(round(cc[-1],2), names(cc[-1]), sep=" * ", collapse=" + "), sep=" + ")))

## [1] "logit(p) = 1.48 + -0.38 * CareerSatSlightly dissatisfied + -0.48 * CareerSatNeither satisfied nor dissatisfied + -0.28 * CareerSatSlightly satisfied + -0.01 * CareerSatVery satisfied + 0.13 * JobSatSlightly dissatisfied + 0.16 * JobSatNeither satisfied nor dissatisfied + 0.18 * JobSatSlightly satisfied + 0.18 * JobSatVery satisfied + 0 * YearsCodePro + 0.01 * Age1stCode + -0.47 * GenderMan;Non-binary, genderqueer, or gender non-conforming + -0.68 * GenderNon-binary, genderqueer, or gender non-conforming + -0.52 * GenderWoman + 0.96 * GenderWoman;Man + 0.44 * GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming + -1.14 * GenderWoman;Non-binary, genderqueer, or gender non-conforming + -0.03 * Age"
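For reference, here is the standard form of a fitted logistic model. $p_i$ is the probability that respondent $i$ answers "Yes" to BetterLife, and the $x_{ij}$ are the dummy-coded factor levels and the numeric predictors. The randomness comes from the Bernoulli outcome itself, which is why the equation has no error term.

```latex
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\!\big(-(\beta_0 + \sum_{j} \beta_j x_{ij})\big)},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Exponentiating a coefficient gives its odds ratio. For example, for GenderWoman $e^{-0.518} \approx 0.596$, which matches the OR reported below.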

exp(cbind(OR = coef(mlg4), confint(mlg4)))

##                                                                          OR
## (Intercept)                                                       4.4094747
## CareerSatSlightly dissatisfied                                    0.6868864
## CareerSatNeither satisfied nor dissatisfied                       0.6195304
## CareerSatSlightly satisfied                                       0.7584177
## CareerSatVery satisfied                                           0.9873773
## JobSatSlightly dissatisfied                                       1.1398276
## JobSatNeither satisfied nor dissatisfied                          1.1728174
## JobSatSlightly satisfied                                          1.1970578
## JobSatVery satisfied                                              1.1982894
## YearsCodePro                                                      1.0037279
## Age1stCode                                                        1.0069451
## GenderMan;Non-binary, genderqueer, or gender non-conforming       0.6273502
## GenderNon-binary, genderqueer, or gender non-conforming           0.5046746
## GenderWoman                                                       0.5959381
## GenderWoman;Man                                                   2.6088865
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 1.5472808
## GenderWoman;Non-binary, genderqueer, or gender non-conforming     0.3189619
## Age                                                               0.9675503
##                                                                       2.5 %
## (Intercept)                                                       3.8227457
## CareerSatSlightly dissatisfied                                    0.6214952
## CareerSatNeither satisfied nor dissatisfied                       0.5577673
## CareerSatSlightly satisfied                                       0.6893486
## CareerSatVery satisfied                                           0.8950569
## JobSatSlightly dissatisfied                                       1.0487313
## JobSatNeither satisfied nor dissatisfied                          1.0722096
## JobSatSlightly satisfied                                          1.1020595
## JobSatVery satisfied                                              1.0987073
## YearsCodePro                                                      0.9995423
## Age1stCode                                                        1.0020361
## GenderMan;Non-binary, genderqueer, or gender non-conforming       0.4314732
## GenderNon-binary, genderqueer, or gender non-conforming           0.4032243
## GenderWoman                                                       0.5578043
## GenderWoman;Man                                                   1.2346419
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 0.6445233
## GenderWoman;Non-binary, genderqueer, or gender non-conforming     0.2110720
## Age                                                               0.9634859
##                                                                      97.5 %
## (Intercept)                                                       5.0880546
## CareerSatSlightly dissatisfied                                    0.7588282
## CareerSatNeither satisfied nor dissatisfied                       0.6878406
## CareerSatSlightly satisfied                                       0.8339485
## CareerSatVery satisfied                                           1.0886574
## JobSatSlightly dissatisfied                                       1.2387380
## JobSatNeither satisfied nor dissatisfied                          1.2828051
## JobSatSlightly satisfied                                          1.3000733
## JobSatVery satisfied                                              1.3067331
## YearsCodePro                                                      1.0079289
## Age1stCode                                                        1.0118773
## GenderMan;Non-binary, genderqueer, or gender non-conforming       0.9151325
## GenderNon-binary, genderqueer, or gender non-conforming           0.6312645
## GenderWoman                                                       0.6367214
## GenderWoman;Man                                                   6.4064091
## GenderWoman;Man;Non-binary, genderqueer, or gender non-conforming 4.2885341
## GenderWoman;Non-binary, genderqueer, or gender non-conforming     0.4755201
## Age                                                               0.9716319

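So the reference category is a man who is very dissatisfied with both his career and his job. The numeric predictors are not centred, so the intercept describes a respondent with Age, Age1stCode and YearsCodePro all equal to zero. To get a realistic baseline, typical values have to be plugged in. Their distributions in df2 are: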

summary(df2$Age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    12.0    25.0    29.0    30.1    34.0    50.0

summary(df2$Age1stCode)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   12.00   15.00   15.09   18.00   27.00

summary(df2$YearsCodePro)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     6.0    10.0    11.6    15.0    33.0

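The exponentiated coefficients are odds ratios, so they cannot simply be added. They have to be combined on the log-odds scale and converted back to a probability with plogis(), plugging in the medians of Age1stCode, Age and YearsCodePro (15, 29 and 10):

plogis(log(4.4094747) + log(1.0069451)*15 + log(0.9675503)*29 + log(1.0037279)*10)

This gives about 0.66, i.e. the reference respondent has roughly a 66% chance of being optimistic, close to the 0.654 that predict() returns at the means of the numeric predictors (first row of the table below).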
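Because the coefficients are odds ratios, a predicted probability is built on the log-odds scale: take the log of each odds ratio, multiply it by the predictor value, add the terms and apply the inverse logit. The self-contained sketch below uses only the odds ratios printed above; the helper name `prob_from_or` is hypothetical and not part of any package. At the means used in newdata1 it should reproduce the model's own prediction for the reference man.

```r
# Odds ratios copied from the exponentiated model coefficients above
or <- c(intercept    = 4.4094747,
        Age1stCode   = 1.0069451,
        Age          = 0.9675503,
        YearsCodePro = 1.0037279)

# Hypothetical helper: linear predictor on the log-odds scale, then inverse logit
prob_from_or <- function(or, x) {
  # x: predictor values in the same order as `or`; the intercept gets x = 1
  plogis(sum(log(or) * x))
}

# Reference man (very dissatisfied with career and job) at the median values
p_median <- prob_from_or(or, c(1, 15, 29, 10))

# Same man at the means shown in newdata1
p_mean <- prob_from_or(or, c(1, 15.0901, 30.1006, 11.6039))

round(c(median = p_median, mean = p_mean), 3)
```

The second value agrees with the 0.654 in the first row of the newdata1 predictions, which is a quick check that the odds ratios are combined correctly.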

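By contrast, a woman who is very satisfied with both her job and her career, has twice as much professional coding experience (20 years) and is 49 years old has

plogis(log(4.4094747) + log(1.0069451)*15 + log(0.9675503)*49 + log(1.0037279)*20 + log(0.9873773) + log(1.1982894) + log(0.5959381))

only about a 42% chance of being optimistic about the future. The 20 extra years of age (odds ratio 0.968 per year) and being a woman (0.596) lower the odds far more than job satisfaction (1.198) raises them.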
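Equivalently, odds ratios combine by multiplication on the odds scale: each odds ratio is raised to the value of its predictor and the results are multiplied. The sketch below (odds ratios again copied from the output above) does this for the 49-year-old woman and contrasts it with simply summing the odds ratios, which yields a number near 90 that is not a probability on any scale.

```r
# Odds ratios from the model output above
or <- c(intercept = 4.4094747, Age1stCode = 1.0069451, Age = 0.9675503,
        YearsCodePro = 1.0037279, CareerSatVerySatisfied = 0.9873773,
        JobSatVerySatisfied = 1.1982894, GenderWoman = 0.5959381)

# Woman, very satisfied with career and job, first coded at 15, aged 49, 20 years pro
x <- c(1, 15, 49, 20, 1, 1, 1)

odds      <- prod(or^x)          # odds ratios multiply on the odds scale
p         <- odds / (1 + odds)   # convert odds to a probability
naive_sum <- sum(or * x)         # adding odds ratios: not a valid probability

round(c(odds = odds, probability = p, naive_sum = naive_sum), 3)
```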

Let's draw some plots to understand the effect of career satisfaction better.

newdata1 <- with(df2, data.frame(
  Age = mean(Age),
  Age1stCode = mean(Age1stCode),
  YearsCodePro = mean(YearsCodePro),
  Gender = "Man",
  JobSat = "Very dissatisfied",
  CareerSat = factor(c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied"))))

newdata1$rankP <- predict(mlg4, newdata = newdata1, type = "response")
newdata1

##       Age Age1stCode YearsCodePro Gender            JobSat
## 1 30.1006    15.0901      11.6039    Man Very dissatisfied
## 2 30.1006    15.0901      11.6039    Man Very dissatisfied
## 3 30.1006    15.0901      11.6039    Man Very dissatisfied
## 4 30.1006    15.0901      11.6039    Man Very dissatisfied
## 5 30.1006    15.0901      11.6039    Man Very dissatisfied
##                            CareerSat     rankP
## 1                  Very dissatisfied 0.6543962
## 2              Slightly dissatisfied 0.5653328
## 3 Neither satisfied nor dissatisfied 0.5398221
## 4                 Slightly satisfied 0.5895001
## 5                     Very satisfied 0.6515177

Being neither satisfied nor dissatisfied with one's career goes with the lowest chance of being optimistic (0.54), while being very dissatisfied or very satisfied gives almost the same, highest chance (about 0.65), roughly 11 percentage points more.
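CareerSat enters the model as a set of dummy variables, so each level simply shifts the log-odds of the reference man by the log of its odds ratio. The sketch below recovers three of the five rankP values from the odds ratios printed above (Very satisfied and Slightly satisfied, plus the reference level), with the numeric predictors at the means shown in newdata1.

```r
# Log-odds of the reference man at the means of the numeric predictors
base_logit <- log(4.4094747) + log(1.0069451) * 15.0901 +
              log(0.9675503) * 30.1006 + log(1.0037279) * 11.6039

# Odds ratios of CareerSat levels relative to "Very dissatisfied"
career_or <- c("Very dissatisfied"  = 1,
               "Slightly satisfied" = 0.7584177,
               "Very satisfied"     = 0.9873773)

# Each level adds log(OR) to the linear predictor; plogis() maps back to probability
probs <- plogis(base_logit + log(career_or))
round(probs, 4)
```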

newdata2 <- with(df2, data.frame(
  Age = mean(Age),
  Age1stCode = mean(Age1stCode),
  # 100 values over the observed range of professional experience (no extrapolation),
  # repeated once for each of the 5 CareerSat levels
  YearsCodePro = rep(seq(from = 0, to = max(YearsCodePro), length.out = 100), times = 5),
  Gender = "Man",
  JobSat = "Very dissatisfied",
  CareerSat = factor(rep(c("Very dissatisfied", "Slightly dissatisfied", "Neither satisfied nor dissatisfied", "Slightly satisfied", "Very satisfied"), each = 100))))


newdata3 <- cbind(newdata2, predict(mlg4, newdata = newdata2, type = "link",
    se.fit = TRUE))

newdata3 <- within(newdata3, {
    PredictedProb <- plogis(fit)
    LL <- plogis(fit - (1.96 * se.fit))
    UL <- plogis(fit + (1.96 * se.fit))
})
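The interval above is built on the link (log-odds) scale and only then transformed with plogis(), which keeps both limits inside (0, 1) and lets them be asymmetric around the predicted probability. A self-contained sketch with made-up values for `fit` and `se.fit` (hypothetical numbers, not taken from mlg4); qnorm(0.975) is the exact value behind the 1.96 used above.

```r
# Hypothetical linear predictors and standard errors, shaped like the output of
# predict(..., type = "link", se.fit = TRUE)
fit    <- c(-3, 0, 0.64, 3)
se.fit <- c(0.8, 0.1, 0.05, 0.8)

z <- qnorm(0.975)  # about 1.96

ci <- data.frame(
  PredictedProb = plogis(fit),
  LL = plogis(fit - z * se.fit),  # lower limit, transformed back to probability
  UL = plogis(fit + z * se.fit)   # upper limit, transformed back to probability
)
ci
```

For the extreme rows the interval is visibly asymmetric; a symmetric interval computed directly on the probability scale could spill outside [0, 1].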

library(ggplot2)

ggplot(newdata3, aes(x = YearsCodePro, y = PredictedProb)) + 
  geom_ribbon(aes(ymin = LL,
    ymax = UL, fill = CareerSat), alpha = 0.2) + 
  geom_line(aes(colour = CareerSat),
    size = 1)

So at the beginning of a career, satisfaction with the career seems to play a crucial role in the probability of being optimistic.

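In late career, however, the satisfaction levels are hard to tell apart, except for “Neither satisfied nor dissatisfied”. This could be a problem of my data: there are few respondents with long professional experience, so the confidence bands become wide and predictions there are not reliable. Note also that mlg4 has no interaction between CareerSat and YearsCodePro, so on the log-odds scale the gap between any two satisfaction levels is the same at every amount of experience; differences in the plot can only come from the logistic transformation and the width of the bands.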
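A quick check of the no-interaction point, using the odds ratios printed above with Age and Age1stCode at their means (as in newdata2) and YearsCodePro over its observed range of 0 to 33 years. It compares "Very dissatisfied" with "Slightly satisfied" on both scales.

```r
years <- seq(0, 33, by = 1)   # observed range of YearsCodePro in df2

# Log-odds for the reference man (very dissatisfied) at mean Age and Age1stCode
logit_vd <- log(4.4094747) + log(1.0069451) * 15.0901 +
            log(0.9675503) * 30.1006 + log(1.0037279) * years
# "Slightly satisfied" only shifts the intercept by log(OR)
logit_ss <- logit_vd + log(0.7584177)

gap_logit <- logit_vd - logit_ss                  # identical for every year
gap_prob  <- plogis(logit_vd) - plogis(logit_ss)  # can only change slightly

range(gap_logit)
range(gap_prob)
```

In this model the probability gap between the two curves changes by much less than one percentage point across the observed range, so any visible narrowing in late career mainly reflects the widening confidence bands rather than the point estimates.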

library(pscl)
pR2(mlg4)

##           llh       llhNull            G2      McFadden          r2ML 
## -3.706799e+04 -3.767831e+04  1.220640e+03  1.619818e-02  2.101201e-02 
##          r2CU 
##  2.876580e-02

However, the explanatory power of the model is very low: McFadden's pseudo-R² is only 0.016 and the Cragg-Uhler (Nagelkerke) pseudo-R² is 0.029.
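McFadden's pseudo-R² and the likelihood-ratio statistic G2 can be recomputed from the two log-likelihoods printed by pR2(). The sketch below does that; the model degrees of freedom (17) are the number of non-intercept coefficients in mlg4, i.e. the sum of the Df column of the car::vif() table below. It shows that the model is clearly better than an intercept-only model even though its pseudo-R² is small.

```r
# Log-likelihoods printed by pscl::pR2(mlg4)
llh     <- -3.706799e+04   # fitted model
llhNull <- -3.767831e+04   # intercept-only model

mcfadden <- 1 - llh / llhNull            # McFadden's pseudo-R²
G2       <- 2 * (llh - llhNull)          # likelihood-ratio statistic
df_model <- 4 + 4 + 1 + 1 + 6 + 1        # CareerSat, JobSat, YearsCodePro, Age1stCode, Gender, Age
p_lr     <- pchisq(G2, df = df_model, lower.tail = FALSE)

c(McFadden = mcfadden, G2 = G2, p = p_lr)
```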

library(generalhoslem)
logitgof(df2$BetterLife, fitted(mlg4))

## 
##  Hosmer and Lemeshow test (binary model)
## 
## data:  df2$BetterLife, fitted(mlg4)
## X-squared = 9.2123, df = 8, p-value = 0.3247

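The Hosmer-Lemeshow test is not significant (X² = 9.21, df = 8, p = 0.32), so there is no evidence that the predicted probabilities are badly calibrated. Together with the low pseudo-R², this means the model is weak rather than miscalibrated: the chosen predictors explain only a small part of optimism, and they could have been chosen better. A good student (and researcher) would start the whole homework again from the beginning, but I will move on.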
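To make the test less of a black box, here is a hand-written sketch of the usual Hosmer-Lemeshow construction: group observations by deciles of the fitted probabilities, compare observed and expected counts of 1s and 0s in each group, and refer the statistic to a chi-squared distribution with g - 2 degrees of freedom. `hosmer_lemeshow` is a hypothetical helper run on simulated data; generalhoslem::logitgof() may group the observations slightly differently.

```r
hosmer_lemeshow <- function(y, p, g = 10) {
  # Group observations by quantiles of the fitted probabilities
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp    <- cut(p, breaks = breaks, include.lowest = TRUE)
  n      <- tapply(y, grp, length)
  obs1   <- tapply(y, grp, sum)   # observed 1s per group
  exp1   <- tapply(p, grp, sum)   # expected 1s under the model
  stat   <- sum((obs1 - exp1)^2 / exp1 +
                ((n - obs1) - (n - exp1))^2 / (n - exp1))
  dof    <- length(n) - 2
  c(X2 = stat, df = dof, p.value = pchisq(stat, dof, lower.tail = FALSE))
}

# Simulated data from a correctly specified logistic model
set.seed(1)
x   <- rnorm(2000)
y   <- rbinom(2000, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

hl <- hosmer_lemeshow(y, fitted(fit))
hl
```

A large p-value, as for mlg4, means observed and expected counts agree across the risk groups; it says nothing about how much of the outcome the predictors explain.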

0.5 diagnostic

plot(mlg4)

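These are the standard lm-style diagnostic plots. For a logistic regression the deviance residuals are not expected to be normally distributed, and the binary outcome produces two bands of points in the residual plots, so the non-normal Q-Q plot and the pattern in the Scale-Location plot are not by themselves a sign of poor fit. Possible outliers are checked more directly below.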
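A diagnostic that is easier to read for a binary outcome (not used in the original analysis) is a binned residual plot: sort observations by fitted probability, split them into equal-count bins, and check that the mean residual in each bin stays within roughly ±2 standard errors. The sketch uses simulated data and a hypothetical helper `binned_residuals`; for mlg4 one would pass BetterLife converted to 0/1 and fitted(mlg4).

```r
binned_residuals <- function(y, p, n_bins = 20) {
  # Equal-count bins based on the rank of the fitted probability
  bin <- cut(rank(p, ties.method = "first"), breaks = n_bins, labels = FALSE)
  p_mean     <- as.vector(tapply(p, bin, mean))      # average fitted probability
  resid_mean <- as.vector(tapply(y - p, bin, mean))  # average raw residual
  n          <- as.vector(tapply(p, bin, length))    # bin size
  data.frame(p_mean = p_mean, resid_mean = resid_mean, n = n,
             bound = 2 * sqrt(p_mean * (1 - p_mean) / n))  # approx. 95% band
}

set.seed(42)
x   <- rnorm(5000)
y   <- rbinom(5000, 1, plogis(0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

br <- binned_residuals(y, fitted(fit))
mean(abs(br$resid_mean) <= br$bound)  # share of bins inside the band
```

Plotting resid_mean against p_mean with the ±bound lines gives the usual picture; for a well-specified model about 95% of bins should fall inside the band.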

library(broom)
model.data <- augment(mlg4) %>% 
  mutate(index = 1:n())

ggplot(model.data, aes(index, .std.resid)) + 
  geom_point(aes(color = BetterLife), alpha = .5) +
  theme_bw()

The standardized residuals are spread fairly evenly across observations, and there are no obvious outliers.

model.data %>% 
  filter(abs(.std.resid) > 3)

## # A tibble: 0 x 15
## # … with 15 variables: BetterLife <fct>, CareerSat <fct>, JobSat <fct>,
## #   YearsCodePro <dbl>, Age1stCode <dbl>, Gender <chr>, Age <dbl>,
## #   .fitted <dbl>, .se.fit <dbl>, .resid <dbl>, .hat <dbl>, .sigma <dbl>,
## #   .cooksd <dbl>, .std.resid <dbl>, index <int>

Indeed, no observation has an absolute standardized residual above 3, so there is nothing to remove.
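The same check can be done in base R without broom. The sketch below runs on simulated data (hypothetical names) and uses rstandard(), which returns standardized deviance residuals for a glm, plus Cook's distance to look for influential points; the 4/n cut-off is a common rule of thumb, not a fixed rule.

```r
set.seed(7)
n   <- 3000
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(0.2 + 0.6 * x))
fit <- glm(y ~ x, family = binomial)

std_res <- rstandard(fit)          # standardized deviance residuals
cooks   <- cooks.distance(fit)     # influence of each observation

n_outliers    <- sum(abs(std_res) > 3)   # same threshold as the filter above
n_influential <- sum(cooks > 4 / n)      # rule-of-thumb cut-off
c(outliers = n_outliers, influential = n_influential)
```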

Multicollinearity

car::vif(mlg4)

##                  GVIF Df GVIF^(1/(2*Df))
## CareerSat    2.321464  4        1.111016
## JobSat       2.326986  4        1.111346
## YearsCodePro 3.018257  1        1.737313
## Age1stCode   1.342531  1        1.158676
## Gender       1.018154  6        1.001500
## Age          2.523435  1        1.588532

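No GVIF is above 5 (the largest is 3.02 for YearsCodePro), and all adjusted values GVIF^(1/(2*Df)) are below √5 ≈ 2.24, so at least multicollinearity is not a problem here.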
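car::vif() reports a generalized VIF (GVIF) because several predictors are factors with more than one dummy. The last column, GVIF^(1/(2*Df)), makes predictors with different numbers of coefficients comparable; a common convention, assumed here, is to compare it with the square root of the usual VIF cut-off. The sketch recomputes that column from the printed GVIF and Df values.

```r
# GVIF and Df columns as printed by car::vif(mlg4) above
gvif <- c(CareerSat = 2.321464, JobSat = 2.326986, YearsCodePro = 3.018257,
          Age1stCode = 1.342531, Gender = 1.018154, Age = 2.523435)
dof  <- c(4, 4, 1, 1, 6, 1)

# Adjusted GVIF: for a single-coefficient predictor this is simply sqrt(VIF)
adj <- gvif^(1 / (2 * dof))

round(adj, 6)
any(gvif > 5)         # usual VIF rule of thumb
any(adj > sqrt(5))    # same rule on the adjusted scale
```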

0.6 Conclusion

Taking everything into consideration, a proper logistic regression analysis should start with good descriptive statistics, and this is where this research falls short. On the other hand, the hypotheses were confirmed (RQ2-RQ4), while RQ1 needs more examination, since I am not used to this classification.

However, the overall weakness of the model might be explained by the culture of programming, to which common models of analysis may not apply. Further research should therefore pay more attention to the specific case of Stack Overflow (https://stackoverflow.com/).

0.7 Resources

[1] Gallup, Inc. (2018, April 3). Americans More Optimistic About Future of Next Generation. Gallup.com. https://news.gallup.com/poll/232076/americans-optimistic-future-next-generation.aspx

[2] Xanthopoulou, D., Bakker, A. B., Demerouti, E., & Schaufeli, W. B. (2007). The role of personal resources in the job demands-resources model. International Journal of Stress Management, 14(2), 121–141. https://doi.org/10.1037/1072-5245.14.2.121

[3] Burke, R. J. (1991). Early Work and Career Experiences of Female and Male Managers and Professionals: Reasons for Optimism? Canadian Journal of Administrative Sciences / Revue Canadienne Des Sciences de l’Administration, 8(4), 224–230. https://doi.org/10.1111/j.1936-4490.1991.tb00565.x

[4] Cheng, G. H.-L., & Chan, D. K.-S. (2008). Who Suffers More from Job Insecurity? A Meta-Analytic Review. Applied Psychology, 57(2), 272–303. https://doi.org/10.1111/j.1464-0597.2007.00312.x

[5] APA Handbook of Industrial and Organizational Psychology. (n.d.). American Psychological Association. Retrieved February 14, 2020, from https://www.apa.org/pubs/books/4311502

hw1

Suschevskiy Vsevolod

2020-02-15