The preferred method of analysis for this dataset is Confirmatory Factor Analysis (CFA). The goal is to determine the factors that affect the employability of newly graduated students, using observed variables such as GPA, study years and major. The success of the k-factor model will rest on the choice of latent (unobservable) variables that are correlated in some way with the observable variables. Let’s have a look at the dataset.
library("readxl")
student_data <- read_excel("/home/asma/Desktop/Customers/Waleed_SAUDI/R/Data_new.xlsx")
## New names:
## * `` -> ...67
head(student_data)
## # A tibble: 6 x 67
## Gender Age Nationality Majors `Study program` `year of enroll…
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2 25 1 Gener… 1 2013
## 2 2 23 1 Gener… 1 2015
## 3 2 24 1 Gener… 1 2015
## 4 2 23 1 Gener… 1 2016
## 5 2 25 1 Gener… 1 2016
## 6 2 24 1 Gener… 1 2015
## # … with 61 more variables: `graduation year` <dbl>, `study years` <dbl>,
## # `graduated before` <dbl>, GPA <chr>, Num_courses <chr>,
## # sector_work_class <dbl>, `sector work` <chr>, `sector name` <chr>,
## # YearJop <dbl>, `current job related to your specialty` <chr>, salary <chr>,
## # `Career Day` <chr>, `Career Day help you land job` <chr>, PA1 <dbl>,
## # PA2 <dbl>, PA3 <dbl>, PA4 <dbl>, PA5 <dbl>, PA6 <dbl>, PA7 <dbl>,
## # PA8 <dbl>, PA9 <dbl>, PA10 <dbl>, PA11 <dbl>, PA12 <dbl>, PA13 <dbl>,
## # PA14 <dbl>, SA1 <dbl>, SA2 <dbl>, SA3 <dbl>, SA4 <dbl>, SA5 <dbl>,
## # SA6 <dbl>, SA7 <dbl>, SA8 <dbl>, SA9 <dbl>, SA10 <dbl>, SA11 <dbl>,
## # SA12 <dbl>, SA13 <dbl>, IS1 <dbl>, IS2 <dbl>, IS3 <dbl>, IS4 <dbl>,
## # IS5 <dbl>, IS6 <dbl>, IS7 <dbl>, IS8 <dbl>, IS9 <dbl>, IS10 <dbl>,
## # IS11 <dbl>, IS12 <dbl>, IS13 <dbl>, IS14 <dbl>, GA1 <dbl>, GA2 <dbl>,
## # GA3 <dbl>, GA4 <dbl>, `start studying again KAU` <chr>, `recommend
## # KAU` <chr>, ...67 <lgl>
Let’s look at a summary of the data.
summary(student_data)
## Gender Age Nationality Majors Study program
## Min. :2 Min. : 0.00 Min. :1.000 Length:336 Min. :1.000
## 1st Qu.:2 1st Qu.:23.00 1st Qu.:1.000 Class :character 1st Qu.:1.000
## Median :2 Median :24.00 Median :1.000 Mode :character Median :1.000
## Mean :2 Mean :24.01 Mean :1.057 Mean :1.016
## 3rd Qu.:2 3rd Qu.:25.00 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :2 Max. :34.00 Max. :2.000 Max. :2.000
## NA's :1 NA's :1 NA's :1 NA's :16
## year of enrollment graduation year study years graduated before
## Min. :2008 Min. :2013 Min. : 3.000 Min. :1.000
## 1st Qu.:2014 1st Qu.:2019 1st Qu.: 4.000 1st Qu.:1.000
## Median :2016 Median :2019 Median : 4.000 Median :2.000
## Mean :2015 Mean :2019 Mean : 4.391 Mean :2.231
## 3rd Qu.:2016 3rd Qu.:2020 3rd Qu.: 5.000 3rd Qu.:3.000
## Max. :2018 Max. :2021 Max. :10.000 Max. :4.000
## NA's :16 NA's :16 NA's :16 NA's :16
## GPA Num_courses sector_work_class sector work
## Length:336 Length:336 Min. :0.0000 Length:336
## Class :character Class :character 1st Qu.:0.0000 Class :character
## Mode :character Mode :character Median :0.0000 Mode :character
## Mean :0.1688
## 3rd Qu.:0.0000
## Max. :1.0000
## NA's :16
## sector name YearJop current job related to your specialty
## Length:336 Min. :2014 Length:336
## Class :character 1st Qu.:2018 Class :character
## Mode :character Median :2019 Mode :character
## Mean :2019
## 3rd Qu.:2020
## Max. :2020
## NA's :293
## salary Career Day Career Day help you land job
## Length:336 Length:336 Length:336
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## PA1 PA2 PA3 PA4 PA5
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.75 1st Qu.:3.750 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.00 Median :4.000 Median :4.000
## Mean :3.938 Mean :4.006 Mean :4.05 Mean :4.013 Mean :3.812
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :6.000 Max. :6.000 Max. :6.00 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16 NA's :16
## PA6 PA7 PA8 PA9 PA10
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.00 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.00 Median :4.000 Median :4.000
## Mean :4.019 Mean :3.916 Mean :3.95 Mean :3.816 Mean :3.678
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :6.000 Max. :6.000 Max. :6.00 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16 NA's :16
## PA11 PA12 PA13 PA14
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000
## Median :4.000 Median :4.000 Median :4.000 Median :3.000
## Mean :3.803 Mean :3.625 Mean :3.834 Mean :3.222
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## SA1 SA2 SA3 SA4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.966 Mean :3.906 Mean :3.816 Mean :3.837
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :6.000 Max. :5.000 Max. :5.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## SA5 SA6 SA7 SA8
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.784 Mean :3.944 Mean :3.853 Mean :3.578
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :6.000 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## SA9 SA10 SA11 SA12
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.631 Mean :3.712 Mean :3.766 Mean :3.734
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## SA13 IS1 IS2 IS3
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :3.612 Mean :3.491 Mean :3.734 Mean :3.597
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## IS4 IS5 IS6 IS7 IS8
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.00 1st Qu.:2.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.00 Median :4.000
## Mean :3.669 Mean :3.869 Mean :3.941 Mean :3.75 Mean :3.469
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.00 3rd Qu.:4.000
## Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.00 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16 NA's :16
## IS9 IS10 IS11 IS12
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000
## Median :4.000 Median :4.000 Median :4.000 Median :3.000
## Mean :4.237 Mean :3.775 Mean :3.734 Mean :2.925
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :6.000 Max. :6.000 Max. :6.000 Max. :6.000
## NA's :16 NA's :16 NA's :16 NA's :16
## IS13 IS14 GA1 GA2 GA3
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4 Median :4.000
## Mean :3.741 Mean :3.969 Mean :3.913 Mean :4 Mean :4.013
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5 3rd Qu.:5.000
## Max. :6.000 Max. :6.000 Max. :5.000 Max. :5 Max. :5.000
## NA's :16 NA's :16 NA's :16 NA's :16 NA's :16
## GA4 start studying again KAU recommend KAU ...67
## Min. :1.000 Length:336 Length:336 Mode:logical
## 1st Qu.:3.000 Class :character Class :character NA's:336
## Median :4.000 Mode :character Mode :character
## Mean :3.769
## 3rd Qu.:5.000
## Max. :5.000
## NA's :16
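The summary above shows a few things worth tidying before modelling: an entirely empty column (`...67`), GPA stored as text, a minimum age of 0, and 16 rows with no survey responses. The following is a minimal sketch of such a cleanup on a small invented data frame; the `toy` object and its values are hypothetical stand-ins, and treating an age of 0 as missing is an assumption rather than something the data confirms.

```r
# Toy stand-in for student_data; all values are invented for illustration.
toy <- data.frame(
  Age = c(25, 0, 24, NA),
  GPA = c("4.5", "3.9", "n/a", NA),
  PA1 = c(4, 5, 3, NA),
  stringsAsFactors = FALSE
)
toy[["...67"]] <- NA  # mimic the empty trailing column created by read_excel

# Drop columns that contain no data at all.
all_missing <- vapply(toy, function(col) all(is.na(col)), logical(1))
toy <- toy[, !all_missing, drop = FALSE]

# Convert GPA from text to numeric; entries that cannot be parsed become NA.
toy$GPA <- suppressWarnings(as.numeric(toy$GPA))

# Assumption: an age of 0 is a data-entry error, so treat it as missing.
toy$Age[which(toy$Age == 0)] <- NA

# Keep only respondents who answered the survey item.
complete_items <- toy[!is.na(toy$PA1), , drop = FALSE]
print(complete_items)
```

The same steps could be applied to `student_data` by replacing `toy`, after checking which columns and codes actually need this treatment.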
The concept of degrees of freedom is essential in CFA. To begin, count the number of known values in the observed population variance-covariance matrix $\Sigma$, given by the formula $p(p+1)/2$, where $p$ is the number of items in your survey.
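As a quick check of this count, the sketch below compares the formula p(p+1)/2 with the number of unique entries (the diagonal plus the lower triangle) of a covariance matrix. The helper name `known_values` and the simulated data are assumptions introduced only for illustration.

```r
# Number of known values in a p x p covariance matrix.
known_values <- function(p) p * (p + 1) / 2

# Simulated data with three items, used only to obtain a 3 x 3 covariance matrix.
set.seed(1)
S <- cov(matrix(rnorm(50 * 3), ncol = 3))

# Unique entries: the variances on the diagonal plus the covariances below it.
unique_entries <- sum(lower.tri(S, diag = TRUE))

print(c(formula = known_values(3), counted = unique_entries))
print(known_values(31))  # the 31-item case
```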
library(foreign)
library(lavaan)
## This is lavaan 0.6-7
## lavaan is BETA software! Please report any bugs.
#cov(student_data[,20:40])
Identification is necessary because the factor is not observed, so its scale has to be fixed. Consider a one-factor CFA with three items: the model-implied covariance matrix has seven parameters in total, but the observed population covariance matrix supplies only six known values to work with. The seven parameters are three factor loadings, three residual variances and one factor variance. The extra parameter comes from the fact that we do not observe the factor but are estimating its variance. The same issue applies to the one-factor model fitted below. To identify a factor in a CFA model with three or more items, there are two options: the marker method, which fixes one loading to 1, and the variance standardization method, which fixes the factor variance to 1.
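The parameter counting can be sketched in a few lines. The function `one_factor_df` is a hypothetical helper, and it assumes a one-factor model with uncorrelated residuals in which exactly one parameter is fixed for identification.

```r
# Parameter bookkeeping for a one-factor CFA with p items.
one_factor_df <- function(p) {
  known <- p * (p + 1) / 2   # unique variances and covariances
  total <- 2 * p + 1         # p loadings + p residual variances + 1 factor variance
  free  <- total - 1         # one parameter is fixed to identify the factor
  c(known = known, total = total, free = free, df = known - free)
}

print(one_factor_df(3))   # three items: just identified
print(one_factor_df(31))  # 31 items, as in the model fitted in this report
```

For 31 items this gives 62 free parameters and 434 degrees of freedom, which agrees with the values lavaan reports for the model in this report.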
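# One-factor model: a single latent factor f measured by 31 survey items.
m1a <- ' f =~ PA1 + PA2 + PA3 + PA4 + PA5 + PA6 + PA7 + PA8 + PA9 + PA10 + PA11 + PA12 + PA13 + PA14 + SA1 + SA2 + SA3 + SA4 + SA5 + SA6 + SA7 + SA8 + SA9 + SA10 + SA11 + SA12 + SA13 + IS1 + IS2 + IS3 + IS4 '
# Rows with missing item responses are dropped by default (listwise deletion).
onefacall_itemsa <- cfa(m1a, data=student_data)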
summary(onefacall_itemsa)
## lavaan 0.6-7 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 62
##
## Used Total
## Number of observations 320 336
##
## Model Test User Model:
##
## Test statistic 2450.550
## Degrees of freedom 434
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## f =~
## PA1 1.000
## PA2 1.033 0.096 10.748 0.000
## PA3 0.961 0.095 10.096 0.000
## PA4 0.956 0.096 9.958 0.000
## PA5 1.131 0.108 10.441 0.000
## PA6 1.000 0.102 9.791 0.000
## PA7 0.998 0.100 9.966 0.000
## PA8 1.176 0.116 10.157 0.000
## PA9 1.127 0.104 10.856 0.000
## PA10 1.210 0.115 10.476 0.000
## PA11 1.074 0.106 10.109 0.000
## PA12 1.203 0.117 10.256 0.000
## PA13 1.056 0.111 9.491 0.000
## PA14 1.018 0.139 7.352 0.000
## SA1 1.192 0.102 11.694 0.000
## SA2 1.180 0.101 11.652 0.000
## SA3 1.219 0.104 11.712 0.000
## SA4 1.036 0.096 10.779 0.000
## SA5 1.070 0.099 10.860 0.000
## SA6 0.969 0.094 10.265 0.000
## SA7 1.030 0.093 11.039 0.000
## SA8 1.148 0.106 10.817 0.000
## SA9 1.031 0.099 10.412 0.000
## SA10 0.949 0.097 9.791 0.000
## SA11 1.041 0.101 10.316 0.000
## SA12 1.172 0.108 10.874 0.000
## SA13 1.199 0.104 11.503 0.000
## IS1 0.935 0.114 8.181 0.000
## IS2 0.754 0.099 7.613 0.000
## IS3 0.710 0.108 6.561 0.000
## IS4 0.873 0.109 8.011 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .PA1 0.565 0.046 12.250 0.000
## .PA2 0.505 0.042 12.173 0.000
## .PA3 0.566 0.046 12.281 0.000
## .PA4 0.590 0.048 12.300 0.000
## .PA5 0.686 0.056 12.228 0.000
## .PA6 0.688 0.056 12.321 0.000
## .PA7 0.642 0.052 12.299 0.000
## .PA8 0.828 0.067 12.272 0.000
## .PA9 0.576 0.047 12.151 0.000
## .PA10 0.774 0.063 12.223 0.000
## .PA11 0.704 0.057 12.279 0.000
## .PA12 0.834 0.068 12.258 0.000
## .PA13 0.856 0.069 12.356 0.000
## .PA14 1.721 0.138 12.513 0.000
## .SA1 0.443 0.037 11.926 0.000
## .SA2 0.443 0.037 11.940 0.000
## .SA3 0.460 0.039 11.919 0.000
## .SA4 0.502 0.041 12.167 0.000
## .SA5 0.518 0.043 12.151 0.000
## .SA6 0.540 0.044 12.256 0.000
## .SA7 0.445 0.037 12.112 0.000
## .SA8 0.607 0.050 12.159 0.000
## .SA9 0.577 0.047 12.233 0.000
## .SA10 0.619 0.050 12.321 0.000
## .SA11 0.610 0.050 12.248 0.000
## .SA12 0.618 0.051 12.148 0.000
## .SA13 0.491 0.041 11.988 0.000
## .IS1 1.075 0.086 12.466 0.000
## .IS2 0.859 0.069 12.500 0.000
## .IS3 1.124 0.090 12.548 0.000
## .IS4 0.997 0.080 12.477 0.000
## f 0.393 0.062 6.316 0.000
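In the output above the loading of PA1 is fixed at 1.000, which is the marker method and lavaan's default. As a sketch of the alternative, the example below fits the same small model both ways using lavaan's built-in `HolzingerSwineford1939` data (an assumption made so the example runs without the survey file); `std.lv = TRUE` requests the variance standardization method.

```r
library(lavaan)

# Three-item, one-factor model on lavaan's built-in example data.
model <- ' visual =~ x1 + x2 + x3 '

# Marker method (default): the first loading is fixed to 1.
fit_marker <- cfa(model, data = HolzingerSwineford1939)

# Variance standardization method: the factor variance is fixed to 1.
fit_stdlv <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

est_marker <- parameterEstimates(fit_marker)
est_stdlv  <- parameterEstimates(fit_stdlv)

# Loadings under the marker method (x1 is the marker).
print(est_marker[est_marker$op == "=~", c("rhs", "est")])

# Factor variance under variance standardization.
print(est_stdlv[est_stdlv$op == "~~" & est_stdlv$lhs == "visual", c("lhs", "est")])
```

Both parameterizations describe the same model, so the choice mainly affects how the loadings are scaled. For the survey model, the equivalent call would presumably be `cfa(m1a, data = student_data, std.lv = TRUE)`.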
summary(onefacall_itemsa, fit.measures=TRUE, standardized=TRUE)
## lavaan 0.6-7 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 62
##
## Used Total
## Number of observations 320 336
##
## Model Test User Model:
##
## Test statistic 2450.550
## Degrees of freedom 434
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 6645.397
## Degrees of freedom 465
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.674
## Tucker-Lewis Index (TLI) 0.650
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -12490.472
## Loglikelihood unrestricted model (H1) -11265.197
##
## Akaike (AIC) 25104.945
## Bayesian (BIC) 25338.581
## Sample-size adjusted Bayesian (BIC) 25141.928
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.120
## 90 Percent confidence interval - lower 0.116
## 90 Percent confidence interval - upper 0.125
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.097
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## f =~
## PA1 1.000 0.627 0.641
## PA2 1.033 0.096 10.748 0.000 0.648 0.674
## PA3 0.961 0.095 10.096 0.000 0.603 0.625
## PA4 0.956 0.096 9.958 0.000 0.600 0.615
## PA5 1.131 0.108 10.441 0.000 0.710 0.651
## PA6 1.000 0.102 9.791 0.000 0.627 0.603
## PA7 0.998 0.100 9.966 0.000 0.626 0.616
## PA8 1.176 0.116 10.157 0.000 0.738 0.630
## PA9 1.127 0.104 10.856 0.000 0.707 0.682
## PA10 1.210 0.115 10.476 0.000 0.759 0.653
## PA11 1.074 0.106 10.109 0.000 0.674 0.626
## PA12 1.203 0.117 10.256 0.000 0.754 0.637
## PA13 1.056 0.111 9.491 0.000 0.662 0.582
## PA14 1.018 0.139 7.352 0.000 0.639 0.438
## SA1 1.192 0.102 11.694 0.000 0.748 0.747
## SA2 1.180 0.101 11.652 0.000 0.740 0.743
## SA3 1.219 0.104 11.712 0.000 0.765 0.748
## SA4 1.036 0.096 10.779 0.000 0.649 0.676
## SA5 1.070 0.099 10.860 0.000 0.671 0.682
## SA6 0.969 0.094 10.265 0.000 0.608 0.638
## SA7 1.030 0.093 11.039 0.000 0.646 0.696
## SA8 1.148 0.106 10.817 0.000 0.720 0.679
## SA9 1.031 0.099 10.412 0.000 0.647 0.648
## SA10 0.949 0.097 9.791 0.000 0.595 0.603
## SA11 1.041 0.101 10.316 0.000 0.653 0.641
## SA12 1.172 0.108 10.874 0.000 0.735 0.683
## SA13 1.199 0.104 11.503 0.000 0.752 0.732
## IS1 0.935 0.114 8.181 0.000 0.586 0.492
## IS2 0.754 0.099 7.613 0.000 0.473 0.455
## IS3 0.710 0.108 6.561 0.000 0.445 0.387
## IS4 0.873 0.109 8.011 0.000 0.547 0.481
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PA1 0.565 0.046 12.250 0.000 0.565 0.590
## .PA2 0.505 0.042 12.173 0.000 0.505 0.546
## .PA3 0.566 0.046 12.281 0.000 0.566 0.609
## .PA4 0.590 0.048 12.300 0.000 0.590 0.621
## .PA5 0.686 0.056 12.228 0.000 0.686 0.577
## .PA6 0.688 0.056 12.321 0.000 0.688 0.636
## .PA7 0.642 0.052 12.299 0.000 0.642 0.621
## .PA8 0.828 0.067 12.272 0.000 0.828 0.604
## .PA9 0.576 0.047 12.151 0.000 0.576 0.535
## .PA10 0.774 0.063 12.223 0.000 0.774 0.573
## .PA11 0.704 0.057 12.279 0.000 0.704 0.608
## .PA12 0.834 0.068 12.258 0.000 0.834 0.594
## .PA13 0.856 0.069 12.356 0.000 0.856 0.661
## .PA14 1.721 0.138 12.513 0.000 1.721 0.808
## .SA1 0.443 0.037 11.926 0.000 0.443 0.442
## .SA2 0.443 0.037 11.940 0.000 0.443 0.447
## .SA3 0.460 0.039 11.919 0.000 0.460 0.440
## .SA4 0.502 0.041 12.167 0.000 0.502 0.543
## .SA5 0.518 0.043 12.151 0.000 0.518 0.535
## .SA6 0.540 0.044 12.256 0.000 0.540 0.594
## .SA7 0.445 0.037 12.112 0.000 0.445 0.516
## .SA8 0.607 0.050 12.159 0.000 0.607 0.539
## .SA9 0.577 0.047 12.233 0.000 0.577 0.580
## .SA10 0.619 0.050 12.321 0.000 0.619 0.636
## .SA11 0.610 0.050 12.248 0.000 0.610 0.589
## .SA12 0.618 0.051 12.148 0.000 0.618 0.534
## .SA13 0.491 0.041 11.988 0.000 0.491 0.465
## .IS1 1.075 0.086 12.466 0.000 1.075 0.758
## .IS2 0.859 0.069 12.500 0.000 0.859 0.793
## .IS3 1.124 0.090 12.548 0.000 1.124 0.850
## .IS4 0.997 0.080 12.477 0.000 0.997 0.769
## f 0.393 0.062 6.316 0.000 1.000 1.000
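The fit indices in this output can be recomputed from the reported chi-square statistics, which helps show what they measure. The sketch below uses the standard textbook formulas for CFI, TLI and RMSEA with the numbers printed above; using N rather than N - 1 in the RMSEA denominator is an assumption chosen because it reproduces the reported value.

```r
# Values copied from the lavaan output above.
chisq   <- 2450.550; df   <- 434   # user model
chisq_b <- 6645.397; df_b <- 465   # baseline (independence) model
n       <- 320                     # observations used

# Comparative Fit Index: improvement in misfit relative to the baseline model.
cfi <- 1 - max(chisq - df, 0) / max(chisq - df, chisq_b - df_b, 0)

# Tucker-Lewis Index: compares chi-square/df ratios of the two models.
tli <- (chisq_b / df_b - chisq / df) / (chisq_b / df_b - 1)

# Root Mean Square Error of Approximation: misfit per degree of freedom.
rmsea <- sqrt(max(chisq - df, 0) / (df * n))

print(round(c(cfi = cfi, tli = tli, rmsea = rmsea), 3))
```

By commonly cited rules of thumb (CFI and TLI around 0.95 or higher, RMSEA around 0.06 or lower, SRMR around 0.08 or lower), the reported CFI of 0.674, TLI of 0.650, RMSEA of 0.120 and SRMR of 0.097 suggest that a single factor does not fit these 31 items well.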