Introduction

I am interested in examining the school factors that encourage African American/Black high school boys to choose to major in Science, Technology, Engineering, and Mathematics (STEM) major in post secondary education in other to enter into the STEM profession (Professions like Engineer, Scientist, Computer Scientist, Information Technology, Medicine, etc). It is known that one’s professional career defines how successful they are and one needs to have knowledge and interest in that field to decide to pursue it. In a study by Kurban and Cabrera, they investigated “the role of cognitive and contextual factors affecting the process of high school students’ readiness for and consideration of an academic major in a STEM field” (Kurban & Cabrera, 2020). In their future research section of the article, they suggested to investigate the “extent to which school climate, school culture, and teacher expectations impact the STEM readiness and intention developmental process” (Kurban & Cabrera, 2020). The authors conducted Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) to discover five distinct latent factors (SES, Parental Involvment, Math Self-Efficacy, Science Self-Efficacy, and STEM Readiness) and four single indicators (Math Ability, Math Interest, Science Interest, and Intention to Major in STEM)

Problem Statement

I propose to investigate the extent to which school opportunities for students impact the STEM readiness and intention development process for Black boys using the High School Longitudinal Study of 2009 (HSLS:09).

Data and Variables

The High School Longitudinal Study of 2009 (HSLS:09) was a study conducted by the National Center for Education Statistics (NCES) as part of a series of secondary longitudinal studies. “The core research question for HSLS:09 explore secondary to postsecondary transition plans and the evolution of those plans; the paths into and out of science, technology, engineering, and mathematics; and the educational and social experiences that affect these shifts” (Ingels et al., 2011, p. iii). The longitudinal study takes the student as the fundamental unit of analysis. The base year occured in the fall term of 2009-10 School year, with a randomly selected sample of 9th graders in more than 900 public and private high schools (Ingels et al., 2011). There was a follow up study in spring of 2012 when the participants will be in the spring of the 11th grade.

Some questions on the student questionnaire allows to investigate the school environment through student, teacher, and administrator reports. This includes the extent to which schools are expected to provide special services to selected groups of students to compensate for limitations and poor performance. For example, including special services to assist those lagging in their understanding of mathematics and science is of interest in this study. (Ingels et al., 2011)

A subset of the publicly available data set is used, specifically using variables related to questions answered by school counselors and administrators.

knitr::opts_chunk$set(warning = FALSE)
setwd("~/UB/CEP525/FinalExam")
df <- read.csv("CEP525FinalDataSet.csv")
library(DT)
## Warning: package 'DT' was built under R version 4.3.3
library(vtable)
## Warning: package 'vtable' was built under R version 4.3.3
## Loading required package: kableExtra
## Warning: package 'kableExtra' was built under R version 4.3.3
df <- df[, !(names(df) %in% c("X.1", "X"))]
st(df)
Summary Statistics
Variable N Mean Std. Dev. Min Pctl. 25 Pctl. 75 Max
x1sex 23497
… Female 11524 49%
… Male 11973 51%
x1black 22938 0.16 0.37 0 0 0 1
x1white 22938 0.74 0.44 0 0 1 1
x1hispanic 22497 0.17 0.37 0 0 0 1
x1locale 23503
… City 6689 28%
… Rural 5559 24%
… Suburb 8467 36%
… Town 2788 12%
x1region 23503
… Midwest 6224 26%
… Northeast 3662 16%
… South 9587 41%
… West 4030 17%
c1cluster 23503 1.5 0.5 1 1 2 2
c1indvcrs 23503 2 0.22 1 2 2 2
c1intern 23503 1.4 0.49 1 1 2 2
c1jobfair 23503 1.3 0.45 1 1 2 2
c1jobguide 23503 1.3 0.46 1 1 2 2
c1employer 23503 1.6 0.49 1 1 2 2
c1awareness 23503 1.7 0.47 1 1 2 2
c1careerunit 23503 1.4 0.49 1 1 2 2
c1workstudy 23503 1.6 0.5 1 1 2 2
c1careerday 23503 1.4 0.49 1 1 2 2
c1assemblies 23503 1.4 0.49 1 1 2 2
c1jobinfocmp 23503 1.8 0.4 1 2 2 2
c1jobinfonon 23503 1.4 0.48 1 1 2 2
c1hstowrkoth 23503 1.2 0.43 1 1 1 2
c1hstoworkno 23503 1 0.15 1 1 1 2
a1mspdlearn 23503 1.6 0.48 1 1 2 2
a1mspdintrst 23503 1.4 0.48 1 1 2 2
a1msother 23503 1.2 0.41 1 1 1 2
a1msnone 23503 1 0.14 1 1 1 2
a1g9summer 23503 1.4 0.48 1 1 2 2
a1g9overage 23503 1.2 0.36 1 1 1 2
a1g9communty 23503 1.2 0.43 1 1 1 2
a1g9blocksch 23503 1.9 0.33 1 2 2 2
a1g9double 23503 1.6 0.49 1 1 2 2
a1g9study 23503 1.4 0.48 1 1 2 2
a1g9teacher 23503 1.4 0.48 1 1 2 2

One section of the counselor questionnaire focused on programs and services offered to students. Some questions pertained to enrichment courses, assistance for struggling students, dropout prevention programs, encouragement of the pursuit of mathematics and science education and employment, and assistance with the transition from high school to college or the workforce. “Other topics included the use of mathematics competency tests and options for failing students”(Ingels et al., 2011, p. 20)

My definition for “Opportunites for students” would be investigated with the following variables:

Counselor variables

Variable Name Varialbe Lable
c1cluster C1 B30 Career Clusters/Pathways/Programs of Study (POS) offered
c1indvcrs C1 B31 Student not enrolled in career Clusters, etc. may take course in program
c1intern C1 B32A School offers internships with local employers
c1jobfair C1 B32B School offers job fairs
c1jobguide C1 B32C School offers Career guides or skills assessments
c1employer C1 B32D School offers school/classroom presentations by local employers
c1awareness C1 B32E School offers career awareness activities
c1careerunit C1 B32G School offers career information units in subject-matter courses
c1workstudy C1 B32H School offers exploratory work experience programs/co-op/workstudy/EBCE
c1careerday C1 B32I School offers career days or nights
c1assemblies C1 B32J School offers vocational oriented assemblies
c1jobinfocmp C1 B32Q School offers computerized career information resources
c1jobinfonon C1 B32R School offers non-computerized career information resources
c1hstowrkoth C1 B32S School assists students with transition from HS to work in other ways
c1hstoworkno C1 B32T School doesn’t assist students with transition from HS to work

Administrator variables

Variable Name Variable Label
a1mspdlearn A1 A25I Requires teacher prof development in how students learn math/science
a1mspdintrst A1 A25J Requires teacher prof development in increasing interest in math/science
a1msother A1 A25K Raises students math/science interest/achievement in another way
a1msnone A1 A25L Doesn’t do any of these to raise math/science interest/achievement
a1g9summer A1 A26A Offers pre-HS summer reading/math instructions for struggling 9th graders
a1g9overage A1 A26B Offers learning communities for over-age student lacking HS prerequisite
a1g9communty A1 A26C Offers 9th grade learning communities separate from rest of school
a1g9blocksch A1 A26D Offers block scheduling to assist struggling 9th graders
a1g9double A1 A26E Offers catch-up courses / double-dosing to assist struggling 9th graders
a1g9study A1 A26F Offers study skill seminar/class for struggling 9th graders
a1g9teacher A1 A26G Offers assistance for teachers working with struggling 9th graders

Covariates

Variable Name Variable Label
x1sex Sex of student (Male, Female)
x1black Race of student is Black
x1white Race of student is White
x1hispanic Race of student is Hispanic
x1locale Location where student resides (city, Rural, Suburb, Town)
x1region Region where student resides (Midwest, Northeast, South, West)

Sample size, description, excluded participants

In preparing the sample data, I downloaded the public available data set from the NCES website and then prep the data set as follows.

First, I choose the following variables from the HSLS:09 Student data set - sex, race (black, white, Hispanic), locale, region plus counselor variable of interest plus administrator variable of interest.

Next, I re coded the following variables - sex, locale and region. There were missing data within the data set and replaced these missed data with “NA”. The counselor and administrator variables were coded “O” for No and “1” for Yes. I had to recode the “0” to 1 (for No) and “1” to 2 (for Yes). The original variable type was character so I re coded them all to integer.

For missing values, I imputed with the mean of the column and rounded to the nearest whole integer. Then saved the final data set that needed to be used for latent class analysis (LCA)

The total sample size is 23503 with 32 variables

summary(as.factor(df$x1sex))
## Female   Male   NA's 
##  11524  11973      6
summary (as.factor(df$x1black))
##     0     1  NA's 
## 19175  3763   565
summary(as.factor(df$x1locale))
##   City  Rural Suburb   Town 
##   6689   5559   8467   2788
summary(as.factor(df$x1region))
##   Midwest Northeast     South      West 
##      6224      3662      9587      4030

Results

Multivariate Statistical Method

I will perform a Latent Class Analysis (LCA) on the variables responses by the school administrator and counselors to see the probability administrator and counselors answer yes to the “Opportunities for students”, and correlate it with gender, racial, place of residence (local and region) of the students.

knitr::opts_chunk$set(warning = FALSE)
library(poLCA)
## Loading required package: scatterplot3d
## Loading required package: MASS
library(vtable)
library(psych)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df %>%
  slice(1:10) %>%
  kbl(caption = "Opportunities for students in HSLS:09 Dataset", digits = 1) %>%
  kable_classic("hover", full_width = F, html_font = "Cambria")
Opportunities for students in HSLS:09 Dataset
x1sex x1black x1white x1hispanic x1locale x1region c1cluster c1indvcrs c1intern c1jobfair c1jobguide c1employer c1awareness c1careerunit c1workstudy c1careerday c1assemblies c1jobinfocmp c1jobinfonon c1hstowrkoth c1hstoworkno a1mspdlearn a1mspdintrst a1msother a1msnone a1g9summer a1g9overage a1g9communty a1g9blocksch a1g9double a1g9study a1g9teacher
Male 0 1 0 Rural Midwest 1 2 1 2 1 2 2 2 1 2 2 2 1 2 1 1 2 1 1 2 1 1 2 1 1 1
Female 0 1 0 Rural Northeast 1 2 1 1 1 2 2 2 1 1 2 2 2 2 1 1 1 2 1 1 1 1 1 1 1 1
Female 1 0 0 Suburb West 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 2
Female 0 1 0 Suburb South 1 2 1 1 1 2 2 1 2 1 1 2 1 1 1 2 1 1 1 2 1 2 1 2 1 1
Male 0 1 0 City South 2 2 1 1 2 1 2 1 2 1 1 2 1 1 1 2 2 1 1 1 1 1 2 2 1 2
Female 0 1 0 Rural South 2 2 1 1 2 2 2 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1
Female 0 1 0 Rural Northeast 2 2 2 1 1 1 2 2 2 2 1 2 1 1 1 2 1 1 1 1 1 1 2 2 2 2
Male 0 1 0 Town Northeast 1 2 1 1 1 2 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1
Male 0 1 0 Town South 2 2 2 2 1 2 2 2 2 2 2 2 1 1 1 2 2 1 1 1 1 1 2 1 1 1
Female 0 1 0 City Northeast 2 2 2 1 2 1 2 1 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 2

Preparation for LCA (Select Variables)

As for the measure of opportunities for students (for struggling students), fifteen items related to opportunities for students were used to represent responses by the school counselor, and eleven items were used to represent responses by the school administrator.

Rationale for using latent class analysis (LCA) is a person-oriented approach where a number of distinctly separable classes or types of people or groups are identified based on the probabilities of answering to various responses to a certain construct.

f1 <- as.formula(cbind(c1cluster, c1indvcrs, c1intern, c1jobfair, c1jobguide, c1employer, c1awareness, c1careerunit, c1workstudy, c1careerday, c1assemblies,
                           c1jobinfocmp, c1jobinfonon, c1hstowrkoth, c1hstoworkno)~1)

f2 <- as.formula(cbind(a1mspdlearn, a1mspdintrst, a1msother, a1msnone, a1g9summer, a1g9overage, a1g9communty, a1g9blocksch, a1g9double, a1g9study, a1g9teacher)~1)
max_II <- -100000
min_bic <- 100000
for (i in 1:4) {
  lc <- poLCA(f1, df, nclass = i, maxiter = 3000,
              tol = 1e-5, na.rm = FALSE, nrep = 10,
              verbose = TRUE, calc.se = TRUE)
  if(lc$bic < min_bic) {
    min_bic <- lc$bic
    LCA_best_model <- lc
  }
}
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.4932 0.5068
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0489 0.9511
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.5709 0.4291
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.7213 0.2787
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.6897 0.3103
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.3934 0.6066
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.3341 0.6659
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.6101 0.3899
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.4348 0.5652
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.6049 0.3951
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.5795 0.4205
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.2024 0.7976
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.6445 0.3555
## 
## $c1hstowrkoth
##           Pr(1) Pr(2)
## class 1:  0.755 0.245
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.9783 0.0217
## 
## Estimated class population shares 
##  1 
##  
## Predicted class memberships (by modal posterior prob.) 
##  1 
##  
## ========================================================= 
## Fit for 1 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 15 
## residual degrees of freedom: 23488 
## maximum log-likelihood: -202373.5 
##  
## AIC(1): 404777
## BIC(1): 404898
## G^2(1): 124537.1 (Likelihood ratio/deviance statistic) 
## X^2(1): 3920120 (Chi-square goodness of fit) 
##  
## Model 1: llik = -187489.6 ... best llik = -187489.6
## Model 2: llik = -187489.6 ... best llik = -187489.6
## Model 3: llik = -187489.6 ... best llik = -187489.6
## Model 4: llik = -187489.6 ... best llik = -187489.6
## Model 5: llik = -187489.6 ... best llik = -187489.6
## Model 6: llik = -187489.6 ... best llik = -187489.6
## Model 7: llik = -187489.6 ... best llik = -187489.6
## Model 8: llik = -187489.6 ... best llik = -187489.6
## Model 9: llik = -187489.6 ... best llik = -187489.6
## Model 10: llik = -187489.6 ... best llik = -187489.6
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.6810 0.3190
## class 2:  0.2455 0.7545
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0375 0.9625
## class 2:  0.0640 0.9360
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.7624 0.2376
## class 2:  0.3186 0.6814
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.8888 0.1112
## class 2:  0.5003 0.4997
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.8201 0.1799
## class 2:  0.5177 0.4823
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.5443 0.4557
## class 2:  0.1943 0.8057
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.4679 0.5321
## class 2:  0.1578 0.8422
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.8323 0.1677
## class 2:  0.3171 0.6829
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.5173 0.4827
## class 2:  0.3260 0.6740
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.7731 0.2269
## class 2:  0.3832 0.6168
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.8236 0.1764
## class 2:  0.2577 0.7423
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.3217 0.6783
## class 2:  0.0450 0.9550
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.8692 0.1308
## class 2:  0.3482 0.6518
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  0.8681 0.1319
## class 2:  0.6058 0.3942
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.9618 0.0382
## class 2:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.5687 0.4313 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.5717 0.4283 
##  
## ========================================================= 
## Fit for 2 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 31 
## residual degrees of freedom: 23472 
## maximum log-likelihood: -187489.6 
##  
## AIC(2): 375041.2
## BIC(2): 375291.2
## G^2(2): 94769.25 (Likelihood ratio/deviance statistic) 
## X^2(2): 773166.1 (Chi-square goodness of fit) 
##  
## Model 1: llik = -181024 ... best llik = -181024
## Model 2: llik = -181024 ... best llik = -181024
## Model 3: llik = -181024 ... best llik = -181024
## Model 4: llik = -181024 ... best llik = -181024
## Model 5: llik = -186446.1 ... best llik = -181024
## Model 6: llik = -181024 ... best llik = -181024
## Model 7: llik = -181024 ... best llik = -181024
## Model 8: llik = -184451.8 ... best llik = -181024
## Model 9: llik = -181024 ... best llik = -181024
## Model 10: llik = -184451.8 ... best llik = -181024
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.6189 0.3811
## class 2:  0.2659 0.7341
## class 3:  0.9787 0.0213
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0499 0.9501
## class 2:  0.0593 0.9407
## class 3:  0.0000 1.0000
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.7113 0.2887
## class 2:  0.3432 0.6568
## class 3:  1.0000 0.0000
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.8681 0.1319
## class 2:  0.5223 0.4777
## class 3:  1.0000 0.0000
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.7871 0.2129
## class 2:  0.5295 0.4705
## class 3:  0.9970 0.0030
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.7127 0.2873
## class 2:  0.1902 0.8098
## class 3:  0.0039 0.9961
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.6165 0.3835
## class 2:  0.1521 0.8479
## class 3:  0.0000 1.0000
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.8074 0.1926
## class 2:  0.3392 0.6608
## class 3:  1.0000 0.0000
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.6627 0.3373
## class 2:  0.3251 0.6749
## class 3:  0.0032 0.9968
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.7315 0.2685
## class 2:  0.4009 0.5991
## class 3:  0.9865 0.0135
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.7980 0.2020
## class 2:  0.2821 0.7179
## class 3:  1.0000 0.0000
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.4209 0.5791
## class 2:  0.0485 0.9515
## class 3:  0.0000 1.0000
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.8410 0.1590
## class 2:  0.3823 0.6177
## class 3:  1.0000 0.0000
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  0.8340 0.1660
## class 2:  0.6259 0.3741
## class 3:  1.0000 0.0000
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.9491 0.0509
## class 2:  1.0000 0.0000
## class 3:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.4271 0.4655 0.1073 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.4201 0.4712 0.1088 
##  
## ========================================================= 
## Fit for 3 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 47 
## residual degrees of freedom: 23456 
## maximum log-likelihood: -181024 
##  
## AIC(3): 362142.1
## BIC(3): 362521.1
## G^2(3): 81838.14 (Likelihood ratio/deviance statistic) 
## X^2(3): 494691.4 (Chi-square goodness of fit) 
##  
## Model 1: llik = -178320.5 ... best llik = -178320.5
## Model 2: llik = -178316.4 ... best llik = -178316.4
## Model 3: llik = -178333.6 ... best llik = -178316.4
## Model 4: llik = -178316.4 ... best llik = -178316.4
## Model 5: llik = -178316.4 ... best llik = -178316.4
## Model 6: llik = -178320.5 ... best llik = -178316.4
## Model 7: llik = -178316.4 ... best llik = -178316.4
## Model 8: llik = -178333.6 ... best llik = -178316.4
## Model 9: llik = -178333.6 ... best llik = -178316.4
## Model 10: llik = -178316.4 ... best llik = -178316.4
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.9886 0.0114
## class 2:  0.2249 0.7751
## class 3:  0.4859 0.5141
## class 4:  0.9845 0.0155
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0053 0.9947
## class 2:  0.0609 0.9391
## class 3:  0.0581 0.9419
## class 4:  0.0000 1.0000
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.9219 0.0781
## class 2:  0.2890 0.7110
## class 3:  0.6067 0.3933
## class 4:  1.0000 0.0000
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.9560 0.0440
## class 2:  0.4356 0.5644
## class 3:  0.8083 0.1917
## class 4:  1.0000 0.0000
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.9476 0.0524
## class 2:  0.5019 0.4981
## class 3:  0.7049 0.2951
## class 4:  0.9992 0.0008
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.9505 0.0495
## class 2:  0.1115 0.8885
## class 3:  0.5716 0.4284
## class 4:  0.0089 0.9911
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.9326 0.0674
## class 2:  0.1150 0.8850
## class 3:  0.4560 0.5440
## class 4:  0.0000 1.0000
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.9215 0.0785
## class 2:  0.2440 0.7560
## class 3:  0.7131 0.2869
## class 4:  1.0000 0.0000
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.9873 0.0127
## class 2:  0.2814 0.7186
## class 3:  0.5419 0.4581
## class 4:  0.0057 0.9943
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.9356 0.0644
## class 2:  0.3410 0.6590
## class 3:  0.6418 0.3582
## class 4:  0.9892 0.0108
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.9218 0.0782
## class 2:  0.1837 0.8163
## class 3:  0.6903 0.3097
## class 4:  1.0000 0.0000
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.8252 0.1748
## class 2:  0.0103 0.9897
## class 3:  0.2756 0.7244
## class 4:  0.0000 1.0000
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.9377 0.0623
## class 2:  0.2713 0.7287
## class 3:  0.7620 0.2380
## class 4:  1.0000 0.0000
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  0.9411 0.0589
## class 2:  0.5794 0.4206
## class 3:  0.7867 0.2133
## class 4:  1.0000 0.0000
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.7042 0.2958
## class 2:  1.0000 0.0000
## class 3:  1.0000 0.0000
## class 4:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.0735 0.3175 0.5024 0.1067 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.0701 0.3129 0.5089 0.1082 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 63 
## residual degrees of freedom: 23440 
## maximum log-likelihood: -178316.4 
##  
## AIC(4): 356758.8
## BIC(4): 357266.8
## G^2(4): 76422.84 (Likelihood ratio/deviance statistic) 
## X^2(4): 468266.4 (Chi-square goodness of fit) 
## 
LCA_best_model <- lc
LCA_best_model
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.9886 0.0114
## class 2:  0.2249 0.7751
## class 3:  0.4859 0.5141
## class 4:  0.9845 0.0155
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0053 0.9947
## class 2:  0.0609 0.9391
## class 3:  0.0581 0.9419
## class 4:  0.0000 1.0000
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.9219 0.0781
## class 2:  0.2890 0.7110
## class 3:  0.6067 0.3933
## class 4:  1.0000 0.0000
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.9560 0.0440
## class 2:  0.4356 0.5644
## class 3:  0.8083 0.1917
## class 4:  1.0000 0.0000
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.9476 0.0524
## class 2:  0.5019 0.4981
## class 3:  0.7049 0.2951
## class 4:  0.9992 0.0008
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.9505 0.0495
## class 2:  0.1115 0.8885
## class 3:  0.5716 0.4284
## class 4:  0.0089 0.9911
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.9326 0.0674
## class 2:  0.1150 0.8850
## class 3:  0.4560 0.5440
## class 4:  0.0000 1.0000
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.9215 0.0785
## class 2:  0.2440 0.7560
## class 3:  0.7131 0.2869
## class 4:  1.0000 0.0000
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.9873 0.0127
## class 2:  0.2814 0.7186
## class 3:  0.5419 0.4581
## class 4:  0.0057 0.9943
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.9356 0.0644
## class 2:  0.3410 0.6590
## class 3:  0.6418 0.3582
## class 4:  0.9892 0.0108
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.9218 0.0782
## class 2:  0.1837 0.8163
## class 3:  0.6903 0.3097
## class 4:  1.0000 0.0000
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.8252 0.1748
## class 2:  0.0103 0.9897
## class 3:  0.2756 0.7244
## class 4:  0.0000 1.0000
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.9377 0.0623
## class 2:  0.2713 0.7287
## class 3:  0.7620 0.2380
## class 4:  1.0000 0.0000
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  0.9411 0.0589
## class 2:  0.5794 0.4206
## class 3:  0.7867 0.2133
## class 4:  1.0000 0.0000
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.7042 0.2958
## class 2:  1.0000 0.0000
## class 3:  1.0000 0.0000
## class 4:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.0735 0.3175 0.5024 0.1067 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.0701 0.3129 0.5089 0.1082 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 63 
## residual degrees of freedom: 23440 
## maximum log-likelihood: -178316.4 
##  
## AIC(4): 356758.8
## BIC(4): 357266.8
## G^2(4): 76422.84 (Likelihood ratio/deviance statistic) 
## X^2(4): 468266.4 (Chi-square goodness of fit) 
## 

Brief Interpretation

For the best model, the probability for counselors to NOT provide opportunities for students should be: Class 3 > Class 1 > Class 2 > Class 4

fit.indices <- data.frame(
  Model = c("Model 1", "Model 2", "Model 3", "Model 4"),
  log_liklihood = c(-202373.5, -187489.6, -181024, -178316.4),
  resid.df = c(23488, 23472, 23456, 23440),
  BIC = c(404898, 375291.2, 362521.1, 357266.8),
  AIC = c(404777, 375041.2, 362142.1, 356758.8),
  likelihood_ratio = c(124537.1, 94769.25, 81838.14, 76422.84)
)

fit.indices %>%
  kbl(caption = "Fit indices for Opportunites for students by School Counselor") %>%
  kable_styling(full_width = F)
Fit indices for Opportunites for students by School Counselor
Model log_liklihood resid.df BIC AIC likelihood_ratio
Model 1 -202373.5 23488 404898.0 404777.0 124537.10
Model 2 -187489.6 23472 375291.2 375041.2 94769.25
Model 3 -181024.0 23456 362521.1 362142.1 81838.14
Model 4 -178316.4 23440 357266.8 356758.8 76422.84
max_II <- -100000
min_bic <- 100000
for (i in 1:4) {
  lc2 <- poLCA(f2, df, nclass = i, maxiter = 3000,
              tol = 1e-5, na.rm = FALSE, nrep = 10,
              verbose = TRUE, calc.se = TRUE)
  if(lc2$bic < min_bic) {
    min_bic <- lc2$bic
    LCA_best_model2 <- lc2
  }
}
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##           Pr(1) Pr(2)
## class 1:  0.355 0.645
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.6436 0.3564
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.7827 0.2173
## 
## $a1msnone
##            Pr(1)  Pr(2)
## class 1:  0.9811 0.0189
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.6469 0.3531
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  0.8496 0.1504
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.7526 0.2474
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.1222 0.8778
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.4038 0.5962
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.6392 0.3608
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.6227 0.3773
## 
## Estimated class population shares 
##  1 
##  
## Predicted class memberships (by modal posterior prob.) 
##  1 
##  
## ========================================================= 
## Fit for 1 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 11 
## residual degrees of freedom: 2036 
## maximum log-likelihood: -138994.1 
##  
## AIC(1): 278010.2
## BIC(1): 278098.9
## G^2(1): 40874.66 (Likelihood ratio/deviance statistic) 
## X^2(1): 151829.9 (Chi-square goodness of fit) 
##  
## Model 1: llik = -132319.4 ... best llik = -132319.4
## Model 2: llik = -132319.4 ... best llik = -132319.4
## Model 3: llik = -132319.4 ... best llik = -132319.4
## Model 4: llik = -132319.4 ... best llik = -132319.4
## Model 5: llik = -132319.4 ... best llik = -132319.4
## Model 6: llik = -132319.4 ... best llik = -132319.4
## Model 7: llik = -132319.4 ... best llik = -132319.4
## Model 8: llik = -132319.4 ... best llik = -132319.4
## Model 9: llik = -132319.4 ... best llik = -132319.4
## Model 10: llik = -132319.4 ... best llik = -132319.4
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##            Pr(1)  Pr(2)
## class 1:  0.1604 0.8396
## class 2:  0.4520 0.5480
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.2981 0.7019
## class 2:  0.8157 0.1843
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.6893 0.3107
## class 2:  0.8292 0.1708
## 
## $a1msnone
##            Pr(1)  Pr(2)
## class 1:  0.9970 0.0030
## class 2:  0.9732 0.0268
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.4582 0.5418
## class 2:  0.7409 0.2591
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  0.6233 0.3767
## class 2:  0.9623 0.0377
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.4739 0.5261
## class 2:  0.8915 0.1085
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0816 0.9184
## class 2:  0.1425 0.8575
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.3390 0.6610
## class 2:  0.4361 0.5639
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.4140 0.5860
## class 2:  0.7514 0.2486
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.1777 0.8223
## class 2:  0.8445 0.1555
## 
## Estimated class population shares 
##  0.3326 0.6674 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.3301 0.6699 
##  
## ========================================================= 
## Fit for 2 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 23 
## residual degrees of freedom: 2024 
## maximum log-likelihood: -132319.4 
##  
## AIC(2): 264684.9
## BIC(2): 264870.4
## G^2(2): 27525.34 (Likelihood ratio/deviance statistic) 
## X^2(2): 111654 (Chi-square goodness of fit) 
##  
## Model 1: llik = -131218.7 ... best llik = -131218.7
## Model 2: llik = -129611.7 ... best llik = -129611.7
## Model 3: llik = -129611.7 ... best llik = -129611.7
## Model 4: llik = -129611.7 ... best llik = -129611.7
## Model 5: llik = -131218.7 ... best llik = -129611.7
## Model 6: llik = -129611.7 ... best llik = -129611.7
## Model 7: llik = -129611.7 ... best llik = -129611.7
## Model 8: llik = -129611.7 ... best llik = -129611.7
## Model 9: llik = -129611.7 ... best llik = -129611.7
## Model 10: llik = -129611.7 ... best llik = -129611.7
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.1039 0.8961
## class 3:  0.6636 0.3364
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.9024 0.0976
## class 2:  0.2812 0.7188
## class 3:  0.8249 0.1751
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.9501 0.0499
## class 2:  0.7064 0.2936
## class 3:  0.7821 0.2179
## 
## $a1msnone
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  1.0000 0.0000
## class 3:  0.9606 0.0394
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.9983 0.0017
## class 2:  0.4828 0.5172
## class 3:  0.6488 0.3512
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.6526 0.3474
## class 3:  0.9448 0.0552
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.9983 0.0017
## class 2:  0.5149 0.4851
## class 3:  0.8456 0.1544
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.0752 0.9248
## class 3:  0.1991 0.8009
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.0603 0.9397
## class 2:  0.3305 0.6695
## class 3:  0.5759 0.4241
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.9610 0.0390
## class 2:  0.4448 0.5552
## class 3:  0.6738 0.3262
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.9790 0.0210
## class 2:  0.2347 0.7653
## class 3:  0.7898 0.2102
## 
## Estimated class population shares 
##  0.164 0.3569 0.4791 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.1756 0.3547 0.4696 
##  
## ========================================================= 
## Fit for 3 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 35 
## residual degrees of freedom: 2012 
## maximum log-likelihood: -129611.7 
##  
## AIC(3): 259293.3
## BIC(3): 259575.6
## G^2(3): 22109.79 (Likelihood ratio/deviance statistic) 
## X^2(3): 56826.11 (Chi-square goodness of fit) 
##  
## Model 1: llik = -128102.5 ... best llik = -128102.5
## Model 2: llik = -128102.5 ... best llik = -128102.5
## Model 3: llik = -128102.5 ... best llik = -128102.5
## Model 4: llik = -128102.5 ... best llik = -128102.5
## Model 5: llik = -128102.5 ... best llik = -128102.5
## Model 6: llik = -128102.5 ... best llik = -128102.5
## Model 7: llik = -128102.5 ... best llik = -128102.5
## Model 8: llik = -128102.5 ... best llik = -128102.5
## Model 9: llik = -128102.5 ... best llik = -128102.5
## Model 10: llik = -128102.5 ... best llik = -128102.5
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.1298 0.8702
## class 3:  1.0000 0.0000
## class 4:  0.1738 0.8262
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.9735 0.0265
## class 2:  0.3053 0.6947
## class 3:  1.0000 0.0000
## class 4:  0.4330 0.5670
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.9468 0.0532
## class 2:  0.6101 0.3899
## class 3:  0.7430 0.2570
## class 4:  0.8248 0.1752
## 
## $a1msnone
##           Pr(1) Pr(2)
## class 1:  1.000 0.000
## class 2:  1.000 0.000
## class 3:  0.928 0.072
## class 4:  1.000 0.000
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.9937 0.0063
## class 2:  0.4192 0.5808
## class 3:  0.6726 0.3274
## class 4:  0.5979 0.4021
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.4640 0.5360
## class 3:  0.9350 0.0650
## class 4:  0.9145 0.0855
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.9923 0.0077
## class 2:  0.3125 0.6875
## class 3:  0.8217 0.1783
## class 4:  0.8183 0.1817
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.0487 0.9513
## class 3:  0.1886 0.8114
## class 4:  0.1620 0.8380
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.0505 0.9495
## class 2:  0.1917 0.8083
## class 3:  0.5639 0.4361
## class 4:  0.5393 0.4607
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.9610 0.0390
## class 2:  0.3716 0.6284
## class 3:  0.6835 0.3165
## class 4:  0.6067 0.3933
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.9872 0.0128
## class 2:  0.0846 0.9154
## class 3:  0.7891 0.2109
## class 4:  0.6196 0.3804
## 
## Estimated class population shares 
##  0.1582 0.1861 0.2625 0.3932 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.1562 0.1659 0.2941 0.3839 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 47 
## residual degrees of freedom: 2000 
## maximum log-likelihood: -128102.5 
##  
## AIC(4): 256299
## BIC(4): 256678
## G^2(4): 19091.43 (Likelihood ratio/deviance statistic) 
## X^2(4): 37904.92 (Chi-square goodness of fit) 
## 
LCA_best_model2 <- lc2
LCA_best_model2
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.1298 0.8702
## class 3:  1.0000 0.0000
## class 4:  0.1738 0.8262
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.9735 0.0265
## class 2:  0.3053 0.6947
## class 3:  1.0000 0.0000
## class 4:  0.4330 0.5670
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.9468 0.0532
## class 2:  0.6101 0.3899
## class 3:  0.7430 0.2570
## class 4:  0.8248 0.1752
## 
## $a1msnone
##           Pr(1) Pr(2)
## class 1:  1.000 0.000
## class 2:  1.000 0.000
## class 3:  0.928 0.072
## class 4:  1.000 0.000
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.9937 0.0063
## class 2:  0.4192 0.5808
## class 3:  0.6726 0.3274
## class 4:  0.5979 0.4021
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.4640 0.5360
## class 3:  0.9350 0.0650
## class 4:  0.9145 0.0855
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.9923 0.0077
## class 2:  0.3125 0.6875
## class 3:  0.8217 0.1783
## class 4:  0.8183 0.1817
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.0487 0.9513
## class 3:  0.1886 0.8114
## class 4:  0.1620 0.8380
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.0505 0.9495
## class 2:  0.1917 0.8083
## class 3:  0.5639 0.4361
## class 4:  0.5393 0.4607
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.9610 0.0390
## class 2:  0.3716 0.6284
## class 3:  0.6835 0.3165
## class 4:  0.6067 0.3933
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.9872 0.0128
## class 2:  0.0846 0.9154
## class 3:  0.7891 0.2109
## class 4:  0.6196 0.3804
## 
## Estimated class population shares 
##  0.1582 0.1861 0.2625 0.3932 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.1562 0.1659 0.2941 0.3839 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## number of observations: 23503 
## number of estimated parameters: 47 
## residual degrees of freedom: 2000 
## maximum log-likelihood: -128102.5 
##  
## AIC(4): 256299
## BIC(4): 256678
## G^2(4): 19091.43 (Likelihood ratio/deviance statistic) 
## X^2(4): 37904.92 (Chi-square goodness of fit) 
## 

Brief Interpretation

For the best model,

fit.indices2 <- data.frame(
  Model = c("Model 1", "Model 2", "Model 3", "Model 4"),
  log_liklihood = c(-138994.1, -132319.4, -129611.7, -128102.5),
  resid.df = c(2036, 2024, 2012, 2000),
  BIC = c(278098.9, 264870.4, 259575.6, 256678),
  AIC = c(278010.2, 264684.9, 259293.3, 256299),
  likelihood_ratio = c(40874.66, 27525.34, 22109.79, 19091.43)
)

fit.indices2 %>%
  kbl(caption = "Fit indices for Opportunites for students by School Administrator") %>%
  kable_styling(full_width = F)
Fit indices for Opportunites for students by School Administrator
Model log_liklihood resid.df BIC AIC likelihood_ratio
Model 1 -138994.1 2036 278098.9 278010.2 40874.66
Model 2 -132319.4 2024 264870.4 264684.9 27525.34
Model 3 -129611.7 2012 259575.6 259293.3 22109.79
Model 4 -128102.5 2000 256678.0 256299.0 19091.43
set.seed(112)
plot(LCA_best_model)

plot(LCA_best_model2)

Adding student gender as a covariates

f3 <- as.formula(cbind(c1cluster, c1indvcrs, c1intern, c1jobfair, c1jobguide, c1employer, c1awareness, c1careerunit, c1workstudy, c1careerday, c1assemblies,c1jobinfocmp, c1jobinfonon, c1hstowrkoth, c1hstoworkno)~x1sex)

f4 <- as.formula(cbind(a1mspdlearn, a1mspdintrst, a1msother, a1msnone, a1g9summer, a1g9overage, a1g9communty, a1g9blocksch, a1g9double, a1g9study, a1g9teacher)~x1sex)
LCA <- poLCA(f3, data = df, nclass = 4, maxiter = 30000, nrep = 5)
## Model 1: llik = -180983 ... best llik = -180983
## Model 2: llik = -178272.7 ... best llik = -178272.7
## Model 3: llik = -178272.7 ... best llik = -178272.7
## Model 4: llik = -178272.7 ... best llik = -178272.7
## Model 5: llik = -178272.7 ... best llik = -178272.7
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.9845 0.0155
## class 2:  0.9887 0.0113
## class 3:  0.2252 0.7748
## class 4:  0.4869 0.5131
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.0054 0.9946
## class 3:  0.0610 0.9390
## class 4:  0.0580 0.9420
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9220 0.0780
## class 3:  0.2895 0.7105
## class 4:  0.6075 0.3925
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9567 0.0433
## class 3:  0.4362 0.5638
## class 4:  0.8088 0.1912
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.9992 0.0008
## class 2:  0.9476 0.0524
## class 3:  0.5020 0.4980
## class 4:  0.7057 0.2943
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.0088 0.9912
## class 2:  0.9511 0.0489
## class 3:  0.1121 0.8879
## class 4:  0.5727 0.4273
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.9353 0.0647
## class 3:  0.1155 0.8845
## class 4:  0.4567 0.5433
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9219 0.0781
## class 3:  0.2447 0.7553
## class 4:  0.7140 0.2860
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  0.0057 0.9943
## class 2:  0.9886 0.0114
## class 3:  0.2819 0.7181
## class 4:  0.5425 0.4575
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.9892 0.0108
## class 2:  0.9374 0.0626
## class 3:  0.3417 0.6583
## class 4:  0.6422 0.3578
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9222 0.0778
## class 3:  0.1843 0.8157
## class 4:  0.6912 0.3088
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.8266 0.1734
## class 3:  0.0107 0.9893
## class 4:  0.2764 0.7236
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9381 0.0619
## class 3:  0.2721 0.7279
## class 4:  0.7628 0.2372
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.9421 0.0579
## class 3:  0.5797 0.4203
## class 4:  0.7871 0.2129
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.7013 0.2987
## class 3:  1.0000 0.0000
## class 4:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.1066 0.0728 0.3186 0.502 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.1081 0.0699 0.3151 0.5069 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## 2 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)    -0.42871     0.05578   -7.686     0.000
## x1sexMale       0.09013     0.06868    1.312     0.189
## ========================================================= 
## 3 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)     1.12855     0.03572   31.590     0.000
## x1sexMale      -0.06607     0.04797   -1.377     0.168
## ========================================================= 
## 4 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)     1.56849     0.03357   46.729     0.000
## x1sexMale      -0.03650     0.04569   -0.799     0.424
## ========================================================= 
## number of observations: 23497 
## number of estimated parameters: 66 
## residual degrees of freedom: 23431 
## maximum log-likelihood: -178272.7 
##  
## AIC(4): 356677.4
## BIC(4): 357209.7
## X^2(4): 468140.2 (Chi-square goodness of fit) 
##  
## ALERT: estimation algorithm automatically restarted with new initial values 
## 
probs.start <- LCA$probs.start
plot(LCA)

LCA2 <- poLCA(f4, data = df, nclass = 4, maxiter = 30000, nrep = 5)
## Model 1: llik = -132296.9 ... best llik = -132296.9
## Model 2: llik = -128084.6 ... best llik = -128084.6
## Model 3: llik = -128084.6 ... best llik = -128084.6
## Model 4: llik = -138969 ... best llik = -128084.6
## Model 5: llik = -138969 ... best llik = -128084.6
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.1298 0.8702
## class 3:  1.0000 0.0000
## class 4:  0.1737 0.8263
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.9738 0.0262
## class 2:  0.3053 0.6947
## class 3:  1.0000 0.0000
## class 4:  0.4328 0.5672
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.9467 0.0533
## class 2:  0.6100 0.3900
## class 3:  0.7430 0.2570
## class 4:  0.8247 0.1753
## 
## $a1msnone
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  1.0000 0.0000
## class 3:  0.9281 0.0719
## class 4:  1.0000 0.0000
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.9938 0.0062
## class 2:  0.4191 0.5809
## class 3:  0.6726 0.3274
## class 4:  0.5980 0.4020
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.4638 0.5362
## class 3:  0.9350 0.0650
## class 4:  0.9144 0.0856
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.9923 0.0077
## class 2:  0.3122 0.6878
## class 3:  0.8217 0.1783
## class 4:  0.8183 0.1817
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0000 1.0000
## class 2:  0.0487 0.9513
## class 3:  0.1886 0.8114
## class 4:  0.1618 0.8382
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.0508 0.9492
## class 2:  0.1916 0.8084
## class 3:  0.5639 0.4361
## class 4:  0.5390 0.4610
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.9609 0.0391
## class 2:  0.3714 0.6286
## class 3:  0.6835 0.3165
## class 4:  0.6068 0.3932
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.9873 0.0127
## class 2:  0.0845 0.9155
## class 3:  0.7891 0.2109
## class 4:  0.6194 0.3806
## 
## Estimated class population shares 
##  0.1579 0.186 0.2626 0.3935 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.156 0.1659 0.2942 0.3839 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## 2 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)     0.18072     0.04557    3.966     0.000
## x1sexMale      -0.03346     0.05061   -0.661     0.509
## ========================================================= 
## 3 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)     0.52177     0.03494   14.935     0.000
## x1sexMale      -0.02561     0.04463   -0.574     0.566
## ========================================================= 
## 4 / 1 
##             Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)     0.94016     0.03986   23.584     0.000
## x1sexMale      -0.05350     0.04562   -1.173     0.241
## ========================================================= 
## number of observations: 23497 
## number of estimated parameters: 50 
## residual degrees of freedom: 1997 
## maximum log-likelihood: -128084.6 
##  
## AIC(4): 256269.2
## BIC(4): 256672.5
## X^2(4): 37908.8 (Chi-square goodness of fit) 
##  
## ALERT: estimation algorithm automatically restarted with new initial values 
## 
probs.start <- LCA2$probs.start
plot(LCA2)

LCA with covariates - student gender, race, locale and region

f5 <- as.formula(cbind(c1cluster, c1indvcrs, c1intern, c1jobfair, c1jobguide, c1employer, c1awareness, c1careerunit, c1workstudy, c1careerday, c1assemblies,c1jobinfocmp, c1jobinfonon, c1hstowrkoth, c1hstoworkno)~x1sex+x1black+x1white+x1hispanic+x1locale+x1region)

f6 <- as.formula(cbind(a1mspdlearn, a1mspdintrst, a1msother, a1msnone, a1g9summer, a1g9overage, a1g9communty, a1g9blocksch, a1g9double, a1g9study, a1g9teacher)~x1sex+x1black+x1white+x1hispanic+x1locale+x1region)
LCA3 <- poLCA(f5, data = df, nclass = 4, maxiter = 30000, nrep = 5)
## Model 1: llik = -172563.6 ... best llik = -172563.6
## Model 2: llik = -167186.3 ... best llik = -167186.3
## Model 3: llik = -167186.3 ... best llik = -167186.3
## Model 4: llik = -167186.3 ... best llik = -167186.3
## Model 5: llik = -167304 ... best llik = -167186.3
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $c1cluster
##            Pr(1)  Pr(2)
## class 1:  0.9897 0.0103
## class 2:  0.9845 0.0155
## class 3:  0.2283 0.7717
## class 4:  0.4980 0.5020
## 
## $c1indvcrs
##            Pr(1)  Pr(2)
## class 1:  0.0054 0.9946
## class 2:  0.0000 1.0000
## class 3:  0.0605 0.9395
## class 4:  0.0546 0.9454
## 
## $c1intern
##            Pr(1)  Pr(2)
## class 1:  0.9234 0.0766
## class 2:  1.0000 0.0000
## class 3:  0.2953 0.7047
## class 4:  0.6160 0.3840
## 
## $c1jobfair
##            Pr(1)  Pr(2)
## class 1:  0.9666 0.0334
## class 2:  1.0000 0.0000
## class 3:  0.4411 0.5589
## class 4:  0.8160 0.1840
## 
## $c1jobguide
##            Pr(1)  Pr(2)
## class 1:  0.9543 0.0457
## class 2:  0.9991 0.0009
## class 3:  0.5059 0.4941
## class 4:  0.7099 0.2901
## 
## $c1employer
##            Pr(1)  Pr(2)
## class 1:  0.9537 0.0463
## class 2:  0.0022 0.9978
## class 3:  0.1175 0.8825
## class 4:  0.5816 0.4184
## 
## $c1awareness
##            Pr(1)  Pr(2)
## class 1:  0.9399 0.0601
## class 2:  0.0000 1.0000
## class 3:  0.1188 0.8812
## class 4:  0.4623 0.5377
## 
## $c1careerunit
##            Pr(1)  Pr(2)
## class 1:  0.9261 0.0739
## class 2:  1.0000 0.0000
## class 3:  0.2581 0.7419
## class 4:  0.7187 0.2813
## 
## $c1workstudy
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  0.0002 0.9998
## class 3:  0.2866 0.7134
## class 4:  0.5464 0.4536
## 
## $c1careerday
##            Pr(1)  Pr(2)
## class 1:  0.9477 0.0523
## class 2:  0.9891 0.0109
## class 3:  0.3508 0.6492
## class 4:  0.6447 0.3553
## 
## $c1assemblies
##            Pr(1)  Pr(2)
## class 1:  0.9307 0.0693
## class 2:  1.0000 0.0000
## class 3:  0.1886 0.8114
## class 4:  0.7010 0.2990
## 
## $c1jobinfocmp
##            Pr(1)  Pr(2)
## class 1:  0.8393 0.1607
## class 2:  0.0000 1.0000
## class 3:  0.0157 0.9843
## class 4:  0.2808 0.7192
## 
## $c1jobinfonon
##            Pr(1)  Pr(2)
## class 1:  0.9387 0.0613
## class 2:  1.0000 0.0000
## class 3:  0.2794 0.7206
## class 4:  0.7726 0.2274
## 
## $c1hstowrkoth
##            Pr(1)  Pr(2)
## class 1:  0.9495 0.0505
## class 2:  1.0000 0.0000
## class 3:  0.5807 0.4193
## class 4:  0.7908 0.2092
## 
## $c1hstoworkno
##            Pr(1)  Pr(2)
## class 1:  0.6731 0.3269
## class 2:  1.0000 0.0000
## class 3:  1.0000 0.0000
## class 4:  1.0000 0.0000
## 
## Estimated class population shares 
##  0.0692 0.1016 0.3293 0.4999 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.0708 0.104 0.3284 0.4969 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## 2 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           0.16919     0.13567    1.247     0.212
## x1sexMale            -0.10118     0.07361   -1.375     0.169
## x1black              -0.09856     0.11575   -0.852     0.394
## x1white              -0.22175     0.09643   -2.300     0.021
## x1hispanic            0.46481     0.10536    4.411     0.000
## x1localeRural         0.77854     0.12282    6.339     0.000
## x1localeSuburb        1.00359     0.09471   10.596     0.000
## x1localeTown          0.37831     0.14497    2.610     0.009
## x1regionNortheast     0.81378     0.14419    5.644     0.000
## x1regionSouth        -0.40145     0.10213   -3.931     0.000
## x1regionWest         -0.40066     0.12935   -3.097     0.002
## ========================================================= 
## 3 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           0.93077     0.12037    7.733     0.000
## x1sexMale            -0.16552     0.06476   -2.556     0.011
## x1black               0.12657     0.10392    1.218     0.223
## x1white               0.07192     0.08745    0.822     0.411
## x1hispanic            0.18632     0.09632    1.934     0.053
## x1localeRural         1.32780     0.10881   12.203     0.000
## x1localeSuburb        0.94142     0.08531   11.035     0.000
## x1localeTown          1.02538     0.12559    8.165     0.000
## x1regionNortheast     0.53553     0.13639    3.926     0.000
## x1regionSouth        -0.14909     0.09025   -1.652     0.099
## x1regionWest         -0.21285     0.11640   -1.829     0.067
## ========================================================= 
## 4 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           1.58079     0.12069   13.098     0.000
## x1sexMale            -0.11699     0.06511   -1.797     0.072
## x1black               0.08705     0.10446    0.833     0.405
## x1white               0.05584     0.08766    0.637     0.524
## x1hispanic            0.28351     0.09669    2.932     0.003
## x1localeRural         1.29271     0.11082   11.665     0.000
## x1localeSuburb        1.04618     0.08605   12.159     0.000
## x1localeTown          0.94291     0.12661    7.447     0.000
## x1regionNortheast     0.21312     0.13709    1.555     0.120
## x1regionSouth        -0.52924     0.09042   -5.853     0.000
## x1regionWest         -0.58730     0.11671   -5.032     0.000
## ========================================================= 
## number of observations: 22046 
## number of estimated parameters: 93 
## residual degrees of freedom: 21953 
## maximum log-likelihood: -167186.3 
##  
## AIC(4): 334558.6
## BIC(4): 335302.6
## X^2(4): 445957.8 (Chi-square goodness of fit) 
##  
## ALERT: estimation algorithm automatically restarted with new initial values 
## 
probs.start <- LCA3$probs.start
plot(LCA3)

LCA4 <- poLCA(f6, data = df, nclass = 4, maxiter = 30000, nrep = 5)
## Model 1: llik = -119605.3 ... best llik = -119605.3
## Model 2: llik = -120221.7 ... best llik = -119605.3
## Model 3: llik = -129656.2 ... best llik = -119605.3
## Model 4: llik = -120473.9 ... best llik = -119605.3
## Model 5: llik = -121543 ... best llik = -119605.3
## Conditional item response (column) probabilities,
##  by outcome variable, for each class (row) 
##  
## $a1mspdlearn
##           Pr(1) Pr(2)
## class 1:  0.120 0.880
## class 2:  0.000 1.000
## class 3:  0.184 0.816
## class 4:  1.000 0.000
## 
## $a1mspdintrst
##            Pr(1)  Pr(2)
## class 1:  0.3231 0.6769
## class 2:  0.9728 0.0272
## class 3:  0.4390 0.5610
## class 4:  1.0000 0.0000
## 
## $a1msother
##            Pr(1)  Pr(2)
## class 1:  0.6432 0.3568
## class 2:  0.9505 0.0495
## class 3:  0.8171 0.1829
## class 4:  0.7461 0.2539
## 
## $a1msnone
##            Pr(1)  Pr(2)
## class 1:  1.0000 0.0000
## class 2:  1.0000 0.0000
## class 3:  1.0000 0.0000
## class 4:  0.9282 0.0718
## 
## $a1g9summer
##            Pr(1)  Pr(2)
## class 1:  0.4408 0.5592
## class 2:  1.0000 0.0000
## class 3:  0.6033 0.3967
## class 4:  0.6690 0.3310
## 
## $a1g9overage
##            Pr(1)  Pr(2)
## class 1:  0.4934 0.5066
## class 2:  1.0000 0.0000
## class 3:  0.9293 0.0707
## class 4:  0.9321 0.0679
## 
## $a1g9communty
##            Pr(1)  Pr(2)
## class 1:  0.3012 0.6988
## class 2:  0.9916 0.0084
## class 3:  0.8553 0.1447
## class 4:  0.8209 0.1791
## 
## $a1g9blocksch
##            Pr(1)  Pr(2)
## class 1:  0.0587 0.9413
## class 2:  0.0000 1.0000
## class 3:  0.1596 0.8404
## class 4:  0.1883 0.8117
## 
## $a1g9double
##            Pr(1)  Pr(2)
## class 1:  0.2178 0.7822
## class 2:  0.0485 0.9515
## class 3:  0.5421 0.4579
## class 4:  0.5682 0.4318
## 
## $a1g9study
##            Pr(1)  Pr(2)
## class 1:  0.3877 0.6123
## class 2:  0.9585 0.0415
## class 3:  0.6175 0.3825
## class 4:  0.6803 0.3197
## 
## $a1g9teacher
##            Pr(1)  Pr(2)
## class 1:  0.1286 0.8714
## class 2:  0.9995 0.0005
## class 3:  0.6235 0.3765
## class 4:  0.7897 0.2103
## 
## Estimated class population shares 
##  0.206 0.1507 0.3812 0.2621 
##  
## Predicted class memberships (by modal posterior prob.) 
##  0.1901 0.1544 0.3603 0.2952 
##  
## ========================================================= 
## Fit for 4 latent classes: 
## ========================================================= 
## 2 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           0.11687     0.09417    1.241     0.215
## x1sexMale            -0.01375     0.05199   -0.264     0.792
## x1black               0.18737     0.07543    2.484     0.013
## x1white              -0.07310     0.06691   -1.093     0.275
## x1hispanic           -0.16485     0.07027   -2.346     0.019
## x1localeRural         0.53988     0.07698    7.013     0.000
## x1localeSuburb       -0.30381     0.06481   -4.688     0.000
## x1localeTown          0.09506     0.11126    0.854     0.393
## x1regionNortheast     0.47197     0.10192    4.631     0.000
## x1regionSouth        -0.67357     0.06849   -9.835     0.000
## x1regionWest         -0.76684     0.08770   -8.744     0.000
## ========================================================= 
## 3 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           0.62922     0.09620    6.540     0.000
## x1sexMale            -0.04634     0.05020   -0.923     0.356
## x1black              -0.24638     0.07777   -3.168     0.002
## x1white               0.26694     0.06655    4.011     0.000
## x1hispanic           -0.16612     0.06670   -2.491     0.013
## x1localeRural         0.50370     0.07715    6.529     0.000
## x1localeSuburb       -0.20002     0.06291   -3.179     0.001
## x1localeTown          0.83954     0.10009    8.388     0.000
## x1regionNortheast     0.78864     0.10201    7.731     0.000
## x1regionSouth        -0.52601     0.06798   -7.737     0.000
## x1regionWest         -0.40606     0.08336   -4.871     0.000
## ========================================================= 
## 4 / 1 
##                   Coefficient  Std. error  t value  Pr(>|t|)
## (Intercept)           0.51934     0.08760    5.929     0.000
## x1sexMale            -0.00341     0.04764   -0.072     0.943
## x1black              -0.30175     0.07491   -4.028     0.000
## x1white               0.39111     0.06366    6.144     0.000
## x1hispanic           -0.36457     0.06541   -5.574     0.000
## x1localeRural         0.83217     0.07159   11.624     0.000
## x1localeSuburb       -0.43264     0.05991   -7.222     0.000
## x1localeTown          0.72834     0.09614    7.576     0.000
## x1regionNortheast     0.41230     0.09639    4.277     0.000
## x1regionSouth        -1.31308     0.06525  -20.123     0.000
## x1regionWest         -0.38213     0.07310   -5.228     0.000
## ========================================================= 
## number of observations: 22046 
## number of estimated parameters: 77 
## residual degrees of freedom: 1970 
## maximum log-likelihood: -119605.3 
##  
## AIC(4): 239364.5
## BIC(4): 239980.6
## X^2(4): 34789.35 (Chi-square goodness of fit) 
##  
## ALERT: estimation algorithm automatically restarted with new initial values 
## 
probs.start <- LCA4$probs.start
plot(LCA4)

Brief Interpretation

Limitations and Next Step Analysis

Concluding Remarks

Reference

Ingels, S. J., Pratt, D. J., Herget, D. R., Burns, L. J., Dever, J. A., Ottem, R., Rogers, J. E., Jin, Y., & Leinwand, S. (2011). High school longitudinal study of 2009 (HSLS: 09): Base-year data file documentation. NCES 2011-328. National Center for Education Statistics.
Kurban, E. R., & Cabrera, A. F. (2020). Building readiness and intention towards STEM fields of study: Using HSLS: 09 and SEM to examine this complex process among high school students. The Journal of Higher Education, 91(4), 620–650.