Latent Profile Analysis (LPA) with Eye-Tracking Data

Karinna Rodriguez

2026-02-24

LPA is a type of latent class analysis that allows the use of continuous variables to identify groups of individuals that are based on similar patterns in their performance (Berlin et al. 2014; Lanza and Cooper 2016). The following steps highlight how to run a LPA, compare models, visualize models and interpret model outputs.TidyLPA 1.0.9.4 was used to run an LPA in RStudio (Rosenberg et al. 2019).

#Download packages 
library(tidyLPA)
library(tidyverse)
library(psych)
library(dplyr)

Import Data Files & Variables Defined

#Import data 
Data <- read.csv("LPA_eyetracking_data.csv")

The defining variables used to identify profiles are participants fixation count, fixation duration, visit count, and visit duration.

Note: AOI = area of interests
Variable Definition
Fixation Eye movement shifts below the samples calculated eye shifts velocity threshold and pauses in a spot.
Fixation Count (“fixcou”) Number of fixations within an area of interest.

Fixation Duration

(“fixdur”)

Time spent in a fixation within an AOI.
Visit Movement between one fixation within an AOI and a second fixation within a different AOI.

Visit Count

(“visitcou”)

Number of visits within an AOI

Visit Duration

(“visitdur”)

Time of total visits within an AOI

LPA modeling and comparing best fit between 1, 2 and 3 profiles

LPAcompare is the assigned name of the current model. The four defining variables are selected using the subset() function. estimate_profiles() indicates the range of profiles to include in the model. In this example the range is set from 1:3 profiles. Variance and covariance parameters are set to test all model possibilities. Lastly, compare_soultions() function was used to indicate the type of fit indices to include in the model comparisons outputs.

LPAcompare <- Data  %>%
  subset(select = c("visitcou", "visitdur", "fixcou", "fixdur")) %>%
  single_imputation() %>%
  estimate_profiles(1:3, 
  variances = c("equal", "varying","varying", "equal"),
  covariances = c("zero", "varying", "zero", "equal" )) %>%
  compare_solutions(statistics = c("AIC", "BIC"))
LPAcompare

Defined by Rosenberg et al. (2019) the four models specified are as follows: Model 1: Equal variances and co-variances fixed to 0, Model 2: Varying variances and co-variances fixed to 0, Model 3: Equal variances and equal co-variances Model 4: Varying variances and varying co-variances

The model used in this study held the local independence assumption (i.e., co-variances are equal to zero) and the homogeneity of variance assumption (i.e., variances are equal). Therefore only Model 1 with varying classes was examined. These assumptions helped minimize parameter estimations and produce models that were more constrained and stable (Tein et al. 2013).

When comparing models, BIC is thought to be the better fit index to consider and evaluate (Pastor et al. 2007). After comparing BIC values for Model 1 with 2 Classes (362.668) and Model 1 with 3 Classes (352.470), both models had similar BIC values. Therefore, further visual representation of the model is needed

> LPAcompare4
Compare tidyLPA solutions:

 Model Classes AIC     BIC    
 1     1       496.681 520.659
 1     2       323.704 362.668
 1     3       298.520 352.470
 6     1       51.199  93.160 
 6     2       -16.814 70.105 
 6     3       -27.255 104.622
 2     1       496.681 520.659
 2     2       307.835 358.788
 2     3       277.538 355.466
 3     1       51.199  93.160 
 3     2       2.542   59.489 
 3     3       -10.465 61.468 

Best model according to AIC is Model 6 with 3 classes.
Best model according to BIC is Model 3 with 2 classes.

An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 3 with 2 classes.
> 

Visualizing LPA Model 1 with varying classes

Next, we can visualize LPA Model 1 (varinaces = “equal”, covariances = “zero) with 2 and 3 classes using the plot_profiles() function and check other fit indices using the get_fit() function.


#LPA with Model 1 parameters comparing 2 and 3 Classes as suggested by BIC values 

LPA <- Data %>% 
  subset(select = c("visitcou", "visitdur", "fixcou", "fixdur")) %>%     
  single_imputation() %>% 
  estimate_profiles(2:3, variances = "equal", covariances = "zero")

get_fit (LPA)

#Plot

plot <- plot_profiles(LPA)

plot + labs(subtitle = "Model 1", y = "Average", x= "" ) + 
  theme(plot.subtitle = 
          element_text(size = 14, 
                       color = "black",
                       hjust = 0.5))

Entropy value can also be used to assess class classification accuracy, with entropy values closer to 1 suggesting greater accuracy (Berlin et al. 2014). The entropy for the model with 2 classes is 0.96, and the entropy for the model with 3 classes is 0.83.

The plot below shows the eye-tracking parameters plotted by LPA. Both models held the local independence assumption. Class 1, depicted by red circles, had lower visit count and fixation count aver- ages in both models. Class 2, depicted by blue triangles, had higher visit count and fixation count averages in both models. Class 3, depicted by green squares, overlaps with Class 2. The overlap between Class 1 and 3 in the model with 3 classes. This suggests a third class is redundant and unnecessary. Therefore,according to BIC fit indices and overlapping classes, the model with 2 classes is the best solution.

get_fit (LPA)

# A tibble: 2 × 18
  Model Classes LogLik   AIC   AWE   BIC  CAIC   CLC   KIC SABIC   ICL Entropy prob_min prob_max
  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>    <dbl>
1     1       2  -149.  324.  465.  363.  376.  300.  340.  322. -367.   0.956    0.964    0.991
2     1       3  -131.  299.  495.  352.  370.  264.  320.  296. -379.   0.834    0.717    0.962
# ℹ 4 more variables: n_min <dbl>, n_max <dbl>, BLRT_val <dbl>, BLRT_p <dbl>
> 
  

Interpreting LPA Model with 2 Classes

R package gtsummary 1.7.2 (Sjoberg et al. 2021) was used to compare the means of the variables used to classify participants.

First, the final LPA model with just 2 classes needs to be saved. Then the models data needs to be exported and saved into a .cvs file using the get_data() function.

#Final LPA model with 2 classes
Final_LPA <- Data %>% 
  subset(select = c("visitcou", "visitdur", "fixcou", "fixdur")) %>%     
  single_imputation() %>% 
  estimate_profiles(2, variances = "equal", covariances = "zero")

# Load the data into an object
lpa_data <- get_data(Final_LPA)

# Save as CSV
write.csv(lpa_data, 
          file = "LPA_data.csv", 
          row.names = FALSE)

Now that the output is saved gtsummary package can be used to run Welch two sample t-test, comparing the four parameters used in the LPA model.

library(gtsummary)

data <- read_csv ("LPA_data.csv")

# Creates tables, summarizes data, and runs a Welch two sample t-test
table1 <- 
 data %>%
  select( Class, visitcou, visitdur, fixcou, fixdur) %>% 
  tbl_summary(
    by = Class,
    label = list(
      fixcou ~"Fixation Count",
      fixdur ~ "Fixation Duration",
      visitcou ~ "Visit Count",
      visitdur ~ "Visit Duration"
    ), 
    statistic = all_continuous() ~ "{mean} ({sd})") %>% 
    add_p(
    test = list(all_continuous() ~ "t.test"), 
    test.args = all_continuous() ~ list(var.equal = FALSE),
    pvalue_fun = function(x) style_pvalue(x, digits = 3)
    ) %>%
  add_stat_label() %>%
  add_stat(
    fns = everything() ~ function(data, variable, ...) {
      if(variable %in% c("visitcou", "visitdur", "fixcou", "fixdur")) {
        t_result <- t.test(data[[variable]] ~ data[["Class"]], var.equal = FALSE)
        df <- round(t_result$parameter, 1)
        t_val <- round(t_result$statistic, 2)
        return(paste0("t(", df, ") = ", t_val))
      } else {
        return(NA)
      }
    }
  ) %>%
  modify_header(add_stat_1 ~ "Significance Testing")

table1 

table1, shown below, is the summary output of the LPA model with two classes testing which parameters from the eye-tracking data are significantly different from each other. On average, Class 1 had significantly fewer fixation counts, fewer visit counts and shorter visit duration compared to Class 2. Fixation duration was not significantly different between classes.

References

These analysis and data are based on published work. For further interpretations of results please review here: https://doi.org/10.1002/icd.70018

Berlin, K. S., N. A. Williams, and G. R. Parra. 2014. “An Introduction to Latent Variable Mixture Modeling (Part 1): Overview and Cross-Sectional Latent Class and Latent Profile Analyses.” Journal of Pediatric Psychology 39, no. 2: 174–187. https:// doi. org/ 10. 1093/ jpepsy/ jst084.

Lanza, S. T., and B. R. Cooper. 2016. “Latent Class Analysis for Developmental Research.” Child Development Perspectives 10, no. 1: 59–64. https:// doi. org/ 10. 1111/ cdep. 12163 .

Pastor, D. A., K. E. Barron, B. J. Miller, and S. L. Davis. 2007. “A Latent Profile Analysis of College Students’ Achievement Goal Orientation.” Contemporary Educational Psychology 32, no. 1: 8–47. https:// doi. org/10. 1016/j. cedps ych. 2006. 10. 003.

Rosenberg, J. M., C. J. van Lissa, P. N. Beymer, D. J. Anderson, M. J. Schell, and J. A. Schmidt. 2019. “tidyLPA: Easily Carry out Latent Profile Analysis (LPA) Using Open-­ Source or Commercial Software [RPackage].” Journal of Open Source Software 3, no. 30: 978. https:// doi.org/ 10. 21105/ joss. 00978 .

Sjoberg, D. D., K. Whiting, M. Curry, J. A. Lavery, and J. Larmarange. 2021. “Reproducible Summary Tables With the Gtsummary Package.” R Journal 13, no. 1: 570–580. https:// doi. org/ 10. 32614/ rj- 2021- 053.

Tein, J. Y., S. Coxe, and H. Cham. 2013. “Statistical Power to Detect the Correct Number of Classes in Latent Profile Analysis.” Structural Equation Modeling: A Multidisciplinary Journal 20, no. 4: 640–657. https:// doi. org/ 10. 1080/ 10705 511. 2013. 824781.

Code used in LPA graph that was published:

# Load required packages
library(ggplot2)
library(dplyr)
library(tidyr)


# import data 
data1  <- read.csv ("2ClassData.csv")

data2 <- read.csv ("3ClassData.csv")

# Melt the datasets for scatter plotting
data1_long <- data1 %>%
  pivot_longer(cols = c(fixdur, visitdur, fixcou, visitcou), names_to = "Variable", values_to = "Value")

data2_long <- data2 %>%
  pivot_longer(cols = c(fixdur, visitdur, fixcou, visitcou), names_to = "Variable", values_to = "Value")               

# Add dataset identifiers
data1_long$Dataset <- "Model 1"
data2_long$Dataset <- "Model 2"

#Select Coulumns
data1_long <- data1_long %>% select(ID, Variable, Value, Dataset,Class)
data2_long <- data2_long %>% select(ID, Variable, Value, Dataset,Class)

# Combine the datasets
combined_data <- rbind(data1_long, data2_long) 

combined_data$Class <-as.factor(combined_data$Class)

# Calculate summaries for "mean ± SD" boxes
data1_summary <- data1_long %>%
  group_by(Dataset, Variable, Class) %>%
  summarise(mean = mean(Value), sd = sd(Value), .groups = "drop")

data2_summary <- data2_long %>%
  group_by(Dataset, Variable, Class) %>%
  summarise(mean = mean(Value), sd = sd(Value), .groups = "drop")

# Combine summaries
all_summaries <- rbind(data1_summary, data2_summary)
all_summaries$Class <-as.factor(all_summaries$Class)

# Create the plot
ggplot(combined_data, aes(x = Variable, y = Value)) +
  geom_point(aes(color = Class, shape = Class), alpha = .9, position = position_jitter(width = 0.3, height = .4)) +
  # Outline rectangles 
  geom_rect(
    data = all_summaries,
    aes(
      xmin = as.numeric(factor(Variable)) - 0.4,
      xmax = as.numeric(factor(Variable)) + 0.4,
      ymin = mean - sd,
      ymax = mean + sd,
      color = Class,
    ), 
    linetype = "solid" ,
    fill = NA,
    alpha = 1,
    inherit.aes = FALSE
  ) +
  # Mean line within rect. 
  geom_segment(
    data = all_summaries,
    aes(
      x = as.numeric(factor(Variable)) - 0.4,
      xend = as.numeric(factor(Variable)) + 0.4,
      y = mean,
      yend = mean,
      color = Class # Match color to class
    ),
    linetype = "dashed",
    size = 1.2
  )  +
  geom_point(
    data = all_summaries,
    aes(x = as.numeric(factor(Variable)), y = mean, shape = Class, color = Class),  # Midpoint of mean line
    size = 3
  ) +
  scale_color_manual(values = c("#E43222", "#377EB8", "#4FAF4A")) +

  scale_x_discrete(labels = c("fixcou" = "Fixation Count", 
                              "fixdur" = "Fixation Duration",
                              "visitcou" = "Visit Count",
                              "visitdur" = "Visit Duration")) +
  scale_y_continuous(breaks = seq(floor(min(combined_data$Value)), ceiling(max(combined_data$Value)), by = 2)) +
 facet_wrap(~Dataset) +
  labs (x = "", y = "") +
  theme_minimal() +
  theme(legend.position = "right",
        axis.text.x = element_text(size = 11),
        axis.text.y = element_text(size = 12),
        strip.text = element_text(size = 15),
        strip.background = element_rect(fill = "grey", color = "black"),
        legend.text = element_text(size = 12), 
        legend.title = element_text(size = 12),
        panel.border = element_rect(color = "black", fill = NA, size = 1))