Clustering complex life trajectories with Hidden Markov Models

State Space Definition

Hidden States

Let \(S_t\) represent the hidden state at time \(t\), comprising:

Disease combinations (e.g., {diabetes, hypertension, stroke})
Severity levels for each condition
Overall health status indicators

Observation Space

Let \(Y_t\) represent observed variables at time \(t\):

Clinical measurements (morbidities)
Hospital visits
Medication changes
Diagnostic codes

Model Specification

Hidden Markov Model Overview

A Hidden Markov Model (HMM) is a statistical framework for modeling systems that transition between hidden states over time. Key components of an HMM include:

Hidden states: The unobservable or latent variables (\(S_t\)) representing the underlying system’s state at each time step.
Observations: The measurable outputs (\(Y_t\)) influenced by the hidden states.
Transitions: The probabilities of moving from one hidden state to another, represented by a state transition matrix (\(A\)).
Emissions: The probabilities of observing specific outputs given a hidden state, represented by emission probabilities (\(B\)).

HMMs assume the Markov property, meaning the current state \(S_t\) depends only on the previous state \(S_{t-1}\), and the current observation \(Y_t\) depends only on the current state \(S_t\).

HMMs are widely used in applications such as speech recognition, genomics, and clinical modeling because they effectively handle systems with temporal dependencies and partially observable states.

Parameters

\(\pi\): Initial state distribution
\(A\): State transition matrix \(P(S_t | S_{t-1})\)
\(B\): Emission probabilities \(P(Y_t | S_t)\)
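
To make the role of the three parameter sets concrete, the following is a minimal Python sketch of the scaled forward algorithm, which computes the log-likelihood of one observation sequence from \(\pi\), \(A\), and \(B\). The two-state toy values are hypothetical and are not taken from the fitted models in this report, which appear to have been estimated in R.

```python
import math

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of one observation sequence (scaled forward algorithm).

    pi[i]   : P(S_1 = i)
    A[i][j] : P(S_t = j | S_{t-1} = i)
    B[i][y] : P(Y_t = y | S_t = i)
    obs     : list of observed symbol indices
    """
    n = len(pi)
    loglik = 0.0
    alpha = None
    for t, y in enumerate(obs):
        if t == 0:
            alpha = [pi[j] * B[j][y] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][y]
                     for j in range(n)]
        scale = sum(alpha)              # P(y_t | y_1..y_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

# Hypothetical two-state, two-symbol example.
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 1]
loglik = forward_loglik(pi, A, B, obs)
```

Summing such log-likelihoods over individuals gives the quantity that an EM fit maximises.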

Key Features

Time-varying transition probabilities:
\(A_t = f(\text{age, gender, risk factors})\)
State-dependent observations:
\(P(Y_t | S_t) = \prod_i P(Y_t^i | S_t)\),
where \(Y_t^i\) is the \(i\)-th individual clinical measurement at time \(t\); the measurements are assumed conditionally independent given the state.
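
As a small illustration of the product form above, the sketch below evaluates \(P(Y_t | S_t)\) for binary indicators. The presence probabilities are hypothetical placeholders, and the function name `emission_prob` is introduced here for illustration only.

```python
def emission_prob(presence_probs, y):
    """P(Y_t = y | S_t) for binary indicators that are conditionally
    independent given the state.

    presence_probs[i] : P(Y_t^i = 1 | S_t) for the state of interest
    y[i]              : observed 0/1 value of indicator i
    """
    p = 1.0
    for prob, yi in zip(presence_probs, y):
        p *= prob if yi == 1 else 1.0 - prob
    return p

# Hypothetical state with three indicators.
example = emission_prob([0.1, 0.9, 0.5], [0, 1, 1])  # 0.9 * 0.9 * 0.5
```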

For now, I will restrict the dataset to individuals observed in all seven waves used here (Waves 1, 2, 4, 5, 6, 7, and 8). Wave 3 is excluded because it is a retrospective wave.
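
A minimal Python sketch of this filtering step, assuming the data are available as (person id, wave) pairs in long format. The record layout and the names `REQUIRED_WAVES` and `complete_cases` are hypothetical; the actual preprocessing code is not shown in this report.

```python
REQUIRED_WAVES = {1, 2, 4, 5, 6, 7, 8}  # Wave 3 is retrospective and excluded

def complete_cases(records):
    """Return the ids observed in every required wave.

    records: iterable of (person_id, wave) pairs (hypothetical long format).
    """
    waves_by_id = {}
    for person_id, wave in records:
        if wave in REQUIRED_WAVES:          # rows from Wave 3 are ignored
            waves_by_id.setdefault(person_id, set()).add(wave)
    return sorted(pid for pid, waves in waves_by_id.items()
                  if waves == REQUIRED_WAVES)

# Toy data: "a" has all waves (including Wave 3), "b" misses Wave 6.
records = [("a", w) for w in range(1, 9)] + [("b", w) for w in (1, 2, 4, 5, 7, 8)]
kept = complete_cases(records)
```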

Model without Covariates

In this section, I will fit a Hidden Markov Model (HMM) to the filtered dataset without including any covariates. The model will estimate the hidden states and transition probabilities based solely on the observed health conditions across the seven waves.

I use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to choose the number of hidden states. Lower AIC and BIC values indicate a better trade-off between model fit and model complexity. Based on the elbow method, that is, the point beyond which additional states bring little further reduction in AIC and BIC, the optimal number of states appears to be 7. The resulting health states and transitions are shown in the state distribution plots and in the output below.
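
For reference, both criteria can be computed from the log-likelihood as AIC = -2 logLik + 2k and BIC = -2 logLik + k ln(n). The Python sketch below applies them to the converged log-likelihood reported further down (-33265.15). The parameter count k and the sample size n are my assumptions: k counts the free initial, transition, and binary emission probabilities of a 7-state model, and n assumes 3596 individuals with seven waves each, as the state counts in the summary table suggest. The fitting software may count either quantity differently.

```python
import math

def n_free_params(n_states, n_binary_items):
    """Free parameters of an HMM with binary responses and no covariates."""
    initial = n_states - 1                      # initial probabilities sum to 1
    transitions = n_states * (n_states - 1)     # each row sums to 1
    emissions = n_states * n_binary_items       # one free probability per item
    return initial + transitions + emissions

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

loglik_7 = -33265.15            # from the convergence log in this report
k_7 = n_free_params(7, 6)       # assumption: 6 + 42 + 42 free parameters
n_obs = 3596 * 7                # assumption: individuals x waves
aic_7 = aic(loglik_7, k_7)
bic_7 = bic(loglik_7, k_7, n_obs)
```

Repeating this for each candidate number of states and plotting the values gives the curve on which the elbow is read.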

## iteration 0 logLik: -61529.39 
## iteration 5 logLik: -60911.15 
## iteration 10 logLik: -50994.41 
## iteration 15 logLik: -41612.9 
## iteration 20 logLik: -37939.38 
## iteration 25 logLik: -37387.68 
## iteration 30 logLik: -35974.54 
## iteration 35 logLik: -35809.85 
## iteration 40 logLik: -35398.94 
## iteration 45 logLik: -34145.79 
## iteration 50 logLik: -34073.25 
## iteration 55 logLik: -34066.41 
## iteration 60 logLik: -34062.8 
## iteration 65 logLik: -34060 
## iteration 70 logLik: -34024.49 
## iteration 75 logLik: -33933.69 
## iteration 80 logLik: -33717.78 
## iteration 85 logLik: -33696.54 
## iteration 90 logLik: -33273.59 
## iteration 95 logLik: -33265.27 
## iteration 100 logLik: -33265.2 
## iteration 105 logLik: -33265.18 
## iteration 110 logLik: -33265.16 
## iteration 115 logLik: -33265.16 
## iteration 120 logLik: -33265.16 
## converged at iteration 124 with logLik: -33265.15

## Initial state probabilities model 
##   pr1   pr2   pr3   pr4   pr5   pr6   pr7 
## 0.034 0.060 0.582 0.038 0.213 0.009 0.062 
## 
## Transition matrix 
##         toS1  toS2  toS3  toS4  toS5  toS6  toS7
## fromS1 1.000 0.000 0.000 0.000 0.000 0.000 0.000
## fromS2 0.060 0.932 0.000 0.008 0.000 0.000 0.000
## fromS3 0.007 0.015 0.834 0.020 0.087 0.001 0.037
## fromS4 0.143 0.000 0.000 0.857 0.000 0.000 0.000
## fromS5 0.054 0.034 0.000 0.000 0.891 0.020 0.000
## fromS6 0.051 0.030 0.000 0.000 0.000 0.919 0.000
## fromS7 0.011 0.013 0.000 0.022 0.046 0.047 0.861
## 
## Response parameters 
## Resp 1 : multinomial 
## Resp 2 : multinomial 
## Resp 3 : multinomial 
## Resp 4 : multinomial 
## Resp 5 : multinomial 
## Resp 6 : multinomial 
##     Re1.0 Re1.1 Re2.0 Re2.1 Re3.0 Re3.1 Re4.0 Re4.1 Re5.0 Re5.1 Re6.0 Re6.1
## St1 0.842 0.158 0.000 1.000 0.000 1.000 0.696 0.304 0.827 0.173 0.783 0.217
## St2 0.881 0.119 0.243 0.757 0.999 0.001 0.000 1.000 0.953 0.047 0.891 0.109
## St3 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
## St4 0.869 0.131 1.000 0.000 0.000 1.000 0.899 0.101 0.924 0.076 0.903 0.097
## St5 1.000 0.000 0.000 1.000 1.000 0.000 1.000 0.000 0.941 0.059 0.918 0.082
## St6 0.000 1.000 0.000 1.000 1.000 0.000 1.000 0.000 0.905 0.095 0.909 0.091
## St7 0.532 0.468 1.000 0.000 1.000 0.000 1.000 0.000 0.868 0.132 0.627 0.373

##   State cancre_mean hibpe_mean hearte_mean diabe_mean stroke_mean lunge_mean
## 1     1   0.8419337   0.000000   0.0000000  0.6961548   0.8274019  0.7827099
## 2     2   0.8806969   0.242927   0.9991788  0.0000000   0.9526185  0.8911794
## 3     3   1.0000000   1.000000   1.0000000  1.0000000   1.0000000  1.0000000
## 4     4   0.8694699   1.000000   0.0000000  0.8991014   0.9239931  0.9034356
## 5     5   1.0000000   0.000000   1.0000000  1.0000000   0.9405069  0.9177417
## 6     6   0.0000000   0.000000   1.0000000  1.0000000   0.9053030  0.9090909
## 7     7   0.5315140   1.000000   1.0000000  1.0000000   0.8677154  0.6266752

Initial State Probabilities

The initial state probabilities give the likelihood of starting in each of the seven states. State 3 dominates with a starting probability of 58.2%, which makes it the most common condition at baseline. State 5 follows with 21.3%. The remaining states each account for about 6% or less, and State 6 is the rarest starting point at 0.9%.

Transition Matrix Dynamics

The transition matrix gives the probabilities of moving between states from one wave to the next, with the diagonal elements giving the probability of remaining in the same state. All states are highly persistent:
Every state has a self-transition probability of at least 83%, ranging from 0.834 (State 3) to 0.932 (State 2) among the non-absorbing states.
State 1 is an absorbing state (self-transition probability 1.000): once it is entered, transitions to other states do not occur. This makes it a terminal state, potentially reflecting irreversible conditions.

Key transition patterns include:
State 3 is the only state from which every other state can be reached, most often State 5 (8.7%) and State 7 (3.7%).
No other state has a non-zero estimated probability of moving into State 3, so individuals who leave it do not return.
State 4 has the largest off-diagonal probability, a 14.3% chance of moving to State 1.
States 2, 5, and 6 move to State 1 with probabilities of 6.0%, 5.4%, and 5.1%, respectively.
State 7 moves to State 6 (4.7%) and State 5 (4.6%) with similar probability.
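
Working directly from the printed transition matrix, the Python sketch below derives two summaries: the expected number of consecutive waves spent in each state, 1 / (1 - a_ii), and the six-step transition matrix covering the six transitions between the seven retained waves. The rows are renormalised because the printed values are rounded to three decimals. The model treats waves as equally spaced steps, which is an assumption carried over here.

```python
# Transition matrix as printed in the model output (rows: from, cols: to).
A = [
    [1.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    [0.060, 0.932, 0.000, 0.008, 0.000, 0.000, 0.000],
    [0.007, 0.015, 0.834, 0.020, 0.087, 0.001, 0.037],
    [0.143, 0.000, 0.000, 0.857, 0.000, 0.000, 0.000],
    [0.054, 0.034, 0.000, 0.000, 0.891, 0.020, 0.000],
    [0.051, 0.030, 0.000, 0.000, 0.000, 0.919, 0.000],
    [0.011, 0.013, 0.000, 0.022, 0.046, 0.047, 0.861],
]
# Renormalise rows to remove rounding error from the printed output.
A = [[p / sum(row) for p in row] for row in A]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(A, n):
    """n-step transition matrix A^n (A^0 is the identity)."""
    size = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result

def expected_sojourn(A):
    """Expected consecutive waves in each state; infinite if absorbing."""
    return [float("inf") if row[i] >= 1.0 else 1.0 / (1.0 - row[i])
            for i, row in enumerate(A)]

sojourn = expected_sojourn(A)
six_step = n_step(A, 6)
# six_step[2][0] is the probability of being in State 1 at the last wave
# for someone who starts in State 3.
```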

Response Parameters and Health Condition Probabilities

The response parameters give, for each of the seven states, the probability that each of six health conditions is absent (columns ending in .0) or present (columns ending in .1). Each condition is modelled as a binary multinomial response. Matching the values against the table of state means shows that Re1 to Re6 correspond to cancer, hypertension, heart disease, diabetes, stroke, and lung disease. The values in that table equal the .0 columns, so they are probabilities of absence rather than prevalences. Reading the .1 columns:

Cancer: Certain in State 6 (100%), common in State 7 (46.8%), between 12% and 16% in States 1, 2, and 4, and absent in States 3 and 5.
Hypertension: Certain in States 1, 5, and 6, frequent in State 2 (75.7%), and absent in States 3, 4, and 7.
Heart Disease: Certain in States 1 and 4 and essentially absent in all other states.
Diabetes: Certain in State 2, present in 30.4% of State 1 and 10.1% of State 4, and absent in States 3, 5, 6, and 7.
Stroke: Uncommon everywhere, highest in State 1 (17.3%) and State 7 (13.2%), and absent in State 3.
Lung Disease: Most common in State 7 (37.3%) and State 1 (21.7%), between 8% and 11% in States 2, 4, 5, and 6, and absent in State 3.
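
One way to check how the .0 and .1 columns should be read is to compare the model-implied morbidity count per state with the observed counts in the summary table. The Python sketch below sums the .1 columns of the response parameters, assuming they are presence probabilities. Under that assumption State 3 has an expected count of 0, which agrees with its observed mean of 0.00 in every wave.

```python
CONDITIONS = ["cancre", "hibpe", "hearte", "diabe", "stroke", "lunge"]

# ".1" columns (Re1.1 .. Re6.1) of the response parameters, by state.
presence = {
    1: [0.158, 1.000, 1.000, 0.304, 0.173, 0.217],
    2: [0.119, 0.757, 0.001, 1.000, 0.047, 0.109],
    3: [0.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    4: [0.131, 0.000, 1.000, 0.101, 0.076, 0.097],
    5: [0.000, 1.000, 0.000, 0.000, 0.059, 0.082],
    6: [1.000, 1.000, 0.000, 0.000, 0.095, 0.091],
    7: [0.468, 0.000, 0.000, 0.000, 0.132, 0.373],
}

# Expected number of conditions per state: sum of presence probabilities.
expected_count = {state: sum(probs) for state, probs in presence.items()}

# Conditions that are (near) certain in each state, as a quick profile label.
profile = {state: [c for c, p in zip(CONDITIONS, probs) if p >= 0.99]
           for state, probs in presence.items()}
```

The expected counts can then be set against the Mean column of the summary table of morbidity counts by state and wave.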

Interpretation of States

State 3 is the healthy baseline: it has the highest starting probability (58.2%), no recorded conditions, and strong persistence (83.4%), and it is the origin of transitions into every other state.
State 1, the absorbing state, combines hypertension and heart disease with elevated probabilities of diabetes, lung disease, stroke, and cancer. It carries the highest morbidity count and acts as the endpoint of the modelled trajectories, reached mainly from State 4 and, less often, from States 2, 5, and 6.
States 2, 4, 5, 6, and 7 exhibit intermediate characteristics, each dominated by one or two conditions: diabetes, usually with hypertension (State 2), heart disease without hypertension (State 4), hypertension alone (State 5), cancer with hypertension (State 6), and cancer or lung disease without cardiometabolic conditions (State 7). State 4 shows the highest risk of transitioning to State 1, suggesting vulnerability.

Implications

Preventing Transition to the Terminal State: The absorbing nature of State 1 underscores the need for interventions that prevent transitions into it, particularly from State 4 (14.3% per wave).
Optimizing Care in Intermediate States: State 5 (hypertension alone) is the most common first step out of the healthy State 3, so managing it may delay progression to States 1, 2, and 6.
Targeting High-Burden States: States 1, 2, and 6, whose mean morbidity counts are around two or more, require targeted efforts to manage co-occurring chronic conditions.
Identifying Key Transition Pathways: Examining the dynamics between states, particularly those leading to State 1, can inform proactive measures to mitigate risks.

Summary Statistics of Morbidity Count by State and Wave
Wave	State	Mean	SD	Median	N	Min	Max
1	1	2.45	0.65	2	124	2	5
1	2	1.68	0.61	2	217	1	4
1	3	0.00	0.00	0	2148	0	0
1	4	1.20	0.47	1	138	1	3
1	5	1.05	0.23	1	767	1	3
1	6	2.09	0.30	2	32	2	3
1	7	1.01	0.22	1	170	0	2
2	1	2.51	0.71	2	220	2	5
2	2	1.84	0.61	2	270	1	4
2	3	0.00	0.00	0	1781	0	0
2	4	1.25	0.50	1	164	1	3
2	5	1.08	0.28	1	887	1	3
2	6	2.11	0.31	2	55	2	3
2	7	1.04	0.19	1	219	1	2
4	1	2.73	0.80	3	345	2	5
4	2	1.94	0.63	2	350	1	4
4	3	0.00	0.00	0	1373	0	0
4	4	1.31	0.59	1	183	1	4
4	5	1.12	0.35	1	1003	1	3
4	6	2.12	0.33	2	92	2	3
4	7	1.06	0.23	1	250	1	2
5	1	2.79	0.85	3	465	2	6
5	2	2.02	0.65	2	387	1	4
5	3	0.00	0.00	0	1176	0	0
5	4	1.44	0.71	1	184	1	4
5	5	1.14	0.38	1	1013	1	3
5	6	2.16	0.37	2	121	2	3
5	7	1.08	0.28	1	250	1	3
6	1	2.85	0.88	3	550	2	6
6	2	2.09	0.66	2	408	1	4
6	3	0.00	0.00	0	1027	0	0
6	4	1.44	0.69	1	195	1	4
6	5	1.16	0.40	1	1008	1	3
6	6	2.18	0.39	2	142	2	3
6	7	1.09	0.30	1	266	1	3
7	1	2.92	0.90	3	663	2	6
7	2	2.16	0.68	2	428	1	4
7	3	0.00	0.00	0	897	0	0
7	4	1.51	0.70	1	198	1	4
7	5	1.19	0.43	1	971	1	3
7	6	2.20	0.42	2	165	2	4
7	7	1.12	0.33	1	274	1	3
8	1	3.04	0.92	3	809	2	6
8	2	2.24	0.74	2	431	1	5
8	3	0.00	0.00	0	739	0	0
8	4	1.59	0.72	1	201	1	4
8	5	1.22	0.46	1	940	1	3
8	6	2.26	0.48	2	185	2	4
8	7	1.15	0.38	1	291	1	3

Model with Covariates

In this section, I will extend the Hidden Markov Model (HMM) to include covariates such as age, sex, smoking status, and BMI. By incorporating these additional variables, we can explore how demographic and lifestyle factors influence state transitions and health trajectories over time.

The addition of covariates (such as age, sex, smoking status, and BMI) fundamentally alters the Hidden Markov Model (HMM) and its interpretation. Without covariates, the model focuses solely on identifying latent health profiles based on patterns of co-occurring health conditions, with transitions between these profiles assumed to occur at constant probabilities over time. The primary focus is on interpreting the emission probabilities (the likelihood of observing each condition within each hidden state) and the overall transition matrix.

In contrast, incorporating covariates introduces a dynamic element to the model. The transition probabilities between hidden states now depend on these covariates, allowing us to examine how individual characteristics influence transitions between health profiles. The interpretation shifts to quantifying the effect of each covariate on these transitions, such as determining whether smoking increases the likelihood of transitioning to a lung disease-prone state or whether increasing age elevates the risk of transitioning to a frail state. While the model without covariates describes patterns, the model with covariates explores how individual characteristics drive changes in health status, providing stronger evidence for potential causal relationships, albeit requiring careful consideration of confounding factors.

Feature	Without Covariates	With Covariates
Transition Prob.	Constant over time	Dependent on covariates (age, sex, smoking, BMI)
Interpretation	Focus on patterns of co-occurring conditions	Focus on how covariates influence transitions between health profiles
Causal Inference	No causal claims	Stronger evidence for potential causal relationships, but still needs careful consideration
Output Focus	Emission probabilities and transition matrix	Transition effects (coefficients/odds ratios) of covariates on state transitions

Model with Age

As an initial exploration, I fit the Hidden Markov Model (HMM) with age as the only covariate. Incorporating age lets us assess how increasing age influences the likelihood of moving between health profiles over time. The model estimates age-dependent transition probabilities, which allows us to quantify the effect of age on health trajectories and identify age-specific patterns of health conditions.

## Initial state probabilities model 
##   pr1   pr2   pr3   pr4   pr5   pr6 
## 0.070 0.039 0.047 0.051 0.611 0.182 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1        St2          St3        St4        St5        St6
## (Intercept)   0 -2.1445136 -29.44955688 -6.3632529  1.5731502 -6.5270267
## age           0 -0.5074109   0.09755199 -0.4116626 -0.7248676 -0.4093913
## Probalities at zero values of the covariates.
## 0.1682899 0.01971096 2.730726e-14 0.0002900898 0.8114628 0.0002462671 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1        St2         St3         St4        St5         St6
## (Intercept)   0 1.53470433 -4.26865146 -11.0320276  8.4107547 -8.99355042
## age           0 0.02404886  0.07145551  -0.1364407 -0.5105528  0.00690485
## Probalities at zero values of the covariates.
## 0.0002221824 0.001030916 3.110699e-06 3.593859e-09 0.9987438 2.759691e-08 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1       St2        St3        St4        St5        St6
## (Intercept)   0 -1.110527 1.56767013 -3.6733322  1.1616892 -3.6797525
## age           0 -0.252991 0.02163354 -0.2121404 -0.3746935 -0.2114876
## Probalities at zero values of the covariates.
## 0.1067145 0.03515019 0.5117455 0.002709665 0.3409878 0.002692324 
## 
## Transition model for state (component) 4 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1        St2        St3        St4       St5        St6
## (Intercept)   0  3.8982743 0.91325879 2.17424864 84.573803  2.8282568
## age           0 -0.3479535 0.01642125 0.02438118 -1.801534 -0.3276157
## Probalities at zero values of the covariates.
## 1.862363e-37 9.184663e-36 4.641811e-37 1.638055e-36 1 3.150363e-36 
## 
## Transition model for state (component) 5 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1         St2         St3         St4          St5         St6
## (Intercept)   0 1.221196285 -3.62459633 -3.49106128  4.457299093  0.82140968
## age           0 0.003852688  0.05657206  0.06062786 -0.006108755 -0.02645405
## Probalities at zero values of the covariates.
## 0.01075543 0.03647426 0.0002867382 0.000327702 0.9277012 0.02445465 
## 
## Transition model for state (component) 6 
## Model of type multinomial (mlogit), formula: ~age
## Coefficients: 
##             St1         St2         St3        St4        St5        St6
## (Intercept)   0  4.97025356 -4.29520920 -6.7476899 116.495932 14.2671789
## age           0 -0.03257055  0.06907864 -0.6386596  -2.777685 -0.3668857
## Probalities at zero values of the covariates.
## 2.549528e-51 3.672937e-49 3.475955e-53 2.992094e-54 1 4.005132e-45 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
## Resp 2 : multinomial 
## Resp 3 : multinomial 
## Resp 4 : multinomial 
## Resp 5 : multinomial 
## Resp 6 : multinomial 
##     Re1.0 Re1.1 Re2.0 Re2.1 Re3.0 Re3.1 Re4.0 Re4.1 Re5.0 Re5.1 Re6.0 Re6.1
## St1 0.864 0.136 0.204 0.796 0.695 0.305 0.000 1.000 0.905 0.095 0.841 0.159
## St2 0.890 0.110 0.000 1.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
## St3 0.852 0.148 0.000 1.000 0.345 0.655 1.000 0.000 0.760 0.240 0.696 0.304
## St4 0.864 0.136 1.000 0.000 0.231 0.769 0.998 0.002 0.776 0.224 0.912 0.088
## St5 0.920 0.080 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 0.934 0.066
## St6 0.984 0.016 0.101 0.899 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000

##               St1          St2          St3          St4       St5          St6
## [1,] 1.682899e-01 1.971096e-02 2.730726e-14 2.900898e-04 0.8114628 2.462671e-04
## [2,] 2.221824e-04 1.030916e-03 3.110699e-06 3.593859e-09 0.9987438 2.759691e-08
## [3,] 1.067145e-01 3.515019e-02 5.117455e-01 2.709665e-03 0.3409878 2.692324e-03
## [4,] 1.862363e-37 9.184663e-36 4.641811e-37 1.638055e-36 1.0000000 3.150363e-36
## [5,] 1.075543e-02 3.647426e-02 2.867382e-04 3.277020e-04 0.9277012 2.445465e-02
## [6,] 2.549528e-51 3.672937e-49 3.475955e-53 2.992094e-54 1.0000000 4.005132e-45

##   State cancre_mean hibpe_mean hearte_mean diabe_mean stroke_mean lunge_mean
## 1     1   0.8637371  0.2038678   0.6952490  0.0000000   0.9048828  0.8411700
## 2     2   0.8901575  0.0000000   1.0000000  1.0000000   1.0000000  1.0000000
## 3     3   0.8518248  0.0000000   0.3448696  1.0000000   0.7598843  0.6955570
## 4     4   0.8644776  1.0000000   0.2314547  0.9981887   0.7761438  0.9120504
## 5     5   0.9196583  1.0000000   1.0000000  1.0000000   1.0000000  0.9344434
## 6     6   0.9837320  0.1007630   1.0000000  1.0000000   1.0000000  1.0000000

Summary Statistics of Morbidity Count by State and Wave
Wave	State	Mean	SD	Median	N	Min	Max
1	1	1.87	0.79	2	253	1	5
1	2	2.00	0.00	2	23	2	2
1	3	2.08	0.51	2	157	1	4
1	4	1.11	0.40	1	157	0	3
1	5	0.06	0.24	0	2289	0	2
1	6	1.00	0.05	1	717	1	2
2	1	2.10	0.83	2	336	1	5
2	2	1.06	0.23	1	868	1	2
2	3	2.20	0.43	2	241	2	4
2	4	1.17	0.39	1	177	1	3
2	5	0.10	0.30	0	1974	0	2
4	1	2.30	0.91	2	467	1	5
4	2	1.08	0.28	1	974	1	2
4	3	2.32	0.53	2	363	2	4
4	4	1.21	0.44	1	199	1	3
4	5	0.14	0.36	0	1593	0	2
5	1	2.44	0.97	2	540	1	6
5	2	1.10	0.31	1	982	1	2
5	3	2.37	0.58	2	482	2	5
5	4	1.29	0.54	1	205	1	3
5	5	0.16	0.39	0	1387	0	2
6	1	2.55	1.00	2	593	1	6
6	2	1.12	0.32	1	970	1	2
6	3	2.40	0.59	2	566	2	5
6	4	1.28	0.55	1	215	1	3
6	5	0.19	0.43	0	1252	0	2
7	1	2.66	1.01	2	658	1	6
7	2	1.14	0.35	1	930	1	2
7	3	2.46	0.64	2	664	2	5
7	4	1.35	0.58	1	216	1	3
7	5	0.22	0.46	0	1128	0	2
8	1	2.84	1.07	3	735	1	6
8	2	1.16	0.36	1	889	1	2
8	3	2.52	0.66	2	765	2	4
8	4	1.44	0.62	1	225	1	3
8	5	0.28	0.51	0	982	0	2

## [1] "Individual Summary (Age and Morbidity Count):"

## # A tibble: 2 × 5
##   wave  avg_age sd_age avg_morbidity sd_morbidity
##   <chr>   <dbl>  <dbl>         <dbl>        <dbl>
## 1 1        61.8   7.29          1           0    
## 2 2        64.1   7.29          1.02        0.146

## [1] "Morbidities Summary (Types of Morbidities):"

## # A tibble: 12 × 4
##    wave  morbidity_type count proportion
##    <chr> <chr>          <dbl>      <dbl>
##  1 1     cancre             0     0     
##  2 1     diabe              0     0     
##  3 1     hearte             0     0     
##  4 1     hibpe            644     1     
##  5 1     lunge              0     0     
##  6 1     stroke             0     0     
##  7 2     cancre            14     0.0217
##  8 2     diabe              0     0     
##  9 2     hearte             0     0     
## 10 2     hibpe            644     1     
## 11 2     lunge              0     0     
## 12 2     stroke             0     0

Initial State Probabilities

The initial state probabilities indicate the likelihood of starting in each latent state. State 5 has the highest probability (61.1%), making it the most likely starting state for the sequences in the dataset. State 6 (18.2%) is the second most likely. The remaining states, State 1 (7.0%), State 4 (5.1%), State 3 (4.7%), and State 2 (3.9%), have much lower initial probabilities.

Transition Dynamics

The effect of age on state transitions in this hidden Markov model can be read from the coefficients on the covariate “age” in each state’s transition model. Each transition model is a multinomial logit with State 1 as the reference destination (its coefficients are fixed at 0), so a coefficient describes how the log-odds of moving to a given state, relative to moving to State 1, change per unit increase in age.
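
Exponentiating a multinomial logit coefficient gives an odds ratio per unit of the covariate. The short Python sketch below does this for the age coefficients of the transition model for State 1, copied from the output above. The interpretation as a per-year effect assumes that age enters the model in years, which the output does not state explicitly.

```python
import math

# Age coefficients for transitions out of State 1.
# The reference destination is St1, whose coefficient is fixed at 0.
age_coef_from_s1 = {
    "St2": -0.5074109,
    "St3": 0.09755199,
    "St4": -0.4116626,
    "St5": -0.7248676,
    "St6": -0.4093913,
}

# Odds ratio per unit of age for moving to each state rather than to St1.
odds_ratios = {dest: math.exp(b) for dest, b in age_coef_from_s1.items()}
```

An odds ratio below 1 means that the destination becomes less likely relative to State 1 as age increases, and a value above 1 means the opposite.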

For State 1, age has varying effects on the transitions to other states. The negative coefficients for States 2, 4, 5, and 6 mean that, as age increases, moving to these states becomes less likely relative to remaining in State 1. The coefficient for State 3 is positive (0.098), but its intercept is so strongly negative (-29.4) that this transition remains extremely unlikely.

In State 2, the transition probabilities depend on both the intercepts and the effects of age. The age coefficients for remaining in State 2 (0.024) and for moving to States 3 (0.071) and 6 (0.007) are small and positive, whereas those for States 4 (-0.136) and 5 (-0.511) are negative. Relative to moving to State 1, older individuals in State 2 are therefore less likely to move to States 4 and 5.

For State 3, the impact of age is negative for transitions to States 2, 4, 5, and 6, and slightly positive for remaining in State 3 (0.022). Older individuals in State 3 are thus more likely to stay there or move to State 1 than to move elsewhere.

In the case of State 4, the very large intercept for State 5 (84.6) is offset by a strongly negative age coefficient (-1.80), so the probability of moving to State 5 falls steeply with age. Age also reduces the odds of moving to States 2 and 6, while the coefficients for moving to State 3 (0.016) and for remaining in State 4 (0.024) are small and positive.

The pattern for State 5 shows small, mixed effects of age. The coefficients for moving to States 2, 3, and 4 are slightly positive (0.004, 0.057, and 0.061), while those for remaining in State 5 (-0.006) and for moving to State 6 (-0.026) are slightly negative. Older individuals are therefore somewhat more likely to leave State 5.

Finally, in State 6, age has a negative coefficient for moving to States 2 (-0.033), 4 (-0.639), and 5 (-2.78) and for remaining in State 6 (-0.367), and a positive coefficient for moving to State 3 (0.069). As individuals in State 6 grow older, they become less likely to remain there or to move to States 4 or 5, and more inclined to move toward State 3 or State 1.

In summary, relative to the reference State 1, age lowers the odds of moving to State 5 from every origin state and raises the odds of moving to State 3 from every origin state. The effect on transitions to State 5 is strongest from States 4 and 6. Because each coefficient is defined relative to State 1, the change in an actual transition probability depends on all coefficients in the same row together.
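
The printed transition probabilities are evaluated at zero values of the covariates, that is, at age 0. To see what the coefficients imply at an observed age, the intercept and age coefficient of each destination can be combined and passed through a softmax. The Python sketch below does this for the transition models of States 4 and 5. The evaluation age of 62 is my choice, close to the average age in the individual summary above, and the calculation assumes that age enters the model uncentred and in years.

```python
import math

def transition_probs(intercepts, age_coefs, age):
    """Multinomial-logit transition probabilities at a given age."""
    logits = [a + b * age for a, b in zip(intercepts, age_coefs)]
    m = max(logits)                          # subtract max for stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# (intercepts, age coefficients) for destinations St1..St6, from the output.
FROM_S4 = ([0, 3.8982743, 0.91325879, 2.17424864, 84.573803, 2.8282568],
           [0, -0.3479535, 0.01642125, 0.02438118, -1.801534, -0.3276157])
FROM_S5 = ([0, 1.221196285, -3.62459633, -3.49106128, 4.457299093, 0.82140968],
           [0, 0.003852688, 0.05657206, 0.06062786, -0.006108755, -0.02645405])

s5_at_0 = transition_probs(*FROM_S5, age=0)    # reproduces the printed row
s5_at_62 = transition_probs(*FROM_S5, age=62)
s4_at_0 = transition_probs(*FROM_S4, age=0)
s4_at_62 = transition_probs(*FROM_S4, age=62)
```

Under these assumptions, the probability of moving from State 4 to State 5 is essentially 1 at age 0 but essentially 0 at age 62, where remaining in State 4 is the most likely outcome.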

Response Patterns

The response probabilities describe the likelihood of the observed variables (cancre, hibpe, hearte, diabe, stroke, and lunge) across the six latent states:

For the variable cancre, absence is more likely in every state, with absence probabilities ranging from 85.2% in State 3 to 98.4% in State 6.

The variable hibpe is certain to be present in States 2 and 3 (100%) and very likely in State 6 (89.9%) and State 1 (79.6%). It is absent in States 4 and 5.

For hearte, States 3 and 4 are more likely to indicate its presence, with probabilities of 65.5% and 76.9%, respectively. State 1 shows a 30.5% probability of presence, while States 2, 5, and 6 indicate its absence.

The variable diabe is present with certainty in State 1 and absent in all other states, with absence probabilities of 99.8% in State 4 and 100% in States 2, 3, 5, and 6.

For stroke, States 2, 5, and 6 predict its absence (100%). The other states show lower probabilities of absence (90.5% in State 1, 76.0% in State 3, and 77.6% in State 4), though absence is still dominant.

Finally, for the variable lunge, States 2, 5, and 6 strongly predict its absence, with probabilities exceeding 93%. Presence is most likely in State 3 (30.4%).

General Observations

State 5 is the dominant state: it has the highest initial probability (61.1%) and is associated with the absence of most conditions. The printed transition probabilities of (nearly) 1 into State 5 are evaluated at age 0, far outside the observed age range if age is measured in years, so they should not be read as evidence that State 5 is absorbing. Transition probabilities at observed ages have to be computed from the intercepts and age coefficients together. Age has a mild influence on most transitions, except for transitions from State 4 and State 6 to State 5, where the effects are more pronounced (coefficients of -1.80 and -2.78). The distinct response probabilities across states highlight heterogeneity in the latent states, suggesting meaningful differentiation between health conditions.

This analysis describes the transition dynamics and observed variable profiles across latent states. The model highlights the role of age and identifies State 5 as a potential latent “healthy” state. Further evaluation, such as predictive validation, would enhance the model’s interpretability and applicability.

Model with Age, BMI and Gender

## Initial state probabilities model 
##   pr1   pr2   pr3   pr4   pr5   pr6 
## 0.011 0.098 0.046 0.040 0.605 0.200 
## 
## Transition model for state (component) 1 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1        St2          St3          St4        St5       St6
## (Intercept)   0 -4.3726801 -30.84706324 -8.324400375 20.2388315 91.510169
## age           0 -0.2652189   0.03379167 -0.003989091 -0.5630895 -2.336433
## bmi           0 -0.1767751   0.28170801  0.189467781 -0.4678567  1.076658
## ragender      0 -0.3166819  -0.68016519  0.262782342 -0.8176360 -9.751396
## Probalities at zero values of the covariates.
## 1.809833e-40 2.283535e-42 7.259876e-54 4.389322e-44 1.11494e-31 1 
## 
## Transition model for state (component) 2 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1          St2          St3         St4        St5        St6
## (Intercept)   0  5.064272102  1.158908338 -0.15412046 23.1415231  7.0161999
## age           0  0.004781799  0.005751702 -0.00732978 -0.6180368 -0.6637809
## bmi           0 -0.033533060  0.014342610  0.09435469 -0.2143606 -0.2550175
## ragender      0 -0.808028554 -0.486847877 -0.69600043  0.2306994  0.9258525
## Probalities at zero values of the covariates.
## 8.907671e-11 1.409774e-08 2.838387e-10 7.635377e-11 0.9999999 9.927984e-08 
## 
## Transition model for state (component) 3 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1        St2          St3         St4         St5        St6
## (Intercept)   0  4.0410625  2.983263278 -0.07828250 10.15156377  1.5436363
## age           0 -0.3138754 -0.005160342 -0.02500456 -0.54235216 -0.3171443
## bmi           0 -0.2154979  0.016571997  0.07085508 -0.32936372 -0.1051573
## ragender      0 -0.8383869  0.371536206 -0.01607845  0.07865847  0.2431312
## Probalities at zero values of the covariates.
## 3.888872e-05 0.002212253 0.0007681365 3.596052e-05 0.9967627 0.0001820608 
## 
## Transition model for state (component) 4 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1        St2         St3        St4          St5         St6
## (Intercept)   0  3.7202885 -29.5240257 18.2145031 10.072082293  8.48122743
## age           0 -0.2599481   0.3760997  0.1859056 -0.407148298 -0.15337891
## bmi           0 -0.1088260   0.4002262  0.1258704 -0.278299814 -0.07769555
## ragender      0 -0.6663261   0.6542362 -3.2707545 -0.005661903  0.25258464
## Probalities at zero values of the covariates.
## 1.228541e-08 5.070961e-07 1.85041e-21 0.9996494 0.00029083 5.925702e-05 
## 
## Transition model for state (component) 5 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1         St2         St3          St4         St5         St6
## (Intercept)   0 -1.58478956 -2.87473783 -8.985170506  8.36196535  2.90964300
## age           0  0.03998251  0.04085339  0.007446481 -0.02376290 -0.02508694
## bmi           0  0.03848956  0.04323757  0.258567063 -0.06810147  0.05156881
## ragender      0 -1.26029417 -0.54990863 -0.698950421 -0.75945846 -0.84113922
## Probalities at zero values of the covariates.
## 0.0002325196 4.766441e-05 1.31213e-05 2.912391e-08 0.9954399 0.004266793 
## 
## Transition model for state (component) 6 
## Model of type multinomial (mlogit), formula: ~age + bmi + ragender
## Coefficients: 
##             St1        St2         St3        St4        St5         St6
## (Intercept)   0 -3.8841175 -3.23673296 -3.1069930 27.5179541  2.72181760
## age           0 -0.1734169  0.03064236 -0.0174861 -0.4734588 -0.01544053
## bmi           0 -0.1719368  0.07219933  0.1637082 -0.7450730  0.06317561
## ragender      0 -0.4421371  0.58242810  0.4750164 -2.5139022  0.56491676
## Probalities at zero values of the covariates.
## 1.119707e-12 2.302786e-14 4.399559e-14 5.009039e-14 1 1.702843e-11 
## 
## 
## Response parameters 
## Resp 1 : multinomial 
## Resp 2 : multinomial 
## Resp 3 : multinomial 
## Resp 4 : multinomial 
## Resp 5 : multinomial 
## Resp 6 : multinomial 
##     Re1.0 Re1.1 Re2.0 Re2.1 Re3.0 Re3.1 Re4.0 Re4.1 Re5.0 Re5.1 Re6.0 Re6.1
## St1 0.000 1.000     0     1 0.704 0.296   1.0   0.0 0.879 0.121 0.876 0.124
## St2 0.871 0.129     1     0 0.447 0.553   0.7   0.3 0.849 0.151 0.914 0.086
## St3 1.000 0.000     0     1 0.355 0.645   1.0   0.0 0.762 0.238 0.694 0.306
## St4 0.846 0.154     0     1 0.661 0.339   0.0   1.0 0.879 0.121 0.828 0.172
## St5 0.916 0.084     1     0 1.000 0.000   1.0   0.0 1.000 0.000 0.936 0.064
## St6 1.000 0.000     0     1 1.000 0.000   1.0   0.0 1.000 0.000 1.000 0.000

##               St1          St2          St3          St4          St5
## [1,] 1.809833e-40 2.283535e-42 7.259876e-54 4.389322e-44 1.114940e-31
## [2,] 8.907671e-11 1.409774e-08 2.838387e-10 7.635377e-11 9.999999e-01
## [3,] 3.888872e-05 2.212253e-03 7.681365e-04 3.596052e-05 9.967627e-01
## [4,] 1.228541e-08 5.070961e-07 1.850410e-21 9.996494e-01 2.908300e-04
## [5,] 2.325196e-04 4.766441e-05 1.312130e-05 2.912391e-08 9.954399e-01
## [6,] 1.119707e-12 2.302786e-14 4.399559e-14 5.009039e-14 1.000000e+00
##               St6
## [1,] 1.000000e+00
## [2,] 9.927984e-08
## [3,] 1.820608e-04
## [4,] 5.925702e-05
## [5,] 4.266793e-03
## [6,] 1.702843e-11

##   State cancre_mean hibpe_mean hearte_mean diabe_mean stroke_mean lunge_mean
## 1     1   0.0000000          0   0.7044335   1.000000   0.8788177  0.8758621
## 2     2   0.8708257          1   0.4468979   0.700403   0.8492413  0.9142652
## 3     3   1.0000000          0   0.3554936   1.000000   0.7620345  0.6938203
## 4     4   0.8457265          0   0.6611111   0.000000   0.8786325  0.8282051
## 5     5   0.9158184          1   1.0000000   1.000000   1.0000000  0.9360606
## 6     6   1.0000000          0   1.0000000   1.000000   1.0000000  1.0000000

Summary Statistics of Morbidity Count by State and Wave
Wave	State	Mean	SD	Median	N	Min	Max
1	1	2.24	0.50	2	34	2	4
1	2	1.00	0.53	1	268	0	3
1	3	2.09	0.39	2	121	1	3
1	4	2.35	0.63	2	123	2	5
1	5	0.06	0.25	0	1892	0	2
1	6	1.00	0.00	1	635	1	1
2	1	2.29	0.56	2	59	2	4
2	2	1.21	0.46	1	251	1	3
2	3	2.14	0.34	2	198	2	3
2	4	2.44	0.69	2	192	2	5
2	5	0.10	0.31	0	1652	0	2
2	6	1.00	0.00	1	721	1	1
4	1	2.41	0.65	2	111	2	4
4	2	1.27	0.53	1	284	1	4
4	3	2.20	0.42	2	287	2	4
4	4	2.63	0.81	2	284	2	5
4	5	0.15	0.37	0	1322	0	2
4	6	1.00	0.00	1	785	1	1
5	1	2.49	0.68	2	148	2	5
5	2	1.33	0.61	1	286	1	4
5	3	2.24	0.46	2	369	2	4
5	4	2.72	0.88	2	349	2	6
5	5	0.16	0.39	0	1152	0	2
5	6	1.00	0.00	1	769	1	1
6	1	2.52	0.67	2	178	2	5
6	2	1.32	0.60	1	280	1	4
6	3	2.25	0.49	2	427	2	4
6	4	2.80	0.93	3	405	2	6
6	5	0.20	0.43	0	1044	0	2
6	6	1.00	0.00	1	739	1	1
7	1	2.60	0.71	2	222	2	5
7	2	1.41	0.64	1	278	1	4
7	3	2.29	0.54	2	486	2	4
7	4	2.87	0.96	3	461	2	6
7	5	0.23	0.46	0	943	0	2
7	6	1.00	0.00	1	683	1	1
8	1	2.69	0.71	3	263	2	4
8	2	1.49	0.68	1	282	1	4
8	3	2.32	0.56	2	541	2	4
8	4	3.05	0.99	3	526	2	6
8	5	0.29	0.52	0	820	0	2
8	6	1.00	0.00	1	641	1	1

## [1] "Individual Summary (Age and Morbidity Count):"

## # A tibble: 0 × 5
## # ℹ 5 variables: wave <chr>, avg_age <dbl>, sd_age <dbl>, avg_morbidity <dbl>,
## #   sd_morbidity <dbl>

## [1] "Morbidities Summary (Types of Morbidities):"

## # A tibble: 0 × 4
## # ℹ 4 variables: wave <chr>, morbidity_type <chr>, count <dbl>,
## #   proportion <dbl>