To begin, we load the data. In this case I have removed rows with missing values and applied outlier removal according to the 1.5 x IQR rule.
source("R/0_setup.R")
IP<- readRDS("DemoSet.rds")
mid<- IP[,24:70]
mid<- mid %>% apply(2, applyIQRrule) %>%
data.frame() %>%
filter(complete.cases(.)) %>%
select(-(contains("nucleated")))
sk<- skim(mid) %>% select(variable= skim_variable, mean= numeric.mean, min= numeric.p0, median= numeric.p50,
max= numeric.p100, numeric.hist)
sk[, 2:4]<- apply(sk[, 2:4], 2, function(x) round(x, 2))
print(sk)
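The helper applyIQRrule() is defined in R/0_setup.R and is not shown here. As a rough idea of what such a helper might do, the sketch below flags values outside the 1.5 x IQR fences as NA, which is consistent with the complete.cases() filter that follows it in the pipeline. The function name iqr_rule_sketch and its argument k are hypothetical, and the real helper may differ.

```r
# Hypothetical stand-in for applyIQRrule(); the real helper lives in
# R/0_setup.R and may behave differently.
# Values outside [Q1 - k*IQR, Q3 + k*IQR] are replaced with NA.
iqr_rule_sketch <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  x[x < q[1] - fence | x > q[2] + fence] <- NA
  x
}

# Toy vector: only the extreme value should be flagged
x <- c(1, 2, 3, 4, 5, 100)
iqr_rule_sketch(x)
```

Setting outliers to NA rather than dropping them keeps each column the same length, so the function can be used column by column with apply() before rows with any NA are removed.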
When we examine the pairwise correlation matrix, we can clearly see that some tests are correlated, particularly among the blood counts. However, we also see weaker patterns of correlation between elements of biochemistry that we may not have expected.
M<- cor(mid)
p.mat <- corrplot::cor.mtest(mid)$p
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot::corrplot(M, method="color", col=rev(col(200)),
#order="hclust",
tl.col="black", tl.cex= 0.86,
p.mat = p.mat, sig.level = 0.01, insig = "blank",
diag=FALSE )
Principal components analysis (PCA) is often the first method we reach for when seeking to reduce a large number of variables to a smaller number of ‘components’ according to their shared correlations.
# Ordinary principal components
pc1<- prcomp(mid, scale= TRUE)
s<- summary(pc1)
s$importance[,1:21]
                            PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8      PC9    PC10     PC11
Standard deviation     2.417364 1.948426 1.831635 1.809081 1.685618 1.651617 1.368446 1.334526 1.318558 1.28199 1.252773
Proportion of Variance 0.129860 0.084360 0.074550 0.072730 0.063140 0.060620 0.041610 0.039580 0.038640 0.03652 0.034880
Cumulative Proportion  0.129860 0.214220 0.288780 0.361500 0.424640 0.485260 0.526880 0.566450 0.605090 0.64161 0.676490
                           PC12     PC13     PC14     PC15      PC16      PC17      PC18      PC19      PC20      PC21
Standard deviation     1.193991 1.159047 1.025337 1.007481 0.9910688 0.9643123 0.9162659 0.8779952 0.8527957 0.8447815
Proportion of Variance 0.031680 0.029850 0.023360 0.022560 0.0218300 0.0206600 0.0186600 0.0171300 0.0161600 0.0158600
Cumulative Proportion  0.708170 0.738020 0.761380 0.783940 0.8057700 0.8264300 0.8450900 0.8622200 0.8783800 0.8942400
plot(pc1$sdev^2, type= "b", main="PCA Eigenvalues")
abline(h=1, col="red")
PCA shows that the cumulative proportion of variance is only just nearing 90% with 21 components retained (out of a possible 45). Examination of the eigenvalues (the squared standard deviations) shows that 15 components have eigenvalues >= 1: PC15 has a standard deviation of 1.007, while PC16 falls just below 1 at 0.991. This is not a particularly good fit, with only 29% of the variance of the data explained by the first three components.
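The quantities read off the summary and the scree plot can also be computed directly from the prcomp object. The sketch below uses the built-in mtcars data purely as a stand-in for the blood test data, so the numbers it produces are not those reported above.

```r
# Built-in data used only for illustration (not the blood test data)
pc <- prcomp(mtcars, scale. = TRUE)

eigenvalues <- pc$sdev^2                      # variance of each component
prop_var    <- eigenvalues / sum(eigenvalues) # proportion of variance
cum_var     <- cumsum(prop_var)               # cumulative proportion

# Number of components meeting the eigenvalue >= 1 rule
n_kaiser <- sum(eigenvalues >= 1)

# Smallest number of components reaching 90% of the variance
n_90 <- which(cum_var >= 0.9)[1]

c(kaiser = n_kaiser, ninety_percent = n_90)
```

Because the variables are scaled to unit variance, the eigenvalues sum to the number of variables, which is why an eigenvalue of 1 is a natural cut-off: a component below it explains less than a single original variable.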
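Another option is to rotate the retained components. Rotation redistributes the explained variance among the components so that each test loads strongly on as few components as possible, which makes the components easier to interpret. Note that the varimax rotation used here is an orthogonal rotation, so the components are still constrained to be uncorrelated; an oblique rotation (for example rotate= "oblimin" or rotate= "promax") would be needed to allow the components to be somewhat correlated. Here I have specified 12 components to be retained in the solution.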
pc2<- psych::principal(scale(mid), nfactors=12, rotate= "varimax", scores= TRUE)
summary(pc2)
Factor analysis with Call: psych::principal(r = scale(mid), nfactors = 12, rotate = "varimax",
scores = TRUE)
Test of the hypothesis that 12 factors are sufficient.
The degrees of freedom for the model is 516 and the objective function was 37.35
The number of observations was 208129 with Chi Square = 7772151 with prob < 0
The root mean square of the residuals (RMSA) is 0.04
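The numerical output is somewhat voluminous; however, it shows that PCA has been able to reduce the 45 test variables to 12 factors/components, which together explain 71% of the variance in the raw variables (the Cumulative Var row below). The Cumulative Proportion row reaches 1.00 only because it is expressed relative to the variance captured by the 12 retained components, not relative to the total variance.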
pc2
Principal Components Analysis
Call: psych::principal(r = scale(mid), nfactors = 12, rotate = "varimax",
scores = TRUE)
Standardized loadings (pattern matrix) based upon correlation matrix
                       RC1  RC2  RC6  RC5  RC3  RC7  RC4  RC8 RC12  RC9 RC10 RC11
SS loadings           4.32 3.48 3.02 2.99 2.98 2.68 2.66 2.14 2.05 1.93 1.88 1.74
Proportion Var        0.10 0.08 0.07 0.07 0.07 0.06 0.06 0.05 0.05 0.04 0.04 0.04
Cumulative Var        0.10 0.17 0.24 0.31 0.37 0.43 0.49 0.54 0.58 0.63 0.67 0.71
Proportion Explained  0.14 0.11 0.09 0.09 0.09 0.08 0.08 0.07 0.06 0.06 0.06 0.05
Cumulative Proportion 0.14 0.24 0.34 0.43 0.53 0.61 0.69 0.76 0.83 0.89 0.95 1.00
Mean item complexity = 1.8
Test of the hypothesis that 12 components are sufficient.
The root mean square of the residuals (RMSR) is 0.04
with the empirical chi square 775354 with prob < 0
Fit based upon off diagonal values = 0.95
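The variance rows in the output above follow directly from the loadings: SS loadings is the column sum of squared loadings, and Proportion Var divides that by the number of variables (for example 4.32 / 45 is roughly 0.10). The sketch below reproduces the same arithmetic on the built-in mtcars data using base R's varimax(); it is an illustration only, and psych::principal() may differ in details such as normalisation and component ordering. The choice of k = 3 components is arbitrary.

```r
# Illustration on built-in data, not the blood test data
p  <- ncol(mtcars)
k  <- 3                                   # arbitrary number of components
pc <- prcomp(mtcars, scale. = TRUE)

# Unrotated loadings: eigenvectors scaled by component standard deviations
L_raw <- pc$rotation[, 1:k] %*% diag(pc$sdev[1:k])
L_rot <- unclass(varimax(L_raw)$loadings)

ss_loadings <- colSums(L_rot^2)                 # "SS loadings"
prop_var    <- ss_loadings / p                  # "Proportion Var"
cum_var     <- cumsum(prop_var)                 # "Cumulative Var"
prop_expl   <- ss_loadings / sum(ss_loadings)   # "Proportion Explained"

round(rbind(ss_loadings, prop_var, cum_var, prop_expl), 2)

# An orthogonal rotation leaves the total explained variance unchanged
total_rotated   <- sum(ss_loadings)
total_unrotated <- sum(pc$sdev[1:k]^2)
all.equal(total_rotated, total_unrotated)
```

The last check illustrates why rotation cannot improve the overall fit: it only redistributes the variance already captured by the retained components.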
In order to understand how the components are composed of the different blood tests, we examine the component/factor loadings. In the plot below, weak loadings (absolute value below 0.3) have been filtered out, to allow us to see which tests group together.
loadings<- as.matrix(unclass(pc2$loadings)) %>%
data.frame() %>%
rownames_to_column(var= "Measure") %>%
gather(PC, value, -Measure) %>%
filter(abs(value)>= 0.3) %>%
mutate(direction= ifelse(value < 0, "Negative", "Positive"))
#howmany(loadings$Measure) 45
ggplot(loadings, aes(x= value, y= Measure, fill= direction)) +
geom_col() +
facet_wrap(~ PC, scales= "free", ncol= 2) +
theme_bw() +
theme(legend.position = "none",
text = element_text(size = 13),
strip.background = element_rect(fill= "grey30"),
strip.text = element_text(color="white", face= "bold"))
This plot allows us to see that (as expected) many blood count variables have grouped with their related counterparts, forming the basis for 8 out of the 12 factors. Lipid tests have grouped together, while the other biochemistry markers are represented in the remaining 3 factors.
Y<- pc2$scores %>% data.frame()
Ysmaller<- Y[sample(1:nrow(Y), 10000),]
psych::pairs.panels(Ysmaller, method= "pearson", hist.col= "#00AFBB", density= TRUE)
An examination of the pairwise plots of the component scores indicates that some of the 12 components are slightly correlated in places.
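A numeric check can complement the pairwise plots. The sketch below is a base-R analogue on the built-in mtcars data, not the psych::principal() scores used above: it rotates standardised principal component scores with the varimax rotation matrix and compares the correlations on the full sample with those on a random subsample. The choice of 3 components, the subsample size and the seed are arbitrary assumptions.

```r
# Illustration on built-in data; psych::principal() scores may differ
set.seed(1)
k   <- 3
pc  <- prcomp(mtcars, scale. = TRUE)
rot <- varimax(pc$rotation[, 1:k] %*% diag(pc$sdev[1:k]))

# Standardised unrotated scores, then the same orthogonal rotation
Z          <- scale(pc$x[, 1:k])
scores_rot <- Z %*% rot$rotmat

# Full sample: off-diagonal correlations are zero up to rounding
cor_full <- cor(scores_rot)
round(cor_full, 3)

# Random subsample: small non-zero correlations can appear by chance
idx     <- sample(nrow(scores_rot), 10)
cor_sub <- cor(scores_rot[idx, ])
round(cor_sub, 3)
```

In this analogue any correlation seen in a subsample is sampling noise rather than a property of the rotated solution, which may be worth keeping in mind when reading pairwise plots drawn from a random subset of rows.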
Another approach to this data set might be to perform hierarchical clustering based on the correlation between variables.
library(dendextend)
hc<- hclust(d= as.dist(1-abs(M)), method= "ward.D2")
col8<- RColorBrewer::brewer.pal(8, "Dark2")
dend<- hc %>%
as.dendrogram() %>%
color_branches(k= 12, col= c(col8, col8)) %>%
color_labels(k= 12, col= c(col8, col8)) %>%
set("labels_cex", 0.7)
par(mar= c(1,1,1,10))
plot(dend, horiz= TRUE, main= "Clustering by Correlation Distance")
Again, I have requested that 12 groups of tests be extracted. This solution is perhaps clearer with regard to expected measures grouping together. For example, all reticulocyte measures are present together, as are the platelet measures. What we do not get from this solution, however, is a set of latent or composite variables that can be used in further analysis.
We can take the groups identified by hierarchical clustering and use them as the basis for a confirmatory factor analysis (CFA), implemented with the {lavaan} package. Again, the code and output are verbose; please scroll down for a figure.
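The step from cluster membership to lavaan model syntax can be sketched in base R without any clipboard helper. The example below uses the built-in mtcars data and an arbitrary choice of 3 clusters, so the groups it produces are illustrative only; the object names members and model_syntax are hypothetical.

```r
# Illustration on built-in data, not the blood test data
M  <- cor(mtcars)
hc <- hclust(as.dist(1 - abs(M)), method = "ward.D2")

# Named integer vector: variable name -> cluster number
groups <- cutree(hc, k = 3)

# One "Group_i =~ a + b + c" line per cluster
members <- split(names(groups), groups)
model_lines <- vapply(
  names(members),
  function(g) {
    paste0("Group_", g, " =~ ", paste(members[[g]], collapse = " + "))
  },
  character(1)
)

# Single string in the form expected by lavaan's model argument
model_syntax <- paste(model_lines, collapse = "\n")
cat(model_syntax, "\n")
```

Building the string in code rather than pasting it by hand keeps the model definition in step with the clustering if the number of groups is changed later.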
library(lavaan)
# Extract groupings from cluster solution --------------
t0<- tibble(Measure= names(mid), Cluster= cutree(hc, k=12)) %>%
arrange(Cluster) #%>% print()
t1<- t0 %>% group_by(Cluster) %>%
summarise(spec= str_c(Measure, collapse= " + ")) %>%
mutate(model= paste0("Group_",Cluster," =~ ", spec)) %>%
to_clipboard()
to_clipboard(paste(names(mid),"~~ 1*",names(mid)))
# Standardise the data -----------------------
X<- data.frame(scale(mid))
# Specify the model -----------------------
biochem.model <- "Group_1 =~ haematocrit_percentage + red_blood_cell_erythrocyte_count + haemoglobin_concentration + total_bilirubin
Group_2 =~ mean_corpuscular_volume + mean_corpuscular_haemoglobin + mean_reticulocyte_volume + mean_sphered_cell_volume
Group_3 =~ red_blood_cell_erythrocyte_distribution_width + mean_corpuscular_haemoglobin_concentration + alkaline_phosphatase + triglycerides + c_reactive_protein + igf_1 + glycated_haemoglobin_hb_a1c
Group_4 =~ platelet_count + mean_platelet_thrombocyte_volume + platelet_crit + platelet_distribution_width
Group_5 =~ white_blood_cell_leukocyte_count + lymphocyte_percentage + neutrophill_percentage + lymphocyte_count + neutrophill_count
Group_6 =~ basophill_percentage + basophill_count
Group_7 =~ eosinophill_percentage + eosinophill_count
Group_8 =~ monocyte_percentage + monocyte_count
Group_9 =~ reticulocyte_count + reticulocyte_percentage + high_light_scatter_reticulocyte_percentage + high_light_scatter_reticulocyte_count + immature_reticulocyte_fraction
Group_10 =~ cholesterol + ldl_direct + apolipoprotein_b
Group_11 =~ cystatin_c + creatinine + urea + urate
Group_12 =~ alanine_aminotransferase + gamma_glutamyltransferase + aspartate_aminotransferase
haematocrit_percentage ~~ 1* haematocrit_percentage
red_blood_cell_erythrocyte_count ~~ 1* red_blood_cell_erythrocyte_count
haemoglobin_concentration ~~ 1* haemoglobin_concentration
mean_corpuscular_volume ~~ 1* mean_corpuscular_volume
red_blood_cell_erythrocyte_distribution_width ~~ 1* red_blood_cell_erythrocyte_distribution_width
mean_corpuscular_haemoglobin ~~ 1* mean_corpuscular_haemoglobin
platelet_count ~~ 1* platelet_count
white_blood_cell_leukocyte_count ~~ 1* white_blood_cell_leukocyte_count
mean_corpuscular_haemoglobin_concentration ~~ 1* mean_corpuscular_haemoglobin_concentration
mean_platelet_thrombocyte_volume ~~ 1* mean_platelet_thrombocyte_volume
platelet_crit ~~ 1* platelet_crit
platelet_distribution_width ~~ 1* platelet_distribution_width
basophill_percentage ~~ 1* basophill_percentage
eosinophill_percentage ~~ 1* eosinophill_percentage
lymphocyte_percentage ~~ 1* lymphocyte_percentage
monocyte_percentage ~~ 1* monocyte_percentage
neutrophill_percentage ~~ 1* neutrophill_percentage
basophill_count ~~ 1* basophill_count
eosinophill_count ~~ 1* eosinophill_count
lymphocyte_count ~~ 1* lymphocyte_count
monocyte_count ~~ 1* monocyte_count
neutrophill_count ~~ 1* neutrophill_count
reticulocyte_count ~~ 1* reticulocyte_count
reticulocyte_percentage ~~ 1* reticulocyte_percentage
mean_reticulocyte_volume ~~ 1* mean_reticulocyte_volume
high_light_scatter_reticulocyte_percentage ~~ 1* high_light_scatter_reticulocyte_percentage
mean_sphered_cell_volume ~~ 1* mean_sphered_cell_volume
high_light_scatter_reticulocyte_count ~~ 1* high_light_scatter_reticulocyte_count
immature_reticulocyte_fraction ~~ 1* immature_reticulocyte_fraction
alkaline_phosphatase ~~ 1* alkaline_phosphatase
cholesterol ~~ 1* cholesterol
cystatin_c ~~ 1* cystatin_c
alanine_aminotransferase ~~ 1* alanine_aminotransferase
creatinine ~~ 1* creatinine
gamma_glutamyltransferase ~~ 1* gamma_glutamyltransferase
urea ~~ 1* urea
triglycerides ~~ 1* triglycerides
urate ~~ 1* urate
ldl_direct ~~ 1* ldl_direct
c_reactive_protein ~~ 1* c_reactive_protein
aspartate_aminotransferase ~~ 1* aspartate_aminotransferase
total_bilirubin ~~ 1* total_bilirubin
apolipoprotein_b ~~ 1* apolipoprotein_b
igf_1 ~~ 1* igf_1
glycated_haemoglobin_hb_a1c ~~ 1* glycated_haemoglobin_hb_a1c"
# Analyze the model with cfa()
biochem.fit <- cfa(model= biochem.model, data= X)
# Summarize the model
summary(biochem.fit, standardized= TRUE, fit.measures = TRUE, rsquare = TRUE)
lavaan 0.6-7 ended normally after 141 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 111
Number of observations 208129
Model Test User Model:
Test statistic 14492021.317
Degrees of freedom 924
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 16113695.678
Degrees of freedom 990
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.101
Tucker-Lewis Index (TLI) 0.036
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -12478641.929
Loglikelihood unrestricted model (H1) -5232631.270
Akaike (AIC) 24957505.857
Bayesian (BIC) 24958643.154
Sample-size adjusted Bayesian (BIC) 24958290.391
Root Mean Square Error of Approximation:
RMSEA 0.275
90 Percent confidence interval - lower 0.274
90 Percent confidence interval - upper 0.274
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.127
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Group_1 =~
hamtcrt_prcntg 1.000 0.779 0.614
rd_bld_cll_ry_ 0.983 0.005 213.553 0.000 0.765 0.608
hmglbn_cncntrt 1.006 0.005 216.106 0.000 0.784 0.617
total_bilirubn 0.458 0.004 126.968 0.000 0.357 0.336
Group_2 =~
mn_crpsclr_vlm 1.000 0.713 0.580
mn_crpsclr_hmg 0.939 0.005 179.283 0.000 0.669 0.556
mn_rtclcyt_vlm 0.856 0.005 170.313 0.000 0.610 0.521
mn_sphrd_cll_v 0.948 0.005 180.173 0.000 0.675 0.560
Group_3 =~
rd_bld_cll_r__ 1.000 0.056 0.056
mn_crpsclr_hm_ 0.864 0.056 15.446 0.000 0.048 0.048
alkaln_phsphts 4.288 0.186 23.004 0.000 0.239 0.233
triglycerides 8.944 0.381 23.475 0.000 0.499 0.447
c_reactiv_prtn 6.052 0.260 23.305 0.000 0.338 0.320
igf_1 -1.133 0.064 -17.709 0.000 -0.063 -0.063
glyctd_hmgl__1 3.958 0.173 22.902 0.000 0.221 0.216
Group_4 =~
platelet_count 1.000 0.715 0.582
mn_pltlt_thrm_ -0.443 0.004 -100.271 0.000 -0.317 -0.302
platelet_crit 0.871 0.005 162.460 0.000 0.623 0.529
pltlt_dstrbtn_ -0.533 0.005 -116.404 0.000 -0.381 -0.356
Group_5 =~
wht_bld_cll_l_ 1.000 0.519 0.461
lymphcyt_prcnt -1.365 0.009 -156.504 0.000 -0.709 -0.578
ntrphll_prcntg 1.441 0.009 159.395 0.000 0.748 0.599
lymphocyte_cnt -0.360 0.005 -65.748 0.000 -0.187 -0.184
neutrophll_cnt 1.397 0.009 157.778 0.000 0.725 0.587
Group_6 =~
basphll_prcntg 1.000 0.588 0.507
basophill_cont 1.018 0.008 124.674 0.000 0.598 0.513
Group_7 =~
esnphll_prcntg 1.000 0.669 0.556
eosinophll_cnt 0.993 0.007 151.852 0.000 0.664 0.553
Group_8 =~
monocyt_prcntg 1.000 0.595 0.511
monocyte_count 0.995 0.007 134.778 0.000 0.591 0.509
Group_9 =~
reticulcyt_cnt 1.000 0.818 0.633
rtclcyt_prcntg 0.996 0.004 230.732 0.000 0.815 0.632
hgh_lght_sct__ 1.050 0.004 236.735 0.000 0.859 0.651
hgh_lght_sct__ 1.060 0.004 237.832 0.000 0.867 0.655
immtr_rtclcyt_ 0.715 0.004 190.146 0.000 0.585 0.505
Group_10 =~
cholesterol 1.000 0.773 0.612
ldl_direct 1.035 0.005 207.843 0.000 0.800 0.625
apolipoprotn_b 1.018 0.005 206.159 0.000 0.787 0.618
Group_11 =~
cystatin_c 1.000 0.528 0.467
creatinine 1.126 0.007 152.887 0.000 0.595 0.511
urea 0.589 0.006 103.820 0.000 0.311 0.297
urate 1.215 0.008 157.880 0.000 0.642 0.540
Group_12 =~
alnn_mntrnsfrs 1.000 0.633 0.535
gmm_gltmyltrns 0.892 0.006 156.663 0.000 0.564 0.492
asprtt_mntrnsf 0.789 0.005 145.764 0.000 0.499 0.447
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Group_1 ~~
Group_2 -0.076 0.002 -40.027 0.000 -0.138 -0.138
Group_3 0.012 0.001 21.954 0.000 0.283 0.283
Group_4 -0.182 0.002 -81.971 0.000 -0.327 -0.327
Group_5 0.082 0.001 57.193 0.000 0.203 0.203
Group_6 -0.015 0.002 -7.802 0.000 -0.033 -0.033
Group_7 0.060 0.002 29.022 0.000 0.115 0.115
Group_8 0.150 0.002 71.684 0.000 0.323 0.323
Group_9 0.126 0.002 61.563 0.000 0.198 0.198
Group_10 0.047 0.002 23.387 0.000 0.079 0.079
Group_11 0.314 0.002 136.499 0.000 0.765 0.765
Group_12 0.313 0.002 129.725 0.000 0.636 0.636
Group_2 ~~
Group_3 -0.012 0.001 -22.009 0.000 -0.294 -0.294
Group_4 -0.033 0.002 -17.352 0.000 -0.065 -0.065
Group_5 -0.009 0.001 -7.524 0.000 -0.025 -0.025
Group_6 -0.007 0.002 -4.198 0.000 -0.018 -0.018
Group_7 -0.034 0.002 -17.866 0.000 -0.072 -0.072
Group_8 0.020 0.002 10.907 0.000 0.047 0.047
Group_9 -0.015 0.002 -8.205 0.000 -0.026 -0.026
Group_10 -0.029 0.002 -15.528 0.000 -0.053 -0.053
Group_11 -0.019 0.001 -13.092 0.000 -0.049 -0.049
Group_12 0.017 0.002 9.380 0.000 0.037 0.037
Group_3 ~~
Group_4 0.007 0.000 19.316 0.000 0.177 0.177
Group_5 0.008 0.000 21.608 0.000 0.261 0.261
Group_6 0.003 0.000 12.024 0.000 0.084 0.084
Group_7 0.009 0.000 20.656 0.000 0.240 0.240
Group_8 0.009 0.000 21.079 0.000 0.282 0.282
Group_9 0.024 0.001 23.163 0.000 0.524 0.524
Group_10 0.025 0.001 23.161 0.000 0.569 0.569
Group_11 0.022 0.001 23.216 0.000 0.733 0.733
Group_12 0.028 0.001 23.286 0.000 0.791 0.791
Group_4 ~~
Group_5 0.055 0.001 38.606 0.000 0.147 0.147
Group_6 0.034 0.002 17.173 0.000 0.081 0.081
Group_7 0.019 0.002 9.042 0.000 0.040 0.040
Group_8 0.002 0.002 1.122 0.262 0.005 0.005
Group_9 0.003 0.002 1.298 0.194 0.004 0.004
Group_10 0.093 0.002 44.242 0.000 0.168 0.168
Group_11 -0.125 0.002 -73.168 0.000 -0.330 -0.330
Group_12 -0.079 0.002 -40.038 0.000 -0.175 -0.175
Group_5 ~~
Group_6 0.015 0.001 11.301 0.000 0.048 0.048
Group_7 -0.060 0.001 -41.907 0.000 -0.171 -0.171
Group_8 -0.039 0.001 -29.301 0.000 -0.126 -0.126
Group_9 0.049 0.001 36.001 0.000 0.115 0.115
Group_10 -0.038 0.001 -27.576 0.000 -0.094 -0.094
Group_11 0.037 0.001 35.598 0.000 0.136 0.136
Group_12 0.002 0.001 1.615 0.106 0.006 0.006
Group_6 ~~
Group_7 0.051 0.002 25.677 0.000 0.129 0.129
Group_8 0.073 0.002 38.558 0.000 0.210 0.210
Group_9 -0.006 0.002 -3.119 0.002 -0.012 -0.012
Group_10 0.003 0.002 1.571 0.116 0.007 0.007
Group_11 -0.007 0.001 -4.743 0.000 -0.022 -0.022
Group_12 -0.010 0.002 -5.541 0.000 -0.027 -0.027
Group_7 ~~
Group_8 0.122 0.002 58.702 0.000 0.308 0.308
Group_9 0.034 0.002 16.845 0.000 0.062 0.062
Group_10 -0.000 0.002 -0.175 0.861 -0.001 -0.001
Group_11 0.082 0.002 50.435 0.000 0.231 0.231
Group_12 0.056 0.002 28.772 0.000 0.133 0.133
Group_8 ~~
Group_9 0.036 0.002 18.875 0.000 0.074 0.074
Group_10 -0.044 0.002 -22.618 0.000 -0.096 -0.096
Group_11 0.132 0.002 78.863 0.000 0.419 0.419
Group_12 0.133 0.002 67.763 0.000 0.354 0.354
Group_9 ~~
Group_10 0.035 0.002 17.475 0.000 0.055 0.055
Group_11 0.120 0.002 73.799 0.000 0.278 0.278
Group_12 0.176 0.002 86.281 0.000 0.339 0.339
Group_10 ~~
Group_11 0.017 0.002 11.057 0.000 0.041 0.041
Group_12 0.064 0.002 33.332 0.000 0.131 0.131
Group_11 ~~
Group_12 0.231 0.002 117.903 0.000 0.690 0.690
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.hamtcrt_prcntg 1.000 1.000 0.623
.rd_bld_cll_ry_ 1.000 1.000 0.631
.hmglbn_cncntrt 1.000 1.000 0.620
.mn_crpsclr_vlm 1.000 1.000 0.663
.rd_bld_cll_r__ 1.000 1.000 0.997
.mn_crpsclr_hmg 1.000 1.000 0.691
.platelet_count 1.000 1.000 0.662
.wht_bld_cll_l_ 1.000 1.000 0.788
.mn_crpsclr_hm_ 1.000 1.000 0.998
.mn_pltlt_thrm_ 1.000 1.000 0.909
.platelet_crit 1.000 1.000 0.721
.pltlt_dstrbtn_ 1.000 1.000 0.873
.basphll_prcntg 1.000 1.000 0.743
.esnphll_prcntg 1.000 1.000 0.691
.lymphcyt_prcnt 1.000 1.000 0.666
.monocyt_prcntg 1.000 1.000 0.739
.ntrphll_prcntg 1.000 1.000 0.641
.basophill_cont 1.000 1.000 0.736
.eosinophll_cnt 1.000 1.000 0.694
.lymphocyte_cnt 1.000 1.000 0.966
.monocyte_count 1.000 1.000 0.741
.neutrophll_cnt 1.000 1.000 0.655
.reticulcyt_cnt 1.000 1.000 0.599
.rtclcyt_prcntg 1.000 1.000 0.601
.mn_rtclcyt_vlm 1.000 1.000 0.729
.hgh_lght_sct__ 1.000 1.000 0.576
.mn_sphrd_cll_v 1.000 1.000 0.687
.hgh_lght_sct__ 1.000 1.000 0.571
.immtr_rtclcyt_ 1.000 1.000 0.745
.alkaln_phsphts 1.000 1.000 0.946
.cholesterol 1.000 1.000 0.626
.cystatin_c 1.000 1.000 0.782
.alnn_mntrnsfrs 1.000 1.000 0.714
.creatinine 1.000 1.000 0.739
.gmm_gltmyltrns 1.000 1.000 0.758
.urea 1.000 1.000 0.912
.triglycerides 1.000 1.000 0.800
.urate 1.000 1.000 0.708
.ldl_direct 1.000 1.000 0.610
.c_reactiv_prtn 1.000 1.000 0.898
.asprtt_mntrnsf 1.000 1.000 0.801
.total_bilirubn 1.000 1.000 0.887
.apolipoprotn_b 1.000 1.000 0.617
.igf_1 1.000 1.000 0.996
.glyctd_hmgl__1 1.000 1.000 0.953
Group_1 0.606 0.004 139.073 0.000 1.000 1.000
Group_2 0.508 0.004 123.936 0.000 1.000 1.000
Group_3 0.003 0.000 11.774 0.000 1.000 1.000
Group_4 0.512 0.004 119.626 0.000 1.000 1.000
Group_5 0.270 0.003 93.839 0.000 1.000 1.000
Group_6 0.345 0.004 90.140 0.000 1.000 1.000
Group_7 0.447 0.004 107.713 0.000 1.000 1.000
Group_8 0.354 0.004 94.996 0.000 1.000 1.000
Group_9 0.669 0.005 147.207 0.000 1.000 1.000
Group_10 0.598 0.004 134.591 0.000 1.000 1.000
Group_11 0.279 0.003 97.070 0.000 1.000 1.000
Group_12 0.400 0.004 110.395 0.000 1.000 1.000
R-Square:
Estimate
hamtcrt_prcntg 0.377
rd_bld_cll_ry_ 0.369
hmglbn_cncntrt 0.380
mn_crpsclr_vlm 0.337
rd_bld_cll_r__ 0.003
mn_crpsclr_hmg 0.309
platelet_count 0.338
wht_bld_cll_l_ 0.212
mn_crpsclr_hm_ 0.002
mn_pltlt_thrm_ 0.091
platelet_crit 0.279
pltlt_dstrbtn_ 0.127
basphll_prcntg 0.257
esnphll_prcntg 0.309
lymphcyt_prcnt 0.334
monocyt_prcntg 0.261
ntrphll_prcntg 0.359
basophill_cont 0.264
eosinophll_cnt 0.306
lymphocyte_cnt 0.034
monocyte_count 0.259
neutrophll_cnt 0.345
reticulcyt_cnt 0.401
rtclcyt_prcntg 0.399
mn_rtclcyt_vlm 0.271
hgh_lght_sct__ 0.424
mn_sphrd_cll_v 0.313
hgh_lght_sct__ 0.429
immtr_rtclcyt_ 0.255
alkaln_phsphts 0.054
cholesterol 0.374
cystatin_c 0.218
alnn_mntrnsfrs 0.286
creatinine 0.261
gmm_gltmyltrns 0.242
urea 0.088
triglycerides 0.200
urate 0.292
ldl_direct 0.390
c_reactiv_prtn 0.102
asprtt_mntrnsf 0.199
total_bilirubn 0.113
apolipoprotn_b 0.383
igf_1 0.004
glyctd_hmgl__1 0.047
semPlot::semPaths(object = biochem.fit,
layout = "tree",
rotation = 2,
residuals= FALSE,
whatLabels = "std",
sizeMan = 10,
sizeMan2= 2,
label.cex= 0.4,
sizeLat = 4,
edge.label.cex = 0.5,
manifests= names(mid),
what= "std",
edge.color= "dodgerblue3")
The fit obtained from the CFA is poor by conventional standards: RMSEA is 0.275 (values below roughly 0.08 are usually taken as acceptable), SRMR is 0.127, and CFI (0.101) and TLI (0.036) are far below the 0.9 or more usually expected of a well-fitting model. Grouping the tests in this way clearly results in greater correlation between the latent variables compared to the rotated PCA. There are several groups where the measures have roughly equivalent contribution to the latent variable (e.g. Groups 9, 10 and 12). On the other hand, in Group 3, the standardised coefficients are uneven, with several test variables hardly contributing at all.
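Rather than scanning the full summary, the fit indices reported by lavaan (CFI, TLI, RMSEA and SRMR) can be pulled out with fitMeasures(). The sketch below uses the HolzingerSwineford1939 example data and three-factor model that ship with lavaan, purely for illustration; applying the same call to biochem.fit should return the values shown in the output above.

```r
library(lavaan)

# Classic three-factor example shipped with lavaan, used only for illustration
hs_model <- "visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9"
hs_fit <- cfa(hs_model, data = HolzingerSwineford1939)

# Extract only the indices of interest as a named numeric vector
fit_idx <- fitMeasures(hs_fit, c("cfi", "tli", "rmsea", "srmr"))
round(fit_idx, 3)
```

Having the indices as a named vector makes it easy to compare several candidate models side by side, for example by binding the vectors from each fit into a small table.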
There are a number of ways to reduce dimensionality in a set of data with many variables. This notebook has covered how a few of these might be applied. …
…
…