Meta analysis of lipid nanoparticle biodistribution

Majority liver vs non-liver

Author

Lu Mao

Published

December 23, 2024

This report presents an alternative analysis of liver-only vs non-liver biodistribution.

Descriptive analysis

The data contain 563 records, divided into two classes:

Class 1 (\(N=\) 282): majority liver (liver = 1);
Class 2 (\(N=\) 281): majority non-liver (a non-liver organ = 1).

Table 1: Summary statistics by class.

Characteristic	Maj. Liver, N = 282¹	Maj. Non-Liver, N = 281¹	p-value²
Ionizable %	35 (29, 50)	42 (35, 50)	<0.001
Helper %	16 (10, 24)	16 (10, 24)	0.6
Sterol %	39 (33, 47)	39 (23, 47)	<0.001
PEGylated %	2.50 (1.50, 4.55)	2.08 (1.50, 3.00)	0.021
¹ Median (IQR)
² Wilcoxon rank sum test
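The p-values in Table 1 come from a Wilcoxon rank sum test. As a rough illustration (not the code used for this report), the sketch below computes the test with the normal approximation, tie correction, and continuity correction that R's `wilcox.test()` applies when it does not compute an exact p-value; that the table used exactly these settings is an assumption.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y, correct=True):
    """Two-sided rank sum test via the normal approximation.

    Returns (W, p), where W is the rank sum of x minus n_x(n_x + 1)/2,
    the form of the statistic that R's wilcox.test() reports.
    """
    n_x, n_y = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    n = n_x + n_y
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    diff = w - n_x * n_y / 2
    if correct and diff != 0:
        diff -= math.copysign(0.5, diff)  # continuity correction toward zero
    z = diff / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p


w, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
print(w, round(p, 4))
```

Applied to the two classes' ionizable %, helper %, sterol %, or PEGylated % values, the same function would give one p-value per row of Table 1.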

Figure 1: Boxplots of ionizable, helper, sterol, and PEGylated % by class.

Model building

Outcome: Maj. liver (\(n=\) 282) vs Maj. non-liver (\(n=\) 281);
Predictors (\(p\) = 2885):
- Helper %, Ionizable %, PEGylated %;
- IL (ionizable lipid): 963 descriptors;
- Helper: 963 descriptors;
- PEGylated: 956 descriptors.

Data splitting & preprocessing

Data are randomly split 3:1 into:

Training set \(n=\) 421 (211 Maj. liver \(+\) 210 Maj. non-liver);
Test set \(n=\) 142 (71 Maj. liver \(+\) 71 Maj. non-liver).
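The report says only that the split is random and 3:1. The per-class counts (211/210 in training, 71/71 in test) are exactly what a split stratified by class gives if each class contributes floor(0.75 × class size) records to training, so the sketch below assumes stratification; the seed is hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, prop=0.75, seed=2024):
    """Split record indices into (train, test), sampling within each class.

    Each class contributes floor(prop * class size) records to training.
    The seed value is hypothetical.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_train = math.floor(len(idx) * prop)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


labels = ["maj_liver"] * 282 + ["maj_non_liver"] * 281
train, test = stratified_split(labels)
print(len(train), len(test))
```

With 282 and 281 records per class, floor(211.5) = 211 and floor(210.75) = 210 go to training, leaving 71 of each class for testing.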

After removing zero-variance predictors, the training set contains \(p\) = 1406 predictors:

- Helper %, Ionizable %, PEGylated %;
- IL: 599 descriptors;
- Helper: 434 descriptors;
- PEGylated: 370 descriptors.
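A zero-variance filter keeps only predictors that take more than one distinct value in the training set; the same column selection is then applied to the test set so both share one predictor list. The report does not name its tool (in tidymodels this step is usually `recipes::step_zv()`); the sketch below, with hypothetical column names and toy values, shows the logic only.

```python
def zero_variance_filter(train_rows, names):
    """Names of columns with more than one distinct value in the training rows."""
    return [
        name
        for j, name in enumerate(names)
        if len({row[j] for row in train_rows}) > 1
    ]


def select_columns(rows, names, keep):
    """Restrict every row to the columns listed in `keep`, in that order."""
    idx = [names.index(name) for name in keep]
    return [[row[j] for j in idx] for row in rows]


# Hypothetical toy data: the third column is constant in training, so it is
# dropped even though it varies in the test row (the filter is fit on training only).
names = ["Ionizable %", "IL nH", "IL nX"]
train_rows = [[35.0, 60, 0], [50.0, 64, 0], [42.0, 60, 0]]
test_rows = [[29.0, 58, 1]]
keep = zero_variance_filter(train_rows, names)
train_kept = select_columns(train_rows, names, keep)
test_kept = select_columns(test_rows, names, keep)
print(keep)
```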

Random forests

10-fold cross-validation on training set to tune:
- mtry (number of variables randomly sampled at each split);
- min_n (minimum number of observations in terminal nodes);
Tuning results:
- Performance insensitive to mtry; use default (square root of the number of predictors);
- min_n = 9 yields the best AUC (0.837; Figure 2).
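The tuning loop can be sketched as a grid search scored by the mean AUC over 10 folds. Here `fit_and_score` is a hypothetical callback standing in for "fit a random forest on the training folds and return the validation AUC", and the toy scorer in the usage lines is invented. The default mtry assumes the ranger engine (the tidymodels default for random forests), whose default is the square root of the number of predictors rounded down: floor(sqrt(1406)) = 37.

```python
import math
import random
from statistics import mean


def kfold_indices(n, k=10, seed=1):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_tune(n, grid, fit_and_score, k=10, seed=1):
    """Return (best_params, best_mean_score, all_results) over a parameter grid."""
    folds = kfold_indices(n, k, seed)
    results = []
    for params in grid:
        scores = []
        for f, val_idx in enumerate(folds):
            # Train on every fold except f, validate on fold f.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(fit_and_score(train_idx, val_idx, params))
        results.append((params, mean(scores)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


def toy_score(train_idx, val_idx, params):
    # Hypothetical stand-in for a real fit; peaks at min_n = 9.
    return 0.84 - abs(params["min_n"] - 9) / 1000


default_mtry = math.floor(math.sqrt(1406))  # assumed ranger default
grid = [{"min_n": m} for m in (2, 5, 9, 15, 20)]
best, score, _ = cv_tune(421, grid, toy_score)
print(default_mtry, best, round(score, 3))
```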

Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 3):
- AUC = 0.847;
- Accuracy = 0.739.
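Test-set AUC can be read as the probability that a randomly chosen positive record scores higher than a randomly chosen negative one, with ties counting one half. The sketch below computes it that way, along with accuracy at a 0.5 probability threshold; treating the predicted class as "probability ≥ 0.5" is an assumption about how the hard labels were produced, and the scores are toy values.

```python
def auc(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of records whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


scores = [0.9, 0.8, 0.4, 0.3]  # toy predicted probabilities
labels = [1, 0, 1, 0]          # 1 = positive class (toy labels)
print(auc(scores, labels), accuracy(scores, labels))
```

With \(n\) = 142 test records, the reported accuracy of 0.739 corresponds to 105 correct predictions (105/142 ≈ 0.7394).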

Figure 3: ROC curve of the final random forests on the test set and variable importance plot.

Gradient boosted trees (XGBoost)

10-fold cross-validation on training set to tune:
- min_n (minimum number of observations in terminal nodes);
- mtry (number of variables randomly sampled at each split).
Tuning results:
- min_n = 4 and mtry = 419 yield the best AUC (0.866).
Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 4):
- AUC = 0.819;
- Accuracy = 0.718.
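For comparison with the random forest, it helps to express mtry = 419 as a fraction of the 1406 predictors. If the model was fit through parsnip's xgboost engine (an assumption; the report does not say), mtry is supplied as a count and, to our understanding, converted to xgboost's `colsample_bynode` proportion by dividing by the number of predictors. The sketch below only does that arithmetic; the random-forest value of 37 assumes the floor(sqrt(p)) default.

```python
def column_fraction(mtry: int, n_predictors: int) -> float:
    """Fraction of predictors sampled at each split (assumed mtry / p, capped at 1)."""
    if mtry <= 0 or n_predictors <= 0:
        raise ValueError("mtry and n_predictors must be positive")
    return min(mtry / n_predictors, 1.0)


xgb_frac = column_fraction(419, 1406)  # tuned XGBoost mtry from the report
rf_frac = column_fraction(37, 1406)    # assumed RF default, floor(sqrt(1406))
print(f"XGBoost: {xgb_frac:.3f}, random forest: {rf_frac:.3f}")
```

Under these assumptions each XGBoost split considers roughly 30% of the predictors, against under 3% for the random forest.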

Figure 4: ROC curve of the final XGBoost model on the test set and variable importance plot.

Single tree

10-fold cross-validation on training set to tune:
- min_n (minimum number of observations in terminal nodes);
- tree_depth (maximum depth of the tree).
Tuning results:
- min_n = 19 and tree_depth = 10 yield the best AUC (0.815).
Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 5):
- AUC = 0.766;
- Accuracy = 0.690.
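To show what min_n and tree_depth control, here is a from-scratch CART-style sketch on Gini impurity: a node is split only if it holds at least min_n observations and has not reached the maximum depth, and each split takes the threshold with the largest impurity decrease. It is an illustration, not the engine used in the report; implementations such as rpart differ in details (how min_n is enforced, surrogate splits, cost-complexity pruning). The toy data are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, impurity_decrease) of the best split, or None."""
    parent, n, best = gini(y), len(y), None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[2]:
                best = (j, t, parent - child)
    return best


def build_tree(X, y, depth=0, max_depth=10, min_n=19):
    """Grow a tree; nodes with < min_n rows, pure nodes, or max depth become leaves."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_n or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None or split[2] <= 0:
        return {"leaf": majority}
    j, t, _ = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_n),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_n),
    }


def predict(tree, row):
    """Follow splits from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


X = [[1.0], [2.0], [3.0], [4.0]]
y = ["liver", "liver", "non-liver", "non-liver"]
tree = build_tree(X, y, min_n=2)
print(tree["threshold"], predict(tree, [1.5]), predict(tree, [3.5]))
```

With the tuned min_n = 19, this four-row toy set would not be split at all, which is the regularizing effect the tuning exploits.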

Figure 5: ROC curve of the final decision tree on the test set and variable importance plot.

The final tree structure is below:

Figure 6: Final decision tree structure.

Discussion

The final models are compared in terms of cross-validated (CV) and test-set AUC:

Table 2: Comparison of models by AUC.

Model	CV AUC	Test AUC
Random forests	0.837	0.847
XGBoost	0.866	0.819
Decision tree	0.815	0.766

The test ROC curves are plotted below:

Figure 7: Comparison of ROC curves of final models on test set.
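The ROC curves in Figure 7 trace the true-positive rate against the false-positive rate as the classification threshold moves from high to low. A minimal sketch, treating tied scores as a single threshold step and computing the area under the resulting curve with the trapezoidal rule; the scores are toy values.

```python
def roc_points(scores, labels):
    """ROC (fpr, tpr) points, sweeping the threshold from the highest score down."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Move all observations tied at this score across the threshold together.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(pts, trapezoid_auc(pts))
```

On these toy scores the trapezoidal area is 0.75, the same value the pairwise (rank) definition of AUC gives.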

Performance: Random forests \(\approx\) XGBoost \(>\) Decision tree;
Variable importance:
- Variables shared by the top \(k\) \((k=10, 20, 50)\) lists of both random forests and XGBoost are shown below (Table 3);
- Apart from the ionizable, helper, and PEGylated percentages, the most important descriptors are IL-related; every shared descriptor in Table 3 describes the ionizable lipid.

Table 3: Top k most important variables shared by random forests and XGBoost.

k	Common variables
10	Ionizable %, Helper %, PEGylated %, IL WTPT-2, IL ASP-3, IL ASP-0, IL AVP-5, IL VC-5
20	Ionizable %, Helper %, PEGylated %, IL WTPT-2, IL ASP-3, IL ASP-0, IL AVP-5, IL VC-5, IL MAXDN, IL MLogP
50	Ionizable %, Helper %, PEGylated %, IL WTPT-2, IL ASP-3, IL ASP-0, IL AVP-5, IL VC-5, IL MAXDN, IL ETA_AlphaP, IL BCUTp-1l, IL AVP-0, IL MLogP, IL ALogP, IL AVP-2, IL ALogp2, IL LipoaffinityIndex, IL C2SP3, IL ETA_EtaP_F_L, IL BCUTw-1h, IL nH, IL ASP-1, IL ETA_EtaP_L, IL Mp, IL WTPT-5, IL nAtomLAC, IL MDEN-23, IL hmin, IL ETA_dBeta
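Table 3 is the intersection of two ranked lists. A minimal sketch that keeps the shared variables in their random-forest order; applied to the top 10 rows of Table 4, it returns the eight variables in the \(k = 10\) row of Table 3, in the same order.

```python
def common_top_k(rank_a, rank_b, k):
    """Variables in both top-k lists, ordered by their rank in rank_a."""
    top_b = set(rank_b[:k])
    return [v for v in rank_a[:k] if v in top_b]


# Top 10 of each model, copied from Table 4.
rf_top10 = ["Ionizable %", "Helper %", "PEGylated %", "IL AVP-3", "IL WTPT-2",
            "IL ASP-3", "IL AVP-4", "IL ASP-0", "IL AVP-5", "IL VC-5"]
xgb_top10 = ["Helper %", "Ionizable %", "IL VC-5", "PEGylated %", "IL AVP-5",
             "IL nH", "IL ASP-3", "IL WTPT-2", "IL ASP-0", "IL ETA_EtaP"]
shared = common_top_k(rf_top10, xgb_top10, 10)
print(shared)
```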

The 50 most important variables for each of the three models are listed below.

Table 4: Top 50 most important variables for random forests, XGBoost, and decision tree.

Rank	Random Forests	XGBoost	Decision Tree
1	Ionizable %	Helper %	IL ASP-3
2	Helper %	Ionizable %	IL VC-5
3	PEGylated %	IL VC-5	IL MDEC-33
4	IL AVP-3	PEGylated %	IL nHBint8
5	IL WTPT-2	IL AVP-5	IL nHBint5
6	IL ASP-3	IL nH	IL SPC-6
7	IL AVP-4	IL ASP-3	PEGylated %
8	IL ASP-0	IL WTPT-2	IL WTPT-2
9	IL AVP-5	IL ASP-0	IL ASP-4
10	IL VC-5	IL ETA_EtaP	IL ASP-0
11	IL MAXDN	IL MLogP	IL ASP-5
12	IL AVP-6	IL JGI6	IL nAtom
13	IL ETA_AlphaP	IL ASP-6	IL nH
14	IL ETA_dAlpha_B	IL MAXDN	IL ASP-6
15	IL MDEC-33	IL nHBint5	IL ASP-7
16	IL BCUTp-1l	IL Spe	IL apol
17	IL AVP-0	IL ETA_EtaP_L	IL BCUTp-1l
18	IL MLogP	IL WTPT-5	IL ASP-2
19	IL ASP-4	IL WTPT-4	IL LipoaffinityIndex
20	IL MAXDN2	IL ETA_BetaP_ns	Ionizable %
21	IL ALogP	IL hmax	IL CrippenLogP
22	IL AVP-2	IL ALogp2	Helper %
23	IL ALogp2	IL JGI9	IL Kier3
24	IL BCUTc-1h	IL ASP-1	IL VC-3
25	IL SPC-6	IL AVP-0	IL topoDiameter
26	IL BCUTc-1l	IL ALogP	IL hmax
27	IL ASP-2	IL BCUTp-1l	IL ASP-1
28	IL LipoaffinityIndex	IL ETA_dBeta	IL SC-3
29	IL gmax	IL meanI	IL nBonds2
30	IL nHBint8	IL ETA_Epsilon_5	IL nBondsS
31	IL JGI3	IL BCUTw-1h	IL nBondsS2
32	IL nssCH2	IL Mp	IL Sse
33	IL AMW	IL topoShape	IL Kier2
34	IL C2SP3	IL nAtomLAC	IL MLogP
35	IL ETA_EtaP_F_L	IL MDEN-23	IL ECCEN
36	IL BCUTw-1h	IL C2SP3	IL SP-2
37	IL gmin	IL MDEC-11	IL SP-4
38	IL nH	IL hmin	IL SP-7
39	IL ASP-1	IL AVP-2	IL gmax
40	IL ETA_EtaP_L	IL XLogP	IL ETA_Shape_P
41	IL Mp	IL nBondsS	IL hmin
42	IL WTPT-5	IL CrippenLogP	IL nC
43	IL ETA_Shape_P	IL ETA_AlphaP	IL CrippenMR
44	IL nAtomLAC	IL LipoaffinityIndex	IL RotBtFrac
45	IL MDEN-23	IL MAXDP	IL SCH-6
46	IL hmin	IL nBonds2	IL JGI8
47	IL AVP-7	IL VP-6	IL MAXDN
48	IL ETA_dBeta	IL topoDiameter	IL MAXDN2
49	IL ETA_dEpsilon_D	IL ETA_EtaP_F_L	IL MAXDP
50	IL ETA_EtaP_F	IL JGI4	PEGylated AMR

Table 5: Top 50 most important variables for random forests, XGBoost, and decision tree, with importance scores.

Rank	Random Forests	RF score	XGBoost	XGBoost score	Decision Tree	Tree score
1	Ionizable %	3.56	Helper %	8.89	IL ASP-3	28.40
2	Helper %	3.24	Ionizable %	8.22	IL VC-5	28.21
3	PEGylated %	2.28	IL VC-5	7.40	IL MDEC-33	22.79
4	IL AVP-3	1.63	PEGylated %	6.70	IL nHBint8	19.16
5	IL WTPT-2	1.61	IL AVP-5	3.28	IL nHBint5	18.10
6	IL ASP-3	1.39	IL nH	3.18	IL SPC-6	18.10
7	IL AVP-4	1.30	IL ASP-3	3.03	PEGylated %	17.23
8	IL ASP-0	1.14	IL WTPT-2	2.76	IL WTPT-2	12.45
9	IL AVP-5	1.11	IL ASP-0	2.34	IL ASP-4	9.98
10	IL VC-5	1.06	IL ETA_EtaP	2.30	IL ASP-0	9.96
11	IL MAXDN	1.01	IL MLogP	1.79	IL ASP-5	9.64
12	IL AVP-6	1.01	IL JGI6	1.51	IL nAtom	9.47
13	IL ETA_AlphaP	0.97	IL ASP-6	1.49	IL nH	9.47
14	IL ETA_dAlpha_B	0.97	IL MAXDN	1.35	IL ASP-6	9.29
15	IL MDEC-33	0.96	IL nHBint5	1.29	IL ASP-7	9.29
16	IL BCUTp-1l	0.95	IL Spe	1.27	IL apol	8.55
17	IL AVP-0	0.93	IL ETA_EtaP_L	1.23	IL BCUTp-1l	8.22
18	IL MLogP	0.91	IL WTPT-5	1.14	IL ASP-2	8.01
19	IL ASP-4	0.90	IL WTPT-4	1.09	IL LipoaffinityIndex	7.98
20	IL MAXDN2	0.90	IL ETA_BetaP_ns	1.04	Ionizable %	7.70
21	IL ALogP	0.90	IL hmax	1.03	IL CrippenLogP	7.68
22	IL AVP-2	0.90	IL ALogp2	0.99	Helper %	7.48
23	IL ALogp2	0.88	IL JGI9	0.94	IL Kier3	7.46
24	IL BCUTc-1h	0.85	IL ASP-1	0.94	IL VC-3	7.46
25	IL SPC-6	0.83	IL AVP-0	0.91	IL topoDiameter	7.05
26	IL BCUTc-1l	0.82	IL ALogP	0.89	IL hmax	6.95
27	IL ASP-2	0.82	IL BCUTp-1l	0.88	IL ASP-1	6.69
28	IL LipoaffinityIndex	0.79	IL ETA_dBeta	0.78	IL SC-3	6.69
29	IL gmax	0.78	IL meanI	0.77	IL nBonds2	6.60
30	IL nHBint8	0.77	IL ETA_Epsilon_5	0.76	IL nBondsS	6.60
31	IL JGI3	0.76	IL BCUTw-1h	0.72	IL nBondsS2	6.60
32	IL nssCH2	0.75	IL Mp	0.72	IL Sse	6.60
33	IL AMW	0.74	IL topoShape	0.70	IL Kier2	6.24
34	IL C2SP3	0.73	IL nAtomLAC	0.68	IL MLogP	5.92
35	IL ETA_EtaP_F_L	0.73	IL MDEN-23	0.66	IL ECCEN	5.87
36	IL BCUTw-1h	0.72	IL C2SP3	0.65	IL SP-2	5.87
37	IL gmin	0.72	IL MDEC-11	0.64	IL SP-4	5.87
38	IL nH	0.71	IL hmin	0.63	IL SP-7	5.87
39	IL ASP-1	0.71	IL AVP-2	0.61	IL gmax	5.87
40	IL ETA_EtaP_L	0.69	IL XLogP	0.60	IL ETA_Shape_P	5.81
41	IL Mp	0.69	IL nBondsS	0.60	IL hmin	5.81
42	IL WTPT-5	0.68	IL CrippenLogP	0.59	IL nC	5.76
43	IL ETA_Shape_P	0.67	IL ETA_AlphaP	0.58	IL CrippenMR	5.28
44	IL nAtomLAC	0.67	IL LipoaffinityIndex	0.57	IL RotBtFrac	4.98
45	IL MDEN-23	0.66	IL MAXDP	0.56	IL SCH-6	4.70
46	IL hmin	0.64	IL nBonds2	0.55	IL JGI8	4.70
47	IL AVP-7	0.63	IL VP-6	0.51	IL MAXDN	4.70
48	IL ETA_dBeta	0.63	IL topoDiameter	0.50	IL MAXDN2	4.11
49	IL ETA_dEpsilon_D	0.63	IL ETA_EtaP_F_L	0.44	IL MAXDP	4.11
50	IL ETA_EtaP_F	0.63	IL JGI4	0.41	PEGylated AMR	3.00