Meta analysis of lipid nanoparticle biodistribution

Liver only vs non-liver

Author

Lu Mao

Published

December 17, 2024

An alternative analysis of majority liver vs non-liver: report.

Descriptive analysis

A total of 675 records were included in the data. Consider two classes:

Class 1 (\(N=\) 99): liver only, meaning that liver = 1 and all others = 0;
Class 2 (\(N=\) 466): non-liver included, meaning that any other > 0, regardless of whether liver = 0 or 1.
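
The class rule above can be written as a small labelling function. This is only a sketch: the organ names (`spleen`, `lung`) are hypothetical, since the report does not list the non-liver columns. The two classes sum to 565 of the 675 records and the report does not say how the rest are treated, so returning `None` for a record that fits neither rule is an assumption.

```python
from typing import Mapping, Optional


def assign_class(organs: Mapping[str, float]) -> Optional[str]:
    """Label one record as 'liver_only' or 'non_liver' (None if neither rule applies)."""
    liver = organs.get("liver", 0)
    # Any positive value outside the liver puts the record in Class 2,
    # regardless of the liver value.
    if any(value > 0 for name, value in organs.items() if name != "liver"):
        return "non_liver"
    # Class 1 requires liver = 1 with every other organ at 0.
    if liver == 1:
        return "liver_only"
    # Assumption: records matching neither rule stay unlabelled.
    return None


# Hypothetical records for illustration only.
print(assign_class({"liver": 1, "spleen": 0, "lung": 0}))
print(assign_class({"liver": 1, "spleen": 0.4, "lung": 0}))
```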

Table 1: Summary statistics by class.

Characteristic	Liver, N = 99¹	Non-Liver, N = 466¹	p-value²
Ionizable %	35 (25, 50)	39 (35, 50)	<0.001
Helper %	16 (10, 24)	16 (10, 24)	0.6
Unknown	0	11
Sterol %	39 (38, 48)	39 (30, 47)	<0.001
Unknown	11	33
PEGylated %	2.50 (1.50, 4.76)	2.50 (1.50, 3.42)	0.3
Unknown	0	3
¹ Median (IQR)
² Wilcoxon rank sum test
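
For readers unfamiliar with the test in footnote 2, the sketch below computes a Wilcoxon rank sum (Mann-Whitney) statistic with midranks for ties and a two-sided p-value from the normal approximation. It is an illustration rather than the report's computation: the software used for Table 1 may apply an exact or continuity-corrected version, so its p-values can differ slightly. The function name is hypothetical and both samples are assumed non-empty.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, values in enumerate((x, y)) for v in values)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Find the block of tied values starting at position i.
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    # Variance of U with the usual correction for ties.
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u, p


# Toy values, not taken from the report.
u_stat, p_value = rank_sum_test([1, 2, 3], [4, 5, 6])
```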

Figure 1: Boxplots of ionizable, helper, sterol, and PEGylated percentages by class.

Model building

Outcome: Liver only (\(n=\) 99) vs non-liver (\(n=\) 466);
Predictors (\(p\) = 2885):
- Helper %, Ionizable %, PEGylated %;
- IL: 963 descriptors;
- Helper: 963 descriptors;
- PEGylated: 956 descriptors.

Data splitting & preprocessing

Data are randomly split 3:1 into:

Training set \(n=\) 423 (74 liver only \(+\) 349 non-liver);
Test set \(n=\) 142 (25 liver only \(+\) 117 non-liver).
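
The class counts above (74/349 and 25/117) are what a 3:1 split taken within each class would give, rounding the training share down. The sketch below assumes such a stratified split. The report itself says only that the split is random, and the function name and seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_frac: float = 0.75,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split record indices into train/test, sampling within each class."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = int(len(members) * train_frac)  # round down within each class
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


labels = ["liver_only"] * 99 + ["non_liver"] * 466
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 423 142
```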

After removing zero-variance predictors, the training set contains \(p\) = 1420 predictors:

- Helper %, Ionizable %, PEGylated %;
- IL: 612 descriptors;
- Helper: 435 descriptors;
- PEGylated: 370 descriptors.
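
A zero-variance filter drops every column that takes a single value across the training records. The sketch below uses toy values and a hypothetical constant column (`IL nB`). It illustrates the idea and is not the report's preprocessing code.

```python
from typing import Dict, List


def drop_zero_variance(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only columns that take more than one distinct value."""
    return {name: values for name, values in columns.items()
            if len(set(values)) > 1}


# Toy training columns, not data from the report.
training = {
    "Ionizable %": [35.0, 50.0, 39.0],
    "IL nB": [0.0, 0.0, 0.0],  # constant in the training set, so it is dropped
    "IL Mp": [0.61, 0.63, 0.62],
}
kept = drop_zero_variance(training)
print(list(kept))  # ['Ionizable %', 'IL Mp']

# The retained counts reported above add up to the stated total.
print(3 + 612 + 435 + 370)  # 1420
```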

Random forests

10-fold cross-validation on training set to tune:
- mtry (number of variables randomly sampled at each split);
- min_n (minimum number of observations in terminal nodes).
Tuning results:
- Performance is insensitive to mtry, so the default (square root of the number of predictors) is used;
- min_n = 9 yields the best AUC (0.811; Figure 2).
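
As a rough sketch of the resampling scheme, the code below builds 10 cross-validation folds over the 423 training records and computes the default mtry. Two things are assumed because the report does not state them: the folds are unstratified, and the square root is rounded down. The function name and seed are hypothetical.

```python
import math
import random
from typing import List, Tuple


def kfold_indices(n: int, k: int = 10,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (analysis, assessment) index pairs that partition range(n)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for held_out in folds:
        held = set(held_out)
        analysis = [i for i in order if i not in held]
        splits.append((analysis, held_out))
    return splits


splits = kfold_indices(423)      # training-set size from the report
default_mtry = math.isqrt(1420)  # floor of sqrt(p) after filtering
print(default_mtry)  # 37
```

Each candidate value of min_n would then be fit on the analysis part of every fold and scored by AUC on the held-out part, with the ten AUCs averaged.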

Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 3):
- AUC = 0.85;
- Accuracy = 0.831.
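
The two metrics can be computed from predicted scores as sketched below. AUC is taken as the share of (positive, negative) pairs in which the positive record receives the higher score, with ties counted as one half. Accuracy uses a 0.5 cut-off, which is an assumption, as is the choice of which class counts as positive. The scores are toy values, not model output from the report.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of records whose thresholded score matches the label."""
    predicted = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Toy scores and labels for illustration only.
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
test_auc = auc(scores, labels)
test_accuracy = accuracy(scores, labels)
```

Unlike accuracy, AUC does not depend on the cut-off, which is one reason it is a natural tuning criterion when the classes are unbalanced.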

Figure 3: ROC curve of the final random forest on the test set and variable importance plot.

Gradient boosted trees (XGBoost)

10-fold cross-validation on training set to tune:
- min_n (minimum number of observations in terminal nodes);
- mtry (number of variables randomly sampled at each split).
Tuning results:
- min_n = 7 and mtry = 681 yield the best AUC (0.824).
Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 4):
- AUC = 0.821;
- Accuracy = 0.838.

Figure 4: ROC curve of the final XGBoost model on the test set and variable importance plot.

Single tree

10-fold cross-validation on training set to tune:
- min_n (minimum number of observations in terminal nodes);
- tree_depth (maximum depth of the tree).
Tuning results:
- min_n = 10 and tree_depth = 11 yield the best AUC (0.75).
Performance of the final model on the test set (\(n\) = 142) and variable importance (Figure 5):
- AUC = 0.749;
- Accuracy = 0.782.

Figure 5: ROC curve of the final decision tree on the test set and variable importance plot.

The final tree structure is below:

Figure 6: Final decision tree structure.

Discussion

The final models are compared in terms of cross-validated (CV) and test-set AUC:

Table 2: Comparison of models by AUC.

Model	CV AUC	Test AUC
Random forests	0.811	0.850
XGBoost	0.824	0.821
Decision tree	0.750	0.749

The test ROC curves are plotted below:

Figure 7: Comparison of ROC curves of final models on test set.

Performance: Random forests \(\approx\) XGBoost \(>\) Decision tree;
Variable importance:
- Variables common to the top \(k\) \((k=10, 20, 50)\) lists of both random forests and XGBoost are shown below (Table 3);
- Besides Ionizable %, Helper %, and PEGylated %, the most important descriptors are IL-related.

Table 3: Top k most important variables shared by random forests and XGBoost.

k	Common variables
10	Ionizable %, Helper %, IL ECCEN, PEGylated %, IL JGI9
20	Ionizable %, Helper %, IL ECCEN, PEGylated %, IL JGI9, IL ETA_EtaP_L, IL WTPT-4, IL Kier2
50	Ionizable %, Helper %, IL ECCEN, PEGylated %, IL JGI9, IL ETA_EtaP_L, IL WTPT-4, IL Kier2, IL BCUTc-1h, IL CrippenLogP, IL BCUTp-1l, IL AVP-3, IL nAtomLC, IL topoDiameter, IL MAXDN, IL AVP-2, IL BCUTc-1l, IL ETA_EtaP, IL hmin, IL sumI, IL MAXDP2
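
The shared lists in Table 3 amount to intersecting the top \(k\) entries of two ranked lists. A minimal sketch for \(k = 10\) follows, using the random forests and XGBoost rankings from Table 4. The function name is hypothetical.

```python
from typing import List, Sequence


def common_top_k(a: Sequence[str], b: Sequence[str], k: int) -> List[str]:
    """Variables in the top k of both lists, in the order they appear in `a`."""
    in_b = set(b[:k])
    return [name for name in a[:k] if name in in_b]


# Top-10 rankings copied from the Random Forests and XGBoost columns of Table 4.
rf_top10 = ["Ionizable %", "IL Mp", "Helper %", "IL AMW", "IL ECCEN",
            "IL WTPT-3", "PEGylated %", "IL Mi", "IL ALogp2", "IL JGI9"]
xgb_top10 = ["Ionizable %", "PEGylated %", "Helper %", "IL C2SP3", "IL JGI9",
             "IL ECCEN", "IL AVP-2", "IL ETA_Eta_F_L", "IL RotBtFrac", "IL Kier2"]

shared = common_top_k(rf_top10, xgb_top10, 10)
print(shared)
# ['Ionizable %', 'Helper %', 'IL ECCEN', 'PEGylated %', 'IL JGI9']
```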

The top 50 lists of most important variables for all three models are provided below.

Table 4: Top 50 most important variables for random forests, XGBoost, and decision tree.

Rank	Random Forests	XGBoost	Decision Tree
1	Ionizable %	Ionizable %	IL Mp
2	IL Mp	PEGylated %	IL Mi
3	Helper %	Helper %	IL BCUTc-1h
4	IL AMW	IL C2SP3	IL ndO
5	IL ECCEN	IL JGI9	IL AMW
6	IL WTPT-3	IL ECCEN	IL ETA_Eta_F_L
7	PEGylated %	IL AVP-2	Ionizable %
8	IL Mi	IL ETA_Eta_F_L	IL C1SP2
9	IL ALogp2	IL RotBtFrac	IL ETA_Eta_F
10	IL JGI9	IL Kier2	IL nAtomLC
11	IL MW	IL ETA_EtaP_L	IL nHBAcc_Lipinski
12	IL BCUTw-1h	IL sumI	IL nHBAcc2
13	IL ETA_EtaP_L	IL ETA_Shape_P	IL ETA_Beta
14	IL WTPT-4	IL nAtomLC	IL WTPT-4
15	IL Kier2	IL WTPT-4	IL SP-2
16	IL Kier1	IL RotBFrac	IL MAXDP2
17	IL hmax	IL ASP-3	IL MAXDP
18	IL BCUTc-1h	IL ETA_EtaP	IL BCUTp-1l
19	IL CrippenLogP	IL nAtomLAC	IL ETA_AlphaP
20	IL ETA_AlphaP	IL MDEC-11	IL ETA_dAlpha_A
21	IL DELS2	IL BCUTc-1h	IL gmax
22	IL ETA_Eta_F	IL AVP-3	IL HybRatio
23	IL BCUTp-1l	IL AVP-5	IL hmin
24	IL AMR	IL JGI8	IL C2SP2
25	IL ETA_EtaP_F_L	IL BCUTc-1l	IL nBondsD
26	IL DELS	IL AVP-0	IL nBondsD2
27	IL AVP-3	IL WTPT-2	IL nBondsM
28	IL nAtomLC	Helper-SP-3	IL nwHBa
29	IL topoDiameter	IL ASP-1	IL AMR
30	IL MAXDN	IL topoDiameter	IL apol
31	IL ALogP	IL MDEN-33	IL nAtom
32	IL VP-1	PEGylated- SC-3	IL nHeavyAtom
33	IL ETA_EtaP_F	IL MAXDN	IL ETA_BetaP_s
34	IL AVP-2	IL JGI1	IL ETA_dBetaP
35	IL SP-1	IL C3SP3	IL MDEN-13
36	IL BCUTc-1l	PEGylated- SPC-5	IL hmax
37	IL ETA_EtaP	IL hmin	IL BCUTp-1h
38	IL MLogP	IL JGI10	IL GGI7
39	IL WPATH	IL MAXDP2	IL WTPT-5
40	IL fragC	IL CrippenLogP	IL BCUTw-1h
41	IL hmin	IL JGI4	IL SPC-4
42	IL VP-2	Helper-nHsNH2	IL ALogP
43	IL nHBa	IL ETA_EtaP_B	IL ALogp2
44	IL sumI	IL gmin	IL AVP-1
45	IL BCUTw-1l	IL VP-0	IL AVP-2
46	IL ETA_Epsilon_1	IL gmax	IL Mv
47	IL SP-4	IL BCUTp-1l	IL ASP-4
48	IL VABC	IL MDEO-11	IL DELS
49	IL JGI7	IL nBondsD	IL ASP-2
50	IL MAXDP2	IL ASP-2	IL ASP-6

Table 5: Top 50 most important variables for random forests, XGBoost, and decision tree, with importance scores.

Rank	RF	RF_score	XGB	XGB_score	Tree	Tree_score
1	Ionizable %	1.35	Ionizable %	12.76	IL Mp	13.84
2	IL Mp	0.74	PEGylated %	6.91	IL Mi	11.67
3	Helper %	0.67	Helper %	5.71	IL BCUTc-1h	9.19
4	IL AMW	0.64	IL C2SP3	2.88	IL ndO	8.08
5	IL ECCEN	0.63	IL JGI9	2.35	IL AMW	7.99
6	IL WTPT-3	0.61	IL ECCEN	2.30	IL ETA_Eta_F_L	7.24
7	PEGylated %	0.60	IL AVP-2	2.21	Ionizable %	7.05
8	IL Mi	0.58	IL ETA_Eta_F_L	2.00	IL C1SP2	6.95
9	IL ALogp2	0.51	IL RotBtFrac	1.88	IL ETA_Eta_F	6.76
10	IL JGI9	0.48	IL Kier2	1.87	IL nAtomLC	6.57
11	IL MW	0.48	IL ETA_EtaP_L	1.79	IL nHBAcc_Lipinski	6.32
12	IL BCUTw-1h	0.47	IL sumI	1.70	IL nHBAcc2	6.32
13	IL ETA_EtaP_L	0.47	IL ETA_Shape_P	1.63	IL ETA_Beta	6.28
14	IL WTPT-4	0.46	IL nAtomLC	1.51	IL WTPT-4	6.28
15	IL Kier2	0.46	IL WTPT-4	1.48	IL SP-2	5.73
16	IL Kier1	0.46	IL RotBFrac	1.46	IL MAXDP2	5.21
17	IL hmax	0.45	IL ASP-3	1.42	IL MAXDP	5.16
18	IL BCUTc-1h	0.45	IL ETA_EtaP	1.40	IL BCUTp-1l	5.13
19	IL CrippenLogP	0.44	IL nAtomLAC	1.38	IL ETA_AlphaP	5.00
20	IL ETA_AlphaP	0.43	IL MDEC-11	1.37	IL ETA_dAlpha_A	5.00
21	IL DELS2	0.42	IL BCUTc-1h	1.27	IL gmax	4.97
22	IL ETA_Eta_F	0.41	IL AVP-3	1.27	IL HybRatio	4.68
23	IL BCUTp-1l	0.41	IL AVP-5	1.21	IL hmin	4.29
24	IL AMR	0.40	IL JGI8	1.19	IL C2SP2	3.74
25	IL ETA_EtaP_F_L	0.39	IL BCUTc-1l	1.11	IL nBondsD	3.74
26	IL DELS	0.39	IL AVP-0	1.08	IL nBondsD2	3.74
27	IL AVP-3	0.39	IL WTPT-2	1.02	IL nBondsM	3.74
28	IL nAtomLC	0.39	Helper-SP-3	1.01	IL nwHBa	3.74
29	IL topoDiameter	0.38	IL ASP-1	0.99	IL AMR	3.59
30	IL MAXDN	0.38	IL topoDiameter	0.96	IL apol	3.59
31	IL ALogP	0.38	IL MDEN-33	0.93	IL nAtom	3.59
32	IL VP-1	0.37	PEGylated- SC-3	0.85	IL nHeavyAtom	3.59
33	IL ETA_EtaP_F	0.36	IL MAXDN	0.81	IL ETA_BetaP_s	3.51
34	IL AVP-2	0.36	IL JGI1	0.81	IL ETA_dBetaP	3.51
35	IL SP-1	0.36	IL C3SP3	0.79	IL MDEN-13	3.51
36	IL BCUTc-1l	0.35	PEGylated- SPC-5	0.79	IL hmax	2.79
37	IL ETA_EtaP	0.35	IL hmin	0.77	IL BCUTp-1h	2.66
38	IL MLogP	0.35	IL JGI10	0.77	IL GGI7	2.66
39	IL WPATH	0.35	IL MAXDP2	0.76	IL WTPT-5	2.66
40	IL fragC	0.35	IL CrippenLogP	0.76	IL BCUTw-1h	2.59
41	IL hmin	0.34	IL JGI4	0.73	IL SPC-4	2.53
42	IL VP-2	0.34	Helper-nHsNH2	0.72	IL ALogP	1.74
43	IL nHBa	0.34	IL ETA_EtaP_B	0.71	IL ALogp2	1.74
44	IL sumI	0.33	IL gmin	0.67	IL AVP-1	1.74
45	IL BCUTw-1l	0.33	IL VP-0	0.64	IL AVP-2	1.74
46	IL ETA_Epsilon_1	0.33	IL gmax	0.62	IL Mv	1.74
47	IL SP-4	0.33	IL BCUTp-1l	0.61	IL ASP-4	1.73
48	IL VABC	0.33	IL MDEO-11	0.57	IL DELS	1.73
49	IL JGI7	0.33	IL nBondsD	0.56	IL ASP-2	1.63
50	IL MAXDP2	0.33	IL ASP-2	0.54	IL ASP-6	1.63